Muscle-Based Imitation Learning¶

Experimental

Support for the FlyMimic musculoskeletal body model is experimental. The API may change in future releases. Only the left-front leg is muscle-driven in the current model; other legs are passive or locked. Not all features available for the default NeuroMechFly model are currently supported (e.g. per-leg ground-contact sensors are absent).

FlyGym supports muscle-actuated fly simulations, in which the standard joint actuators are replaced by biophysically realistic Hill-type muscles. This opens the door to studying neuromuscular control, movement biomechanics, and motor learning — areas where the mechanical properties of muscles (force–velocity relationships, passive elasticity, activation dynamics) shape how movements are generated and controlled.

Imitation learning is a natural application of this framework: given motion-capture recordings of real fly kinematics, a policy can be trained to activate muscles so that the simulated limb tracks the recorded trajectory. The same approach generalises to any muscle-actuated limb or behavior. What ships today is a working implementation for the Drosophila left-front (LF) leg, reproducing the FlyMimic result inside FlyGym.

The musculoskeletal body model follows the same composition convention as NeuroMechFly and FlyBody — the fly and world classes live in flygym.compose:

Component	What it provides
`flygym.compose.MusculoskeletalFly` / `MusculoskeletalWorld`	The musculoskeletal fly + world, used exactly like `NeuroMechFly`/`FlatGroundWorld`.
`flygym.compose.build_musculoskeletal_simulation`	One-call factory (plus the guarded GPU/MuJoCo-Warp helpers `check_mjwarp_compatibility` and `build_musculoskeletal_gpu_simulation`).
`flygym_demo.muscle_imitation`	The imitation-learning stack: mocap clips (bundled under `assets/mocap/`), the dataset loader (`MoCapDataset`), and a Gymnasium environment (`ImitationEnv`) with the FlyMimic tracking reward.

FlyGym's core only ships the musculoskeletal body model; the mocap clips and the imitation-learning code live entirely in the flygym_demo.muscle_imitation demo submodule, which is also the runnable example.

Plain flygym (CPU) — not GPU-accelerated

build_musculoskeletal_simulation and make_imitation_env return a plain flygym.Simulation backed by a single CPU MuJoCo world — not flygym.warp.GPUSimulation. Everything on this page, including training, runs on plain flygym.

Dataset¶

The motion-capture clips in flygym_demo/muscle_imitation/assets/mocap/ are recorded Drosophila left-front-leg kinematics, stored as NumPy arrays sampled at 500 Hz, the same rate as the environment's control loop.

Each clip {id} provides four arrays:

Array	Shape	Meaning	Units
`qpos/{id}.npy`	`(T, J)`	LF-leg joint angles	rad
`qvel/{id}.npy`	`(T, J)`	LF-leg joint velocities	rad/s
`xipos/{id}.npy`	`(T, 4, 3)`	3D positions of 4 tracked bodies	mm
`xivel/{id}.npy`	`(T, 4, 3)`	3D velocities of those bodies	mm/s

The 4 tracked bodies are LFFemur, LFTibia, LFTarsus1, LFTarsus5 (claw).
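The layout above can be read with plain NumPy. The following is a minimal sketch, not the `MoCapDataset` implementation: `load_clip` is a hypothetical helper, and the demo runs on synthetic zero arrays (with placeholder sizes T = 5 and J = 7) written to a temporary directory rather than on the bundled assets.

```python
from pathlib import Path
import tempfile

import numpy as np

ARRAY_NAMES = ("qpos", "qvel", "xipos", "xivel")


def load_clip(root, clip_id):
    """Load the four arrays of one clip and check that their shapes agree.

    Hypothetical helper: assumes the `<root>/<array>/<id>.npy` layout
    described in the table above.
    """
    root = Path(root)
    clip = {name: np.load(root / name / f"{clip_id}.npy") for name in ARRAY_NAMES}
    n_frames, n_joints = clip["qpos"].shape
    if clip["qvel"].shape != (n_frames, n_joints):
        raise ValueError("qvel must have the same (T, J) shape as qpos")
    for name in ("xipos", "xivel"):
        if clip[name].shape != (n_frames, 4, 3):
            raise ValueError(f"{name} must have shape (T, 4, 3)")
    return clip


# Demo on synthetic data; T = 5 and J = 7 are placeholder sizes.
with tempfile.TemporaryDirectory() as tmp:
    shapes = {"qpos": (5, 7), "qvel": (5, 7), "xipos": (5, 4, 3), "xivel": (5, 4, 3)}
    for name, shape in shapes.items():
        (Path(tmp) / name).mkdir()
        np.save(Path(tmp) / name / "demo.npy", np.zeros(shape))
    clip = load_clip(tmp, "demo")

print({name: arr.shape for name, arr in clip.items()})
```

For real work, the `MoCapDataset` loader in `flygym_demo.muscle_imitation` is the supported entry point.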

One clip is bundled: 0002 (225 frames, 7 joint DoFs), which is also FlyMimic's default. Its body trajectories match the bundled model, so the full reward range is available. Its 7 qpos columns map, in order, to:

qpos column	MJCF joint
0	`joint_LFCoxa_yaw`
1	`joint_LFCoxa_pitch`
2	`joint_LFCoxa_roll`
3	`joint_LFTrochanter_yaw`
4	`joint_LFTrochanter_pitch`
5	`joint_LFTrochanter_roll`
6	`joint_LFTibia_pitch`

The mapping is keyed by qpos width (TRACKED_JOINT_NAMES_BY_NCOLS in flygym_demo.muscle_imitation.data). ImitationEnv selects the mapping from the clip's width, so observation/action shapes adapt automatically; the shipped 7-column clip yields a 45-dimensional observation.
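The 45-dimensional figure follows from the observation components listed under Environment (tracked joint qpos and qvel, muscle activations, muscle forces, and a time-left scalar). A short sketch of the arithmetic, assuming the components are simply concatenated; `observation_size` is a hypothetical helper, not part of the package:

```python
N_MUSCLES = 15  # Hill-type muscles on the LF leg


def observation_size(n_tracked_joints, n_muscles=N_MUSCLES):
    """Size of the observation, assuming plain concatenation of its parts."""
    joint_terms = 2 * n_tracked_joints  # qpos + qvel of the tracked joints
    muscle_terms = 2 * n_muscles        # activations + forces
    time_left = 1                       # scalar
    return joint_terms + muscle_terms + time_left


print(observation_size(7))  # 45 for the shipped 7-column clip
```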

The musculoskeletal model¶

The model is assets/model/musculoskeletal/best_combined_arm_damping_stiff_cvt3.xml (+ STL meshes), converted from an OpenSim model with MyoConverter. It has 73 bodies, 15 Hill-type muscle actuators on the LF leg, and 15 spatial tendons.

Each muscle is a MuJoCo general actuator (dyntype/gaintype/biastype = muscle) acting through a spatial tendon routed via attachment sites on the thorax and LF-leg segments.

How it differs from FlyGym's default rigid-body fly:

Aspect	FlyGym default	FlyMimic muscle model
LF-leg links	`coxa → trochanterfemur (fused) → tibia → tarsus1..5`	`LFCoxa → LFTrochanter → LFFemur → LFTibia → LFTarsus1..5`
Actuation	joint position/torque actuators	15 Hill-type muscles (LF leg) via spatial tendons
Passive joints	spring/damper from config	`stiffness = 0.4` + per-joint spring reference angles
Other legs	all six actuated	LF muscle-driven; RF locked to 0; LM/LH passive
Base	thorax free-floating	thorax tethered (anchored to world)
Sensors	vision, contact, proprioception	proprioception + body kinematics; vision optional (see Environment, reward, and sensors)

build_musculoskeletal_simulation() loads this model and returns a standard flygym.Simulation, so the rest of FlyGym works against it unchanged.

Environment, reward, and sensors¶

Environment — `flygym_demo.muscle_imitation.ImitationEnv`¶

A Gymnasium environment wrapping the muscle simulation:

Action — 15 muscle activations in [0, 1].
Observation — tracked joint qpos + qvel, muscle activations, muscle forces, and a time-left scalar.
Step — applies the activations, advances the physics by one control step (control runs at 500 Hz on top of a 10 kHz physics rate, i.e. a 0.1 ms physics timestep), and advances the mocap frame by one.
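The timing implied by these rates can be worked out directly. This sketch only restates the numbers given on this page (500 Hz control, 10 kHz physics, a 225-frame clip); the constant and function names are illustrative.

```python
CONTROL_HZ = 500      # one env.step() and one mocap frame per control step
PHYSICS_HZ = 10_000   # MuJoCo physics rate

# Physics substeps executed inside a single control step.
substeps_per_control_step = PHYSICS_HZ // CONTROL_HZ


def clip_duration_s(n_frames, control_hz=CONTROL_HZ):
    """Wall-clock length of a clip played back at one frame per control step."""
    return n_frames / control_hz


print(substeps_per_control_step)  # 20
print(clip_duration_s(225))       # 0.45 (seconds, for clip 0002)
```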

Reward¶

Per step, against the corresponding mocap frame (the FlyMimic motion-imitation reward, with pose_w = 5, vel_w = 3):

qpos_rew = exp(-pose_w * ‖target_qpos - actual_qpos‖₂)
qvel_rew = exp(-vel_w  * ‖target_qvel - actual_qvel‖₂)
xpos_rew = exp(-pose_w * mean_b ‖target_xpos_b - actual_xpos_b‖₂)
reward   = clip((qpos_rew + xpos_rew + qvel_rew) / 3, 0, 1)

In training mode, an episode ends when the clip ends, or earlier if the per-step reward drops below rew_threshold (default 0.01). In test mode the full clip is played with no early termination.
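The reward and termination rule above can be sketched in NumPy as follows. This is an illustration of the formulas as written on this page, not the `ImitationEnv` source: the function names, argument order, and the frame-index convention in `episode_over` are assumptions.

```python
import numpy as np

POSE_W, VEL_W = 5.0, 3.0


def imitation_reward(target_qpos, qpos, target_qvel, qvel, target_xpos, xpos,
                     pose_w=POSE_W, vel_w=VEL_W):
    """Per-step tracking reward in [0, 1]; xpos arrays have shape (n_bodies, 3)."""
    qpos_rew = np.exp(-pose_w * np.linalg.norm(target_qpos - qpos))
    qvel_rew = np.exp(-vel_w * np.linalg.norm(target_qvel - qvel))
    # Mean over bodies of the per-body Euclidean position error.
    body_err = np.linalg.norm(target_xpos - xpos, axis=-1).mean()
    xpos_rew = np.exp(-pose_w * body_err)
    return float(np.clip((qpos_rew + xpos_rew + qvel_rew) / 3.0, 0.0, 1.0))


def episode_over(reward, next_frame, n_frames, training=True, rew_threshold=0.01):
    """Assumed convention: `next_frame` is the index of the next mocap frame."""
    clip_ended = next_frame >= n_frames
    failed = training and reward < rew_threshold
    return clip_ended or failed


# Perfect tracking gives the maximum reward.
q, v, x = np.zeros(7), np.zeros(7), np.zeros((4, 3))
perfect = imitation_reward(q, q, v, v, x, x)

# A joint-angle error of norm 1 rad only lowers the qpos term.
q_off = q.copy()
q_off[0] = 1.0
off = imitation_reward(q, q_off, v, v, x, x)
print(perfect, off)
```

Because each term is an exponential of a non-negative error, the average already lies in (0, 1]; the clip is a safeguard rather than an active constraint.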

Results & reproducibility¶

We reproduced FlyMimic's imitation-learning result in FlyGym. Training a PPO policy on clip 0002 with FlyMimic's own hyperparameters (stable-baselines3, lr = 1e-5, gamma = 0.99, ReLU [512, 512, 256] actor/critic) drives the mean episode reward steadily upward, and the muscle-actuated LF leg learns to track the reference kinematics. The reward formula and weights match FlyMimic exactly (see the Reward section), and the reward ceiling on this clip is ~1.0.

Per-step reward climbs from the random-activation baseline (~0.06) to ~0.21, with episode length growing in step (the policy both tracks better and holds the pose longer) and no collapse. The trained leg motion can be inspected by rendering a rollout from a checkpoint (see the example script).

Reproduce:

# quick check: random-policy rollout (no training dependencies)
uv run python -m flygym_demo.muscle_imitation --no-train --video-path random.mp4

# train a policy with logging + checkpointing, then record a video of it
uv run python -m flygym_demo.muscle_imitation \
    --clip 0002 --total-timesteps 15000000 --learning-rate 1e-5 \
    --log-dir runs/0002 --video-path runs/0002/rollout.mp4

# render a previously-saved policy without retraining
uv run python -m flygym_demo.muscle_imitation --no-train \
    --model-path runs/0002/final_model.zip --video-path rollout.mp4

Training writes all artifacts under --log-dir (default runs/<clip>):

Artifact	Contents
`monitor.csv`	Per-episode reward + length (`stable_baselines3` `Monitor`; read with `pandas`).
`tb/`	TensorBoard event files — view with `tensorboard --logdir runs/<clip>/tb` (`rollout/ep_rew_mean`, `ep_len_mean`, …). TensorBoard is optional; without it the CSV is still written.
`checkpoints/ppo_muscle_*_steps.zip`	Periodic checkpoints (`--checkpoint-freq`, default every 50k steps; `0` disables).
`final_model.zip`	The policy at the end of training.
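A `monitor.csv` written by the `stable_baselines3` `Monitor` wrapper starts with a `#`-prefixed JSON metadata line, followed by a header row `r,l,t` (episode reward, episode length, elapsed time) and one row per episode. The sketch below parses that layout with the standard library only; `read_monitor` is a hypothetical helper and the demo file is synthetic. With pandas, `pandas.read_csv(path, skiprows=1)` should give the same table.

```python
import csv
import tempfile
from pathlib import Path


def read_monitor(path):
    """Return one dict per episode with reward `r`, length `l`, and time `t`."""
    with open(path, newline="") as f:
        f.readline()  # skip the '#{...}' JSON metadata line
        return [
            {"r": float(row["r"]), "l": int(row["l"]), "t": float(row["t"])}
            for row in csv.DictReader(f)
        ]


# Demo on a synthetic file (made-up numbers, not training results).
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "monitor.csv"
    path.write_text(
        '#{"t_start": 0.0, "env_id": "None"}\n'
        "r,l,t\n"
        "1.5,20,0.8\n"
        "2.5,30,1.9\n"
    )
    episodes = read_monitor(path)

mean_reward = sum(ep["r"] for ep in episodes) / len(episodes)
mean_length = sum(ep["l"] for ep in episodes) / len(episodes)
print(mean_reward, mean_length)
```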

To watch the learned behaviour, --video-path runs a deterministic rollout in test mode (full clip, no early termination), rendering FlyMimic's world camera (--camera, default scene; --camera-res H W) to an mp4. The same env can be driven from Python via flygym_demo.muscle_imitation.record_rollout / load_policy.

Training runs on the CPU, since the simulation is a single CPU MuJoCo world (see above). At higher learning rates, set PPO's target_kl to about 0.05 and select the best checkpoint by periodic evaluation to avoid late-training instability.
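As an illustration of the `target_kl` advice, the sketch below assembles keyword arguments for `stable_baselines3.PPO`. It is not how the demo's `TrainConfig` is wired internally; `ppo_kwargs` is a hypothetical helper and the learning rate of 1e-4 is an arbitrary example of a "higher" value. The ReLU activation (`activation_fn=torch.nn.ReLU` inside `policy_kwargs`) is left out so the snippet runs without PyTorch installed.

```python
def ppo_kwargs(learning_rate=1e-5, target_kl=None):
    """Keyword arguments for stable_baselines3.PPO (hypothetical helper)."""
    kwargs = {
        "learning_rate": learning_rate,
        "gamma": 0.99,
        # Layer sizes from this page; add activation_fn=torch.nn.ReLU
        # here to match the ReLU networks described above.
        "policy_kwargs": {"net_arch": [512, 512, 256]},
    }
    if target_kl is not None:
        # PPO stops the current update early once the approximate KL
        # divergence between old and new policy grows too large.
        kwargs["target_kl"] = target_kl
    return kwargs


baseline = ppo_kwargs()                                   # settings from this page
faster = ppo_kwargs(learning_rate=1e-4, target_kl=0.05)   # example higher rate
print(faster)
```

With `stable_baselines3` installed, these would be passed as `PPO("MlpPolicy", env, **faster)`.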

API¶

from flygym.compose import build_musculoskeletal_simulation
from flygym_demo.muscle_imitation import ImitationConfig, ImitationEnv, MoCapDataset

sim, fly = build_musculoskeletal_simulation()  # Simulation backed by the muscle model
env = ImitationEnv(
    sim, fly_name=fly.name,
    dataset=MoCapDataset.default(),
    config=ImitationConfig(clip="0002"),
)

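obs, info = env.reset()
for _ in range(200):
    action = env.action_space.sample()    # 15 random muscle activations in [0, 1]
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:           # e.g. reward fell below rew_threshold
        obs, info = env.reset()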

build_musculoskeletal_simulation() is a convenience wrapper over the standard FlyGym composition flow, identical to how you'd build a NeuroMechFly scene:

from flygym import Simulation
from flygym.compose import MusculoskeletalFly, MusculoskeletalWorld

fly = MusculoskeletalFly()
world = MusculoskeletalWorld(fly)
sim = Simulation(world)

Build the environment in one call:

from flygym_demo.muscle_imitation import ImitationConfig, make_imitation_env

env = make_imitation_env(config=ImitationConfig(clip="0002"))

Train (logging + checkpointing) and record a video programmatically:

from flygym_demo.muscle_imitation import (
    ImitationConfig, TrainConfig, train, make_imitation_env, load_policy, record_rollout,
)

# Writes monitor.csv, tb/, checkpoints/, final_model.zip under runs/0002
model, final_path = train("runs/0002", config=TrainConfig(clip="0002",
                                                          total_timesteps=200_000))

# Roll the trained policy out and save an mp4 (test mode = full clip)
env = make_imitation_env(config=ImitationConfig(clip="0002", test=True))
record_rollout(env, load_policy(final_path), "rollout.mp4")

Inspect or drive the model directly:

from flygym.compose import build_musculoskeletal_simulation

sim, fly = build_musculoskeletal_simulation(add_vision=True)  # vision is optional
fly.muscle_names                 # the 15 muscle actuator names
sim.get_joint_angles(fly.name)   # proprioception, body kinematics, ...

Citation¶

If you use the musculoskeletal model in your research, please cite the FlyMimic paper in addition to FlyGym:

Ozdil, P. G., Ning, C., Phelps, J. S., Wang-Chen, S., Elisha, G., Blanke, A., Ijspeert, A., & Ramdya, P. (2026). Musculoskeletal simulation of limb movement biomechanics in Drosophila melanogaster. ICLR 2026. arXiv:2509.06426

@inproceedings{Ozdil2026,
  title={Musculoskeletal simulation of limb movement biomechanics in Drosophila melanogaster},
  author={Ozdil, Pembe Gizem and Ning, Chuanfang and Phelps, Jasper S and Wang-Chen, Sibo and Elisha, Guy and Blanke, Alexander and Ijspeert, Auke and Ramdya, Pavan},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://arxiv.org/abs/2509.06426},
}