Muscle-Based Imitation Learning¶
Experimental
Support for the FlyMimic musculoskeletal body model is experimental.
The API may change in future releases. Only the left-front leg is
muscle-driven in the current model; other legs are passive or locked.
Not all features available for the default NeuroMechFly model are
currently supported (e.g. per-leg ground-contact sensors are absent).
FlyGym supports muscle-actuated fly simulations, in which the standard joint actuators are replaced by biophysically realistic Hill-type muscles. This opens the door to studying neuromuscular control, movement biomechanics, and motor learning — areas where the mechanical properties of muscles (force–velocity relationships, passive elasticity, activation dynamics) shape how movements are generated and controlled.
Imitation learning is a natural application of this framework: given motion-capture recordings of real fly kinematics, a policy can be trained to activate muscles so that the simulated limb tracks the recorded trajectory. The same approach generalises to any muscle-actuated limb or behavior. What ships today is a working implementation for the Drosophila left-front (LF) leg, reproducing the FlyMimic result inside FlyGym.
The musculoskeletal body model follows the same composition convention as NeuroMechFly and FlyBody — the fly and world classes live in flygym.compose:
| Component | What it provides |
|---|---|
| flygym.compose.MusculoskeletalFly / MusculoskeletalWorld | The musculoskeletal fly + world, used exactly like NeuroMechFly/FlatGroundWorld. |
| flygym.compose.build_musculoskeletal_simulation | One-call factory (plus the guarded GPU/MuJoCo-Warp helpers check_mjwarp_compatibility and build_musculoskeletal_gpu_simulation). |
| flygym_demo.muscle_imitation | The imitation-learning stack: mocap clips (bundled under assets/mocap/), the dataset loader (MoCapDataset), and a Gymnasium environment (ImitationEnv) with the FlyMimic tracking reward. |
FlyGym's core only ships the musculoskeletal body model; the mocap clips and the imitation-learning code live entirely in the flygym_demo.muscle_imitation demo submodule, which is also the runnable example.
Plain flygym (CPU) — not GPU-accelerated
build_musculoskeletal_simulation and make_imitation_env return a plain
flygym.Simulation backed by a single CPU MuJoCo world — not
flygym.warp.GPUSimulation. Everything on this page, including training,
runs on plain flygym.
Dataset¶
The motion-capture clips in flygym_demo/muscle_imitation/assets/mocap/ are recorded Drosophila left-front-leg kinematics, stored as NumPy arrays at a 500 Hz control rate.
Each clip {id} provides four arrays:
| Array | Shape | Meaning | Units |
|---|---|---|---|
| qpos/{id}.npy | (T, J) | LF-leg joint angles | rad |
| qvel/{id}.npy | (T, J) | LF-leg joint velocities | rad/s |
| xipos/{id}.npy | (T, 4, 3) | 3D positions of 4 tracked bodies | mm |
| xivel/{id}.npy | (T, 4, 3) | 3D velocities of those bodies | mm/s |
The 4 tracked bodies are LFFemur, LFTibia, LFTarsus1, LFTarsus5 (claw).
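As a rough illustration of the layout in the table above, the sketch below reads the four arrays of one clip with plain NumPy and checks their shapes. The helper `load_clip` is hypothetical and assumes only the `qpos/`, `qvel/`, `xipos/`, `xivel/` subdirectory layout; it is exercised on synthetic arrays shaped like clip 0002 rather than on the real recording. `MoCapDataset` remains the supported loader.

```python
import tempfile
from pathlib import Path

import numpy as np

ARRAY_KINDS = ("qpos", "qvel", "xipos", "xivel")


def load_clip(root, clip_id):
    """Hypothetical helper: load <root>/<kind>/<clip_id>.npy for each array kind."""
    root = Path(root)
    clip = {kind: np.load(root / kind / f"{clip_id}.npy") for kind in ARRAY_KINDS}
    n_frames = clip["qpos"].shape[0]
    # Joint arrays are (T, J); body arrays are (T, 4, 3).
    assert clip["qvel"].shape == clip["qpos"].shape
    assert clip["xipos"].shape == (n_frames, 4, 3)
    assert clip["xivel"].shape == (n_frames, 4, 3)
    return clip


# Exercise the helper on synthetic data shaped like clip 0002 (T=225, J=7).
shapes = {
    "qpos": (225, 7),
    "qvel": (225, 7),
    "xipos": (225, 4, 3),
    "xivel": (225, 4, 3),
}
with tempfile.TemporaryDirectory() as tmp:
    rng = np.random.default_rng(0)
    for kind, shape in shapes.items():
        (Path(tmp) / kind).mkdir()
        np.save(Path(tmp) / kind / "0002.npy", rng.normal(size=shape))
    clip = load_clip(tmp, "0002")

n_frames, n_joints = clip["qpos"].shape
```

Pointing `load_clip` at the bundled assets/mocap/ directory should work the same way, provided the real files follow this layout.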
One clip ships with FlyGym — 0002 (225 frames, 7 joint DoFs), FlyMimic's own default. Its body trajectories match the bundled model, so the full reward range is available. Its 7 qpos columns map, in order, to:
| col | MJCF joint |
|---|---|
| 0 | joint_LFCoxa_yaw |
| 1 | joint_LFCoxa_pitch |
| 2 | joint_LFCoxa_roll |
| 3 | joint_LFTrochanter_yaw |
| 4 | joint_LFTrochanter_pitch |
| 5 | joint_LFTrochanter_roll |
| 6 | joint_LFTibia_pitch |
The mapping is keyed by qpos width (TRACKED_JOINT_NAMES_BY_NCOLS in flygym_demo.muscle_imitation.data) and ImitationEnv selects it from the clip's width, so observation/action shapes adapt automatically (the shipped clip → 45-dim obs).
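The 45-dimensional figure can be checked by hand from the observation components (tracked joint qpos and qvel, muscle activations, muscle forces, and one time-left scalar). The sketch below does that arithmetic. `JOINT_NAMES_BY_NCOLS` is a hypothetical stand-in for TRACKED_JOINT_NAMES_BY_NCOLS, filled only with the 7-column mapping listed above, and it assumes each component contributes one entry per joint or muscle.

```python
# Hypothetical stand-in for TRACKED_JOINT_NAMES_BY_NCOLS: qpos width -> joint names.
JOINT_NAMES_BY_NCOLS = {
    7: (
        "joint_LFCoxa_yaw",
        "joint_LFCoxa_pitch",
        "joint_LFCoxa_roll",
        "joint_LFTrochanter_yaw",
        "joint_LFTrochanter_pitch",
        "joint_LFTrochanter_roll",
        "joint_LFTibia_pitch",
    ),
}

N_MUSCLES = 15  # Hill-type muscles on the LF leg


def observation_size(n_qpos_cols, n_muscles=N_MUSCLES):
    """qpos + qvel per tracked joint, activation + force per muscle, plus time-left."""
    n_joints = len(JOINT_NAMES_BY_NCOLS[n_qpos_cols])
    return 2 * n_joints + 2 * n_muscles + 1


obs_dim = observation_size(7)
print(obs_dim)  # 45
```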
The musculoskeletal model¶
The model is assets/model/musculoskeletal/best_combined_arm_damping_stiff_cvt3.xml (+ STL meshes), converted from an OpenSim model with MyoConverter. It has 73 bodies, 15 Hill-type muscle actuators on the LF leg, and 15 spatial tendons.
Each muscle is a MuJoCo general actuator (dyntype/gaintype/biastype = muscle) acting through a spatial tendon routed via attachment sites on the thorax and LF-leg segments.
How it differs from FlyGym's default rigid-body fly:
| Aspect | FlyGym default | FlyMimic muscle model |
|---|---|---|
| LF-leg links | coxa → trochanterfemur (fused) → tibia → tarsus1..5 | LFCoxa → LFTrochanter → LFFemur → LFTibia → LFTarsus1..5 |
| Actuation | joint position/torque actuators | 15 Hill-type muscles (LF leg) via spatial tendons |
| Passive joints | spring/damper from config | stiffness = 0.4, with per-joint spring reference angles |
| Other legs | all six actuated | LF muscle-driven; RF locked to 0; LM/LH passive |
| Base | thorax free-floating | thorax tethered (anchored to world) |
| Sensors | vision, contact, proprioception | proprioception + body kinematics; vision optional (see Environment, reward, and sensors) |
build_musculoskeletal_simulation() loads this model and returns a standard flygym.Simulation, so the rest of FlyGym works against it unchanged.
Environment, reward, and sensors¶
Environment — flygym_demo.muscle_imitation.ImitationEnv¶
A Gymnasium environment wrapping the muscle simulation:
- Action: 15 muscle activations in [0, 1].
- Observation: tracked joint qpos + qvel, muscle activations, muscle forces, and a time-left scalar.
- Step: applies the activations, advances the physics by one control step (500 Hz control rate on top of 10 kHz physics), and advances the mocap frame by one.
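The timing implied by the step description can be worked out directly. This is plain arithmetic on the rates quoted above and on the 225-frame length of clip 0002; the constant names are illustrative and are not FlyGym identifiers.

```python
CONTROL_RATE_HZ = 500     # one action and one mocap frame per control step
PHYSICS_RATE_HZ = 10_000  # physics integration rate
CLIP_FRAMES = 225         # length of the shipped clip 0002

substeps_per_control_step = PHYSICS_RATE_HZ // CONTROL_RATE_HZ
control_dt = 1.0 / CONTROL_RATE_HZ   # seconds per control step
physics_dt = 1.0 / PHYSICS_RATE_HZ   # seconds per physics step
clip_duration_s = CLIP_FRAMES / CONTROL_RATE_HZ

print(substeps_per_control_step)  # 20
print(clip_duration_s)            # 0.45
```

Each call to `env.step` therefore covers 20 physics steps, and a full pass over the shipped clip corresponds to roughly 0.45 s of recorded motion.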
Reward¶
Per step, against the corresponding mocap frame (the FlyMimic motion-imitation
reward, with pose_w = 5, vel_w = 3):
qpos_rew = exp(-pose_w * ‖target_qpos - actual_qpos‖₂)
qvel_rew = exp(-vel_w * ‖target_qvel - actual_qvel‖₂)
xpos_rew = exp(-pose_w * mean_b ‖target_xpos_b - actual_xpos_b‖₂)
reward = clip((qpos_rew + xpos_rew + qvel_rew) / 3, 0, 1)
In training mode an episode terminates early if the reward drops below rew_threshold (default 0.01); otherwise it runs until the clip ends.
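The reward formula above can be sketched in NumPy as follows. This is a minimal standalone version for illustration, not the code inside ImitationEnv: the function name and argument layout are assumptions, and the inputs are assumed to be in whatever units the environment uses internally.

```python
import numpy as np


def imitation_reward(target_qpos, actual_qpos, target_qvel, actual_qvel,
                     target_xpos, actual_xpos, pose_w=5.0, vel_w=3.0):
    """Tracking reward for one frame; xpos arrays have shape (n_bodies, 3)."""
    qpos_rew = np.exp(-pose_w * np.linalg.norm(target_qpos - actual_qpos))
    qvel_rew = np.exp(-vel_w * np.linalg.norm(target_qvel - actual_qvel))
    # Mean over bodies of the per-body Euclidean distance.
    body_dist = np.linalg.norm(target_xpos - actual_xpos, axis=-1).mean()
    xpos_rew = np.exp(-pose_w * body_dist)
    return float(np.clip((qpos_rew + xpos_rew + qvel_rew) / 3.0, 0.0, 1.0))


qpos = np.zeros(7)
qvel = np.zeros(7)
xpos = np.zeros((4, 3))

# Perfect tracking gives the maximum reward of 1.
perfect = imitation_reward(qpos, qpos, qvel, qvel, xpos, xpos)

# A 0.1 rad error on a single joint lowers only the qpos term, to exp(-0.5).
off_qpos = qpos.copy()
off_qpos[0] = 0.1
off = imitation_reward(qpos, off_qpos, qvel, qvel, xpos, xpos)  # about 0.869

# Early-termination check used in training mode (default threshold from the text).
rew_threshold = 0.01
ends_early = off < rew_threshold  # False
```

Because all three terms are exponentials of non-negative errors, each lies in (0, 1], so the final clip only guards against numerical edge cases.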
Results & reproducibility¶
We reproduced FlyMimic's imitation-learning result in FlyGym. Training a PPO
policy on clip 0002 with FlyMimic's own hyperparameters (stable-baselines3,
lr = 1e-5, gamma = 0.99, ReLU [512, 512, 256] actor/critic) drives the
mean episode reward steadily upward, and the muscle-actuated LF leg learns to
track the reference kinematics. The reward formula and weights match FlyMimic
exactly (see the Reward section), and the reward ceiling on this clip is ~1.0.
Per-step reward climbs from the random-activation baseline (~0.06) to ~0.21 without collapsing. Episode length grows in tandem, because the policy both tracks better and holds the pose longer. The trained leg motion can be inspected by rendering a rollout from a checkpoint (see the example script).
Reproduce:
# quick check: random-policy rollout (no training dependencies)
uv run python -m flygym_demo.muscle_imitation --no-train --video-path random.mp4
# train a policy with logging + checkpointing, then record a video of it
uv run python -m flygym_demo.muscle_imitation \
--clip 0002 --total-timesteps 15000000 --learning-rate 1e-5 \
--log-dir runs/0002 --video-path runs/0002/rollout.mp4
# render a previously-saved policy without retraining
uv run python -m flygym_demo.muscle_imitation --no-train \
--model-path runs/0002/final_model.zip --video-path rollout.mp4
Training writes all artifacts under --log-dir (default runs/<clip>):
| Artifact | Contents |
|---|---|
| monitor.csv | Per-episode reward + length (stable_baselines3 Monitor; read with pandas). |
| tb/ | TensorBoard event files; view with tensorboard --logdir runs/<clip>/tb (rollout/ep_rew_mean, ep_len_mean, …). TensorBoard is optional; without it the CSV is still written. |
| checkpoints/ppo_muscle_*_steps.zip | Periodic checkpoints (--checkpoint-freq, default every 50k steps; 0 disables). |
| final_model.zip | The policy at the end of training. |
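A monitor.csv written by the stable_baselines3 Monitor wrapper starts with a `#`-prefixed JSON metadata line, followed by a CSV header `r,l,t` (episode reward, episode length, elapsed time). The sketch below parses that format with the standard library on a synthetic file whose numbers are made up. With pandas, `pandas.read_csv(path, skiprows=1)` should give the same columns. Dividing total reward by total steps is one possible way to get a per-step figure, and is an assumption about how such a number might be computed.

```python
import csv
import io

# Synthetic stand-in for runs/<clip>/monitor.csv; the numbers are made up.
monitor_text = """#{"t_start": 0.0, "env_id": "None"}
r,l,t
3.1,40,1.2
5.4,55,2.9
9.8,80,5.0
"""


def read_monitor(stream):
    """Return per-episode dicts with reward (r), length (l), and elapsed time (t)."""
    stream.readline()  # skip the '#'-prefixed JSON metadata line
    return [
        {"r": float(row["r"]), "l": int(row["l"]), "t": float(row["t"])}
        for row in csv.DictReader(stream)
    ]


episodes = read_monitor(io.StringIO(monitor_text))
total_reward = sum(ep["r"] for ep in episodes)
total_steps = sum(ep["l"] for ep in episodes)
mean_step_reward = total_reward / total_steps  # reward per environment step
```

For a real run, replace the `io.StringIO` object with `open("runs/0002/monitor.csv", newline="")`.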
To watch the learned behaviour, --video-path runs a deterministic rollout in
test mode (full clip, no early termination), rendering FlyMimic's world camera
(--camera, default scene; --camera-res H W) to an mp4. The same env can be
driven from Python via flygym_demo.muscle_imitation.record_rollout /
load_policy.
Training is CPU-only and runs on most workstations. At higher
learning rates, set PPO target_kl ≈ 0.05 and select the best checkpoint by
periodic evaluation to avoid instability late in training.
API¶
from flygym.compose import build_musculoskeletal_simulation
from flygym_demo.muscle_imitation import ImitationConfig, ImitationEnv, MoCapDataset

sim, fly = build_musculoskeletal_simulation()  # Simulation backed by the muscle model
env = ImitationEnv(
    sim,
    fly_name=fly.name,
    dataset=MoCapDataset.default(),
    config=ImitationConfig(clip="0002"),
)

obs, _ = env.reset()
for _ in range(200):
    action = env.action_space.sample()  # 15 muscle activations in [0, 1]
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, _ = env.reset()  # start a new episode once the current one ends
build_musculoskeletal_simulation() is a convenience wrapper over the standard
FlyGym composition flow, identical to how you'd build a NeuroMechFly scene:
from flygym import Simulation
from flygym.compose import MusculoskeletalFly, MusculoskeletalWorld
fly = MusculoskeletalFly()
world = MusculoskeletalWorld(fly)
sim = Simulation(world)
Build the environment in one call:
from flygym_demo.muscle_imitation import ImitationConfig, make_imitation_env

env = make_imitation_env(config=ImitationConfig(clip="0002"))
Train (logging + checkpointing) and record a video programmatically:
from flygym_demo.muscle_imitation import (
    ImitationConfig, TrainConfig, train, make_imitation_env, load_policy, record_rollout,
)

# Writes monitor.csv, tb/, checkpoints/, final_model.zip under runs/0002
model, final_path = train(
    "runs/0002", config=TrainConfig(clip="0002", total_timesteps=200_000)
)

# Roll the trained policy out and save an mp4 (test mode = full clip)
env = make_imitation_env(config=ImitationConfig(clip="0002", test=True))
record_rollout(env, load_policy(final_path), "rollout.mp4")
Inspect or drive the model directly:
sim, fly = build_musculoskeletal_simulation(add_vision=True)
fly.muscle_names # the 15 muscle actuator names
sim.get_joint_angles(fly.name) # proprioception, body kinematics, ...
Citation¶
If you use the musculoskeletal model in your research, please cite the FlyMimic paper in addition to FlyGym:
Ozdil, P. G., Ning, C., Phelps, J. S., Wang-Chen, S., Elisha, G., Blanke, A., Ijspeert, A., & Ramdya, P. (2026). Musculoskeletal simulation of limb movement biomechanics in Drosophila melanogaster. ICLR 2026. arXiv:2509.06426
@inproceedings{Ozdil2026,
title={Musculoskeletal simulation of limb movement biomechanics in Drosophila melanogaster},
author={Ozdil, Pembe Gizem and Ning, Chuanfang and Phelps, Jasper S and Wang-Chen, Sibo and Elisha, Guy and Blanke, Alexander and Ijspeert, Auke and Ramdya, Pavan},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://arxiv.org/abs/2509.06426},
}