Design Notes
Reward design
The locomotion reward balances forward progress with posture quality and gait regularity.
| Term | Weight | Purpose |
|---|---|---|
| `track_lin_vel_x_exp` | +4.8 | Track commanded forward velocity |
| `forward_progress` | +6.0 | Reward actual forward motion |
| `alive_bonus` | +0.6 | Keep the episode alive |
| `upright` | +1.6 | Maintain torso stability |
| `height` | +0.9 | Hold a consistent base height |
| `hip_alternation` | +2.0 | Encourage left-right alternation |
| `knee_flexion` | +0.8 | Maintain meaningful knee lift |
| `feet_air_time` | +1.5 | Reward feet leaving the ground |
| `yaw_rate` | -0.5 | Penalize spinning |
| `lateral_velocity` | -0.3 | Penalize sideways drift |
| `knee_symmetry` | -1.0 | Penalize limping asymmetry |
| `backward_velocity` | -2.8 | Penalize backward motion |
| `stall_penalty` | -4.6 | Penalize standing still under a move command |
| `action_rate` | -0.004 | Smooth actions |
| `joint_pos_limits` | -0.05 | Stay within joint limits |
| `undesired_knee_contacts` | -1.0 | Discourage knee-ground collisions |
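To make the role of the weights concrete, here is a minimal sketch of how the terms could be combined into a single scalar. It assumes the total reward is a plain weighted sum of unweighted term values; the `total_reward` helper and the example inputs are hypothetical, and any per-step time scaling the training framework may apply is left out.

```python
# Weights copied from the table above; everything else is illustrative.
REWARD_WEIGHTS = {
    "track_lin_vel_x_exp": 4.8,
    "forward_progress": 6.0,
    "alive_bonus": 0.6,
    "upright": 1.6,
    "height": 0.9,
    "hip_alternation": 2.0,
    "knee_flexion": 0.8,
    "feet_air_time": 1.5,
    "yaw_rate": -0.5,
    "lateral_velocity": -0.3,
    "knee_symmetry": -1.0,
    "backward_velocity": -2.8,
    "stall_penalty": -4.6,
    "action_rate": -0.004,
    "joint_pos_limits": -0.05,
    "undesired_knee_contacts": -1.0,
}


def total_reward(raw_terms):
    """Combine unweighted term values into one scalar (assumed weighted sum).

    Terms missing from raw_terms contribute zero; unknown names raise KeyError.
    """
    unknown = set(raw_terms) - set(REWARD_WEIGHTS)
    if unknown:
        raise KeyError(f"unknown reward terms: {sorted(unknown)}")
    return sum(REWARD_WEIGHTS[name] * value for name, value in raw_terms.items())


# Hypothetical raw values for one step: good tracking, some progress, slight spin.
example = {"track_lin_vel_x_exp": 0.9, "forward_progress": 0.5, "yaw_rate": 0.2}
print(total_reward(example))
```

Keeping the weights in one mapping makes it easy to see that `forward_progress` and `stall_penalty` dominate the signal, while `action_rate` and `joint_pos_limits` act as light regularizers.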
Why use a reference gait
Instead of predicting full joint trajectories from scratch, the policy predicts residual joint targets on top of a sinusoidal reference gait.
This gives three benefits:
- Lower exploration burden in early training.
- More interpretable gait structure.
- Faster convergence toward a usable walking policy.
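The residual formulation can be sketched as follows. This is an illustration under stated assumptions, not the project's code: `compose_joint_targets` is a hypothetical helper, the action is assumed to be clipped to [-1, 1] before scaling, and the default `scale` of 0.12 is taken from the reference gait parameters on this page.

```python
def compose_joint_targets(reference, action, scale=0.12, clip=1.0):
    """Return reference + scale * clipped action for each joint.

    reference: joint targets from the reference gait (rad).
    action: raw policy outputs, one per joint (clipping is an assumption).
    """
    if len(reference) != len(action):
        raise ValueError("reference and action must have the same length")
    targets = []
    for ref, act in zip(reference, action):
        bounded = max(-clip, min(clip, act))  # keep the residual within budget
        targets.append(ref + scale * bounded)
    return targets


# A zero action reproduces the reference gait exactly.
print(compose_joint_targets([0.3, -0.2], [0.0, 0.0]))
# A large action is limited to +/- scale around the reference.
print(compose_joint_targets([0.3, -0.2], [0.5, -2.0]))
```

Because a zero action already yields the reference motion, an untrained policy starts from a plausible gait, which is where the lower exploration burden comes from.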
Reference gait parameters
| Parameter | Value | Meaning |
|---|---|---|
| `gait_period` | 0.72 s | Full gait cycle duration |
| `stance_ratio` | 0.55 | Portion of time spent in stance |
| `hip_pitch_amplitude` | 0.45 rad | Hip swing magnitude |
| `knee_pitch_amplitude` | 0.60 rad | Knee flexion magnitude during swing |
| `swing_knee_scale` | 1.35 | Extra swing-phase knee lift |
| `scale` | 0.12 | Residual correction budget |
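One way these parameters could fit together is sketched below. The waveform shapes are assumptions, since this page does not give the exact formulas: the hip follows a full-cycle sine, the knee flexes only during swing along a half-sine bump scaled by `swing_knee_scale`, stance is assumed to occupy the first `stance_ratio` of each cycle, and the two legs are assumed to be half a cycle apart. The function names are hypothetical.

```python
import math

# Values from the table above.
GAIT_PERIOD = 0.72           # s
STANCE_RATIO = 0.55
HIP_PITCH_AMPLITUDE = 0.45   # rad
KNEE_PITCH_AMPLITUDE = 0.60  # rad
SWING_KNEE_SCALE = 1.35


def leg_reference(phase):
    """Return (hip_pitch, knee_pitch) offsets in rad for one leg.

    phase is the normalized position in the gait cycle, wrapped to [0, 1).
    """
    phase = phase % 1.0
    hip = HIP_PITCH_AMPLITUDE * math.sin(2.0 * math.pi * phase)
    if phase < STANCE_RATIO:
        knee = 0.0  # assumed: no reference knee flexion during stance
    else:
        swing_progress = (phase - STANCE_RATIO) / (1.0 - STANCE_RATIO)
        knee = (KNEE_PITCH_AMPLITUDE * SWING_KNEE_SCALE
                * math.sin(math.pi * swing_progress))
    return hip, knee


def reference_gait(t):
    """Return per-leg reference offsets at time t (s), legs half a cycle apart."""
    left_phase = (t / GAIT_PERIOD) % 1.0
    right_phase = (left_phase + 0.5) % 1.0
    return {"left": leg_reference(left_phase), "right": leg_reference(right_phase)}


# A quarter of the way through the cycle the hips are at opposite extremes.
print(reference_gait(GAIT_PERIOD / 4))
```

Under these assumptions the peak swing knee flexion is `knee_pitch_amplitude * swing_knee_scale`, about 0.81 rad, so raising either value lifts the swing foot higher.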
Recent design iterations
Fixing circling behavior
A policy can appear to move forward while actually walking in circles if the reward only sees body-frame x velocity, because that velocity stays positive no matter how the heading changes. To prevent this, the environment now penalizes yaw rate and applies a stronger lateral-velocity penalty.
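The failure mode is easy to reproduce in a toy model. The sketch below integrates a planar base with constant body-frame forward velocity, once walking straight and once turning at a constant yaw rate. The `simulate` and `heading_penalties` helpers are hypothetical, and the squared penalty form is an assumption; only the weights -0.5 and -0.3 come from the reward table.

```python
import math


def simulate(vx_body, yaw_rate, duration, dt=0.01):
    """Integrate a planar base with constant body-frame velocity and yaw rate."""
    x = y = heading = 0.0
    for _ in range(round(duration / dt)):
        x += vx_body * math.cos(heading) * dt
        y += vx_body * math.sin(heading) * dt
        heading += yaw_rate * dt
    return x, y


def heading_penalties(yaw_rate, vy_body, w_yaw=-0.5, w_lat=-0.3):
    """Weighted penalties on spinning and drift (squared form is an assumption)."""
    return w_yaw * yaw_rate ** 2 + w_lat * vy_body ** 2


# Both runs have the same body-frame x velocity of 1.0 m/s.
straight = simulate(vx_body=1.0, yaw_rate=0.0, duration=2 * math.pi)
circling = simulate(vx_body=1.0, yaw_rate=1.0, duration=2 * math.pi)

print(math.hypot(*straight))  # distance covered walking straight
print(math.hypot(*circling))  # net displacement after roughly one full circle
print(heading_penalties(yaw_rate=1.0, vy_body=0.0))
```

A reward that only measures body-frame x velocity scores both runs identically, even though the circling run ends almost where it started. The yaw-rate term is what separates them.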
Fixing limping and weak leg lift
Two issues showed up during iteration:
- one leg dominated the gait,
- the swing leg did not lift enough.
The response was to:
- increase `hip_alternation`,
- add an explicit knee symmetry penalty,
- increase `feet_air_time`,
- and raise the reference gait amplitudes.
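As an illustration of what a knee symmetry penalty might look like, the sketch below compares how far each knee moves over one gait cycle. This is an assumed formulation, not the project's implementation: `knee_symmetry_penalty` is a hypothetical helper, and only the weight of -1.0 comes from the reward table.

```python
def knee_symmetry_penalty(left_knee, right_knee, weight=-1.0):
    """Penalize the gap between per-leg knee flexion ranges over one gait cycle.

    left_knee, right_knee: knee angles (rad) sampled across the same cycle.
    """
    if not left_knee or not right_knee:
        raise ValueError("both knee histories must be non-empty")
    left_range = max(left_knee) - min(left_knee)
    right_range = max(right_knee) - min(right_knee)
    return weight * abs(left_range - right_range)


# Hypothetical samples: the right knee barely flexes, which reads as a limp.
limping = knee_symmetry_penalty([0.0, 0.4, 0.8, 0.4], [0.0, 0.1, 0.2, 0.1])
# Same flexion range on both sides, half a cycle apart: no penalty.
balanced = knee_symmetry_penalty([0.0, 0.4, 0.8, 0.4], [0.8, 0.4, 0.0, 0.4])
print(limping, balanced)
```

Comparing ranges over a cycle, rather than instantaneous angles, avoids punishing a healthy gait: the legs are out of phase by design, so their knee angles differ at almost every instant even when the motion is symmetric.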
Improving compatibility with newer rsl-rl
The project also updated actor distribution handling and export logic to stay compatible with rsl-rl >= 5.0, including migration away from deprecated stochastic configuration fields.
Design philosophy
The repository aims for a practical middle ground:
- simple enough to inspect and tune,
- structured enough to scale beyond a toy example,
- and explicit enough that reward shaping decisions remain understandable.
Note
The current design is intentionally task-focused. It prioritizes stable forward locomotion and clear gait structure over building a large multi-task benchmark surface.