This repository implements End-to-End Position-Based Locomotion using NVIDIA Isaac Lab.
It serves as a reproduction and extension of the concepts presented in the paper:
> Advanced Skills by Learning Locomotion and Local Navigation End-to-End
> Nikita Rudin, David Hoeller, Marko Bjelonic, and Marco Hutter
> arXiv:2209.12827
Unlike standard locomotion tasks that track a commanded velocity, the robot here receives a 3D goal position relative to its current state.
- Polar Sampling: To ensure uniform coverage around the robot, goals are sampled using polar coordinates ($r, \theta$) and converted to Cartesian (see the code sketch after this list).
  - Radius $r \in [1.0, 5.0]$ meters.
  - The goal is always spawned at a fixed height relative to the floor ($z = 0.5$).
- Time Awareness: The policy is explicitly conditioned on the remaining time in the episode, allowing it to learn "pacing" strategies (e.g., rushing if time is low, moving carefully if time is ample).
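A minimal sketch of how the goal sampling and time conditioning described above could look. The helper names `sample_polar_goals` and `time_remaining`, the tensor shapes, and the time normalization are illustrative assumptions, not the repository's exact `UniformPose3dPolarCommand` / `TimeRemainingCommand` API:

```python
import torch

def sample_polar_goals(
    num_envs: int,
    r_range: tuple[float, float] = (1.0, 5.0),   # radius range from the description above
    goal_height: float = 0.5,                    # fixed goal height relative to the floor
    device: str = "cpu",
) -> torch.Tensor:
    """Sample goals uniformly in polar coordinates and convert them to Cartesian."""
    r = torch.empty(num_envs, device=device).uniform_(*r_range)
    theta = torch.empty(num_envs, device=device).uniform_(0.0, 2.0 * torch.pi)
    x = r * torch.cos(theta)
    y = r * torch.sin(theta)
    z = torch.full_like(r, goal_height)
    return torch.stack((x, y, z), dim=-1)        # (num_envs, 3) goals in the robot frame

def time_remaining(episode_length_s: float, elapsed_s: torch.Tensor) -> torch.Tensor:
    """Normalized time left in the episode, fed to the policy as an observation."""
    return torch.clamp((episode_length_s - elapsed_s) / episode_length_s, 0.0, 1.0)
```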
The reward system is designed to avoid over-constraining the motion (e.g., no strict velocity tracking penalty). The main terms, sketched in code after this list, are:
- `task_reward` (Sparse-ish): A dense signal $r = \frac{1}{1 + \|\text{error}\|^2}$ that is only activated during the final seconds of the episode (e.g., the last 1.0 s). This forces the robot to be at the goal at the end, but allows exploration during the episode.
- `explore` (Dense): A cosine-similarity reward ($\frac{\mathbf{v} \cdot \mathbf{d}}{\|\mathbf{v}\| \, \|\mathbf{d}\|}$) that encourages moving in the general direction of the goal at all times.
- `stalling` (Gated Penalty): Penalizes the robot for standing still ($\|\mathbf{v}\| < 0.1$) while far from the goal.
  - Auto-Gating: This penalty automatically deactivates once the agent learns the task (average task reward > 0.5), preventing it from interfering with fine-tuned terminal maneuvering.
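A rough sketch of the three reward terms with simplified, plain-tensor signatures. The actual functions in `mdp/rewards.py` operate on the Isaac Lab environment object, and the `dist_thresh` used by the stalling gate is an assumed value:

```python
import torch

def task_reward(pos_error: torch.Tensor, time_left: torch.Tensor, window_s: float = 1.0) -> torch.Tensor:
    """1 / (1 + ||error||^2), active only during the last `window_s` seconds of the episode."""
    reward = 1.0 / (1.0 + pos_error.square().sum(dim=-1))
    return torch.where(time_left <= window_s, reward, torch.zeros_like(reward))

def exploration_incentive(base_vel: torch.Tensor, goal_dir: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between the base velocity and the direction to the goal."""
    return torch.nn.functional.cosine_similarity(base_vel, goal_dir, dim=-1)

def stalling_penalty(
    base_vel: torch.Tensor,
    pos_error: torch.Tensor,
    avg_task_reward: float,
    vel_thresh: float = 0.1,    # "standing still" threshold from the description above
    dist_thresh: float = 0.5,   # assumed "far from the goal" threshold
) -> torch.Tensor:
    """Penalize standing still while far from the goal; auto-gated off once the task is learned."""
    if avg_task_reward > 0.5:   # auto-gating: stop penalizing once the task reward is high
        return torch.zeros(base_vel.shape[0], device=base_vel.device)
    stalled = (base_vel.norm(dim=-1) < vel_thresh) & (pos_error.norm(dim=-1) > dist_thresh)
    return -stalled.float()
```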
- `terrain_levels_pos`: Terrain difficulty increases based on success (distance to goal < 0.5 m) rather than distance walked. This ensures the robot only faces harder terrain once it can reliably navigate to targets on easier terrain.
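Sketched below is a simplified stand-in for a goal-based terrain curriculum of this kind. The 0.5 m success threshold comes from the description above, while the demotion rule and the interface are assumptions rather than the exact `terrain_levels_pos` implementation:

```python
import torch

def update_terrain_levels(
    terrain_levels: torch.Tensor,   # (num_envs,) current difficulty level per environment
    dist_to_goal: torch.Tensor,     # (num_envs,) distance to the commanded goal at episode end
    max_level: int,
    success_thresh: float = 0.5,    # success: final distance to goal below 0.5 m
) -> torch.Tensor:
    """Promote environments that reached their goal; demote those that ended far away."""
    move_up = dist_to_goal < success_thresh
    move_down = dist_to_goal > 2.0 * success_thresh           # assumed demotion criterion
    new_levels = terrain_levels + move_up.long() - (move_down & ~move_up).long()
    return new_levels.clamp(0, max_level)
```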
The core logic is located in `source/IsaacLab_Terrains/IsaacLab_Terrains/tasks/manager_based/locomotion/position`:
```
position/
├── config/
│   ├── anymal_c/            # Robot-specific configurations
│   │   ├── flat_env_cfg.py
│   │   └── rough_env_cfg.py
│   └── ...
├── mdp/                     # Markov Decision Process components
│   ├── commands.py          # UniformPose3dPolarCommand, TimeRemainingCommand
│   ├── observations.py      # Custom observation handlers
│   ├── rewards.py           # get_to_pos_in_time, exploration_incentive, stalling_penalty
│   ├── curriculums.py       # terrain_levels_pos (goal-based curriculum)
│   └── ...
└── position_env_cfg.py      # Main environment configuration
```



