Make no-overlap a built-in solver behavior [V2 against develop]#589
Make no-overlap a built-in solver behavior [V2 against develop]#589cvolkcvolk merged 27 commits intodevelopfrom
Conversation
## Summary Explain the resume flag
…572) ## Summary This is a doc change requested by QA in https://nvbugspro.nvidia.com/bug/6063011 It clarifies that evaluated newton trained model using physx is expected to completely fail the dexsuite task.
## Summary Subprocess-spawning tests hang indefinitely on CI. ## Causes & Fixes ### Problems From Lab: 1. Lab reports "AppLauncher doesnt quit properly after app.close(), app.quit() doesn't help either." 2. Cold startup times for tests using IS can be upwards of 10 min on Lab CI machines. Above issues apply to us, because tests hang during sub-process tests section, between the end of last test and the beginning of the next test. See detailed logs and analysis from reproducing locally [here](#568) ### Fixes 1. `SimulationApp` Force Exit: Skips `app.close()` (which can hang indefinitely in Kit's shutdown path) when the env var `ISAACLAB_ARENA_FORCE_EXIT_ON_COMPLETE=1` is set. Calls a new `_kill_child_processes()` helper that walks `/proc` to `SIGKILL` all direct children before doing `os._exit(0)`, preventing orphaned Kit processes from holding GPU resources. 2. `run_subprocess` has a configuarable wall-clock timeouts and process isolation, such that when needed, it could trigger the force exit path above. 3. Add wall-clock timing and logging inside the SimulationApp start method. Keep track of how much startup time is taking on CI. ## Minor fixes 1. Add timing stats into pytest cmds such that it reports the slowests test func at the end of each section. 2. Parametrize multi-config tests: Convert nested for-loops in `test_zero_action_policy_kitchen_pick_and_place` (6 configs) and `test_zero_action_policy_gr1_open_microwave` (3 configs) into `@pytest.mark.parametrize.` Each config gets its own timeout, pass/fail, and timing. 3. Reduce num_envs in gr00t eval_runner test to speed up. ### Local validation With the repro script #568, I do not have local stalling. Log for more details. [repro_20260410_041313.log](https://github.com/user-attachments/files/26620524/repro_20260410_041313.log) ### CI Before -- timeout <img width="1219" height="170" alt="image" src="https://github.com/user-attachments/assets/2f9eabb2-403d-4257-bd84-4da508de7d00" /> ### CI After <img width="1219" height="170" alt="image" src="https://github.com/user-attachments/assets/dbaf2a7d-e3a4-4ad2-85a4-389eae962c1d" /> <img width="1198" height="472" alt="image" src="https://github.com/user-attachments/assets/8a24f1aa-4bcb-4030-b075-09f3885673c2" /> ## TODOs - test_camera_observations takes 10mins to start the app due to Kit cold start. Experimenting with a warm start before tests process here #565 - Kit itself intermittently deadlocks during startup — not because of orphans, but because Kit's internal thread synchronization fails on low-CPU runners. Experimenting with retry here #570
## Summary Install missing arena package into NGC docker. ## Detailed description - We forgot to install our new package `isaaclab_arena_examples` into the docker image. - This was masked in CI due to mounting a branch and correctly installing there. Co-authored-by: Xinjie Yao <xyao@nvidia.com>
## Summary - Fix the `eval_config.json` example in the DexSuite Kuka Allegro Lift evaluation docs to match the actual `eval_runner.py` schema (`jobs` array with `name`, `arena_env_args`, `policy_type`, `policy_config_dict`). Signed-off-by: Clemens Volk <cvolk@nvidia.com> Co-authored-by: Xinjie Yao <xyao@nvidia.com>
## Summary Doc fix to https://nvbugspro.nvidia.com/bug/6062848, Readme updates. ## Detailed description - Policy training docs: Added a "Compute Requirements" section (GPU VRAM + system RAM guidance) to all three workflow tutorials (static_manipulation, sequential_static_manipulation, locomanipulation) and fixed the "an an" typo. - Arena-in-your-repo docs: Created an index.rst landing page for the section and updated docs/index.rst to use it instead of listing the three sub-pages individually. - README: Added a link to the "Installing IsaacLab-Arena in Your Repository" guide in the "Publishing Your Own Benchmark" section.
## Summary As CI seems to run smoothly agin, bring back previously disabled tests.
## Problem IsaacLab-Arena needs a tabletop manipulation task where the G1 robot uses the WBC-AGILE locomotion policy to pick up an apple and place it on a plate, while balancing in place. Ref: ISAAC-12630 ## Solution Add a new `G1AgileTabletopAppleToPlateEnvironment` that wires the `G1WBCAgileJointEmbodiment` (from PR #489) with the existing `PickAndPlaceTask`, a Seattle Lab table scene, and appropriate object assets. ## Changes - **`isaaclab_arena_environments/g1_agile_tabletop_apple_to_plate_environment.py`** — New environment class: G1 robot at (-0.4, 0, 0) facing a table with an apple (pick object) and a clay plate (target). Uses `G1WBCAgileJointEmbodiment` for balance + upper body control. 30-second episodes. Supports `--object`, `--embodiment`, `--teleop_device` CLI args. - **`isaaclab_arena_environments/cli.py`** — Register the new environment in the `ExampleEnvironments` dict. - **`isaaclab_arena/tests/test_g1_agile_tabletop_apple_to_plate.py`** — Two tests: (1) initial state is not terminated (apple starts away from plate), (2) teleporting apple onto plate triggers success termination. Uses correct base-height command (0.75) to keep the robot stable. ## Testing - [x] New unit tests added (2 tests) - [x] Linters pass locally (black, flake8, isort, pyupgrade, codespell, license headers) - [ ] CI pipeline (tests require Isaac Sim Docker with GPU) ## Notes - Object positions (apple, plate, robot) are based on Seattle Lab table geometry and G1 arm reach. May need visual tuning in simulator. - No new task class needed — the existing `PickAndPlaceTask` handles contact-sensor success detection, object-dropped termination, and metrics. - Self-review caught and fixed a test issue: the initial-state test was sending zero base-height commands, which would cause the robot to squat. Fixed to use 0.75 (matching established pattern from `test_g1_wbc_embodiment.py`). --- *Generated by [autodev](https://github.com/anthropics/claude-code) — Claude Code* --------- Signed-off-by: Lionel Gulich <lgulich@nvidia.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary Rework the concepts documentation to eliminate AI slop. --------- Signed-off-by: Clemens Volk <cvolk@nvidia.com> Co-authored-by: Clemens Volk <cvolk@nvidia.com> Co-authored-by: isaaclab-review-bot[bot] <270793704+isaaclab-review-bot[bot]@users.noreply.github.com> Co-authored-by: Xinjie Yao <xyao@nvidia.com>
## Motivation When building tasks, users often need to restrict object placement to a sub-region of a surface -- for example, only within the robot's reachable workspace. `On(table)` allows placement anywhere on the table, and `AtPosition` pins to a single point. There was no way to constrain to a region or set bounds on individual axes. ## Summary - New `PositionLimits` unary relation that constrains object position in world coordinates. Supports full ranges (box), single bounds (half-plane), or a mix per axis. - New `UnaryRelation` base class so `get_spatial_relations()` automatically includes any new unary relation without updating isinstance checks. - `PositionLimitsLossStrategy` using `linear_band_loss` (both bounds) and `single_boundary_linear_loss` (single bound). - Registered in solver strategies with slope=100.0 (matching `AtPosition`). - Fixed `_print_unary_relation_debug` to work with any unary relation type. ## Usage ```python # Full box constraint (reachable region) apple.add_relation(On(table)) apple.add_relation(PositionLimits(x_min=-0.3, x_max=0.3, y_min=-0.2, y_max=0.2)) # Single bound (half-plane) apple.add_relation(PositionLimits(x_min=0.5)) # Mix apple.add_relation(PositionLimits(x_min=-0.3, x_max=0.3, y_min=-0.2)) ``` ## Test plan - [x] 12 PositionLimits tests pass (11 strategy-level, 1 solver integration) - [x] All relation/placer tests pass - [x] Pre-commit checks pass --------- Signed-off-by: Clemens Volk <cvolk@nvidia.com> Co-authored-by: Xinjie Yao <xyao@nvidia.com>
## Summary Test code owners by only adding myself. ## Detailed description - We had an incident where a couple of bots with organization access collaborated to push an unreviewed change to main. - This is an attempt to prevent this in the future.
## Summary Complete the list code owners. ## Detailed description - Follows successful test #584
## Summary Fix CODEOWNERS specification. ## Detailed description - Mutiple lines indicate tha the last line overrides the previous. - This is not what was intended. - Fix.
Replace the explicit NoCollision relation with a built-in no-overlap mechanism in RelationSolver, controlled by a single clearance_m parameter on RelationSolverParams. - Refactor NoCollisionLossStrategy to a standalone class (not a RelationLossStrategy) that accepts clearance_m directly - Add clearance_m field (default 0.01) to RelationSolverParams with __post_init__ validation - Add _compute_no_overlap_loss to RelationSolver that computes pairwise overlap loss for all non-anchor objects automatically - Delete NoCollision class from relations.py - Remove _add_pairwise_no_collision from ArenaEnvBuilder - Remove min_separation_m from ObjectPlacerParams; validation now uses solver_params.clearance_m - Update _validate_no_overlap to skip anchor-anchor pairs - Update tests, notebooks, and docs Signed-off-by: Clemens Volk <cvolk@nvidia.com>
## Summary Bring IsaacLab issue templates into Isaac Lab - Arena ## Detailed description - Gives users a structure for bug reports and feature requests.
There was a problem hiding this comment.
🤖 Isaac Lab Review Bot
Summary
This PR refactors no-overlap collision avoidance from an explicit user-facing NoCollision relation into a built-in solver behavior controlled by a single clearance_m parameter on RelationSolverParams. The design is sound — it removes a leaky abstraction and simplifies the user API. The implementation is largely correct, with a few items worth addressing.
Design Assessment
Design is sound. Moving no-overlap from a user-injected relation to built-in solver behavior is the right call. The old approach leaked an implementation detail (NoCollision + ArenaEnvBuilder._add_pairwise_no_collision) into the public API. The new approach:
- Eliminates the old deduplication bug (the removed TODO about computing loss twice for bidirectional
NoCollisionrelations) - Centralizes no-overlap control in one place (
clearance_m) - Correctly uses
.detach()on the other object's position to prevent simultaneous gradient push on both objects in a pair, improving optimization stability
The NoCollisionLossStrategy being decoupled from RelationLossStrategy (no longer inheriting from it) is appropriate since it's now an internal solver concern, not a user-configurable strategy.
Findings
Details in inline comments below.
Test Coverage
- Coverage: Adequate
- Test quality: Good
Well-tested: NoCollisionLossStrategy unit tests cover separation, touching, overlap, slope scaling, volume formula, and multi-env batched input. Solver integration tests verify objects end up separated and results are reproducible. The removed test_no_collision_loss_scales_with_relation_weight is correctly dropped.
Gaps worth noting (non-blocking):
- No test for
clearance_mvalidation (__post_init__raisingValueErrorfor negative values) - No test for anchor-anchor skip in
_validate_no_overlap - No explicit test verifying solver computes no-overlap loss between non-anchor and anchor objects
CI Status
Pre-commit check is queued.
Verdict
COMMENT
Well-motivated refactoring that simplifies the public API. Core logic is correct. See inline comments for specific items to address.
Greptile SummaryThis PR moves no-overlap enforcement from an explicit
Confidence Score: 4/5Safe to merge for default configurations; P1 docstring/behavioral mismatch should be resolved before recommending users raise solver.clearance_m above On.clearance_m. One P1 finding: clearance_m docstring says 'non-anchor objects' but applies to all pairs including anchor-non-anchor, silently overriding On.clearance_m when the solver clearance is larger. The default config (both = 0.01 m) is unaffected, but any user following the docs to set a larger solver clearance will get wrong Z heights for On-related objects. isaaclab_arena/relations/relation_solver_params.py (clearance_m docstring), isaaclab_arena/relations/relation_solver.py (_compute_no_overlap_loss anchor loop), isaaclab_arena/tests/test_no_collision_loss.py (test_solver_respects_clearance_m missing Z assertion) Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["RelationSolver.solve()"] --> B["_compute_total_loss()"]
B --> C["Per-object relation losses\n(On, NextTo, AtPosition …)"]
B --> D["_compute_no_overlap_loss()"]
D --> E["non-anchor vs anchor\nclearance_m applied\n⚠ also overrides On.clearance_m"]
D --> F["non-anchor vs non-anchor\nforward pass: grad → child\nreverse pass: grad → other"]
C --> G["total_loss.mean()"]
E --> G
F --> G
G --> H{loss < convergence_threshold?}
H -- yes --> I["Return solved positions"]
H -- no --> J["optimizer.step()"]
J --> B
I --> K["ObjectPlacer._validate_placement()"]
K --> L["_validate_on_relations()"]
K --> M["_validate_no_overlap()\nmargin = clearance_m - 1e-6\nskips On-pairs and anchor-anchor"]
Reviews (3): Last reviewed commit: "Merge branch 'develop' into cvolk/no-col..." | Re-trigger Greptile |
Co-authored-by: isaaclab-review-bot[bot] <270793704+isaaclab-review-bot[bot]@users.noreply.github.com>
- Remove min_separation_m from docs code example (parameter deleted) - Fix max_placement_attempts default in docs (5 -> 10) - Add test_solver_respects_clearance_m (non-default clearance_m) - Add test_negative_clearance_m_raises (ValueError validation) - Add test_validation_accepts_on_parent_overlap (On pairs skipped) - Add test_validation_rejects_non_anchor_overlap (rejection path) - Update stale NoCollision docstrings in example smoke tests Signed-off-by: Clemens Volk <cvolk@nvidia.com>
## Summary This is to fix teleop crashing https://nvbugspro.nvidia.com/bug/6066640 The root cause is isaacteleop has a regression in latest 1.2.xxx. 1.1.x should be the latest stable version to use and Teleop team will push patches to fix the issues on Teeleop side. Teleop on arena side verified to work after rebuilding the arena docker with this change. Co-authored-by: Xinjie Yao <xyao@nvidia.com>
## Summary Remove server client from v0.2 release docs ## Detailed description - We plan on reworking the server client to fully support it in v0.3 - The current implementation of the server-client, and it's documentation, are only half supported. - Remove the documentation references to the server-client and aim for full support in `v0.3` - Address [6072205](https://nvbugspro.nvidia.com/bug/6072205)
Resolve conflicts between batch solver changes (this branch) and main's PositionLimits relation, UnaryRelation base class, docs rework, and success_proximity_max_distance. Update PositionLimitsLossStrategy to match batch-aware pattern. Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Resolve conflicts between batch solver changes (this branch) and develop's PositionLimits relation and UnaryRelation base class. Drop NoCollision from imports (now built-in). Take develop's more general debug printing for unary relations. Signed-off-by: Clemens Volk <cvolk@nvidia.com>
alexmillane
left a comment
There was a problem hiding this comment.
Great! Couple of nits. Thanks for the cleanup.
- Fix double-counting of no-overlap loss between non-anchor pairs by iterating unique pairs only (i, j where j > i) - Add state.device property to RelationSolverState - Use assert instead of if-then-raise in RelationSolverParams - Add comment explaining why no-collision slope is 10000 - Fix _compute_no_overlap_loss docstring - Comment out unguarded module-level call in notebook - Update stale "NoCollision" references in notebook docstrings - Add assert-over-raise coding style guideline to AGENTS.md Signed-off-by: Clemens Volk <cvolk@nvidia.com>
RelationSolverParams uses assert for the clearance_m check, which raises AssertionError. The test incorrectly expected ValueError. Signed-off-by: Clemens Volk <cvolk@nvidia.com>
The no-collision loss only computed the forward direction for each non-anchor pair, so only the first object received gradient to move apart. The second object's position was always detached. When the first object was pinned against the On(parent) boundary, the solver could not achieve the full clearance_m separation. Compute the loss in both directions so both objects receive gradient. This fixes test_solver_respects_clearance_m which required 5 cm clearance but only achieved ~4.6 cm. Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Supersedes #573 (rebased onto
developdue to batch solver divergence).Motivation
No-overlap between objects was managed via explicit
NoCollisionrelations thatArenaEnvBuilderauto-injected. This was an implementation detail leaking into the user API, and the solver had no built-in collision avoidance.Summary
RelationSolver, controlled by a singleclearance_mparameter onRelationSolverParams(default 1cm).NoCollisionclass removed from the public API.ArenaEnvBuilder._add_pairwise_no_collision()workaround removed.ObjectPlacerParams.min_separation_mremoved; validation unified to useclearance_m.Test plan