Skip to content

Make no-overlap a built-in solver behavior [V2 against develop]#589

Merged
cvolkcvolk merged 27 commits intodevelopfrom
cvolk/no-collision-solver-builtin-v2
Apr 15, 2026
Merged

Make no-overlap a built-in solver behavior [V2 against develop]#589
cvolkcvolk merged 27 commits intodevelopfrom
cvolk/no-collision-solver-builtin-v2

Conversation

@cvolkcvolk
Copy link
Copy Markdown
Collaborator

@cvolkcvolk cvolkcvolk commented Apr 13, 2026

Supersedes #573 (rebased onto develop due to batch solver divergence).

Motivation

No-overlap between objects was managed via explicit NoCollision relations that ArenaEnvBuilder auto-injected. This was an implementation detail leaking into the user API, and the solver had no built-in collision avoidance.

Summary

  • No-overlap is now a built-in behavior of RelationSolver, controlled by a single clearance_m parameter on RelationSolverParams (default 1cm).
  • The solver computes pairwise no-overlap loss for all object pairs automatically (except anchor-anchor).
  • NoCollision class removed from the public API.
  • ArenaEnvBuilder._add_pairwise_no_collision() workaround removed.
  • ObjectPlacerParams.min_separation_m removed; validation unified to use clearance_m.
  • Stale docs and comments updated.

Test plan

  • All 62 relation/placer tests pass
  • Pre-commit checks pass

viiik-inside and others added 14 commits April 10, 2026 02:41
## Summary
Explain the resume flag
…572)

## Summary
This is a doc change requested by QA in
https://nvbugspro.nvidia.com/bug/6063011
It clarifies that evaluated newton trained model using physx is expected
to completely fail the dexsuite task.
## Summary
Subprocess-spawning tests hang indefinitely on CI.

## Causes & Fixes

### Problems

From Lab:

1. Lab reports "AppLauncher doesnt quit properly after app.close(),
app.quit() doesn't help either."
2. Cold startup times for tests using IS can be upwards of 10 min on Lab
CI machines.

Above issues apply to us, because tests hang during sub-process tests
section, between the end of last test and the beginning of the next
test. See detailed logs and analysis from reproducing locally
[here](#568)

### Fixes

1. `SimulationApp` Force Exit: Skips `app.close()` (which can hang
indefinitely in Kit's shutdown path) when the env var
`ISAACLAB_ARENA_FORCE_EXIT_ON_COMPLETE=1` is set.
Calls a new `_kill_child_processes()` helper that walks `/proc` to
`SIGKILL` all direct children before doing `os._exit(0)`, preventing
orphaned Kit processes from holding GPU resources.

2. `run_subprocess` has a configuarable wall-clock timeouts and process
isolation, such that when needed, it could trigger the force exit path
above.

3. Add wall-clock timing and logging inside the SimulationApp start
method. Keep track of how much startup time is taking on CI.

## Minor fixes

1. Add timing stats into pytest cmds such that it reports the slowests
test func at the end of each section.

2. Parametrize multi-config tests: Convert nested for-loops in
`test_zero_action_policy_kitchen_pick_and_place` (6 configs) and
`test_zero_action_policy_gr1_open_microwave` (3 configs) into
`@pytest.mark.parametrize.` Each config gets its own timeout, pass/fail,
and timing.

3. Reduce num_envs in gr00t eval_runner test to speed up.

### Local validation
With the repro script
#568, I do not have
local stalling.
Log for more details.

[repro_20260410_041313.log](https://github.com/user-attachments/files/26620524/repro_20260410_041313.log)


### CI Before -- timeout
<img width="1219" height="170" alt="image"
src="https://github.com/user-attachments/assets/2f9eabb2-403d-4257-bd84-4da508de7d00"
/>

### CI After
<img width="1219" height="170" alt="image"
src="https://github.com/user-attachments/assets/dbaf2a7d-e3a4-4ad2-85a4-389eae962c1d"
/>
<img width="1198" height="472" alt="image"
src="https://github.com/user-attachments/assets/8a24f1aa-4bcb-4030-b075-09f3885673c2"
/>

## TODOs
- test_camera_observations takes 10mins to start the app due to Kit cold
start. Experimenting with a warm start before tests process here
#565
- Kit itself intermittently deadlocks during startup — not because of
orphans, but because Kit's internal thread synchronization fails on
low-CPU runners. Experimenting with retry here
#570
## Summary
Install missing arena package into NGC docker.

## Detailed description
- We forgot to install our new package `isaaclab_arena_examples` into
the docker image.
- This was masked in CI due to mounting a branch and correctly
installing there.

Co-authored-by: Xinjie Yao <xyao@nvidia.com>
## Summary
- Fix the `eval_config.json` example in the DexSuite Kuka Allegro Lift
evaluation docs to match the actual `eval_runner.py` schema (`jobs`
array with `name`, `arena_env_args`, `policy_type`,
`policy_config_dict`).

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Co-authored-by: Xinjie Yao <xyao@nvidia.com>
## Summary
Doc fix to https://nvbugspro.nvidia.com/bug/6062848, Readme updates.

## Detailed description
- Policy training docs: Added a "Compute Requirements" section (GPU VRAM
+ system RAM guidance) to all three workflow tutorials
(static_manipulation, sequential_static_manipulation, locomanipulation)
and fixed the "an an" typo.
- Arena-in-your-repo docs: Created an index.rst landing page for the
section and updated docs/index.rst to use it instead of listing the
three sub-pages individually.
- README: Added a link to the "Installing IsaacLab-Arena in Your
Repository" guide in the "Publishing Your Own Benchmark" section.
## Summary
As CI seems to run smoothly agin, bring back previously disabled tests.
## Problem

IsaacLab-Arena needs a tabletop manipulation task where the G1 robot
uses the WBC-AGILE locomotion policy to pick up an apple and place it on
a plate, while balancing in place.

Ref: ISAAC-12630

## Solution

Add a new `G1AgileTabletopAppleToPlateEnvironment` that wires the
`G1WBCAgileJointEmbodiment` (from PR #489) with the existing
`PickAndPlaceTask`, a Seattle Lab table scene, and appropriate object
assets.

## Changes

-
**`isaaclab_arena_environments/g1_agile_tabletop_apple_to_plate_environment.py`**
— New environment class: G1 robot at (-0.4, 0, 0) facing a table with an
apple (pick object) and a clay plate (target). Uses
`G1WBCAgileJointEmbodiment` for balance + upper body control. 30-second
episodes. Supports `--object`, `--embodiment`, `--teleop_device` CLI
args.
- **`isaaclab_arena_environments/cli.py`** — Register the new
environment in the `ExampleEnvironments` dict.
- **`isaaclab_arena/tests/test_g1_agile_tabletop_apple_to_plate.py`** —
Two tests: (1) initial state is not terminated (apple starts away from
plate), (2) teleporting apple onto plate triggers success termination.
Uses correct base-height command (0.75) to keep the robot stable.

## Testing

- [x] New unit tests added (2 tests)
- [x] Linters pass locally (black, flake8, isort, pyupgrade, codespell,
license headers)
- [ ] CI pipeline (tests require Isaac Sim Docker with GPU)

## Notes

- Object positions (apple, plate, robot) are based on Seattle Lab table
geometry and G1 arm reach. May need visual tuning in simulator.
- No new task class needed — the existing `PickAndPlaceTask` handles
contact-sensor success detection, object-dropped termination, and
metrics.
- Self-review caught and fixed a test issue: the initial-state test was
sending zero base-height commands, which would cause the robot to squat.
Fixed to use 0.75 (matching established pattern from
`test_g1_wbc_embodiment.py`).

---
*Generated by [autodev](https://github.com/anthropics/claude-code) —
Claude Code*

---------

Signed-off-by: Lionel Gulich <lgulich@nvidia.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary

Rework the concepts documentation to eliminate AI slop.

---------

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Co-authored-by: Clemens Volk <cvolk@nvidia.com>
Co-authored-by: isaaclab-review-bot[bot] <270793704+isaaclab-review-bot[bot]@users.noreply.github.com>
Co-authored-by: Xinjie Yao <xyao@nvidia.com>
## Motivation

When building tasks, users often need to restrict object placement to a
sub-region of a surface -- for example, only within the robot's
reachable workspace. `On(table)` allows placement anywhere on the table,
and `AtPosition` pins to a single point. There was no way to constrain
to a region or set bounds on individual axes.

## Summary
- New `PositionLimits` unary relation that constrains object position in
world coordinates. Supports full ranges (box), single bounds
(half-plane), or a mix per axis.
- New `UnaryRelation` base class so `get_spatial_relations()`
automatically includes any new unary relation without updating
isinstance checks.
- `PositionLimitsLossStrategy` using `linear_band_loss` (both bounds)
and `single_boundary_linear_loss` (single bound).
- Registered in solver strategies with slope=100.0 (matching
`AtPosition`).
- Fixed `_print_unary_relation_debug` to work with any unary relation
type.

## Usage

```python
# Full box constraint (reachable region)
apple.add_relation(On(table))
apple.add_relation(PositionLimits(x_min=-0.3, x_max=0.3, y_min=-0.2, y_max=0.2))

# Single bound (half-plane)
apple.add_relation(PositionLimits(x_min=0.5))

# Mix
apple.add_relation(PositionLimits(x_min=-0.3, x_max=0.3, y_min=-0.2))
```

## Test plan
- [x] 12 PositionLimits tests pass (11 strategy-level, 1 solver
integration)
- [x] All relation/placer tests pass
- [x] Pre-commit checks pass

---------

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Co-authored-by: Xinjie Yao <xyao@nvidia.com>
## Summary
Test code owners by only adding myself.

## Detailed description
- We had an incident where a couple of bots with organization access
collaborated to push an unreviewed change to main.
- This is an attempt to prevent this in the future.
## Summary
Complete the list code owners.

## Detailed description
- Follows successful test #584
## Summary
Fix CODEOWNERS specification.

## Detailed description
- Mutiple lines indicate tha the last line overrides the previous.
- This is not what was intended.
- Fix.
Replace the explicit NoCollision relation with a built-in no-overlap
mechanism in RelationSolver, controlled by a single clearance_m
parameter on RelationSolverParams.

- Refactor NoCollisionLossStrategy to a standalone class (not a
  RelationLossStrategy) that accepts clearance_m directly
- Add clearance_m field (default 0.01) to RelationSolverParams with
  __post_init__ validation
- Add _compute_no_overlap_loss to RelationSolver that computes
  pairwise overlap loss for all non-anchor objects automatically
- Delete NoCollision class from relations.py
- Remove _add_pairwise_no_collision from ArenaEnvBuilder
- Remove min_separation_m from ObjectPlacerParams; validation now
  uses solver_params.clearance_m
- Update _validate_no_overlap to skip anchor-anchor pairs
- Update tests, notebooks, and docs

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
@cvolkcvolk cvolkcvolk changed the title Make no-overlap a built-in solver behavior [Draft against develop] Make no-overlap a built-in solver behavior Apr 13, 2026
## Summary
Bring IsaacLab issue templates into Isaac Lab - Arena

## Detailed description
- Gives users a structure for bug reports and feature requests.
@cvolkcvolk cvolkcvolk changed the title [Draft against develop] Make no-overlap a built-in solver behavior Make no-overlap a built-in solver behavior [V2 against develop] Apr 13, 2026
@cvolkcvolk cvolkcvolk marked this pull request as ready for review April 13, 2026 13:17
Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab Review Bot

Summary

This PR refactors no-overlap collision avoidance from an explicit user-facing NoCollision relation into a built-in solver behavior controlled by a single clearance_m parameter on RelationSolverParams. The design is sound — it removes a leaky abstraction and simplifies the user API. The implementation is largely correct, with a few items worth addressing.

Design Assessment

Design is sound. Moving no-overlap from a user-injected relation to built-in solver behavior is the right call. The old approach leaked an implementation detail (NoCollision + ArenaEnvBuilder._add_pairwise_no_collision) into the public API. The new approach:

  • Eliminates the old deduplication bug (the removed TODO about computing loss twice for bidirectional NoCollision relations)
  • Centralizes no-overlap control in one place (clearance_m)
  • Correctly uses .detach() on the other object's position to prevent simultaneous gradient push on both objects in a pair, improving optimization stability

The NoCollisionLossStrategy being decoupled from RelationLossStrategy (no longer inheriting from it) is appropriate since it's now an internal solver concern, not a user-configurable strategy.

Findings

Details in inline comments below.

Test Coverage

  • Coverage: Adequate
  • Test quality: Good

Well-tested: NoCollisionLossStrategy unit tests cover separation, touching, overlap, slope scaling, volume formula, and multi-env batched input. Solver integration tests verify objects end up separated and results are reproducible. The removed test_no_collision_loss_scales_with_relation_weight is correctly dropped.

Gaps worth noting (non-blocking):

  • No test for clearance_m validation (__post_init__ raising ValueError for negative values)
  • No test for anchor-anchor skip in _validate_no_overlap
  • No explicit test verifying solver computes no-overlap loss between non-anchor and anchor objects

CI Status

Pre-commit check is queued.

Verdict

COMMENT

Well-motivated refactoring that simplifies the public API. Core logic is correct. See inline comments for specific items to address.

Comment thread isaaclab_arena/relations/relation_solver.py
Comment thread isaaclab_arena/relations/relation_solver.py Outdated
Comment thread isaaclab_arena/relations/relation_solver.py Outdated
@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Apr 13, 2026

Greptile Summary

This PR moves no-overlap enforcement from an explicit NoCollision relation that ArenaEnvBuilder injected into the solver as a built-in, controlled by a single clearance_m parameter on RelationSolverParams. The NoCollision class, _add_pairwise_no_collision(), and ObjectPlacerParams.min_separation_m are all removed.

  • RelationSolverParams.clearance_m docstring says the margin applies to "every pair of non-anchor objects", but _compute_no_overlap_loss also applies it to every non-anchor ↔ anchor pair. When solver.clearance_m > On.clearance_m, the high-slope no-overlap loss (10 000) completely overrides the On strategy (slope 100), placing On(anchor) objects at solver.clearance_m above the surface rather than the specified On.clearance_m. test_solver_respects_clearance_m exercises this exact scenario (clearance_m=0.05, On.clearance_m=0.01) but never asserts the resulting Z position.

Confidence Score: 4/5

Safe to merge for default configurations; P1 docstring/behavioral mismatch should be resolved before recommending users raise solver.clearance_m above On.clearance_m.

One P1 finding: clearance_m docstring says 'non-anchor objects' but applies to all pairs including anchor-non-anchor, silently overriding On.clearance_m when the solver clearance is larger. The default config (both = 0.01 m) is unaffected, but any user following the docs to set a larger solver clearance will get wrong Z heights for On-related objects.

isaaclab_arena/relations/relation_solver_params.py (clearance_m docstring), isaaclab_arena/relations/relation_solver.py (_compute_no_overlap_loss anchor loop), isaaclab_arena/tests/test_no_collision_loss.py (test_solver_respects_clearance_m missing Z assertion)

Important Files Changed

Filename Overview
isaaclab_arena/relations/relation_solver_params.py Adds clearance_m parameter; docstring claims 'non-anchor objects' only but implementation applies clearance to all pairs including anchor-non-anchor, conflicting with On.clearance_m when values differ.
isaaclab_arena/relations/relation_solver.py Implements built-in no-overlap via _compute_no_overlap_loss; anchor-non-anchor pairs are correctly included, but the associated params docstring says 'non-anchor only', creating a subtle Z-height override for On-related objects.
isaaclab_arena/tests/test_no_collision_loss.py Good unit coverage for NoCollisionLossStrategy; test_solver_respects_clearance_m uses clearance_m=0.05 with On.clearance_m=0.01 but never asserts Z position, missing the clearance-override bug.
isaaclab_arena/relations/object_placer.py Validation logic correctly skips On-pairs and anchor-anchor pairs; _validate_no_overlap margin is derived from solver.clearance_m and is internally consistent for default configs.
isaaclab_arena/relations/relation_loss_strategies.py New standalone NoCollisionLossStrategy computes 3D volume-overlap loss with clearance expansion on the parent bbox; logic is sound and the multiplicative product correctly yields zero loss when objects are Z-separated.
isaaclab_arena/relations/relations.py NoCollision class cleanly removed; remaining relation classes (On, NextTo, IsAnchor, etc.) unchanged and correct.
isaaclab_arena/environments/arena_env_builder.py _add_pairwise_no_collision() workaround removed cleanly; new _solve_relations() delegates entirely to ObjectPlacer with no stale imports or references.
isaaclab_arena/relations/object_placer_params.py min_separation_m removed; ObjectPlacerParams now delegates clearance configuration to solver_params.clearance_m.
isaaclab_arena_environments/gr1_table_multi_object_no_collision_environment.py Updated to use built-in solver no-overlap; no longer injects NoCollision relations manually. Clean.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["RelationSolver.solve()"] --> B["_compute_total_loss()"]
    B --> C["Per-object relation losses\n(On, NextTo, AtPosition …)"]
    B --> D["_compute_no_overlap_loss()"]
    D --> E["non-anchor vs anchor\nclearance_m applied\n⚠ also overrides On.clearance_m"]
    D --> F["non-anchor vs non-anchor\nforward pass: grad → child\nreverse pass: grad → other"]
    C --> G["total_loss.mean()"]
    E --> G
    F --> G
    G --> H{loss < convergence_threshold?}
    H -- yes --> I["Return solved positions"]
    H -- no --> J["optimizer.step()"]
    J --> B
    I --> K["ObjectPlacer._validate_placement()"]
    K --> L["_validate_on_relations()"]
    K --> M["_validate_no_overlap()\nmargin = clearance_m - 1e-6\nskips On-pairs and anchor-anchor"]
Loading

Reviews (3): Last reviewed commit: "Merge branch 'develop' into cvolk/no-col..." | Re-trigger Greptile

Comment thread isaaclab_arena/relations/relation_solver.py
Comment thread isaaclab_arena/relations/relation_solver.py Outdated
cvolkcvolk and others added 2 commits April 13, 2026 15:49
Co-authored-by: isaaclab-review-bot[bot] <270793704+isaaclab-review-bot[bot]@users.noreply.github.com>
- Remove min_separation_m from docs code example (parameter deleted)
- Fix max_placement_attempts default in docs (5 -> 10)
- Add test_solver_respects_clearance_m (non-default clearance_m)
- Add test_negative_clearance_m_raises (ValueError validation)
- Add test_validation_accepts_on_parent_overlap (On pairs skipped)
- Add test_validation_rejects_non_anchor_overlap (rejection path)
- Update stale NoCollision docstrings in example smoke tests

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Copy link
Copy Markdown
Collaborator

@zhx06 zhx06 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Approved.

qianl-nv and others added 3 commits April 13, 2026 12:37
## Summary
This is to fix teleop crashing https://nvbugspro.nvidia.com/bug/6066640
The root cause is isaacteleop has a regression in latest 1.2.xxx. 1.1.x
should be the latest stable version to use and Teleop team will push
patches to fix the issues on Teeleop side.
Teleop on arena side verified to work after rebuilding the arena docker
with this change.

Co-authored-by: Xinjie Yao <xyao@nvidia.com>
## Summary
Remove server client from v0.2 release docs

## Detailed description
- We plan on reworking the server client to fully support it in v0.3
- The current implementation of the server-client, and it's
documentation, are only half supported.
- Remove the documentation references to the server-client and aim for
full support in `v0.3`
- Address [6072205](https://nvbugspro.nvidia.com/bug/6072205)
Resolve conflicts between batch solver changes (this branch)
and main's PositionLimits relation, UnaryRelation base class,
docs rework, and success_proximity_max_distance. Update
PositionLimitsLossStrategy to match batch-aware pattern.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
This reverts commit c2c9bab, reversing
changes made to ca7caa4.
Resolve conflicts between batch solver changes (this branch)
and develop's PositionLimits relation and UnaryRelation base
class. Drop NoCollision from imports (now built-in). Take
develop's more general debug printing for unary relations.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Copy link
Copy Markdown
Collaborator

@alexmillane alexmillane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Couple of nits. Thanks for the cleanup.

Comment thread docs/pages/concepts/concept_object_placement.rst
Comment thread isaaclab_arena/environments/arena_env_builder.py
Comment thread isaaclab_arena/relations/object_placer.py
Comment thread isaaclab_arena/relations/relation_solver.py Outdated
Comment thread isaaclab_arena/relations/relation_solver_params.py Outdated
- Fix double-counting of no-overlap loss between non-anchor
  pairs by iterating unique pairs only (i, j where j > i)
- Add state.device property to RelationSolverState
- Use assert instead of if-then-raise in RelationSolverParams
- Add comment explaining why no-collision slope is 10000
- Fix _compute_no_overlap_loss docstring
- Comment out unguarded module-level call in notebook
- Update stale "NoCollision" references in notebook docstrings
- Add assert-over-raise coding style guideline to AGENTS.md

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
@cvolkcvolk cvolkcvolk enabled auto-merge (squash) April 14, 2026 09:56
RelationSolverParams uses assert for the clearance_m check, which
raises AssertionError.  The test incorrectly expected ValueError.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
The no-collision loss only computed the forward direction for each
non-anchor pair, so only the first object received gradient to move
apart. The second object's position was always detached. When the first
object was pinned against the On(parent) boundary, the solver could not
achieve the full clearance_m separation.

Compute the loss in both directions so both objects receive gradient.
This fixes test_solver_respects_clearance_m which required 5 cm
clearance but only achieved ~4.6 cm.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Comment thread isaaclab_arena/relations/relation_solver_params.py
@cvolkcvolk cvolkcvolk merged commit ecd4d06 into develop Apr 15, 2026
9 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants