Skip to content

Jeff/fix/rosnav8#1791

Draft
jeff-hykin wants to merge 128 commits intodevfrom
jeff/fix/rosnav8
Draft

Jeff/fix/rosnav8#1791
jeff-hykin wants to merge 128 commits intodevfrom
jeff/fix/rosnav8

Conversation

@jeff-hykin
Copy link
Copy Markdown
Member

@jeff-hykin jeff-hykin commented Apr 16, 2026

Pre-requisite PRs

Will be perpetually re-testing hardware after merges/changes to ^

631af1e tested for 2 hours on the g1 (entire battery! instantly responsive after 2hrs)
c54578a044f892fe7f231b468f2f3512adf4282a (later) hardware tested and sim tested.

Problem

Robot blueprints each duplicated viewer backend selection logic (foxglove vs rerun vs rerun-web vs rerun-connect), making it error-prone to add new backends or change behavior. The RerunBridge also lacked a WebSocket server for dimos-viewer keyboard/click events in --connect mode.

Solution

  • port over the whole nav stack

Breaking Changes

  • None

How to Test

git checkout jeff/fix/rosnav8

# last hardware tested commit
git checkout c54578a044f892fe7f231b468f2f3512adf4282a

# simulation
uv run dimos run unitree-g1-nav-sim

# real
uv run dimos run unitree-g1-nav-onboard

Contributor License Agreement

  • I have read and approved the CLA.

Twelve fixes from Paul-style code review of #1780:

native_module.py:
- _maybe_build: use did_change(update=False) + post-success update_cache so
  failed builds never poison the rebuild cache
- delete stray "Source files changed" log line that fired on every call
- stop(): per-instance _stop_lock; capture proc/watchdog refs under the
  lock, do signal/wait/join outside the lock to avoid the watchdog-self-
  join deadlock; second caller no-ops via _stopping
- _read_log_stream: receive pid as a parameter instead of reaching into
  self._process.pid (TOCTOU with stop())
- _child_preexec_linux + ctypes hoisted to module scope under a
  sys.platform guard; no more re-import per start()
- _clear_nix_executable: gracefully handle cwd=None (skip the walk
  entirely), use .resolve() comparison so a symlinked cwd still
  terminates the walk, refuse to walk past cwd's tree

change_detect.py:
- fold max_file_size into _hash_files digest so different thresholds
  against the same cache_name can't corrupt each other
- new _locked_cache_file context manager — flock the .hash file directly
  in "a+" mode; no more orphan .lock sidecars accumulating in the cache
  dir; did_change/update_cache/clear_cache all share the helper

Tests:
- new test_should_rebuild_true_bypasses_change_check for the explicit
  "always rebuild" path
- new test_failed_build_does_not_mark_cache_clean for the update=False
  retry semantics

Build system:
- tiledb darwin overlay: filter patches by name instead of dropping all
  of them, so a future security patch from nixpkgs survives
- livox_sdk_config.hpp: honor $TMPDIR on Darwin instead of hardcoding /tmp

Perf script:
- compute cache_name inline (using the same inspect-based source_file
  pattern) instead of calling the inlined _build_cache_name method

All 26 tests across test_change_detect.py + test_native_module.py +
test_native_rebuild.py pass. ruff + mypy clean. mid360_native and
fastlio2_native still build end-to-end on aarch64-darwin. Linux drvPaths
for libpqxx/tiledb/pcl/vtk verified unchanged by the overlay.
Delete dimos/visualization/constants.py and consolidate all rerun
defaults/types into rerun/config.py per review feedback (no diluted
ownership). Also wire web_port through to rr.serve_web_viewer() so it
actually sets the port, and guard Blueprint import behind TYPE_CHECKING.
- Move visualization constants to dimos/visualization/rerun/constants.py
- Replace CmdVelMux with MovementManager (tele_cmd_vel_scaling, no timers)
- WebSocket server: use global_config.listen_host, self._loop, ViewerMsg TypedDicts
- Bridge: web_port config wired to rr.serve_web_viewer, Archetype behind TYPE_CHECKING
- Clean up tests: fixtures for cleanup, pytest.approx, unsub, dedupe wait_for_server
- Move inline imports to top level (socket, psutil, urlparse, cast, get_args, etc.)
- Remove dead code (_query_pose, CmdVelScaling, debug logging)
Conflicts resolved:
- python_worker.py: kept rosnav8's _WorkerState refactor
- pyproject.toml: kept both open3d-unofficial-arm and rpyc deps
- uv.lock: re-locked
…es compliance

FastLio2 (stop only), FarPlanner, LocalPlanner, PathFollower,
TarePlanner, TerrainAnalysis all need explicit start/stop so the
AST-based test_module_has_start_and_stop check can find them.
- Update FarPlannerConfig test defaults (sensor_range 30.0, is_static_env True)
- Add _query_pose method to MovementManager for TF-based pose lookup
- Remove section marker comments from far_planner and local_planner configs
- Exclude .ignore.enhance from section marker test scan
winit now fails immediately without a display (before WS connect) and
hangs with a display (GUI loop blocks before printing URL). Test cannot
work reliably with the current viewer binary.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants