Skip to content

Jeff/fix/rconnect2#1784

Open
jeff-hykin wants to merge 41 commits intodevfrom
jeff/fix/rconnect2
Open

Jeff/fix/rconnect2#1784
jeff-hykin wants to merge 41 commits intodevfrom
jeff/fix/rconnect2

Conversation

@jeff-hykin
Copy link
Copy Markdown
Member

@jeff-hykin jeff-hykin commented Apr 13, 2026

NOTE: this will become ready-for-review once this dimos-viewer PR is merged

Problem(s)

  1. dimos-viewer with --connect doesn't work for remote connections.
  2. there's no way to remap what dimos-viewer publishes on. It publishes to a hardcoded LCM path of cmd_vel which ends up making stream-renaming a pain since everything else must be renamed to know if the cmd_vel is coming from the viewer or from a planner/module.
  3. We do the "if global_config == rerun" in multiple places

Solution

Use websockets instead of LCM for the viewer.

  • A dimos module starts a websocket server listening for clicked_point and tele_cmd_vel
  • The dimos-viewer tries to connect to a websocket and publishes clicked_point and tele_cmd_vel on there.
  • Theres a consolidated vis_module to de-dup the visualization logic

Breaking Changes

None

How to Test

You'll need the jeff/fix/connect branch of dimos-viewer compiled and python installed into dimos:

# Build and install dimos-viewer (requires pixi)
cd <path-to-dimos-viewer>
git checkout jeff/fix/connect
pixi run build
uv pip install target/wheels/dimos_viewer-*.whl --force-reinstall
# Unit + E2E tests
uv run pytest dimos/visualization/rerun/test_websocket_server.py dimos/visualization/rerun/test_viewer_ws_e2e.py -v --timeout=30 -k "not test_viewer_ws_client_connects"

Alternatively

# Terminal 1: start the websocket server
python -c "
from dimos.visualization.rerun.websocket_server import RerunWebSocketServer
import threading
server = RerunWebSocketServer(port=3030)
server.clicked_point.subscribe(lambda pt: print(f'[CLICK] {pt.x:.3f},{pt.y:.3f},{pt.z:.3f}'))
server.tele_cmd_vel.subscribe(lambda tw: print(f'[TWIST] {tw}'))
server.start()
threading.Event().wait()
"

# Terminal 2: start the viewer
dimos-viewer --connect rerun+http://localhost:9877/proxy --ws-url ws://127.0.0.1:3030/ws

Click in the 3D viewport and use WASD keys — should see [CLICK] and [TWIST] in terminal 1.

Contributor License Agreement

  • I have read and approved the CLA.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 13, 2026

Greptile Summary

This PR replaces LCM-based viewer communication with a WebSocket protocol for remote dimos-viewer connections, introduces a vis_module factory to consolidate repeated viewer-backend selection logic across blueprints, adds MovementManager for teleop/nav velocity arbitration, and cleans up SIGINT handling in worker processes.

  • P1 — MovementManager.start() drops subscription handles: clicked_point, nav_cmd_vel, and tele_cmd_vel subscriptions are never passed to register_disposable, so they cannot be cleaned up when stop() is called. Callbacks will continue firing on a stopped module and may publish to torn-down output streams.

Confidence Score: 4/5

Safe to merge once the MovementManager subscription-cleanup bug is fixed; all other findings are P2 or previously addressed.

One P1 bug in the new MovementManager module: subscription handles are discarded in start(), preventing cleanup on stop() and risking stale publishes to stopped streams. All other prior P1/P0 concerns from previous review threads are addressed or acknowledged.

dimos/navigation/smart_nav/modules/movement_manager/movement_manager.py

Important Files Changed

Filename Overview
dimos/navigation/smart_nav/modules/movement_manager/movement_manager.py New module: teleop/nav velocity mux and click-to-goal relay. P1 bug: start() drops all subscription handles, preventing cleanup on stop().
dimos/visualization/rerun/websocket_server.py New module: WebSocket server that receives dimos-viewer click/twist events and publishes them as DimOS streams. Clean structure; prior concerns (silent bind failure, _server_ready not reset) are noted in previous threads.
dimos/visualization/vis_module.py New factory function that consolidates viewer-backend selection (rerun, foxglove, none) into one place, replacing repeated if/elif chains across blueprints.
dimos/visualization/rerun/bridge.py Significant refactor: removes viewer_mode in favour of rerun_open/rerun_web, lazy-imports rerun SDK, replaces lru_cache with per-instance cache to avoid self leak, and always uses serve_grpc + optional spawned viewer.
dimos/visualization/rerun/constants.py New lightweight constants module (ViewerBackend, RerunOpenOption, port constants) extracted to avoid pulling in the full Rerun SDK on import.
dimos/core/coordination/python_worker.py Workers now ignore SIGINT so the coordinator can orchestrate shutdown via the pipe without BrokenPipeErrors; KeyboardInterrupt handlers removed as redundant.
dimos/robot/unitree/go2/blueprints/smart/unitree_go2.py Adds MovementManager to the smart nav blueprint so that teleop/nav arbitration is now part of the standard Go2 stack.
dimos/robot/cli/dimos.py Applies global_config CLI overrides before blueprint loading (workaround for eagerly-evaluated vis_module calls); rewrites rerun-bridge CLI to use RerunBridgeModule directly instead of removed run_bridge().

Sequence Diagram

sequenceDiagram
    participant V as dimos-viewer
    participant WS as RerunWebSocketServer
    participant MM as MovementManager
    participant RB as RerunBridgeModule
    participant P as ReplanningAStarPlanner
    participant E as WavefrontFrontierExplorer

    V->>WS: WS click {x,y,z}
    WS->>MM: clicked_point (PointStamped)
    MM->>P: goal (PointStamped)

    V->>WS: WS twist {linear,angular}
    WS->>MM: tele_cmd_vel (Twist)
    MM->>P: stop_movement (Bool)
    MM->>E: stop_movement (Bool)
    MM->>MM: cmd_vel (scaled Twist)

    V->>WS: WS stop
    WS->>MM: tele_cmd_vel (Twist.zero())

    P->>MM: nav_cmd_vel (Twist)
    Note over MM: suppressed if teleop active
    MM->>MM: cmd_vel (nav Twist, after cooldown)

    V-->>RB: rerun+http gRPC (visualization data)
Loading

Reviews (6): Last reviewed commit: "update docs" | Re-trigger Greptile

Comment thread dimos/visualization/rerun/websocket_server.py
Comment thread dimos/visualization/rerun/constants.py
Comment thread dimos/visualization/rerun/test_websocket_server.py Outdated
Comment thread dimos/visualization/rerun/websocket_server.py Outdated
@jeff-hykin jeff-hykin enabled auto-merge (squash) April 13, 2026 18:22
@jeff-hykin jeff-hykin mentioned this pull request Apr 17, 2026
6 tasks
RERUN_OPEN_DEFAULT: RerunOpenOption = "native"
RERUN_ENABLE_WEB = True
RERUN_GRPC_PORT = 9876
RERUN_WEB_PORT = 9090
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't bind to 9090, this is an extremely common port (our china VPN already uses for example, TOR is on 9090)

Copy link
Copy Markdown
Member Author

@jeff-hykin jeff-hykin Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • pre-existing (I just moved the variable)
  • this doesn't set the port (its just for docker to know about the existing Rerun default port)

I think changing the code to set the port to a non-default port (and then testing everything again) is outside the scope of this PR, but I agree we should do it

Copy link
Copy Markdown
Contributor

@leshy leshy Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2026-04-23_17-47-34.mp4

so IDK changes this PR introduces but dimos run doesn't work for me on this branch while dev works fine. This looks like your clicking server listen port, not rerun?

2026-04-23_16-40

dev doesn't listen on 9090 for me, is there a way to rebind your listen port via some config?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized whats happening, this would probably be a problem if you try to use the rerun web on dev. The web server was started in the background when native wasn't explicitly set - which I'm not going to debate right now so I just changed that to not start it.

Also the port should be properly configurable now.

ViewerBackend: TypeAlias = Literal["rerun", "foxglove", "none"]
RerunOpenOption: TypeAlias = Literal["none", "web", "native", "both"]

RERUN_OPEN_DEFAULT: RerunOpenOption = "native"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't rerun defaults belong to visualization/rerun in a standard config dict?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No because they need to be imported by global_config and we don't want to import rerun just to run dimos --help

Copy link
Copy Markdown
Contributor

@leshy leshy Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rerun bridge.py actually doesn't import rerun until start() is called so this can be imported freely

(at least it isn't supposed to, I see Sam added two imports at the top but those can be guarded by a if TYPE_CHECKING since they are just for typing)

generally if extracting (which is not needed here) I'd rather extract the config into an external file rerun/config.py then start creating constants.py files with dilluted ownership

Copy link
Copy Markdown
Member Author

@jeff-hykin jeff-hykin Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're saying move class RerunConfig() to config.py it makes a circular import:
global_config.py → rerun/config.py → core/module.py (ModuleConfig) → global_config.py

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I improved the imports, made RERUN_WEB_PORT, but I think I'm gonna leave this this as-is since this was more or less pre-existing and this PR is really starting to expand.
We already need to clean up the terrible rerun_config later so we can clean it up with that.

@jeff-hykin
Copy link
Copy Markdown
Member Author

jeff-hykin commented Apr 21, 2026

Sorry I forgot to port the project.toml change from rosnav8 back to this branch. Ready now though

Comment thread dimos/visualization/rerun/websocket_server.py Outdated
Comment thread dimos/visualization/rerun/websocket_server.py Outdated
Comment thread dimos/visualization/rerun/websocket_server.py Outdated
Comment thread dimos/visualization/rerun/websocket_server.py Outdated
jeff-hykin added a commit that referenced this pull request Apr 23, 2026
Add entries for all remaining review comments:
- leshy comments on constants.py (3114946798, 3114953830)
- greptile automated comments (3074737748, 3074737821, 3074737877, 3074737977)
- Update entries for newly fixed comments (mod->module, inline import, __init__/__new__)
- Remove duplicate entry for comment 3120880870
# point DYLD_LIBRARY_PATH at the real libpython directory.
executable = sys.executable if sys.platform != "darwin" else "mjpython"
env = os.environ.copy()
if sys.platform == "darwin":
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

problem specific to newer mac's. (this got in the way of me testing this branch)

Comment thread pyproject.toml
Comment thread dimos/core/global_config.py
Comment thread dimos/visualization/rerun/test_viewer_ws_e2e.py Outdated
Comment on lines +63 to +67
def start(self) -> None:
super().start()
self.clicked_point.subscribe(self._on_click)
self.nav_cmd_vel.subscribe(self._on_nav)
self.tele_cmd_vel.subscribe(self._on_teleop)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Subscriptions never unregistered on stop()

The three subscribe() calls in start() each return an unsubscribe callable, but all three return values are discarded. The rest of the codebase wraps these in Disposable and passes them to register_disposable (e.g. ReplanningAStarPlanner and WavefrontFrontierExplorer in this same PR) so super().stop() can clean them up.

Without registration, _on_click, _on_nav, and _on_teleop will continue to fire after stop() is called. Because stop() calls super().stop() which tears down the output streams, any late-arriving message will try to publish() on a stopped cmd_vel / goal / stop_movement stream — and may raise or silently produce stale commands.

# start() should be:
@rpc
def start(self) -> None:
    super().start()
    self.register_disposable(Disposable(self.clicked_point.subscribe(self._on_click)))
    self.register_disposable(Disposable(self.nav_cmd_vel.subscribe(self._on_nav)))
    self.register_disposable(Disposable(self.tele_cmd_vel.subscribe(self._on_teleop)))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants