feat(perception): monocular obstacle avoidance via optical flow 1775 #1908

Draft
jhengyilin wants to merge 3 commits into dimensionalOS:dev from jhengyilin:feature/optical-flow-obstacle-avoidance-1775

Conversation

@jhengyilin commented Apr 23, 2026

feat(perception): monocular obstacle avoidance via optical flow τ-estimation

Branch: feature/optical-flow-obstacle-avoidance-1775 → main


Summary

Implements OpticalFlowModule — real-time Time-to-Contact (τ) estimation from
sparse optical flow on a single RGB camera. No depth sensor, no calibration, no
GPU. Designed as a complement to LiDAR proximity sensing, not a replacement.
Closes #1775.

Answers to the issue

"Can optical flow be used for real-time obstacle avoidance?"
Yes, as a complement to LiDAR — not a replacement. It detects fast-approaching
obstacles via τ = 1/divergence with 93.9% precision when it fires, but recall
is 0.295, so it only catches ~30% of danger frames. It is blind to stationary
obstacles. Recommended: pair with LiDAR for static proximity; use optical flow
for dynamic and fast-approaching obstacles.

"How much does optical flow correlate with LiDAR?"
Weakly. Spearman r = +0.173 — correct direction but too noisy for a calibrated
timer. Use as a ranked urgency signal, not a precise TTC.

"What keypoint strategies? What does ORB-SLAM use?"
ORB-SLAM uses Oriented FAST for rotation-invariant BRIEF descriptor matching.
That orientation step is irrelevant for Lucas-Kanade, which tracks image patches
directly and has no orientation input. Plain FAST is the correct choice here.
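
For reference, a minimal sketch of that FAST + pyramidal Lucas-Kanade pairing using standard OpenCV calls (parameter values here are illustrative, not the module's actual defaults):

```python
import cv2
import numpy as np

fast = cv2.FastFeatureDetector_create(threshold=25)  # plain FAST: no orientation step needed

def track(prev_gray: np.ndarray, gray: np.ndarray):
    """Detect FAST corners in the previous frame and track them into the current one."""
    kps = fast.detect(prev_gray, None)
    if not kps:
        return None
    p0 = np.float32([kp.pt for kp in kps]).reshape(-1, 1, 2)
    # Pyramidal Lucas-Kanade tracks raw image patches; descriptor orientation is irrelevant.
    p1, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, p0, None,
                                                winSize=(21, 21), maxLevel=3)
    good = status.ravel() == 1
    return p0[good].reshape(-1, 2), p1[good].reshape(-1, 2)  # flow start/end points
```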

Problem

The Go2 uses LiDAR for obstacle proximity but cannot answer "how fast is it
approaching?" This adds a camera-based early-warning signal for fast-approaching
or dynamic obstacles, complementing LiDAR rather than replacing it.

What Was Built

```
dimos/perception/optical_flow/
├── __init__.py
├── tac_estimator.py           ← divergence → τ math
├── optical_flow_module.py     ← dimos Module: camera in, signals out
└── backends/lucas_kanade.py   ← FAST corners + Lucas-Kanade tracking

scripts/eval_optical_flow_gated.py   ← odom-gated evaluation vs LiDAR
scripts/visualize_optical_flow.py    ← live Rerun viewer (annotated frames + τ plot)
RESULTS.md                           ← theory, methodology, full results
```

Pipeline at a glance:

```mermaid
flowchart LR
    A["Camera RGB frame"] --> B["FAST keypoint detection"]
    B --> C["Lucas-Kanade tracking"]
    C --> D["Flow vectors (dx, dy)"]
    D --> E["5×5 grid divergence"]
    E --> F["τ = 1 / divergence"]
    F --> G{"τ < threshold?"}
    G -- yes --> H["DANGER signal"]
    G -- no --> I["CLEAR"]
    J["IMU ω (yaw rate)"] --> K{"|ω| > 0.3?"}
    K -- yes --> L["GATED — suppress danger"]
    K -- no --> G
```
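
To make the grid step concrete, here is a rough sketch of a per-cell divergence → τ computation. This is a simplified stand-in for tac_estimator.py, not its actual code; the least-squares affine fit shown is one standard way to estimate divergence from sparse flow:

```python
import numpy as np

def tau_grid(p0: np.ndarray, p1: np.ndarray, shape, n: int = 5, eps: float = 1e-6):
    """Per-cell time-to-contact from sparse flow (p0 -> p1) on an n x n grid.

    Divergence comes from a least-squares affine fit u ~ A x + b per cell;
    tau = 1 / divergence, in frames (multiply by the frame period for seconds).
    """
    h, w = shape[:2]
    flow = p1 - p0
    rows = np.clip((p0[:, 1] * n / h).astype(int), 0, n - 1)
    cols = np.clip((p0[:, 0] * n / w).astype(int), 0, n - 1)
    grid = np.full((n, n), np.inf)
    for r in range(n):
        for c in range(n):
            m = (rows == r) & (cols == c)
            if m.sum() < 6:                    # too few tracks for a stable fit
                continue
            X = np.hstack([p0[m], np.ones((m.sum(), 1))])    # design matrix [x, y, 1]
            A, *_ = np.linalg.lstsq(X, flow[m], rcond=None)
            div = A[0, 0] + A[1, 1]            # trace of the flow Jacobian
            if div > eps:                      # expanding flow = approaching
                grid[r, c] = 1.0 / div
    return grid
```

The minimum over cells then feeds the τ-threshold decision shown in the flowchart.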

Module Interface

| Stream | Direction | Type | Description |
| --- | --- | --- | --- |
| color_image | In | Image | RGB/BGR camera feed |
| angular_velocity | In | Any | Yaw rate in rad/s (optional) |
| danger_signal | Out | Bool | True when min τ < threshold AND not turning |
| tac_grid | Out | Any | 5×5 float array of per-cell τ values |
| flow_visualization | Out | Image | Debug overlay: flow vectors + grid |

When angular_velocity is connected and |ω| > omega_max (default 0.3 rad/s),
danger_signal is forced False. If the stream is left unconnected, _last_omega
stays 0 and the gate is transparent (backward compatible).
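
The gating rule itself is small; a sketch of the semantics described above (names mirror the table, but treat this as illustrative pseudocode rather than the module's actual implementation):

```python
import numpy as np

def danger_signal(tau_grid: np.ndarray, last_omega: float,
                  tau_threshold: float = 3.0, omega_max: float = 0.3) -> bool:
    """Danger iff the most urgent cell is below threshold and the robot is not turning."""
    if abs(last_omega) > omega_max:
        return False                 # GATED: rotation corrupts the translational flow field
    return bool(np.min(tau_grid) < tau_threshold)
```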

Evaluation Results

Latency: ~2.5 ms/frame

τ Correlation with LiDAR TTC (82 odom-gated frames: v_fwd > 0.15 m/s, |ω| < 0.15 rad/s)

| Pearson r | Spearman r | RMSE (s) |
| --- | --- | --- |
| +0.144 | +0.173 | 3.31 |

Correct sign — closer obstacle → smaller τ — but magnitude is noisy; use τ as a
ranked urgency signal, not a precise timer.

Binary Detection (GT: lidar_dist < 1.5 m, GT positive rate: 84%)

| τ threshold | Precision | Recall | F1 |
| --- | --- | --- | --- |
| 3.0 (default) | 0.939 | 0.295 | 0.449 |
| 5.8 (max F1) | 0.918 | 0.533 | 0.675 |

At τ=3.0, 93.9% of alarms are confirmed by LiDAR. Tune tau_threshold per use-case.
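
Tuning amounts to sweeping the threshold over the precision/recall trade-off. A hypothetical sketch with illustrative array names (the real numbers come from scripts/eval_optical_flow_gated.py):

```python
import numpy as np

def sweep(min_tau: np.ndarray, lidar_dist: np.ndarray, thresholds):
    """Precision/recall/F1 of 'min tau below threshold' vs LiDAR ground truth (< 1.5 m)."""
    gt = lidar_dist < 1.5
    for t in thresholds:
        pred = min_tau < t
        tp = np.sum(pred & gt)
        p = tp / max(pred.sum(), 1)       # precision: alarms confirmed by LiDAR
        r = tp / max(gt.sum(), 1)         # recall: danger frames actually caught
        f1 = 2 * p * r / max(p + r, 1e-9)
        print(f"tau<{t:.1f}: P={p:.3f} R={r:.3f} F1={f1:.3f}")
```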

Live visualization. scripts/visualize_optical_flow.py replays
unitree_office_walk through the same backend and streams annotated frames,
the tau grid, and a min_tau time-series into a Rerun viewer — scrub any
frame to see which cell fired and why.

Known Limitations

| Limitation | Consequence | Mitigation |
| --- | --- | --- |
| No relative motion means no flow | Blind to static walls at constant distance | LiDAR handles static proximity |
| τ value noisy (RMSE ≈ 3 s) | Not a precise collision timer | Ranked alarm; wire tac_grid to planner |
| Low-texture surfaces yield few keypoints | Reduced grid coverage | Edge-enhancing preprocessing (sketch below) |
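
On the last mitigation: one possible (untested here) preprocessing step is an unsharp mask before FAST, which amplifies local gradients on weakly textured surfaces:

```python
import cv2

def enhance_edges(gray):
    """Unsharp mask: subtract a blurred copy to amplify edges before FAST detection."""
    blur = cv2.GaussianBlur(gray, (0, 0), sigmaX=3)
    return cv2.addWeighted(gray, 1.5, blur, -0.5, 0)
```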

Files Changed

New — production code: dimos/perception/optical_flow/ (5 files)
New — evaluation: scripts/eval_optical_flow_gated.py, RESULTS.md
New — demo: scripts/visualize_optical_flow.py (Rerun viewer)
No existing files modified.

  Monocular Time-to-Contact detection via FAST keypoints + Lucas-Kanade flow.
  Follows dimos Config pattern; rotation-gated via optional angular_velocity
  stream. Closes dimensionalOS#1775.
  Eval: P=0.939 at τ=3.0 on unitree_office_walk. Visualization streams annotated frames +
  τ timeline to Rerun.
@leshy (Contributor) commented Apr 24, 2026

can you elaborate on why a mobile robot cannot avoid stationary obstacles with optical flow?

how can I test this? how did you test this? for example,

"How much does optical flow correlate with LiDAR?"
Weakly. Spearman r = +0.173 — correct direction but too noisy for a calibrated

can you show me how to reproduce this number?

@jhengyilin (Author) commented

Testing

I evaluated the module offline against LiDAR ground truth on the unitree_office_walk dataset (a 51 s indoor sequence). The eval script is scripts/eval_optical_flow_gated.py. It runs every frame through the optical flow backend, keeps only frames where the robot is moving forward (speed > 0.15 m/s) and not turning (|ω| < 0.15 rad/s) to isolate pure translation, and scores optical flow against the LiDAR distance on those frames as ground truth. It computes (a minimal sketch of the gating and scoring follows the list):

  • Correlation — Spearman r = +0.173 between optical flow τ and LiDAR TTC
  • Binary detection — Precision = 0.939, Recall = 0.295 at τ = 3.0 (ground truth: LiDAR distance < 1.5 m)
  • Latency — ~2.5 ms/frame
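
A minimal sketch of that gating and scoring, assuming hypothetical per-frame arrays (speed, omega, min_tau, lidar_ttc, lidar_dist) already extracted from the dataset:

```python
import numpy as np
from scipy.stats import spearmanr

def evaluate(speed, omega, min_tau, lidar_ttc, lidar_dist):
    """Odom-gated comparison of optical-flow tau against LiDAR, per frame."""
    gate = (speed > 0.15) & (np.abs(omega) < 0.15)   # forward motion, no turning
    rho, _ = spearmanr(min_tau[gate], lidar_ttc[gate])
    pred = min_tau[gate] < 3.0                        # danger at default threshold
    gt = lidar_dist[gate] < 1.5                       # ground truth: obstacle within 1.5 m
    tp = np.sum(pred & gt)
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    return rho, precision, recall
```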

To reproduce, first place the unitree_office_walk dataset folder under data/, then run from repo root:

Evaluation:

```
PYTHONPATH=.venv/lib/python3.12/site-packages/rerun_sdk \
    uv run python3 scripts/eval_optical_flow_gated.py
```

Visualization (opens a Rerun viewer where you can scrub the timeline and see flow arrows, τ grid, and DANGER/CLEAR labels frame-by-frame):

```
PYTHONPATH=.venv/lib/python3.12/site-packages/rerun_sdk \
    uv run python3 scripts/visualize_optical_flow.py
```

Results land in results/optical_flow_gated.json. Full methodology and numbers are in RESULTS.md.


Stationary obstacles and optical flow

A mobile robot can avoid stationary obstacles with optical flow — as long as two conditions are met:

  1. Relative motion exists — the robot is moving toward the obstacle. The robot's own forward motion produces an expanding flow field in the image (pixels diverge outward from the focus of expansion), which gives a finite τ = 1/divergence. The obstacle doesn't need to move on its own.

  2. The obstacle has trackable texture — FAST needs to detect keypoints on the obstacle's surface (edges, corners, texture). A featureless surface may yield too few keypoints for a reliable divergence estimate in that grid cell.

The only case where optical flow truly can't help is when there is zero relative motion between the camera and the obstacle — e.g., the robot is stationary, or moving parallel to a wall at constant distance. No pixel displacement means zero divergence and no alarm. That's the specific case where LiDAR is still needed for proximity sensing.
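
To put point 1 on a formal footing (a standard pinhole-camera result, not specific to this module): for pure forward motion at speed v toward a fronto-parallel surface at depth Z, projection gives x = fX/Z with Ẋ = 0 and Ż = −v, so

$$
\dot x = -\frac{f X \dot Z}{Z^{2}} = \frac{v}{Z}\,x,
\qquad
\frac{\partial \dot x}{\partial x} = \frac{v}{Z},
\qquad
\tau = \frac{Z}{v} = \left(\frac{\partial \dot x}{\partial x}\right)^{-1}.
$$

A static wall therefore yields a finite, correct τ whenever v > 0; the divergence only vanishes when v = 0 or the motion is parallel to the surface, which is exactly the LiDAR case above.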

@leshy (Contributor) commented Apr 24, 2026

  • you need to understand all code written and all decisions made.
  • you can do visual inspection; you don't need fancy LiDAR correlations. It should be obvious whether optical flow is detecting obstacles that are visible on screen or not
  • if doing fancy LiDAR correlation, make sure it's actually correct: visualize the expected obstacles on the camera image (according to lidar). Small timestamp differences between odom and video, and the way you take camera lens distortion into account, are critical. (This isn't necessary at all for the first pass; but if you do it, make sure it's correct. It's better to have no data than wrong data.)
  • do we need this grid at all? I think there should be a nicer algo that gives you a gradient of the rate of divergence of keypoints across the image? it's ok if it's not easy etc, just asking as I assume there is such a thing
  • the keypoint detection algo used is heavy and sparse; you want light (easy to compute) and non-sparse keypoints. idk if this is a different setting for your existing keypoint algo, or an entirely different algo (one possible sketch follows below)
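
One concrete reading of that last point, as a hypothetical sketch rather than anything in this PR: skip detection entirely and track a fixed uniform grid of points with pyramidal LK, which is cheap and dense by construction. Textureless cells will return unreliable flow, so the status/error outputs still need filtering:

```python
import cv2
import numpy as np

def uniform_grid_points(shape, step: int = 16) -> np.ndarray:
    """A fixed, dense grid of track points: zero detector cost, uniform coverage."""
    h, w = shape[:2]
    ys, xs = np.mgrid[step // 2:h:step, step // 2:w:step]
    return np.float32(np.dstack([xs, ys])).reshape(-1, 1, 2)

# Feed the same grid to LK every frame instead of FAST corners; drop points whose
# status is 0 or whose error is large (aperture problem on textureless regions).
# p1, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, uniform_grid_points(gray.shape), None)
```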
