146 changes: 146 additions & 0 deletions RESULTS.md
# Optical Flow Obstacle Avoidance — Results

Evaluation of optical flow τ-estimation against LiDAR ground truth on
`unitree_office_walk`. Full reference: theory, methodology, results, limitations.

---

## Theory

**Physical basis (Lee 1976):** when a camera moves toward a surface, pixels
expand radially outward from the Focus of Expansion. The rate of expansion —
divergence — is inversely proportional to Time-to-Contact (continuous form):
`τ = 1 / divergence`.

**Discrete implementation (this pipeline):** per-cell divergence is estimated
from tracked flow as `mean(flow_x)/cell_w + mean(flow_y)/cell_h`, and
`τ_cell = 1 / div`. Flow is in pixels-per-frame with no dt normalization, so
τ is not literally seconds — it is proportional to physical TTC up to a
frame-rate × grid-geometry factor. The signal is scale-free in the sense that
a closer or faster approach yields a smaller τ at any speed or object size;
thresholds are therefore tuned empirically rather than derived from
`distance/speed`.
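As a toy numeric example of the discrete rule (made-up flow values; `cell_tau` is illustrative, not a pipeline function):

```python
import math
import numpy as np

def cell_tau(flow: np.ndarray, cell_w: float, cell_h: float) -> float:
    """Discrete per-cell rule: div = mean(fx)/cell_w + mean(fy)/cell_h, tau = 1/div.
    NaN when div <= 0 (static or receding)."""
    div = flow[:, 0].mean() / cell_w + flow[:, 1].mean() / cell_h
    return 1.0 / div if div > 0 else float("nan")

# Outward (expanding) flow in a 128x96-pixel cell: positive divergence, finite tau.
flow = np.array([[2.0, 1.5], [1.8, 1.4], [2.2, 1.6]])  # pixels/frame
tau = cell_tau(flow, cell_w=128.0, cell_h=96.0)        # tau units, not seconds
```

Stronger flow (closer/faster approach) shrinks τ; flow pointing inward yields NaN.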

**What breaks it:** rotation produces global image shift indistinguishable
from expansion, so the formula is only valid during pure forward translation.

**Algorithm steps:**

1. **Keypoint detection — FAST (Rosten 2006):** scans for pixels where a
   contiguous arc of the 16-pixel surrounding ring (at least 9 of 16 in
   OpenCV's default) is all brighter or all darker than the center. Up to
   300 per frame, refreshed every 10 frames. ORB-SLAM uses the same FAST
   corners for map-point extraction; it adds an orientation estimate
   (intensity centroid) for BRIEF descriptor matching — irrelevant here,
   since LK tracks image patches and has no orientation input. Plain FAST
   is the correct choice.

2. **Lucas-Kanade flow:** for each keypoint, LK minimizes sum-of-squared
intensity differences in a 21×21 window (3-level pyramid) to locate where
the patch moved in the next frame.

3. **Grid divergence:** image divided into a 5×5 cell grid. Per cell:
`τ = 1/div` if `div > 0` (approaching); `NaN` if `div ≤ 0` (static or
receding). Danger fires if `min(τ) < tau_threshold`.

4. **Threshold calibration:** τ is a proxy for Time-to-Contact — smaller τ
means a closer/faster-approaching obstacle — but the numeric value depends
on frame rate and cell geometry, so it is not literally seconds. The
default `tau_threshold = 3.0` was chosen from the PR sweep below: it lies
on the high-precision shoulder of the curve (P=0.939 at τ=3.0 on this
dataset). Tune per-deployment from the PR curve, not from a physical
distance/speed formula.

5. **Rotation gate (live module):** if `angular_velocity` stream is connected
and `|ω| > omega_max` (default 0.3 rad/s), `danger_signal` is suppressed.
If unused, `_last_omega` stays 0 — gate is transparent.
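Step 5 amounts to a one-line guard (a sketch; the real module holds `_last_omega` as state on the live pipeline):

```python
def gated_danger(raw_danger: bool, last_omega: float, omega_max: float = 0.3) -> bool:
    """Suppress the danger signal while turning: rotation-induced global image
    shift is indistinguishable from expansion, so tau is meaningless during yaw."""
    return raw_danger and abs(last_omega) <= omega_max
```

With no `angular_velocity` source connected, `last_omega` stays 0 and the gate passes every alarm through unchanged.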

---

## Experiment Setup

### Dataset

`unitree_office_walk` — 51 s indoor sequence, Go2 drives repeatedly toward
walls and re-orients (96% forward-translation motion). τ-estimation is only
valid during forward translation, making this the appropriate dataset for
evaluation.

### Odom Gate

| Condition | Value | Rationale |
|-----------|-------|-----------|
| v\_fwd > 0.15 m/s | forward speed | robot must be translating toward something |
| \|ω\_yaw\| < 0.15 rad/s | yaw rate | eliminates rotation-contaminated frames |

**125 gated frames out of 390 total (33%).** Forward speed on gated frames:
p25=0.36 · p50=0.45 · p75=0.56 m/s.
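The gate reduces to a boolean mask over the odometry streams (a sketch using the thresholds from the table; array names are illustrative):

```python
import numpy as np

def odom_gate(v_fwd: np.ndarray, omega_yaw: np.ndarray) -> np.ndarray:
    """Frames valid for tau evaluation: translating forward, not rotating."""
    return (v_fwd > 0.15) & (np.abs(omega_yaw) < 0.15)
```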

**Ground truth:** LiDAR minimum forward distance in body frame (world frame
transformed using per-frame odom quaternion). Binary GT: `dist < 1.5 m`.

---

## Results

**Latency:** ~2.5 ms/frame — < 4% of 77 ms frame budget at 13 Hz.

### Correlation — Does τ Track Real TTC?

`lidar_ttc = min_fwd_dist / v_fwd` compared to optical flow `min_tau`.

| Metric | Value |
|--------|-------|
| Paired frames | 82 |
| Pearson r | +0.144 |
| Spearman r | +0.173 |
| RMSE | 3.31 s |
| MAE | 2.52 s |

Weak positive correlation — correct sign (smaller τ → closer obstacle), noisy
magnitude. Per-cell divergence mixes approach signal with parallax and tracker
drift. Use τ as a **ranked urgency signal**, not a calibrated timer.
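The metrics above can be recomputed along these lines (numpy-only sketch; Spearman would additionally need `scipy.stats.spearmanr`):

```python
import numpy as np

def ttc_agreement(min_fwd_dist, v_fwd, min_tau):
    """Pearson r, RMSE, MAE between LiDAR-derived TTC and optical-flow tau,
    over frames where both signals exist."""
    lidar_ttc = np.asarray(min_fwd_dist) / np.asarray(v_fwd)  # seconds
    tau = np.asarray(min_tau)
    r = float(np.corrcoef(lidar_ttc, tau)[0, 1])
    err = lidar_ttc - tau
    return r, float(np.sqrt(np.mean(err ** 2))), float(np.mean(np.abs(err)))
```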

### Binary Detection

GT positive rate: 84% (indoor walls are always nearby on forward frames).

| τ threshold | Purpose | Precision | Recall | F1 |
|-------------|---------|-----------|--------|----|
| **3.0 (default)** | **Production** | **0.939** | **0.295** | **0.449** |
| 5.8 | Max recall | 0.918 | 0.533 | 0.675 |

At τ=3.0, 93.9% of alarms are confirmed by LiDAR. Recall of ~30% reflects
the thresholded nature of the signal — many GT-close frames produce τ values
just above the decision boundary (noisy divergence) and are missed. Raising
the threshold to 5.8 lifts recall to 0.53 at a small precision cost.
Precision stays above 0.90 for all τ ≥ 1.5 across the operating range
(AUC-PR = 0.479; below the 0.84 baseline because 84% of gated frames are
GT-positive — precision at each recall level is the meaningful metric).
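A single point of the sweep behind this table is easy to recompute (sketch; `min_tau` and `gt_close` stand for the per-frame arrays):

```python
import numpy as np

def pr_at_threshold(min_tau, gt_close, thr):
    """Precision/recall of the rule `min_tau < thr` against binary LiDAR GT."""
    pred = np.asarray(min_tau) < thr
    gt = np.asarray(gt_close)
    tp = int(np.sum(pred & gt))
    precision = tp / max(int(np.sum(pred)), 1)
    recall = tp / max(int(np.sum(gt)), 1)
    return precision, recall
```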

---

## Known Limitations

| Limitation | Consequence | Mitigation |
|------------|-------------|------------|
| Stationary obstacles produce no flow | Blind to static walls at constant distance | LiDAR for static proximity |
| τ noisy (RMSE ≈ 3 s) | Not a precise collision timer | Ranked alarm; wire `tac_grid` to planner |
| Rotation contaminates signal | Global image shift = false divergence | Odom gate in eval; `angular_velocity` gate in module |
| Low-texture surfaces | Few keypoints → sparse grid coverage | Edge-enhancing preprocessing |

**Deployment guidance:** keep the default `tau_threshold = 3.0` for high
precision, or raise toward 5–6 for higher recall; retune from the PR sweep
on your own data rather than from a physical distance/speed formula. Wire
`angular_velocity` from IMU for automatic turn suppression. Use as supplement
to LiDAR — not a replacement.
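Wiring the backend from this PR into a playback loop might look like the following (the video path is a placeholder; the IMU rotation gate lives in the live module, not in the backend itself):

```python
import cv2

from dimos.perception.optical_flow.backends.lucas_kanade import LucasKanadeBackend

backend = LucasKanadeBackend(tau_threshold=3.0)  # default; raise toward 5-6 for recall
cap = cv2.VideoCapture("office_walk.mp4")        # placeholder frame source

while True:
    ok, frame = cap.read()
    if not ok:
        break
    result = backend.compute(frame)              # None until flow is available
    if result is not None and result["danger"]:
        # Treat as a ranked alarm; cross-check against LiDAR proximity.
        print(f"approach alarm: min_tau={result['min_tau']:.2f}")
```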

---

## Verdict

Optical flow τ-estimation is a viable **complement** to LiDAR — not a standalone
replacement. When the alarm fires it is correct 93.9% of the time (P=0.939 at
τ=3.0), but it only catches ~30% of danger frames (R=0.295). Low recall is
inherent: stationary obstacles produce no flow, and noisy divergence near the
threshold causes missed detections close to the decision boundary. Deploy
alongside LiDAR: LiDAR for static proximity, optical flow for fast-approaching
and dynamic obstacles. Real-time at ~2.5 ms/frame (< 4% of frame budget),
with a single empirically tuned parameter (`tau_threshold`).
107 changes: 107 additions & 0 deletions dimos/perception/optical_flow/backends/lucas_kanade.py
import cv2
import numpy as np
from typing import Optional

from dimos.perception.optical_flow.tac_estimator import divergence_to_tac


class LucasKanadeBackend:
"""
FAST keypoints + Lucas-Kanade sparse optical flow.
Computes a Time-to-Contact proxy (tau) via flow divergence in an NxN grid.
Based on Lee (1976): a closer/faster-approaching obstacle produces larger
image-expansion and thus smaller tau. Scale-free; no depth calibration.

Note: the tau value is monotonically related to physical TTC but is not
literally seconds — it depends on frame rate and cell geometry. Use
tau_threshold as an empirically tuned urgency parameter. The default 3.0
was chosen from the PR-curve sweep on unitree_office_walk (P=0.939 at the
production operating point).
"""

def __init__(self, max_keypoints=300, grid_size=5,
tau_threshold=3.0, refresh_interval=10,
fast_threshold=20):
self.max_keypoints = max_keypoints
self.grid_size = grid_size
self.tau_threshold = tau_threshold
self.refresh_interval = refresh_interval
self.lk_params = dict(
winSize=(21, 21),
maxLevel=3,
criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03),
)
self.fast = cv2.FastFeatureDetector_create(
threshold=fast_threshold, nonmaxSuppression=True
)
self.prev_gray: Optional[np.ndarray] = None
self.prev_pts: Optional[np.ndarray] = None
self.frame_count: int = 0

def compute(self, frame_bgr: np.ndarray) -> Optional[dict]: # type: ignore[type-arg]
"""
Args:
frame_bgr: HxWx3 uint8 BGR numpy array
Returns:
dict: {tac_grid, danger, flow_pts, min_tau}
None on first frame.
"""
gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)

if self.prev_gray is None or self.prev_pts is None or len(self.prev_pts) == 0:
self.prev_gray = gray
self._refresh(gray)
return None

curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(
self.prev_gray, gray, self.prev_pts, None, **self.lk_params
)

good_mask = status.ravel() == 1
good_prev = self.prev_pts[good_mask].reshape(-1, 2)
good_curr = curr_pts[good_mask].reshape(-1, 2)
flow_vecs = good_curr - good_prev

self.frame_count += 1
if self.frame_count % self.refresh_interval == 0 or len(good_prev) < 50:
self._refresh(gray)
else:
self.prev_pts = good_curr.reshape(-1, 1, 2)
self.prev_gray = gray

if len(good_prev) < 10:
return None

h, w = gray.shape
tac_grid = self._tac_grid(good_prev, flow_vecs, h, w)
min_tau = float(np.nanmin(tac_grid)) if not np.all(np.isnan(tac_grid)) else np.inf

return {
"tac_grid": tac_grid,
"danger": min_tau < self.tau_threshold,
"flow_pts": (good_prev, good_curr),
"min_tau": min_tau,
}

def _refresh(self, gray: np.ndarray) -> None:
kps = sorted(self.fast.detect(gray, None),
key=lambda k: k.response, reverse=True)[:self.max_keypoints]
self.prev_pts = (
np.array([[k.pt] for k in kps], dtype=np.float32) if kps else None
)

def _tac_grid(self, pts: np.ndarray, flows: np.ndarray, h: int, w: int) -> np.ndarray:
div_grid = np.full((self.grid_size, self.grid_size), np.nan, dtype=np.float32)
cell_h = h / self.grid_size
cell_w = w / self.grid_size
for i in range(self.grid_size):
for j in range(self.grid_size):
mask = (
(pts[:, 0] >= j * cell_w) & (pts[:, 0] < (j+1) * cell_w) &
(pts[:, 1] >= i * cell_h) & (pts[:, 1] < (i+1) * cell_h)
)
cf = flows[mask]
if len(cf) < 3:
continue
div_grid[i, j] = np.mean(cf[:, 0]) / cell_w + np.mean(cf[:, 1]) / cell_h
return divergence_to_tac(div_grid)
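`divergence_to_tac` comes from `tac_estimator` and is not part of this diff; a minimal version consistent with the rule in step 3 of RESULTS.md (τ = 1/div where div > 0, NaN otherwise) would be:

```python
import numpy as np

def divergence_to_tac(div_grid: np.ndarray) -> np.ndarray:
    """tau = 1/div where div > 0 (approaching); NaN where static or receding."""
    tac = np.full_like(div_grid, np.nan)
    positive = div_grid > 0          # NaN cells compare False and stay NaN
    tac[positive] = 1.0 / div_grid[positive]
    return tac
```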