146 changes: 146 additions & 0 deletions RESULTS.md
# Optical Flow Obstacle Avoidance — Results

Evaluation of optical flow τ-estimation against LiDAR ground truth on
`unitree_office_walk`. Full reference: theory, methodology, results, limitations.

---

## Theory

**Physical basis (Lee 1976):** when a camera moves toward a surface, pixels
expand radially outward from the Focus of Expansion. The rate of expansion —
divergence — is inversely proportional to Time-to-Contact (continuous form):
`τ = 1 / divergence`.

**Discrete implementation (this pipeline):** per-cell divergence is estimated
from tracked flow as `mean(flow_x)/cell_w + mean(flow_y)/cell_h`, and
`τ_cell = 1 / div`. Flow is in pixels-per-frame with no dt normalization, so
τ is not literally seconds — it is proportional to physical TTC up to a
frame-rate × grid-geometry factor. The signal is scale-free in the sense that
a closer or faster approach yields a smaller τ at any speed or object size;
thresholds are therefore tuned empirically rather than derived from
`distance/speed`.
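As a toy numeric example of the discrete rule (made-up flow values; `cell_tau` is illustrative, not a pipeline function):

```python
import math
import numpy as np

def cell_tau(flow: np.ndarray, cell_w: float, cell_h: float) -> float:
    """Discrete per-cell rule: div = mean(fx)/cell_w + mean(fy)/cell_h, tau = 1/div.
    NaN when div <= 0 (static or receding)."""
    div = flow[:, 0].mean() / cell_w + flow[:, 1].mean() / cell_h
    return 1.0 / div if div > 0 else float("nan")

# Outward (expanding) flow in a 128x96-pixel cell: positive divergence, finite tau.
flow = np.array([[2.0, 1.5], [1.8, 1.4], [2.2, 1.6]])  # pixels/frame
tau = cell_tau(flow, cell_w=128.0, cell_h=96.0)        # tau units, not seconds
```

Stronger flow (closer/faster approach) shrinks τ; flow pointing inward yields NaN.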

**What breaks it:** rotation produces global image shift indistinguishable
from expansion, so the formula is only valid during pure forward translation.

**Algorithm steps:**

1. **Keypoint detection — FAST (Rosten 2006):** scans for pixels where a
   contiguous arc of the 16-pixel surrounding ring (at least 9 of 16 in
   OpenCV's default) is all brighter or all darker than the center. Up to
   300 per frame, refreshed every 10 frames. ORB-SLAM uses the same FAST
   corners for map-point extraction; it adds an orientation estimate
   (intensity centroid) for BRIEF descriptor matching — irrelevant here,
   since LK tracks image patches and has no orientation input. Plain FAST
   is the correct choice.

2. **Lucas-Kanade flow:** for each keypoint, LK minimizes sum-of-squared
intensity differences in a 21×21 window (3-level pyramid) to locate where
the patch moved in the next frame.

3. **Grid divergence:** image divided into a 5×5 cell grid. Per cell:
`τ = 1/div` if `div > 0` (approaching); `NaN` if `div ≤ 0` (static or
receding). Danger fires if `min(τ) < tau_threshold`.

4. **Threshold calibration:** τ is a proxy for Time-to-Contact — smaller τ
means a closer/faster-approaching obstacle — but the numeric value depends
on frame rate and cell geometry, so it is not literally seconds. The
default `tau_threshold = 3.0` was chosen from the PR sweep below: it lies
on the high-precision shoulder of the curve (P=0.939 at τ=3.0 on this
dataset). Tune per-deployment from the PR curve, not from a physical
distance/speed formula.

5. **Rotation gate (live module):** if `angular_velocity` stream is connected
and `|ω| > omega_max` (default 0.3 rad/s), `danger_signal` is suppressed.
If unused, `_last_omega` stays 0 — gate is transparent.
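Step 5 amounts to a one-line guard (a sketch; the real module holds `_last_omega` as state on the live pipeline):

```python
def gated_danger(raw_danger: bool, last_omega: float, omega_max: float = 0.3) -> bool:
    """Suppress the danger signal while turning: rotation-induced global image
    shift is indistinguishable from expansion, so tau is meaningless during yaw."""
    return raw_danger and abs(last_omega) <= omega_max
```

With no `angular_velocity` source connected, `last_omega` stays 0 and the gate passes every alarm through unchanged.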

---

## Experiment Setup

### Dataset

`unitree_office_walk` — 51 s indoor sequence, Go2 drives repeatedly toward
walls and re-orients (96% forward-translation motion). τ-estimation is only
valid during forward translation, making this the appropriate dataset for
evaluation.

### Odom Gate

| Condition | Value | Rationale |
|-----------|-------|-----------|
| v\_fwd > 0.15 m/s | forward speed | robot must be translating toward something |
| \|ω\_yaw\| < 0.15 rad/s | yaw rate | eliminates rotation-contaminated frames |

**125 gated frames out of 390 total (33%).** Forward speed on gated frames:
p25=0.36 · p50=0.45 · p75=0.56 m/s.
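The gate reduces to a boolean mask over the odometry streams (a sketch using the thresholds from the table; array names are illustrative):

```python
import numpy as np

def odom_gate(v_fwd: np.ndarray, omega_yaw: np.ndarray) -> np.ndarray:
    """Frames valid for tau evaluation: translating forward, not rotating."""
    return (v_fwd > 0.15) & (np.abs(omega_yaw) < 0.15)
```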

**Ground truth:** LiDAR minimum forward distance in body frame (world frame
transformed using per-frame odom quaternion). Binary GT: `dist < 1.5 m`.

---

## Results

**Latency:** ~2.5 ms/frame — < 4% of 77 ms frame budget at 13 Hz.

### Correlation — Does τ Track Real TTC?

`lidar_ttc = min_fwd_dist / v_fwd` compared to optical flow `min_tau`.

| Metric | Value |
|--------|-------|
| Paired frames | 82 |
| Pearson r | +0.144 |
| Spearman r | +0.173 |
| RMSE | 3.31 s |
| MAE | 2.52 s |

Weak positive correlation — correct sign (smaller τ → closer obstacle), noisy
magnitude. Per-cell divergence mixes approach signal with parallax and tracker
drift. Use τ as a **ranked urgency signal**, not a calibrated timer.
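The metrics above can be recomputed along these lines (numpy-only sketch; Spearman would additionally need `scipy.stats.spearmanr`):

```python
import numpy as np

def ttc_agreement(min_fwd_dist, v_fwd, min_tau):
    """Pearson r, RMSE, MAE between LiDAR-derived TTC and optical-flow tau,
    over frames where both signals exist."""
    lidar_ttc = np.asarray(min_fwd_dist) / np.asarray(v_fwd)  # seconds
    tau = np.asarray(min_tau)
    r = float(np.corrcoef(lidar_ttc, tau)[0, 1])
    err = lidar_ttc - tau
    return r, float(np.sqrt(np.mean(err ** 2))), float(np.mean(np.abs(err)))
```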

### Binary Detection

GT positive rate: 84% (indoor walls are always nearby on forward frames).

| τ threshold | Purpose | Precision | Recall | F1 |
|-------------|---------|-----------|--------|----|
| **3.0 (default)** | **Production** | **0.939** | **0.295** | **0.449** |
| 5.8 | Max recall | 0.918 | 0.533 | 0.675 |

At τ=3.0, 93.9% of alarms are confirmed by LiDAR. Recall of ~30% reflects
the thresholded nature of the signal — many GT-close frames produce τ values
just above the decision boundary (noisy divergence) and are missed. Raising
the threshold to 5.8 lifts recall to 0.53 at a small precision cost.
Precision stays above 0.90 for all τ ≥ 1.5 across the operating range
(AUC-PR = 0.479; below the 0.84 baseline because 84% of gated frames are
GT-positive — precision at each recall level is the meaningful metric).
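A single point of the sweep behind this table is easy to recompute (sketch; `min_tau` and `gt_close` stand for the per-frame arrays):

```python
import numpy as np

def pr_at_threshold(min_tau, gt_close, thr):
    """Precision/recall of the rule `min_tau < thr` against binary LiDAR GT."""
    pred = np.asarray(min_tau) < thr
    gt = np.asarray(gt_close)
    tp = int(np.sum(pred & gt))
    precision = tp / max(int(np.sum(pred)), 1)
    recall = tp / max(int(np.sum(gt)), 1)
    return precision, recall
```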

---

## Known Limitations

| Limitation | Consequence | Mitigation |
|------------|-------------|------------|
| Stationary obstacles produce no flow | Blind to static walls at constant distance | LiDAR for static proximity |
| τ noisy (RMSE ≈ 3 s) | Not a precise collision timer | Ranked alarm; wire `tac_grid` to planner |
| Rotation contaminates signal | Global image shift = false divergence | Odom gate in eval; `angular_velocity` gate in module |
| Low-texture surfaces | Few keypoints → sparse grid coverage | Edge-enhancing preprocessing |

**Deployment guidance:** keep the default `tau_threshold = 3.0` for high
precision, or raise toward 5–6 for higher recall; retune from the PR sweep
on your own data rather than from a physical distance/speed formula. Wire
`angular_velocity` from IMU for automatic turn suppression. Use as supplement
to LiDAR — not a replacement.
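Wiring the backend from this PR into a playback loop might look like the following (the video path is a placeholder; the IMU rotation gate lives in the live module, not in the backend itself):

```python
import cv2

from dimos.perception.optical_flow.backends.lucas_kanade import LucasKanadeBackend

backend = LucasKanadeBackend(tau_threshold=3.0)  # default; raise toward 5-6 for recall
cap = cv2.VideoCapture("office_walk.mp4")        # placeholder frame source

while True:
    ok, frame = cap.read()
    if not ok:
        break
    result = backend.compute(frame)              # None until flow is available
    if result is not None and result["danger"]:
        # Treat as a ranked alarm; cross-check against LiDAR proximity.
        print(f"approach alarm: min_tau={result['min_tau']:.2f}")
```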

---

## Verdict

Optical flow τ-estimation is a viable **complement** to LiDAR — not a standalone
replacement. When the alarm fires it is correct 93.9% of the time (P=0.939 at
τ=3.0), but it only catches ~30% of danger frames (R=0.295). Low recall is
inherent: stationary obstacles produce no flow, and noisy divergence near the
threshold causes missed detections close to the decision boundary. Deploy
alongside LiDAR: LiDAR for static proximity, optical flow for fast-approaching
and dynamic obstacles. Real-time at ~2.5 ms/frame (< 4% of frame budget),
with a single empirically tuned parameter (`tau_threshold`).
107 changes: 107 additions & 0 deletions dimos/perception/optical_flow/backends/lucas_kanade.py
import cv2
import numpy as np
from typing import Optional

from dimos.perception.optical_flow.tac_estimator import divergence_to_tac


class LucasKanadeBackend:
"""
FAST keypoints + Lucas-Kanade sparse optical flow.
Computes a Time-to-Contact proxy (tau) via flow divergence in an NxN grid.
Based on Lee (1976): a closer/faster-approaching obstacle produces larger
image-expansion and thus smaller tau. Scale-free; no depth calibration.

Note: the tau value is monotonically related to physical TTC but is not
literally seconds — it depends on frame rate and cell geometry. Use
tau_threshold as an empirically tuned urgency parameter. The default 3.0
was chosen from the PR-curve sweep on unitree_office_walk (P=0.939 at the
production operating point).
"""

def __init__(self, max_keypoints=300, grid_size=5,
tau_threshold=3.0, refresh_interval=10,
fast_threshold=20):
self.max_keypoints = max_keypoints
self.grid_size = grid_size
self.tau_threshold = tau_threshold
self.refresh_interval = refresh_interval
self.lk_params = dict(
winSize=(21, 21),
maxLevel=3,
criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03),
)
self.fast = cv2.FastFeatureDetector_create(
threshold=fast_threshold, nonmaxSuppression=True
)
self.prev_gray: Optional[np.ndarray] = None
self.prev_pts: Optional[np.ndarray] = None
self.frame_count: int = 0

def compute(self, frame_bgr: np.ndarray) -> Optional[dict]: # type: ignore[type-arg]
"""
Args:
frame_bgr: HxWx3 uint8 BGR numpy array
Returns:
dict: {tac_grid, danger, flow_pts, min_tau}
None on first frame.
"""
gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)

if self.prev_gray is None or self.prev_pts is None or len(self.prev_pts) == 0:
self.prev_gray = gray
self._refresh(gray)
return None

curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(
self.prev_gray, gray, self.prev_pts, None, **self.lk_params
)

good_mask = status.ravel() == 1
good_prev = self.prev_pts[good_mask].reshape(-1, 2)
good_curr = curr_pts[good_mask].reshape(-1, 2)
flow_vecs = good_curr - good_prev

self.frame_count += 1
if self.frame_count % self.refresh_interval == 0 or len(good_prev) < 50:
self._refresh(gray)
else:
self.prev_pts = good_curr.reshape(-1, 1, 2)
self.prev_gray = gray

if len(good_prev) < 10:
return None

h, w = gray.shape
tac_grid = self._tac_grid(good_prev, flow_vecs, h, w)
min_tau = float(np.nanmin(tac_grid)) if not np.all(np.isnan(tac_grid)) else np.inf

return {
"tac_grid": tac_grid,
"danger": min_tau < self.tau_threshold,
"flow_pts": (good_prev, good_curr),
"min_tau": min_tau,
}

def _refresh(self, gray: np.ndarray) -> None:
kps = sorted(self.fast.detect(gray, None),
key=lambda k: k.response, reverse=True)[:self.max_keypoints]
self.prev_pts = (
np.array([[k.pt] for k in kps], dtype=np.float32) if kps else None
)

def _tac_grid(self, pts: np.ndarray, flows: np.ndarray, h: int, w: int) -> np.ndarray:
div_grid = np.full((self.grid_size, self.grid_size), np.nan, dtype=np.float32)
cell_h = h / self.grid_size
cell_w = w / self.grid_size
for i in range(self.grid_size):
for j in range(self.grid_size):
mask = (
(pts[:, 0] >= j * cell_w) & (pts[:, 0] < (j+1) * cell_w) &
(pts[:, 1] >= i * cell_h) & (pts[:, 1] < (i+1) * cell_h)
)
cf = flows[mask]
if len(cf) < 3:
continue
div_grid[i, j] = np.mean(cf[:, 0]) / cell_w + np.mean(cf[:, 1]) / cell_h
return divergence_to_tac(div_grid)
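`divergence_to_tac` comes from `tac_estimator` and is not part of this diff; a minimal version consistent with the rule in step 3 of RESULTS.md (τ = 1/div where div > 0, NaN otherwise) would be:

```python
import numpy as np

def divergence_to_tac(div_grid: np.ndarray) -> np.ndarray:
    """tau = 1/div where div > 0 (approaching); NaN where static or receding."""
    tac = np.full_like(div_grid, np.nan)
    positive = div_grid > 0          # NaN cells compare False and stay NaN
    tac[positive] = 1.0 / div_grid[positive]
    return tac
```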