Question: Is the current uv.lock combination expected to work for compute_norm_stats_sim.py? #3

@zhang-runhao


Summary

I'd like to confirm whether the dependency versions currently pinned in this repository are expected to work together as-is.

From my run, there appears to be an API mismatch:

  • uv.lock pins lerobot to git rev 0cf864870cf29f4738d3ade893e6fd13fbd7cdb5 (version 0.1.0)
  • src/openpi/training/mixture_dataset.py calls LeRobotDataset(..., load_video=False)

In the pinned lerobot revision 0cf864..., LeRobotDataset.__init__ does not appear to accept load_video; it takes download_videos instead.

As a result, the norm-stats script fails during dataset construction with:

TypeError: LeRobotDataset.__init__() got an unexpected keyword argument 'load_video'

Separately, scripts/compute_norm_stats_sim.py currently uses a bare except: that prints only the path, which hides the traceback unless the script is changed locally.
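For local debugging, one way to keep the per-path loop running while still surfacing the root cause is to catch `Exception` (not everything) and print the formatted traceback. This is a minimal sketch; `process_all` and `process_one` are hypothetical names, not functions from compute_norm_stats_sim.py.

```python
import traceback
from pathlib import Path
from typing import Callable, Iterable


def process_all(
    paths: Iterable[Path], process_one: Callable[[Path], object]
) -> list[tuple[Path, str]]:
    """Run process_one on each path, collecting failures instead of hiding them.

    Unlike a bare `except:`, catching Exception lets KeyboardInterrupt and
    SystemExit propagate, and the full traceback is kept for every failure.
    """
    failures: list[tuple[Path, str]] = []
    for path in paths:
        try:
            process_one(path)
        except Exception:
            tb = traceback.format_exc()  # full traceback of the current exception
            print(f"[skip] {path}\n{tb}")
            failures.append((path, tb))
    return failures
```

Returning the failures (rather than only printing) also makes it easy to fail the run at the end if any dataset could not be processed.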

Evidence

From this repo (policy/openpi-InternData-A1/uv.lock):

  • name = "lerobot" at line ~2002
  • git source contains rev 0cf864870cf29f4738d3ade893e6fd13fbd7cdb5
  • name = "datasets" version 3.6.0 at line ~603
  • name = "pyarrow" version 20.0.0 at line ~3638

From this repo (policy/openpi-InternData-A1/src/openpi/training/mixture_dataset.py):

  • load_video=False is passed to LeRobotDataset (around line 672)

From the pinned upstream lerobot revision (0cf864...):

  • LeRobotDataset.__init__(..., download_videos=True, ...)
  • no load_video parameter

From this repo (policy/openpi-InternData-A1/scripts/compute_norm_stats_sim.py):

  • bare except: in two loops (around lines 294 and 313)

Reproduction

  1. Create/sync the environment from this repository's lock file (policy/openpi-InternData-A1/uv.lock).
  2. Run the norm-stats script (scripts/compute_norm_stats_sim.py) with a valid dataset root.
  3. Observe the crash at the dataset construction stage.
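Before step 2, it can help to confirm that the synced environment really matches the lock. A minimal sketch using `importlib.metadata`; `version_mismatches` is a hypothetical helper, and the expected versions are the ones quoted under Evidence. Exact string comparison is assumed to be enough for exact pins, and a version check cannot confirm the lerobot git rev (0.1.0 does not identify the commit).

```python
from importlib import metadata
from typing import Callable, Optional


def version_mismatches(
    expected: dict[str, str],
    lookup: Callable[[str], str] = metadata.version,
) -> dict[str, tuple[str, Optional[str]]]:
    """Return {name: (expected, installed)} for every package that differs.

    installed is None when the distribution is not installed at all.
    """
    out: dict[str, tuple[str, Optional[str]]] = {}
    for name, want in expected.items():
        try:
            have: Optional[str] = lookup(name)
        except metadata.PackageNotFoundError:
            have = None
        if have != want:
            out[name] = (want, have)
    return out


# Pins as reported under Evidence; an empty result means the environment matches.
print(version_mismatches({"datasets": "3.6.0", "pyarrow": "20.0.0", "lerobot": "0.1.0"}))
```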

Troubleshooting timeline (what was tried and what happened)

  1. The initial run showed 0it and did no effective processing.
  2. After correcting the invocation/path usage, the script started iterating but still did not expose the root cause, because compute_norm_stats_sim.py had a bare except:.
  3. After making the traceback visible locally, the first concrete failure was:
     TypeError: LeRobotDataset.__init__() got an unexpected keyword argument 'load_video'
  4. After bypassing that argument mismatch locally for debugging, the next failure was:
     TypeError: stack(): argument 'tensors' (position 1) must be tuple of Tensors, not Column
  5. The environment was then aligned back to the repository's original lock combination (datasets==3.6.0, pyarrow==20.0.0) to compare behavior, and a new schema-level error appeared:
     ValueError: Feature type 'List' not found. Available feature types: ['Value', 'ClassLabel', 'Translation', 'TranslationVariableLanguages', 'LargeList', 'Sequence', 'Array2D', 'Array3D', 'Array4D', 'Array5D', 'Audio', 'Image', 'Video', 'Pdf', 'VideoFrame']

This made it unclear whether the current dataset schema, script assumptions, and pinned dependency set are expected to be mutually compatible.
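One possible reading of the last two errors, not confirmed here: `List` and the lazily evaluated `Column` object are, to my understanding, both `datasets` 4.x additions, so parquet files written under 4.x may declare `List` features that 3.6.0 cannot parse. A quick way to test that hypothesis is to list the feature types declared in a data file's embedded `huggingface` schema metadata. This sketch assumes the LeRobot parquet files carry that metadata key (as files written by `datasets` typically do); both function names are hypothetical.

```python
import json
from typing import Any, Iterator


def feature_types(node: Any) -> Iterator[str]:
    """Yield every "_type" value found in a nested `datasets` features dict."""
    if isinstance(node, dict):
        t = node.get("_type")
        if isinstance(t, str):
            yield t
        for value in node.values():
            yield from feature_types(value)
    elif isinstance(node, list):
        for item in node:
            yield from feature_types(item)


def parquet_feature_types(path: str) -> set[str]:
    """Feature types declared in a parquet file's "huggingface" schema metadata, if any."""
    import pyarrow.parquet as pq  # imported lazily so feature_types works without pyarrow

    meta = pq.read_schema(path).metadata or {}
    raw = meta.get(b"huggingface")
    if raw is None:
        return set()
    return set(feature_types(json.loads(raw)))
```

If `parquet_feature_types` reports `"List"` for the dataset's parquet files, that would be consistent with a schema-version mismatch rather than a script misuse.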

Questions

  1. Is load_video=False expected to be valid with the currently pinned lerobot revision in uv.lock?
  2. If so, am I using the script incorrectly, or is there an environment/version assumption that is not documented yet?
  3. Is the current behavior of swallowing exceptions in compute_norm_stats_sim.py intentional?
  4. For the same data, is ValueError: Feature type 'List' not found ... expected under the repository's original dependency stack, or does it indicate a known schema-version mismatch?

Metadata

  • Assignees: No one assigned
  • Labels: No labels
  • Type: No type
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests