Adds instructions for using pre-made locomanipulation SDG dataset/model #5357

Open
jaybdub wants to merge 1 commit into isaac-sim:develop from jaybdub:jwelsh/locomanipulation_sdg_huggingface_docs

@jaybdub jaybdub commented Apr 22, 2026

Description

Adds two tip callouts to the Demo 3 (G1 locomanipulation) section of the humanoids imitation learning documentation, giving users shortcuts to skip
expensive pipeline steps:

  1. Pre-made dataset (nvidia/g1_locomanip_dataset on Hugging Face): placed at the start of the SDG generation section, allowing users to skip manipulation
    dataset generation, SDG generation, and LeRobot conversion, and proceed directly to finetuning. Includes huggingface-cli download + unzip commands with the
    exact extracted path (g1_simple_high_var_lerobot/) and a note that policies trained on this dataset require --policy_quat_format wxyz at rollout.
  2. Pre-trained model (nvidia/g1_locomanip_finetune on Hugging Face): placed immediately before the rollout section, allowing users to skip finetuning
    entirely. Includes download + unzip commands with the exact checkpoint path (g1_locomanip_finetune_20260129_231610/checkpoint-20000) and the required
    --policy_quat_format wxyz flag.

Fixes # (issue)

Type of change

  • Documentation update

Screenshots

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@github-actions github-actions Bot added the documentation Improvements or additions to documentation label Apr 22, 2026

greptile-apps Bot commented Apr 22, 2026

Greptile Summary

This PR adds two .. tip:: callouts to the G1 locomanipulation section of the humanoids imitation learning docs, offering shortcuts to skip expensive pipeline steps: one to download a pre-made LeRobot-format dataset from Hugging Face and one to download a pre-trained GR00T N1.5 checkpoint. Both tips include huggingface-cli download and unzip commands, extracted paths, and a reminder to pass --policy_quat_format wxyz at rollout time.

Confidence Score: 5/5

Documentation-only change; safe to merge with minor P2 clarity suggestions.

All findings are P2 (style/clarity) — no logic errors, no broken commands on the happy path, no security concerns. The glob unzip and hardcoded path are cosmetic robustness notes that do not block correctness for the stated use case.

No files require special attention; all changes are in the RST documentation file.

Important Files Changed

Filename: docs/source/overview/imitation-learning/humanoids_imitation.rst
Overview: Adds two tip callouts to the G1 locomanipulation section providing shortcuts via a pre-made Hugging Face dataset and checkpoint; minor clarity and robustness issues with section-skip scope, glob-based unzip, and hardcoded version-stamped paths.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Generate manipulation dataset] -->|Normal path| B[SDG: Generate locomanipulation dataset]
    A -->|Tip 1: Download pre-made dataset| D
    B -->|Normal path| C[Convert dataset to LeRobot format]
    C -->|Normal path| D[Finetune GR00T N1.5 policy]
    D -->|Normal path| E[Rollout policy in Isaac Lab]
    D -->|Tip 2: Download pre-trained checkpoint| E
    style A fill:#f9f,stroke:#333
    style B fill:#f9f,stroke:#333
    style C fill:#f9f,stroke:#333
    style D fill:#bbf,stroke:#333
    style E fill:#bfb,stroke:#333

Reviews (1): Last reviewed commit: "added instructions for using pre-made lo..."

Comment on lines +544 to +545
Downloading it lets you skip this section and the dataset conversion step, proceeding directly to
**Finetune the policy** below.

P2 Tip omits skipping the preceding manipulation dataset section

The pre-made locomanipulation dataset in LeRobot format bundles everything (manipulation data, SDG, and conversion), so a user following the tip also needs to skip the manipulation dataset generation section that comes before this one. The current text only mentions skipping "this section and the dataset conversion step," which may leave readers confused about whether they still need to run the earlier manipulation generation commands.

Comment on lines +551 to +552
huggingface-cli download nvidia/g1_locomanip_dataset --repo-type dataset --local-dir ./datasets/g1_locomanip_hf
unzip ./datasets/g1_locomanip_hf/*.zip -d ./datasets/

P2 Glob-based unzip may fail silently or unzip unexpected files

unzip ./datasets/g1_locomanip_hf/*.zip -d ./datasets/ relies on shell glob expansion. If the downloaded directory contains zero zip files, the shell will pass the literal string *.zip to unzip, resulting in an error. If it contains more than one zip file (e.g., after a future dataset update), the expansion hands unzip several paths at once; Info-ZIP unzip treats only the first as the archive and the remaining ones as member names to extract, so the command is unlikely to do what the user expects. Consider naming the expected zip file explicitly, or adding a note that exactly one zip file is expected.
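
One possible way to address this, sketched in POSIX shell: resolve the archive first and refuse to continue unless exactly one match exists. The function name `find_single_zip` is hypothetical and not part of the PR; the directory paths are the ones quoted in the command above.

```shell
# Hypothetical helper (not part of the PR): print the path of the only
# .zip file in a directory, or fail if there are zero or several.
find_single_zip() {
    src_dir="$1"
    count=0
    archive=""
    for f in "$src_dir"/*.zip; do
        # When nothing matches, the pattern stays literal and is not a file.
        [ -f "$f" ] || continue
        count=$((count + 1))
        archive="$f"
    done
    if [ "$count" -ne 1 ]; then
        echo "Expected exactly one .zip in $src_dir, found $count" >&2
        return 1
    fi
    printf '%s\n' "$archive"
}
```

With such a helper, the extraction step could read `archive=$(find_single_zip ./datasets/g1_locomanip_hf) && unzip "$archive" -d ./datasets/`, so unzip only runs when the archive is unambiguous.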

Comment on lines +703 to +705
The archive extracts to ``./checkpoints/g1_locomanip_finetune_20260129_231610/``.
Use ``./checkpoints/g1_locomanip_finetune_20260129_231610/checkpoint-20000`` as the ``--model_path``
in the rollout command below. This checkpoint requires ``--policy_quat_format wxyz``.

P2 Hardcoded version-stamped extraction path may become stale

The extracted path g1_locomanip_finetune_20260129_231610/ and checkpoint subdirectory checkpoint-20000 are hardcoded. If the Hugging Face artifact is ever updated or re-uploaded with a different name, these instructions will silently point users to a path that no longer exists. Consider advising users to verify the extracted folder name (e.g., ls ./checkpoints/) or note that the date-stamped name corresponds to a specific release.
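
A minimal sketch of the suggested verification step, assuming the layout quoted above (`./checkpoints/<run folder>/checkpoint-<step>`). The function name `resolve_model_path` is hypothetical; it only checks that the expected directory exists and, if it does not, lists what was actually extracted.

```shell
# Hypothetical helper (not part of the PR): confirm the documented
# checkpoint directory exists before passing it as --model_path.
resolve_model_path() {
    root="$1"   # e.g. ./checkpoints
    rel="$2"    # e.g. g1_locomanip_finetune_20260129_231610/checkpoint-20000
    if [ -d "$root/$rel" ]; then
        printf '%s\n' "$root/$rel"
        return 0
    fi
    echo "Expected checkpoint not found: $root/$rel" >&2
    echo "Directories currently under $root:" >&2
    # Send the listing to stderr; hide ls's own error when nothing matches.
    ls -d "$root"/*/ >&2 2>/dev/null || echo "  (none)" >&2
    return 1
}
```

For example, `MODEL_PATH=$(resolve_model_path ./checkpoints g1_locomanip_finetune_20260129_231610/checkpoint-20000)` would either yield the documented path or show which folder names the archive actually produced.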

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

🤖 Isaac Lab Review Bot

Summary

This PR adds two helpful "tip" callouts to the Demo 3 (G1 locomanipulation) section of the humanoids imitation learning documentation, providing shortcuts to download pre-made datasets and pre-trained checkpoints from Hugging Face. The documentation changes are well-structured and follow the existing formatting conventions.

Architecture Impact

No cross-module impact — changes are documentation-only in humanoids_imitation.rst.

Implementation Verdict

Minor fixes needed — The documentation content is correct and helpful, but CI is failing due to broken Hugging Face links.

Test Coverage

Not applicable — this is a documentation-only PR. The checklist correctly indicates that tests are not required for documentation changes.

CI Status

Check for Broken Links — FAILED

  • https://huggingface.co/datasets/nvidia/g1_locomanip_dataset → 401 Unauthorized
  • https://huggingface.co/nvidia/g1_locomanip_finetune → 401 Unauthorized

The Hugging Face resources return 401 (Unauthorized), indicating they are either:

  1. Not yet publicly available (gated/private)
  2. The URLs are incorrect

pre-commit — PASSED

⚠️ Build Latest Docs — FAILED (pre-existing unrelated issues with isaaclab_mimic/isaaclab_rl module imports, not caused by this PR)

Branch Status

Branch is 1 commit behind develop — minor divergence, no conflicts expected.

Findings

🔴 Critical: Broken Hugging Face Links
The two Hugging Face URLs added in this PR are returning 401 Unauthorized:

  • nvidia/g1_locomanip_dataset (dataset)
  • nvidia/g1_locomanip_finetune (model)

Action required: Either:

  1. Make the Hugging Face resources publicly accessible before merging
  2. Add these URLs to the link checker exclusion list in .github/workflows/ if they are intentionally gated resources that require authentication
  3. Verify the URLs are correct and the resources exist

🔵 Improvement: Consider adding a note about authentication
If these Hugging Face resources require authentication (e.g., accepting terms of use), consider adding a note in the tip boxes mentioning that users may need to log in via huggingface-cli login before downloading.
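
As a rough illustration of how a reader (or a CI step) could tell these cases apart before downloading, the sketch below fetches a URL's HTTP status with curl and maps it to a coarse category. Both function names are hypothetical, and the category labels are an assumption made for this example, not something the link checker in this repository produces.

```shell
# Hypothetical helpers (not part of the PR or the CI workflow).

# Print the HTTP status code for a URL. Requires curl and network access.
http_status_of() {
    curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Map a status code to a coarse category for the docs reader.
classify_http_status() {
    case "$1" in
        200)     echo "public" ;;
        401|403) echo "auth-required" ;;  # may need huggingface-cli login first
        404)     echo "not-found" ;;
        *)       echo "other" ;;
    esac
}
```

A call such as `classify_http_status "$(http_status_of https://huggingface.co/datasets/nvidia/g1_locomanip_dataset)"` would print `auth-required` for the 401 responses reported above, which is the case where a `huggingface-cli login` note in the tip would help.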

To create a comprehensive locomanipulation dataset that combines both manipulation and navigation capabilities, you can generate a navigation dataset using the manipulation dataset from the previous step as input.

.. tip::

The Hugging Face URL https://huggingface.co/datasets/nvidia/g1_locomanip_dataset returns 401 Unauthorized. Please verify:

  1. The resource exists and the URL is correct
  2. The dataset is publicly accessible (or add to CI link checker exclusion list if intentionally gated)

Indeed, the link seems to be returning 404.

Thanks for confirming! The dataset at nvidia/g1_locomanip_dataset is currently returning 401 Unauthorized (gated/private).

This should be resolved before merge since the docs reference it as a publicly available pre-made dataset. Options:

  1. Make the HuggingFace dataset public
  2. If intentionally gated, update the docs to mention authentication requirements
  3. Add to link checker exclusion list if this is temporary

@jaybdub - can you confirm the intended access level for this dataset?

.. tip::

**Skip finetuning:** A pre-trained GR00T N1.5 checkpoint for this task is available on
Hugging Face at `nvidia/g1_locomanip_finetune <https://huggingface.co/nvidia/g1_locomanip_finetune>`__.

The Hugging Face URL https://huggingface.co/nvidia/g1_locomanip_finetune returns 401 Unauthorized. Please verify the model checkpoint is publicly accessible before merging.

@jaybdub jaybdub requested a review from peterd-NV April 22, 2026 19:56
@kellyguo11 kellyguo11 changed the title added instructions for using pre-made locomanipulation SDG dataset/model Adds instructions for using pre-made locomanipulation SDG dataset/model Apr 22, 2026