Skip to content

fix(cli): catch corrupted credentials file at import time#282

Closed
deanq wants to merge 2 commits intomainfrom
deanq/ae-2576-bug-corrupted-credentials-file-crash
Closed

fix(cli): catch corrupted credentials file at import time#282
deanq wants to merge 2 commits intomainfrom
deanq/ae-2576-bug-corrupted-credentials-file-crash

Conversation

@deanq
Copy link
Copy Markdown
Member

@deanq deanq commented Mar 20, 2026

Related to runpod/runpod-python#481

Summary

  • New thin CLI entrypoint (entrypoint.py) wraps the import of runpod_flash.cli.main in a try/except to catch TOMLDecodeError from the runpod package's import-time credential read.
  • Surfaces a clean error message suggesting flash login instead of a raw traceback.
  • Console script updated in pyproject.toml: flash = "runpod_flash.cli.entrypoint:main"

Root cause: runpod/__init__.py calls get_credentials() -> toml.load() at import time. Flash's existing handler in credentials.py never runs because the crash happens during import runpod. This is defense-in-depth alongside the root cause fix in runpod-python.

Fixes: AE-2576
Depends on: runpod/runpod-python#478 (root cause fix)
Related: AE-2485 (fixed Flash's own read path)

Test plan

  • test_normal_import_runs_app -- happy path invokes Typer app
  • test_corrupted_toml_shows_clean_error -- TOMLDecodeError gives clean stderr + exit 1
  • test_non_toml_value_error_propagates -- unrelated ValueError still raises
  • All 2538 existing tests pass, 85.83% coverage

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a defensive CLI entrypoint wrapper so flash can fail gracefully when runpod crashes at import-time due to a corrupted ~/.runpod/config.toml, instead of showing a raw traceback.

Changes:

  • Introduce runpod_flash.cli.entrypoint:main to wrap importing the Typer app and catch TOML decode failures.
  • Update the console script target from runpod_flash.cli.main:app to the new wrapper entrypoint.
  • Add unit tests covering the happy path, TOML decode handling, and non-TOML ValueError propagation.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
src/runpod_flash/cli/entrypoint.py New thin entrypoint that catches import-time TOML decode errors and prints a clean remediation message.
pyproject.toml Points flash console script to the new entrypoint wrapper.
tests/unit/cli/test_entrypoint.py New tests validating wrapper behavior for success, TOML decode, and unrelated errors.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

@runpod-Henrik runpod-Henrik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1. Approach — correct, live-verified

The root cause (runpod/__init__.py calling toml.load() at import time before any Flash error handling can run) requires a wrapper outside the normal import chain. The thin entrypoint that catches during from runpod_flash.cli.main import app is the right level to intercept this. The PR description correctly frames this as defense-in-depth alongside the runpod-python root cause fix.

Live-verified on PR branch: corrupted config.toml produces a clean error and exits 1. Normal operation unaffected.

2. Bug: parse error location dropped — user can't identify or fix the corruption

Error: ~/.runpod/config.toml is corrupted and cannot be parsed.
Run 'flash login' to re-authenticate, or delete the file and retry.

The TOMLDecodeError includes the parse location (e.g. "Expected '=' after a key in a key/value pair (at line 1, column 6)"). The current message drops it. A user whose config is partially corrupted (e.g. a single bad line after a manual edit) can't identify or repair the file — they're told to delete it or re-authenticate regardless. Including str(exc) would help:

print(
    f"Error: ~/.runpod/config.toml is corrupted and cannot be parsed: {exc}\n"
    "Run 'flash login' to re-authenticate, or delete the file and retry.",
    file=sys.stderr,
)

3. Question: detection logic — implicit tomllib coverage

The check is:

exc_module.startswith("toml")
or exc_module.startswith("tomli")
or exc_module.startswith("tomllib")

"tomllib".startswith("toml") and "tomli".startswith("toml") are both True, so the last two branches are unreachable — startswith("toml") covers all three. Either collapse to one check, or keep all three explicitly with a comment.

Nits

  • raise SystemExit(1) from None suppresses the exception chain entirely. If a user runs flash with --debug or in a context where tracebacks are useful, the chain is gone. Acceptable for a user-facing CLI, but worth noting.
  • No test for ImportError (missing runpod package) — correctly propagates uncaught, but a comment in the source noting which error classes are intentionally not caught would help future readers.

Verdict: PASS WITH NITS

The fix works. Item 2 is the main nit worth addressing before merge — the parse location in the error message is the difference between "delete everything and re-login" and "fix line 3 of your config file".

🤖 Reviewed by Henrik's AI-Powered Bug Finder

@promptless
Copy link
Copy Markdown

promptless bot commented Mar 23, 2026

📝 Documentation updates detected!

Updated existing suggestion: Add troubleshooting for corrupted credentials file


Tip: Tell your friends working on non-commercial open-source projects to apply for free Promptless access at promptless.ai/oss ❤️

Copy link
Copy Markdown
Contributor

@KAJdev KAJdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like smell to me, this should be caught in the credential loader function rather than as a shim like this.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not implement this at the loader stage rather than having a shim that uses odd introspection to catch errors

deanq added 2 commits March 23, 2026 14:31
The runpod package reads ~/.runpod/config.toml at import time before
Flash error handling runs. Wrap the CLI entrypoint import in a
try/except to surface a clean error instead of a raw TOMLDecodeError
traceback.
Address review feedback:
- Match on class name (TOMLDecodeError) and module prefix instead of
  string-matching exception messages to avoid false positives
- Update comments to reference all TOML library variants
@deanq deanq force-pushed the deanq/ae-2576-bug-corrupted-credentials-file-crash branch from f2d533c to ebb80fc Compare March 23, 2026 21:32
Copy link
Copy Markdown
Contributor

@runpod-Henrik runpod-Henrik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delta review — since 2026-03-21T01:11:50Z

Addressed

Finding #3 (Question — redundant startswith / false positive risk) — The "tighten" commit adds exc_type.__name__ == "TOMLDecodeError" as a required condition alongside the module check. This closes the false positive risk: a plain ValueError("Invalid TOML") from unrelated code no longer matches. The redundant startswith("tomli") / startswith("tomllib") branches are still unreachable when startswith("toml") is present, but this is cosmetic now that the class name check does the real guard.


Still open

Finding #2 — parse error location dropped

The error message still discards str(exc). A user who manually edited config.toml and introduced a syntax error on one line is told to delete the file and re-login, when the TOML library would have told them exactly where the problem is. One-line fix:

print(
    f"Error: ~/.runpod/config.toml is corrupted and cannot be parsed: {exc}\n"
    "Run 'flash login' to re-authenticate, or delete the file and retry.",
    file=sys.stderr,
)

The test already uses "Invalid value at line 1 col 9" as the exception message — adding an assertion that str(exc) appears in captured.err would lock this in.


Nits (unchanged)

  • raise SystemExit(1) from None suppresses the chain entirely — acceptable for a user-facing CLI but means --debug or crash reporters see nothing.
  • No test for ImportError (missing runpod package). Correctly propagates uncaught; worth a source comment noting the intentional omission.

Verdict

PASS WITH NITS — tightened detection is good. Finding #2 (parse location) is the only remaining nit worth addressing before merge.

🤖 Reviewed by Henrik's AI-Powered Bug Finder

@deanq
Copy link
Copy Markdown
Member Author

deanq commented Mar 24, 2026

Closing — the root cause is fixed at the credential loader in runpod-python#481, which handles corrupted config.toml in get_credentials/set_credentials directly. The entrypoint shim approach is no longer needed.

@deanq deanq closed this Mar 24, 2026
@deanq deanq deleted the deanq/ae-2576-bug-corrupted-credentials-file-crash branch March 24, 2026 02:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants