fix(timeout): guard signal.alarm() against None; centralize auth; fix security issues#111
Closed
srinivamd wants to merge 5 commits intoROCm:coketaste/v2-review-fixfrom
Closed
Conversation
When --timeout 0 is passed, run.py converts it to None as a sentinel for "no timeout". Timeout.__enter__ passed self.seconds directly to signal.alarm() which requires an int, causing TypeError. Add early-return guards in __enter__ and __exit__ so None (and 0) are treated as "no timeout" without calling signal.alarm(). Also remove unused `import typing`.
…as; rename TimeoutError->DeploymentTimeoutError; remove unused imports - Copilot review comment ROCm#1: ConnectionError rename is a breaking API change; add `ConnectionError = NetworkError` deprecated alias so existing callers still work for one release cycle - Remove unused `import traceback` and `from rich.table import Table` - Rename TimeoutError -> DeploymentTimeoutError to avoid shadowing Python built-in and avoid confusion with madengine.core.timeout - Remove RuntimeError = ExecutionError alias (shadowed Python built-in)
…e() for shell args - Copilot review comment ROCm#3: Docker --filter name=^/...$ treats the value as a Go regex; shlex.quote() only provides shell quoting, not regex safety. Use re.escape() on the container name for the filter value, shlex.quote() for shell-level quoting of stop/rm args. - Fix envVars: wrap values with shlex.quote(str(...)) to prevent shell injection from env var values containing spaces or special chars - Fix docker exec: wrap command with shlex.quote() instead of embedding it inside a double-quoted string (avoids escaping issues) - Add `import re` and `import shlex`
Extracts duplicated credential loading and Docker registry login logic from BuildOrchestrator and RunOrchestrator into a shared core/auth.py module (~120 lines of duplicated login logic removed). Security improvements over PR ROCm#106: - Registry password passed via MAD_REGISTRY_PASSWORD env var instead of argv (avoids password leaking in process list / shell history) - docker ps uses --filter name=^/$ with re.escape() to avoid regex metachar false positives (see docker.py fix) - build-arg values wrapped with str() before shlex.quote() Copilot review comment ROCm#2 fix: login_to_registry typed as Optional[str] (was str) since callers pass None for DockerHub and tests call with None.
…play - Copilot review comment ROCm#4: timeout option declared as int but assigned None after conversion; add elif timeout == 0: timeout = None sentinel so "no timeout" is expressed as None (not a magic int) - Fix validation message to mention 0 is valid (--timeout 0 = no timeout) - Fix 2 panel displays: show "Disabled" when timeout is None instead of "None s" which was confusing to users
Collaborator
|
We can go with #108 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes ROCM-23419: MAD-internal models fail with
TypeError: 'NoneType' object cannot be interpreted as an integerinTimeout.__enter__when--timeout 0is passed.This PR incorporates all changes from PR #108 (
refactor(v2-review)) and additionally resolves all 4 unresolved Copilot review comments from that PR before merging tomain.Root Cause
PR #106 (merged 2026-04-20) introduced
Noneas a sentinel for "no timeout" incli/commands/run.py:But
core/timeout.pypassedself.secondsdirectly tosignal.alarm(), which requires a non-negative integer — causing an immediateTypeErrorfor any model using--timeout 0or the no-timeout default.PR #107 reverted PR #106 as a hotfix. This PR re-introduces the auth centralization from PR #106 with the timeout guard and all review comments addressed.
Changes
core/timeout.py— ROCM-23419 core fixAdded early-return guards in
__enter__and__exit__forNone(and0), preventingsignal.alarm()from receiving a non-integer:core/errors.py— Copilot comment #1: avoid breaking API changeRenamed
ConnectionError→NetworkErrorbut preserved a deprecated backward-compatibility alias so downstream callers are not broken in this release:Also renamed
TimeoutError→DeploymentTimeoutErrorto avoid shadowing the Python built-in.core/auth.py— Copilot comment #2: correctOptional[str]typing (new file)New centralized credential and registry-login module extracted from 4 duplicated locations.
login_to_registrynow correctly typed asregistry: Optional[str](wasstrin PR #108, but call sites passNonefor unauthenticated pulls). Registry password passed viaMAD_REGISTRY_PASSWORDenv var instead of argv to avoid credential exposure inpsoutput.core/docker.py— Copilot comment #3:re.escape()for Docker--filter name=Docker's
--filter name=interprets the value as a Go regex — container names with dots or other metacharacters produced false positives. Addedre.escape()for the regex portion (separate fromshlex.quote()which only escapes for shell argv):Also applied
shlex.quote()consistently fordocker stop/rmargs and env var values.cli/commands/run.py— Copilot comment #4:Optional[int]+ display "Disabled"timeoutoption now converts toNonefor no-timeout (documented inline)Disabledinstead ofNone s:f"Timeout: [yellow]{f'{timeout}s' if timeout is not None else 'Disabled'}[/yellow]"0is validTesting
Related
Nonesentinel without guardmain