Fix/srun container flags bare metal #3

Open
v-shobhit wants to merge 2 commits into NVIDIA:main from v-shobhit:fix/srun-container-flags-bare-metal
@v-shobhit

fix(srun): only emit Pyxis container flags when a container is in use

--no-container-mount-home and --container-writable were unconditionally
added to every srun command regardless of whether container_image or
container_name was set. This caused bare-metal operators (e.g. a plain
host_runner with no container) to get spurious Pyxis flags that
activated a container namespace without an image, restricting the
process environment and causing Go runtimes to hit the ulimit -u
(max user processes) limit, with pthread_create failing with errno=11
(EAGAIN).

Gate all Pyxis flags on _using_container (container_image or
container_name is set). Bare-metal srun steps are now unaffected.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
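
The gating described above can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: the function name `build_srun_args` and its parameters are hypothetical, and only the flag names (`--container-image`, `--container-name`, `--no-container-mount-home`, `--container-writable`) are taken from Pyxis.

```python
from typing import List, Optional


def build_srun_args(
    command: List[str],
    container_image: Optional[str] = None,
    container_name: Optional[str] = None,
) -> List[str]:
    """Build an srun argv list (hypothetical helper, for illustration only)."""
    args = ["srun"]

    # Mirrors the _using_container idea: true only when an image or a
    # named container is configured.
    using_container = bool(container_image or container_name)

    if using_container:
        if container_image:
            args.append(f"--container-image={container_image}")
        if container_name:
            args.append(f"--container-name={container_name}")
        # Pyxis-only flags are emitted strictly inside the container branch.
        args += ["--no-container-mount-home", "--container-writable"]

    return args + list(command)


# Bare-metal step: no Pyxis flags at all.
print(build_srun_args(["hostname"]))
# Containerised step: Pyxis flags are present.
print(build_srun_args(["hostname"], container_image="ubuntu:22.04"))
```

Keeping every Pyxis flag inside one branch means a bare-metal step can never pick up a container-related flag by accident.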

v-shobhit and others added 2 commits April 9, 2026 11:59
When a Jinja2 conditional in extra_args evaluates to an empty string
(e.g. `${{ '--qos=' + variables.SLURM_QOS if variables.SLURM_QOS else '' }}`),
the resolved empty string was previously passed verbatim to salloc,
causing it to fail with an unrecognised argument.

Strip and skip empty-string items after resolution so that optional
salloc flags can be expressed as conditional template expressions.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
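
A minimal sketch of the strip-and-skip step described in this commit. The names `qos_flag` and `clean_resolved_args` are hypothetical, and `qos_flag` is a plain-Python stand-in for the Jinja2 conditional rather than the project's actual template resolver. Returning the stripped value, rather than the original string, is also an assumption.

```python
from typing import Dict, Iterable, List


def qos_flag(variables: Dict[str, str]) -> str:
    """Stand-in for the template expression: '--qos=<value>' or ''."""
    qos = variables.get("SLURM_QOS")
    return "--qos=" + qos if qos else ""


def clean_resolved_args(resolved: Iterable[str]) -> List[str]:
    """Strip each resolved item and drop those that end up empty."""
    cleaned = []
    for item in resolved:
        value = item.strip()
        if value:  # skip '' and whitespace-only results
            cleaned.append(value)
    return cleaned


# With SLURM_QOS unset, the conditional resolves to '' and is dropped,
# so salloc never sees an empty argument.
resolved = ["--nodes=2", qos_flag({})]
print(["salloc"] + clean_resolved_args(resolved))

# With SLURM_QOS set, the flag is passed through.
resolved = ["--nodes=2", qos_flag({"SLURM_QOS": "high"})]
print(["salloc"] + clean_resolved_args(resolved))
```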