Problem Statement
On macOS, mise run gateway fails while cross-compiling the openshell-sandbox supervisor binary. The reported error is "error: unable to search for static library …/libwant-*.rlib: ProcessFdQuotaExceeded". The static *-unknown-linux-musl link opens ~333 .rlib files at once, which exceeds macOS's default per-process open-file limit of 256 (ulimit -n). This blocks the most common local dev path (the docker driver, and the podman driver too) for any macOS contributor whose shell uses the OS default limit. The VM driver already guards against this; the docker/podman paths do not.
Technical Context
The supervisor image (deploy/docker/Dockerfile.supervisor) is FROM scratch and only COPYs a prebuilt openshell-sandbox binary — it never compiles Rust. So the binary must be cross-compiled on the host and staged into the build context first. On macOS that host build uses cargo zigbuild targeting *-unknown-linux-musl. That link step opens every dependency rlib simultaneously; with 333 rlibs and a soft limit of 256, the linker gets EMFILE, surfaced by zig as ProcessFdQuotaExceeded.
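To see how close a given build is to that limit, the shell's soft limit can be compared with the number of rlibs in the dependency directory. The snippet below is only an illustrative sketch: fd_headroom is a hypothetical helper that does not exist in the repo, and the deps path in the usage comment is an assumption about the usual cargo target layout.

```shell
# Hypothetical diagnostic (not part of the repo): compare the soft open-file
# limit with the number of .rlib files directly under a cargo deps directory.
fd_headroom() {
  local deps_dir="$1" soft rlibs
  soft=$(ulimit -S -n)
  # wc pads its output on macOS, so strip the spaces.
  rlibs=$(find "$deps_dir" -maxdepth 1 -type f -name '*.rlib' | wc -l | tr -d ' ')
  echo "soft_limit=$soft rlibs=$rlibs"
  # An unlimited soft limit cannot be exceeded by the rlib count.
  [ "$soft" = "unlimited" ] && return 0
  # Non-zero exit when the link may need more fds than the soft limit allows.
  [ "$rlibs" -lt "$soft" ]
}

# Usage (path is an assumption, adjust to the actual target triple/profile):
#   fd_headroom target/aarch64-unknown-linux-musl/release/deps \
#     || echo "raise ulimit -n before linking"
```

The check is deliberately conservative: the linker also holds other descriptors open, so a passing result with little headroom is not a guarantee.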
The project already knows about this failure mode: tasks/scripts/vm/build-supervisor-bundle.sh defines ensure_build_nofile_limit() which raises ulimit -n (to OPENSHELL_VM_BUILD_NOFILE_LIMIT, default 8192, min 1024) before its cargo zigbuild. But that guard exists only on the VM driver path. Every docker/podman host-staging path lacks it.
Separately, the comment at tasks/scripts/gateway-docker.sh:114-117 claims the cross-compile "happens inside Linux containers — sidestepping macOS's per-process file-descriptor cap." That is stale/incorrect and directly contradicts architecture/build.md:58-60 ("Neither Dockerfile compiles Rust — both copy a staged binary"). The compile happens on the host.
Affected Components
| Component | Key Files | Role |
|---|---|---|
| Host binary staging (chokepoint) | tasks/scripts/stage-prebuilt-binaries.sh, tasks/scripts/docker-build-image.sh | Cross-compiles openshell-sandbox/openshell-gateway on the host via cargo zigbuild and stages them into the Docker build context. Single point all docker/podman builds funnel through. |
| Docker gateway path | tasks/scripts/gateway.sh, tasks/scripts/gateway-docker.sh | mise run gateway entrypoint; detects driver and triggers the supervisor build. Carries the stale in-container comment. |
| Podman gateway path | tasks/scripts/gateway.sh (ensure_podman_supervisor_image) | Builds the supervisor sideload image via the same host-staging path on first run. |
| VM build (reference impl) | tasks/scripts/vm/build-supervisor-bundle.sh | Already contains the fd-limit guard; this is the pattern to lift into a shared helper. |
| Supervisor image | deploy/docker/Dockerfile.supervisor | scratch image; only COPYs the prebuilt host binary (never compiles). |
Technical Investigation
Architecture Overview
mise run gateway → gateway.sh:92-111 detect_driver() (order: in-cluster k8s → podman → docker) → dispatches to the driver path. On the docker path it execs gateway-docker.sh (gateway.sh:247). In the cross-compile branch (gateway-docker.sh:146, i.e. macOS or arch mismatch), it calls docker-build-image.sh supervisor-output (:157). docker-build-image.sh:164 runs ensure_prebuilt_binaries, which — when not in CI and PREBUILT_AUTO_STAGE != 0 (:77) — runs stage-prebuilt-binaries.sh on the host (:82). For the musl supervisor target that selects cargo zigbuild (stage-prebuilt-binaries.sh:174-175) and invokes it at :208. The resulting binary is copied into the scratch supervisor image (Dockerfile.supervisor:22,33).
The podman path (gateway.sh:269-272 → ensure_podman_supervisor_image → :180 mise run build:docker:supervisor) funnels into the identical docker-build-image.sh → stage-prebuilt-binaries.sh host build.
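The host-staging gate in that flow can be pictured as a small predicate. This is a hedged sketch, not the code at docker-build-image.sh:77: the function name should_auto_stage is hypothetical, and it assumes CI is detected through a non-empty CI variable and that auto-staging defaults to on when PREBUILT_AUTO_STAGE is unset.

```shell
# Sketch of the host-staging gate (assumptions: CI is signalled by a non-empty
# $CI, and an unset PREBUILT_AUTO_STAGE means "stage automatically").
should_auto_stage() {
  [ -z "${CI:-}" ] && [ "${PREBUILT_AUTO_STAGE:-1}" != "0" ]
}

# Usage: only the local, non-CI path reaches the host cross-compile.
if should_auto_stage; then
  echo "would run stage-prebuilt-binaries.sh on the host"
else
  echo "host staging skipped"
fi
```

Because this gate lives in the caller, anything that invokes the staging script directly bypasses it, which is why the fd guard cannot rely on it.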
Code References
| Location | Description |
|---|---|
| tasks/scripts/gateway.sh:92-111 | Driver detection (k8s → podman → docker) |
| tasks/scripts/gateway-docker.sh:114-117 | Stale comment claiming in-container cross-compile (incorrect) |
| tasks/scripts/gateway-docker.sh:146-157 | Cross-compile branch → docker-build-image.sh supervisor-output |
| tasks/scripts/docker-build-image.sh:77,82,164 | ensure_prebuilt_binaries; host-staging gate + call to stage-prebuilt-binaries.sh |
| tasks/scripts/stage-prebuilt-binaries.sh:110-114 | Supervisor resolves to openshell-sandbox, musl libc |
| tasks/scripts/stage-prebuilt-binaries.sh:174-175,208 | Selects cargo zigbuild for musl; the unguarded host invocation |
| deploy/docker/Dockerfile.supervisor:22,33 | FROM scratch; bare COPY of prebuilt binary |
| tasks/scripts/vm/build-supervisor-bundle.sh:61-116,125 | ensure_build_nofile_limit() definition + call; the guard to generalize |
| architecture/build.md:58-60 | Documents that Dockerfiles don't compile Rust (contradicts the code comment) |
| mise.toml:59 | Global RUSTC_WRAPPER = "sccache" (holds extra fds; compounds EMFILE) |
Current Behavior
On macOS with the default ulimit -n 256, mise run gateway (docker or podman driver) reaches the host cargo zigbuild for *-unknown-linux-musl and the link fails:
error: unable to search for static library …/libwant-*.rlib: ProcessFdQuotaExceeded
error: could not compile `openshell-sandbox` (bin "openshell-sandbox")
Reproduced deterministically: ulimit -n 256 → fails on libwant; ulimit -n 8192 (same objects, re-link only) → succeeds. The VM path (gateway:vm) does not fail because it raises the limit first.
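The reproduction can be wrapped so that the lowered limit never leaks into the interactive shell. This is a minimal sketch: with_nofile_limit is a hypothetical name, and the 8192 case assumes the hard limit allows it.

```shell
# Run a command under a specific soft open-file limit. The parentheses make
# the function body a subshell, so the caller's own limit is left untouched.
with_nofile_limit() (
  limit="$1"; shift
  ulimit -S -n "$limit" || exit 1
  "$@"
)

# Usage, mirroring the reproduction above:
#   with_nofile_limit 256  mise run gateway   # fails at the link step before the fix
#   with_nofile_limit 8192 mise run gateway   # links (assumes hard limit >= 8192)
```

The child process inherits the limit set in the subshell, which is the same mechanism the proposed guard depends on.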
Complete unguarded call-site inventory
Every host-side musl/zigbuild cross-compile path and whether it has an fd guard:
| Entry point | Reaches host zigbuild via | Guard? |
|---|---|---|
| mise run gateway (docker) | gateway-docker.sh:157 → docker-build-image.sh:82 → stage-prebuilt-binaries.sh:175 | NO (reported bug) |
| mise run gateway (podman, first run) | gateway.sh:180 → docker-build-image.sh:82 → stage-prebuilt-binaries.sh:175 | NO |
| mise run build:docker:supervisor | docker.toml:31 → docker-build-image.sh:82 → stage-prebuilt-binaries.sh:175 | NO |
| mise run build:docker:gateway | docker.toml:26 → docker-build-image.sh:82 → stage-prebuilt-binaries.sh:168 (gateway, zigbuild GNU 2.28) | NO |
| mise run stage-prebuilt (… all) | direct → stage-prebuilt-binaries.sh:168/175 | NO |
| docker-publish-multiarch.sh:47,51,99 | → docker-build-image.sh:82 → stage-prebuilt-binaries.sh | NO |
| mise run gateway:vm / mise run vm:supervisor | → vm/build-supervisor-bundle.sh:137 | YES (:125) |
Not affected: deploy/docker/cross-build.sh (runs inside a Linux build container), CI workflows (Linux runners), and Dockerfile.*-macos (native aarch64-apple-darwin, not cross/musl).
What Would Need to Change
- New shared helper tasks/scripts/build-env.sh defining ensure_build_nofile_limit(), lifted from vm/build-supervisor-bundle.sh. Preserve both early-returns verbatim: [ "$(uname -s)" = "Darwin" ] || return 0 and command -v cargo-zigbuild >/dev/null 2>&1 || return 0. Keep the hard-limit clamp logic (macOS hard limit can be unlimited).
- In tasks/scripts/stage-prebuilt-binaries.sh, source build-env.sh and call ensure_build_nofile_limit once near the top of main (before the per-arch loop). This single chokepoint fixes docker + podman + all docker:*/multiarch paths at once.
- In tasks/scripts/vm/build-supervisor-bundle.sh, replace the inlined copy with the shared helper (de-dup).
- At tasks/scripts/gateway-docker.sh:114-117, fix the stale comment to state that the cross-compile happens on the host and the limit is raised automatically.
- Optional: generalize the env var to OPENSHELL_BUILD_NOFILE_LIMIT (default 8192), honoring the existing OPENSHELL_VM_BUILD_NOFILE_LIMIT for back-compat.
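A rough sketch of what the shared helper could look like follows. It is not a copy of the in-tree function at vm/build-supervisor-bundle.sh:61-116, which remains the reference. The split into a pure nofile_target function is an assumption made here so the clamp can be checked without touching real limits, the warning-on-failure behavior is also assumed, and the min-1024 floor is left out for brevity.

```shell
# tasks/scripts/build-env.sh (sketch only)

# Hypothetical pure helper: print the soft limit to request, or print nothing
# when the current soft limit already satisfies the desired value.
nofile_target() {
  local desired="$1" soft="$2" hard="$3"
  [ "$soft" = "unlimited" ] && return 0
  [ "$soft" -ge "$desired" ] && return 0
  # Clamp to a numeric hard limit; "unlimited" imposes no cap.
  if [ "$hard" != "unlimited" ] && [ "$hard" -lt "$desired" ]; then
    desired="$hard"
  fi
  echo "$desired"
}

ensure_build_nofile_limit() {
  # Early returns named in the plan: macOS only, and only when zigbuild exists.
  [ "$(uname -s)" = "Darwin" ] || return 0
  command -v cargo-zigbuild >/dev/null 2>&1 || return 0
  local target
  target=$(nofile_target "${OPENSHELL_VM_BUILD_NOFILE_LIMIT:-8192}" \
    "$(ulimit -S -n)" "$(ulimit -H -n)")
  [ -n "$target" ] || return 0
  # Assumption: a failed raise is reported but does not abort the build.
  ulimit -S -n "$target" || echo "warning: could not raise ulimit -n to $target" >&2
}
```

The function has to run in the same shell that later spawns cargo zigbuild, so callers should source the file and call it directly rather than execute it as a separate script.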
Alternative Approaches Considered
- Put the helper in tasks/scripts/container-engine.sh (already sourced by docker-build-image.sh). Rejected: that lib runs container-engine detection and errors if no engine is installed; the VM path must not depend on it. The fd guard is orthogonal to engine selection.
- Guard in docker-build-image.sh instead of stage-prebuilt-binaries.sh. Weaker: misses mise run stage-prebuilt and any direct staging invocation. stage-prebuilt-binaries.sh is the true chokepoint.
- Document-only ("run ulimit -n 8192"). Rejected: pushes a papercut onto every macOS contributor on the default path; the VM path already proves auto-raising is the right call.
Patterns to Follow
ensure_build_nofile_limit() in tasks/scripts/vm/build-supervisor-bundle.sh:61-116 is the exact, battle-tested pattern — including the hard-limit clamp and the sccache-EMFILE retry (:157-167, greps Too many open files|os error 24 and retries with env -u RUSTC_WRAPPER). ulimit -n set in the shell propagates to the cargo/zig children the scripts spawn (confirmed: cargo runs as a child of the same shell via mise x --, not a fresh login shell).
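The retry half of that pattern can be sketched as a wrapper around the build command. This is an illustration of the described behavior (match the two error strings, retry once with RUSTC_WRAPPER removed), not the code at :157-167; the name run_with_emfile_retry and the choice to buffer output in a temp file are assumptions.

```shell
# Hypothetical wrapper: run a build command and, if it fails with an
# fd-exhaustion message, retry once without the sccache wrapper.
run_with_emfile_retry() {
  local log status
  log=$(mktemp)
  # Capture combined output; it is replayed afterwards rather than streamed.
  "$@" >"$log" 2>&1 && status=0 || status=$?
  cat "$log"
  if [ "$status" -ne 0 ] && grep -Eq 'Too many open files|os error 24' "$log"; then
    echo "fd exhaustion detected; retrying without RUSTC_WRAPPER" >&2
    # env can only run external commands, so "$@" must not be a shell function.
    env -u RUSTC_WRAPPER "$@" && status=0 || status=$?
  fi
  rm -f "$log"
  return "$status"
}

# Usage (arguments to cargo are elided, as in the text above):
#   run_with_emfile_retry cargo zigbuild ...
```

Failures that do not match either string are returned unchanged, so unrelated compile errors are not retried.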
Proposed Approach
Extract the existing VM fd-limit guard into a small shared tasks/scripts/build-env.sh, then call it from the host-staging chokepoint (stage-prebuilt-binaries.sh) so all docker/podman/multiarch host cross-compiles raise ulimit -n before cargo zigbuild, exactly as the VM path already does. De-duplicate the VM script to use the shared helper, and correct the misleading in-container comment in gateway-docker.sh. The helper stays a no-op on Linux and when zigbuild is absent, so CI and Linux dev are unaffected. The raised limit is the primary fix; porting the VM path's sccache-EMFILE retry into the stage path is a possible follow-up if EMFILE recurs under sccache pressure.
Scope Assessment
- Complexity: Low
- Confidence: High. Every step is traced to file:line; the fix reuses an in-tree, proven helper; CI/Linux safety is guaranteed by two independent gates (uname early-return + the CI gate at docker-build-image.sh:77).
- Estimated files to change: 3 core (tasks/scripts/build-env.sh new, stage-prebuilt-binaries.sh, vm/build-supervisor-bundle.sh) + up to 3 optional (gateway-docker.sh comment, a test, architecture/build.md line).
- Issue type: fix
Risks & Open Questions
- Env var naming (decision): introduce OPENSHELL_BUILD_NOFILE_LIMIT vs. reuse the VM-specific OPENSHELL_VM_BUILD_NOFILE_LIMIT. Recommend the general name with back-compat for the old one.
- sccache interaction: mise.toml:59 sets RUSTC_WRAPPER=sccache globally, which holds extra fds. The stage path lacks the VM path's sccache-EMFILE retry. Raising the limit should resolve it; decide whether to also port the retry now or as follow-up.
- Helper must be safe to call unconditionally: stage-prebuilt-binaries.sh is not itself CI-gated (only its caller is), so the uname/zigbuild early-returns must live in the helper, not rely on CI gating.
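If the general variable name is adopted, the back-compat lookup could be as small as the sketch below. The function name resolve_build_nofile_limit is hypothetical, and two behaviors are assumptions rather than facts about the existing script: non-numeric input falls back to the default, and values under the documented minimum of 1024 are raised to it.

```shell
# Hypothetical resolver: prefer the proposed general variable, then the
# existing VM-specific one, then the documented default of 8192.
resolve_build_nofile_limit() {
  local limit="${OPENSHELL_BUILD_NOFILE_LIMIT:-${OPENSHELL_VM_BUILD_NOFILE_LIMIT:-8192}}"
  # Assumption: anything that is not a plain positive integer uses the default.
  case "$limit" in
    ''|*[!0-9]*) limit=8192 ;;
  esac
  # Assumption: the "min 1024" rule is applied as a floor.
  [ "$limit" -ge 1024 ] || limit=1024
  echo "$limit"
}

# Usage:
#   OPENSHELL_VM_BUILD_NOFILE_LIMIT=4096 resolve_build_nofile_limit
```

Keeping the lookup in one function means the VM script and the staging script cannot drift apart on precedence.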
Test Considerations
- No bash unit/bats harness or shellcheck job currently exists for tasks/scripts/ (only tasks/scripts/test-install-sh.sh, install.sh-specific).
- Recommended low-cost test: source build-env.sh, set ulimit -n 256 in a subshell, call ensure_build_nofile_limit, assert the limit rose on macOS / is a no-op on Linux (gate the assertion on uname).
- Definitive manual verification (already reproduced): ulimit -n 256; mise run gateway (docker driver) fails before the fix; succeeds after. Include this in the PR's testing section.
- Consider a separate chore to add a shellcheck lint over tasks/scripts/*.sh.
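The recommended low-cost test might look like the following sketch. It assumes the helper ends up at tasks/scripts/build-env.sh and defines ensure_build_nofile_limit as proposed; the function name test_nofile_guard and the skip-when-missing behavior are choices made for this illustration.

```shell
# Hypothetical smoke test for the proposed helper. The body is a subshell, so
# the lowered limit and the sourced function never leak into the caller.
test_nofile_guard() (
  helper="${1:-tasks/scripts/build-env.sh}"
  [ -f "$helper" ] || { echo "skip: $helper not found"; exit 0; }
  . "$helper"
  ulimit -S -n 256
  ensure_build_nofile_limit
  after=$(ulimit -S -n)
  if [ "$(uname -s)" = "Darwin" ] && command -v cargo-zigbuild >/dev/null 2>&1; then
    # On macOS with zigbuild present the guard should have raised the limit.
    [ "$after" = "unlimited" ] || [ "$after" -gt 256 ] \
      || { echo "FAIL: limit still $after"; exit 1; }
  else
    # Linux, or macOS without zigbuild: the guard must be a no-op.
    [ "$after" = "256" ] || { echo "FAIL: limit changed to $after"; exit 1; }
  fi
  echo "ok"
)

# Usage, from the repo root:
#   test_nofile_guard tasks/scripts/build-env.sh
```

The assertion is gated on the same two conditions as the helper's early returns, so the test passes on Linux CI without any special casing.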
Created by spike investigation. Use build-from-issue to plan and implement.