fix(cli): support arm64 hosts for --docker render#1196
Conversation
The Docker render path pinned `--platform linux/amd64` for both build and run, which on Apple Silicon / Graviton forced qemu emulation of chrome-headless-shell. The emulated chrome process either SEGV'd or hung on page navigation, producing the failures reported in #1193 / #1194 / #1195. Derive the platform from `process.arch` instead. On arm64 hosts: - The image builds natively (no qemu). - The Dockerfile skips the chrome-headless-shell install because Chrome for Testing only publishes a `linux64` build (verified against the known-good-versions manifest). - The wrapper script leaves `PRODUCER_HEADLESS_SHELL_PATH` unset when no headless-shell binary is present, so the engine falls back to the system chromium that the Dockerfile already installs from apt and points at via `PUPPETEER_EXECUTABLE_PATH`. `TARGETARCH` is forwarded as an explicit `--build-arg` instead of relying on BuildKit's automatic platform args — the legacy builder (and some BuildKit configs, including colima on macOS) leaves it unset, which would silently bypass the arch conditional in the Dockerfile. Image tags are now suffixed with `-arm64` on arm64 hosts so amd64 and arm64 images of the same hyperframes version can coexist in the local cache. The arm64 path renders correctly but loses byte-for-byte parity with amd64 (system chromium uses screenshot capture, not HeadlessExperimental.beginFrame). The CLI prints a one-line warning so users comparing against amd64 baselines know. Verified on macOS 26.5 / M4 Max: - Before: `qemu: unknown option 'type=gpu-process'` followed by a chrome-headless-shell SIGSEGV after ~4 minutes. - After: 300/300 frames captured in ~18s of render time (1m18s wallclock including a one-time image build), MP4 produced. Closes #1193 Closes #1194 Closes #1195
miguel-heygen
left a comment
There was a problem hiding this comment.
Clean fix. Root cause is clear — QEMU emulation of chrome-headless-shell on Apple Silicon reliably crashes. Approach is correct.
Key things that look right:
resolveDockerPlatform(arch = process.arch)is testable without mocking process.arch-arm64image tag suffix prevents cache collisions between amd64/arm64 builds on the same machineARG TARGETARCHexplicit pass (not relying on BuildKit auto-args) handles colima/legacy builder edge cases2>/dev/nullon the find in the wrapper script prevents build failure on arm64 where the shell path doesn't exist- Tests pin
platform: "linux/amd64"so snapshots are arch-independent — good forward-thinking - amd64 path is byte-for-byte unchanged (only difference is the new
--build-arg TARGETARCH=amd64, which the Dockerfile already handled with the existing install step)
Trade-off is correctly documented and explicitly communicated to the user at render time. LGTM.
vanceingalls
left a comment
There was a problem hiding this comment.
Root cause is correct: pinning --platform linux/amd64 on arm64 hosts forced qemu to emulate chrome-headless-shell, producing the SEGV/navigation-timeout the issues describe. The fix is minimal, the trade-off is honestly documented, and the amd64 code path is byte-identical to before (aside from the explicit --build-arg TARGETARCH=amd64, which is a no-op for the install step). The arm64 fallback path is correctly wired end-to-end: resolveHeadlessShellPath() returns undefined when PRODUCER_HEADLESS_SHELL_PATH is unset, launchBrowser falls to screenshot mode and passes executablePath = undefined, and puppeteer picks up PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium from the Dockerfile. Tested and reading clean.
Important: Dockerfile wrapper loses loud failure on amd64 if chrome-headless-shell install regresses
Old wrapper (&& chain):
RUN SHELL_PATH=$(find ... | head -1) && printf '...' > /usr/local/bin/hf-render && chmod +x /usr/local/bin/hf-renderIf find returned empty (install failed, path changed), the && short-circuited and the RUN step failed, breaking the build loudly and visibly.
New wrapper:
RUN SHELL_PATH=$(find ... 2>/dev/null | head -1); if [ -n "$SHELL_PATH" ]; then printf '#!/bin/sh
export PRODUCER_HEADLESS_SHELL_PATH=...
' ...; else printf '#!/bin/sh
exec hyperframes render "$@"
' > ...; fiOn amd64 with a regressed install, SHELL_PATH is empty and the else branch writes a system-chromium wrapper — silently dropping BeginFrame determinism with no build error or warning. Suggest adding a guard inside the else:
else if [ "$TARGETARCH" = "amd64" ]; then echo "ERROR: chrome-headless-shell not found on amd64 — install may have failed" >&2; exit 1; fi; printf '#!/bin/sh
exec hyperframes render "$@"
' > /usr/local/bin/hf-render; fiThis restores the previous loud-failure behavior on amd64 while still letting arm64 fall through cleanly.
Nit: arm64 parity-loss warning is suppressed by --quiet
The if (!options.quiet && platform === "linux/arm64") guard means the warning is invisible to any automation or CI script that passes --quiet. For end-user CLI invocations this is fine; just worth noting if a future CI pipeline runs arm64 Docker renders and compares against amd64 goldens.
Callout: arch-suffixed cache tag is the right call
dockerImageTagForPlatform appending -arm64 is the correct approach. A developer who pulls an amd64 image (or vice versa) on a multi-arch machine won't get a cache hit against the wrong arch image. The tag format prefix:version-arm64 is well-formed (no digest involved).
Callout: resolveDockerPlatform test coverage is solid
Parametrized with an explicit arch argument so the test is arch-independent and snapshots are stable across hosts. The everything-else→amd64 safe-default for unknown arches (riscv64 test case) is a nice belt.
— Vai
Follow-up to 61880cd. Addresses one substantive review comment from @vanceingalls and three self-review gaps. 1. Restore loud build failure on amd64 when chrome-headless-shell is missing (per @vanceingalls). The original Dockerfile used an `&&` chain that crashed the build if `find` returned empty; the new `if/else` wrapper silently fell through to system chromium even on amd64, which would mask golden-baseline regressions from a future @puppeteer/browsers cache layout change. The else branch now checks `TARGETARCH = amd64` and exits 1 with an actionable error, while arm64 still falls through to the system-chromium wrapper cleanly. 2. Add `HYPERFRAMES_DOCKER_PLATFORM` env override. The fix derives platform from `process.arch`, which silently picks the wrong arch in three real-world cases: x64 Node under Rosetta on Apple Silicon (re-triggers issue #1193), parity-regen for amd64 golden baselines on an arm64 host, and DOCKER_HOST pointing at a remote daemon with a different arch. Empty/whitespace override is a no-op (falls back to arch detection) so `export FOO=""` doesn't pin platform to "". 3. Fail fast when `--gpu` is requested on arm64. Docker Desktop on Apple Silicon doesn't implement `--gpus` passthrough; the previous code would crash at `docker run` with an opaque device-driver error. We now short-circuit with errorBox pointing at the env override as the workaround. 4. Close the test gap on the default-arch resolution. Every previous test passed `arch` explicitly; a refactor that dropped the `= process.arch` default would pass all tests but break every arm64 host at runtime. Added one assertion that calls `resolveDockerPlatform()` with no args, plus coverage for the env override. The new arm64 platform-checking logic is extracted into `resolveDockerHostPlatform()` so `renderDocker` itself stays focused on the build/run wiring (and below the fallow complexity gate). Test plan: - `bunx vitest run packages/cli/src/utils/dockerRunArgs.test.ts` — 31 passed (was 27). - `bunx vitest run packages/cli` — 647 passed (was 643). - E2E on macOS 26.5 / M4 Max: deleted the cached arm64 image, ran `--docker --quality draft --workers 1` against the blank scaffold — 300/300 frames in 1m1s wallclock, MP4 produced.
jrusso1020
left a comment
There was a problem hiding this comment.
Thanks for the careful read! Pushed cfeb8db to address.
Your main ask — fail-loud on amd64 when chrome-headless-shell is missing: done exactly as suggested. The else branch now checks TARGETARCH = amd64 and exits 1 with an error pointing at the likely cause (@puppeteer/browsers install regression or cache layout change). arm64 still falls through cleanly. That restores the previous loud-failure semantic where the &&-chained wrapper crashed docker build on a missing binary.
Your nit about --quiet suppressing the warning: good call. I went one better than just unsuppressing — added a HYPERFRAMES_DOCKER_PLATFORM env override so any caller (CI script, automation, baseline-regen workflow) can force linux/amd64 on an arm64 host to keep byte-parity. Mentioned in the warning text now. The env override also closes two related self-review gaps:
- x64 Node under Rosetta on Apple Silicon (
process.arch === "x64"even though the host is arm64 → CLI would otherwise emitlinux/amd64and re-trigger #1193). Common with nvm-installed x64 Node orarch -x86_64 node. DOCKER_HOST=ssh://amd64-serverwhere the daemon arch ≠ local arch.
Also picked up from self-review:
buildDockerRunArgswas still emitting--gpus allon arm64 unconditionally, which fails atdocker runon Apple Silicon (Docker Desktop / colima don't expose--gpuspassthrough). Now short-circuits inresolveDockerHostPlatformwith errorBox pointing at the env override as the GPU-encoding workaround.- The previous tests pinned
platformexplicitly in every case, so a refactor that dropped the= process.archdefault param would have passed CI but broken every arm64 host at runtime. Added a single assertion that callsresolveDockerPlatform()with no args, plus coverage for the env override and whitespace-handling edge cases.
The arm64 platform/policy logic moved into a small resolveDockerHostPlatform() helper so renderDocker stayed below the fallow complexity gate.
Tests: 31 in dockerRunArgs.test.ts (was 27), 647 in packages/cli (was 643). E2E re-verified by deleting the cached arm64 image and rendering — 300/300 frames in 1m1s wallclock.
miguel-heygen
left a comment
There was a problem hiding this comment.
Updated Dockerfile now includes the explicit exit 1 guard on amd64 when chrome-headless-shell isn't found — Vai's nit is addressed. LGTM.
vanceingalls
left a comment
There was a problem hiding this comment.
Re-review on cfeb8db.
Blocker addressed — the else branch in the Dockerfile wrapper now checks $TARGETARCH = amd64 and calls exit 1 with an explicit error message when chrome-headless-shell is missing. Fail-loud semantics on amd64 are fully restored. arm64 still falls through to system chromium cleanly.
Nit upgraded — rather than just unsuppressing the --quiet-gated warning, James added HYPERFRAMES_DOCKER_PLATFORM as an env override. CI automation can force linux/amd64 on an arm64 host for byte-parity runs without touching source. That's strictly better than what I asked for.
New --gpu / arm64 guard — resolveDockerHostPlatform now short-circuits with an errorBox when --gpu is requested on an arm64 host, catching the Docker Desktop / colima --gpus limitation before docker run surfaces an opaque device-driver error. Clean.
Test coverage — the no-arg resolveDockerPlatform() regression guard is a good catch; a future refactor dropping either default param would have passed CI while breaking every arm64 host at runtime. Whitespace / empty-string handling on the env override is also covered.
All CI green. Nothing outstanding.
— Vai
Summary
--dockerwas pinning--platform linux/amd64unconditionally, which on Apple Silicon / Graviton forced qemu emulation of chrome-headless-shell. The emulated chrome either SEGV'd or hung on page navigation.process.archso arm64 hosts get a native arm64 image with system chromium fallback. amd64 hosts retain the existing chrome-headless-shell BeginFrame path with no behavior change.Closes #1193, #1194, #1195.
Root cause
packages/cli/src/utils/dockerRunArgs.tsandpackages/cli/src/commands/render.tsboth hardcoded--platform linux/amd64. The accompanying comment explained: "chrome-headless-shell doesn't ship ARM Linux binaries." That's still true today — the Chrome for Testing known-good-versions manifest only shipslinux64for chrome-headless-shell. But the consequence of pinning amd64 on arm64 hosts (forced qemu emulation of chrome) is worse than the cure (no Docker render at all on Apple Silicon).Approach
resolveDockerPlatform()mapsprocess.arch→linux/amd64|linux/arm64.buildDockerRunArgsaccepts an optionalplatform; default = host arch.ensureDockerImagetakesplatform, builds with--platform, tags-arm64suffix on arm64 so caches don't collide, passesTARGETARCHas an explicit--build-arg(BuildKit auto-args are unreliable on the legacy builder and on colima's BuildKit).Dockerfile.renderreadsARG TARGETARCH, skips the chrome-headless-shell install on non-amd64, and the wrapper script only exportsPRODUCER_HEADLESS_SHELL_PATHwhen the binary is actually present — otherwise the engine uses the system chromium that's already installed via apt and pointed at byPUPPETEER_EXECUTABLE_PATH.HeadlessExperimental.beginFrame).Trade-off
Arm64 renders lose byte-for-byte parity with amd64 renders. That's an explicit downgrade documented in the warning, but it's strictly better than the current state where
--dockerdoesn't work at all on Apple Silicon. The producer regression test image (Dockerfile.test) is untouched — golden baselines still run on amd64 in CI.Test plan
bunx vitest run packages/cli— 643 tests pass (+5 new forresolveDockerPlatformand the new platform arg). One pre-existing unrelated module-resolution warning, present on baseline.bunx oxlintandbunx oxfmt --checkclean on the four changed files.qemu: unknown option 'type=gpu-process'followed bychrome-headless-shellSIGSEGV after ~4 min.Browser: system, falls back to screenshot mode with documented warning, 300/300 frames captured in 18s, MP4 produced. Total wallclock 1m18s (most is the one-time arm64 image build).--build-arg TARGETARCH=amd64, which the legacyRUNstep now consumes; the install is unchanged.