Skip to content

fix(cli): support arm64 hosts for --docker render#1196

Merged
jrusso1020 merged 2 commits into
mainfrom
fix/docker-render-arm64-host
Jun 4, 2026
Merged

fix(cli): support arm64 hosts for --docker render#1196
jrusso1020 merged 2 commits into
mainfrom
fix/docker-render-arm64-host

Conversation

@jrusso1020
Copy link
Copy Markdown
Collaborator

Summary

  • --docker was pinning --platform linux/amd64 unconditionally, which on Apple Silicon / Graviton forced qemu emulation of chrome-headless-shell. The emulated chrome either SEGV'd or hung on page navigation.
  • Derive the platform from process.arch so arm64 hosts get a native arm64 image with system chromium fallback. amd64 hosts retain the existing chrome-headless-shell BeginFrame path with no behavior change.

Closes #1193, #1194, #1195.

Root cause

packages/cli/src/utils/dockerRunArgs.ts and packages/cli/src/commands/render.ts both hardcoded --platform linux/amd64. The accompanying comment explained: "chrome-headless-shell doesn't ship ARM Linux binaries." That's still true today — the Chrome for Testing known-good-versions manifest only ships linux64 for chrome-headless-shell. But the consequence of pinning amd64 on arm64 hosts (forced qemu emulation of chrome) is worse than the cure (no Docker render at all on Apple Silicon).

Approach

  1. resolveDockerPlatform() maps process.archlinux/amd64 | linux/arm64.
  2. buildDockerRunArgs accepts an optional platform; default = host arch.
  3. ensureDockerImage takes platform, builds with --platform, tags -arm64 suffix on arm64 so caches don't collide, passes TARGETARCH as an explicit --build-arg (BuildKit auto-args are unreliable on the legacy builder and on colima's BuildKit).
  4. Dockerfile.render reads ARG TARGETARCH, skips the chrome-headless-shell install on non-amd64, and the wrapper script only exports PRODUCER_HEADLESS_SHELL_PATH when the binary is actually present — otherwise the engine uses the system chromium that's already installed via apt and pointed at by PUPPETEER_EXECUTABLE_PATH.
  5. On arm64 hosts we print a one-line warning that output won't be byte-identical to amd64 renders (system chromium uses screenshot capture, not HeadlessExperimental.beginFrame).

Trade-off

Arm64 renders lose byte-for-byte parity with amd64 renders. That's an explicit downgrade documented in the warning, but it's strictly better than the current state where --docker doesn't work at all on Apple Silicon. The producer regression test image (Dockerfile.test) is untouched — golden baselines still run on amd64 in CI.

Test plan

  • Unit: bunx vitest run packages/cli — 643 tests pass (+5 new for resolveDockerPlatform and the new platform arg). One pre-existing unrelated module-resolution warning, present on baseline.
  • Lint / format: bunx oxlint and bunx oxfmt --check clean on the four changed files.
  • End-to-end on macOS 26.5 / M4 Max:
    • Before: qemu: unknown option 'type=gpu-process' followed by chrome-headless-shell SIGSEGV after ~4 min.
    • After: Browser: system, falls back to screenshot mode with documented warning, 300/300 frames captured in 18s, MP4 produced. Total wallclock 1m18s (most is the one-time arm64 image build).
  • Smoke test on an amd64 Linux host — would appreciate a reviewer running this. The amd64 code path is byte-identical to before except for the new --build-arg TARGETARCH=amd64, which the legacy RUN step now consumes; the install is unchanged.
  • Optional: run the producer regression harness inside the test image to confirm nothing there shifted (Dockerfile.test wasn't modified, so expected to be a no-op).

The Docker render path pinned `--platform linux/amd64` for both build
and run, which on Apple Silicon / Graviton forced qemu emulation of
chrome-headless-shell. The emulated chrome process either SEGV'd or
hung on page navigation, producing the failures reported in #1193 /
#1194 / #1195.

Derive the platform from `process.arch` instead. On arm64 hosts:

- The image builds natively (no qemu).
- The Dockerfile skips the chrome-headless-shell install because
  Chrome for Testing only publishes a `linux64` build (verified
  against the known-good-versions manifest).
- The wrapper script leaves `PRODUCER_HEADLESS_SHELL_PATH` unset
  when no headless-shell binary is present, so the engine falls
  back to the system chromium that the Dockerfile already
  installs from apt and points at via `PUPPETEER_EXECUTABLE_PATH`.

`TARGETARCH` is forwarded as an explicit `--build-arg` instead of
relying on BuildKit's automatic platform args — the legacy
builder (and some BuildKit configs, including colima on macOS)
leaves it unset, which would silently bypass the arch conditional
in the Dockerfile.

Image tags are now suffixed with `-arm64` on arm64 hosts so amd64
and arm64 images of the same hyperframes version can coexist in
the local cache.

The arm64 path renders correctly but loses byte-for-byte parity
with amd64 (system chromium uses screenshot capture, not
HeadlessExperimental.beginFrame). The CLI prints a one-line
warning so users comparing against amd64 baselines know.

Verified on macOS 26.5 / M4 Max:

- Before: `qemu: unknown option 'type=gpu-process'` followed by a
  chrome-headless-shell SIGSEGV after ~4 minutes.
- After: 300/300 frames captured in ~18s of render time (1m18s
  wallclock including a one-time image build), MP4 produced.

Closes #1193
Closes #1194
Closes #1195
Copy link
Copy Markdown
Collaborator

@miguel-heygen miguel-heygen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean fix. Root cause is clear — QEMU emulation of chrome-headless-shell on Apple Silicon reliably crashes. Approach is correct.

Key things that look right:

  • resolveDockerPlatform(arch = process.arch) is testable without mocking process.arch
  • -arm64 image tag suffix prevents cache collisions between amd64/arm64 builds on the same machine
  • ARG TARGETARCH explicit pass (not relying on BuildKit auto-args) handles colima/legacy builder edge cases
  • 2>/dev/null on the find in the wrapper script prevents build failure on arm64 where the shell path doesn't exist
  • Tests pin platform: "linux/amd64" so snapshots are arch-independent — good forward-thinking
  • amd64 path is byte-for-byte unchanged (only difference is the new --build-arg TARGETARCH=amd64, which the Dockerfile already handled with the existing install step)

Trade-off is correctly documented and explicitly communicated to the user at render time. LGTM.

Copy link
Copy Markdown
Collaborator

@vanceingalls vanceingalls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Root cause is correct: pinning --platform linux/amd64 on arm64 hosts forced qemu to emulate chrome-headless-shell, producing the SEGV/navigation-timeout the issues describe. The fix is minimal, the trade-off is honestly documented, and the amd64 code path is byte-identical to before (aside from the explicit --build-arg TARGETARCH=amd64, which is a no-op for the install step). The arm64 fallback path is correctly wired end-to-end: resolveHeadlessShellPath() returns undefined when PRODUCER_HEADLESS_SHELL_PATH is unset, launchBrowser falls to screenshot mode and passes executablePath = undefined, and puppeteer picks up PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium from the Dockerfile. Tested and reading clean.


Important: Dockerfile wrapper loses loud failure on amd64 if chrome-headless-shell install regresses

Old wrapper (&& chain):

RUN SHELL_PATH=$(find ... | head -1)     && printf '...' > /usr/local/bin/hf-render     && chmod +x /usr/local/bin/hf-render

If find returned empty (install failed, path changed), the && short-circuited and the RUN step failed, breaking the build loudly and visibly.

New wrapper:

RUN SHELL_PATH=$(find ... 2>/dev/null | head -1);     if [ -n "$SHELL_PATH" ]; then       printf '#!/bin/sh
export PRODUCER_HEADLESS_SHELL_PATH=...
' ...;     else       printf '#!/bin/sh
exec hyperframes render "$@"
' > ...;     fi

On amd64 with a regressed install, SHELL_PATH is empty and the else branch writes a system-chromium wrapper — silently dropping BeginFrame determinism with no build error or warning. Suggest adding a guard inside the else:

    else       if [ "$TARGETARCH" = "amd64" ]; then         echo "ERROR: chrome-headless-shell not found on amd64 — install may have failed" >&2; exit 1;       fi;       printf '#!/bin/sh
exec hyperframes render "$@"
' > /usr/local/bin/hf-render;     fi

This restores the previous loud-failure behavior on amd64 while still letting arm64 fall through cleanly.


Nit: arm64 parity-loss warning is suppressed by --quiet

The if (!options.quiet && platform === "linux/arm64") guard means the warning is invisible to any automation or CI script that passes --quiet. For end-user CLI invocations this is fine; just worth noting if a future CI pipeline runs arm64 Docker renders and compares against amd64 goldens.


Callout: arch-suffixed cache tag is the right call

dockerImageTagForPlatform appending -arm64 is the correct approach. A developer who pulls an amd64 image (or vice versa) on a multi-arch machine won't get a cache hit against the wrong arch image. The tag format prefix:version-arm64 is well-formed (no digest involved).

Callout: resolveDockerPlatform test coverage is solid

Parametrized with an explicit arch argument so the test is arch-independent and snapshots are stable across hosts. The everything-else→amd64 safe-default for unknown arches (riscv64 test case) is a nice belt.

— Vai

Follow-up to 61880cd. Addresses one substantive review comment from
@vanceingalls and three self-review gaps.

1. Restore loud build failure on amd64 when chrome-headless-shell is
   missing (per @vanceingalls). The original Dockerfile used an `&&`
   chain that crashed the build if `find` returned empty; the new
   `if/else` wrapper silently fell through to system chromium even on
   amd64, which would mask golden-baseline regressions from a future
   @puppeteer/browsers cache layout change. The else branch now checks
   `TARGETARCH = amd64` and exits 1 with an actionable error, while
   arm64 still falls through to the system-chromium wrapper cleanly.

2. Add `HYPERFRAMES_DOCKER_PLATFORM` env override. The fix derives
   platform from `process.arch`, which silently picks the wrong arch
   in three real-world cases: x64 Node under Rosetta on Apple Silicon
   (re-triggers issue #1193), parity-regen for amd64 golden baselines
   on an arm64 host, and DOCKER_HOST pointing at a remote daemon with
   a different arch. Empty/whitespace override is a no-op (falls back
   to arch detection) so `export FOO=""` doesn't pin platform to "".

3. Fail fast when `--gpu` is requested on arm64. Docker Desktop on
   Apple Silicon doesn't implement `--gpus` passthrough; the previous
   code would crash at `docker run` with an opaque device-driver
   error. We now short-circuit with errorBox pointing at the env
   override as the workaround.

4. Close the test gap on the default-arch resolution. Every previous
   test passed `arch` explicitly; a refactor that dropped the
   `= process.arch` default would pass all tests but break every arm64
   host at runtime. Added one assertion that calls
   `resolveDockerPlatform()` with no args, plus coverage for the env
   override.

The new arm64 platform-checking logic is extracted into
`resolveDockerHostPlatform()` so `renderDocker` itself stays focused
on the build/run wiring (and below the fallow complexity gate).

Test plan:
- `bunx vitest run packages/cli/src/utils/dockerRunArgs.test.ts` — 31 passed (was 27).
- `bunx vitest run packages/cli` — 647 passed (was 643).
- E2E on macOS 26.5 / M4 Max: deleted the cached arm64 image, ran
  `--docker --quality draft --workers 1` against the blank scaffold —
  300/300 frames in 1m1s wallclock, MP4 produced.
Copy link
Copy Markdown
Collaborator Author

@jrusso1020 jrusso1020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the careful read! Pushed cfeb8db to address.

Your main ask — fail-loud on amd64 when chrome-headless-shell is missing: done exactly as suggested. The else branch now checks TARGETARCH = amd64 and exits 1 with an error pointing at the likely cause (@puppeteer/browsers install regression or cache layout change). arm64 still falls through cleanly. That restores the previous loud-failure semantic where the &&-chained wrapper crashed docker build on a missing binary.

Your nit about --quiet suppressing the warning: good call. I went one better than just unsuppressing — added a HYPERFRAMES_DOCKER_PLATFORM env override so any caller (CI script, automation, baseline-regen workflow) can force linux/amd64 on an arm64 host to keep byte-parity. Mentioned in the warning text now. The env override also closes two related self-review gaps:

  • x64 Node under Rosetta on Apple Silicon (process.arch === "x64" even though the host is arm64 → CLI would otherwise emit linux/amd64 and re-trigger #1193). Common with nvm-installed x64 Node or arch -x86_64 node.
  • DOCKER_HOST=ssh://amd64-server where the daemon arch ≠ local arch.

Also picked up from self-review:

  • buildDockerRunArgs was still emitting --gpus all on arm64 unconditionally, which fails at docker run on Apple Silicon (Docker Desktop / colima don't expose --gpus passthrough). Now short-circuits in resolveDockerHostPlatform with errorBox pointing at the env override as the GPU-encoding workaround.
  • The previous tests pinned platform explicitly in every case, so a refactor that dropped the = process.arch default param would have passed CI but broken every arm64 host at runtime. Added a single assertion that calls resolveDockerPlatform() with no args, plus coverage for the env override and whitespace-handling edge cases.

The arm64 platform/policy logic moved into a small resolveDockerHostPlatform() helper so renderDocker stayed below the fallow complexity gate.

Tests: 31 in dockerRunArgs.test.ts (was 27), 647 in packages/cli (was 643). E2E re-verified by deleting the cached arm64 image and rendering — 300/300 frames in 1m1s wallclock.

Copy link
Copy Markdown
Collaborator

@miguel-heygen miguel-heygen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated Dockerfile now includes the explicit exit 1 guard on amd64 when chrome-headless-shell isn't found — Vai's nit is addressed. LGTM.

Copy link
Copy Markdown
Collaborator

@vanceingalls vanceingalls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review on cfeb8db.

Blocker addressed — the else branch in the Dockerfile wrapper now checks $TARGETARCH = amd64 and calls exit 1 with an explicit error message when chrome-headless-shell is missing. Fail-loud semantics on amd64 are fully restored. arm64 still falls through to system chromium cleanly.

Nit upgraded — rather than just unsuppressing the --quiet-gated warning, James added HYPERFRAMES_DOCKER_PLATFORM as an env override. CI automation can force linux/amd64 on an arm64 host for byte-parity runs without touching source. That's strictly better than what I asked for.

New --gpu / arm64 guardresolveDockerHostPlatform now short-circuits with an errorBox when --gpu is requested on an arm64 host, catching the Docker Desktop / colima --gpus limitation before docker run surfaces an opaque device-driver error. Clean.

Test coverage — the no-arg resolveDockerPlatform() regression guard is a good catch; a future refactor dropping either default param would have passed CI while breaking every arm64 host at runtime. Whitespace / empty-string handling on the env override is also covered.

All CI green. Nothing outstanding.

— Vai

@jrusso1020 jrusso1020 merged commit 2be4193 into main Jun 4, 2026
35 checks passed
@jrusso1020 jrusso1020 deleted the fix/docker-render-arm64-host branch June 4, 2026 07:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Docker render times out on macOS 26.5.1

3 participants