Detect SPL code size from firmware binary#55

Merged
widgetii merged 2 commits into master from fix/dynamic-spl-size on Apr 21, 2026

@widgetii
Member

Summary

Automatically detect SPL code size instead of relying on the hardcoded profile value. Finds the LZMA compressed payload boundary in the firmware binary and sends all code before it to SRAM.

Problem

The profile's FILELEN[1] (SPL max size) was set to match the original HiTool values (e.g. 0x4F00 = 20224 bytes for hi3516av200). When U-Boot includes optional features like SVB (Selective Voltage Binning, a voltage-regulation step), the SPL code grows beyond this limit. Only the first FILELEN[1] bytes reach SRAM, which truncates start_ddr_training() mid-function. The SPL then executes garbage and corrupts DDR.

Fix

_detect_spl_size() scans for the LZMA header (byte 0x5D + valid dictionary size) in the firmware binary. Everything before it is SPL code. Uses max(detected, profile_default) so existing firmware works unchanged, while larger SPL binaries automatically get the full code sent.
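A minimal sketch of the scan described above, not the actual `_detect_spl_size()` body. The function names are hypothetical, and "valid dictionary size" is assumed here to mean a power of two between 4 KiB and 64 MiB:

```python
import struct
from typing import Optional

LZMA_PROPS = 0x5D  # properties byte named in the description above


def find_lzma_header(data: bytes, start: int = 0) -> Optional[int]:
    """Return the offset of the first plausible LZMA header, or None."""
    needle = bytes([LZMA_PROPS])
    pos = data.find(needle, start)
    while pos != -1 and pos + 5 <= len(data):
        # Bytes 1..4 of an LZMA header hold the dictionary size, little-endian.
        (dict_size,) = struct.unpack_from("<I", data, pos + 1)
        # Assumption: "valid" = power of two in [4 KiB, 64 MiB].
        if 0x1000 <= dict_size <= 0x4000000 and dict_size & (dict_size - 1) == 0:
            return pos
        pos = data.find(needle, pos + 1)
    return None


def spl_size(data: bytes, profile_default: int) -> int:
    """Everything before the LZMA header is SPL code; never go below the profile."""
    detected = find_lzma_header(data)
    if detected is None:
        return profile_default
    return max(detected, profile_default)
```

With a header at 0x6C00 and a profile default of 0x4F00, `spl_size()` returns 0x6C00; with a header below the profile default, it returns the profile default unchanged.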

Hardware proof (Hi3516AV200, SVB-enabled U-Boot)

INFO: SPL code extends to 0x6C00 (27648 bytes), profile default was 0x4F00 (20224 bytes)

PHY ID: 0x02430c54
PHY interface: RMII (auto-detected)
ETH0: PHY(phyaddr=1, rmii) link UP: DUPLEX=FULL : SPEED=100M

>>> ping 10.216.128.2
host 10.216.128.2 is alive
[OK]

Fixes: OpenIPC/firmware#517

🤖 Generated with Claude Code

widgetii and others added 2 commits April 21, 2026 15:08
The restore command detects UBIFS partitions (magic 0x06101831) and uses
UBI commands (nand erase → ubi part → ubi create → ubi write) instead
of a raw nand write, which corrupts UBI due to bad block layout differences.

- Add --mtdparts flag for explicit NAND partition layout
- Add tftp_to_ram() with tftpboot/tftp fallback (vendor U-Boot compat)
- Fix restore power-off ordering (kill OS before opening serial)
- Use real partition offsets from mtdparts for NAND operations
- Detect UBIFS vs raw UBI vs non-UBI from file magic bytes
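
For illustration, the magic-byte check in the last item could look roughly like this. The function name and return labels are hypothetical; the UBIFS magic is the one quoted above (stored little-endian), and `UBI#` is the erase-counter header magic that starts a raw UBI image:

```python
import struct

UBIFS_NODE_MAGIC = 0x06101831  # little-endian on flash: 31 18 10 06
UBI_EC_MAGIC = b"UBI#"         # first bytes of a raw UBI image


def classify_image(head: bytes) -> str:
    """Classify a partition image from its first bytes (hypothetical helper)."""
    if head[:4] == UBI_EC_MAGIC:
        return "ubi"
    if len(head) >= 4 and struct.unpack_from("<I", head)[0] == UBIFS_NODE_MAGIC:
        return "ubifs"
    return "raw"
```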

Known issue: ubi write fails with "Cannot start volume update" on
hi3516av200 due to U-Boot heap exhaustion after TFTP. Tracked separately.

Tested on hi3516av200:
- Boot + kernel partitions: nand write OK, camera boots
- UBI detection + ubi part + ubi create: OK
- Full 128MB vendor dump restore (non-UBI parts): verified

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The hardcoded SPL max size from the chip profile can be too small when
the U-Boot SPL includes optional features like SVB (voltage regulation).
This causes the bootrom to truncate the SPL, executing garbage code.

Detect the actual code boundary by finding the LZMA compressed payload
header in the firmware binary. Everything before it is executable SPL
code. Use max(detected, profile_default) so firmware that grew beyond
the original profile limit works automatically.

Tested on Hi3516AV200 with SVB-enabled U-Boot (code size 0x6C00 vs
profile default 0x4F00). Boot + DDR + ping verified working.

Also always apply 300ms DDR training delay regardless of SPL size,
since the previous check against specific sizes was fragile.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@widgetii widgetii merged commit 4e04c39 into master Apr 21, 2026
13 checks passed
widgetii added a commit that referenced this pull request May 4, 2026
End-to-end install of OpenIPC firmware on hi3516av300 hung mid-transfer
in two places. This PR fixes both, plus the supporting infrastructure
that masked the failures as indefinite hangs.

## Why

End-to-end `defib install -c hi3516av300 --firmware
openipc.hi3516av300-nor-neo.tgz` failed silently: the process blocked
indefinitely with no error.

The causes were two distinct hi3516cv500-family bootrom quirks, plus
`SerialTransport.write()` blocking forever on stalled writes (which
masked everything as a hang).

## SPL boundary detection (gzip + round-down)

OpenIPC's hi3516av300 universal U-Boot uses **gzip** (not LZMA) to
compress the embedded U-Boot payload. The previous `_detect_spl_size`
only scanned for LZMA, found nothing, and fell back to the profile
default `FILELEN[1]=0x6000` (24 KB). That overshoots the actual 21 KB
SPL code by 3 KB into SRAM that the bootrom uses for its own working
memory — the cv500-family bootrom hangs the moment we start overwriting
it.

PR #55's `max(detected, profile_max)` was correct for HiTool reference
SPLs (which fill the full window) and for SVB-enabled av200 (where
detected > profile_max). It is wrong for OpenIPC builds that are more
compact than HiTool's reference: we must trust the detected boundary
even when it's smaller than profile_max.

Two changes:
- Detect gzip (`1f 8b 08`) in addition to LZMA.
- Round the boundary **down** to the nearest 1 KB so we never include
any bytes of the compressed payload (the previous round-up included ~272
bytes of gzip data past the boundary).
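
A rough sketch of the gzip case with round-down, using the av300 numbers from this message. The helper names are hypothetical, and the header offset 0x52F0 is inferred from the padding range quoted further down, not taken from the code:

```python
GZIP_MAGIC = b"\x1f\x8b\x08"  # ID1, ID2, CM=deflate


def round_down_1k(offset: int) -> int:
    return offset & ~0x3FF


def round_up_1k(offset: int) -> int:
    return (offset + 0x3FF) & ~0x3FF


def detect_spl_boundary(data: bytes, profile_default: int) -> int:
    """Trust the detected boundary even when it is below the profile default."""
    pos = data.find(GZIP_MAGIC)
    if pos == -1:
        return profile_default  # no marker: fall back to the profile
    return round_down_1k(pos)   # never include compressed payload bytes
```

For a header at 0x52F0 this yields 0x5000, whereas rounding up would give 0x5400 and pull in 0x5400 - 0x52F0 = 272 bytes of gzip data. An LZMA check would slot in next to the `find()` call in the same way.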

## SPL TAIL non-fatal for av200/av300

When `prestep_data` is set (av200/av300/sendFrameForStart chips), the
SPL detaches the bootrom protocol handler as soon as it receives the
declared byte count, so the SPL TAIL frame is never ACKed. Treating that
as fatal stalled the SPL stage. A missing TAIL ACK is now non-fatal on
these chips, mirroring the existing best-effort TAIL handling for U-Boot.

## U-Boot upload: zero long 0xFF runs

After the SPL boundary fix, the U-Boot upload to DDR reproducibly hung
at chunk 21 with the *exact* same byte content. Bisection narrowed the
trigger to a 12-byte run of `0xFF` padding between the end of the SPL
code and the start of the gzip header (byte offsets `0x52E4..0x52EF` of
the av300 universal U-Boot).

Confirmed trigger:
- 11 consecutive `0xFF` bytes: PASS
- 12 consecutive `0xFF` bytes: HANG mid-DATA frame, no ACK ever
- Patching those 12 bytes to `0x00`: full 248 KB U-Boot uploads cleanly
and the rest of the install completes.

The cause is almost certainly a UART RX-path quirk in the cv500-family
bootrom. The 0xFF runs are inert padding, never executed by anything, so
zeroing them in `_send_uboot` is safe. The threshold of 12 matches the
empirically observed boundary.
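
One way to express the zeroing step, as a sketch only (the real `_zero_long_ff_runs` may be written differently; the regex approach is an assumption):

```python
import re

FF_RUN_THRESHOLD = 12  # shortest run observed to hang the upload


def zero_long_ff_runs(data: bytes, threshold: int = FF_RUN_THRESHOLD) -> bytes:
    """Replace every run of >= threshold 0xFF bytes with 0x00 bytes of equal length."""
    pattern = re.compile(rb"\xff{%d,}" % threshold)
    return pattern.sub(lambda m: b"\x00" * len(m.group()), data)
```

Runs shorter than the threshold are left untouched, and the buffer length never changes, so all offsets in the image stay valid.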

## Write timeout via pyserial `write_timeout`

Both bugs above presented as **indefinite hangs**, not errors, because
`SerialTransport.write` had no timeout. pyserial's blocking `write()`
blocked in `pselect6` forever when the kernel TX buffer stopped draining
(because the device stopped accepting bytes).

`asyncio.wait_for` on a `run_in_executor` future does not help —
cancelling the asyncio task can't interrupt a thread blocked in a
syscall. The fix is to set `port.write_timeout = 5.0` so pyserial itself
returns and raises `SerialTimeoutException`, which we map to
`TransportTimeout`. 5 s ceiling: a 1 KB write at 115200 baud is ~89 ms.
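
A simplified, synchronous sketch of that mapping. `TransportTimeout` here is a stand-in class, `write_or_timeout` and `tx_time_s` are hypothetical names, and the import fallback exists only so the snippet runs without pyserial installed:

```python
try:
    from serial import SerialTimeoutException  # pyserial
except ImportError:  # fallback so the sketch runs without pyserial
    class SerialTimeoutException(Exception):
        pass


class TransportTimeout(Exception):
    """Stand-in for the project's timeout error."""


WRITE_TIMEOUT_S = 5.0


def tx_time_s(nbytes: int, baud: int, bits_per_byte: int = 10) -> float:
    """Wire time for nbytes at 8N1 (start + 8 data + stop = 10 bits per byte)."""
    return nbytes * bits_per_byte / baud


def write_or_timeout(port, data: bytes) -> int:
    """Write with a bounded timeout; `port` is assumed to be a serial.Serial."""
    port.write_timeout = WRITE_TIMEOUT_S  # pyserial raises instead of blocking
    try:
        return port.write(data)
    except SerialTimeoutException as exc:
        raise TransportTimeout(f"write of {len(data)} bytes timed out") from exc
```

`tx_time_s(1024, 115200)` comes to about 0.089 s, so the 5 s ceiling leaves a wide margin over any healthy write.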

## Retry-loop catches write timeouts

Previously `transport.write(frame_data)` was called *outside* the retry
loop's `try/except TransportTimeout`. A transient write failure would
propagate up the stack and bypass retry. Moving the write inside the try
block makes write timeouts symmetric with read timeouts — both are
retried.
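
The shape of the change, sketched synchronously for brevity. The transport interface, the retry count, and the single-byte ACK value are assumptions for illustration, not the signature of `_send_frame_with_retry`:

```python
class TransportTimeout(Exception):
    """Stand-in for the project's timeout error."""


ACK = b"\xaa"  # assumed ACK byte, for illustration only


def send_frame_with_retry(transport, frame_data: bytes, retries: int = 3) -> bool:
    """Send one frame, retrying on both write and read timeouts."""
    for _ in range(retries):
        try:
            transport.write(frame_data)  # now inside the try: a stall is retried
            ack = transport.read(1)
        except TransportTimeout:
            continue
        if ack == ACK:
            return True
    return False
```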

## install: `--nor-size 32`

The `install` subcommand only supported 8 MB and 16 MB NOR layouts.
Added a 32 MB layout (256 KB boot, 64 KB env, 3 MB kernel, 24 MB rootfs,
rest rootfs_data). OpenIPC U-Boot defines `setnor8m` and `setnor16m` env
vars but not `setnor32m`, so for the 32 MB case we send the raw mtdparts
string inline instead of `run mtdpartsnor32m`.
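
The offsets implied by that layout can be checked with a few lines. This is a sketch: the `hi_sfc` device name and the helper names are assumptions, and the exact string sent by `install` may differ:

```python
KB = 1024
MB = 1024 * KB

# (name, size); None means "rest of flash"
NOR32_LAYOUT = [
    ("boot", 256 * KB),
    ("env", 64 * KB),
    ("kernel", 3 * MB),
    ("rootfs", 24 * MB),
    ("rootfs_data", None),
]


def partition_offsets(layout, flash_size):
    """Map partition name -> (offset, size), filling the last one to flash end."""
    offset, out = 0, {}
    for name, size in layout:
        if size is None:
            size = flash_size - offset
        out[name] = (offset, size)
        offset += size
    return out


def mtdparts(layout, device="hi_sfc"):  # device name is an assumption
    parts = ",".join(
        f"-({name})" if size is None else f"{size // KB}k({name})"
        for name, size in layout
    )
    return f"mtdparts={device}:{parts}"
```

For a 32 MB part this puts the kernel at 0x050000 and rootfs at 0x350000, leaving 0x4B0000 bytes for rootfs_data.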

## Verification

End-to-end on a real hi3516av300 (Vstarcam, IMX415):

```
Phase 1: Burning U-Boot to RAM                        ✓ 32 s
  SPL boundary detected (gzip) at 0x5000 (20480 bytes);
  profile default was 0x6000 (24576 bytes)
  _zero_long_ff_runs: zeroed 12 0xFF bytes at offset 0x52E4
  DDR step / SPL / U-Boot complete                    ✓
Phase 2: Flash via TFTP
  U-Boot   → 0x000000 (248775 B)  Flash verified DF75ECC7
  kernel   → 0x050000 (1987111 B) Flash verified 49DA597E
  rootfs   → 0x350000 (7499776 B) Flash verified AF769A05
Setting boot environment                              ✓
Resetting device                                      ✓
Install complete! Device is rebooting into OpenIPC.
```

12 new regression tests in `tests/test_protocol_standard.py`:
- `TestDetectSplSize` — gzip + LZMA detection, round-down, no-marker
fallback, scan-window bounds
- `TestZeroLongFfRuns` — threshold edge (11 vs 12), end-of-buffer,
multiple runs, no-op short-circuit, exact av300 padding pattern
- `TestSplTailNonFatalForFrameBlast` — succeeds on av300 without TAIL
ACK; still strictly fails on chips without prestep_data
- `TestWriteTimeoutRetry` — `_send_frame_with_retry` recovers from
transient write timeouts

402 tests pass. ruff clean. mypy clean on changed files.

## Test plan

- [ ] `uv run pytest tests/ -x --ignore=tests/fuzz` (402 pass)
- [ ] `uv run defib install -c hi3516av300 --firmware <openipc.tgz> -p
<port> --power-cycle --nor-size 32` against a real hi3516av300 NOR-32
board
- [ ] No regression on hi3516av200 install (uses the same
`_detect_spl_size` + `_zero_long_ff_runs` paths; av200 LZMA detection
still trusted as before)
- [ ] No regression on chips without `prestep_data` (SPL TAIL still
treated as fatal — covered by
`test_spl_tail_no_ack_fails_for_chip_without_prestep`)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Dmitry Ilyin <widgetii@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

IMX385 on Hi3516AV200