
Use xxh128 instead of sha256 for file content hashing #5842

Closed

SanderMuller wants to merge 1 commit into phpstan:2.2.x from SanderMuller:perf/xxh128-file-hashing

SanderMuller (Contributor) commented Jun 11, 2026

What & why

Every hash_file() call changed here produces a cache-invalidation key, not a security
boundary: result cache file hashes, source locator directory scans, per-identifier
reflection cache keys, PHPDoc cache validation, container config hashes, and FileMonitor.

The algorithm is picked per PHP version (PHPStan\Internal\FileHashing::ALGORITHM):
xxh128 on PHP 8.1+ where it exists, sha256 below, so 7.4–8.0 behaviour is unchanged.
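A minimal sketch of what a per-version switch like this can look like. Only the name `FileHashing::ALGORITHM` comes from the PR; the class body and the `hashFile()` helper below are hypothetical and may not match PHPStan's actual code. It relies on `xxh128` being available in ext/hash from PHP 8.1 onward.

```php
<?php

// Hypothetical sketch of a per-version algorithm switch; in PHPStan the constant
// lives on PHPStan\Internal\FileHashing, but its real body may differ.
final class FileHashing
{
    // xxh128 was added to ext/hash in PHP 8.1; older versions keep sha256.
    public const ALGORITHM = PHP_VERSION_ID >= 80100 ? 'xxh128' : 'sha256';

    // Hypothetical helper: hex digest of a file's contents, or null if it cannot be read.
    public static function hashFile(string $path): ?string
    {
        $hash = @hash_file(self::ALGORITHM, $path);

        return $hash === false ? null : $hash;
    }
}

// Usage: the digest only serves as a cache-invalidation key, never as a security boundary.
echo FileHashing::ALGORITHM, ': ', FileHashing::hashFile(__FILE__), PHP_EOL;
```

Centralising the choice in one constant is what keeps the change at "one line per site": each call passes the constant to hash_file() instead of a hard-coded 'sha256'.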

The win is architecture-dependent. php-src's PHP 8.4 sha256 acceleration
(php/php-src#15152) is x86/x86_64-only (SHA-NI + SSE2). On arm64 (Apple Silicon dev
machines, Graviton CI), sha256 still runs the portable C code at ~268 MB/s vs
~25 GB/s for xxh128 (PHP 8.5.7, M4 Pro). For PHPStan's real workload (many small files,
3.4 KB average in src/), file-open overhead dominates, so a full hashing pass over the
1707-file src/ tree goes from 83 ms to 48 ms.
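One rough way to reproduce a "full hashing pass" measurement is to walk a source tree and time one hash_file() call per file for each algorithm. This is a hypothetical sketch, not the script behind the numbers above: the `hashingPassMs()` name and the command-line argument are assumptions, and timings will vary with CPU, PHP version and filesystem cache state.

```php
<?php

// Hypothetical benchmark sketch: time one hash_file() pass over every .php file in a tree.
function hashingPassMs(string $dir, string $algo): float
{
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );

    $start = hrtime(true); // monotonic clock, nanoseconds
    foreach ($files as $file) {
        /** @var SplFileInfo $file */
        if ($file->isFile() && $file->getExtension() === 'php') {
            hash_file($algo, $file->getPathname());
        }
    }

    return (hrtime(true) - $start) / 1e6; // nanoseconds to milliseconds
}

// Usage: php bench.php path/to/src
if (isset($argv[1])) {
    foreach (['sha256', 'xxh128'] as $algo) {
        if (in_array($algo, hash_algos(), true)) {
            printf("%-7s %8.1f ms\n", $algo, hashingPassMs($argv[1], $algo));
        }
    }
}
```

Because every file is opened and read regardless of algorithm, the fixed per-file I/O cost is included in both measurements, which is why the gap on small files is far smaller than the raw throughput gap.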

Benchmarks

hyperfine end to end on arm64 (base = ff2647a, self-analysis level 8; caches primed for
warm runs, wiped per run for cold):

| Scenario | Base | PR | Delta |
|---|---|---|---|
| cold, src/Type (383 files) | 8.778 ± 0.172 s | 8.637 ± 0.092 s | −141 ms wall, −1.15 s user CPU |
| warm, src (1707 files) | 583.4 ± 6.4 ms | 571.9 ± 6.8 ms | −12 ms wall, −28 ms user CPU |

Roughly −2% wall / −3% CPU cold and −2% warm on arm64; on x86 with SHA-NI the delta
shrinks accordingly. Modest, but the cost is a one-line change per site. Existing caches
invalidate once because the hash values differ, then rebuild as usual.

Tests

make tests (12,711 tests), make phpstan (full self-analysis), make cs, make lint
and make composer-dependency-analyser all pass. Analysis output is byte-identical to the
baseline.

🤖 Generated with Claude Code

All hash_file() calls here are cache-invalidation keys, not security
boundaries. xxh128 hashes a file in ~31 µs vs ~266 µs for sha256, which
adds up across result-cache validation, source-locator scans,
per-identifier reflection cache keys, and PHPDoc cache validation.

Existing caches invalidate once (different hash values), then rebuild.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
SanderMuller force-pushed the perf/xxh128-file-hashing branch from 533e342 to 86d63c3 on June 11, 2026 08:25
staabm (Contributor) commented Jun 11, 2026

we use sha256 because it is hardware optimized in PHP 8.4+, see #4656
in which php version did you test this PR?

how much is the end to end impact?

SanderMuller marked this pull request as draft June 11, 2026 08:38
SanderMuller (Contributor, Author)

> we use sha256 because it is hardware optimized in PHP 8.4+, see #4656
> in which php version did you test this PR?
>
> how much is the end to end impact?

I'll get back to you with the exact impact. It was tested on PHP 8.4+, but mostly as part of a larger set of performance improvements that I then split into several smaller PRs, so each one is easier to review and accept or reject. Putting it in draft for now.

SanderMuller (Contributor, Author)

Good question, and it sent me down exactly the right rabbit hole. I tested on PHP 8.5.7 on arm64 (Apple M4 Pro), and that turns out to be the crux: the PHP 8.4 sha256 optimization from php/php-src#15152 covers x86/x86_64 only (SHA-NI plus an SSE2 fallback). On arm64, php-src still runs the portable C implementation. Throughput on this machine:

| Algorithm | hash() throughput |
|---|---|
| sha256 | 268 MB/s |
| sha1 | 987 MB/s |
| xxh128 | 25.3 GB/s |
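A minimal sketch of how in-memory throughput figures like these can be measured: hash one large buffer several times and keep the fastest round, so file-open cost is excluded entirely. The `throughputMBps()` name, the 16 MB buffer and the round count are assumptions; results depend heavily on CPU (on x86 with SHA-NI, sha256 will be much faster than on arm64) and PHP version.

```php
<?php

// Hypothetical micro-benchmark: in-memory hash() throughput, so file-open cost is excluded.
function throughputMBps(string $algo, int $bytes = 16 * 1024 * 1024, int $rounds = 3): float
{
    $data = str_repeat("\x5a", $bytes);
    $bestNs = PHP_INT_MAX;

    for ($i = 0; $i < $rounds; $i++) {
        $start = hrtime(true);
        hash($algo, $data);
        $bestNs = min($bestNs, hrtime(true) - $start); // keep the fastest round
    }

    // Guard against a zero reading on coarse clocks, then convert to MB/s.
    return ($bytes / 1e6) / (max($bestNs, 1) / 1e9);
}

foreach (['sha256', 'sha1', 'xxh128'] as $algo) {
    if (in_array($algo, hash_algos(), true)) {
        printf("%-7s %10.0f MB/s\n", $algo, throughputMBps($algo));
    }
}
```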

That said, you are right to push on end-to-end impact, because for the real workload it is much smaller than the raw throughput gap suggests. PHPStan hashes many small files (src/ here: 1707 files, 5.6 MB, 3.4 KB average), and at that size the file-open overhead dominates: a full hashing pass over that tree takes 83 ms with sha256 vs 48 ms with xxh128 — 1.7x, not 95x.

hyperfine end to end (base = ff2647a vs this PR, self-analysis at level 8, caches primed for warm runs and wiped per run for cold, --runs 10..15):

| Scenario | Base | PR | Delta |
|---|---|---|---|
| cold, src/Type (383 files) | 8.778 ± 0.172 s | 8.637 ± 0.092 s | −141 ms wall, −1.15 s user CPU |
| warm, src (1707 files) | 583.4 ± 6.4 ms | 571.9 ± 6.8 ms | −12 ms wall, −28 ms user CPU |

So on arm64 this is roughly −2% wall and −3% CPU on a cold run, −2% on a warm run. On x86 with SHA-NI the sha256 side gets 2–5x faster and the delta shrinks accordingly, probably into the noise.

Honest summary: the win is real but small, and it is biggest exactly where php-src's sha256 acceleration does not reach — Apple Silicon dev machines and Graviton CI runners. After the earlier CI failures the PR now picks the algorithm per PHP version (xxh128 on 8.1+, sha256 below), so pre-8.1 behaviour is unchanged. If you don't think it clears the bar, closing it is fine by me; none of the other PRs depend on it functionally.

SanderMuller marked this pull request as ready for review June 11, 2026 09:08
phpstan-bot (Collaborator)

This pull request has been marked as ready for review.
