Use xxh128 instead of sha256 for file content hashing #5842
All hash_file() calls here are cache-invalidation keys, not security boundaries. xxh128 hashes a file in ~31 µs vs ~266 µs for sha256, which adds up across result-cache validation, source-locator scans, per-identifier reflection cache keys, and PHPDoc cache validation. Existing caches invalidate once (different hash values), then rebuild.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
533e342 to 86d63c3
How much is the end-to-end impact?
Will get back to you with the exact impact. It was tested on PHP 8.4+, but mostly as part of a larger set of performance improvements, which I then split into several smaller PRs so each is easier to review and accept or reject. Putting it in draft for now.
Good question, and it sent me down exactly the right rabbit hole. I tested on PHP 8.5.7 on arm64 (Apple M4 Pro), and that turns out to be the crux: the PHP 8.4 sha256 optimization from php/php-src#15152 covers x86/x86_64 only (SHA-NI plus an SSE2 fallback). On arm64, php-src still runs the portable C implementation. Throughput on this machine:
That said, you are right to push on end-to-end impact, because for the real workload it is much smaller than the raw throughput gap suggests. PHPStan hashes many small files (3.4 KB average in src/), so file-open overhead dominates. hyperfine end to end (base = ff2647a vs this PR, self-analysis at level 8, caches primed for warm runs and wiped per run for cold):
So on arm64 this is roughly −2% wall and −3% CPU on a cold run, and −2% on a warm run. On x86 with SHA-NI the sha256 side gets 2–5x faster and the delta shrinks accordingly, probably into the noise. Honest summary: the win is real but small, and it is biggest exactly where php-src's sha256 acceleration does not reach, namely Apple Silicon dev machines and Graviton CI runners. After the earlier CI failures, the PR now picks the algorithm per PHP version (xxh128 on 8.1+, sha256 below), so pre-8.1 behaviour is unchanged. If you don't think it clears the bar, closing it is fine by me; none of the other PRs depend on it functionally.
This pull request has been marked as ready for review. |
What & why
Every hash_file() call changed here produces a cache-invalidation key, not a security boundary: result cache file hashes, source locator directory scans, per-identifier reflection cache keys, PHPDoc cache validation, container config hashes, and FileMonitor.
The algorithm is picked per PHP version (PHPStan\Internal\FileHashing::ALGORITHM): xxh128 on PHP 8.1+, where it exists, and sha256 below, so 7.4–8.0 behaviour is unchanged.
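A minimal sketch of what a per-version constant like this could look like, assuming it is a plain class constant. The body below is hypothetical, not PHPStan's actual FileHashing source; the class is declared in the global namespace for brevity, and hashFileForCache() is an illustrative helper name. It relies only on PHP_VERSION_ID and on xxh128 being part of ext/hash from PHP 8.1 onward.

```php
<?php

declare(strict_types=1);

// Hypothetical sketch: the PR names PHPStan\Internal\FileHashing::ALGORITHM,
// but this body is an assumption, not the real PHPStan source.
final class FileHashing
{
    // xxh128 ships with ext/hash from PHP 8.1 (PHP_VERSION_ID 80100);
    // older runtimes keep sha256 so their cache keys stay unchanged.
    public const ALGORITHM = PHP_VERSION_ID >= 80100 ? 'xxh128' : 'sha256';
}

// Hypothetical helper: call sites pass the shared constant instead of a
// hard-coded 'sha256' literal.
function hashFileForCache(string $path): string
{
    $hash = hash_file(FileHashing::ALGORITHM, $path);
    if ($hash === false) {
        throw new RuntimeException(sprintf('Cannot read %s', $path));
    }

    return $hash;
}
```

Each call site then changes from hash_file('sha256', $path) to hash_file(FileHashing::ALGORITHM, $path). Because the choice lives in a constant expression, it is evaluated once rather than re-checked on every call.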
The win is architecture-dependent. php-src's PHP 8.4 sha256 acceleration (php/php-src#15152) is x86/x86_64-only (SHA-NI + SSE2); on arm64 (Apple Silicon dev machines, Graviton CI) sha256 still runs the portable C code at ~268 MB/s vs ~25 GB/s for xxh128 (PHP 8.5.7, M4 Pro). For PHPStan's real workload (many small files, 3.4 KB average in src/), file-open overhead dominates, so a full hashing pass over the 1707-file src/ tree goes from 83 ms to 48 ms.

Benchmarks
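A rough way to reproduce a "full hashing pass" measurement, assuming a local checkout with a src/ directory. hashingPass() is a hypothetical helper and this is not the harness behind the 83 ms / 48 ms figures; it times one hash_file() call per .php file, so for small files it mostly measures open/read overhead rather than raw hash throughput.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: hash every .php file under $dir once with $algo.
 *
 * @return array{files: int, ms: float}
 */
function hashingPass(string $dir, string $algo): array
{
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );

    $count = 0;
    $start = hrtime(true); // monotonic clock, nanoseconds
    foreach ($iterator as $file) {
        /** @var SplFileInfo $file */
        if ($file->getExtension() !== 'php') {
            continue;
        }
        // One open/read/close per file; for ~3.4 KB files this I/O is
        // a large share of the per-file cost.
        hash_file($algo, $file->getPathname());
        $count++;
    }

    return ['files' => $count, 'ms' => (hrtime(true) - $start) / 1e6];
}

// Only runs when invoked from the CLI with a directory argument.
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    foreach (['sha256', 'xxh128'] as $algo) {
        if (!in_array($algo, hash_algos(), true)) {
            continue; // xxh128 is only available on PHP 8.1+
        }
        $result = hashingPass($argv[1], $algo);
        printf("%-7s %5d files %8.1f ms\n", $algo, $result['files'], $result['ms']);
    }
}
```

Run it as php hash-pass.php src (the script name is arbitrary). The first pass also warms the OS page cache, so repeat it a few times, or wrap it in hyperfine, before comparing numbers.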
hyperfine end to end on arm64 (base = ff2647a, self-analysis level 8; caches primed for warm runs, wiped per run for cold):

[Table: results for src/Type (383 files) and src (1707 files)]

Roughly −2% wall / −3% CPU cold and −2% warm on arm64; on x86 with SHA-NI the delta
shrinks accordingly. Modest, but the cost is a one-line change per site. Existing caches
invalidate once because the hash values differ, then rebuild as usual.
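The one-time invalidation follows from how a hash-keyed cache is validated. In this hypothetical sketch (isEntryFresh() is an illustrative name, not PHPStan's result-cache API), an entry is reused only while its stored digest equals the file's current digest. A sha256 digest written by the previous version (64 hex characters) can never match an xxh128 digest (32 hex characters), so every entry is rebuilt exactly once and validates normally afterwards.

```php
<?php

declare(strict_types=1);

// Hypothetical validation check: reuse a cached entry only while the digest
// stored with it still matches the file's current digest.
function isEntryFresh(string $storedHash, string $path, string $algo): bool
{
    $current = hash_file($algo, $path);

    return $current !== false && $current === $storedHash;
}

// Demonstration; needs xxh128, i.e. PHP 8.1+.
if (in_array('xxh128', hash_algos(), true)) {
    $file = tempnam(sys_get_temp_dir(), 'rc');
    file_put_contents($file, '<?php echo 1;');

    // Digest written by a release that still used sha256.
    $stored = hash_file('sha256', $file);
    var_dump(isEntryFresh($stored, $file, 'xxh128')); // bool(false): stale once

    // The entry is rebuilt with an xxh128 digest and validates from then on.
    $stored = hash_file('xxh128', $file);
    var_dump(isEntryFresh($stored, $file, 'xxh128')); // bool(true)

    unlink($file);
}
```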
Tests
make tests (12,711, green), make phpstan (full self-analysis), make cs, make lint and make composer-dependency-analyser all pass. Analysis output is byte-identical to the baseline.
🤖 Generated with Claude Code