fix(dvcr): upload images as uncompressed layers to remove gzip bottle… #2552
Draft
universal-itengineer wants to merge 4 commits into
…neck
The DVCR importer compressed each image into a single gzip layer via go-containerregistry stream.NewLayer (gzip.BestSpeed, single goroutine). gzip is CPU-bound and the provisioning pod is limited to 750m CPU, so imports of large disk images were capped at ~4 MB/s.
Measured on cluster: the same source URL downloads at 133 MB/s while the importer container sits pegged at exactly its 750m CPU limit, so the network and DVCR are not the bottleneck. Disk images barely compress (qcow2 626MB->598Mi, ISO 2.9GB->2.7Gi), so an uncompressed tar layer is roughly the same size with near-zero CPU.
Replace stream.NewLayer with a custom single-pass streaming uncompressed layer (application/vnd.docker.image.rootfs.diff.tar). containerd and CDI/containers-image read uncompressed layers transparently.
Signed-off-by: Nikita Korolev <nikita.korolev@flant.com>
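The single-pass idea in this commit can be sketched with the standard library alone. This is a minimal illustration, not the PR's code: `rawLayer`, `layerFromString`, `consume`, and the two local sentinel errors are hypothetical names, the tar framing is omitted, and the real layer implements go-containerregistry's layer interface rather than this reduced one. The point it shows is that the bytes are hashed and counted while they stream to the registry, so the digest and size only exist after the upload has read everything.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"hash"
	"io"
	"strings"
)

// Media type named in the commit message for an uncompressed tar layer.
const uncompressedLayerType = "application/vnd.docker.image.rootfs.diff.tar"

// Hypothetical local sentinels used only by this sketch.
var (
	errNotComputed = errors.New("value not computed until stream is consumed")
	errConsumed    = errors.New("stream was already consumed")
)

// rawLayer passes its source through exactly once, hashing bytes on the way.
// It is not safe for concurrent use; a real layer would need locking.
type rawLayer struct {
	src      io.Reader
	h        hash.Hash
	size     int64
	consumed bool // Uncompressed has been called
	done     bool // the source reached EOF, digest and size are final
}

func newRawLayer(src io.Reader) *rawLayer {
	return &rawLayer{src: src, h: sha256.New()}
}

// layerFromString is a convenience constructor for the demo below.
func layerFromString(s string) *rawLayer { return newRawLayer(strings.NewReader(s)) }

// Uncompressed hands out the one-shot stream; a second call fails.
func (l *rawLayer) Uncompressed() (io.Reader, error) {
	if l.consumed {
		return nil, errConsumed
	}
	l.consumed = true
	return l, nil
}

// Read forwards to the source and updates the running hash and byte count.
func (l *rawLayer) Read(p []byte) (int, error) {
	n, err := l.src.Read(p)
	l.h.Write(p[:n])
	l.size += int64(n)
	if err == io.EOF {
		l.done = true
	}
	return n, err
}

// Digest doubles as the DiffID here: without compression both hash the same bytes.
func (l *rawLayer) Digest() (string, error) {
	if !l.done {
		return "", errNotComputed
	}
	return "sha256:" + hex.EncodeToString(l.h.Sum(nil)), nil
}

// Size reports the number of bytes streamed, once the stream is finished.
func (l *rawLayer) Size() (int64, error) {
	if !l.done {
		return 0, errNotComputed
	}
	return l.size, nil
}

// consume stands in for the registry upload: it drains the stream once.
func consume(l *rawLayer) error {
	r, err := l.Uncompressed()
	if err != nil {
		return err
	}
	_, err = io.Copy(io.Discard, r)
	return err
}

func main() {
	l := layerFromString("pretend these are qcow2 bytes")
	_, err := l.Digest()
	fmt.Println("before upload:", err)
	if err := consume(l); err != nil {
		panic(err)
	}
	d, _ := l.Digest()
	n, _ := l.Size()
	fmt.Println(uncompressedLayerType, d, n)
}
```

Because nothing is compressed, the only per-byte CPU cost in this sketch is the SHA-256 update, which is the property the commit relies on.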
1c27ac7 to 9466482
remote's pusher detects a streaming layer via errors.Is(err, stream.ErrNotComputed) (pusher.go writeLayer/manifest paths) and only then routes the upload through its lazy digest-after-consume path. The uncompressed layer returned its own sentinel errors, so the pusher treated "value not computed until stream is consumed" as fatal and the import failed after retries.
Return go-containerregistry's stream.ErrNotComputed (Digest/DiffID/Size) and stream.ErrConsumed (Compressed/Uncompressed) so the layer is handled identically to stream.Layer. The unit test now asserts the exported sentinels, guarding against this regression.
Signed-off-by: Nikita Korolev <nikita.korolev@flant.com>
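The failure mode described here comes down to how `errors.Is` compares sentinels: by identity, not by message text. The sketch below is stdlib-only and uses assumed names throughout. `errNotComputed` stands in for `stream.ErrNotComputed`, `errPrivateNotComputed` for the layer's own sentinel, and `classify` is a hypothetical reduction of the pusher's check, not its actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotComputed is a hypothetical stand-in for stream.ErrNotComputed.
var errNotComputed = errors.New("value not computed until stream is consumed")

// errPrivateNotComputed mimics a layer-private sentinel with identical text.
// errors.New returns a distinct value each time, so the two never match.
var errPrivateNotComputed = errors.New("value not computed until stream is consumed")

// digestFunc models a layer's Digest method before the stream has been read.
type digestFunc func() (string, error)

// classify is a reduced model of the check described in the commit message:
// only the library sentinel selects the lazy streaming path.
func classify(digest digestFunc) string {
	_, err := digest()
	switch {
	case err == nil:
		return "digest known"
	case errors.Is(err, errNotComputed):
		return "streaming upload"
	default:
		return "fatal: " + err.Error()
	}
}

func privateSentinel() (string, error) { return "", errPrivateNotComputed }

func librarySentinel() (string, error) { return "", errNotComputed }

// wrappedSentinel shows that %w wrapping still satisfies errors.Is.
func wrappedSentinel() (string, error) {
	return "", fmt.Errorf("layer digest: %w", errNotComputed)
}

func main() {
	fmt.Println(classify(privateSentinel)) // fatal: value not computed until stream is consumed
	fmt.Println(classify(librarySentinel)) // streaming upload
	fmt.Println(classify(wrappedSentinel)) // streaming upload
}
```

Asserting the exported sentinel in a unit test, as the commit does, catches the first case before it reaches a cluster.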
TEMPORARY debug hook (to be removed before merge): start net/http/pprof on :6060 in the dvcr-importer so a CPU profile can be captured during an import via kubectl port-forward + go tool pprof.
Used to locate the single-threaded ~1-core bottleneck that caps import throughput at ~4.6 MB/s regardless of the provisioning pod CPU limit.
Signed-off-by: Nikita Korolev <nikita.korolev@flant.com>
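A debug hook of this kind is typically a few lines of standard library code. The sketch below is an assumption about its general shape, not the commit's actual diff: `startPprof` and `pprofStatus` are hypothetical helpers, and the demo binds to a random loopback port instead of :6060 so it can run anywhere.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	_ "net/http/pprof" // side effect: registers /debug/pprof/* on http.DefaultServeMux
)

// startPprof serves the pprof handlers on addr in a background goroutine
// and returns the address that was actually bound.
func startPprof(addr string) (string, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return "", err
	}
	go func() {
		// A nil handler means http.DefaultServeMux, where pprof registered itself.
		_ = http.Serve(ln, nil)
	}()
	return ln.Addr().String(), nil
}

// pprofStatus fetches the pprof index page and returns the HTTP status code.
func pprofStatus(addr string) (int, error) {
	resp, err := http.Get("http://" + addr + "/debug/pprof/")
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	// The commit uses :6060; port 0 here asks the OS for any free port.
	addr, err := startPprof("127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	code, err := pprofStatus(addr)
	if err != nil {
		panic(err)
	}
	fmt.Println("pprof listening on", addr, "index status", code)
}
```

With the hook on :6060 inside the pod, a profile could then be captured with something like `kubectl port-forward <importer-pod> 6060:6060` followed by `go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"`, where the pod name is a placeholder.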
CPU profiling of the importer showed ~95% of CPU spent in GOST (Kuznyechik/MGM) TLS encryption of the layer upload to the in-cluster DVCR: gost3412128.l alone was 76% flat. The GOST cipher is pure-software (no hardware acceleration) and caps upload throughput at a few MB/s on a single core, which is the real DVCR import bottleneck (gzip was a red herring: incompressible disk images send ~the same bytes through the cipher either way).
Pin the destination TLS client to TLS 1.2 with AES-GCM cipher suites so the AES-NI accelerated cipher is negotiated instead of GOST. TLS 1.3 ignores CipherSuites, so the version is capped for the list to apply.
Signed-off-by: Nikita Korolev <nikita.korolev@flant.com>
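The pinning described in this commit maps onto three fields of `crypto/tls.Config`. The following is a sketch under assumptions: `aesGCMTLSConfig` and `newTransport` are hypothetical names, the exact suite list in the PR may differ, and upstream Go's `crypto/tls` ships no GOST suites, so the GOST negotiation presumably comes from the project's own TLS stack. Only the standard fields are shown.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
)

// aesGCMTLSConfig caps the connection at TLS 1.2 and offers only AES-GCM
// suites. The cap matters because CipherSuites is ignored for TLS 1.3.
func aesGCMTLSConfig() *tls.Config {
	return &tls.Config{
		MinVersion: tls.VersionTLS12,
		MaxVersion: tls.VersionTLS12,
		CipherSuites: []uint16{
			tls.TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
			tls.TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
			tls.TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,
			tls.TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,
		},
	}
}

// newTransport builds an HTTP transport for the registry client that uses
// the pinned TLS settings.
func newTransport() *http.Transport {
	return &http.Transport{TLSClientConfig: aesGCMTLSConfig()}
}

func main() {
	cfg := newTransport().TLSClientConfig
	for _, id := range cfg.CipherSuites {
		fmt.Println(tls.CipherSuiteName(id))
	}
}
```

The trade-off is that the client gives up TLS 1.3 on this one in-cluster connection in exchange for a cipher with hardware acceleration.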
Description
Why do we need it, and what problem does it solve?
What is the expected result?
Checklist
Changelog entries