fix(dvcr): upload images as uncompressed layers to remove gzip bottleneck #2552

Draft
universal-itengineer wants to merge 4 commits into main from fix/dvcr-importer-uncompressed-layer

@universal-itengineer

Member

The DVCR importer compressed each image into a single gzip layer via go-containerregistry stream.NewLayer (gzip.BestSpeed, single goroutine). gzip is CPU-bound and the provisioning pod is limited to 750m CPU, so imports of large disk images were capped at ~4 MB/s.

Measured on cluster: the same source URL downloads at 133 MB/s while the importer container sits pegged at exactly its 750m CPU limit, so the network and DVCR are not the bottleneck. Disk images barely compress (qcow2 626MB->598Mi, ISO 2.9GB->2.7Gi), so an uncompressed tar layer is roughly the same size with near-zero CPU.

Replace stream.NewLayer with a custom single-pass streaming uncompressed layer (application/vnd.docker.image.rootfs.diff.tar). containerd and CDI/containers-image read uncompressed layers transparently.

Description

Why do we need it, and what problem does it solve?

What is the expected result?

Checklist

  • The code is covered by unit tests.
  • e2e tests passed.
  • Documentation updated according to the changes.
  • Changes were tested in the Kubernetes cluster manually.

Changelog entries

section:
type:
summary:

Signed-off-by: Nikita Korolev <nikita.korolev@flant.com>

universal-itengineer force-pushed the fix/dvcr-importer-uncompressed-layer branch from 1c27ac7 to 9466482 on June 29, 2026, 14:11

The remote package's pusher detects a streaming layer via
errors.Is(err, stream.ErrNotComputed) (pusher.go, writeLayer/manifest
paths) and only then routes the upload through its lazy
digest-after-consume path. The uncompressed layer returned its own
sentinel errors, so the pusher treated "value not computed until stream
is consumed" as fatal and the import failed after retries.

Return go-containerregistry's stream.ErrNotComputed (Digest/DiffID/Size)
and stream.ErrConsumed (Compressed/Uncompressed) so the layer is handled
identically to stream.Layer. The unit test now asserts the exported
sentinels, guarding against this regression.

Signed-off-by: Nikita Korolev <nikita.korolev@flant.com>
TEMPORARY debug hook (to be removed before merge): start net/http/pprof
on :6060 in the dvcr-importer so a CPU profile can be captured during an
import via kubectl port-forward + go tool pprof. Used to locate the
single-threaded ~1-core bottleneck that caps import throughput at
~4.6 MB/s regardless of the provisioning pod CPU limit.

Signed-off-by: Nikita Korolev <nikita.korolev@flant.com>
CPU profiling of the importer showed ~95% of CPU spent in GOST
(Kuznyechik/MGM) TLS encryption of the layer upload to the in-cluster
DVCR: gost3412128.l alone was 76% flat. The GOST cipher is pure-software
(no hardware acceleration) and caps upload throughput at a few MB/s on a
single core, which is the real DVCR import bottleneck (gzip was a red
herring: incompressible disk images send ~the same bytes through the
cipher either way).

Pin the destination TLS client to TLS 1.2 with AES-GCM cipher suites so
the AES-NI accelerated cipher is negotiated instead of GOST. TLS 1.3
ignores CipherSuites, so the version is capped for the list to apply.

Signed-off-by: Nikita Korolev <nikita.korolev@flant.com>