Content app OOMs serving large S3 artifacts with REDIRECT_TO_OBJECT_STORAGE=False #7806

@vycius

Version

  • pulpcore: 3.114.0
  • Install method: pulp-oci-images container

Describe the bug

When the storage backend is S3 (django-storages) and REDIRECT_TO_OBJECT_STORAGE = False, downloading a large artifact through the content app causes the gunicorn content worker to be killed (timeout / OOM) and the client receives a 502.

Root cause: in this configuration the content app serves the artifact through ArtifactResponse (pulpcore/content/handler.py:1122-1123). ArtifactResponse._sendfile (pulpcore/responses.py:35) reads in 256 KB chunks from self._artifact.file, but that file object is a django-storages S3File. The first seek/read triggers S3File._get_file(), which calls obj.download_fileobj() and materializes the entire object into a SpooledTemporaryFile(max_size=AWS_S3_MAX_MEMORY_SIZE) before returning a single byte. Since AWS_S3_MAX_MEMORY_SIZE defaults to 0, that spooled buffer never rolls over to disk and the whole object is held in memory.

So the chunked _sendfile provides no streaming benefit for S3: the worker tries to buffer the full artifact in RAM (→ OOM kill), and because download_fileobj is one long blocking call, the worker also stops responding to gunicorn's heartbeat (→ WORKER TIMEOUT).

Setting REDIRECT_TO_OBJECT_STORAGE = True avoids this entirely because pulp returns a 302 to a presigned URL and no bytes pass through the worker.
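
The rollover behavior described above can be checked in isolation with the standard library alone. This is a minimal sketch, not pulpcore or django-storages code: the helper name spooled_rolls_over is made up for illustration, and it inspects _rolled, a private CPython attribute of SpooledTemporaryFile, purely for demonstration.

```python
import tempfile

ONE_MIB = 1024 * 1024


def spooled_rolls_over(max_size: int, payload_size: int) -> bool:
    """Write payload_size bytes into a spooled file and report whether it moved to disk.

    Hypothetical helper for illustration only. `_rolled` is a private CPython
    attribute, used here because there is no public way to ask the question.
    """
    with tempfile.SpooledTemporaryFile(max_size=max_size) as spool:
        spool.write(b"\0" * payload_size)
        return spool._rolled


if __name__ == "__main__":
    # max_size=0 disables the size-based rollover: the data stays in memory.
    print("max_size=0    rolled to disk:", spooled_rolls_over(0, ONE_MIB))
    # A non-zero threshold smaller than the payload spills to a temp file.
    print("max_size=1024 rolled to disk:", spooled_rolls_over(1024, ONE_MIB))
```

With max_size=0 the buffer grows with the payload, which matches the memory growth reported here once the payload is an entire S3 object.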

Observed worker log:

[2026-06-17 08:59:42 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:794531)
[2026-06-17 08:59:43 +0000] [1] [ERROR] Worker (pid:794531) was sent SIGKILL! Perhaps out of memory?

To Reproduce

  1. Configure pulpcore with S3 object storage (STORAGES/AWS_*) and leave AWS_S3_MAX_MEMORY_SIZE at its default (0).
  2. Set REDIRECT_TO_OBJECT_STORAGE = False.
  3. Upload a large file to a pulp_file repository — large enough to exceed the content worker's available memory.
  4. Publish/distribute it and download via the content app: curl -fL -o /dev/null https://<pulp>/pulp/content/<distribution>/<file>.
  5. Observe the download stall, the content worker get SIGKILLed (WORKER TIMEOUT / OOM as above), and the client receive a 502.
  6. Set REDIRECT_TO_OBJECT_STORAGE = True and repeat — the download succeeds.
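
For steps 1 and 2, the relevant settings might look roughly like the fragment below. It is a sketch under assumptions, not a complete pulp configuration: the bucket name, region and credentials are placeholders, the staticfiles entry is assumed, and max_memory_size is the django-storages OPTIONS key that corresponds to AWS_S3_MAX_MEMORY_SIZE (written out here only to make the default explicit).

```python
# Hypothetical settings fragment; every "<...>" value is a placeholder.
STORAGES = {
    "default": {
        "BACKEND": "storages.backends.s3.S3Storage",
        "OPTIONS": {
            "bucket_name": "<bucket>",
            "access_key": "<access-key>",
            "secret_key": "<secret-key>",
            "region_name": "<region>",
            # Default value, equivalent to leaving AWS_S3_MAX_MEMORY_SIZE unset:
            # the spooled download buffer never rolls over to disk.
            "max_memory_size": 0,
        },
    },
    "staticfiles": {
        "BACKEND": "django.contrib.staticfiles.storage.StaticFilesStorage",
    },
}

# Serve artifact bytes through the content app instead of redirecting
# clients to a presigned object storage URL.
REDIRECT_TO_OBJECT_STORAGE = False
```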

Expected behavior

With REDIRECT_TO_OBJECT_STORAGE = False, the content app should stream the artifact from object storage with bounded (chunk-sized) memory regardless of artifact size, rather than buffering the entire object in the worker. Serving a large artifact should not OOM or time out the worker.

Additional context

The chunking in ArtifactResponse._sendfile is effectively defeated for cloud backends because django-storages' S3File eagerly downloads the whole object on first access. A fix would bypass that file object on the serve path and stream directly from boto3's StreamingBody (e.g. a ranged GetObject), preserving the per-object download params django-storages would normally pass (SSE-C, RequestPayer, VersionId). The same buffering pattern affects the Azure and GCS backends under REDIRECT_TO_OBJECT_STORAGE = False, and probably other network storage backends as well. As a workaround, REDIRECT_TO_OBJECT_STORAGE = True resolves it whenever clients can reach object storage directly.
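
The streaming approach suggested above could be sketched as follows. This is an illustration under assumptions, not a proposed patch: iter_s3_object and its parameters are hypothetical names, client is assumed to be a boto3 S3 client (anything exposing get_object works), and extra_params stands in for whatever per-object arguments django-storages would pass, such as VersionId or RequestPayer.

```python
from typing import Any, Iterator, Optional

CHUNK_SIZE = 256 * 1024  # same 256 KB chunk size the issue cites for _sendfile


def iter_s3_object(
    client: Any,
    bucket: str,
    key: str,
    *,
    byte_range: Optional[str] = None,
    extra_params: Optional[dict] = None,
    chunk_size: int = CHUNK_SIZE,
) -> Iterator[bytes]:
    """Yield an S3 object's bytes in bounded chunks (hypothetical helper).

    `client` is assumed to behave like a boto3 S3 client. At most one chunk
    is held by this generator at a time, instead of the whole object.
    """
    params = dict(extra_params or {})
    params.update(Bucket=bucket, Key=key)
    if byte_range is not None:
        # HTTP Range syntax, e.g. "bytes=0-1048575"
        params["Range"] = byte_range
    body = client.get_object(**params)["Body"]  # botocore StreamingBody
    try:
        for chunk in body.iter_chunks(chunk_size=chunk_size):
            yield chunk
    finally:
        # Release the underlying HTTP connection even if the consumer stops early.
        body.close()
```

These boto3 calls are blocking, so a real handler would presumably need to run them off the content app's request path (for example in an executor) to keep the worker responsive. That part is left out of the sketch.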

Issue creation was assisted by Claude Opus 4.8
