feat: data edit operations — batch delete, copy rejection, wider write headers#88
Conversation
Extends data-edit support beyond the existing single-object write/delete and
multipart paths:
- DeleteObjects (`POST /{bucket}?delete`): new `S3Operation::DeleteObjects`,
parsed from the `delete` subresource. The keys live in the body, so it goes
through the `NeedsBody` path. `api/delete.rs` owns the pure XML parse/build/
merge; `proxy.rs::execute_delete_objects` authorizes each key individually
(S3 partial-result semantics — denied keys become per-key `AccessDenied`
entries and never reach the backend) and forwards the rest with the required
`Content-MD5` (new `md-5` dep). S3-gated like multipart.
- Authorization: `authorize` does a coarse bucket-level check for DeleteObjects
(it carries no key); new `auth::key_authorized` does the per-key prefix check.
`PendingRequest` now carries the resolved identity so the body handler can run
it. Anonymous callers still cannot write.
- CopyObject/UploadPartCopy: a `PUT` with `x-amz-copy-source` is now rejected
with `501 NotImplemented` (new `ProxyError::NotImplemented`) instead of being
mis-parsed as an empty PutObject that would overwrite the destination.
- PutObject: forward standard HTTP entity headers (Content-Disposition,
-Encoding, -Language, Cache-Control, Expires) alongside type/length/md5.
`x-amz-*` write headers stay dropped — S3 rejects unsigned `x-amz-*` on
presigned URLs, so those need a header-signing path (deferred).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- reference/operations.md: add DeleteObjects, document per-key batch-delete authorization, write request-header forwarding, and the new limitations (server-side copy unsupported, x-amz-* write headers dropped, versioned delete ignored). - integration: add a batch-delete roundtrip test (put 3 → delete_objects → assert all gone). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
📖 Docs preview deployed to https://multistore-docs-pr-88.development-seed.workers.dev
|
Multipart had zero end-to-end coverage — and the integration static credential and github-actions role weren't even granted the multipart actions, so the path could never have been tested. - Grant create/upload_part/complete/abort_multipart_upload on private-uploads for both the static credential and the github-actions role. - Add TestMultipartUploads: a high-level transfer-manager round-trip (forces Create + 2x UploadPart + Complete), an explicit low-level round-trip, and an Abort test. Each verifies the object bytes (or that an aborted upload cannot be completed). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
🚀 Latest commit deployed to https://multistore-proxy-pr-88.development-seed.workers.dev
|
`canonicalize_query_string` sorted the raw `&`-split parts but never normalized a value-less flag parameter to `key=value`. SigV4 requires `?uploads` to canonicalize as `uploads=` (trailing `=`); clients and backends both sign it that way. The proxy reconstructed `uploads`, so every operation whose query is a value-less flag failed signature verification. This silently broke CreateMultipartUpload (`?uploads`) and DeleteObjects (`?delete`) on both sides of the proxy — they were never exercised by the test suite, so it went unnoticed. The integration tests added alongside this change surface it as `SignatureDoesNotMatch`. - `auth/sigv4.rs`: append `=` to value-less params (and drop empty segments); unit tests for flag, valued, mixed, and empty-segment cases. - `backend/request_signer.rs`: route the outbound canonical query through the same helper so the proxy's own signature to the backend matches too. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- test_put_larger_body_single_request: a 2 MiB single PUT round-trips intact (non-trivial, streamed, still below the multipart threshold). - test_put_preserves_content_headers: Content-Type / Content-Disposition / Cache-Control set on PUT survive the round-trip, exercising the widened PUT forward allowlist. `x-amz-meta-*` is intentionally not asserted (deferred). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
`send_raw` read the response body via `worker_resp.bytes().await` and *then*
converted `worker::Response` into `web_sys::Response` to read the headers. That
conversion panics ("unreachable") once the body has been consumed, so every
`send_raw` response aborted the worker with a bare 500 — breaking all multipart
operations and batch delete on the edge runtime. It went unnoticed because no
test exercised the raw-HTTP path on Workers.
Convert to `web_sys::Response` and read the headers first (as `forward()`
already does), then read the body via `arrayBuffer()`. Verified end-to-end
against `wrangler dev` + MinIO: multipart round-trip, abort, and batch delete
all pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Running the integration suite locally previously meant six manual steps (docker compose, worker-build, wrangler dev, wait, pytest, cleanup). Wrap them in a single command that mirrors CI: MinIO via docker compose behind the Workers runtime (wrangler dev), with wrangler torn down on exit. - scripts/integration-test.sh: orchestrates the stack; polls a seeded public object for MinIO readiness (avoids `--wait`, which fails on the one-shot seeder), auto-installs worker-build if missing, and traps cleanup. - Makefile: `make test-integration` (extra pytest args via ARGS). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Document that the Workers runtime is bound by Cloudflare's request-body size limit (100 MB Free/Pro … 500 MB Enterprise), enforced at the edge with a 413. Guidance: use multipart for large objects (only part size must stay under the limit) and configure clients to chunk below it. Cross-references the streaming UploadPart follow-up (#89) and notes the native/Lambda runtimes differ. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Cloudflare Workers rejects request bodies over the plan limit with an opaque edge 413 before the proxy can shape the response. Give clients an actionable S3 error instead: when configured, reject a `PutObject`/`UploadPart` whose declared `Content-Length` exceeds a maximum with `EntityTooLarge` (HTTP 400). - core: add `ProxyError::EntityTooLarge` (→ S3 `EntityTooLarge`, 400) and `ProxyGateway::with_max_request_body_size`. `check_upload_size` runs in the PutObject and UploadPart dispatch arms; no limit (the default) preserves current behavior, and requests without a `Content-Length` fall through to the runtime's own limit. Unit tests cover over/under/no-limit for both ops. - cf-workers example: read the limit from the `MAX_UPLOAD_BYTES` env var. - integration: set `MAX_UPLOAD_BYTES=10 MiB` in the test config (above the part sizes the multipart tests use) and assert a 12 MiB PUT is rejected with `EntityTooLarge`. - docs: document the knob under the Workers upload-size limitations. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Small simplifications, no behavior change: - `build_object_path` was a duplicate of `apply_backend_prefix` with a Path return type; collapse it to `Path::from(apply_backend_prefix(...))`. - `with_max_request_body_size` took `Option<u64>` but every caller passed `Some(_)`; take `u64` and wrap internally. - Drop the unused `Default` derive on `BackendOutcome` (only ever built with an explicit literal). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
End-to-end coverage for the two behaviors that were only unit-tested: - Add a prefix-restricted static credential (`AKTEST000000000002`, scoped to the `allowed/` prefix on private-uploads). - test_batch_delete_partial_authorization: a batch delete of one in-scope and one out-of-scope key deletes the allowed one, reports the other as AccessDenied, and verifies the out-of-scope key is NOT deleted (the security property) while the in-scope one is gone. - test_copy_object_rejected: copy_object (x-amz-copy-source) returns 501 NotImplemented and does not create the destination. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
`make test-integration` started MinIO but left it running, so a one-off run leaked containers. Now it tears MinIO down on exit — but only if the script started it. If MinIO is already up (a dev iterating with `docker compose up -d`), it's left warm so repeated runs stay fast and we never stop a stack we didn't start. wrangler dev is always stopped on exit. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
No behavior change (covered by existing unit + integration tests):
- Unify the S3-backend gate: rename `supports_s3_multipart` → `is_s3_backend`
and use it in both the multipart and batch-delete dispatch arms (the latter
hand-rolled the same `parsed_backend_type() == S3` check).
- `canonicalize_query_string`: use `Cow<str>` so the common `key=value` params
aren't copied on the per-request signing/verification path (only value-less
flags allocate).
- `strip_backend_prefix`: strip `{prefix}/` via `strip_prefix` chaining instead
of a per-key `format!` allocation.
- Docs: the `multipart`/`request_signer` module docs and the cf-workers
architecture comment now note batch delete also uses the raw-signed path;
`with_max_request_body_size` docs note it covers DeleteObjects too.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The batch-delete wire-format helpers were all `pub`, but nothing outside the core crate uses them — they're internal plumbing an integrator would never call (integrators build custom resolvers/backends, not S3 XML). Narrow them to `pub(crate)`, matching the sibling `api::list` module, which scopes its equivalent `build_list_xml`/`build_list_prefix` helpers `pub(crate)`. No behavior change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
`check_upload_size` (added in this PR) re-implemented the `content-length` header parse that `handle_request` already had as a nested fn. Extract one module-level `content_length(&HeaderMap) -> Option<u64>` and use it in both; `response_body_bytes` stays nested (genuinely single-use). No behavior change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
execute_delete_objects hard-coded per-key authorization against the caller's STS scopes via auth::key_authorized, while the coarse path (get_bucket) lets the registry own authorization. A registry that authorizes through an external API — resolving permission at get_bucket time, with empty STS scopes — had no way to allow batch-delete keys; every key was denied even for authorized callers. Add BucketRegistry::authorize_key with a default that preserves the existing scope check, and call it from the gateway instead of the free function. API-driven registries can override it to honor the bucket-level decision they already made in get_bucket. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Pushed a small follow-on commit to this branch: Why: |
MappedRegistry forwards get_bucket/list_buckets/bucket_owner but inherited the default authorize_key, which bypasses the inner registry entirely (it checks the caller's STS scopes directly). An inner registry that overrides authorize_key — e.g. one authorizing batch-delete keys via an external API — was silently ignored, so every key fell to the scope check. Forward it like the other methods. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What I'm changing
The gateway already proxies single-object writes, deletes, and the four multipart operations. But three real gaps remained for data-editing clients:
aws s3 rm --recursive,aws s3 sync --delete, anddelete_objectsall issuePOST /{bucket}?delete, which the parser rejected withunsupported POST operation.CopyObjectsilently corrupted data. APUTcarryingx-amz-copy-sourcewas mis-parsed as a plainPutObject; the copy-source header was dropped and the destination would be overwritten with an empty body.PutObjectforwarded only content-type/length/md5, losing standard entity headers likeContent-DispositionandCache-Control.This PR closes those gaps. It deliberately does not rebuild the write path around streaming/header-signing: a wire capture against modern aws-cli (2.27) showed
put_object/s3 cpsend a plain body with a realx-amz-content-sha256— notaws-chunked— so the originally-feared upload corruption does not reproduce, and that larger refactor is deferred (the only thing it unlocks, forwardingx-amz-*write headers, is documented as a known limitation). Design notes and the deferred work live in.plans/and inponytail:comments at the relevant sites.How I did it
types.rs— newS3Operation::DeleteObjects { bucket }; method/action/ bucket/key arms updated. Batch delete authorizes asAction::DeleteObject.error.rs— newProxyError::NotImplemented→ S3501 NotImplemented.api/request.rs— rejectPUT+x-amz-copy-sourcewithNotImplementedinparse_s3_request; parsePOST ?delete→DeleteObjects.api/delete.rs(new) — pure, I/O-free batch-delete logic: parse the inbound<Delete>body, build the backend<Delete>body, parse the backend<DeleteResult>(toleratingVersionId/DeleteMarker), and serialize the merged client<DeleteResult>(XML-escaped, quiet-mode aware). Unit-tested.auth/authorize.rs—authorizedoes a coarse bucket-level check forDeleteObjects(it carries no key); newkey_authorizeddoes the per-key prefix check used by the body handler.route_handler.rs—PendingRequestcarries the resolvedidentityso the deferred body phase can run per-key authorization.proxy.rs—execute_delete_objects: per-key authz (denied keys become per-keyAccessDeniedentries and are never forwarded — S3 partial-result semantics), forward allowed keys with the requiredContent-MD5, merge results.PutObjectforward list widened with standard entity headers.build_object_pathgains string-keyapply_backend_prefix/strip_backend_prefixsiblings and acontent_md5helper.backend/multipart.rs—build_backend_urlarm forDeleteObjects({base}/{bucket}?delete).md-5(already in the tree transitively) forContent-MD5.reference/operations.mdupdated; integration test for a batch-delete roundtrip; unit tests for parsing, per-key authz, and an end-to-end batch delete proving denied keys never reach the backend.Test plan
cargo fmt --checkcargo test(workspace) — 90 core tests pass, incl. new delete/authz/parse casescargo clippy --all-targets(no new warnings; one pre-existing warning inauth/tests.rs)cargo check -p multistore-cf-workers --target wasm32-unknown-unknowntest_batch_delete(runs against MinIO + wrangler)