
Conversation

@edsiper
Member

@edsiper edsiper commented Dec 2, 2025

When using pre-signed URLs for blob uploads via authorization_endpoint_url, the plugin didn't extract or use the host from the pre-signed URL. It treated the URL as a URI path, so requests went to the wrong host or failed.

This patch adds s3_parse_presigned_url() to parse pre-signed URLs and extract the host, URI, and port, and updates put_blob_object(), complete_multipart_upload(), and abort_multipart_upload() to:

  • Extract the host from the pre-signed URL
  • Temporarily set ctx->s3_client->host to the extracted host
  • Validate the port matches the configuration
  • Restore the original host after the request

Blob uploads using pre-signed URLs now correctly use the host specified in the URL instead of the default S3 client host; the overall flow is sketched below.
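
A minimal standalone sketch of that flow (the struct and helper names below are illustrative stand-ins, not the actual Fluent Bit symbols; the real patch uses s3_parse_presigned_url() and ctx->s3_client->host):

/* Illustrative sketch of the parse -> swap -> restore pattern described
 * above; mock_client and parse_presigned() are hypothetical names. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct mock_client {
    char *host;   /* configured endpoint, e.g. "s3.us-east-1.amazonaws.com" */
    int port;
};

/* Hypothetical parser: splits "https://host[:port]/uri" into its parts */
static int parse_presigned(const char *url, char **host, char **uri, int *port)
{
    const char *p;
    const char *slash;
    const char *colon;
    size_t len;

    if (strncmp(url, "https://", 8) != 0) {
        return -1;
    }
    p = url + 8;

    slash = strchr(p, '/');
    if (slash == NULL) {
        return -1;
    }

    colon = memchr(p, ':', (size_t) (slash - p));
    if (colon != NULL) {
        *port = atoi(colon + 1);
        len = (size_t) (colon - p);
    }
    else {
        *port = 443;                      /* scheme default */
        len = (size_t) (slash - p);
    }

    *host = strndup(p, len);
    *uri = strdup(slash);
    return (*host != NULL && *uri != NULL) ? 0 : -1;
}

int main(void)
{
    struct mock_client client = { "s3.us-east-1.amazonaws.com", 443 };
    char *host = NULL;
    char *uri = NULL;
    int port = 0;
    char *original_host;

    if (parse_presigned("https://bucket.example.com/key?X-Amz-Signature=abc",
                        &host, &uri, &port) != 0) {
        return 1;
    }

    if (port != client.port) {            /* validate port matches config */
        fprintf(stderr, "port mismatch\n");
    }

    original_host = client.host;          /* temporarily swap the host */
    client.host = host;
    printf("request goes to %s:%d%s\n", client.host, port, uri);

    client.host = original_host;          /* restore after the request */
    free(host);
    free(uri);
    return 0;
}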

Example test:

service:
  flush: 1
  log_level: debug

pipeline:
  inputs:
    - name: blob
      path: '/tmp/the_project_gutenberg_ebook.txt'
      database_file: blob.db
      upload_success_suffix: emit_log

  outputs:
    - name: s3
      match: '*'
      bucket: fluent-bit-s3-integration-blob
      region: us-east-1
      blob_database_file: s3-blob.db

      # File buffering before upload
      total_file_size: 1M
      upload_timeout: 5s

      # Blob upload settings
      upload_parts_timeout: 10s

Fluent Bit is licensed under Apache 2.0; by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • Bug Fixes
    • More robust presigned URL handling: improved parsing and validation of host, URI, and port.
    • Stronger error recovery and consistent resource cleanup across single and multipart S3 operations.
    • Ensures port/host consistency when using presigned endpoints, avoiding partial failures and leaks.


When using pre-signed URLs for blob uploads via authorization_endpoint_url,
the plugin didn't extract or use the host from the pre-signed URL. It treated
the URL as a URI path, so requests went to the wrong host or failed.

This patch adds s3_parse_presigned_url() to parse pre-signed URLs and extract
the host, URI, and port, and updates put_blob_object(),
complete_multipart_upload(), and abort_multipart_upload() to:

- Extract the host from the pre-signed URL
- Temporarily set ctx->s3_client->host to the extracted host
- Validate the port matches the configuration
- Restore the original host after the request

Blob uploads using pre-signed URLs now correctly use the host specified in
the URL instead of the default S3 client host.

Signed-off-by: Eduardo Silva <[email protected]>
@coderabbitai

coderabbitai bot commented Dec 2, 2025

📝 Walkthrough

Adds a presigned-URL parsing helper and integrates it into blob and multipart S3 upload flows; when a presigned URL is used the plugin parses host/URI/port, temporarily replaces the client host/upstream, validates ports, and centralizes cleanup to restore original state and free resources.

Changes

  • Presigned URL parsing infrastructure (plugins/out_s3/s3.h, plugins/out_s3/s3.c): new public helper s3_parse_presigned_url() added and declared. Parses a presigned URL into host, URI, and port; returns allocated strings and the resolved port, with input validation and error cleanup. A hedged call-site sketch follows below.
  • Put blob object integration (plugins/out_s3/s3.c): put_blob_object() updated to fetch and parse a presigned URL (when used), swap the S3 client host/upstream to the parsed host, validate port consistency, and restore the original host/port on cleanup; early returns replaced by unified cleanup paths.
  • Multipart operations integration (plugins/out_s3/s3_multipart.c): multipart create/complete/abort/upload_part paths now parse presigned URLs, swap/restore host/upstream, validate ports, delay URI construction until needed, and use centralized cleanup labels with a result variable to ensure consistent resource deallocation.
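
Based on that summary, a hedged call-site sketch; the exact signature is an assumption inferred from the walkthrough, and the real declaration lives in plugins/out_s3/s3.h:

/* Assumed signature, inferred from the walkthrough above */
int s3_parse_presigned_url(const char *url, flb_sds_t *host,
                           flb_sds_t *uri, int *port);

/* Hypothetical call site: parse, use the parts, then free the
 * allocated strings with flb_sds_destroy() */
flb_sds_t host = NULL;
flb_sds_t uri = NULL;
int port = 0;

if (s3_parse_presigned_url(presigned_full, &host, &uri, &port) == 0) {
    /* ... swap ctx->s3_client->host to host and send the request ... */
    flb_sds_destroy(host);
    flb_sds_destroy(uri);
}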

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Plugin as S3 Plugin
    participant Client as S3 Client / Upstream
    participant Presigned as Presigned URL Host (S3)
    Note over Plugin,Client: Normal flow without presigned URL
    Plugin->>Client: Use configured host, construct URI, send request
    Client->>Presigned: HTTP request to configured host
    Presigned-->>Client: Response
    Client-->>Plugin: Return response

    Note over Plugin,Presigned: Presigned-URL flow (new)
    Plugin->>Plugin: fetch presigned_full URL
    Plugin->>Plugin: s3_parse_presigned_url(url) → presigned_host, uri, port
    Plugin->>Client: swap host/upstream → presigned_host
    Plugin->>Client: construct request using parsed URI and port
    Client->>Presigned: HTTP request to presigned_host:port
    Presigned-->>Client: Response
    Client-->>Plugin: Return response
    Plugin->>Client: restore original host/upstream
    Plugin->>Plugin: free presigned resources

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Resource cleanup consistency: Verify all error paths restore original host/upstream and free presigned_host, presigned_uri, bodies, and headers without leaks.
  • Port validation logic: Check scheme/default-port handling and cases where presigned port mismatches configured client port.
  • State management: Review host/upstream swap and restore for race conditions and concurrency safety.
  • Error propagation: Ensure unified cleanup labels are reached on all control-flow branches.

Poem

🐰 I nibbled at a presigned string,
Split host and path and made them sing,
Swapped a hop, then hopped it back,
Cleaned my crumbs — no resource lack,
A tiny rabbit’s parsing spring.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): the PR title clearly and concisely summarizes the main change, fixing the handling of pre-signed URLs in the S3 output plugin's blob upload functionality.

@edsiper edsiper assigned edsiper and unassigned edsiper Dec 2, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
plugins/out_s3/s3.c (1)

2590-2606: Consider extracting common presigned URL host-swap pattern.

The presigned URL handling pattern (parse → validate port → swap host → cleanup) is repeated across five functions (put_blob_object, complete_multipart_upload, abort_multipart_upload, create_multipart_upload, upload_part).

A helper struct + setup/teardown functions could reduce duplication:

struct presigned_context {
    flb_sds_t host;
    flb_sds_t uri;
    const char *original_host;
    int port;
};

int presigned_setup(struct flb_s3 *ctx, const char *url, struct presigned_context *pc);
void presigned_cleanup(struct flb_s3 *ctx, struct presigned_context *pc);

This is optional and can be deferred if the current approach is preferred for clarity.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a8f1e4b and 1b478a2.

📒 Files selected for processing (3)
  • plugins/out_s3/s3.c (7 hunks)
  • plugins/out_s3/s3.h (1 hunks)
  • plugins/out_s3/s3_multipart.c (15 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
plugins/out_s3/s3.h (1)
plugins/out_s3/s3.c (1)
  • s3_parse_presigned_url (2208-2284)
plugins/out_s3/s3_multipart.c (1)
plugins/out_s3/s3.c (1)
  • s3_parse_presigned_url (2208-2284)
plugins/out_s3/s3.c (3)
src/flb_utils.c (1)
  • flb_utils_url_split (1441-1534)
src/flb_sds.c (2)
  • flb_sds_create (78-90)
  • flb_sds_destroy (389-399)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: PR - fuzzing test
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: pr-compile-without-cxx (3.31.6)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-centos-7
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
🔇 Additional comments (8)
plugins/out_s3/s3.h (1)

221-224: LGTM!

The function declaration is well-structured with appropriate parameters for parsing presigned URLs. The signature matches the implementation in s3.c and follows the existing coding conventions in this header.

plugins/out_s3/s3_multipart.c (4)

405-516: LGTM - Cleanup flow is well-structured.

The presigned URL handling and cleanup logic is correctly implemented:

  • Resources (body, presigned_host, uri) are properly freed in all paths
  • original_host restoration is guarded by NULL check
  • Early returns before resource allocation don't leak memory

518-615: LGTM!

The abort_multipart_upload function follows the same consistent pattern as complete_multipart_upload. Resource cleanup is properly handled via the abort_cleanup label.


617-723: LGTM!

The create_multipart_upload function correctly places the cleanup section before response handling. This ensures that the host is restored and resources are freed regardless of the response outcome, with subsequent returns exiting cleanly.


769-910: LGTM!

The upload_part function follows the established pattern correctly. Headers are freed immediately after the request (lines 847-850), and the cleanup section properly handles host restoration and URI cleanup.

plugins/out_s3/s3.c (3)

2208-2284: LGTM - Robust URL parsing with proper cleanup.

The implementation correctly:

  • Validates required parameters upfront
  • Handles port resolution from explicit port or scheme defaults
  • Creates owned flb_sds_t copies for output
  • Frees all temporary allocations in both success and error paths

2583-2607: LGTM - Presigned URL handling is correctly integrated.

The presigned path properly:

  • Fetches the full presigned URL and immediately parses it
  • Destroys the temporary presigned_full after parsing to avoid leaks
  • Validates port consistency before swapping the host
  • Cleans up presigned_host and uri on port validation failure

2660-2670: LGTM - Cleanup section is well-designed.

The cleanup label consolidates all resource deallocation and state restoration, ensuring consistent behavior across all exit paths (success, failure, and early error conditions).

@edsiper
Member Author

edsiper commented Dec 2, 2025

Ping @singholt @sparrc (please take a look at these changes if you can)

@edsiper edsiper added this to the Fluent Bit v4.2.1 milestone Dec 2, 2025

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines 440 to 441
original_host = ctx->s3_client->host;
ctx->s3_client->host = presigned_host;


P1: Pre-signed host never used for network connection

The new pre-signed flow swaps ctx->s3_client->host to the parsed host before issuing the request (plugins/out_s3/s3_multipart.c:440-441), but the HTTP connection still comes from ctx->s3_client->upstream, which is created once with the configured endpoint (plugins/out_s3/s3.c:984-989) and reused in request_do (src/aws/flb_aws_util.c:342-376). When the pre-signed URL points at a different host—the case this change is meant to fix—the TCP/TLS connection (and SNI) stay bound to the original endpoint, so the request is still sent to the wrong server even though the Host header is rewritten. The upstream needs to be rebuilt or the connection opened against the pre-signed host for these URLs to work.


@sparrc

sparrc commented Dec 4, 2025

When using pre-signed URLs for blob uploads via authorization_endpoint_url, the plugin didn't extract or use the host from the pre-signed URL. It treated the URL as a URI path, so requests went to the wrong host or failed.

Is the test config that you have provided missing the authorization_endpoint_url parameter?

- Presigned multipart uploads now temporarily override both the S3 client host and upstream
  target when using authorization-generated URLs, ensuring each multipart operation connects
  to the correct presigned endpoint before restoring the original settings.

- Presigned single-part blob uploads likewise update the upstream host/port while the
  presigned URL is in use so requests are sent to the right destination.

Signed-off-by: Eduardo Silva <[email protected]>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI Agents
In @plugins/out_s3/s3.c:
- Around line 2607-2612: The presigned URL host/port swap (setting
ctx->s3_client->host, ctx->s3_client->upstream->tcp_host and tcp_port using
presigned_host/presigned_port) is insufficient because flb_upstream_conn_get()
may return an existing keepalive socket already connected to the original
endpoint; requests will then go to the wrong host. Fix by detecting when
presigned_host/presigned_port differ from the upstream's original
tcp_host/tcp_port and in that case avoid using pooled connections: either
invalidate or bypass the pool and force creation of a new connection for this
request (i.e., do not call flb_upstream_conn_get() for presigned requests, or
clear/mark the pool entry so a fresh socket is established), ensuring the socket
is actually connected to presigned_host/presigned_port before issuing the HTTP
request.
🧹 Nitpick comments (1)
plugins/out_s3/s3_multipart.c (1)

701-746: Unusual control flow: cleanup runs before response handling.

The cleanup label (lines 703-713) executes mid-function, before the response handling logic (lines 715-746). This works because:

  1. The cleanup frees uri and presigned_host which aren't needed for response handling
  2. Early returns at lines 727 and 734 are safe since cleanup already ran

However, this structure is non-standard and could confuse future maintainers. Typically, cleanup runs at the end with the response handling setting a result variable.

Consider restructuring to match the pattern used in complete_multipart_upload and upload_part, where cleanup is at the very end and response handling sets a result variable:

Suggested structure
-    ret = -1;
-
-cleanup:
-    if (original_host != NULL) {
-        ctx->s3_client->host = original_host;
-        ctx->s3_client->upstream->tcp_host = original_upstream_host;
-        ctx->s3_client->upstream->tcp_port = original_upstream_port;
-        flb_sds_destroy(presigned_host);
-    }
-
-    if (uri != NULL) {
-        flb_sds_destroy(uri);
-    }
-
     if (c) {
         flb_plg_debug(ctx->ins, "CreateMultipartUpload http status=%d",
                       c->resp.status);
         if (c->resp.status == 200) {
             tmp = flb_aws_xml_get_val(c->resp.payload, c->resp.payload_size,
                                   "<UploadId>", "</UploadId>");
             if (!tmp) {
                 flb_plg_error(ctx->ins, "Could not find upload ID in "
                               "CreateMultipartUpload response");
                 flb_plg_debug(ctx->ins, "Raw CreateMultipartUpload response: %s",
                               c->resp.payload);
                 flb_http_client_destroy(c);
-                return -1;
+                goto cleanup;
             }
             m_upload->upload_id = tmp;
             flb_plg_info(ctx->ins, "Successfully initiated multipart upload "
                          "for %s, UploadId=%s", m_upload->s3_key,
                          m_upload->upload_id);
             flb_http_client_destroy(c);
-            return 0;
+            ret = 0;
+            goto cleanup;
         }
         flb_aws_print_xml_error(c->resp.payload, c->resp.payload_size,
                                 "CreateMultipartUpload", ctx->ins);
         if (c->resp.payload != NULL) {
             flb_plg_debug(ctx->ins, "Raw CreateMultipartUpload response: %s",
                           c->resp.payload);
         }
         flb_http_client_destroy(c);
     }

     flb_plg_error(ctx->ins, "CreateMultipartUpload request failed");
+
+cleanup:
+    if (original_host != NULL) {
+        ctx->s3_client->host = original_host;
+        ctx->s3_client->upstream->tcp_host = original_upstream_host;
+        ctx->s3_client->upstream->tcp_port = original_upstream_port;
+        flb_sds_destroy(presigned_host);
+    }
+
+    if (uri != NULL) {
+        flb_sds_destroy(uri);
+    }
+
     return ret;
 }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1b478a2 and ec4b960.

📒 Files selected for processing (2)
  • plugins/out_s3/s3.c
  • plugins/out_s3/s3_multipart.c
🧰 Additional context used
🧬 Code graph analysis (1)
plugins/out_s3/s3.c (2)
src/flb_utils.c (1)
  • flb_utils_url_split (1441-1534)
src/flb_sds.c (2)
  • flb_sds_create (78-90)
  • flb_sds_destroy (389-399)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-without-cxx (3.31.6)
  • GitHub Check: PR - fuzzing test
  • GitHub Check: pr-compile-centos-7
🔇 Additional comments (6)
plugins/out_s3/s3.c (3)

2208-2284: Parsing logic is correct with proper cleanup.

The function properly validates inputs, handles both the port and scheme-based port resolution, and correctly manages memory by creating flb_sds_t copies for the caller while freeing the raw strings from flb_utils_url_split().

One minor edge case: if both port and scheme are NULL after parsing (invalid URL edge case), resolved_port remains 0. This is acceptable since such a URL would typically fail the earlier validation at line 2226.
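
The resolution rule can be illustrated with a small standalone sketch (resolve_port() is an illustrative name, not the plugin's code):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative port resolution: an explicit port wins, otherwise fall
 * back to the scheme default; 0 signals the edge case noted above. */
static int resolve_port(const char *port, const char *scheme)
{
    if (port != NULL) {
        return atoi(port);                /* explicit port from the URL */
    }
    if (scheme != NULL && strcmp(scheme, "https") == 0) {
        return 443;
    }
    if (scheme != NULL && strcmp(scheme, "http") == 0) {
        return 80;
    }
    return 0;
}

int main(void)
{
    printf("%d\n", resolve_port("9000", "http"));  /* prints 9000 */
    printf("%d\n", resolve_port(NULL, "https"));   /* prints 443 */
    printf("%d\n", resolve_port(NULL, NULL));      /* prints 0 */
    return 0;
}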


2600-2605: Port validation logic is sound.

Properly validates port consistency and cleans up allocated resources before returning on error.


2666-2678: Cleanup section handles both code paths correctly.

The check on original_host != NULL correctly distinguishes between the presigned and non-presigned paths, ensuring restoration only occurs when the host was swapped. Both paths properly free the URI.

plugins/out_s3/s3_multipart.c (3)

405-524: Centralized cleanup pattern is well-implemented.

The unified cleanup path correctly handles all exit scenarios:

  • Body is nullified after normal use (line 480) to prevent double-free in cleanup
  • The result variable tracks success/failure state through the cleanup
  • Host restoration and resource cleanup are properly sequenced (a generic sketch of this idiom follows below)
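
The pattern described here is the classic goto-cleanup idiom; a generic standalone illustration, not the actual Fluent Bit code:

#include <stdlib.h>

/* Generic goto-cleanup idiom, as described above: every exit path goes
 * through one label that frees resources, with a pessimistic default
 * result that is only upgraded on success. */
static int do_request(void)
{
    int ret = -1;                  /* result variable, default failure */
    char *body = NULL;
    char *uri = NULL;

    body = malloc(64);
    if (body == NULL) {
        goto cleanup;
    }

    uri = malloc(64);
    if (uri == NULL) {
        goto cleanup;
    }

    /* ... build and send the request; on success: */
    ret = 0;

cleanup:
    free(uri);                     /* free(NULL) is a safe no-op */
    free(body);
    return ret;
}

int main(void)
{
    return do_request();
}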

526-631: LGTM!

The cleanup pattern mirrors complete_multipart_upload correctly with appropriate label naming (abort_cleanup).


793-942: LGTM!

The cleanup pattern correctly follows the standard approach with cleanup at the end. Headers are properly nullified after freeing (line 879) to prevent any potential double-free scenarios.

Comment on lines +2607 to +2612
original_host = ctx->s3_client->host;
original_upstream_host = ctx->s3_client->upstream->tcp_host;
original_upstream_port = ctx->s3_client->upstream->tcp_port;
ctx->s3_client->host = presigned_host;
ctx->s3_client->upstream->tcp_host = presigned_host;
ctx->s3_client->upstream->tcp_port = presigned_port != 0 ? presigned_port : ctx->port;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check how upstream connections are managed and whether tcp_host changes affect new connections
echo "=== Searching for tcp_host and tcp_port usage in upstream ==="
rg -n "tcp_host|tcp_port" src/flb_upstream.c -C 3

echo ""
echo "=== Checking for connection pool/reuse patterns ==="
rg -n "conn_get|conn_release|pool" src/ --type c -i | head -30

echo ""
echo "=== Looking for upstream socket/connection creation logic ==="
rg -n "upstream.*connect|upstream.*socket|flb_upstream_conn" src/ --type c | head -20

Repository: fluent/fluent-bit

Length of output: 8723


🏁 Script executed:

#!/bin/bash
# Check how connections are matched/selected from the pool
echo "=== Looking for connection pool selection/matching logic ==="
rg -n "flb_upstream_queue_get|conn_get_from|connection.*match" src/flb_upstream.c -A 10

echo ""
echo "=== Check upstream connection structure to see if host/port are part of matching ==="
rg -n "struct flb_upstream|struct flb_connection" include/ src/ --type c -A 15 | head -60

echo ""
echo "=== Verify if pool is keyed per upstream or per upstream+host+port ==="
rg -n "queue.*connection|pool.*connection" src/flb_upstream.c -i -B 2 -A 2

Repository: fluent/fluent-bit

Length of output: 7831


🏁 Script executed:

#!/bin/bash
# Check what happens after the host swap in S3 plugin
echo "=== Looking at code after host swap (lines 2607-2650 in s3.c) ==="
sed -n '2607,2650p' plugins/out_s3/s3.c

echo ""
echo "=== Checking what functions are called and if they use upstream ==="
rg -n "flb_upstream_conn_get|s3_client->upstream" plugins/out_s3/s3.c | head -20

Repository: fluent/fluent-bit

Length of output: 2903


🏁 Script executed:

#!/bin/bash
# Find what client_vtable->request does and if it uses flb_upstream_conn_get
echo "=== Search for s3_client_vtable and request function ==="
rg -n "client_vtable|vtable.*request" plugins/out_s3/ -A 5

echo ""
echo "=== Find request function implementation in S3 ==="
rg -n "^.*request.*\(.*s3_client" plugins/out_s3/s3.c -A 3 | head -40

echo ""
echo "=== Check if http_request or similar uses flb_upstream_conn_get ==="
rg -n "flb_upstream_conn_get" plugins/out_s3/ -B 3 -A 3

Repository: fluent/fluent-bit

Length of output: 3771


🏁 Script executed:

#!/bin/bash
# Find the client_vtable definition and request function implementation
echo "=== Find where client_vtable is defined ==="
rg -n "client_vtable.*=" plugins/out_s3/ -B 2 -A 10 | head -40

echo ""
echo "=== Find request function definition ==="
rg -n "\.request\s*=" plugins/out_s3/ -B 2 -A 2

echo ""
echo "=== Search for http_request_vtable or similar ==="
rg -n "struct.*vtable|s3.*request.*\(" plugins/out_s3/ --type c | grep -i "request\|vtable"

Repository: fluent/fluent-bit

Length of output: 814


🏁 Script executed:

#!/bin/bash
# Find s3_client structure and where client_vtable is assigned
echo "=== Find s3_client structure definition ==="
rg -n "struct.*s3_client|typedef.*s3_client" plugins/out_s3/ include/ --type c -A 15 | head -80

echo ""
echo "=== Search for where client_vtable is assigned ==="
rg -n "s3_client->client_vtable" plugins/out_s3/ -B 5 -A 2 | head -40

echo ""
echo "=== Look for http_request or http client implementation ==="
rg -n "flb_http_client_request\|http.*request" plugins/out_s3/ include/ --type c | head -20

Repository: fluent/fluent-bit

Length of output: 7946


🏁 Script executed:

#!/bin/bash
# Find flb_aws_client structure
echo "=== Find flb_aws_client structure ==="
rg -n "struct flb_aws_client|typedef.*flb_aws_client" include/ src/ --type c -A 20 | head -100

echo ""
echo "=== Look for aws_client in fluent-bit core ==="
fd "aws" include/ src/ --type f | head -20

Repository: fluent/fluent-bit

Length of output: 6862


🏁 Script executed:

#!/bin/bash
# Find the request_do implementation to see how it uses upstream
echo "=== Find request_do implementation ==="
rg -n "^static struct flb_http_client \*request_do" src/aws/flb_aws_util.c -A 50

echo ""
echo "=== Look for flb_upstream_conn_get in request_do or related code ==="
sed -n '66,200p' src/aws/flb_aws_util.c

Repository: fluent/fluent-bit

Length of output: 3771


🏁 Script executed:

#!/bin/bash
# Find the full request_do implementation
echo "=== Search for request_do implementation in aws files ==="
rg -n "request_do(" src/aws/ --type c -A 80 | grep -A 80 "^src/aws.*request_do"

echo ""
echo "=== Specifically look at lines around request_do in flb_aws_util.c ==="
wc -l src/aws/flb_aws_util.c

echo ""
echo "=== Get the whole flb_aws_util.c to find request_do ==="
cat src/aws/flb_aws_util.c | grep -n "request_do" -A 60 | head -100

Repository: fluent/fluent-bit

Length of output: 3925


🏁 Script executed:

#!/bin/bash
# Find request_do function body
echo "=== Find request_do function ==="
sed -n '1300,1326p' src/aws/flb_aws_util.c

echo ""
echo "=== Search backwards from end of file for request_do ==="
tail -100 src/aws/flb_aws_util.c | grep -n "request_do" -B 5 -A 50

Repository: fluent/fluent-bit

Length of output: 772


🏁 Script executed:

#!/bin/bash
# Search for request_do definition more broadly
echo "=== Search all files for request_do implementation ==="
rg -n "^[^/]*request_do\s*\(" src/ include/ --type c

echo ""
echo "=== Search for it as a static function ==="
rg -n "static.*request_do" src/ include/ --type c -A 50 | head -150

Repository: fluent/fluent-bit

Length of output: 530


🏁 Script executed:

#!/bin/bash
# Get the request_do implementation starting at line 342
echo "=== Get request_do function implementation ==="
sed -n '342,500p' src/aws/flb_aws_util.c

Repository: fluent/fluent-bit

Length of output: 5493


🏁 Script executed:

#!/bin/bash
# Check if connections in pool are already fully established (connected)
echo "=== Check connection state when retrieved from pool ==="
sed -n '750,790p' src/flb_upstream.c

Repository: fluent/fluent-bit

Length of output: 1692


Connection pooling prevents presigned URL host swap from working—requests will be sent to the original endpoint.

When a presigned URL is used, the code correctly saves the original tcp_host/tcp_port and swaps them (lines 2611–2612), then makes an HTTP request. However, the request uses flb_upstream_conn_get() to retrieve a connection from the pool. If a keepalive connection exists in the available queue, it will be recycled and returned with its socket (fd) already connected to the original host (from before the swap). Simply changing tcp_host/tcp_port variables does not reconnect an already-open socket—the request will be sent over the existing connection to the wrong endpoint.

This is not a theoretical concern: presigned URL requests will fail because they connect to the original S3 endpoint instead of the host embedded in the presigned URL.

To fix this, either:

  • Invalidate pooled connections when presigned URLs require a different host, or
  • Create a fresh connection instead of retrieving from the pool for presigned URL requests (sketched below)
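
A sketch of the second option, assuming flb_upstream_create()/flb_upstream_destroy() are usable at this call site; this is not the actual patch, just an illustration:

/* Sketch: build a throwaway upstream bound to the presigned host so the
 * socket really connects there, then restore and destroy it afterwards.
 * The availability of config, io_flags, and tls here is an assumption. */
struct flb_upstream *presigned_u;
struct flb_upstream *original_u;

presigned_u = flb_upstream_create(config, presigned_host, presigned_port,
                                  io_flags, tls);
if (presigned_u == NULL) {
    goto cleanup;
}

original_u = ctx->s3_client->upstream;
ctx->s3_client->upstream = presigned_u;   /* no pooled keepalive sockets */

/* ... issue the presigned request via the client vtable ... */

ctx->s3_client->upstream = original_u;    /* restore the shared upstream */
flb_upstream_destroy(presigned_u);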
🤖 Prompt for AI Agents
In @plugins/out_s3/s3.c around lines 2607 - 2612, The presigned URL host/port
swap (setting ctx->s3_client->host, ctx->s3_client->upstream->tcp_host and
tcp_port using presigned_host/presigned_port) is insufficient because
flb_upstream_conn_get() may return an existing keepalive socket already
connected to the original endpoint; requests will then go to the wrong host. Fix
by detecting when presigned_host/presigned_port differ from the upstream's
original tcp_host/tcp_port and in that case avoid using pooled connections:
either invalidate or bypass the pool and force creation of a new connection for
this request (i.e., do not call flb_upstream_conn_get() for presigned requests,
or clear/mark the pool entry so a fresh socket is established), ensuring the
socket is actually connected to presigned_host/presigned_port before issuing the
HTTP request.
