Introduce load-balanced channel for OpenTelemetry exporters #2175
dkostyrev wants to merge 3 commits into
@dkostyrev Thank you for this awesome PR. I have finally had a bit of time to explore it, and I now understand it a lot better. This is fantastic.
@amankrx when you get a chance, please help with the merge conflict here.
      .block_on(async {
          // The OTLP exporters need to run in a Tokio context.
    -     spawn!("init tracing", async { init_tracing() })
    +     spawn!("init tracing", async { init_tracing().await })
That coverage issue is interesting.
I think this is actually hitting https://users.rust-lang.org/t/undefined-reference-to-memcpy-chk-with-target-x86-64-unknown-linux-musl/113567, as it's building a linked zstd rather than the vendored one. Upstream has a PR that would sort this out, and I've just prodded them on that.
@dkostyrev do you want help here?
@MarcusSorealheis I've reverted the OpenTelemetry update. I think the coverage job should pass now.
I managed to break our attic caching. Rebasing on main should fix at least most of these failures.
Add client-side load balancing to OTLP gRPC connections using ginepro.

When NL_OTEL_ENDPOINT is set, the telemetry system creates a load-balanced channel shared across log, trace, and metric exporters. This enables better distribution of telemetry traffic across multiple OTLP collector instances and improves overall system resilience.

- Add ginepro dependency for gRPC load balancing
- Upgrade OpenTelemetry dependencies from 0.29 to 0.30
- Change init_tracing() to async to support channel initialization
- Add NL_OTEL_ENDPOINT environment variable for configuration
- Update all OTLP exporters to use shared load-balanced channel

# Conflicts:
#	Cargo.lock
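The "one channel shared across log, trace, and metric exporters" idea in this commit message can be sketched with the standard library alone. This is a minimal illustration, not the PR's code: `Channel`, `Exporter`, `endpoint_from_env`, and `build_exporters` are hypothetical stand-ins (the real change uses ginepro and the OpenTelemetry OTLP exporters), and treating an empty value as "unset" is an assumption.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a gRPC channel; not the ginepro or tonic type.
#[derive(Debug)]
struct Channel {
    endpoint: String,
}

/// Hypothetical exporter that only holds a handle to the shared channel.
struct Exporter {
    signal: &'static str,
    channel: Arc<Channel>,
}

/// Reads the endpoint from the environment variable named in the commit message.
fn endpoint_from_env() -> Option<String> {
    std::env::var("NL_OTEL_ENDPOINT").ok()
}

/// Builds one exporter per signal, all sharing a single channel.
/// Returns `None` when no endpoint is configured (assumption: empty means unset).
fn build_exporters(endpoint: Option<&str>) -> Option<[Exporter; 3]> {
    let endpoint = endpoint.filter(|e| !e.is_empty())?;
    let channel = Arc::new(Channel {
        endpoint: endpoint.to_owned(),
    });
    // Each exporter gets a cheap clone of the handle, not a new connection.
    Some(["logs", "traces", "metrics"].map(|signal| Exporter {
        signal,
        channel: Arc::clone(&channel),
    }))
}
```

A caller would pass `endpoint_from_env().as_deref()` to `build_exporters` and fall back to the existing exporter setup when it returns `None`.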
I've rebased the branch; CI is passing now.
I'm not seeing any tests for this. Especially since we're introducing an entirely new library that might well do things differently from our current code (although hopefully identically in the single-channel case), I'd like to make sure that its behavior is at least about the same.
Are you proposing an integration test for the whole init_tracing function, or just testing some parts of the new logic in isolation?
I think I'd prefer an integration test, just to check that the new library does OpenTelemetry the same way as our older code, if that's possible.
- Replace `let _ = server.await` with `.await.ok()` to satisfy `-D let-underscore-drop` on ubuntu/windows stable and coverage builds
- Add `default-features = false` to opentelemetry-proto and opentelemetry_sdk dev-dependencies to pass taplo schema validation

Co-authored-by: Cursor <[email protected]>
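The first bullet of this commit message can be illustrated without an async runtime. The sketch below is an assumption-laden stand-in: `TaskError` and `finished_server` are hypothetical and replace the awaited server task from the test, but the pattern is the same: a `Result` whose error type has a destructor is discarded with `.ok()` rather than bound to `_`.

```rust
use std::cell::Cell;

/// Hypothetical error type that records when it is dropped.
struct TaskError<'a> {
    dropped: &'a Cell<bool>,
}

impl Drop for TaskError<'_> {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

/// Hypothetical stand-in for the result of awaiting the test server task.
fn finished_server(dropped: &Cell<bool>) -> Result<(), TaskError<'_>> {
    Err(TaskError { dropped })
}

#[deny(let_underscore_drop)]
fn discard_result(dropped: &Cell<bool>) {
    // `let _ = finished_server(dropped);` is the shape the commit replaced;
    // with the lint denied it is expected to be rejected, because the value
    // bound to `_` has a destructor.
    // `.ok()` consumes the `Result`, and the error is dropped right here.
    finished_server(dropped).ok();
}
```

Both forms drop the value immediately. The `.ok()` form just makes the intent to ignore the error explicit in a way the lint accepts.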
Summary
This PR introduces client-side load balancing for OpenTelemetry (OTLP) gRPC connections using the `ginepro` library. When the `NL_OTEL_ENDPOINT` environment variable (name to be discussed, maybe a boolean flag?) is set, NativeLink creates a load-balanced channel for exporting logs, traces, and metrics, distributing requests across multiple backend endpoints resolved via DNS. This allows OTLP traffic to be spread across multiple OTLP collector instances.
Changes
Load-balanced OTLP exports:
- Add `ginepro` dependency to provide client-side load balancing for gRPC channels used by OpenTelemetry exporters
- Add `NL_OTEL_ENDPOINT` environment variable to configure the OTLP endpoint for load-balanced connections
- Change `init_tracing()` from synchronous to async to support balanced channel initialization
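To make "distributing requests across multiple backend endpoints" concrete, here is a small standard-library sketch of client-side selection over an already resolved address list. It uses plain round-robin purely for illustration. `RoundRobin` is a hypothetical type, and nothing here is a claim about the balancing policy that ginepro or tonic actually apply.

```rust
use std::net::SocketAddr;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical round-robin picker over a fixed set of resolved backends.
struct RoundRobin {
    backends: Vec<SocketAddr>,
    next: AtomicUsize,
}

impl RoundRobin {
    /// Returns `None` for an empty backend list, since there is nothing to pick.
    fn new(backends: Vec<SocketAddr>) -> Option<Self> {
        if backends.is_empty() {
            None
        } else {
            Some(Self {
                backends,
                next: AtomicUsize::new(0),
            })
        }
    }

    /// Picks the next backend in order, wrapping around at the end of the list.
    fn pick(&self) -> SocketAddr {
        let index = self.next.fetch_add(1, Ordering::Relaxed) % self.backends.len();
        self.backends[index]
    }
}
```

In a real client the backend list would come from DNS resolution (for example via `std::net::ToSocketAddrs`) and would be refreshed as records change, which is the part a library such as ginepro is meant to handle.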