[FLINK-39127][metrics] Truncate OTel attribute values via config #27989
Open
Izeren wants to merge 1 commit into apache:master
Adds configurable per-attribute and global length limits for metric attribute values exported by the OpenTelemetry reporter, addressing oversized payload rejections from OTel collectors. Configuration is prefix-based (transform.attribute-value-length-limits.<attr>), with a `*` global key, 0 to drop, and negative values to disable the limit per attribute. Colliding metrics after truncation are logged and counted best-effort without blocking registration.

A second option, transform.collision-tracking-max-slots, bounds the memory footprint of the collision tracker via an access-ordered LRU (default 50000, 0 disables, invalid values WARN and fall back to the default). This prevents unbounded heap growth under failover loops where every task attempt produces a distinct per-attempt UUID that maps to a new tracking slot.

Part of FLIP-553.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
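The limit semantics in the commit message (explicit per-attribute entry, `*` as fallback, `0` drops, negative disables) can be illustrated with a small standalone sketch. This is not the PR's `MetricAttributeTransformer`: the class `AttributeLimitSketch` and its method names are hypothetical, and two points are assumptions rather than facts from the PR: that length is counted in UTF-16 `char`s, and that a negative `*` entry means "no global limit".

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of the limit rules; not the PR's implementation. */
final class AttributeLimitSketch {
    static final String GLOBAL_KEY = "*";

    /** An explicit per-attribute entry wins; otherwise the "*" entry applies; null means no limit. */
    static Integer resolveLimit(Map<String, Integer> limits, String attribute) {
        Integer explicit = limits.get(attribute);
        return explicit != null ? explicit : limits.get(GLOBAL_KEY);
    }

    /** Returns a copy of the attributes with the limits applied. */
    static Map<String, String> apply(Map<String, Integer> limits, Map<String, String> attributes) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            Integer limit = resolveLimit(limits, e.getKey());
            String value = e.getValue();
            if (limit == null || limit < 0) {
                out.put(e.getKey(), value); // no limit configured, or limit disabled
            } else if (limit == 0) {
                continue; // 0 drops the attribute entirely
            } else if (value.length() > limit) {
                // Assumption: truncate by char count; this may split a surrogate pair.
                out.put(e.getKey(), value.substring(0, limit));
            } else {
                out.put(e.getKey(), value); // short and empty values pass through unchanged
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> limits = Map.of("*", 5, "task_name", 0, "job_name", -1);
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("host", "worker-node-17");
        attrs.put("task_name", "Source: Kafka");
        attrs.put("job_name", "a-very-long-job-name");
        System.out.println(apply(limits, attrs));
    }
}
```

In the `main` example, `host` falls under the global limit, `task_name` is dropped, and `job_name` is exempt because its explicit negative entry overrides `*`.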
What is the purpose of the change
Part 3 of FLIP-553. Large OpenTelemetry metric attribute values can be rejected by OTel collectors as oversized payloads. This PR adds two configuration options to the OpenTelemetry metric reporter to keep attribute values within safe bounds and to track post-truncation metric name collisions without unbounded heap growth.
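One way to picture the bounded collision tracking is a `java.util.LinkedHashMap` in access order with `removeEldestEntry` overridden. The sketch below is only an illustration under assumptions: `CollisionTrackerSketch`, `record`, and `parseMaxSlots` are hypothetical names, a "slot" is assumed to be the post-truncation metric identity, and the stored value is assumed to be the original identity. The actual transformer in the PR may model this differently. Like the PR's class, the sketch is not thread-safe and expects the caller to hold a lock.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU-bounded collision tracker; not the PR's implementation. */
final class CollisionTrackerSketch {
    static final int DEFAULT_MAX_SLOTS = 50_000;

    /** Malformed or negative input logs a warning and falls back to the default. */
    static int parseMaxSlots(String raw) {
        if (raw == null) {
            return DEFAULT_MAX_SLOTS;
        }
        try {
            int parsed = Integer.parseInt(raw.trim());
            if (parsed >= 0) {
                return parsed; // 0 is valid and disables tracking
            }
        } catch (NumberFormatException ignored) {
            // fall through to the warning below
        }
        System.err.println("WARN: invalid max-slots '" + raw + "', using " + DEFAULT_MAX_SLOTS);
        return DEFAULT_MAX_SLOTS;
    }

    private final int maxSlots;
    private final Map<String, String> slotOwners;
    private long collisions;

    CollisionTrackerSketch(int maxSlots) {
        this.maxSlots = maxSlots;
        // accessOrder=true: get() moves an entry to the tail, so hot slots survive eviction.
        this.slotOwners = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxSlots;
            }
        };
    }

    /** Returns true if a different original identity already occupies this slot. */
    boolean record(String slot, String originalId) {
        if (maxSlots == 0) {
            return false; // tracking disabled
        }
        String previous = slotOwners.get(slot);
        if (previous == null) {
            slotOwners.put(slot, originalId); // may evict the least recently used slot
            return false;
        }
        if (previous.equals(originalId)) {
            return false;
        }
        collisions++; // best-effort count; registration is never blocked
        return true;
    }

    int trackedSlots() {
        return slotOwners.size();
    }

    long collisions() {
        return collisions;
    }

    public static void main(String[] args) {
        CollisionTrackerSketch tracker = new CollisionTrackerSketch(parseMaxSlots("2"));
        tracker.record("job.task.abc", "job.task.abc-attempt-1");
        System.out.println(tracker.record("job.task.abc", "job.task.abc-attempt-2"));
        System.out.println(tracker.trackedSlots());
    }
}
```

Because eviction is by least recent access, a failover loop that keeps producing new slots can only displace cold entries, and the map never holds more than `maxSlots` entries.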
Brief change log
- `transform.attribute-value-length-limits.<attribute-name>` (with `*` as the global key) truncates metric attribute values to the configured length. `0` drops the attribute, negative values disable the limit for that attribute, and the `*` global key applies to any attribute without an explicit entry.
- `transform.collision-tracking-max-slots` (default `50000`) bounds the memory footprint of the post-truncation collision tracker via an access-ordered LRU. `0` disables collision tracking. Malformed or negative values log a `WARN` and fall back to the default, since a typo should not tear down metric reporting at JobManager startup.
- `MetricAttributeTransformer` applies the limits and tracks collisions. The class is `@NotThreadSafe`; access is serialized via the existing reporter lock.
- `OpenTelemetryMetricReporter.open()` builds the transformer from the reporter config; `notifyOfAddedMetric` applies it before registration.
- Both options are defined in `OpenTelemetryReporterOptions`, which is shared with the trace and event reporters. Their descriptions explicitly state that these two options apply only to the metric reporter and are ignored by the trace and event reporters.
Verifying this change
This change added tests and can be verified as follows:
- `MetricAttributeTransformerTest` covering: empty attribute values preserved under non-zero limits, access-ordered LRU hot slot survives eviction, bounded slot tracking under `maxSlots * 4` distinct-slot load, malformed `NumberFormatException` path falls back to default, negative `max-slots` falls back to default, global vs per-attribute limit interaction, `0` drop semantics, and negative-limit disable semantics.
- `OpenTelemetryMetricReporterTest` covering the reporter-level wiring.
- `OpenTelemetryMetricReporterITCase#testAttributeValueTruncation` verifying truncated attribute values reach the exporter.
Does this pull request potentially affect one of the following parts:
`@Public(Evolving)`: no (the new `ConfigOption`s are `@PublicEvolving`; the transformer class is package-private)
Documentation
`docs/layouts/shortcodes/generated/open_telemetry_reporter_configuration.html`) and JavaDocs on `MetricAttributeTransformer` and the new `ConfigOption`s in `OpenTelemetryReporterOptions`.
Was generative AI tooling used to co-author this PR?
Generated-by: Claude Code (Opus 4.7)