Add property to allow autoconfiguration of SDK telemetry version#8037
Add property to allow autoconfiguration of SDK telemetry version#8037anuraaga wants to merge 4 commits intoopen-telemetry:mainfrom
Conversation
Codecov Report❌ Patch coverage is ❌ Your patch check has failed because the patch coverage (79.59%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## main #8037 +/- ##
==========================================
Coverage 90.16% 90.17%
- Complexity 7478 7603 +125
==========================================
Files 834 840 +6
Lines 22559 22837 +278
Branches 2239 2282 +43
==========================================
+ Hits 20341 20593 +252
- Misses 1515 1528 +13
- Partials 703 716 +13 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
jack-berg
left a comment
There was a problem hiding this comment.
Looks good. As a principle, I / we want to make sure that declarative config is a strict superset of what's possible with env vars / system properties, so we'll want to make sure that #7928 is also resolved on the declarative config side of things as well.
| .anySatisfy( | ||
| scopeMetrics -> { | ||
| assertThat(scopeMetrics.getScope().getName()) | ||
| .isEqualTo("io.opentelemetry.sdk.logs"); | ||
| assertThat(scopeMetrics.getMetricsList()) | ||
| .satisfiesExactlyInAnyOrder( | ||
| metric -> | ||
| assertThat(metric.getName()) | ||
| .isEqualTo("otel.sdk.log.created"), | ||
| metric -> | ||
| assertThat(metric.getName()) | ||
| .isEqualTo("otel.sdk.processor.log.processed"), | ||
| metric -> | ||
| assertThat(metric.getName()) | ||
| .isEqualTo( | ||
| "otel.sdk.processor.log.queue.capacity"), | ||
| metric -> | ||
| assertThat(metric.getName()) | ||
| .isEqualTo("otel.sdk.processor.log.queue.size")); | ||
| }) |
There was a problem hiding this comment.
#nit if we're not going to assert on the attributes, maybe a more terse assert syntax
| .anySatisfy( | |
| scopeMetrics -> { | |
| assertThat(scopeMetrics.getScope().getName()) | |
| .isEqualTo("io.opentelemetry.sdk.logs"); | |
| assertThat(scopeMetrics.getMetricsList()) | |
| .satisfiesExactlyInAnyOrder( | |
| metric -> | |
| assertThat(metric.getName()) | |
| .isEqualTo("otel.sdk.log.created"), | |
| metric -> | |
| assertThat(metric.getName()) | |
| .isEqualTo("otel.sdk.processor.log.processed"), | |
| metric -> | |
| assertThat(metric.getName()) | |
| .isEqualTo( | |
| "otel.sdk.processor.log.queue.capacity"), | |
| metric -> | |
| assertThat(metric.getName()) | |
| .isEqualTo("otel.sdk.processor.log.queue.size")); | |
| }) | |
| .anySatisfy( | |
| scopeMetrics -> { | |
| assertThat(scopeMetrics.getScope().getName()) | |
| .isEqualTo("io.opentelemetry.sdk.logs"); | |
| assertThat( | |
| scopeMetrics.getMetricsList().stream() | |
| .map(Metric::getName) | |
| .collect(Collectors.toSet())) | |
| .isEqualTo( | |
| new HashSet<>( | |
| Arrays.asList( | |
| "otel.sdk.log.created", | |
| "otel.sdk.processor.log.processed", | |
| "otel.sdk.processor.log.queue.capacity", | |
| "otel.sdk.processor.log.queue.size"))); |
Or maybe even something like maintaining a Map<String, Set<String>>, with keys of each expected scope and values being the set of metric names expected for that scope.
Not critical, but this test does get pretty long with so many verbose asserts.
There was a problem hiding this comment.
Good call - I added a helper too, it seems to be both much smaller and more readable
| import javax.annotation.Nullable; | ||
|
|
||
| /** Reads the desired SDK internal telemetry version from {@link ConfigProperties}. */ | ||
| final class InternalTelemetryConfiguration { |
There was a problem hiding this comment.
Too bad we need two copies of this code. Maybe each can have a note in the javadoc referencing the other to increase the chance of keeping changes in sync?
There was a problem hiding this comment.
Yeah added the references
| final class InternalTelemetryConfiguration { | ||
|
|
||
| @Nullable | ||
| static InternalTelemetryVersion getVersion(ConfigProperties config) { |
There was a problem hiding this comment.
Hey I was thinking about internal telemetry and defaults more. Right now, we have SDK components which have different versions of internal telemetry according to their programmatic config API. For example, SdkTracerProvider / SdkLoggerProvider / SimpleSpanProcessor / SimpleLogRecordProcessor did not emit any telemetry prior to the new semantic conventions, and so they default to the latest. BatchSpanProcessor, BatchLogRecordProcessor, Otlp{Signal}{Protocol}Exporter all emit legacy by default.
That means that if otel.experimental.sdk.telemetry.version is omitted, or if otel.experimental.sdk.telemetry.version=legacy, the SDK ends up emitting split brain telemetry.
I think we need to address that here in and in #8045. My proposal:
- Both system prop / env var config and declarative config should take measures to prevent ever emitting split brain internal telemetry
- System prop / env var config defaults to legacy if
otel.experimental.sdk.telemetry.versionis unspecified. - Declarative config defaults to no internal telemetry if
.instrumentation/development.java.otel_sdk.internal_telemetry_versionis unspecified - Once the semconv for internal telemetry is stable, declarative config updates the default to the stable version
- If the internal telemetry version resolves to legacy, internal telemetry is only emitted from components that emit legacy. I.e. internal telemetry is disabled for
BatchSpanProcessor,BatchLogRecordProcessor,Otlp{Signal}{Protocol}Exporter.
There was a problem hiding this comment.
Metric names are just names, I don't know if there is any actual negative user impact from having inconsistent naming conventions in legacy mode, most commonly they wouldn't notice the new metrics, or appreciate them existing vs not (and maybe it gives a carrot for users to update to latest? ;) ). Agree that it feels a bit weird but not sure it's better to hide the new metrics to avoid the split brain. Should be simple to implement if we'd like that though.
Declarative config defaults to no internal telemetry
To confirm, in the other PR we discussed it could be confusing if the default is different for the two config mechanisms but looks like we'd go for that. Does it mean we need to add InternalTelemetryVersion.DISABLED?
There was a problem hiding this comment.
and maybe it gives a carrot for users to update to latest? ;)
See, I was thinking that omitting them from legacy would provide a carrot to update to latest 😂
Does it mean we need to add InternalTelemetryVersion.DISABLED?
I was imagining implementing this by calling setMeterProvider with a noop, to avoid adding InternalTelemetryVersion.DISABLED and having two ways to accomplish the same thing.
To confirm, in the other PR we discussed it could be confusing if the default is different for the two config mechanisms but looks like we'd go for that.
Yes and I agree that its confusing. The crux for me right now is whether or not we are allowed to migrate declarative config to change the default in the future when semconv stabilizes. I know you made the argument that its reasonable to do so with sufficient lead time and easy config knobs, but right now the precedent for opentelemetry-java and for the broader opentelemetry project is that such a change is breaking and only allowed in major versions. I think that declarative config is likely to be the dominant config mechanism in the long term, and so the top priority for me is making sure that it doesn't carry around the legacy internal telemetry baggage indefinitely. Can accomplish this by:
- Relaxing the restriction that changing the telemetry schema is breaking and only allowed in major version.
- Accepting inconsistent defaults between system props / env vars and declarative config, and having declarative config default to disabled for the time being.
Option 2 is a two way door. With option 1, the otel-java SIG needs to convince itself that this type of change is allowed, and we need to hope that the broader otel community doesn't come out with very explicit guidance to the contrary.
There was a problem hiding this comment.
See, I was thinking that omitting them from legacy would provide a carrot to update to latest
You're right, that's a much better carrot - reworked so the autoconfiguration knob now does not enable new metrics (by not setting a MeterProvider) when set to legacy, and made the default explicitly legacy instead of null.
While discussing how to reflect telemetry version in declarative config in a cross-language way (open-telemetry/semantic-conventions#3293), it would be helpful to have a traditional config property for it to be able to roll out new metrics to javaagent users. This defines
otel.experimental.sdk.telemetry.versionto allow that.#7928