Improve performances by reducing RC poll interval on all E2E scenarios on python and node.js#6879
Conversation
Node.js starts the poll timer only after a response is received, so 0.2s is safe. Other languages may use a fixed interval or have bugs that cause race conditions at shorter intervals. Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
|
|
🎉 All green!❄️ No new flaky tests detected 🔗 Commit SHA: 725450a | Docs | Datadog PR Page | Give us feedback! |
Co-authored-by: Brett Langdon <[email protected]>
There was a problem hiding this comment.
As we've discussed by zoom, my point about this is that it diverges from the standard configuration applied on our customers, and does so across all scenarios without distinction. It also introduces a different testing approach for Python/Node compared to the other languages.
As a result:
- From a QA perspective, this is not something I would recommend.
- That said, this is not my decision to make, so I’m neither opposing nor approving it. Each product team should weigh in on this. If there is alignment across teams, then that’s fine.
Just one "request" change : the PR description does not include the expected time savings. This is important to help evaluate the trade-offs, so it would be helpful to add it.
Yes, it's always better to not have any special casing, but this is a problem that is impacting us significantly, so the trade-off is worth it. That is not necessarily the case for every language. For example, a language that already has a 1h CI would not case that system tests are slower. But for languages that have an efficient CI pipeline, this has a huge impact. So the trade-off of polling faster is worth more than the potential risk. That risk is also extremely minimal, polling should always work the same regardless of the interval. That is the nature of short-polling. Long-polling would be instant but that would be a design change.
Discussed this with the teams and everyone is on board with the change. Still limiting the scope to languages that are known to have a safe polling strategy.
Added! |
bm1549
left a comment
There was a problem hiding this comment.
LGTM, one non-blocking comment
|
|
||
| library = self.weblog_infra.library_name | ||
|
|
||
| if library in ("nodejs", "python"): |
There was a problem hiding this comment.
If we want to consider this as the default behavior, it would be nice to invert the condition here so that it signals that a lower polling rate is the "default" and the higher one is anomalous
Not blocking for this PR, but definitely a nice to have
There was a problem hiding this comment.
I agree with the sentiment. I'll open a follow-up PR to expand this to other languages and do some more testing, but for now the opt-in is fine I think to at least unblock Node and improve Python.
Summary
DD_REMOTE_CONFIG_POLL_INTERVAL_SECONDS=0.2as a default for all Node.js and Python scenarios inEndToEndScenario.configure()setdefaultso any scenario that explicitly sets this value retains itTest plan
Human notes
Example gains from the worst offenders:
APPSEC_RUNTIME_ACTIVATIONgoes from 9 minutes to 3 minutes.APPSEC_API_SECURITY_RCgoes from 8 minutes to <3 minutes.This is significant because for languages that parallelize the tests, our total test time of our entire CI ends up being our slowest jobs, which were those. So fixing this one problem alone will reduce our total run time significantly.
🤖 Generated with Claude Code