in_node_exporter_metrics: add timex linux collector #11718
ngillen wants to merge 1 commit into fluent:master
📝 Walkthrough
Adds a new "timex" collector to the in_node_exporter_metrics plugin with build changes, plugin wiring, new timex metric fields, and a Linux-specific implementation that polls NTP state via adjtimex() to publish gauges and counters.
Sequence Diagram
sequenceDiagram
participant Plugin as Plugin
participant Timex as Timex Collector
participant Kernel as adjtimex() Syscall
participant Metrics as CMT Metrics
Plugin->>Timex: register timex_collector
Plugin->>Timex: cb_init (ne_timex_init)
Timex->>Metrics: create gauges & counters (offset, freq, errors, counters...)
Plugin->>Timex: cb_update (ne_timex_update)
Timex->>Kernel: adjtimex()
Kernel-->>Timex: NTP/timex state (struct)
Timex->>Metrics: cmt_gauge_set / cmt_counter_set (scaled values)
Timex-->>Plugin: update complete
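The "scaled values" step in the diagram can be illustrated with a small sketch in C. It assumes the unit conventions documented for `struct timex` in adjtimex(2): `offset` is in microseconds unless `STA_NANO` is set in `status`, and `freq` is in ppm with a 16-bit fractional part. The `sketch_*` names and constants are hypothetical and are not taken from the PR, whose own macros may be defined differently.

```c
#include <sys/timex.h>

/* Hypothetical constants for this sketch; the PR's own macros may differ. */
#define SKETCH_USEC_PER_SEC 1000000.0
#define SKETCH_NSEC_PER_SEC 1000000000.0
/* freq is expressed in ppm with a 16-bit fractional part (65536 == 1 ppm). */
#define SKETCH_PPM16FRAC    (1000000.0 * 65536.0)

/* Convert tx->offset to seconds, honoring the STA_NANO resolution flag. */
static double sketch_offset_seconds(const struct timex *tx)
{
    double divisor = (tx->status & STA_NANO) ? SKETCH_NSEC_PER_SEC
                                             : SKETCH_USEC_PER_SEC;

    return (double) tx->offset / divisor;
}

/* Convert tx->freq (scaled ppm) to a clock rate ratio centered on 1.0. */
static double sketch_frequency_ratio(const struct timex *tx)
{
    return 1.0 + (double) tx->freq / SKETCH_PPM16FRAC;
}

/*
 * Map the adjtimex() return code to a sync flag:
 * -1 if the call itself failed, 0 if the clock is unsynchronized
 * (TIME_ERROR), 1 for any other clock state.
 */
static int sketch_sync_status(int adjtimex_ret)
{
    if (adjtimex_ret == -1) {
        return -1;
    }
    return adjtimex_ret == TIME_ERROR ? 0 : 1;
}
```

In a collector, these helpers would be applied to the structure filled in by a read-only `adjtimex()` call (with `modes` left at 0) before the results are passed to the gauge setters.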
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7f9ff4b9b5
cmt_gauge_set(ctx->timex_tick, ts, tx.tick / MICROSECONDS, 0, NULL);
cmt_gauge_set(ctx->timex_pps_freq, ts, tx.ppsfreq / PPM16FRAC, 0, NULL);
cmt_gauge_set(ctx->timex_jitter, ts, tx.jitter / divisor, 0, NULL);
cmt_gauge_set(ctx->timex_shift, ts, tx.shift, 0, NULL);
Export PPS shift as seconds, not raw shift exponent
`node_timex_pps_shift_seconds` is currently set directly from `tx.shift`, but `struct timex.shift` is the exponent used to represent an interval as a power of two. Exporting it raw yields values like 4 instead of the actual interval in seconds (16), so the metric is systematically underreported and no longer matches node_exporter-compatible semantics for this series.
According to `man 2 adjtimex`, `timex.shift` is the PPS interval duration in seconds.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/in_node_exporter_metrics/ne_timex_linux.c`:
- Around line 253-256: ne_timex_init currently ignores timex_configure() result
and always returns 0, which allows null metric handles later; change
ne_timex_init to check the return value of timex_configure(ctx) and return a
non-zero error code (propagate the failure) when timex_configure fails so
callers know init failed; update any callers (collector init path) to handle a
non-zero return if necessary.
- Around line 259-269: ne_timex_update currently ignores errors from
timex_update and always returns 0; change it to check the return value of
timex_update(ctx, ts) and propagate a non-zero error when timex_update (which
wraps adjtimex) fails. Specifically, call ret = timex_update(ctx, ts); if ret
indicates failure, log or report the error and return a non-zero code from
ne_timex_update instead of 0 so callers can detect collection failure. Ensure
you reference ne_timex_update and timex_update in the change.
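The two findings above describe the same pattern: a callback that discards the result of its helper and always returns 0. A minimal sketch of the propagating version, in C, might look like the following. The `sketch_*` functions and the `simulate_failure` parameter are hypothetical stand-ins for `timex_configure()`, `timex_update()`, and the plugin callbacks; they do not reproduce the PR's code or the Fluent Bit API.

```c
#include <stdio.h>

/* Hypothetical context; 'configured' stands in for the metric handles. */
struct sketch_ctx {
    int configured;
};

/* Stand-in for timex_configure(): returns 0 on success, -1 on failure. */
static int sketch_configure(struct sketch_ctx *ctx, int simulate_failure)
{
    if (simulate_failure) {
        return -1;
    }
    ctx->configured = 1;
    return 0;
}

/* Stand-in for timex_update(): fails if setup never ran or the read fails. */
static int sketch_update(struct sketch_ctx *ctx, int simulate_failure)
{
    if (!ctx->configured || simulate_failure) {
        return -1;
    }
    return 0;
}

/* Init callback: report the failure instead of always returning 0. */
static int sketch_init_cb(struct sketch_ctx *ctx, int simulate_failure)
{
    int ret = sketch_configure(ctx, simulate_failure);

    if (ret != 0) {
        fprintf(stderr, "timex sketch: configure failed\n");
        return -1;
    }
    return 0;
}

/* Update callback: propagate a failed collection to the caller. */
static int sketch_update_cb(struct sketch_ctx *ctx, int simulate_failure)
{
    int ret = sketch_update(ctx, simulate_failure);

    if (ret != 0) {
        fprintf(stderr, "timex sketch: update failed\n");
        return -1;
    }
    return 0;
}
```

With this shape, a failed init leaves the context unconfigured and a later update reports an error rather than touching handles that were never created.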
📒 Files selected for processing (6)
- plugins/in_node_exporter_metrics/CMakeLists.txt
- plugins/in_node_exporter_metrics/ne.c
- plugins/in_node_exporter_metrics/ne.h
- plugins/in_node_exporter_metrics/ne_timex.c
- plugins/in_node_exporter_metrics/ne_timex.h
- plugins/in_node_exporter_metrics/ne_timex_linux.c
Extend node_exporter_metrics by exposing timex metrics by parsing struct timex from adjtimex and aligning the output with Prometheus node_exporter conventions.

Signed-off-by: Nicolas Gillen <nicolas.gillen@agregio-solutions.com>
7f9ff4b to 5a21c14
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/in_node_exporter_metrics/ne_timex_linux.c`:
- Around line 123-126: The gauge node_timex_pps_shift_seconds is exporting the
raw exponent (tx.shift) instead of the PPS interval in seconds; change the
export so that the value reported is 2^tx.shift (e.g., use (double)(1ULL <<
tx.shift) or pow(2.0, tx.shift)) before calling cmt_gauge_set for
node_timex_pps_shift_seconds (refer to tx.shift and the export site around the
existing cmt_gauge_set on line ~242); ensure the result is a floating-point
seconds value and handle tx.shift as an unsigned integer to avoid overflow.
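The conversion proposed in this comment can be sketched in C as below. This only illustrates the reviewer's reading of `tx.shift` as a power-of-two exponent; earlier in the thread the PR author cites `man 2 adjtimex` for the opposite reading, so whether the conversion should be applied at all remains an open question here. The function name `sketch_pps_interval_seconds` is hypothetical.

```c
/*
 * Interpret 'shift' as a power-of-two exponent and return 2^shift seconds.
 * Out-of-range exponents return 0.0 so the left shift stays well defined.
 * Assumption: the exponent interpretation is the reviewer's, not confirmed.
 */
static double sketch_pps_interval_seconds(long shift)
{
    if (shift < 0 || shift >= 63) {
        return 0.0;
    }
    return (double) (1ULL << shift);
}
```

Under this interpretation an exponent of 4 maps to 16 seconds, which is the example used in the review; the range guard is what keeps the shift from invoking undefined behavior on unexpected kernel values.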
📒 Files selected for processing (6)
- plugins/in_node_exporter_metrics/CMakeLists.txt
- plugins/in_node_exporter_metrics/ne.c
- plugins/in_node_exporter_metrics/ne.h
- plugins/in_node_exporter_metrics/ne_timex.c
- plugins/in_node_exporter_metrics/ne_timex.h
- plugins/in_node_exporter_metrics/ne_timex_linux.c
✅ Files skipped from review due to trivial changes (2)
- plugins/in_node_exporter_metrics/CMakeLists.txt
- plugins/in_node_exporter_metrics/ne_timex.h
🚧 Files skipped from review as they are similar to previous changes (2)
- plugins/in_node_exporter_metrics/ne.c
- plugins/in_node_exporter_metrics/ne_timex.c
/* node_timex_pps_shift_seconds */
g = cmt_gauge_create(ctx->cmt, "node", "timex", "pps_shift_seconds",
                     "Pulse per second interval duration.",
                     0, NULL);
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Upstream node_exporter timex shift handling:"
curl -fsSL https://raw.githubusercontent.com/prometheus/node_exporter/master/collector/timex_linux.go \
| rg -n "pps_shift_seconds|Shift|Pow|Exp2"
echo
echo "Linux adjtimex(2) docs for 'shift' semantics:"
curl -fsSL https://man7.org/linux/man-pages/man2/adjtimex.2.html \
| rg -n "shift|interval"
Repository: fluent/fluent-bit
Length of output: 171
🏁 Script executed:
# First, locate and examine the actual file
fd -t f "ne_timex_linux.c" --exec wc -l {} \;
Repository: fluent/fluent-bit
Length of output: 116
🏁 Script executed:
# Read the specific file to verify the referenced code
cat -n plugins/in_node_exporter_metrics/ne_timex_linux.c | sed -n '120,130p'
Repository: fluent/fluent-bit
Length of output: 447
🏁 Script executed:
# Check the actual metric update code around line 242
cat -n plugins/in_node_exporter_metrics/ne_timex_linux.c | sed -n '235,250p'
Repository: fluent/fluent-bit
Length of output: 1145
🏁 Script executed:
# Search for all references to timex_shift in the codebase
rg "timex_shift" --type c --type h -B 2 -A 2
Repository: fluent/fluent-bit
Length of output: 1387
🏁 Script executed:
# Check if there's any local documentation or comments about timex shift semantics
rg -i "shift.*exponent|shift.*2\^|shift.*power" plugins/in_node_exporter_metrics/
Repository: fluent/fluent-bit
Length of output: 43
🏁 Script executed:
# Look at the full timex_update function to understand context
cat -n plugins/in_node_exporter_metrics/ne_timex_linux.c | sed -n '200,250p'
Repository: fluent/fluent-bit
Length of output: 2113
🏁 Script executed:
# Check if there are any constants or macros related to shift conversion
rg "SHIFT|shift" plugins/in_node_exporter_metrics/ne_timex_linux.c
Repository: fluent/fluent-bit
Length of output: 258
🏁 Script executed:
# Try alternative web search using different approach for node_exporter verification
curl -fsSL https://raw.githubusercontent.com/prometheus/node_exporter/main/collector/timex_linux.go 2>/dev/null | head -200 || echo "Fallback: Checking local Go docs reference"
Repository: fluent/fluent-bit
Length of output: 43
node_timex_pps_shift_seconds exports the raw exponent instead of converted seconds.
The metric name and description indicate this should be the PPS interval duration in seconds. However, tx.shift is a small integer exponent (0-31) where the actual interval is 2^shift seconds. Line 242 exports the raw value without conversion, unlike other timex metrics (e.g., tx.tick / MICROSECONDS, tx.stabil / PPM16FRAC) which apply appropriate conversions.
For example, tx.shift = 4 should represent 16 seconds (2^4), not the value 4.
Suggested fix
 static int timex_update(struct flb_ne *ctx, uint64_t ts)
 {
     struct timex tx = {};
     int ret = 0;
     double sync_status = 0;
     double divisor = 0;
+    double pps_shift_seconds = 0;
     ret = adjtimex(&tx);
     if (ret == -1)
     {
         flb_plg_error(ctx->ins, "error on adjtimex: error: %d, %s", errno, strerror(errno));
         return -1;
     }
     if (ret == TIME_ERROR)
     {
         sync_status = 0;
     }
     else
     {
         sync_status = 1;
     }
     if (tx.status & STA_NANO)
     {
         divisor = NANOSECONDS;
     }
     else
     {
         divisor = MICROSECONDS;
     }
     cmt_gauge_set(ctx->timex_sync_status, ts, sync_status, 0, NULL);
     cmt_gauge_set(ctx->timex_offset, ts, tx.offset / divisor, 0, NULL);
     cmt_gauge_set(ctx->timex_freq, ts, 1.0 + tx.freq / PPM16FRAC, 0, NULL);
     cmt_gauge_set(ctx->timex_maxerror, ts, tx.maxerror / MICROSECONDS, 0, NULL);
     cmt_gauge_set(ctx->timex_esterror, ts, tx.esterror / MICROSECONDS, 0, NULL);
     cmt_gauge_set(ctx->timex_status, ts, tx.status, 0, NULL);
     cmt_gauge_set(ctx->timex_constant, ts, tx.constant, 0, NULL);
     cmt_gauge_set(ctx->timex_tick, ts, tx.tick / MICROSECONDS, 0, NULL);
     cmt_gauge_set(ctx->timex_pps_freq, ts, tx.ppsfreq / PPM16FRAC, 0, NULL);
     cmt_gauge_set(ctx->timex_jitter, ts, tx.jitter / divisor, 0, NULL);
-    cmt_gauge_set(ctx->timex_shift, ts, tx.shift, 0, NULL);
+    if (tx.shift >= 0 && tx.shift < 63) {
+        pps_shift_seconds = (double) (1ULL << tx.shift);
+    }
+    cmt_gauge_set(ctx->timex_shift, ts, pps_shift_seconds, 0, NULL);
     cmt_gauge_set(ctx->timex_stabil, ts, tx.stabil / PPM16FRAC, 0, NULL);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/in_node_exporter_metrics/ne_timex_linux.c` around lines 123 - 126,
The gauge node_timex_pps_shift_seconds is exporting the raw exponent (tx.shift)
instead of the PPS interval in seconds; change the export so that the value
reported is 2^tx.shift (e.g., use (double)(1ULL << tx.shift) or pow(2.0,
tx.shift)) before calling cmt_gauge_set for node_timex_pps_shift_seconds (refer
to tx.shift and the export site around the existing cmt_gauge_set on line ~242);
ensure the result is a floating-point seconds value and handle tx.shift as an
unsigned integer to avoid overflow.
Extend node_exporter_metrics by exposing timex metrics by parsing struct timex from adjtimex and aligning the output with Prometheus node_exporter conventions.
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.