in_podman_metrics: fix multiple cgroup v2 issues by stondo · Pull Request #11719 · fluent/fluent-bit

stondo · 2026-04-15T21:51:50Z

Summary

Fix four cgroup v2 bugs in the in_podman_metrics input plugin that caused
most container metrics to be absent or incorrect on systems using cgroup v2
(unified hierarchy).

All four bugs are confirmed on an ARM64 embedded device running
Fluent Bit 4.2.0/5.0.2 with Podman and cgroup v2 unified hierarchy.

Bugs fixed

1. CPU counter division (incorrect unit conversion)

create_counter() divides raw CPU values by 1,000,000,000 (nanoseconds)
unconditionally. This is correct for cgroup v1 (cpuacct.usage reports
nanoseconds) but wrong for cgroup v2 (cpu.stat reports usage_usec
and user_usec in microseconds).

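On v2, integer division by 1e9 truncates every value below 1e9 usec
(1,000 s, or ~16.7 min of CPU time) to zero, and larger values are
understated by a factor of 1,000. In practice,
container_cpu_usage_seconds_total and container_cpu_user_seconds_total
read 0 until a container has accumulated that much CPU time.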

Fix: Check ctx->cgroup_version and use the correct divisor:
1e9 for v1 (nanoseconds), 1e6 for v2 (microseconds).
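
The divisor selection can be sketched in isolation as follows. This is a minimal illustration, not the plugin's code: the enum, its constants, and the function name `cpu_raw_to_seconds` are hypothetical stand-ins, and the real change lives inside create_counter() and reads ctx->cgroup_version.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the plugin's cgroup version constants. */
enum sketch_cgroup_version {
    SKETCH_CGROUP_V1 = 1,
    SKETCH_CGROUP_V2 = 2
};

/*
 * Convert a raw CPU counter to whole seconds.
 * cgroup v1 cpuacct.usage is in nanoseconds; cgroup v2 cpu.stat
 * usage_usec and user_usec are in microseconds.
 */
static uint64_t cpu_raw_to_seconds(uint64_t raw,
                                   enum sketch_cgroup_version version)
{
    uint64_t divisor;

    if (version == SKETCH_CGROUP_V2) {
        divisor = 1000000ULL;      /* microseconds per second */
    }
    else {
        divisor = 1000000000ULL;   /* nanoseconds per second */
    }
    return raw / divisor;
}
```

With a v2 reading of 449,000,000 usec, the v2 divisor yields 449 seconds, while the old unconditional 1e9 divisor yields 0.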

2. RSS memory key name (wrong key for v2)

STAT_KEY_RSS is defined as "rss", which is correct for cgroup v1
memory.stat. However, cgroup v2 memory.stat does not have an rss
field; the equivalent is anon (anonymous memory pages).

Result: container_memory_rss gauge is never reported for any
container on v2, with a [warn] rss not found in .../memory.stat log
message emitted per container per scrape.

Fix: Add V2_STAT_KEY_RSS "anon" and use it in
fill_counters_with_sysfs_data_v2().
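
To illustrate why the key name matters, here is a minimal sketch of a key lookup over memory.stat-style text ("key value" per line). The function `stat_lookup` and the `SKETCH_` macros are hypothetical and operate on an in-memory string; the plugin's own helpers read from sysfs and may differ in shape.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names mirroring the v1 and v2 keys described above. */
#define SKETCH_V1_STAT_KEY_RSS "rss"
#define SKETCH_V2_STAT_KEY_RSS "anon"

/*
 * Find "key value" in memory.stat-style text.
 * Returns 0 and stores the value on success, -1 if the key is absent
 * or its value cannot be parsed.
 */
static int stat_lookup(const char *text, const char *key, uint64_t *out)
{
    size_t key_len = strlen(key);
    const char *line = text;
    unsigned long long value;

    while (line != NULL && *line != '\0') {
        /* Require a space after the key so "anon" does not match "anon_thp". */
        if (strncmp(line, key, key_len) == 0 && line[key_len] == ' ') {
            if (sscanf(line + key_len + 1, "%llu", &value) == 1) {
                *out = (uint64_t) value;
                return 0;
            }
            return -1;
        }
        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (line != NULL) {
            line++;
        }
    }
    return -1;
}
```

Against v2-style content, looking up `SKETCH_V1_STAT_KEY_RSS` fails (the source of the per-scrape warning), while `SKETCH_V2_STAT_KEY_RSS` succeeds.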

3. memory.max "max" keyword (parse failure)

cgroup v2 memory.max contains the literal string "max" when no memory
limit is set. read_from_file() uses fscanf(fp, "%lu", &value), which
fails to parse "max", so it returns UINT64_MAX and logs a spurious
warning per affected container per scrape.

Result: container_spec_memory_limit_bytes is missing for
containers without an explicit memory limit. Warning spam in logs.

Fix: Add read_from_sysfs_or_max() helper that parses the "max"
keyword and returns 0 (unlimited), matching the convention used by
cAdvisor and other container metric exporters.
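
The parsing rule can be sketched as below. This is an assumption-laden illustration: `parse_limit_or_max` is a hypothetical function that works on a string already read from memory.max, whereas the actual read_from_sysfs_or_max() helper reads the file itself and its signature is not shown here.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse memory.max-style content.
 * "max" (optionally followed by a newline) means unlimited and maps to 0;
 * anything else must be a decimal byte count.
 * Returns 0 on success, -1 if the content is neither.
 */
static int parse_limit_or_max(const char *text, uint64_t *out)
{
    unsigned long long value;

    /* text[3] is only read after the first three bytes matched "max". */
    if (strncmp(text, "max", 3) == 0 &&
        (text[3] == '\0' || text[3] == '\n')) {
        *out = 0;
        return 0;
    }
    if (sscanf(text, "%llu", &value) == 1) {
        *out = (uint64_t) value;
        return 0;
    }
    return -1;
}
```

Treating "max" as a successful parse, rather than an error, is what removes the warning for containers without an explicit limit.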

4. PID fallback path typo (singular vs. plural)

V2_SYSFS_FILE_PIDS_ALT is defined as "containers/cgroup.procs"
(plural), but the actual cgroup v2 subdirectory is
"container/cgroup.procs" (singular).

On v2, cgroup.procs at the scope level is empty for all containers.
Processes live only in the container/cgroup.procs subdirectory. The
plugin correctly tries the alt path, but the typo means it always fails.

Result: PID lookup fails for all containers, which prevents
get_net_data_from_proc() from being called. All four
container_network_* metrics are completely absent.

Fix: Change "containers/cgroup.procs" to "container/cgroup.procs".
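
The fallback order can be sketched as follows, assuming the scope directory path is already known. The function names and `SKETCH_` macros are hypothetical; only the two relative paths come from the description above.

```c
#include <stdint.h>
#include <stdio.h>

/* Relative paths as described above; macro names are hypothetical. */
#define SKETCH_FILE_PIDS     "cgroup.procs"
#define SKETCH_FILE_PIDS_ALT "container/cgroup.procs"

/* Return the first PID listed in dir/file, or 0 if missing or empty. */
static uint64_t first_pid_in(const char *dir, const char *file)
{
    char path[512];
    unsigned long long pid = 0;
    FILE *fp;
    int n;

    n = snprintf(path, sizeof(path), "%s/%s", dir, file);
    if (n < 0 || (size_t) n >= sizeof(path)) {
        return 0;   /* path did not fit in the buffer */
    }
    fp = fopen(path, "r");
    if (fp == NULL) {
        return 0;
    }
    if (fscanf(fp, "%llu", &pid) != 1) {
        pid = 0;    /* empty file, as at the scope level on v2 */
    }
    fclose(fp);
    return (uint64_t) pid;
}

/* Try the scope-level file first, then the container/ subdirectory. */
static uint64_t find_container_pid(const char *scope_dir)
{
    uint64_t pid = first_pid_in(scope_dir, SKETCH_FILE_PIDS);

    if (pid == 0) {
        pid = first_pid_in(scope_dir, SKETCH_FILE_PIDS_ALT);
    }
    return pid;
}
```

With the plural "containers/" spelling, the second lookup would fail to open the file and the function would return 0 for every container.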

Before/After (gw-cloud-connector container, ARM64 device)

Metric	Before	After
`container_cpu_usage_seconds_total`	0	449
`container_cpu_user_seconds_total`	0	145
`container_memory_rss`	MISSING	12,472,320
`container_spec_memory_limit_bytes`	MISSING	1,048,576,000
`container_network_receive_bytes_total`	MISSING	3,229,233
`container_network_transmit_bytes_total`	MISSING	4,721,901

After the fix, all 10 metric types are successfully emitted for all 9
containers, with zero warnings in the log.

Testing

Tested on:

ARM64 (Cortex-A7) embedded device, 4-core, 4 GB RAM
Yocto Scarthgap (kernel 6.6), cgroup v2 unified hierarchy
Podman 5.x with 9 containers (3 infra, 3 service, 3 app)
Fluent Bit 4.2.0; the patch was also verified to apply cleanly to 5.0.2 and 5.0.3

Verified:

All 10 metric types present in Prometheus output
CPU values match raw cpu.stat divided by 1e6
RSS values match anon field in memory.stat
Memory limit correctly reports 0 for unlimited and real values for limited
Network metrics present for all containers with non-loopback interfaces
Zero warnings in journal log after fix (previously ~45 warnings per scrape)
No regression: cgroup v1 code path is unchanged

Fixes #7769

Summary by CodeRabbit

Bug Fixes

Fixed CPU metrics calculation to report accurate performance data across different environments
Corrected memory limit readings in container systems
Improved container process identification and tracking accuracy

Fix four cgroup v2 bugs in the podman_metrics input plugin:

1. CPU counter division: cgroup v2 cpu.stat reports usage in
   microseconds, not nanoseconds like cgroup v1 cpuacct. Use the
   correct divisor (1e6) when converting to seconds.

2. RSS memory key: cgroup v2 memory.stat does not have a "rss" field.
   The equivalent metric is "anon" (anonymous memory). Add
   V2_STAT_KEY_RSS and use it in the v2 collection path.

3. memory.max "max" keyword: cgroup v2 uses the literal string "max"
   in memory.max when the memory limit is unlimited. read_from_file()
   fails to parse this with fscanf("%lu"), causing spurious warnings.
   Add read_from_sysfs_or_max() helper that returns 0 for "max"
   (unlimited).

4. PID alt path typo: V2_SYSFS_FILE_PIDS_ALT was set to
   "containers/cgroup.procs" (plural) but the actual cgroup v2
   subdirectory is "container/cgroup.procs" (singular). This caused
   PID lookup to fail for all containers, which in turn prevented all
   network metrics from being collected.

Fixes: fluent#7769

Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>

coderabbitai · 2026-04-15T21:52:17Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d8141f9d-3215-4723-8400-7c1a380c69ce

📥 Commits

Reviewing files that changed from the base of the PR and between 63ed88e and 6c742a1.

📒 Files selected for processing (3)

plugins/in_podman_metrics/podman_metrics.c
plugins/in_podman_metrics/podman_metrics_config.h
plugins/in_podman_metrics/podman_metrics_data.c

Walkthrough

These changes enhance cgroup v2 support in the Podman metrics plugin by correcting unit conversions for CPU metrics, updating sysfs file paths for cgroup v2, adding RSS memory metric mapping for cgroup v2, and implementing proper handling of unlimited memory values.

Changes

Cohort / File(s)	Summary
CPU Metric Unit Conversion `plugins/in_podman_metrics/podman_metrics.c`	Updated CPU metric unit conversion logic to branch on cgroup version: divides by 1,000,000 for CGROUP_V2 (microseconds to seconds) and by 1,000,000,000 otherwise (nanoseconds to seconds); updated trace log messaging.
Configuration Constants `plugins/in_podman_metrics/podman_metrics_config.h`	Added new cgroup v2 RSS metric key macro `V2_STAT_KEY_RSS` mapped to `"anon"`; corrected cgroup v2 alternate PIDs sysfs file path from `"containers/cgroup.procs"` to `"container/cgroup.procs"`.
Cgroup V2 Data Collection `plugins/in_podman_metrics/podman_metrics_data.c`	Added `read_from_sysfs_or_max()` helper function to properly handle cgroup v2 memory limit values, treating literal `"max"` as unlimited (0); updated memory limit collection to use this helper.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Hops with glee
v2 cgroups now dance just right,
"max" means zero, units take flight!
Paths are fixed, RSS is keen,
The cleanest metrics ever seen! 🥕

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately reflects the main objective—fixing multiple cgroup v2 issues in the in_podman_metrics plugin across CPU conversion, RSS key, memory.max parsing, and PID path corrections.
Linked Issues check	✅ Passed	The PR addresses all four coding objectives from issue `#7769`: CPU counter division by cgroup version, RSS key selection (anon for v2), memory.max "max" keyword handling, and PID fallback path correction.
Out of Scope Changes check	✅ Passed	All changes are directly related to fixing cgroup v2 compatibility issues: CPU unit conversion logic, RSS memory key addition, memory.max parsing helper, and path macro correction.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


stondo requested review from cosmo0920 and edsiper as code owners April 15, 2026 21:51

github-actions bot added the docs-required label Apr 15, 2026

stondo temporarily deployed to pr April 16, 2026 17:17 — with GitHub Actions Inactive

stondo temporarily deployed to pr April 16, 2026 17:42 — with GitHub Actions Inactive

stondo temporarily deployed to pr April 16, 2026 17:43 — with GitHub Actions Inactive

stondo wants to merge 1 commit into fluent:master from stondo:fix/podman-metrics-cgroupv2
