tcpdump / libpcap - packet traces - pcap-grout.so - WIP#565
tcpdump / libpcap - packet traces - pcap-grout.so - WIP#565
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a packet-capture subsystem across control, datapath, API, CLI, plugin, tests, and build. Introduces a POSIX shared-memory MPSC capture ring API ( Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 8
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/capture_ring_test.c`:
- Line 14: The file api/capture_ring_test.c has clang-format style violations
(e.g., around the `#include` <gr_cmocka.h> and several lines listed in the CI), so
run an automatic reformat: execute clang-format -i api/capture_ring_test.c (or
apply your project's clang-format configuration) to fix formatting for all
reported lines; verify changes include the corrected formatting for the include
and test functions in capture_ring_test.c and commit the resulting file.
In `@api/gr_capture_ring.h`:
- Around line 275-278: The timestamp conversion in gr_capture_slot_timestamp_ns
overflows because the expression delta * 1000000000ULL is computed in uint64_t;
change the arithmetic to use a wider type (e.g. __uint128_t) or a safe mul-div
helper so the multiplication is performed with 128-bit precision before dividing
by r->tsc_hz, then cast back to uint64_t; update the calculation that produces
the final timestamp (currently computing r->realtime_ref_ns + delta *
1000000000ULL / r->tsc_hz) to perform the multiply/divide in extended precision
to avoid intermediate overflow.
In `@modules/infra/control/capture.c`:
- Around line 126-128: The shared memory object created by shm_open(s->shm_path,
...) is world-readable due to the 0644 mode; change the creation mode to 0600 to
restrict access to the owning user (e.g., shm_open(..., 0600)). If other code
later relies on different permissions, additionally enforce permission with
fchmod(fd, 0600) after opening. Update the callsite around s->shm_path /
capture_seq where fd is created to use 0600 and/or call fchmod(fd, 0600)
immediately after opening.
- Around line 77-85: The callback iface_add_callback currently enables capture
for any newly added port regardless of whether the running capture_session
(active_capture) includes that iface in its frozen iface table; update
iface_add_callback (and the similar callback implementations around lines
117-173) to first check that active_capture is non-NULL and that iface->id is
within the session's iface table bounds (e.g., iface->id < s->n_ifaces and that
the session's iface table contains a valid entry for that index) before setting
GR_IFACE_F_CAPTURE or storing into iface_capture; only opt-in ports present in
the session's iface table to avoid creating descriptors for hot-added ports
outside the session.
- Around line 193-235: capture_session_set_filter() currently destroys the old
BPF (old) immediately after swapping, creating a use-after-free window for
workers that read session state from iface_capture[] with memory_order_relaxed;
fix by introducing lifetime management (RCU or a refcount) on the capture
session and its resources (s, s->ring, s->bpf, s->bpf_jit_func): modify the
worker path that loads sessions from iface_capture[] to acquire a reference
(e.g., atomic_ref_get or rcu_read_lock) and release it when done, update
capture_session_set_filter() to swap pointers without freeing old resources and
instead defer destroy until the refcount reaches zero (or call_rcu callback),
and update capture_session_stop()/capture_clear_flags() to drop its reference
and schedule actual free/unmap only after readers complete; adjust struct (add
refcount or RCU hooks) and ensure all stores use appropriate release semantics
to synchronize with reader acquisitions.
In `@modules/infra/datapath/port_rx.c`:
- Around line 117-125: The datapath can race with capture teardown leading to
use-after-free; fix by making capture access safe: replace the non-atomic flag
read and relaxed session load in the RX path with a synchronized check — perform
an atomic load of the iface capture pointer (iface_capture[ctx->iface->id]) with
acquire semantics (or otherwise load the flag/pointer with memory_order_acquire)
and then, inside capture_enqueue(), validate the session/ring before accessing
ring fields by checking ring->magic (and only proceed if the magic matches the
expected constant); this ensures capture_clear_flags()'s store + munmap
(release) synchronizes with the datapath and prevents accessing unmapped ring
memory.
In `@modules/infra/datapath/port_tx.c`:
- Around line 61-69: The capture_enqueue call in tx_begin() uses pre-VLAN mbufs
so captured packets lack VLAN tags; move the GR_IFACE_F_CAPTURE check and
capture_enqueue invocation out of tx_begin() and into the main tx_process() flow
immediately after tx_add_vlan() (or after any other code that mutates mbufs for
wire format) so capture_enqueue(ctx->txq.port_id, ctx->txq.queue_id, (struct
rte_mbuf **)objs, nb_objs, GR_CAPTURE_DIR_OUT, iface) receives the post-VLAN
mbufs; update any assumptions about objs ownership or pointers if tx_add_vlan()
replaces or reorders mbufs.
In `@pcap/pcap-grout.c`:
- Around line 106-110: The loop in pcap-grout.c comparing waited_ms to
timeout_ms is wrong because GROUT_POLL_US is in microseconds (100us) while
waited_ms is incremented by 1 each iteration, making the effective timeout ~10x
shorter; fix the timeout calculation in the polling loop that calls
usleep(GROUT_POLL_US) by updating waited_ms using the sleep interval in
milliseconds (e.g., waited_ms += GROUT_POLL_US / 1000) or by switching to
microsecond units for the comparison (convert timeout_ms to microseconds and
compare a waited_us counter), ensuring variables timeout_ms, waited_ms,
GROUT_POLL_US and the usleep call are updated consistently.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d7f2e7ce-b262-44c4-8245-c3e4e146c8e3
📒 Files selected for processing (18)
api/capture_ring_test.capi/gr_capture_ring.hapi/meson.buildmeson_options.txtmodules/infra/api/capture.cmodules/infra/api/gr_infra.hmodules/infra/api/meson.buildmodules/infra/cli/capture.cmodules/infra/cli/meson.buildmodules/infra/control/capture.cmodules/infra/control/gr_capture.hmodules/infra/control/meson.buildmodules/infra/datapath/gr_dp_capture.hmodules/infra/datapath/port_rx.cmodules/infra/datapath/port_tx.cpcap/meson.buildpcap/pcap-grout.csmoke/capture_test.sh
| static void iface_add_callback(uint32_t /*event*/, const void *obj) { | ||
| const struct iface *iface = obj; | ||
| struct capture_session *s = active_capture; | ||
| if (s == NULL || s->iface_id != GR_IFACE_ID_UNDEF) | ||
| return; | ||
| if (iface->type != GR_IFACE_TYPE_PORT) | ||
| return; | ||
| iface_from_id(iface->id)->flags |= GR_IFACE_F_CAPTURE; | ||
| atomic_store_explicit(&iface_capture[iface->id], s, memory_order_release); |
There was a problem hiding this comment.
Don't enable capture on ports missing from the session's iface table.
n_ifaces and the shared-memory interface table are frozen when the session starts, but iface_add_callback() later opts newly added ports into capture. The datapath still writes slot->iface_id = iface->id (modules/infra/datapath/gr_dp_capture.h:68-73), so packets from hot-added ports have no descriptor in the ring metadata. That breaks downstream interface mapping/pcapng emission for those packets.
Also applies to: 117-173
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modules/infra/control/capture.c` around lines 77 - 85, The callback
iface_add_callback currently enables capture for any newly added port regardless
of whether the running capture_session (active_capture) includes that iface in
its frozen iface table; update iface_add_callback (and the similar callback
implementations around lines 117-173) to first check that active_capture is
non-NULL and that iface->id is within the session's iface table bounds (e.g.,
iface->id < s->n_ifaces and that the session's iface table contains a valid
entry for that index) before setting GR_IFACE_F_CAPTURE or storing into
iface_capture; only opt-in ports present in the session's iface table to avoid
creating descriptors for hot-added ports outside the session.
| struct rte_bpf *old = s->bpf; | ||
|
|
||
| if (bpf_len == 0) { | ||
| s->bpf_jit_func = NULL; | ||
| s->bpf = NULL; | ||
| if (old != NULL) | ||
| rte_bpf_destroy(old); | ||
| LOG(INFO, "capture filter cleared"); | ||
| return 0; | ||
| } | ||
|
|
||
| // Build a struct bpf_program from the raw bytecode. | ||
| struct bpf_program prog = { | ||
| .bf_len = bpf_len, | ||
| .bf_insns = (struct bpf_insn *)(uintptr_t)bpf_insns, | ||
| }; | ||
|
|
||
| // Convert classic BPF (libpcap format) to eBPF. | ||
| struct rte_bpf_prm *prm = rte_bpf_convert(&prog); | ||
| if (prm == NULL) { | ||
| LOG(ERR, "rte_bpf_convert: %s", rte_strerror(rte_errno)); | ||
| return errno_set(EINVAL); | ||
| } | ||
|
|
||
| // JIT compile to native code. | ||
| struct rte_bpf *bpf = rte_bpf_load(prm); | ||
| rte_free(prm); | ||
| if (bpf == NULL) { | ||
| LOG(ERR, "rte_bpf_load: %s", rte_strerror(rte_errno)); | ||
| return errno_set(EINVAL); | ||
| } | ||
|
|
||
| struct rte_bpf_jit jit; | ||
| if (rte_bpf_get_jit(bpf, &jit) < 0 || jit.func == NULL) { | ||
| LOG(NOTICE, "BPF JIT not available, using interpreter"); | ||
| jit.func = NULL; | ||
| } | ||
|
|
||
| s->bpf_jit_func = jit.func; | ||
| s->bpf = bpf; | ||
|
|
||
| if (old != NULL) | ||
| rte_bpf_destroy(old); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
mapfile -t targets < <(fd 'gr_dp_capture\.h$|port_rx\.c$|port_tx\.c$|capture\.c$' .)
rg -n -C3 '\biface_capture\b|\bbpf_jit_func\b|->bpf\b|rte_bpf_destroy\s*\(' "${targets[@]}"Repository: DPDK/grout
Length of output: 110
🏁 Script executed:
fd -t f 'capture\.c$|port_rx\.c$|port_tx\.c$|gr_dp_capture\.h$'Repository: DPDK/grout
Length of output: 246
🏁 Script executed:
rg -n 'iface_capture|bpf_jit_func|rte_bpf_destroy|capture_session_set_filter|capture_session_stop' modules/infra/control/capture.c modules/infra/datapath/gr_dp_capture.h modules/infra/datapath/port_rx.c modules/infra/datapath/port_tx.cRepository: DPDK/grout
Length of output: 1358
🏁 Script executed:
cat -n modules/infra/datapath/gr_dp_capture.hRepository: DPDK/grout
Length of output: 3458
🏁 Script executed:
cat -n modules/infra/control/capture.c | head -100Repository: DPDK/grout
Length of output: 3281
🏁 Script executed:
sed -n '187,280p' modules/infra/control/capture.cRepository: DPDK/grout
Length of output: 2322
🏁 Script executed:
grep -n "capture_enqueue\|rte_bpf_exec\|->ring\|->bpf_jit_func\|->bpf\b" modules/infra/datapath/port_rx.c modules/infra/datapath/port_tx.c | head -30Repository: DPDK/grout
Length of output: 332
🏁 Script executed:
sed -n '30,90p' modules/infra/control/capture.cRepository: DPDK/grout
Length of output: 1808
🏁 Script executed:
cat -n modules/infra/control/capture.c | sed -n '250,275p'Repository: DPDK/grout
Length of output: 912
🏁 Script executed:
rg -n "memory_order|synchronize|rcu|call_rcu|grace" modules/infra/control/capture.cRepository: DPDK/grout
Length of output: 370
🏁 Script executed:
cat -n modules/infra/control/capture.c | sed -n '92,160p'Repository: DPDK/grout
Length of output: 2587
🏁 Script executed:
cat -n modules/infra/control/capture.c | sed -n '1,35p'Repository: DPDK/grout
Length of output: 1022
🏁 Script executed:
rg -n "synchronize|rcu|grace|quiescent|defer" modules/infra/control/capture.c modules/infra/datapath/gr_dp_capture.hRepository: DPDK/grout
Length of output: 36
🏁 Script executed:
cat -n modules/infra/control/capture.c | wc -lRepository: DPDK/grout
Length of output: 57
🏁 Script executed:
cat -n modules/infra/datapath/gr_dp_capture.h | sed -n '17,65p'Repository: DPDK/grout
Length of output: 1863
Session teardown creates use-after-free/unmap window for datapath readers.
capture_session_set_filter() destroys the previous BPF immediately after swapping, and capture_session_stop() frees the session and unmaps the ring immediately after clearing publication. A worker thread that has already loaded s from iface_capture[] (via memory_order_relaxed) can still dereference s->ring, s->bpf, and s->bpf_jit_func after these objects are destroyed/freed/unmapped. The relaxed load doesn't synchronize with the release store in capture_clear_flags(), creating an unbounded race window. This requires RCU or refcounting to ensure all in-flight workers complete before freeing resources.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modules/infra/control/capture.c` around lines 193 - 235,
capture_session_set_filter() currently destroys the old BPF (old) immediately
after swapping, creating a use-after-free window for workers that read session
state from iface_capture[] with memory_order_relaxed; fix by introducing
lifetime management (RCU or a refcount) on the capture session and its resources
(s, s->ring, s->bpf, s->bpf_jit_func): modify the worker path that loads
sessions from iface_capture[] to acquire a reference (e.g., atomic_ref_get or
rcu_read_lock) and release it when done, update capture_session_set_filter() to
swap pointers without freeing old resources and instead defer destroy until the
refcount reaches zero (or call_rcu callback), and update
capture_session_stop()/capture_clear_flags() to drop its reference and schedule
actual free/unmap only after readers complete; adjust struct (add refcount or
RCU hooks) and ensure all stores use appropriate release semantics to
synchronize with reader acquisitions.
| if (unlikely(iface->flags & GR_IFACE_F_CAPTURE)) | ||
| capture_enqueue( | ||
| ctx->txq.port_id, | ||
| ctx->txq.queue_id, | ||
| (struct rte_mbuf **)objs, | ||
| nb_objs, | ||
| GR_CAPTURE_DIR_OUT, | ||
| iface | ||
| ); |
There was a problem hiding this comment.
TX capture occurs before VLAN tag insertion.
The capture happens in tx_begin() before tx_add_vlan() is called in the main tx_process() flow. This means captured TX packets won't include VLAN tags that get added to the wire. If the intent is to capture wire-format packets, this should be moved after tx_add_vlan().
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modules/infra/datapath/port_tx.c` around lines 61 - 69, The capture_enqueue
call in tx_begin() uses pre-VLAN mbufs so captured packets lack VLAN tags; move
the GR_IFACE_F_CAPTURE check and capture_enqueue invocation out of tx_begin()
and into the main tx_process() flow immediately after tx_add_vlan() (or after
any other code that mutates mbufs for wire format) so
capture_enqueue(ctx->txq.port_id, ctx->txq.queue_id, (struct rte_mbuf **)objs,
nb_objs, GR_CAPTURE_DIR_OUT, iface) receives the post-VLAN mbufs; update any
assumptions about objs ownership or pointers if tx_add_vlan() replaces or
reorders mbufs.
|
The current update is just a rebased on the main, I'll provide the updates based on the reviews with the following push |
There was a problem hiding this comment.
♻️ Duplicate comments (4)
modules/infra/control/capture.c (3)
128-128:⚠️ Potential issue | 🔴 CriticalRestrict shm permissions to owner-only.
The shared memory segment is created with mode
0644, making raw packet payloads readable by any local user. This is a confidentiality leak.Proposed fix
- int fd = shm_open(s->shm_path, O_CREAT | O_RDWR | O_EXCL, 0644); + int fd = shm_open(s->shm_path, O_CREAT | O_RDWR | O_EXCL, 0600);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modules/infra/control/capture.c` at line 128, The shm_open call that creates the shared memory segment uses mode 0644, exposing packet payloads to other local users; change the creation mode to owner-only (0600) so only the process owner can read/write the segment—i.e., update the shm_open invocation (shm_open(s->shm_path, O_CREAT | O_RDWR | O_EXCL, 0644)) to use 0600 (or S_IRUSR|S_IWUSR) and, if desired for extra safety, follow up with fchmod on the returned fd to enforce the permission change.
251-274:⚠️ Potential issue | 🔴 CriticalUse-after-free window during session teardown.
capture_session_stop()clearsiface_capture[]withmemory_order_release, then immediately frees the session and unmaps the ring. A datapath worker that loadedsviamemory_order_relaxedbefore the clear can still dereferences->ring,s->bpf, ands->bpf_jit_funcafter they are freed/unmapped. This requires RCU-style deferred free or quiescent-state synchronization.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modules/infra/control/capture.c` around lines 251 - 274, capture_session_stop currently frees and unmaps session state while datapath workers may still hold relaxed pointers (iface_capture[]), causing a use-after-free; change teardown to wait for readers using an RCU-like mechanism such as a per-session atomic refcount or grace-period wait: add an atomic_ref field to struct capture_session that datapath code increments when loading a session pointer and decrements when done, then modify capture_session_stop to clear iface_capture[] (memory_order_release), signal readers (if needed), and wait until atomic_ref == 0 before calling munmap, shm_unlink, rte_bpf_destroy, and free; ensure all datapath load sites use the increment/decrement around accesses to s->ring, s->bpf, and s->bpf_jit_func so no dereferences occur after free.
77-86:⚠️ Potential issue | 🟠 MajorHot-added ports bypass the frozen interface table.
When
iface_id == GR_IFACE_ID_UNDEF(capture all),iface_add_callbackenables capture on newly created ports even thoughn_ifacesand the shm interface table were frozen at session start. Packets from these ports will haveiface_idvalues with no corresponding descriptor in the ring metadata, breaking pcapng IDB mapping.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modules/infra/control/capture.c` around lines 77 - 86, iface_add_callback currently marks any new port for capture when active_capture->iface_id == GR_IFACE_ID_UNDEF, which allows hot-added ports to bypass the frozen interface table; change the guard to only enable capture for interfaces that existed when the session started by checking the session's frozen interface count (e.g., use a field like capture_session->n_ifaces or other "frozen" indicator) and only set iface flags/atomic_store_explicit if iface->id < s->n_ifaces (or the equivalent frozen check) so newly created ports are ignored for this session; update references in iface_add_callback, atomic_store_explicit(&iface_capture[iface->id], s, ...), and iface_from_id(iface->id) to be conditional on that frozen check.pcap/pcap-grout.c (1)
86-113:⚠️ Potential issue | 🟠 MajorTimeout is ~10x shorter than intended.
GROUT_POLL_USis 100 microseconds, butwaited_msincrements by 1 per sleep iteration. When compared againsttimeout_ms, a 1000ms timeout actually waits only ~100ms.Proposed fix
static int pcap_grout_dispatch(pcap_t *p, int max_cnt, pcap_handler cb, u_char *cb_arg) { struct pcap_grout *pg = pcap_plugin_priv(p); struct pcap_pkthdr hdr; int pkt_cnt = 0; int timeout_ms = pcap_plugin_get_timeout(p); - int waited_ms = 0; + int waited_us = 0; int snapshot = pcap_plugin_get_snapshot(p); ... if (!gr_capture_ring_dequeue(pg->ring, &slot)) { if (pg->nonblock) break; - if (timeout_ms > 0 && waited_ms >= timeout_ms) + if (timeout_ms > 0 && waited_us >= timeout_ms * 1000) break; usleep(GROUT_POLL_US); - waited_ms++; + waited_us += GROUT_POLL_US; continue; } - waited_ms = 0; + waited_us = 0;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pcap/pcap-grout.c` around lines 86 - 113, The timeout bookkeeping in the pcap-grout.c loop is wrong: waited_ms increments by 1 per usleep(GROUT_POLL_US) but GROUT_POLL_US is in microseconds (100), so the timeout is ~10x shorter; fix the loop in the code that uses timeout_ms and waited_ms (around variables timeout_ms, waited_ms, GROUT_POLL_US and the usleep(...) loop where gr_capture_ring_dequeue() is called) by tracking elapsed time in the same units as timeout_ms—either change waited_ms to waited_us and compare against timeout_ms * 1000 while incrementing by GROUT_POLL_US, or keep waited_ms and increment by (GROUT_POLL_US / 1000) (rounding appropriately) so the comparison "waited_ms >= timeout_ms" is correct.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@modules/infra/control/capture.c`:
- Line 128: The shm_open call that creates the shared memory segment uses mode
0644, exposing packet payloads to other local users; change the creation mode to
owner-only (0600) so only the process owner can read/write the segment—i.e.,
update the shm_open invocation (shm_open(s->shm_path, O_CREAT | O_RDWR | O_EXCL,
0644)) to use 0600 (or S_IRUSR|S_IWUSR) and, if desired for extra safety, follow
up with fchmod on the returned fd to enforce the permission change.
- Around line 251-274: capture_session_stop currently frees and unmaps session
state while datapath workers may still hold relaxed pointers (iface_capture[]),
causing a use-after-free; change teardown to wait for readers using an RCU-like
mechanism such as a per-session atomic refcount or grace-period wait: add an
atomic_ref field to struct capture_session that datapath code increments when
loading a session pointer and decrements when done, then modify
capture_session_stop to clear iface_capture[] (memory_order_release), signal
readers (if needed), and wait until atomic_ref == 0 before calling munmap,
shm_unlink, rte_bpf_destroy, and free; ensure all datapath load sites use the
increment/decrement around accesses to s->ring, s->bpf, and s->bpf_jit_func so
no dereferences occur after free.
- Around line 77-86: iface_add_callback currently marks any new port for capture
when active_capture->iface_id == GR_IFACE_ID_UNDEF, which allows hot-added ports
to bypass the frozen interface table; change the guard to only enable capture
for interfaces that existed when the session started by checking the session's
frozen interface count (e.g., use a field like capture_session->n_ifaces or
other "frozen" indicator) and only set iface flags/atomic_store_explicit if
iface->id < s->n_ifaces (or the equivalent frozen check) so newly created ports
are ignored for this session; update references in iface_add_callback,
atomic_store_explicit(&iface_capture[iface->id], s, ...), and
iface_from_id(iface->id) to be conditional on that frozen check.
In `@pcap/pcap-grout.c`:
- Around line 86-113: The timeout bookkeeping in the pcap-grout.c loop is wrong:
waited_ms increments by 1 per usleep(GROUT_POLL_US) but GROUT_POLL_US is in
microseconds (100), so the timeout is ~10x shorter; fix the loop in the code
that uses timeout_ms and waited_ms (around variables timeout_ms, waited_ms,
GROUT_POLL_US and the usleep(...) loop where gr_capture_ring_dequeue() is
called) by tracking elapsed time in the same units as timeout_ms—either change
waited_ms to waited_us and compare against timeout_ms * 1000 while incrementing
by GROUT_POLL_US, or keep waited_ms and increment by (GROUT_POLL_US / 1000)
(rounding appropriately) so the comparison "waited_ms >= timeout_ms" is correct.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 201b28bb-4ed2-4c25-a981-d80e1da3bc01
📒 Files selected for processing (19)
api/capture_ring_test.capi/gr_capture_ring.hapi/meson.buildmeson.buildmeson_options.txtmodules/infra/api/capture.cmodules/infra/api/gr_infra.hmodules/infra/api/meson.buildmodules/infra/cli/capture.cmodules/infra/cli/meson.buildmodules/infra/control/capture.cmodules/infra/control/gr_capture.hmodules/infra/control/meson.buildmodules/infra/datapath/gr_dp_capture.hmodules/infra/datapath/port_rx.cmodules/infra/datapath/port_tx.cpcap/meson.buildpcap/pcap-grout.csmoke/capture_test.sh
✅ Files skipped from review due to trivial changes (6)
- modules/infra/api/meson.build
- modules/infra/control/meson.build
- meson_options.txt
- api/meson.build
- modules/infra/cli/meson.build
- api/gr_capture_ring.h
🚧 Files skipped from review as they are similar to previous changes (4)
- modules/infra/datapath/port_rx.c
- modules/infra/datapath/port_tx.c
- modules/infra/datapath/gr_dp_capture.h
- modules/infra/control/gr_capture.h
|
TODO next: libcap's dependencies. |
There was a problem hiding this comment.
♻️ Duplicate comments (3)
modules/infra/datapath/port_tx.c (1)
65-75:⚠️ Potential issue | 🟠 MajorCapture snapshot is taken too early in TX path
At Line 65, capture runs in
tx_begin()beforetx_add_vlan()(Line 144/Line 164). This can record frames without final VLAN encapsulation and can also capture packets later dropped onNO_HEADROOM, so the capture stream can diverge from what is actually transmitted.Proposed fix
@@ - if (unlikely(iface->flags & GR_IFACE_F_CAPTURE)) { - const struct tx_node_ctx *tx = tx_node_ctx(node); - capture_enqueue( - tx->txq.port_id, - tx->txq.queue_id, - (struct rte_mbuf **)objs, - nb_objs, - GR_CAPTURE_DIR_OUT, - iface - ); - } - return true; } @@ nb_objs = tx_add_vlan(graph, node, objs, nb_objs, mbufs); if (unlikely(nb_objs == 0)) return 0; + + if (unlikely(iface->flags & GR_IFACE_F_CAPTURE)) + capture_enqueue( + ctx->txq.port_id, + ctx->txq.queue_id, + mbufs, + nb_objs, + GR_CAPTURE_DIR_OUT, + iface + ); tx_ok = rte_eth_tx_burst(ctx->txq.port_id, ctx->txq.queue_id, mbufs, nb_objs); @@ nb_objs = tx_add_vlan(graph, node, objs, nb_objs, mbufs); if (unlikely(nb_objs == 0)) return 0; + + if (unlikely(iface->flags & GR_IFACE_F_CAPTURE)) + capture_enqueue( + ctx->txq.port_id, + ctx->txq.queue_id, + mbufs, + nb_objs, + GR_CAPTURE_DIR_OUT, + iface + ); rte_spinlock_lock(ctx->lock);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modules/infra/datapath/port_tx.c` around lines 65 - 75, The capture is invoked too early in tx_begin() (using tx_node_ctx and capture_enqueue), so frames may be recorded before tx_add_vlan() and before headroom/transmit failure checks; move the capture_enqueue call out of tx_begin() and instead call it after packet modifications are final (after tx_add_vlan()/any VLAN/encapsulation logic) and after headroom/NO_HEADROOM checks that can drop packets, ensuring you pass the final (struct rte_mbuf **)objs and nb_objs and still use tx->txq.port_id/queue_id and GR_CAPTURE_DIR_OUT; update tx_begin()/tx_add_vlan()/the TX path so capture only runs when the packet is guaranteed to be queued for transmit.modules/infra/control/capture.c (2)
241-265:⚠️ Potential issue | 🔴 CriticalUse-after-free race between session teardown and datapath readers.
The datapath loads
iface_capture[iface->id]withmemory_order_relaxedand may still be mid-capture (dereferencings->ring,s->bpf_jit_func) when the control plane stores NULL and immediately munmaps the ring / destroys the BPF. The relaxed load does not synchronize with the release store, leaving an unbounded window where datapath can access freed memory.Consider adding RCU or a quiescent-state mechanism to ensure all in-flight datapath accesses complete before freeing resources.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modules/infra/control/capture.c` around lines 241 - 265, capture_session_stop frees session resources (munmap/close/free and rte_bpf_destroy) immediately after setting active_capture NULL, but datapath readers use iface_capture[...] with memory_order_relaxed and can still dereference s->ring or BPF JIT code, causing a use-after-free; modify teardown to wait for datapath quiescence before destroying resources by implementing an RCU/epoch or quiescent-state handshake: on stop (capture_session_stop) publish the NULL with a release-store, wait for readers to observe the change and drain in-flight readers (e.g., call a synchronize_rcu or epoch_wait API, or track per-worker quiescent counters checked by readers that load iface_capture in acquire mode), only then munmap(s->ring), destroy s->bpf (rte_bpf_destroy), close(s->shm_fd) and free(s); update readers that load iface_capture to use appropriate acquire loads or RCU read-side primitives to pair with the chosen grace-period mechanism.
74-81:⚠️ Potential issue | 🟠 MajorHot-added ports may produce packets with unmapped interface IDs.
When capturing all interfaces (
iface_id == GR_IFACE_ID_UNDEF), the interface table in the shared ring is frozen at session start (lines 151-162). If a new port is added during capture,iface_add_callbackenables capture for it, but the port won't have an entry in the ring's interface table. Downstream consumers (pcapng writers) usefind_iface_idx()which falls back to index 0, causing misattributed packets.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modules/infra/control/capture.c` around lines 74 - 81, When capturing "all interfaces" (active_capture with iface_id == GR_IFACE_ID_UNDEF), iface_add_callback currently enables capture for hot-added ports even though the shared ring's interface table was frozen at session start, causing unmapped IDs and misattributed packets; update iface_add_callback (the function referencing active_capture, iface->id, iface_from_id, and iface_capture[]) to only set GR_IFACE_F_CAPTURE and atomic_store the session if the new iface actually has a valid mapping in the ring (i.e., check the ring/interface-table mapping for iface->id or a valid ring index) and otherwise ignore the hot-added iface (or postpone enabling until the ring is updated), so downstream callers using find_iface_idx won't fall back to index 0 for unmapped interfaces.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@modules/infra/control/capture.c`:
- Around line 241-265: capture_session_stop frees session resources
(munmap/close/free and rte_bpf_destroy) immediately after setting active_capture
NULL, but datapath readers use iface_capture[...] with memory_order_relaxed and
can still dereference s->ring or BPF JIT code, causing a use-after-free; modify
teardown to wait for datapath quiescence before destroying resources by
implementing an RCU/epoch or quiescent-state handshake: on stop
(capture_session_stop) publish the NULL with a release-store, wait for readers
to observe the change and drain in-flight readers (e.g., call a synchronize_rcu
or epoch_wait API, or track per-worker quiescent counters checked by readers
that load iface_capture in acquire mode), only then munmap(s->ring), destroy
s->bpf (rte_bpf_destroy), close(s->shm_fd) and free(s); update readers that load
iface_capture to use appropriate acquire loads or RCU read-side primitives to
pair with the chosen grace-period mechanism.
- Around line 74-81: When capturing "all interfaces" (active_capture with
iface_id == GR_IFACE_ID_UNDEF), iface_add_callback currently enables capture for
hot-added ports even though the shared ring's interface table was frozen at
session start, causing unmapped IDs and misattributed packets; update
iface_add_callback (the function referencing active_capture, iface->id,
iface_from_id, and iface_capture[]) to only set GR_IFACE_F_CAPTURE and
atomic_store the session if the new iface actually has a valid mapping in the
ring (i.e., check the ring/interface-table mapping for iface->id or a valid ring
index) and otherwise ignore the hot-added iface (or postpone enabling until the
ring is updated), so downstream callers using find_iface_idx won't fall back to
index 0 for unmapped interfaces.
In `@modules/infra/datapath/port_tx.c`:
- Around line 65-75: The capture is invoked too early in tx_begin() (using
tx_node_ctx and capture_enqueue), so frames may be recorded before tx_add_vlan()
and before headroom/transmit failure checks; move the capture_enqueue call out
of tx_begin() and instead call it after packet modifications are final (after
tx_add_vlan()/any VLAN/encapsulation logic) and after headroom/NO_HEADROOM
checks that can drop packets, ensuring you pass the final (struct rte_mbuf
**)objs and nb_objs and still use tx->txq.port_id/queue_id and
GR_CAPTURE_DIR_OUT; update tx_begin()/tx_add_vlan()/the TX path so capture only
runs when the packet is guaranteed to be queued for transmit.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b872db80-6dda-424f-a95b-84d51ea111ac
📒 Files selected for processing (30)
api/capture_ring_test.capi/gr_api.hapi/gr_api_client_impl.hapi/gr_capture_ring.hapi/meson.buildmain/api.cmain/gr_module.hmeson.buildmeson_options.txtmodules/infra/api/capture.cmodules/infra/api/gr_infra.hmodules/infra/api/meson.buildmodules/infra/cli/capture.cmodules/infra/cli/meson.buildmodules/infra/control/capture.cmodules/infra/control/gr_capture.hmodules/infra/control/meson.buildmodules/infra/datapath/bond_output.cmodules/infra/datapath/gr_dp_capture.hmodules/infra/datapath/iface_input.cmodules/infra/datapath/iface_output.cmodules/infra/datapath/port_rx.cmodules/infra/datapath/port_tx.cmodules/infra/datapath/xconnect.cmodules/infra/datapath/xvrf.cmodules/ipip/datapath_in.cmodules/ipip/datapath_out.cpcap/meson.buildpcap/pcap-grout.csmoke/capture_test.sh
✅ Files skipped from review due to trivial changes (7)
- modules/infra/cli/meson.build
- modules/infra/control/meson.build
- modules/infra/api/meson.build
- modules/infra/datapath/port_rx.c
- meson_options.txt
- api/gr_capture_ring.h
- api/meson.build
🚧 Files skipped from review as they are similar to previous changes (3)
- meson.build
- pcap/meson.build
- modules/infra/datapath/gr_dp_capture.h
e904eb2 to
f790ba5
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/gr_api_client_impl.h`:
- Around line 183-188: The truncated-response branch after recvmsg() (where 0 <
n < sizeof(*resp)) returns -1 without setting errno, which lets the caller
compute return value 0; fix this in the helper that handles recvmsg()/resp by
setting a non-zero errno (e.g. EIO or EPROTO) before returning -1 so callers
doing return -errno cannot get 0; update the branch that currently checks "if (n
< 0 || (size_t)n < sizeof(*resp)) return -1;" to ensure the "(size_t)n <
sizeof(*resp)" case explicitly sets errno (and keep existing ECONNRESET for n ==
0) — reference recvmsg(), resp, errno, and ECONNRESET to locate the logic to
change.
In `@api/gr_capture_ring.h`:
- Around line 252-269: The consumer can accept a torn slot because the producer
only publishes by setting slot->sequence after writing; fix by implementing a
two-phase publish: change the producer to atomically set slot->sequence to an
"in-progress" value before writing the slot contents and only store the final
generation (pos+1) with release semantics after memcpy completes, and update the
consumer (the code reading slot->sequence, memcpy, seq2 comparisons) to treat
any sequence that is not the final published generation as invalid (skip/retry)
and to use acquire semantics when loading sequence; adjust r->cons_head update
logic to advance only when a valid final generation was observed. Ensure atomic
memory orders: producer’s final store is memory_order_release and consumer’s
loads are memory_order_acquire to prevent observing torn writes.
In `@modules/infra/datapath/port_rx.c`:
- Around line 230-238: The virtio receive paths (rx_virtio_process and
rx_bond_virtio_process) are bypassing the capture hook; modify these functions
to call the same post-burst capture step used by the other RX implementations by
invoking capture_enqueue when ctx->iface->flags & GR_IFACE_F_CAPTURE (mirroring
the existing call signature: capture_enqueue(ctx->rxq.port_id,
ctx->rxq.queue_id, mbufs, rx, GR_CAPTURE_DIR_IN, ctx->iface)), or refactor the
common post-burst handling into a shared helper (e.g.,
iface_input_postburst_capture) and call it from rx_virtio_process,
rx_bond_virtio_process and the non-virtio paths so all variants run
capture_enqueue consistently.
In `@modules/ipip/datapath_out.c`:
- Around line 82-86: Duplicate egress capture occurs because
iface_output_process() already enqueues logical-interface captures before
dispatching; remove the extra capture_enqueue call in the IPIP egress path (the
block that checks iface->flags & GR_IFACE_F_CAPTURE and calls capture_enqueue in
modules/ipip/datapath_out.c) so packets aren't exported twice. Locate the
capture_enqueue(...) invocation inside the IPIP egress code path and delete that
conditional block (or skip calling capture_enqueue there) while leaving the
existing iface_output_process() capture behavior unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d1a191e9-6f1c-4a3a-92de-6a1ba7cedb93
📒 Files selected for processing (30)
api/capture_ring_test.capi/gr_api.hapi/gr_api_client_impl.hapi/gr_capture_ring.hapi/meson.buildmain/api.cmain/gr_module.hmeson.buildmeson_options.txtmodules/infra/api/capture.cmodules/infra/api/gr_infra.hmodules/infra/api/meson.buildmodules/infra/cli/capture.cmodules/infra/cli/meson.buildmodules/infra/control/capture.cmodules/infra/control/gr_capture.hmodules/infra/control/meson.buildmodules/infra/datapath/bond_output.cmodules/infra/datapath/gr_dp_capture.hmodules/infra/datapath/iface_input.cmodules/infra/datapath/iface_output.cmodules/infra/datapath/port_rx.cmodules/infra/datapath/port_tx.cmodules/infra/datapath/xconnect.cmodules/infra/datapath/xvrf.cmodules/ipip/datapath_in.cmodules/ipip/datapath_out.cpcap/meson.buildpcap/pcap-grout.csmoke/capture_test.sh
✅ Files skipped from review due to trivial changes (8)
- modules/infra/control/meson.build
- modules/infra/api/meson.build
- modules/infra/cli/meson.build
- meson.build
- modules/infra/datapath/iface_input.c
- meson_options.txt
- modules/infra/control/gr_capture.h
- api/meson.build
🚧 Files skipped from review as they are similar to previous changes (6)
- modules/infra/datapath/bond_output.c
- modules/infra/datapath/xvrf.c
- main/gr_module.h
- modules/infra/datapath/gr_dp_capture.h
- main/api.c
- api/gr_api.h
| if (unlikely(iface->flags & GR_IFACE_F_CAPTURE)) | ||
| capture_enqueue( | ||
| UINT16_MAX, UINT16_MAX, | ||
| &mbuf, 1, GR_CAPTURE_DIR_OUT, iface | ||
| ); |
There was a problem hiding this comment.
This duplicates IPIP egress captures.
iface_output_process() now already enqueues packets on the logical iface before dispatching by type (modules/infra/datapath/iface_output.c, Lines 72-77). Keeping a second enqueue here exports the same IPIP TX packet twice under the same iface_id—first pre-encap, then post-encap—which will skew counts and make tcpdump -i grout:<ipip> show duplicate traffic.
Suggested fix
- if (unlikely(iface->flags & GR_IFACE_F_CAPTURE))
- capture_enqueue(
- UINT16_MAX, UINT16_MAX,
- &mbuf, 1, GR_CAPTURE_DIR_OUT, iface
- );📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (unlikely(iface->flags & GR_IFACE_F_CAPTURE)) | |
| capture_enqueue( | |
| UINT16_MAX, UINT16_MAX, | |
| &mbuf, 1, GR_CAPTURE_DIR_OUT, iface | |
| ); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modules/ipip/datapath_out.c` around lines 82 - 86, Duplicate egress capture
occurs because iface_output_process() already enqueues logical-interface
captures before dispatching; remove the extra capture_enqueue call in the IPIP
egress path (the block that checks iface->flags & GR_IFACE_F_CAPTURE and calls
capture_enqueue in modules/ipip/datapath_out.c) so packets aren't exported
twice. Locate the capture_enqueue(...) invocation inside the IPIP egress code
path and delete that conditional block (or skip calling capture_enqueue there)
while leaving the existing iface_output_process() capture behavior unchanged.
There was a problem hiding this comment.
It seems it is a wrong analisys. I did not find it. Elaborate if you have detected a bug ?
There was a problem hiding this comment.
🧩 Analysis chain
Let me check the actual packet flow in the codebase to validate or refute my earlier analysis.
🏁 Script executed:
#!/bin/bash
# Check iface_output.c to understand the capture and dispatch logic
fd -e c -g "iface_output.c" --exec cat {}
echo "---"
# Check ip_output dispatch to interface types
fd -e c -g "ip_output.c" | head -5
fd -e c --full-path "*/ip4*" --exec grep -l "ip_output_register_interface_type" {}
fd -e c --exec grep -n "ip_output_register_interface_type\|iface_output\|GR_IFACE_TYPE_IPIP" {} 2>/dev/null | grep -v "Binary\|\.o:"
echo "---"
# Look for how packets flow from ip_output to ipip_output vs iface_output
rg -n "ip_output_register_interface_type" --type c -A2 -B2
Length of output: 7321
@vjardin, you are correct — my original analysis was wrong. I apologize for the noise.
After tracing the actual code:
ip_outputdispatches IPIP-bound packets directly toipip_outputvia its owniface_type_edgestable (populated byip_output_register_interface_type). This path does not go throughiface_output_process.- The
capture_enqueueinipip_outputcaptures the inner (pre-encap) packet on the IPIP logical interface. - After encapsulation, the outer packet is sent back to
ip_outputand eventually reachesiface_output_processfor the physical ethernet interface — a completely different iface and a different capture point.
These are two distinct captures on two different interfaces, not duplicates. The code is correct as written. I'm retracting the concern.
✏️ Learnings added
Learnt from: vjardin
Repo: DPDK/grout PR: 565
File: modules/ipip/datapath_out.c:82-86
Timestamp: 2026-03-23T23:58:20.138Z
Learning: In the grout codebase, `ip_output` and `iface_output` maintain separate iface-type dispatch tables. `ip_output_register_interface_type` registers next nodes for L3 dispatch (e.g., ipip_output for GR_IFACE_TYPE_IPIP), while `iface_output_type_register` registers L2/physical next nodes (e.g., port_output for GR_IFACE_TYPE_PORT). IPIP egress packets flow ip_output → ipip_output directly, bypassing iface_output for the IPIP logical interface. iface_output only sees the outer/physical iface for the encapsulated packet. Therefore capture_enqueue calls in ipip_output (for the IPIP logical iface) and in iface_output (for the physical iface) are NOT duplicates — they capture at different layers. Do not flag these as duplicate captures.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: rjarry
Repo: DPDK/grout PR: 553
File: modules/infra/control/loopback.c:154-165
Timestamp: 2026-03-16T08:03:05.858Z
Learning: In the grout codebase, the `tun_pi` header prepended by the TUN device is NOT stripped in `modules/infra/control/loopback.c`. It is stripped in the datapath node `loopback_input_process` in `modules/infra/datapath/loop_input.c` via `rte_pktmbuf_adj(mbuf, sizeof(struct tun_pi))`, before the packet is dispatched to the L3 stack. Do not flag missing tun_pi stripping in loopback.c.
Learnt from: rjarry
Repo: DPDK/grout PR: 379
File: modules/infra/datapath/port_tx.c:36-46
Timestamp: 2025-11-03T13:28:19.489Z
Learning: In DPDK graph node process callbacks, the return value is only used for statistics and does not affect packet flow or scheduling through the graph. Nodes can return 0 when they haven't processed packets (e.g., when dropping or redirecting due to port state).
Learnt from: christophefontaine
Repo: DPDK/grout PR: 466
File: modules/srv6/datapath/l2_encap.c:26-32
Timestamp: 2025-12-17T17:32:21.746Z
Learning: The GR_MBUF_PRIV_DATA_TYPE macro in modules/infra/datapath/gr_mbuf.h automatically adds `const struct iface *iface` as the first field to any structure defined with it. All types defined using this macro (e.g., srv6_dx2_mbuf_data, mbuf_data, queue_mbuf_data) will have the iface field available, followed by any custom fields provided as the second macro argument.
Learnt from: david-marchand
Repo: DPDK/grout PR: 551
File: modules/infra/datapath/port_rx.c:156-160
Timestamp: 2026-03-17T08:11:06.305Z
Learning: In `modules/infra/datapath/port_rx.c`, the bounds check `rte_pktmbuf_data_len(m) >= csum_offset + 2` in `fix_l4_csum()` can never fail for virtio because the DPDK virtio driver guarantees all headers (including the L4 checksum field) are stored in the first segment/descriptor. Do not flag this guard as an issue in virtio RX paths.
Learnt from: rjarry
Repo: DPDK/grout PR: 466
File: modules/ip6/control/icmp6.c:173-178
Timestamp: 2025-12-18T16:04:37.486Z
Learning: In modules/ip/control/icmp.c and modules/ip6/control/icmp6.c, the mbufs stored in icmp_queue are owned by other pktmbuf pools that are freed elsewhere. It is not necessary to free these mbufs in icmp_fini/icmp6_fini before calling rte_mempool_free on the icmp_queue_item mempool.
Learnt from: david-marchand
Repo: DPDK/grout PR: 551
File: modules/infra/datapath/port_rx.c:154-155
Timestamp: 2026-03-16T15:27:46.743Z
Learning: In the grout codebase, assume all mbufs are mono-segment. Therefore rte_pktmbuf_data_len(m) is equivalent to rte_pktmbuf_pkt_len(m); multi-segment mbuf concerns do not apply. Do not flag multi-segment handling issues in grout code. This guidance applies across grout C sources under modules/, and review checks should not raise multi-segment related issues for such files unless there is explicit evidence of multi-segment usage in a specific case.
There was a problem hiding this comment.
♻️ Duplicate comments (3)
modules/infra/control/capture.c (3)
183-226:⚠️ Potential issue | 🔴 CriticalSame race applies to BPF filter replacement.
capture_session_set_filter()stores the newbpfandbpf_jit_funcpointers, then immediately destroys the old BPF object (lines 224-225). A datapath worker that loadedsbefore the swap may still be callings->bpf_jit_func(mbuf)orrte_bpf_exec(s->bpf, mbuf)on the destroyed object.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modules/infra/control/capture.c` around lines 183 - 226, The current capture_session_set_filter replaces s->bpf and s->bpf_jit_func and then immediately calls rte_bpf_destroy(old), which can race with datapath readers calling s->bpf_jit_func or rte_bpf_exec on the old object; to fix it, implement a safe handover (e.g., reference counting or RCU-style grace period) around the swap in capture_session_set_filter: increment a refcount on the new struct rte_bpf (or take an RCU pointer) before storing s->bpf and s->bpf_jit_func, publish the new pointers atomically, and defer calling rte_bpf_destroy(old) until readers are guaranteed to have finished (decrement refcount or wait for grace period), updating the code paths that call rte_bpf_load, rte_bpf_get_jit, and rte_bpf_destroy accordingly so no reader can access a freed object.
74-81:⚠️ Potential issue | 🟠 MajorHot-added interfaces bypass the frozen interface table.
When capture targets all interfaces (
s->iface_id == GR_IFACE_ID_UNDEF),iface_add_callbackenablesGR_IFACE_F_CAPTUREon newly added ports. However, the interface table in the shared-memory ring is frozen at session start. Packets from hot-added ports will haveslot.iface_idvalues that don't exist in the ring'sifaces[]table, breaking downstream interface resolution (e.g.,find_iface_idx()in CLI returns 0 for unknown ports, mapping them to the wrong IDB).
241-269:⚠️ Potential issue | 🔴 CriticalUse-after-free race between datapath and session teardown.
After
capture_clear_flags()stores NULL withmemory_order_release, the session is immediately freed (line 265) and unmapped (line 261). A datapath worker that loadedsfromiface_capture[]withmemory_order_relaxedbefore the store can still be mid-execution incapture_enqueue(), dereferencings->ring,s->bpf, ors->bpf_jit_func.The release-acquire pattern only synchronizes future loads that see the NULL value — it does not wait for in-flight readers that already loaded the old pointer. This requires
rte_rcu_qsbr_synchronize()before freeing, as implemented in other modules (e.g., modules/ip/control/address.c, modules/infra/control/port.c).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modules/infra/control/capture.c` around lines 241 - 269, capture_session_stop currently clears the session pointer with capture_clear_flags()/memory_order_release then immediately unmaps/closes/frees the session, causing a use-after-free for in-flight readers that previously loaded the pointer from iface_capture[] (see capture_enqueue). Fix by inserting an RCU synchronization (call rte_rcu_qsbr_synchronize() as used in modules/ip/control/address.c and modules/infra/control/port.c) immediately after capture_clear_flags() (and before destroying BPF, munmap, close, or free), so any in-flight datapath readers complete before you destroy s, s->ring, s->bpf or s->bpf_jit_func; then proceed with teardown (destroy BPF, load atomics, munmap, close, free).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@modules/infra/control/capture.c`:
- Around line 183-226: The current capture_session_set_filter replaces s->bpf
and s->bpf_jit_func and then immediately calls rte_bpf_destroy(old), which can
race with datapath readers calling s->bpf_jit_func or rte_bpf_exec on the old
object; to fix it, implement a safe handover (e.g., reference counting or
RCU-style grace period) around the swap in capture_session_set_filter: increment
a refcount on the new struct rte_bpf (or take an RCU pointer) before storing
s->bpf and s->bpf_jit_func, publish the new pointers atomically, and defer
calling rte_bpf_destroy(old) until readers are guaranteed to have finished
(decrement refcount or wait for grace period), updating the code paths that call
rte_bpf_load, rte_bpf_get_jit, and rte_bpf_destroy accordingly so no reader can
access a freed object.
- Around line 241-269: capture_session_stop currently clears the session pointer
with capture_clear_flags()/memory_order_release then immediately
unmaps/closes/frees the session, causing a use-after-free for in-flight readers
that previously loaded the pointer from iface_capture[] (see capture_enqueue).
Fix by inserting an RCU synchronization (call rte_rcu_qsbr_synchronize() as used
in modules/ip/control/address.c and modules/infra/control/port.c) immediately
after capture_clear_flags() (and before destroying BPF, munmap, close, or free),
so any in-flight datapath readers complete before you destroy s, s->ring, s->bpf
or s->bpf_jit_func; then proceed with teardown (destroy BPF, load atomics,
munmap, close, free).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 34a72828-91bd-4287-87a3-ca7b6327a49a
📒 Files selected for processing (11)
meson.buildmeson_options.txtmodules/infra/api/capture.cmodules/infra/api/gr_infra.hmodules/infra/cli/capture.cmodules/infra/control/capture.cmodules/infra/control/gr_capture.hpcap/meson.buildpcap/pcap-grout.csmoke/capture_test.shsubprojects/libpcap.wrap
✅ Files skipped from review due to trivial changes (2)
- subprojects/libpcap.wrap
- meson_options.txt
🚧 Files skipped from review as they are similar to previous changes (2)
- meson.build
- pcap/meson.build
d97315a to
010508d
Compare
The date format needs a % character. Fixes: 8de3676 ("api: generate a single grout.h umbrella header") Signed-off-by: Robin Jarry <[email protected]>
b4d05b4 to
b9304ee
Compare
Remove the EINTR/EAGAIN retry loops from send_all() and recv_all() so that signals like SIGINT can interrupt blocking API calls. Without this, grcli commands that block on event reception cannot be stopped *gracefully* with Ctrl-C. Also remove MSG_WAITALL from recv_all(). On Linux, MSG_WAITALL causes recv() to restart automatically after a signal handler runs instead of returning EINTR, defeating the purpose of the signal. The while loop in recv_all() already handles short reads, so MSG_WAITALL was redundant on a Unix stream socket where messages are delivered atomically anyway. Signed-off-by: Robin Jarry <[email protected]>
Replace signal() with sigaction() without SA_RESTART in the ping and traceroute commands. On Linux, signal() implicitly sets SA_RESTART which causes recv() to restart after the signal handler runs instead of returning EINTR. This prevents Ctrl-C from interrupting blocking API calls. Signed-off-by: Robin Jarry <[email protected]>
Add optional "type" and "count" arguments to the "events" command so that callers can subscribe to specific event categories and stop after a given number of events. Each cli_event_printer now carries a name field (iface, nexthop, addr4, route4, addr6, route6, fdb, flood) used for type filtering and tab completion. When types are specified, only the corresponding events are subscribed to instead of GR_EVENT_ALL. Signal handlers for SIGINT/SIGTERM/SIGPIPE are installed around the event receive loop so that the command can be interrupted cleanly. Combined with the previous commit that stops retrying on EINTR, this allows "grcli events" to be terminated with Ctrl-C or a kill signal. Signed-off-by: Robin Jarry <[email protected]>
Add a GR_API_INLINE macro that normally expands to static inline. In check_api.sh, it is redefined to empty so the compiler emits an external symbol that abidiff can track. For inline functions whose body is part of the API contract (e.g. shared memory ring accessors), compare the disassembly of both revisions with objdump. Since both binaries are built with the same toolchain and flags, any difference is a real code change. Treat removed or changed inline functions as breaking. Signed-off-by: Robin Jarry <[email protected]>
Allow API handlers to return a file descriptor alongside the response payload using SCM_RIGHTS ancillary data on the UNIX socket. On the server side, api_out_fd() sets an fd in struct api_out. When fd >= 0, the response is sent via sendmsg() with SCM_RIGHTS instead of bufferevent_write(). The fd is closed after sending. On the client side, recv() is replaced by recvmsg() so SCM_RIGHTS ancillary data is captured. gr_api_client_recv_fd() and gr_api_client_send_recv_fd() return the received fd to callers that need it. The existing gr_api_client_recv() delegates to recv_fd() and silently closes any unexpected fd. Out-of-order cached responses now carry their fd so it is preserved until the matching recv call picks it up. Suggested-by: Robin Jarry <[email protected]> Signed-off-by: Vincent Jardin <[email protected]>
Lock-free MPSC ring for packet capture over shared memory. Producers (datapath workers) claim slots with a single atomic fetch-add on prod_head and publish via store-release on the per-slot sequence number. The consumer reads with acquire semantics and detects overwrites via sequence mismatch, skipping ahead when lapped. Fixed 4096-byte slots hold a 32-byte metadata header (timestamps, interface id, direction, lengths) followed by raw packet data. The ring header carries TSC calibration so consumers can convert timestamps to wall-clock nanoseconds without system calls. A stress test spawns multiple producer threads and one consumer on a MAP_ANONYMOUS ring, validating per-producer ordering and data integrity. With 16 producers pushing 1M messages each through a 4096-slot ring (batch size 32), the test sustains ~290 Mslots/s on x86-64 and keeps corruption below 0.2%. Signed-off-by: Vincent Jardin <[email protected]>
Add libpcap as a meson wrap subproject with three patches on top of upstream commit bac2884b. These patches implement the plugin loader proposed in the-tcpdump-group/libpcap#1659: - runtime plugin loader for external capture modules - timestamp type accessors for adapter code - complete accessor API for tcpdump feature parity The meson.build exposes a plugindir variable via declare_dependency() so consumers can install plugins in the right location. Signed-off-by: Robin Jarry <[email protected]>
The control plane manages capture sessions backed by memfd_create(). Multiple captures can run concurrently (one per interface), each with its own shared memory ring. When not in test mode, the memfd uses hugepages (MFD_HUGETLB | MFD_HUGE_2MB). The memfd is passed to clients via SCM_RIGHTS through the existing fd-passing mechanism. BPF filter support converts classic BPF bytecode to eBPF via rte_bpf_convert() and JIT-compiles it with rte_bpf_get_jit(). API handlers wire GR_CAPTURE_START (returns capture_id, mmap_flags and the memfd), GR_CAPTURE_SET_FILTER, GR_CAPTURE_STOP and GR_CAPTURE_LIST. The CLI "capture" command maps the ring, writes pcapng to stdout and polls with usleep when idle. It refuses to write to a terminal and requires piping to tcpdump or redirecting to a file. An optional BPF filter expression is compiled with pcap_compile() and sent to the server for datapath JIT filtering. The pcapng writer is a minimal hand-rolled implementation because libpcap only writes legacy pcap (no per-interface metadata) and no standalone pcapng library exists. Signed-off-by: Vincent Jardin <[email protected]>
Add the capture_enqueue() inline function that writes packets into the shared memory ring. It skips mbufs already marked with a per-mbuf ol_flags dynflag (gr_captured) to prevent double-capture when a packet traverses multiple capture points. When a BPF filter is installed, only matching packets enter the ring. Add CAPTURE_VARS/CAPTURE_ENQUEUE/CAPTURE_FLUSH batching macros that accumulate consecutive packets for the same interface and flush with a single capture_enqueue() call, reducing the number of atomic operations from one per packet to one per burst run. Hook capture at the logical interface level: iface_input captures after VLAN resolution, iface_output before VLAN encapsulation. Also hook xconnect, xvrf, and ipip tunnel nodes in both directions. On the control plane, capture packets going to and from the Linux tap device in ctlplane.c. Signed-off-by: Vincent Jardin <[email protected]>
Build pcap-grout.so as a shared module loaded at runtime by libpcap's plugin loader. The plugin connects to grout's API socket, starts a shared memory capture session, maps the ring and reads packets from it. It pushes BPF filters to grout's datapath for JIT execution so only matching packets enter the ring. Device names use the "grout:" prefix (e.g. "tcpdump -i grout:p0"). The "grout:all" device captures on all interfaces. Signed-off-by: Vincent Jardin <[email protected]>
We already run grcli events in the background for debugging purposes. Redirect the output of it to a file that can be read by tests. Add two helper functions to wait for certain events. Example: mark_events grcli interface add .... wait_event "iface up" mark_events must be called *before* running the command that will trigger the awaited event. wait_event accepts optional flags: -c <count> Change the number of awaited events. By default it is 1. -t <timeout> Change the number of seconds after which wait_event will abort. Signed-off-by: Robin Jarry <[email protected]>
Test per-interface and all-interfaces capture via grcli capture dump piped through tcpdump. Verify capture list/stop lifecycle, concurrent captures on different interfaces, same-interface rejection, snaplen truncation, and post-stop reuse. Also test native tcpdump via the pcap-grout.so plugin: single interface capture, all-interfaces capture, device enumeration (tcpdump -D), and BPF datapath filtering. The BPF test sends mixed ICMP and UDP traffic with an "icmp" filter and checks that the ring packet count stays low, confirming the JIT filter runs in the datapath rather than in libpcap. Signed-off-by: Vincent Jardin <[email protected]>
Add shared memory packet capture with native tcpdump integration through a libpcap plugin architecture (currently WIP upstream as libpcap).
Once the plugin loader lands in libpcap, tcpdump can capture directly from grout's datapath cores using
tcpdump -i grout:p0without any need of a secondary DPDK process required. Until then, grcli capture dump provides the samefunctionality by writing pcapng to stdout.
Unlike the previous series, this implementation has no dependency on DPDK's pdump infrastructure or secondary processes. Packets are copied once into a lock-free shared memory ring (MPSC, ~290 Mslots/s) and read directly by the consumer (pcap context) via mmap.
BPF filters from tcpdump are pushed to grout's datapath where DPDK's rte_bpf JIT compiles them to native code, so only matching packets enter the ring. Currently it is tested on x86-64 only.
Shared-Memory Capture Ring
POSIX shared-memory, lock-free MPSC capture ring added (api/gr_capture_ring.h). Fixed slot layout with 4KiB slot size, explicit memory layout helpers, and gr_capture_ring_memsize(). Producers reserve slots via atomic fetch_add on prod_head and publish each slot by storing slot->sequence with release semantics; single consumer dequeues using gr_capture_ring_dequeue which reads prod_head, masks the slot index, validates per-slot sequence with acquire ordering, handles producer lap-over by advancing cons_head when prod - pos > slot_count, and detects torn/overwritten reads by re-checking sequence after memcpy. Timestamp conversion helper gr_capture_slot_timestamp_ns uses ring-stored tsc_hz/tsc_ref and realtime_ref_ns. Multithreaded C11 stress test (api/capture_ring_test.c) exercises producers/consumer, verifies per-slot signatures, measures corruption/out-of-order counts, and reports throughput (~290M slots/s in stress runs).
Control Plane & Session Management
Control-plane capture session lifecycle and API surface introduced:
Datapath BPF Filtering & Packet Capture
Datapath inline capture path implemented (modules/infra/datapath/gr_dp_capture.h):
API, CLI & libpcap Plugin
API:
CLI:
libpcap plugin:
Smoke Tests and Build Integration
Smoke test script smoke/capture_test.sh added to exercise per-interface and all-interfaces captures using grcli capture dump, verify pcapng outputs are non-empty and tcpdump-readable, assert single-session enforcement and snaplen truncation, and conditionally stage and test native tcpdump plugin behavior when pcap-grout.so and libpcap plugin support are available (including a datapath BPF/JIT verification that captures with an ICMP filter while generating mixed ICMP/UDP traffic and asserts UDP is filtered).
Build integration:
API/ABI and Public Symbols