stack: mark conntrack seed nosave for checkpoint#13181
Conversation
nybidari
left a comment
There was a problem hiding this comment.
The buckets in connTrack are identified based on the key which is generated using this seed and tupleID. If the seed changes, during restore, the buckets should be updated as well with the new seed. I suggest keeping this the same way.
|
Thanks for the review @nybidari. You are right that A few notes on the current state, then three options. Today the regeneration is a no-op. The patch was scoped as forward-compat for the day issue You are correct that with Three options. Happy to take whichever you prefer. A. Drop the regen, keep only the tag plus comment. Replace the afterLoad line with a comment that documents the bucket-rehash requirement, deferring the implementation to whoever lands B. Add bucket rehash to afterLoad. When C. Close this PR and wait for My preference between A and C is whichever lines up with the team's intent for Let me know which way to take it. |
The ConnTrack struct in pkg/tcpip/stack/conntrack.go is +stateify savable. seed is a 32-bit value used in conntrack bucket index hashing via jenkins.Sum32 at conntrack.go:613. It does not carry the state:nosave tag, so stateify codegen treats it as a serializable field and emits Save/Load calls in stack_state_autogen.go. Sibling fields in the same struct already carry the tag (rand at conntrack.go:232, mu at conntrack.go:234). The absence of the tag on seed is an omission against the established convention. Today the leak is contingent: Stack.tables is itself tagged state:nosave at stack.go:120 with TODO gvisor.dev/issue/4595, so an end-to-end checkpoint never reaches ConnTrack. When issue google#4595 lands and IPTables becomes savable, every unsaved field on ConnTrack will start flowing into checkpoint state, and a reader of the checkpoint blob will recover the seed and predict bucket assignment in the restored sandbox until the stack is destroyed. Tag seed state:nosave. The post-restore regeneration is NOT implemented in this commit: bucket_index = jenkins.Sum32(seed) % len(buckets) couples the seed value to bucket layout, so redrawing seed without rehashing entries leaves restored entries unreachable by Lookup. The rehash strategy is deferred to the implementer of issue google#4595, who will make ConnTrack savable and decide whether to rehash entries or preserve the seed value. A TODO comment in (*Stack).afterLoad in save_restore.go documents the requirement. Stateify codegen diff after the change (verbatim): ConnTrack.StateFields() pre-patch: [seed, clock, buckets] ConnTrack.StateFields() post-patch: [clock, buckets] 82-byte reduction in pkg/tcpip/stack/stack_state_autogen.go. Adds one regression test in save_restore_test.go: a reflection check that the tag is present. Tested: bazelisk build //pkg/tcpip/stack:stack bazelisk test //pkg/tcpip/stack:stack_test --test_filter='TestConnTrackSeedHasNosaveTag' Related: commit b027f39 "tcp: mark ISN and timestamp-offset secrets nosave for checkpoint" closed the equivalent gap for the TCP transport protocol secrets. Same pattern applied here. Review: addresses nybidari@ feedback on the original PR. The afterLoad regen line is dropped because the bucket-seed coupling makes a standalone redraw incorrect once entries are present in checkpoint state. The state:"nosave" tag remains as a safety contract so the seed cannot accidentally land in checkpoint bytes when google#4595 is implemented.
e69074f to
e57b659
Compare
|
Going with Option A per your suggestion @nybidari. Updated commit drops the This keeps the PR minimal and avoids designing the locking model around Force-with-lease push moved the branch from
The stateify codegen effect is unchanged: |
Summary
pkg/tcpip/stack/conntrack.go:227declares the conntrack bucket hash seed:The field has no
state:"nosave"tag. Sibling fields in the same struct already carry the tag (randat conntrack.go:232,muat conntrack.go:234). When IPTables save/restore lands (TODO at stack.go:120 referencesgvisor.dev/issue/4595), the absence of the tag will let stateify serialize this 32-bit hash seed verbatim into the checkpoint state stream, and a reader of the checkpoint will be able to recover it and predict bucket assignment in the restored sandbox.This PR tags the field
state:"nosave"and extends(*Stack).afterLoadinpkg/tcpip/stack/save_restore.goto redraw the seed fromsecureRNGon restore. Two regression tests are added.The pattern is the same as commit
b027f39193"tcp: mark ISN and timestamp-offset secrets nosave for checkpoint" which closed the equivalent gap for the TCP protocol'sseqnumSecretandtsOffsetSecret.Affected code at HEAD
pkg/tcpip/stack/conntrack.go:222-241:pkg/tcpip/stack/stack.go:120:pkg/tcpip/stack/save_restore.go:50-53:pkg/tcpip/stack/conntrack.go:613(the seed consumer):Change
pkg/tcpip/stack/conntrack.go: tagseedstate:"nosave". Update the field comment to describe the regeneration contract and reference issue Restore netstack state on save/restore #4595.pkg/tcpip/stack/save_restore.go: extend(*Stack).afterLoadto redraws.tables.connections.seedfroms.secureRNGafter the existingsecureRNGre-init. Nil-guard ons.tablesbecauseStack.tablesis itself taggedstate:"nosave"and may be nil post-restore until IPTables S/R is wired up.pkg/tcpip/stack/save_restore_test.go(new file): two regression tests.pkg/tcpip/stack/BUILD: add the new test file to thestack_testsrcs.Test plan
bazelisk build //pkg/tcpip/stack:stackpasses on the patched tree.bazelisk test //pkg/tcpip/stack:stack_test --test_filter='TestConnTrackSeedHasNosaveTag|TestStackAfterLoadRegeneratesConnTrackSeed'passes.gofmt -l pkg/tcpip/stack/{conntrack.go,save_restore.go,save_restore_test.go}returns clean.ConnTrack.StateFields()returns["seed","clock","buckets"]; post-patch returns["clock","buckets"]. Seed is no longer Save'd or Load'd. 82-byte reduction instack_state_autogen.go.Why now, before IPTables S/R lands
Today
Stack.tablesisstate:"nosave"so an end-to-end checkpoint never reachesConnTrack. The seed is not currently leaked.The TODO at
stack.go:120plans to make this field savable under issue #4595. At that point, every field onConnTrackthat lacksstate:"nosave"will start flowing into checkpoint state. Taggingseednow keeps the eventualStack.tablessave change a minimal lift on the maintainer's side and avoids a window where shipped restorations of pre-fix checkpoints freeze the bucket layout.The same forward-compatible argument applies to the regenerative afterLoad: adding it now is one line plus a nil-guard, and it activates automatically when issue #4595 lands.
Linux / BSD reference
Linux
nf_conntrack_hash_rndis allocated in kernel memory and is not part of any userspace save / restore file. There is no equivalent leak surface on Linux. FreeBSDpfandnftablessimilarly do not have a userspace save / restore stream that would expose this internal hash seed. The exposure surface here is gVisor-specific because the gVisor netstack lives in user memory and is checkpointable.Related
b027f39193"tcp: mark ISN and timestamp-offset secrets nosave for checkpoint" - sister-class fix in the TCP transport protocol. Same pattern: tag the secret, regenerate from secureRNG on afterLoad.gvisor.dev/issue/4595- tracking IPTables save / restore. This PR is forward-compatible hardening for that work.Notes
This PR is hardening only. It does not claim a CVE and does not request a CVE. The seed is not currently leaked at HEAD because
Stack.tablesisstate:"nosave". The patch closes a gap before the parent pointer becomes savable.