Skip to content

stack: mark conntrack seed nosave for checkpoint#13181

Open
ibondarenko1 wants to merge 1 commit into
google:masterfrom
ibondarenko1:hardening/conntrack-seed-nosave
Open

stack: mark conntrack seed nosave for checkpoint#13181
ibondarenko1 wants to merge 1 commit into
google:masterfrom
ibondarenko1:hardening/conntrack-seed-nosave

Conversation

@ibondarenko1
Copy link
Copy Markdown

Summary

pkg/tcpip/stack/conntrack.go:227 declares the conntrack bucket hash seed:

// +stateify savable
type ConnTrack struct {
    // seed is a one-time random value initialized at stack startup
    // and is used in the calculation of hash keys for the list of buckets.
    // It is immutable.
    seed uint32

The field has no state:"nosave" tag. Sibling fields in the same struct already carry the tag (rand at conntrack.go:232, mu at conntrack.go:234). When IPTables save/restore lands (TODO at stack.go:120 references gvisor.dev/issue/4595), the absence of the tag will let stateify serialize this 32-bit hash seed verbatim into the checkpoint state stream, and a reader of the checkpoint will be able to recover it and predict bucket assignment in the restored sandbox.

This PR tags the field state:"nosave" and extends (*Stack).afterLoad in pkg/tcpip/stack/save_restore.go to redraw the seed from secureRNG on restore. Two regression tests are added.

The pattern is the same as commit b027f39193 "tcp: mark ISN and timestamp-offset secrets nosave for checkpoint" which closed the equivalent gap for the TCP protocol's seqnumSecret and tsOffsetSecret.

Affected code at HEAD

pkg/tcpip/stack/conntrack.go:222-241:

// +stateify savable
type ConnTrack struct {
    // seed is a one-time random value initialized at stack startup
    // and is used in the calculation of hash keys for the list of buckets.
    // It is immutable.
    seed uint32

    // clock provides timing used to determine conntrack reapings.
    clock tcpip.Clock
    // TODO(b/341946753): Restore when netstack is savable.
    rand *rand.Rand `state:"nosave"`

    mu connTrackRWMutex `state:"nosave"`

pkg/tcpip/stack/stack.go:120:

// tables are the iptables packet filtering and manipulation rules.
// TODO(gvisor.dev/issue/4595): S/R this field.
tables *IPTables `state:"nosave"`

pkg/tcpip/stack/save_restore.go:50-53:

// afterLoad is invoked by stateify.
func (s *Stack) afterLoad(context.Context) {
    s.insecureRNG = rand.New(rand.NewSource(time.Now().UnixNano()))
    s.secureRNG = cryptorand.RNGFrom(cryptorand.Reader)
}

pkg/tcpip/stack/conntrack.go:613 (the seed consumer):

h := jenkins.Sum32(ct.seed)

Change

  1. pkg/tcpip/stack/conntrack.go: tag seed state:"nosave". Update the field comment to describe the regeneration contract and reference issue Restore netstack state on save/restore #4595.

  2. pkg/tcpip/stack/save_restore.go: extend (*Stack).afterLoad to redraw s.tables.connections.seed from s.secureRNG after the existing secureRNG re-init. Nil-guard on s.tables because Stack.tables is itself tagged state:"nosave" and may be nil post-restore until IPTables S/R is wired up.

  3. pkg/tcpip/stack/save_restore_test.go (new file): two regression tests.

  4. pkg/tcpip/stack/BUILD: add the new test file to the stack_test srcs.

Test plan

  • bazelisk build //pkg/tcpip/stack:stack passes on the patched tree.
  • bazelisk test //pkg/tcpip/stack:stack_test --test_filter='TestConnTrackSeedHasNosaveTag|TestStackAfterLoadRegeneratesConnTrackSeed' passes.
  • gofmt -l pkg/tcpip/stack/{conntrack.go,save_restore.go,save_restore_test.go} returns clean.
  • Stateify codegen verified: pre-patch ConnTrack.StateFields() returns ["seed","clock","buckets"]; post-patch returns ["clock","buckets"]. Seed is no longer Save'd or Load'd. 82-byte reduction in stack_state_autogen.go.

Why now, before IPTables S/R lands

Today Stack.tables is state:"nosave" so an end-to-end checkpoint never reaches ConnTrack. The seed is not currently leaked.

The TODO at stack.go:120 plans to make this field savable under issue #4595. At that point, every field on ConnTrack that lacks state:"nosave" will start flowing into checkpoint state. Tagging seed now keeps the eventual Stack.tables save change a minimal lift on the maintainer's side and avoids a window where shipped restorations of pre-fix checkpoints freeze the bucket layout.

The same forward-compatible argument applies to the regenerative afterLoad: adding it now is one line plus a nil-guard, and it activates automatically when issue #4595 lands.

Linux / BSD reference

Linux nf_conntrack_hash_rnd is allocated in kernel memory and is not part of any userspace save / restore file. There is no equivalent leak surface on Linux. FreeBSD pf and nftables similarly do not have a userspace save / restore stream that would expose this internal hash seed. The exposure surface here is gVisor-specific because the gVisor netstack lives in user memory and is checkpointable.

Related

  • Commit b027f39193 "tcp: mark ISN and timestamp-offset secrets nosave for checkpoint" - sister-class fix in the TCP transport protocol. Same pattern: tag the secret, regenerate from secureRNG on afterLoad.
  • Issue gvisor.dev/issue/4595 - tracking IPTables save / restore. This PR is forward-compatible hardening for that work.

Notes

This PR is hardening only. It does not claim a CVE and does not request a CVE. The seed is not currently leaked at HEAD because Stack.tables is state:"nosave". The patch closes a gap before the parent pointer becomes savable.

Copy link
Copy Markdown
Contributor

@nybidari nybidari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The buckets in connTrack are identified based on the key which is generated using this seed and tupleID. If the seed changes, during restore, the buckets should be updated as well with the new seed. I suggest keeping this the same way.

@ibondarenko1
Copy link
Copy Markdown
Author

Thanks for the review @nybidari. You are right that bucket_index = jenkins.Sum32(ct.seed) % len(buckets) couples seed to bucket layout. Regenerating seed without rehashing existing entries makes restored entries unreachable by lookup.

A few notes on the current state, then three options.

Today the regeneration is a no-op. Stack.tables is itself state:"nosave" at pkg/tcpip/stack/stack.go:120 (TODO at the same line references gvisor.dev/issue/4595). After state.Load, s.tables is nil. The boot path then calls opts.DefaultIPTables(clock, insecureRNG) which constructs a fresh IPTables with a fresh ConnTrack with empty buckets and a fresh rand.Uint32() seed. The nil-guard if s.tables != nil in (*Stack).afterLoad means the new line in this PR does not execute today.

The patch was scoped as forward-compat for the day issue #4595 lands. The reasoning was: tag seed state:"nosave" now and add the regenerative hook now, so that when Stack.tables becomes savable, both pieces are already in place and we avoid a window where existing checkpoint files persist the seed verbatim.

You are correct that with #4595 resolved, regenerating seed without rehashing buckets is the wrong behavior. That makes this PR incomplete for the future-savable case.

Three options. Happy to take whichever you prefer.

A. Drop the regen, keep only the tag plus comment. Replace the afterLoad line with a comment that documents the bucket-rehash requirement, deferring the implementation to whoever lands #4595. The state:"nosave" annotation still prevents accidental seed serialization once the parent becomes savable. Reviewer will see the comment and either implement the rehash alongside the parent-tag flip or override the design.

B. Add bucket rehash to afterLoad. When s.tables != nil, walk s.tables.connections.buckets, recompute the hash for each conn under the new seed, and re-insert into the new bucket. Concurrency is not a concern because afterLoad runs before the netstack accepts traffic. This makes the patch correct for the future-savable case but adds non-trivial logic ahead of the parent-tag change.

C. Close this PR and wait for #4595. The seed-handling design belongs in the #4595 implementation. Open a small companion change at that time.

My preference between A and C is whichever lines up with the team's intent for #4595. B is also fine if you want the patch landed standalone but I do not want to design the rehash logic without your input on the locking model around connections.mu during afterLoad.

Let me know which way to take it.

The ConnTrack struct in pkg/tcpip/stack/conntrack.go is +stateify
savable. seed is a 32-bit value used in conntrack bucket index
hashing via jenkins.Sum32 at conntrack.go:613. It does not carry the
state:nosave tag, so stateify codegen treats it as a serializable
field and emits Save/Load calls in stack_state_autogen.go.

Sibling fields in the same struct already carry the tag (rand at
conntrack.go:232, mu at conntrack.go:234). The absence of the tag
on seed is an omission against the established convention.

Today the leak is contingent: Stack.tables is itself tagged
state:nosave at stack.go:120 with TODO gvisor.dev/issue/4595,
so an end-to-end checkpoint never reaches ConnTrack. When issue
google#4595 lands and IPTables becomes savable, every unsaved field on
ConnTrack will start flowing into checkpoint state, and a reader
of the checkpoint blob will recover the seed and predict bucket
assignment in the restored sandbox until the stack is destroyed.

Tag seed state:nosave. The post-restore regeneration is NOT
implemented in this commit: bucket_index = jenkins.Sum32(seed) %
len(buckets) couples the seed value to bucket layout, so redrawing
seed without rehashing entries leaves restored entries unreachable
by Lookup. The rehash strategy is deferred to the implementer of
issue google#4595, who will make ConnTrack savable and decide whether
to rehash entries or preserve the seed value. A TODO comment in
(*Stack).afterLoad in save_restore.go documents the requirement.

Stateify codegen diff after the change (verbatim):

  ConnTrack.StateFields() pre-patch:  [seed, clock, buckets]
  ConnTrack.StateFields() post-patch: [clock, buckets]

82-byte reduction in pkg/tcpip/stack/stack_state_autogen.go.

Adds one regression test in save_restore_test.go: a reflection
check that the tag is present.

Tested:
  bazelisk build //pkg/tcpip/stack:stack
  bazelisk test //pkg/tcpip/stack:stack_test       --test_filter='TestConnTrackSeedHasNosaveTag'

Related: commit b027f39 "tcp: mark ISN and timestamp-offset
secrets nosave for checkpoint" closed the equivalent gap for the
TCP transport protocol secrets. Same pattern applied here.

Review: addresses nybidari@ feedback on the original PR. The
afterLoad regen line is dropped because the bucket-seed coupling
makes a standalone redraw incorrect once entries are present in
checkpoint state. The state:"nosave" tag remains as a safety
contract so the seed cannot accidentally land in checkpoint bytes
when google#4595 is implemented.
@ibondarenko1 ibondarenko1 force-pushed the hardening/conntrack-seed-nosave branch from e69074f to e57b659 Compare May 14, 2026 21:05
@ibondarenko1
Copy link
Copy Markdown
Author

Going with Option A per your suggestion @nybidari.

Updated commit drops the afterLoad regen line and replaces it with a TODO(gvisor.dev/issue/4595) comment in (*Stack).afterLoad documenting the bucket-rehash requirement that needs to land alongside the parent-tag flip. The state:"nosave" tag stays on ConnTrack.seed as a safety contract. When Stack.tables becomes savable, the implementer will see the comment and either wire up the rehash logic or override the design.

This keeps the PR minimal and avoids designing the locking model around connections.mu without your input. The post-restore correctness work belongs in the same change that flips Stack.tables to savable.

Force-with-lease push moved the branch from e69074f491 to e57b6596bf. Diff vs original commit is two files:

  • pkg/tcpip/stack/save_restore.go lost the regen line and gained the TODO.
  • pkg/tcpip/stack/save_restore_test.go dropped TestStackAfterLoadRegeneratesConnTrackSeed and tightened the doc on TestConnTrackSeedHasNosaveTag.

The stateify codegen effect is unchanged: ConnTrack.StateFields() still goes from ["seed", "clock", "buckets"] to ["clock", "buckets"].

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants