Skip to content

Conversation

@bkirwi
Copy link
Contributor

@bkirwi bkirwi commented Dec 19, 2025

Originally, Persist was built with the assumption that callers would reliably call the ReadHandle methods (like maybe_downgrade_since) with some reasonable frequency - say, at least every few minutes. This assumption turns out not to be true for a variety of reasons... and hasn't even been true for Persist's own internal use of ReadHandle! (For example, calls to snapshot can block indefinitely, so you'd need to add timeouts to those calls and call them in a loop with a downgrade operation, like we do in this recent PR. Not a nice pattern!) We shifted heartbeating to a background thread, which helped quite a bit with lease timeouts, but didn't entirely solve the issue - failing to downgrade regularly could cause unnecessary seqno holds, for example.

This PR shifts all interaction with the reader lease to the heartbeat task, which is now responsible for downgrading both the since and seqno holds, and manages the final expiry of the state. Communication between the handle and the task is via some shared state, which is managed with an abstraction similar to Persist's Applier for shard-level state. This ensures that the state is updated regularly, and with the most recent possible information, even when whomever holds the handle is busy with other stuff.

Motivation

https://github.com/MaterializeInc/incidents-and-escalations/issues/224

A core issue with this stuff is that we can't downgrade our seqno hold when nobody is interacting with the client (downgrading frontiers etc.) and in practice it's difficult to ensure that every handle is interacted with promptly. The background heartbeat task was a partial solution for this, which handled only extending leases but not releasing any unnecessarily held resources. Automatically downgrading the seqno hold without requiring the caller to do anything proactively should help avoid most cases of leaked seqnos.

This is a more general solution than the recent #34458, which times out certain long-running operations so we can run downgrades etc. With this PR, the downgrades will run in the background heartbeat task no matter what long-running operations prevents progress in the frontend.

Tips for reviewer

After this PR, the only way to have a long-running seqno hold should be if a caller has leased a part and not returned it. I'm optimistic that that case is rare - it involves more obviously-wrong client behaviour, where the case this PR fixes is pretty subtle - and anyways there's not much we could do about that in Persist internals, since holding back the seqno is needed for correctness.

@bkirwi bkirwi force-pushed the task-lease branch 6 times, most recently from 5e66761 to 080ec06 Compare January 3, 2026 00:26
@bkirwi bkirwi changed the title [persist] [wip] Task lease [persist] Shift all heartbeat / frontier management for the read handle to a background task Jan 5, 2026
bkirwi added 4 commits January 5, 2026 13:54
This is somewhere between Tokio's `watch` and a normal mutex with a
condvar. Kind of annoying to have to write one's own library for this,
but everything else I tried was either awkward or needed an unbounded
channel...
These two are much more tightly related than the task is to the read
handle, and this allows for some additional privacy for shared state.
Observe new seqnos in both the handle and the heartbeat thread

Always downgrade to the shared seqno

Should be pretty recent!

Use the main downgrade-since method in the background thread

Do all downgrades in the background task

Shift expiry to the background task

Move the heartbeat task to read.rs

This is a bit more read-handle business than machine business, in my
opinion!

Comment fixup
@bkirwi bkirwi marked this pull request as ready for review January 5, 2026 20:16
@bkirwi bkirwi requested review from a team as code owners January 5, 2026 20:16
@bkirwi
Copy link
Contributor Author

bkirwi commented Jan 5, 2026

Deployed to staging. Overall metrics look good - the rate of downgrades is less spiky and they see fewer conflicts.

One negative metric change is that the heartbeat task wakes up substantially more often. I believe this is because frequent downgrading of the since will trigger the background task to wake, even if there's nothing to do yet. I think I could rework the code to avoid this, though it's a bit tedious, and it's only a mild drawback, so I will wait to be asked first. :)

self.since = new_since.clone();
self.hold_state.modify(|s| {
s.downgrade_since(new_since);
s.request_sync = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could probably peek the outstanding seqno here to decide if you wanted to request a wakeup... but I'm not too worried

Copy link
Member

@DAlperin DAlperin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very exciting, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants