Rotate subscription and tick tables to avoid held-xmin bloat (#61)#62
PgQ's event tables already rotate across `event_<queue>_0/1/2` children, so dead tuples from long-held xmin stay bounded. The sibling metadata tables (`pgque.subscription`, `pgque.tick`) are still plain UPDATE-heavy tables, and R5 measured ~14k / ~7k peak dead tuples on them during a 30-min held-xmin window -- which drags iter-TPS 2-5x below the clean baseline even though the event tables themselves stay clean.

This commit applies the same rotation pattern to both metadata tables:

* `pgque.subscription` becomes a `UNION ALL` view over three physical children `subscription_0/1/2`, filtered to the currently-active child via `pgque.meta_rotation.cur_subscription_table`. `INSTEAD OF` triggers route `INSERT`/`UPDATE`/`DELETE` to that active child.
* `pgque.tick` becomes a `UNION ALL` view over `tick_0/1/2` (no active-child filter -- consumer `sub_last_tick` may legitimately reference an older child). `INSTEAD OF` triggers route `INSERT` to the active child.
* `pgque.maint_rotate_metadata()` performs the flip: truncate the `(cur+1)%3` slot, copy live subscription rows from `cur` to the new slot, flip the pointer. Tick's flip is conditional: we only truncate the target tick slot if no live `sub_last_tick` references a `tick_id` that lives there.
* `pgque.maint_rotate_metadata_step2()` ticks `last_rotation_step2_txid` in a separate transaction (same two-step pattern PgQ uses for event rotation), and is scheduled alongside `pgque.maint_rotate_tables_step2` by `pgque.start()`.

Rotation period defaults to 30 seconds (tunable via GUC `pgque.meta_rotation_period`). `maint_operations()` emits both the step1 and step2 calls for the maintenance orchestrator; `maint()` skips the step2 calls because they must run in their own transactions.

`pgque_uninstall.sql` picks up the new children transitively via `DROP SCHEMA ... CASCADE` -- no change to the public uninstall shape.

References #61 (issue). Acceptance criteria verified by R6 smoke test (separate commit in tests-and-benchmarks repo).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
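The view-plus-trigger shape the commit message describes can be sketched roughly as below. This is an illustrative reading of the description only, not the shipped code: the children's column lists, grants, and the `UPDATE`/`DELETE` trigger branches are omitted, and `subscription_route()` is a hypothetical function name.

```sql
-- Sketch: the view exposes only the currently-active child, as selected by
-- pgque.meta_rotation.cur_subscription_table (0, 1 or 2).
CREATE VIEW pgque.subscription AS
    SELECT s.* FROM pgque.subscription_0 s
     WHERE 0 = (SELECT cur_subscription_table FROM pgque.meta_rotation)
    UNION ALL
    SELECT s.* FROM pgque.subscription_1 s
     WHERE 1 = (SELECT cur_subscription_table FROM pgque.meta_rotation)
    UNION ALL
    SELECT s.* FROM pgque.subscription_2 s
     WHERE 2 = (SELECT cur_subscription_table FROM pgque.meta_rotation);

-- INSTEAD OF trigger routing writes to the active child; only the INSERT
-- branch is shown, subscription_route() is an assumed name.
CREATE FUNCTION pgque.subscription_route() RETURNS trigger AS $$
DECLARE
    cur int;
BEGIN
    SELECT cur_subscription_table INTO cur FROM pgque.meta_rotation;
    IF TG_OP = 'INSERT' THEN
        EXECUTE format('INSERT INTO pgque.subscription_%s SELECT ($1).*', cur)
            USING NEW;
        RETURN NEW;
    END IF;
    -- UPDATE and DELETE branches route to the same active child analogously.
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER subscription_route
    INSTEAD OF INSERT OR UPDATE OR DELETE ON pgque.subscription
    FOR EACH ROW EXECUTE FUNCTION pgque.subscription_route();
```

The point of the pattern: writers always hit exactly one small, recently-truncated heap, so dead tuples pinned by a held xmin accumulate only until the next flip.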
Follow-up suggestion: decouple metadata rotation cadence from event rotation

The PR currently ties subscription/tick rotation to the existing 30-second cadence. Observed in the R6 smoke benchmark: metadata rotation at 30 s is 4-6× more frequent than it needs to be, and the costs are avoidable pg_class bloat and TRUNCATE churn.
Proposal

Add a separate config field on `pgque.queue`:

```sql
ALTER TABLE pgque.queue
  ADD COLUMN IF NOT EXISTS queue_metadata_rotation_period interval NOT NULL DEFAULT '2 minutes';
```

Default to 2 min; the rotation trigger in the ticker uses this field for subscription/tick rotation specifically, while event-table rotation keeps using the existing period. At a 2 min cadence the expected peak bloat still meets the acceptance criteria while cutting pg_class bloat and TRUNCATE churn by 4×. Benchmark override:

Rationale for separation

Event tables are bulk-storage tables where rotation = capacity management (drop old events you no longer need). Metadata tables are tiny-hot tables where rotation = bloat mitigation. Same mechanism, different economics — deserve independent knobs.

Scope for this PR vs follow-up

This is a design-polish suggestion, not a correctness blocker for landing the current PR. Options:
I'd lean toward (1), since the acceptance criteria for the metadata-bloat fix (pgque#61) arguably include "reasonable default cadence for production" — and 30 s matching events is a debatable default. Either way, the rotation design itself is validated by the R6 smoke; this is a tuning refinement.
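One simple way the ticker could derive a global metadata cadence from the proposed per-queue column — the column name comes from the proposal above, everything else here is illustrative:

```sql
-- Metadata rotation is global (one meta_rotation pointer shared by all
-- queues), so a conservative ticker can honor the smallest configured period:
SELECT min(queue_metadata_rotation_period) AS effective_metadata_period
  FROM pgque.queue;
```

The ticker would then fire the metadata rotation whenever the time since the last flip exceeds `effective_metadata_period`, leaving event-table rotation on its own schedule.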
REV Code Review — #62

Summary: Fixes #61 by extending PgQ's 3-table rotation pattern to the `pgque.subscription` and `pgque.tick` metadata tables. Installed on PG18 locally — full test suite passes.

BLOCKING — 0

NON-BLOCKING

- MEDIUM: No new regression tests for rotation logic
- MEDIUM
- LOW
- LOW
- LOW
POTENTIAL ISSUES

- MEDIUM: Hot-path overhead: every write through the `pgque.subscription` view now passes through an `INSTEAD OF` trigger plus a `pgque.meta_rotation` lookup
- LOW: The subscription view's active-child filter adds a `pgque.meta_rotation` subquery to every read
- INFO: The tick rotation's "defer if any consumer's `sub_last_tick` lives in the target slot" rule means a single stuck consumer can block tick rotation indefinitely while subscription rotation keeps happening
Verified on a local PG18 install. No crashes, no FK violations, no grant errors on install, no issues with the INSTEAD OF triggers. The design holds together in practice.

Summary

Result: REQUEST CHANGES (non-blocking). The design is correct, the manual validation holds, and the existing test suite keeps passing. But shipping a change of this scope without a dedicated regression test for the rotation logic is what motivates the REQUEST CHANGES.

REV-assisted review (SOC2 checks skipped per request).
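For anyone wanting to quantify the flagged hot-path overhead locally, a comparison like the following is enough; it is illustrative only and assumes integer `sub_queue`/`sub_consumer` key values, which may not match the real schema:

```sql
-- Write through the view: fires the INSTEAD OF trigger and the
-- meta_rotation lookup. Key values are placeholders.
EXPLAIN (ANALYZE, BUFFERS)
UPDATE pgque.subscription
   SET sub_last_tick = sub_last_tick    -- no-op write, still exercises routing
 WHERE sub_queue = 1 AND sub_consumer = 1;

-- Same write directly against the child, assuming slot 0 is currently active:
EXPLAIN (ANALYZE, BUFFERS)
UPDATE pgque.subscription_0
   SET sub_last_tick = sub_last_tick
 WHERE sub_queue = 1 AND sub_consumer = 1;
```

The delta between the two plans' execution times approximates the per-write trigger cost the MEDIUM item refers to.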
…ark/ (issue #77)

Adds a strictly-additive benchmark/ directory documenting the methodology, tooling, and operational lessons from the pgque-vs-pgq-vs-pgmq-vs-river-vs-que-vs-pgboss-vs-pgmq-partitioned bench that backs #61 and PR #62.

- README.md: entry point + quick-start
- METHODOLOGY.md: adapted from GitLab #77 note 3263767264
- OPS_GOTCHAS.md: 15 operational lessons (NEW — NVMe mount, partman stale rows, que func leftovers, pgboss covering index, pgq ticker, pgque xid8 bug, spot reclaim, ASH prereqs, NOTICE instrumentation, etc.)
- HARDWARE.md: i4i.2xlarge specs, PG tuning, microbench baselines
- tooling/, runners/, consumers/, producers/, install/, charts/, gifs/

No pgque production SQL is touched.

Refs: #61, #62.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Fixes #61.
Applies PgQ's own 3-table rotation pattern to the two PgQ metadata tables that are NOT already rotated — `pgque.subscription` and `pgque.tick` — so that held-xmin bloat on those tables is bounded instead of unbounded.

The event tables (`pgque.event_<queue>_0/_1/_2`) already rotate, and R4/R5 showed their dead-tuple count stays at 0 even under a 10-minute idle-in-transaction. But the metadata tables don't rotate; under the same R5 run they peaked at ~14 312 dead tuples on `pgque.subscription` and ~7 154 on `pgque.tick`, which dragged pgque's consumer iter-TPS from ~3 500 down to ~1 540 during the held-xmin window.

This PR extends the rotation pattern to those two tables.
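The dead-tuple figures above are observable with stock statistics views; a minimal watcher like the following (plain `pg_stat_user_tables`, nothing pgque-specific assumed) is enough to track the peaks during a held-xmin run:

```sql
-- Dead-tuple counts on the metadata children during a held-xmin window.
SELECT relname, n_live_tup, n_dead_tup
  FROM pg_stat_user_tables
 WHERE schemaname = 'pgque'
   AND (relname LIKE 'subscription%' OR relname LIKE 'tick%')
 ORDER BY n_dead_tup DESC;
```

Polling this once a second and taking the max of `n_dead_tup` per relation reproduces the "peak dead tuples" metric the acceptance criteria are stated in.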
Design
- `pgque.subscription` becomes a view (`UNION ALL` over three physical children `subscription_0 / _1 / _2`), filtered to the currently-active child via `pgque.meta_rotation.cur_subscription_table`. `INSTEAD OF` triggers route every `INSERT`/`UPDATE`/`DELETE` to the active child.
- `pgque.tick` becomes a view (`UNION ALL` over `tick_0 / _1 / _2`) with no active-child filter — a consumer's `sub_last_tick` can legitimately reference a tick that was inserted before the most recent rotation. `INSTEAD OF` triggers route `INSERT`s to the active child. `DELETE`s (as issued by the existing `maint_rotate_tables_step1`) fan out to all three children.
- `pgque.maint_rotate_metadata()` performs the rotation step1: `TRUNCATE` the `(cur + 1) % 3` slot, `INSERT … SELECT` live subscription rows from `cur` → the new slot, flip `cur`. For tick, truncation is gated on "no live `sub_last_tick` references rows on the target slot."
- `pgque.maint_rotate_metadata_step2()` is the step2 counterpart (same pattern PgQ uses for event rotation), scheduled by `pgque.start()` alongside the existing `pgque_rotate_step2` cron job.
- Rotation period defaults to 30 seconds, tunable via the GUC `pgque.meta_rotation_period`.
- `maint_operations()` emits both rotation calls; `maint()` skips the step2 call (it needs its own transaction, same as the event `maint_rotate_tables_step2` case).

No external dependencies added. Install path stays `psql -f sql/pgque.sql`. Uninstall picks up the new children transitively via `DROP SCHEMA … CASCADE`. No C accelerator.

Correctness trade-offs
The subscription view filters to `cur`, which means that any transaction whose MVCC snapshot was taken before a rotation committed will (via `pgque.meta_rotation`) see the old `cur`'s child. That matches how PgQ's event tables already work: long-running read snapshots that predate a rotation see the pre-rotation layout. pgque itself has no code path that reads `pgque.subscription` under a long-lived snapshot; `next_batch_custom`/`finish_batch` are short transactions.

Tick rotation is more conservative than subscription rotation: we only truncate the target tick slot if no live `sub_last_tick` currently resolves to a row in it. In the (rare) case where a consumer has lagged for longer than two rotation periods, the tick flip is deferred until the consumer catches up. Subscription rotation still happens in that case — which is the dominant bloat source.

Acceptance criteria
- `pgque.subscription_*` peak dead tuples ≤ 500 (was 14 312 on R5 without this patch).
- `pgque.tick_*` peak dead tuples ≤ 200 (was 7 154 on R5 without this patch).
- `tests/*.sql` continue to pass.

Test plan
- Fresh install (`psql -f sql/pgque.sql`) produces no errors or warnings.
- `pgque.create_queue`, `pgque.register_consumer`, and the send → ticker → next_batch → finish_batch happy path work across multiple forced rotations.
- Across a full rotation cycle (`cur = 0 → 1 → 2 → 0`), the `pgque.subscription` view returns exactly one row per `(sub_queue, sub_consumer)` and `sub_last_tick` resolves via the view.
- Dead tuples on `subscription_<cur>` are cleared to 0 after rotation (heap-only — tuples on the previous slot stick around until that slot is truncated two rotations later).

R6 smoke results will be posted as a follow-up comment on this PR.
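For reference, the step1 flip described under Design can be sketched as below. This is a simplified reading of the PR text only — no advisory locks, no txid bookkeeping for step2, and a single shared slot pointer; names beyond those quoted in the description are assumptions.

```sql
CREATE OR REPLACE FUNCTION pgque.maint_rotate_metadata() RETURNS void AS $$
DECLARE
    cur     int;
    nxt     int;
    blocked boolean;
BEGIN
    SELECT cur_subscription_table INTO cur FROM pgque.meta_rotation;
    nxt := (cur + 1) % 3;

    -- Subscription flip: clear the next slot, copy live rows, move the pointer.
    EXECUTE format('TRUNCATE pgque.subscription_%s', nxt);
    EXECUTE format(
        'INSERT INTO pgque.subscription_%s SELECT * FROM pgque.subscription_%s',
        nxt, cur);
    UPDATE pgque.meta_rotation SET cur_subscription_table = nxt;

    -- Tick flip is gated: only truncate tick_<nxt> when no consumer's
    -- sub_last_tick still resolves to a tick_id stored in that slot.
    EXECUTE format(
        'SELECT EXISTS (SELECT 1 FROM pgque.subscription s
                          JOIN pgque.tick_%s t ON t.tick_id = s.sub_last_tick)',
        nxt)
    INTO blocked;
    IF NOT blocked THEN
        EXECUTE format('TRUNCATE pgque.tick_%s', nxt);
    END IF;
END;
$$ LANGUAGE plpgsql;
```

A test can then force a full cycle by calling it three times (`cur = 0 → 1 → 2 → 0`) and asserting the view still returns exactly one row per `(sub_queue, sub_consumer)`.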
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>