Skip to content

Do not attribute logical decoding aborts to xact_rollback#28

Open
NikolayS wants to merge 2854 commits into
masterfrom
fix/xact-rollback-spike-logical-decoding
Open

Do not attribute logical decoding aborts to xact_rollback#28
NikolayS wants to merge 2854 commits into
masterfrom
fix/xact-rollback-spike-logical-decoding

Conversation

@NikolayS

@NikolayS NikolayS commented Apr 17, 2026

Copy link
Copy Markdown
Owner

Summary

v2 — reworked after a thorough synthetic review pass. Key shape changes vs. v1:

  • API: the fix no longer exposes pgstat_cancel_xact_rollback() as a "decrement the counter" helper. It now uses a module-local flag in pgstat_database.c (pgstat_begin_internal_xact_cleanup() / pgstat_end_internal_xact_cleanup()), which AtEOXact_PgStat_Database() reads to skip the bump entirely. No more bump-and-undo.
  • Covers more sites: also fixes SnapBuildClearExportedSnapshot() (same-shape bug, hit on replication-snapshot cleanup).
  • Test: folded into src/test/subscription/t/100_bugs.pl as a new bug section (not a new file). Asserts a delta-from-baseline (not a literal 0 after pg_stat_reset), runs only 5 INSERTs, and polls walsender-gone filtered by application_name.
  • Commit message: "Do not …" subject, Backpatch-through: 15 trailer, no Co-Authored-By in the PG commit body.

Fixes report CAG0ozMo_xWQn+Avv8jzbbhePGp5OnhdO+YWTkdg4faWSXz0Jzg@mail.gmail.com (Rafael Thofehrn Castro, 2024-06-14), still present in master as of 2026-04-16. See also postgres-ai/tests-and-benchmarks#39 for the production-incident context.

Root cause

  • ReorderBufferProcessTXN() ends each decoded transaction with AbortCurrentTransaction() for catalog cleanup.
  • In the walsender (using_subtxn == false, entered via StartTransactionCommand()), that abort is top-level, so AtEOXact_PgStat_Database(isCommit=false) bumps the backend-local pgStatXactRollback. Counts flush to shared stats on walsender exit → the spike.
  • In the pg_logical_slot_get_changes() path (using_subtxn == true), the abort is of an internal subtransaction and goes through AtEOSubXact_PgStat, which never touches the counter.

Fix

  • New module-local pgStatInternalXactCleanup flag in pgstat_database.c, set/cleared by two public helpers.
  • AtEOXact_PgStat_Database() early-returns when the flag is set (same slot as the existing parallel skip).
  • Two call sites in ReorderBufferProcessTXN() bracket AbortCurrentTransaction() with the begin/end pair, gated on !using_subtxn.
  • SnapBuildClearExportedSnapshot() gets the same bracket (unconditionally — that path is always top-level).
  • applyparallelworker.c:1458 is intentionally left alone: that is a real user-visible rollback on the subscriber side.

Reproduction evidence

Two postgres:17 Docker containers:

                            xact_commit | xact_rollback
baseline (after reset)     |          1 |             0
after 1000 autocommit INS  |       1001 |             0    <- walsender still running
after ALTER SUBSCRIPTION   |       1005 |          1000    <- walsender gone, spike

With the patch, the post-DISABLE rollback value stays at baseline.

TAP test (RED / GREEN)

New section in src/test/subscription/t/100_bugs.pl: 5 autocommit INSERTs, wait_for_catchup, ALTER SUBSCRIPTION xrb_sub DISABLE, poll walsender-gone filtered by application_name='xrb_sub', assert cmp_ok($final - $baseline, '==', 0).

  • Against unpatched master: got: 5 / expected: 0 — FAIL.
  • With the patch: PASS.

Test plan

  • meson test --suite subscription --suite test_decoding — 42/42 + 3/3 pass (incl. the new 100_bugs.pl section).
  • meson test postgresql:recovery/006_logical_decoding postgresql:recovery/010_logical_decoding_timelines postgresql:pg_basebackup/030_pg_recvlogical — pass.
  • meson test postgresql:recovery/029_stats_restart postgresql:pg_stat_statements/regress — pass.
  • meson test postgresql:regress/regress — 248/248 pass.

Review trail

  • v1 review round — three agents, verdicts: needs revision / accept with minor / needs work. Consensus blockers: API shape (don't expose pgStatXactRollback through a public helper), > 0 silent-flooring guard, too-tight is(…, '0') assertion, missing fix in snapbuild.c. All addressed here.

🤖 Generated with Claude Code

Richard Guo and others added 30 commits April 10, 2026 15:51
var_is_nonnullable() failed to consider varreturningtype, which meant
it could incorrectly claim a Var is non-nullable based on a column's
NOT NULL constraint even when the Var refers to a non-existent row.
Specifically, OLD.col is NULL for INSERT (no old row exists) and
NEW.col is NULL for DELETE (no new row exists), regardless of any NOT
NULL constraint on the column.

This caused the planner's constant folding in eval_const_expressions
to incorrectly simplify IS NULL / IS NOT NULL tests on such Vars.  For
example, "old.a IS NULL" in an INSERT's RETURNING clause would be
folded to false when column "a" has a NOT NULL constraint, even though
the correct result is true.

Fix by returning false from var_is_nonnullable() when varreturningtype
is not VAR_RETURNING_DEFAULT, since such Vars can be NULL regardless
of table constraints.

Author: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDfaAipL6YzOq2H=gAhKBbcUTYmfbAv+W1zueOfRKH43FQ@mail.gmail.com
The static variable afterTriggerFiringDepth introduced by commit
5c54c3e is logically part of the after-trigger state and in
hindsight should have been a field in AfterTriggersData alongside
query_depth and the other per-transaction after-trigger state.
Move it there as firing_depth.  Also update its comment to
accurately reflect its sole remaining purpose: signaling to
AfterTriggerIsActive() that after-trigger firing is active.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqFt4NGTNk7BinOsHHM48E9zGAa852vCfGoSe1bbL=JNFQ@mail.gmail.com
The test added in 980c1a8 covered reordered FK columns with
different types, which triggered an "operator not a member of opfamily"
error in the fast-path prior to that commit.  Add a test for the
same-type case, which is also fixed by that commit but where the wrong
scan key ordering instead produced a spurious FK violation without any
internal error.

Reported-by: Fredrik Widlert <fredrik.widlert@digpro.se>
Discussion: https://postgr.es/m/CADfhSr8hYc-4Cz7vfXH_oV-Jq81pyK9W4phLrOGspovsg2W7Kw@mail.gmail.com
When the incremental JSON parser splits a numeric token across chunk
boundaries, it accumulates continuation characters into the partial
token buffer.  The accumulator's switch statement unconditionally
accepted '+', '-', '.', 'e', and 'E' as valid numeric continuations
regardless of position, which violated JSON number grammar
(-? int [frac] [exp]).  For example, input "4-" fed in single-byte
chunks would accumulate the '-' into the numeric token, producing an
invalid token that later triggered an assertion failure during
re-lexing.

Fix by tracking parser state (seen_dot, seen_exp, prev character)
across the existing partial token and incoming bytes, so that each
character class is accepted only in its grammatically valid position.
When decoding a match tag, pglz_decompress() reads 2 bytes (or 3
for extended-length matches) from the source buffer before checking
whether enough data remains.  The existing bounds check (sp > srcend)
occurs after the reads, so truncated compressed data that ends
mid-tag causes a read past the allocated buffer.

Fix by validating that sufficient source bytes are available before
reading each part of the match tag.  The post-read sp > srcend
check is no longer needed and is removed.

Found by fuzz testing with libFuzzer and AddressSanitizer.
Add 12 libFuzzer-compatible fuzzing harnesses behind a new -Dfuzzing=true
meson option.  Each harness implements LLVMFuzzerTestOneInput() and can
also be built in standalone mode (reading from files) when no fuzzer
engine is detected.

Frontend targets (no backend dependencies):
  fuzz_json            - non-incremental JSON parser (pg_parse_json)
  fuzz_json_incremental - incremental/chunked JSON parser
  fuzz_conninfo        - libpq connection string parser (PQconninfoParse)
  fuzz_pglz            - PGLZ decompressor (pglz_decompress)
  fuzz_unescapebytea   - libpq bytea unescape (PQunescapeBytea)
  fuzz_b64decode       - base64 decoder (pg_b64_decode)
  fuzz_saslprep        - SASLprep normalization (pg_saslprep)
  fuzz_parsepgarray    - array literal parser (parsePGArray)
  fuzz_pgbench_expr    - pgbench expression parser (via Bison/Flex)

Backend targets (link against postgres_lib):
  fuzz_rawparser       - SQL raw parser (raw_parser)
  fuzz_regex           - regex engine (pg_regcomp/pg_regexec)
  fuzz_typeinput       - type input functions (numeric/date/timestamp/interval)
In commit b15c151 I missed the memo about not using Size in new
code.

Per complaint from Thomas Munro

Discussion: https://postgr.es/m/CA+hUKGJkeTVuq5u5WKJm6xkwmW577UuQ7fA=PyBCSR3h9g2GtQ@mail.gmail.com
This reverts commit 4a18907.

inadvertenly pushed, mea culpa
Commit 21b018e lowered some logical decoding messages from LOG to DEBUG1.
However, per discussion on pgsql-hackers, messages from background activity
(e.g., walsender or slotsync worker) should remain at LOG, as they are less
frequent and more likely to indicate issues that DBAs should notice.

For foreground SQL functions (e.g., pg_logical_slot_peek_binary_changes()),
keep these messages at DEBUG1 to avoid excessive log noise. They can still be
enabled by lowering client_min_messages or log_min_messages for the session.

This commit updates logical decoding to log these messages at LOG for
background activity and at DEBUG1 for foreground execution.

Suggested-by: Robert Haas <robertmhaas@gmail.com>
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CA+TgmoYsu2+YAo9eLGkDp5VP-pfQ-jOoX382vS4THKHeRTNgew@mail.gmail.com
Use consistent phrasing for parallel vacuum descriptions between
manual VACUUM and autovacuum. Specifically, clarify that the parallel
worker count is limited by the respective options only if they are
explicitly specified.

Also, fix a typo in the parallel vacuum section.

Author: Aleksander Alekseev <aleksander@tigerdata.com>
Discussion: https://postgr.es/m/CAJ7c6TPcSqzhbhrsiCMmVwmE8F7pwS7i9J49SP1zPKS_ER+vcA@mail.gmail.com
The test in test_autovacuum was unstable because it called
log_contains() immediately after verifying autovacuum_count in
pg_stat_user_tables. This created a race condition where the
statistics could be updated before the autovacuum logs were fully
flushed to disk.

This commit replaces log_contains() with wait_for_log() to ensure the
test waits for the parallel vacuum messages to appear. Additionally,
remove the checks of the autovacuum count. Verifying the log messages
is sufficient to confirm parallel autovacuum behavior, as logging is
only enabled for the specific table under test.

Per report from buildfarm member flaviventris.

Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/525d0f48-93f7-493f-a988-f39b460a79bc@gmail.com
This comment was describing the v17 implementation (or io_method=sync).

Backpatch-through: 18
When a nested set operation's output type doesn't match the parent's
expected type, recurse_set_operations builds a projection target list
using generate_setop_tlist with varno 0.  If the required type
coercion involves an ArrayCoerceExpr, estimate_array_length could be
called on such a Var, and would pass it to examine_variable, which
errors in find_base_rel because varno 0 has no valid relation entry.

Fix by skipping the statistics lookup for Vars with varno 0.

Bug introduced by commit 9391f71.  Back-patch to v17, where
estimate_array_length was taught to use statistics.

Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Author: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/adjW8rfPDkplC7lF@pryzbyj2023
Backpatch-through: 17
Three routines in pgstat_database.c incorrectly ignore the database OID
provided by their caller, using MyDatabaseId instead:
- pgstat_report_connect()
- pgstat_report_disconnect()
- pgstat_reset_database_timestamp()

The first two functions, for connection and disconnection, each have a
single caller that already passes MyDatabaseId.  This was harmless,
still incorrect.

The timestamp reset function also has a single caller, but in this case
the issue has a real impact: it fails to reset the timestamp for the
shared-database entry (datid=0) when operating on shared objects.  This
situation can occur, for example, when resetting counters for shared
relations via pg_stat_reset_single_table_counters().

There is currently one test in the tree that checks the reset of a
shared relation, for pg_shdescription, we rely on it to check what is
stored in pg_stat_database.  As stats_reset may be NULL, two resets are
done to provide a baseline for comparison.

Author: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Dapeng Wang <wangdp20191008@gmail.com>
Discussion: https://postgr.es/m/ABBD5026-506F-4006-A569-28F72C188693@gmail.com
Backpatch-through: 15
Per the precedent set by 04539e7, adjust article prefixes for "SQL" to
use "an" consistently rather than "a", i.e., "an es-que-ell" rather than
"a sequel".

Also see b51f86e, b1b13d2, d866f03 and 7bdd489.

Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvp3osQwQam+wNTp9BdhP+QfWO6aY6ZTixQQMfM-UArKCw@mail.gmail.com
6d0eba6 already did most of the changes, but some new ones snuck in
just prior to that commit, so these got missed.

Having these short-lived StringInfoDatas on the stack rather than having
them get palloc'd by makeStringInfo() is simply for performance as it
saves doing a 2nd palloc.

Since this code is new to v19, it makes sense to improve it now rather
than wait until we branch as having v19 and v20 differ here just makes it
harder to backpatch fixes in this area.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/adt4wpj4FZwR+S7I@ip-10-97-1-34.eu-west-3.compute.internal
... and bms_prev_member().

Both of these functions won't work correctly when given a prevbit of
INT_MAX and would crash when operating on a Bitmapset that happened to
have a member with that value.

Here we fix that by using an unsigned int to calculate which member to
look for next.

I've also adjusted bms_prev_member() to check for < 0 rather than == -1
for starting the loop.  This was done as it's safer and comes at zero
extra cost.

With our current use cases, it's likely impossible to have a Bitmapset
with an INT_MAX member, so no backpatch here.  I only noticed this issue
when working on a bms function to bitshift a Bitmapset.

Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAApHDvr1B2gbf6JF69QmueM2QNRvbQeeKLxDnF=w9f9--022uA@mail.gmail.com
The data given in input of the function may not be null-terminated,
causing strlcpy() to complain with an invalid read.

Issue spotted using valgrind.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/09df9d75-13e7-45fe-89af-33fe118e797b@gmail.com
Similar to 928394b and 8461424, here we adjust a few new locations
which were not using the most suitable appendStringInfo* or
appendPQExpBuffer* function for the intended purpose.

Author: David Rowley <drowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvohYOdrvhVxXzCJNX_GYMSWBfjTTtB6hgDauEtZ8Nar2A@mail.gmail.com
The slotsync worker was incorrectly identifying no-op states as successful
updates, triggering a busy loop to sync slots that logged messages every
200ms. This patch corrects the logic to properly classify these states,
enabling the worker to respect normal sleep intervals when no work is
performed.

Reported-by: Fujii Masao <masao.fujii@gmail.com>
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Backpatch-through: 17, where it was introduced
Discussion: https://postgr.es/m/CAHGQGwF6zG9Z8ws1yb3hY1VqV-WT7hR0qyXCn2HdbjvZQKufDw@mail.gmail.com
Consistent with existing psql metadata display conventions, update the
description tags for EXCEPT publications to use lowercase for the second
word (e.g., "Except tables" instead of "Except Tables"). This aligns the
output style with other publication describe commands.

Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Discussion: https://postgr.es/m/CAHut+Pt3t_tCYwDStkj5fG4Z=YXrHvPBA7iGdh745QipC5zKeg@mail.gmail.com
This adds the ability for users of logging.c to provide a file handle
for a log file, where log messages are also written in addition to
stderr.

Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAEqnbaUthOQARV1dscGvB_EsqC-YfxiM6rWkVDHc%2BG%2Bf4oSUHw%40mail.gmail.com
This reverts commit 6b5b7ea, where a new logging API layer was
introduced locally in pg_createsubscriber.  Instead, use the log file
callback introduced in logging.c.  This new approach is simpler,
eliminates code duplication, and doesn't require any caller changes or
NLS updates (which the previous commit missed).

Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAEqnbaUthOQARV1dscGvB_EsqC-YfxiM6rWkVDHc%2BG%2Bf4oSUHw%40mail.gmail.com
No actual changes result.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/2a668979-ed92-49a3-abf9-a3ec2d460ec2%40eisentraut.org
Previously we were relying on a snapshot-based check to detect invalid
execution contexts.  However, when WAIT FOR is wrapped into a stored
procedure or a DO block, it could pass this check, causing an error
elsewhere.

This commit implements an explicit isTopLevel check to reject WAIT FOR
when called from within a function, procedure, or DO block.  The
isTopLevel check catches these cases early with a clear error message,
matching the pattern used by other utility commands like VACUUM and
REINDEX.  The snapshot check is retained for the remaining case:
top-level execution within a transaction block using an isolation level
higher than READ COMMITTED.

Also adds tests for WAIT FOR LSN wrapped in a procedure and DO block,
complementing the existing test that uses a function wrapper.  Relevant
documentation paragraph is also added.

Reported-by: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Discussion: https://postgr.es/m/CAHg%2BQDcN-n3NUqgRtj%3DBQb9fFQmH8-DeEROCr%3DPDbw_BBRKOYA%40mail.gmail.com
Author: Satyanarayana Narlapuram <satyanarlapuram@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
When a tablesample routine says that it is not repeatable across
scans, set_tablesample_rel_pathlist will (usually) materialize it,
confusing pg_plan_advice's plan walker machinery. To fix, update that
machinery to view such Material paths as essentially an extension of
the underlying scan.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: http://postgr.es/m/CA+TgmobOOmmXSJz3e+cjTY-bA1+W0dqVDqzxUBEvGtW62whYGg@mail.gmail.com
If a subquery is proven empty, and if that subquery contained a
semijoin, and if making one side or the other of that semijoin
unique and performing an inner join was a possible strategy, then
the previous code would fail with ERROR: no rtoffset for plan %s
when attempting to generate advice. Fix that.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: http://postgr.es/m/CA+TgmobOOmmXSJz3e+cjTY-bA1+W0dqVDqzxUBEvGtW62whYGg@mail.gmail.com
tglsfdc and others added 27 commits May 11, 2026 05:13
Some UTF8 characters decompose to more than a dozen codepoints.
It is possible for an input string that fits into well under
1GB to produce more than 4G decomposed codepoints, causing
unicode_normalize()'s decomp_size variable to wrap around to a
small positive value.  This results in a small output buffer
allocation and subsequent buffer overrun.

To fix, test after each addition to see if we've overrun MaxAllocSize,
and break out of the loop early if so.  In frontend code we want to
just return NULL for this failure (treating it like OOM).  In the
backend, we can rely on the following palloc() call to throw error.

I also tightened things up in the calling functions in varlena.c,
using size_t rather than int and allocating the input workspace
with palloc_array().  These changes are probably unnecessary
given the knowledge that the original input and the normalized
output_chars array must fit into 1GB, but it's a lot easier to
believe the code is safe with these changes.

Reported-by: Xint Code
Reported-by: Bruce Dang <bruce@calif.io>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Co-authored-by: Heikki Linnakangas <hlinnaka@iki.fi>
Backpatch-through: 14
Security: CVE-2026-6473
multirange_recv and BlockRefTableReaderNextRelation were incautious
about multiplying a possibly-large integer by a factor more than 1
and then using it as an allocation size.  This is harmless on 64-bit
systems where we'd compute a size exceeding MaxAllocSize and then
fail, but on 32-bit systems we could overflow size_t leading to an
undersized allocation and buffer overrun.

Fix these places by using palloc_array() instead of a handwritten
multiplication.  (In HEAD, some of them were fixed already, but
none of that work got back-patched at the time.)

In addition, BlockRefTableReaderNextRelation passes the same value
to BlockRefTableRead's "int length" parameter.  If built for
64-bit frontend code, palloc_array() allows a larger array size
than it otherwise would, potentially allowing that parameter to
overflow.  Add an explicit check to forestall that and keep the
behavior the same cross-platform.

Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 14
Security: CVE-2026-6473
The handling of SSL and GSS negotiation messages in
ProcessStartupPacket() could cause a recursion of the backend,
ultimately crashing the server as the negotiation attempts were not
tracked across multiple calls processing startup packets.

A malicious client could therefore alternate rejected SSL and GSS
requests indefinitely, each adding a stack frame, until the backend
crashed with a stack overflow, taking down a server.

This commit addresses this issue by modifying ProcessStartupPacket() so
as processed negotiation attempts are tracked, preventing infinite
recursive attempts.  A TAP test is added to check this problem, where
multiple SSL and GSS negotiated attempts are stacked.

Reported-by: Calif.io in collaboration with Claude and Anthropic
Research
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Security: CVE-2026-6479
Backpatch-through: 14
contrib/intarray's query_int type uses an int16 field to hold the
offset from a binary operator node to its left operand.  However, it
allows the number of nodes to be as much as will fit in MaxAllocSize,
so there is a risk of overflowing int16 depending on the precise shape
of the tree.  Simple right-associative cases like "a | b | c | ..."
work fine, so we should not solve this by restricting the overall
number of nodes.  Instead add a direct test of whether each individual
offset is too large.

contrib/ltree's ltxtquery type uses essentially the same logic and
has the same 16-bit restriction.

(The core backend's tsquery.c has a variant of this logic too, but
in that case the target field is 32 bits, so it is okay so long
as varlena datums are restricted to 1GB.)

In v16 and up, these types support soft error reporting, so we have
to complicate the recursive findoprnd function's API a bit to allow
the complaint to be reported softly.  v14/v15 don't need that.

Undocumented and overcomplicated code like this makes my head hurt,
so add some comments and simplify while at it.

Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Backpatch-through: 14
Security: CVE-2026-6473
This commit applies timingsafe_bcmp() to authentication paths that
handle attributes or data previously compared with memcpy() or strcmp(),
which are sensitive to timing attacks.

The following data is concerned by this change, some being in the
backend and some in the frontend:
- For a SCRAM or MD5 password, the computed key or the MD5 hash compared
with a password during a plain authentication.
- For a SCRAM exchange, the stored key, the client's final nonce and the
server nonce.
- RADIUS (up to v18), the encrypted password.
- For MD5 authentication, the MD5(MD5()) hash.

Reported-by: Joe Conway <mail@joeconway.com>
Security: CVE-2026-6478
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Backpatch-through: 14
ALTER SUBSCRIPTION ... REFRESH PUBLICATION interpolates schema and
relation names into SQL without quoting them.  A crafted subscriber
relation name can inject arbitrary SQL on the publisher.  Test such a
name.  Back-patch to v16, where commit
8756930 first appeared.

Reported-by: Pavel Kohout <pavel.kohout@aisle.com>
Author: Pavel Kohout <pavel.kohout@aisle.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Backpatch-through: 16
Security: CVE-2026-6638
timeofday() assumed that the output of pg_strftime() could not contain
% signs, other than the one it explicitly asks for with %%.  However,
we don't have that guarantee with respect to the time zone name (%Z).
A crafted time zone setting could abuse the subsequent snprintf()
call, resulting in crashes or disclosure of server memory.

To fix, split the pg_strftime() call into two and then treat the
outputs as literal strings, not a snprintf format string.  The
extra pg_strftime() call doesn't really cost anything, since the
bulk of the conversion work was done by pg_localtime().

Also, adjust buffer widths so that we're not risking string truncation
during the snprintf() step, as that would create a hazard of producing
mis-encoded output.

This also fixes a latent portability issue: the format string expects
an int, but tp.tv_usec is long int on many platforms.

Reported-by: Xint Code
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Backpatch-through: 14
Security: CVE-2026-6474
Although pg_strftime() has defined error conditions, no callers bother
to check for errors.  This is problematic because the output string is
very likely not null-terminated if an error occurs, so that blindly
using it is unsafe.  Rather than trusting that we can find and fix all
the callers, let's alter the function's API spec slightly: make it
guarantee a null-terminated result so long as maxsize > 0.

Furthermore, if we do get an error, let's make that null-terminated
result be an empty string.  We could instead truncate at the buffer
length, but that risks producing mis-encoded output if the tz_name
string contains multibyte characters.  It doesn't seem reasonable for
src/timezone/ to make use of our encoding-aware truncation logic.
Also, the only really likely source of a failure is a user-supplied
timezone name that is intentionally trying to overrun our buffers.
I don't feel a need to be particularly friendly about that case.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Backpatch-through: 14
Security: CVE-2026-6474
The SQL functions for the restore of attribute and expression statistics
accept "most_common_vals" and "most_common_freqs" as independent arrays.
The planner assumes these have the same number of elements, but it was
possible to insert in the catalogs data that would cause an over-read
when the catalog data is loaded in the planner.

There were two holes in the stats restore logic:
- Both arrays should match in size.
- The input array must be one-dimensional, and it should match with what
is delivered by pg_dump when scanning the pg_stats catalogs.

The multivariate extended statistics MCV path (import_mcv) already
validated these inputs via check_mcvlist_array(), and is not affected.
These problems exist in v18 and newer versions for the restore of
attribute statistics.  These problems affect only HEAD for the restore
of the expression statistics.

Reported-by: Jeroen Gui <jeroen.gui1@proton.me>
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Security: CVE-2026-6575
Backpatch-through: 18
drop_existing_subscription() neglected to escape the subscription
name when generating its query string.  To fix, use
PQescapeIdentifier() to construct a properly escaped name, and use
it in the ALTER SUBSCRIPTION and DROP SUBSCRIPTION commands.

Reported-by: Yu Kunpeng <yu443940816@live.com>
Author: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Security: CVE-2026-6476
Backpatch-through: 17
This omission allowed roles to create multirange types in any
schema, potentially leading to privilege escalations.  Note that
when a multirange type name is not specified in CREATE TYPE, it is
automatically placed in the range type's schema, which is checked
at the beginning of DefineRange().

Reported-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Security: CVE-2026-6472
Backpatch-through: 14
A few functions in this file were incautious about multiplying a
possibly large integer by a factor more than 1 and then using it as
an allocation size.  This is harmless on 64-bit systems where we'd
compute a size exceeding MaxAllocSize and then fail, but on 32-bit
systems we could overflow size_t, leading to an undersized
allocation and buffer overrun.  To fix, use palloc_array() or
mul_size() instead of handwritten multiplication.

Reported-by: Sven Klemm <sven@tigerdata.com>
Reported-by: Xint Code
Author: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Tatsuo Ishii <ishii@postgresql.org>
Security: CVE-2026-6473
Backpatch-through: 14
pg_rewind and pg_basebackup could be fed paths from rogue endpoints that
could overwrite the contents of the client when received, achieving path
traversal.

There were two areas in the tree that were sensitive to this problem:
- pg_basebackup, through the astreamer code, where no validation was
performed before building an output path when streaming tar data.  This
is an issue in v15 and newer versions.
- pg_rewind file operations for paths received through libpq, for all
the stable branches supported.

In order to address this problem, this commit adds a helper function in
path.c, that reuses path_is_relative_and_below_cwd() after applying
canonicalize_path().  This can be used to validate the paths received
from a connection point.  A path is considered invalid if any of the two
following conditions is satisfied:
- The path is absolute.
- The path includes a direct parent-directory reference.

Reported-by: XlabAI Team of Tencent Xuanwu Lab
Reported-by: Valery Gubanov <valerygubanov95@gmail.com>
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 14
Security: CVE-2026-6475
pg_locale_icu.c was full of places where a very long input string
could cause integer overflow while calculating a buffer size,
leading to buffer overruns.

It also was cavalier about using char-type local arrays as buffers
holding arrays of UChar.  The alignment of a char[] variable isn't
guaranteed, so that this risked failure on alignment-picky platforms.
The lack of complaints suggests that such platforms are very rare
nowadays; but it's likely that we are paying a performance price on
rather more platforms.  Declare those arrays as UChar[] instead,
keeping their physical size the same.

pg_locale_libc.c's strncoll_libc_win32_utf8() also had the
disease of assuming it could double or quadruple the input
string length without concern for overflow.

Reported-by: Xint Code
Reported-by: Pavel Kohout <pavel.kohout@aisle.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 14
Security: CVE-2026-6473
If you accumulate many arrays full of NULLs, you could overflow
'nitems', before reaching the MaxAllocSize limit on the allocations.
Add an explicit check that the number of items doesn't grow too large.
With more than MaxArraySize items, getting the final result with
makeArrayResultArr() would fail anyway, so better to error out early.

Reported-by: Xint Code
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 14
Security: CVE-2026-6473
When result_is_int is set to 0, PQfn() cannot validate that the
result fits in result_buf, so it will write data beyond the end of
the buffer when the server returns more data than requested.  Since
this function is insecurable and obsolete, add a warning to the top
of the pertinent documentation advising against its use.

The only in-tree caller of PQfn() is the frontend large object
interface.  To fix that, add a buf_size parameter to
pqFunctionCall3() that is used to protect against overruns, and use
it in a private version of PQfn() that also accepts a buf_size
parameter.

Reported-by: Yu Kunpeng <yu443940816@live.com>
Reported-by: Martin Heistermann <martin.heistermann@unibe.ch>
Author: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Security: CVE-2026-6477
Backpatch-through: 14
Maliciously crafted key value updates could achieve SQL injection
within check_foreign_key().  To fix, ensure new key values are
properly quoted and escaped in the internally generated SQL
statements.  While at it, avoid potential buffer overruns by
replacing the stack buffers for internally generated SQL statements
with StringInfo.

Reported-by: Nikolay Samokhvalov <nik@postgres.ai>
Author: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Security: CVE-2026-6637
Backpatch-through: 14
These checks are failing in the buildfarm, reporting stack overflows
rather than the expected errors, though seemingly only on ppc64 and
s390x platforms.  Perhaps there is something off about our tests
for stack depth on those architectures?  But there's no time to
debug that right now, and surely these tests aren't too essential.
Revert for now and plan to revisit after the release dust settles.

Backpatch-through: 14
Security: CVE-2026-6473
REPACK replay builds scan keys for the replica identity index, but it
hard-coded BTEqualStrategyNumber when looking up the equality operator.
That is not correct for non-btree identity indexes, such as the GiST
indexes created for WITHOUT OVERLAPS primary keys.  In addition,
find_target_tuple() accepted the first tuple returned by the identity
index scan, which is unsafe for lossy index scans because the index AM may
return false positives with xs_recheck set.

Fix this by using IndexAmTranslateCompareType() to translate COMPARE_EQ
to the equality strategy number for the index AM, and by continuing the
scan when recheck is required until a candidate tuple matches the locator
tuple on all replica identity key columns.

The recheck uses the same equality operator functions as the identity
index scan keys, preserving ScanKey argument ordering.

Author: Chao Li <lic@highgo.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/7B0EC0EC-5461-41EF-9B31-F9BBE608DEA5@gmail.com
These could overflow on 32-bit systems.

Backpatch-through: 14
Security: CVE-2026-6473
pgss_ProcessUtility() included a reference to a portion of a PlannedStmt
after the point where this data's structure could have been freed,
causing an incorrect memory access.  There was a comment documenting
this requirement, missed in 3357471.

This commit includes a test able to make valgrind complain with a
PlannedStmt freed by an internal ROLLBACK query.  Similarly to what is
mentioned in 495e73c, this can be triggered by using the extended
query protocol, something that can be now tested thanks to the recent
meta-command additions in psql.  This commit mentions potential other
cases, but as far as I can see the extended protocol case with an
internal ROLLBACK is the only problematic pattern reachable in practice.

Issue introduced by 3357471, gone unnoticed due to a lack of test
coverage.  The fix is authored by Chao, my contribution being the new
test.

Author: Chao Li <li.evan.chao@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/2F91906A-F2B5-4A6B-9695-D136957D4545@gmail.com
Add tab completion support in psql for the FOR PORTION OF clause
used in UPDATE and DELETE statements with temporal tables.

For both UPDATE and DELETE, completion now guides users through:
  <table> FOR -> PORTION -> OF -> <column> -> FROM

Author: Kiran Kaki <itskkpg@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAD0dvCQLqLzPrQJRjjA2qXDH%3DD%2BXShcxhbSPxNhVruC8HGhkbQ%40mail.gmail.com
Oleg's original comment was intelligible only to him.
Aleksander has reverse-engineered what seems like a plausible
explanation of what the code is trying to do, so replace the
comment with that.  (Also, re-order the final expression to
match the new comment.)

In passing, this makes the comment satisfy our usual formatting
conventions.  pgindent has let it pass as-is so far, but planned
changes would mess it up without some sort of intervention.

Author: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAJ7c6TO0xvunpeOv89i1eKQBhKF9=GEETkTz+yAGs1xGYH25MQ@mail.gmail.com
ReorderBufferProcessTXN() aborts the current transaction after each
decoded commit to release locks and clean up catalog access.  In a
logical walsender that is a top-level abort, so every decoded commit
bumps pg_stat_database.xact_rollback; the counts surface as a spike on
walsender exit (e.g. when a subscription is disabled).

Add AbortCurrentTransactionWithoutXactStats(), a wrapper that suppresses
only the DB-level xact_commit/xact_rollback counter and leaves
per-relation and subxact stat handling intact.  Use it from the
top-level cleanup paths in ReorderBufferProcessTXN() (gated on
!using_subtxn) and SnapBuildClearExportedSnapshot().

A TAP test asserts a publisher xact_rollback delta of 0 across five
decoded transactions (delta is 5 without the fix).

Reported-by: Rafael Thofehrn Castro
Discussion: https://postgr.es/m/CAM527d_EbU5Li4a5FdKQjYsdF-4Lqr_i3jXmZOm7Wbb%3DQ2KzTw%40mail.gmail.com
@NikolayS NikolayS force-pushed the fix/xact-rollback-spike-logical-decoding branch from 72a73f5 to 5255d6b Compare May 27, 2026 17:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.