Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
187 changes: 187 additions & 0 deletions audit/findings/ri-fastpath-hackers-draft.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,187 @@
Subject: PG19 FK fast path: OOB write and missed FK checks during batched
flush

Hackers,

I identified these issues during recent security research. While they have
clear security implications (OOB write, integrity bypass), I am reporting
them directly to -hackers rather than security@ because they are isolated to
19beta1, absent in PG18 and earlier; I don't have patches, only
reproducibility.

The new FK existence-check fast path in src/backend/utils/adt/ri_triggers.c
(ri_FastPath*) runs user-defined code in the middle of a deferred batch
flush, which yields three defects reachable by an unprivileged table owner.
Present in master and verified in REL_19_BETA1; absent from REL_18_4 and
REL_17_6 (PG19-only, so pre-GA, not a security@ matter).

Mechanism. For an INSERT/UPDATE on the referencing side the fast path
buffers rows in a transaction-lived cache (ri_fastpath_cache, keyed by
pg_constraint OID) and probes the PK index in groups, flushing when a
per-constraint buffer reaches RI_FASTPATH_BATCH_SIZE (64) or when the
trigger-firing pass ends (ri_FastPathEndBatch, an AfterTriggerBatchCallback).
For a cross-type FK the flush calls the column's cast function
(ri_FastPathFlushArray, the FunctionCall3 at line 3069) and the equality
operator -- arbitrary user code, mid-flush. Line numbers below are from a
REL_19_BETA1 build (commit 4b0bf07).

Unprivileged vehicle (defects 1 and 3). No superuser, no contrib: a role
creates a type it owns and an IMPLICIT cast from it to the PK type with a
PL/pgSQL function, which ri_HashCompareOp wires into the fast path's cast
slot. Below uses a composite type. Default btree opclass, ordinary
single-column FK, no GUC (fast path is unconditional for non-partitioned,
non-temporal FKs, per ri_fastpath_is_applicable).


[1] ri_FastPathBatchAdd (line 2859): out-of-bounds write on re-entry

The write precedes the bound check, and batch_count is reset to 0 only at
end of flush (ri_FastPathBatchFlush, line 2971), so it is 64 throughout a
full-batch flush:

fpentry->batch[fpentry->batch_count] = ExecCopySlotHeapTuple(newslot);
fpentry->batch_count++;
if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE)
ri_FastPathBatchFlush(fpentry, fk_rel, riinfo);

There is no re-entrancy guard and ri_FastPathGetEntry returns the same
entry, so user code that does DML on the same table during a full-batch
flush re-enters with batch_count == 64 and writes batch[64], one past the
array, overwriting the adjacent batch_count field (struct layout, lines
250-251). A single re-entrant row only stomps batch_count, which is then
reset to 0 before reuse; the crash manifests once the re-entrant insert is
itself large enough to fill and flush a batch, so the stomped batch_count is
used as an array index (batch[garbage]) and as nvals in
memset(matched, 0, nvals * sizeof(bool)) (line 3054).

Repro (non-superuser; reliable SIGSEGV on --enable-cassert -O0; under -O2
the out-of-bounds write is of undefined effect, so a clean run there does
not disprove it):

create table parent(id int primary key);
insert into parent select g from generate_series(1,2000) g;
create type vch as (v int);
create function vcast(vch) returns int language plpgsql as $$
begin
if $1.v = 64 then
insert into child select row(g)::vch from generate_series(1001,1064) g;
end if;
return $1.v;
end$$;
create cast (vch as int) with function vcast(vch) as implicit;
create table child(a vch);
alter table child add constraint child_fkey
foreign key (a) references parent(id);
insert into child select row(g)::vch from generate_series(1,64) g; -- crash
-- gdb: crash at ri_FastPathBatchAdd line 2866 with batch_count holding a
-- stomped HeapTuple pointer's low bits, i.e. batch[64] overwrote
-- batch_count; backend SIGSEGVs and the cluster restarts.


[2] ri_FastPathSubXactCallback (line 4208): batch dropped on subxact abort

On SUBXACT_EVENT_ABORT_SUB the callback discards the whole cache:

ri_fastpath_cache = NULL;
ri_fastpath_callback_registered = false;

But batch[] holds outstanding rows of the enclosing transaction, not the
aborting subxact. An internal subxact abort during after-trigger firing
(PL/pgSQL BEGIN ... EXCEPTION) drops the buffered rows unflushed; their FK
checks never run and orphans commit behind a constraint that still reports
itself valid. No cast needed:

create table pk(id int primary key);
create table fk(a int, tag text);
insert into pk select g from generate_series(1,10) g;
alter table fk add constraint fk_a_fkey foreign key (a) references pk(id);
create function abort_subxact() returns trigger language plpgsql as $$
begin
if NEW.tag = 'boom' then
begin perform 1/0; exception when others then null; end;
end if;
return NEW;
end$$;
create trigger fk_after after insert on fk
for each row execute function abort_subxact();
insert into fk values (999,'bad'),(0,'boom'),(1,'ok'),(2,'ok'),(3,'ok');
-- INSERT 0 5, no error
select f.a from fk f left join pk p on f.a=p.id where p.id is null;
-- a
-- -----
-- 999
-- 0 (orphans)

-- the constraint still reports itself valid, and re-validation passes
-- while the orphans remain:
select convalidated from pg_constraint where conname = 'fk_a_fkey';
-- convalidated
-- --------------
-- t
alter table fk validate constraint fk_a_fkey;
-- ALTER TABLE (succeeds; does not re-scan committed rows)
select f.a from fk f left join pk p on f.a=p.id where p.id is null;
-- 999, 0 (orphans still present)

Controls (no EXCEPTION; between-statement SAVEPOINT; DEFERRABLE INITIALLY
DEFERRED) all behave correctly (FK violation raised, no orphans). The whole
statement's buffered batch is discarded, not just the aborting row's check.
The abort path also emits "WARNING: resource was not closed" (relation /
index / TupleDesc), a resource leak consistent with the missing flush.


[3] ri_FastPathEndBatch (line 4133): cross-table re-entry drops a check

EndBatch flushes by iterating the cache with hash_seq_search (line 4143).
If flush-time user code INSERTs into a different fast-path FK table,
ri_FastPathGetEntry adds a new cache entry mid-scan; it can land in a bucket
hash_seq_search already passed and is never reached. ri_FastPathTeardown
(line 4165) then hash_destroys the cache (line 4188) without flushing entries
that still have batch_count > 0, so that buffered check is discarded. This
survives a
per-entry guard for [1] (different entry, not a re-entry of the busy one):

create table parent(id int primary key);
insert into parent select g from generate_series(1,64) g;
create table child2(a int);
alter table child2 add constraint child2_fkey
foreign key (a) references parent(id);
create type vch as (v int);
create function vcast(vch) returns int language plpgsql as $$
begin
if $1.v = 1 then
insert into child2 values (999999); -- orphan into a *different* FK
end if;
return $1.v;
end$$;
create cast (vch as int) with function vcast(vch) as implicit;
create table child(a vch);
alter table child add constraint child_fkey
foreign key (a) references parent(id);
insert into child values (row(1)::vch); -- flushed at ri_FastPathEndBatch
select a from child2 where a not in (select id from parent); -- => 999999
-- control: INSERT INTO child2 VALUES (999999); -- correctly raises FK error


Root cause and fix directions.

All three stem from invoking user cast/operator code inside a deferred
batch flush: while a per-entry batch is half-updated [1], while a cache-wide
hash_seq_search is in progress and teardown drops non-empty entries [3], and
against a subxact-abort invalidation that cannot tell parent-xact rows from
aborted-subxact rows [2].

- [1] Bound-check before the write in ri_FastPathBatchAdd, and add a
"flushing" flag to RI_FastPathEntry, rejecting re-entrant modification of
a busy entry (a nested per-row probe is unsafe: the flush may hold PK-index
buffer locks).
- [3] Loop-flush in ri_FastPathEndBatch until no entry has batch_count > 0,
and/or flush non-empty entries in ri_FastPathTeardown before hash_destroy.
- [2] Do not discard outstanding parent-xact rows on SUBXACT_EVENT_ABORT_SUB;
track the buffering subxact, or flush immediate-constraint batches at
subxact boundaries.
- Unifying: a global "in fast-path flush" guard routing any re-entrant FK
check to the immediate per-row path, and reconsidering running user code
mid-flush at all.

Regards,
184 changes: 184 additions & 0 deletions audit/findings/ri-fastpath-public-issue.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,184 @@
# PG19 FK fast-path (`ri_FastPath*`): three reachable defects from running user-defined cast/operator code inside the deferred batch flush

*Public bug report drafted for the PostgreSQL 19 beta cycle. The `ri_FastPath*`
foreign-key existence-check fast path is new in PG 19 (master) and is now in
`REL_19_BETA1`; it is absent from all released branches (18, 17). None of these
is a `security@`/CVE matter — the code has not shipped in a GA release — but all
three are reachable by an **unprivileged table owner** and should be fixed
before PG 19 GA.*

## Affected code / provenance

- File: `src/backend/utils/adt/ri_triggers.c`, the `ri_FastPath*` machinery.
- Present in upstream `master` (verified against `postgres/postgres@193a4ded`)
and in `REL_19_BETA1` (identical for this file); **absent** from `REL_18_4`
and `REL_17_6` (0 occurrences of `ri_FastPath`). So this is a **PG 19-only**
feature.
- Line numbers below are from `REL_19_BETA1`.

## Background

PG 19 adds a fast path for the FK *existence* check (INSERT/UPDATE on the
referencing side) that probes the PK unique index directly instead of going
through SPI. For throughput it **buffers** referencing-side rows in a
transaction-lived cache (`ri_fastpath_cache`, keyed by `pg_constraint` OID) and
only runs the actual checks when the per-constraint buffer fills (64 rows) or
when the after-trigger firing pass ends (`ri_FastPathEndBatch`, an
`AfterTriggerBatchCallback`).

The flush runs **arbitrary user-defined code**: for a cross-type FK,
`ri_FastPathFlushArray()` / `ri_FastPathFlushLoop()` call the column's
implicit-cast function (`FunctionCall3(cast_func_finfo, …)`) and the FK equality
operator (`FunctionCall2Coll(eq_opr_finfo, …)`). All three defects below stem
from that: user code runs in the middle of a half-updated batch / an in-progress
cache scan.

**Unprivileged reachability (common to all three).** The cast route needs no
superuser and no contrib module: a user may `CREATE TYPE` on a type they own and
`CREATE CAST (ownedtype AS sometype) WITH FUNCTION f(...) AS IMPLICIT` (allowed
for the source-type owner; `f` may be PL/pgSQL). An ordinary single-column FK
whose column implicitly casts to the PK type then wires that IMPLICIT cast into
`cast_func_finfo` (`ri_HashCompareOp()`), and the fast path invokes it during the
flush. The PK uses the default btree opclass; the FK is an ordinary FK; the fast
path is always on for non-partitioned, non-temporal FKs (no GUC).

---

## Defect 1 — re-entrancy → out-of-bounds heap write (memory safety / crash)

`ri_FastPathBatchAdd()` (line 2859) appends to the fixed 64-element array and
bounds-checks only **after** the write:

```c
fpentry->batch[fpentry->batch_count] = ExecCopySlotHeapTuple(newslot); /* 2866: write */
fpentry->batch_count++;
if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE) /* then check */
ri_FastPathBatchFlush(fpentry, fk_rel, riinfo);
```

`batch_count` is reset to 0 only at the *end* of the flush, so during a
full-batch flush it stays at 64. The flush invokes user code (cast/operator).
`RI_FastPathEntry` has **no re-entrancy guard**, and `ri_FastPathGetEntry()`
returns the *same* cached entry on re-entry. If that user code performs DML on
the same referencing table, the nested `ri_FastPathBatchAdd()` runs with
`batch_count == 64`:

1. `batch[64] = ptr` aliases the `batch_count` field that follows the array →
`batch_count` becomes the low 32 bits of a heap pointer (a large/negative
`int`);
2. the next add indexes `batch[<garbage>]` → **wild OOB heap write**; or the
recursive flush runs `memset(matched, 0, nvals * sizeof(bool))` (line 3054)
with a garbage `nvals` → oversized `memset`.

**Reproduced** as a non-superuser (implicit-cast vehicle) on a `--enable-cassert
-O0` build: SIGSEGV, whole-cluster restart; gdb backtrace
`#1 ri_FastPathFlushArray (...) at ri_triggers.c:3054`. Instrumented trace shows
`batch_count` going `62 → 63 → 64 → -55227815` then a wild write.

Per PostgreSQL's security model this is a **bug, not a vulnerability** (an
authenticated crash/DoS; the written value/index derive from a heap address, so
a *controlled* write is speculative and not demonstrated). It is still a
must-fix memory-safety defect.

---

## Defect 2 — buffered FK checks dropped on subtransaction abort (integrity bypass)

`ri_FastPathSubXactCallback()` (line 4208), on `SUBXACT_EVENT_ABORT_SUB`, simply
NULLs the static cache pointer:

```c
ri_fastpath_cache = NULL;
ri_fastpath_callback_registered = false;
```

The assumption that everything in the cache belongs to the aborting subxact is
wrong: `batch[]` holds outstanding FK rows of the **enclosing** transaction. When
an internal subtransaction aborts during after-trigger firing — canonically a
PL/pgSQL `BEGIN … EXCEPTION … END` block in another AFTER ROW trigger firing in
the same batch — the whole cache is discarded **unflushed**. Those FK existence
checks never run and **orphan rows commit** behind a still-`convalidated`
constraint (`ALTER TABLE … VALIDATE CONSTRAINT` is a no-op, so it stays hidden).

**Reproduced** (single-column array path and multi-column loop path), unprivileged:

```sql
create table pk(id int primary key);
create table fk(a int, tag text);
insert into pk select g from generate_series(1,10) g;
alter table fk add constraint fk_a_fkey foreign key (a) references pk(id);

create function abort_subxact() returns trigger language plpgsql as $$
begin
if NEW.tag = 'boom' then
begin perform 1/0; exception when others then null; end; -- internal subxact abort
end if;
return NEW;
end;$$;
create trigger fk_after after insert on fk for each row execute function abort_subxact();

insert into fk values (999,'bad'),(0,'boom'),(1,'ok'),(2,'ok'),(3,'ok'); -- INSERT 0 5, no error
select f.a from fk f left join pk p on f.a=p.id where p.id is null; -- => 0, 999 (orphans)
```

Controls (no `EXCEPTION`/subxact; between-statement `SAVEPOINT`; `DEFERRABLE
INITIALLY DEFERRED`) all behave correctly. Security-relevant where apps rely on
FK for authz joins / multi-tenant isolation.

---

## Defect 3 — EndBatch cross-table re-entrancy silently drops a check (integrity bypass)

`ri_FastPathEndBatch()` (line 4133) flushes by iterating the cache with
`hash_seq_search` (line 4143). The flush runs user cast/operator code. If that
code `INSERT`s into a **different** fast-path FK table, `ri_FastPathGetEntry()`
(line 4234) adds a **new** cache entry (and registers no new batch callback). That
entry can land in a hash bucket the running `hash_seq_search` has already passed →
it is never flushed. `ri_FastPathEndBatch` then calls `ri_FastPathTeardown()`
(line 4165), which `hash_destroy`s the cache (line 4188) **without flushing
entries that still have `batch_count > 0`**. The buffered FK check is discarded
and an orphan commits.

This **survives the per-entry guard proposed for Defect 1** (it adds a *different*
entry, not a re-entry of the busy one).

**Reproduced**, unprivileged: `child`(type `t`, IMPLICIT cast `evil()`→int) and
`child2`(int), both → `parent(id)`; `evil()`, run during `child`'s EndBatch
flush, inserts an orphan into `child2`. Control `INSERT INTO child2 VALUES (888888)`
is rejected (FK violation); `INSERT INTO child SELECT 'z'::t …` commits and leaves
the orphan in `child2` with `convalidated = t`.

---

## Common root cause

All three are instances of one design issue: **the fast path runs user-defined
cast/operator code inside a deferred batch flush** — once while a per-entry batch
is half-updated (Defect 1), once while a cache-wide `hash_seq_search` is in
progress and the teardown drops non-empty entries (Defect 3), and against a
subxact-abort cache-invalidation that cannot distinguish parent rows from
aborted-subxact rows (Defect 2).

## Suggested fix directions

- **Defect 1:** add a `flushing` flag to `RI_FastPathEntry`, set across the flush
body, and reject re-entrant batch modification of a busy entry (a nested
per-row probe is unsafe — the flush may hold PK-index buffer locks, tripping
`Assert(lockmode == BUFFER_LOCK_UNLOCK)` in `bufmgr.c`). Also reorder
`ri_FastPathBatchAdd()` to bounds-check before the write. (Verified locally:
closes the crash; normal batched FK + genuine-violation detection intact.)
- **Defect 3:** make `ri_FastPathEndBatch` loop-flush until no entry has
`batch_count > 0`, and/or have `ri_FastPathTeardown` flush any non-empty entry
before `hash_destroy`.
- **Defect 2:** do not discard outstanding parent-transaction batch rows on
`SUBXACT_EVENT_ABORT_SUB`; track which subxact buffered each row, or flush
immediate-constraint batches at subxact boundaries.
- **Better, unifying:** promote the per-entry `flushing` guard to a global
"in fast-path flush" guard that routes any re-entrant FK check to the immediate
per-row path — and reconsider invoking arbitrary user code mid-flush at all.

## Disclosure note

PG 19 is unreleased (now `REL_19_BETA1`), so these are pre-GA bugs for the patch
author / pgsql-hackers, not `security@postgresql.org` CVEs. Reproduced live on a
build from the affected code; minimal fix for Defect 1 verified.