Summary
Two bugs found during checkpoint/restart fault tolerance testing:
- ERROR 50022 duplicate SYS.CCOL$: upstream OLR bug triggered by flags: 4 (ADAPTIVE_SCHEMA) combined with any table filter. Not RAC-specific.
- Heap-use-after-free on shutdown: Ctx::freeMemoryChunk() accesses the freed WriterStream via the dangling ctx->parserThread. Fixed in commit 6724219d.
Bug 1: duplicate SYS dictionary entries with ADAPTIVE_SCHEMA
Root Cause
When flags: 4 (ADAPTIVE_SCHEMA) is set, Checkpoint.cpp:188 adds a wildcard ".*"/".*" schema element. Combined with any explicit table filter (e.g. "OLR_TEST"/".*"), createSchema() calls readSystemDictionaries() for each schema element. The wildcard loads ALL users' SYS dictionary entries first, then the explicit filter tries to load the same user's entries again.
The SysUser duplicate check in ReplicatorOnline.cpp:1341-1356 is supposed to skip already-loaded users, but doesn't guard all code paths — readSystemDictionariesDetails() gets called twice for overlapping users. addWithKeys() → add() in TablePack.h:146 unconditionally throws on duplicate ROWID.
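The overlap described above can be illustrated with a small, self-contained sketch. It is not the OLR code: DictRow, DictPack, loadForElement, and createSchemaSketch are hypothetical stand-ins, and the only behavior modeled is "one load per schema element, with an add() that rejects a ROWID it has already seen".

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <regex>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a SYS dictionary row keyed by ROWID.
struct DictRow {
    std::string rowId;
    std::string owner;
};

// Hypothetical stand-in for a dictionary pack whose add() rejects duplicates.
class DictPack {
public:
    void add(const DictRow& row) {
        if (!rows.emplace(row.rowId, row).second)
            throw std::runtime_error("duplicate ROWID: " + row.rowId);
    }
    std::size_t size() const { return rows.size(); }

private:
    std::map<std::string, DictRow> rows;
};

// Loads every source row whose owner matches one schema element.
void loadForElement(DictPack& pack, const std::vector<DictRow>& source,
                    const std::string& ownerPattern) {
    const std::regex re(ownerPattern);
    for (const DictRow& row : source)
        if (std::regex_match(row.owner, re))
            pack.add(row);
}

// Mimics one dictionary load per schema element.
// Returns false if a duplicate aborted the load.
bool createSchemaSketch(const std::vector<std::string>& ownerPatterns,
                        const std::vector<DictRow>& source) {
    DictPack pack;
    try {
        for (const std::string& pattern : ownerPatterns)
            loadForElement(pack, source, pattern);
    } catch (const std::runtime_error&) {
        return false;
    }
    return true;
}
```

With a single explicit owner pattern the load succeeds. With the wildcard pattern followed by the explicit one, the second pass re-adds rows that the wildcard already loaded and the sketch fails, which mirrors the failure mode reported here.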
Key distinction
- add() (used by the createSchema() DB dictionary load and checkpoint deserialization): always throws on duplicate
- forInsert() (used during redo replay in SystemTransaction.cpp): tolerates duplicates when ADAPTIVE_SCHEMA is set
So ADAPTIVE_SCHEMA only protects redo replay, not the initial dictionary load.
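The difference between the two paths can be sketched as follows. This is a simplified illustration, not the real TablePack interface: SketchPack, its method signatures, and throwsRuntimeError are hypothetical, and the flag constant simply reuses the value 4 from the config in this issue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Assumed bit value: the issue only states that flags: 4 means ADAPTIVE_SCHEMA.
constexpr std::uint64_t ADAPTIVE_SCHEMA_FLAG = 4;

// Hypothetical simplified pack; not the real TablePack interface.
class SketchPack {
public:
    explicit SketchPack(std::uint64_t flags) : flags_(flags) {}

    // Initial-load path: a duplicate ROWID is always an error.
    void add(const std::string& rowId, int value) {
        if (!rows_.emplace(rowId, value).second)
            throw std::runtime_error("duplicate " + rowId);
    }

    // Redo-replay path: with the adaptive flag set, hand back the existing
    // row instead of throwing; without it, a duplicate is still an error.
    int& forInsert(const std::string& rowId) {
        auto it = rows_.find(rowId);
        if (it != rows_.end()) {
            if ((flags_ & ADAPTIVE_SCHEMA_FLAG) == 0)
                throw std::runtime_error("duplicate " + rowId);
            return it->second;
        }
        return rows_.emplace(rowId, 0).first->second;
    }

    std::size_t size() const { return rows_.size(); }

private:
    std::uint64_t flags_;
    std::unordered_map<std::string, int> rows_;
};

// Helper for checks: reports whether a callable throws std::runtime_error.
template <typename F>
bool throwsRuntimeError(F&& f) {
    try {
        f();
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

In this sketch the flag is consulted only inside forInsert(), so setting it changes nothing for add(). That is the asymmetry the list above describes.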
Reproduction
Fails on first fresh start — not related to checkpoint/restart at all:
# RAC (Oracle 23ai)
ERROR 50022 duplicate SYS.CCOL$ (ROWID: AAAAAdAAAAAAAEpAAg, CON#: 147, INTCOL#: 1, OBJ#: 32, SPARE1: [0,0]) for insert
# Single-instance (Oracle XE 21c) — same bug
ERROR 50022 duplicate SYS.CCOL$ (ROWID: AAAAAdAABAAAAEpAAO, CON#: 144, INTCOL#: 1, OBJ#: 31, SPARE1: [0,0]) for insert
Minimal config to reproduce on any Oracle instance:
{
  "flags": 4,
  "filter": {
    "table": [{"owner": "ANY_USER", "table": ".*"}]
  }
}
Without flags: 4, no error occurs.
Initial misidentification
Originally reported as a RAC-specific checkpoint resume issue because the RAC Debezium config was the only one with flags: 4. The flag was added as a debugging attempt during checkpoint/restart testing, but it turned out to be the cause of the error, not the fix.
Bug 2: heap-use-after-free on shutdown — FIXED
Fixed in commit 6724219d.
- WriterStream constructor sets ctx->parserThread = this
- ~OpenLogReplicator() deletes writers before builders/parsers
- Builder::~Builder() and Parser::~Parser() call freeMemoryChunk(ctx->parserThread, ...) on already-freed memory
- Fix: null out ctx->parserThread in WriterStream::~WriterStream(), guard freeMemoryChunk() against null thread
This bug exists in upstream OLR but only triggers when a RuntimeException causes shutdown while using the network writer.
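The destruction order and the two-part fix can be sketched in isolation. The types below are hypothetical reductions (WriterSketch stands in for WriterStream, BuilderSketch for Builder/Parser), and the bookkeeping field is an assumption made only so that freeMemoryChunk() has something to dereference. The actual change is in commit 6724219d.

```cpp
#include <cassert>

// Stands in for the thread object; the counter is an assumed placeholder
// for whatever per-thread state the real freeMemoryChunk() touches.
struct Thread {
    int chunkEvents = 0;
};

// Hypothetical simplified context; the real Ctx has many more members.
struct Ctx {
    Thread* parserThread = nullptr;
    int chunksFreed = 0;

    // Guarded variant: skip per-thread bookkeeping when the thread is gone.
    void freeMemoryChunk(Thread* thread) {
        if (thread != nullptr)
            ++thread->chunkEvents;
        ++chunksFreed;
    }
};

// Stand-in for WriterStream: registers itself in the context on construction.
struct WriterSketch : Thread {
    explicit WriterSketch(Ctx* ctx) : ctx_(ctx) { ctx_->parserThread = this; }
    // First half of the fix: clear the back-pointer before the object dies.
    ~WriterSketch() { ctx_->parserThread = nullptr; }
    Ctx* ctx_;
};

// Stand-in for Builder/Parser: releases a chunk through the context on destruction.
struct BuilderSketch {
    explicit BuilderSketch(Ctx* ctx) : ctx_(ctx) {}
    ~BuilderSketch() { ctx_->freeMemoryChunk(ctx_->parserThread); }
    Ctx* ctx_;
};

// Reproduces the shutdown order from the issue: writer first, then builder.
int shutdownInIssueOrder() {
    Ctx ctx;
    auto* builder = new BuilderSketch(&ctx);
    auto* writer = new WriterSketch(&ctx);
    delete writer;   // ctx.parserThread becomes nullptr here
    delete builder;  // freeMemoryChunk() receives a null thread and skips it
    return ctx.chunksFreed;
}
```

Without the destructor reset, the builder's destructor in this sketch would pass a pointer to the deleted writer, which is the pattern AddressSanitizer reports as heap-use-after-free. The null guard alone would not help in that case, since a dangling pointer is not null; the two changes work together.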
Resolution
- Bug 1: Remove flags: 4 from config. The ADAPTIVE_SCHEMA flag has this upstream bug and is not needed for our use case.
- Bug 2: Fixed in commit 6724219d.