
bug: duplicate SYS.CCOL$ error caused by ADAPTIVE_SCHEMA (flags: 4) + heap-use-after-free on shutdown #5

@rophy

Summary

Two bugs found during checkpoint/restart fault tolerance testing:

  1. ERROR 50022 (duplicate SYS.CCOL$): an upstream OLR bug triggered by flags: 4 (ADAPTIVE_SCHEMA) combined with any table filter. Not RAC-specific.
  2. Heap-use-after-free on shutdown: Ctx::freeMemoryChunk() accesses a freed WriterStream via the dangling ctx->parserThread pointer. Fixed in commit 6724219d.

Bug 1: duplicate SYS dictionary entries with ADAPTIVE_SCHEMA

Root Cause

When flags: 4 (ADAPTIVE_SCHEMA) is set, Checkpoint.cpp:188 adds a wildcard ".*"/".*" schema element. Combined with any explicit table filter (e.g. "OLR_TEST"/".*"), createSchema() calls readSystemDictionaries() once per schema element. The wildcard element loads the SYS dictionary entries of ALL users first; the explicit filter then tries to load the same user's entries a second time.
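A minimal sketch of why the two elements overlap, assuming each schema element's owner pattern is matched as a full-string regex against every database user and that each match triggers one dictionary read. SchemaElement and countDictionaryLoads() are hypothetical names for illustration, not OLR's classes, and table patterns are ignored:

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>
#include <vector>

// Hypothetical filter element: an owner pattern and a table pattern,
// e.g. ".*"/".*" (added by ADAPTIVE_SCHEMA) or "OLR_TEST"/".*".
struct SchemaElement {
    std::string owner;
    std::string table;  // ignored in this simplified model
};

// Counts how often each user's SYS dictionary rows would be read if
// readSystemDictionaries() ran once per element with no cross-element dedup.
std::map<std::string, int> countDictionaryLoads(const std::vector<SchemaElement>& elements,
                                                const std::vector<std::string>& users) {
    std::map<std::string, int> loads;
    for (const auto& element : elements) {
        const std::regex ownerPattern(element.owner);
        for (const auto& user : users) {
            if (std::regex_match(user, ownerPattern))
                ++loads[user];
        }
    }
    return loads;
}
```

With the elements ".*"/".*" and "OLR_TEST"/".*", OLR_TEST is counted twice while every other user is counted once, which is the overlap that leads to the duplicate ROWID.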

The SysUser duplicate check in ReplicatorOnline.cpp:1341-1356 is supposed to skip already-loaded users, but it doesn't guard every code path, so readSystemDictionariesDetails() gets called twice for overlapping users. addWithKeys() → add() in TablePack.h:146 then unconditionally throws on the duplicate ROWID.
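The following sketch models the intended guard, assuming a per-user "already loaded" set is enough to prevent the second read. RowidTable and DictionaryLoader are hypothetical stand-ins for the TablePack and ReplicatorOnline logic, not their actual code; the only behavior taken from this issue is that add() throws on a duplicate ROWID:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical ROWID-keyed table; like the add() path in TablePack.h, it
// throws when the same ROWID is inserted twice.
class RowidTable {
public:
    void add(const std::string& rowid, const std::string& owner) {
        if (!rows_.emplace(rowid, owner).second)
            throw std::runtime_error("duplicate SYS.CCOL$ (ROWID: " + rowid + ") for insert");
    }
    std::size_t size() const { return rows_.size(); }

private:
    std::map<std::string, std::string> rows_;
};

// Hypothetical loader called once per matching schema element. With the
// guard on, a user already loaded by an earlier element is skipped, which is
// what the SysUser duplicate check is meant to guarantee.
class DictionaryLoader {
public:
    explicit DictionaryLoader(bool guard) : guard_(guard) {}

    void loadUser(const std::string& user, const std::vector<std::string>& rowids) {
        if (guard_ && !loadedUsers_.insert(user).second)
            return;  // already loaded via another schema element
        for (const auto& rowid : rowids)
            ccol_.add(rowid, user);
    }

    const RowidTable& ccol() const { return ccol_; }

private:
    bool guard_;
    std::set<std::string> loadedUsers_;
    RowidTable ccol_;
};
```

With the guard enabled, loading OLR_TEST from both the wildcard and the explicit element leaves one row; without it, the second call throws, mirroring ERROR 50022.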

Key distinction

  • add() (used by the createSchema() DB dictionary load and by checkpoint deserialization): always throws on a duplicate
  • forInsert() (used during redo replay in SystemTransaction.cpp): tolerates duplicates when ADAPTIVE_SCHEMA is set

So ADAPTIVE_SCHEMA only protects redo replay, not the initial dictionary load.
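The difference between the two paths can be sketched as follows. DictTable and both method signatures are hypothetical; in particular, this sketch assumes that a tolerated duplicate in forInsert() keeps the existing row, which the issue does not specify:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical dictionary table with the two insertion paths described above.
class DictTable {
public:
    // Initial DB dictionary load and checkpoint deserialization: always throws.
    void add(const std::string& rowid, int row) {
        if (!rows_.emplace(rowid, row).second)
            throw std::runtime_error("duplicate ROWID " + rowid + " for insert");
    }

    // Redo replay: a duplicate is tolerated only when ADAPTIVE_SCHEMA is set.
    // Keeping the existing row on a duplicate is an assumption of this sketch.
    void forInsert(const std::string& rowid, int row, bool adaptiveSchema) {
        if (rows_.count(rowid) != 0) {
            if (!adaptiveSchema)
                throw std::runtime_error("duplicate ROWID " + rowid + " for insert");
            return;
        }
        rows_.emplace(rowid, row);
    }

    std::size_t size() const { return rows_.size(); }

private:
    std::map<std::string, int> rows_;
};
```

The same duplicate ROWID that redo replay absorbs under ADAPTIVE_SCHEMA is still fatal when it arrives through add() during the initial load.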

Reproduction

Fails on the very first fresh start, so it is not related to checkpoint/restart at all:

# RAC (Oracle 23ai)
ERROR 50022 duplicate SYS.CCOL$ (ROWID: AAAAAdAAAAAAAEpAAg, CON#: 147, INTCOL#: 1, OBJ#: 32, SPARE1: [0,0]) for insert

# Single-instance (Oracle XE 21c) — same bug
ERROR 50022 duplicate SYS.CCOL$ (ROWID: AAAAAdAABAAAAEpAAO, CON#: 144, INTCOL#: 1, OBJ#: 31, SPARE1: [0,0]) for insert

Minimal config to reproduce on any Oracle instance:

{
  "flags": 4,
  "filter": {
    "table": [{"owner": "ANY_USER", "table": ".*"}]
  }
}

Without flags: 4, no error occurs.
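Assuming flags is a bitmask (so flags: 4 sets the ADAPTIVE_SCHEMA bit, and values such as 5 or 6 would set it too), a hypothetical pre-flight check for the triggering combination could look like this. The constant and function names are illustrative only and do not exist in OLR:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bit value of ADAPTIVE_SCHEMA in the "flags" field, per this issue.
// The constant name is illustrative, not an OLR symbol.
constexpr std::uint64_t kAdaptiveSchemaFlag = 4;

// Hypothetical pre-flight check: true when ADAPTIVE_SCHEMA is combined with
// at least one explicit table filter, the combination that reproduces
// ERROR 50022 on a fresh start.
bool triggersDuplicateDictionaryLoad(std::uint64_t flags, std::size_t tableFilterCount) {
    const bool adaptiveSchema = (flags & kAdaptiveSchemaFlag) != 0;
    return adaptiveSchema && tableFilterCount > 0;
}
```

The minimal config above corresponds to triggersDuplicateDictionaryLoad(4, 1), which returns true; with flags: 4 removed, it returns false.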

Initial misidentification

This was originally reported as a RAC-specific checkpoint-resume issue because the RAC Debezium config was the only one with flags: 4. The flag had been added as a debugging attempt during checkpoint/restart testing, but it turned out to be the cause of the error, not the fix.

Bug 2: heap-use-after-free on shutdown — FIXED

Fixed in commit 6724219d.

  • WriterStream constructor sets ctx->parserThread = this
  • ~OpenLogReplicator() deletes writers before builders/parsers
  • Builder::~Builder() and Parser::~Parser() then call freeMemoryChunk(ctx->parserThread, ...), which accesses the already-freed WriterStream
  • Fix: null out ctx->parserThread in WriterStream::~WriterStream() and guard freeMemoryChunk() against a null thread pointer
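The shutdown ordering and the two-part fix can be modeled in a small sketch. Thread, Ctx, WriterStream, and Builder here are simplified, hypothetical stand-ins for the OLR types named above, and the per-thread counter is an assumption standing in for whatever per-thread state the real freeMemoryChunk() touches:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical base for worker threads; the counter stands in for per-thread state.
struct Thread {
    virtual ~Thread() = default;
    std::size_t chunksReleased = 0;
};

struct Ctx {
    Thread* parserThread = nullptr;
    std::size_t totalReleased = 0;

    // Fix, part 2: the chunk is always released, but the thread object is
    // only touched when the pointer is non-null.
    void freeMemoryChunk(Thread* thread, char* chunk) {
        delete[] chunk;
        ++totalReleased;
        if (thread != nullptr)
            ++thread->chunksReleased;
    }
};

struct WriterStream : Thread {
    explicit WriterStream(Ctx* ctx) : ctx_(ctx) { ctx_->parserThread = this; }

    // Fix, part 1: clear ctx->parserThread so later destructors cannot follow it.
    ~WriterStream() override {
        if (ctx_->parserThread == this)
            ctx_->parserThread = nullptr;
    }

private:
    Ctx* ctx_;
};

struct Builder {
    explicit Builder(Ctx* ctx) : ctx_(ctx), chunk_(new char[64]) {}
    Builder(const Builder&) = delete;
    Builder& operator=(const Builder&) = delete;

    // Runs after the writer is gone, as in ~OpenLogReplicator().
    ~Builder() { ctx_->freeMemoryChunk(ctx_->parserThread, chunk_); }

private:
    Ctx* ctx_;
    char* chunk_;
};

struct ShutdownResult {
    bool parserThreadClearedBeforeBuilder;
    std::size_t chunksReleased;
};

// Destroys the writer before the builder, matching the order described above.
ShutdownResult simulateShutdown() {
    Ctx ctx;
    auto builder = std::make_unique<Builder>(&ctx);
    auto writer = std::make_unique<WriterStream>(&ctx);
    writer.reset();  // writers are deleted first
    const bool cleared = (ctx.parserThread == nullptr);
    builder.reset();  // freeMemoryChunk() now receives nullptr, not a dangling pointer
    return {cleared, ctx.totalReleased};
}
```

Without the nulling in ~WriterStream(), the builder's destructor would pass a dangling pointer to freeMemoryChunk(), which is the heap-use-after-free described above.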

This bug exists in upstream OLR, but it only triggers when a RuntimeException causes a shutdown while the network writer is in use.

Resolution

  • Bug 1: Remove flags: 4 from the config. The ADAPTIVE_SCHEMA flag carries this upstream bug and is not needed for our use case.
  • Bug 2: Fixed in commit 6724219d.
