[10.2.x] Self-Describing Binary Log Format (v3) (#13231) #13339

Open: masaori335 wants to merge 3 commits into apache:10.2.x from masaori335:asf-10.2.x-blog-v3

@masaori335

Contributor

Backport of #13231, with the default binary log format set to v2 for compatibility.

@masaori335 masaori335 requested a review from cmcfarlen June 26, 2026 03:34
@masaori335 masaori335 self-assigned this Jun 26, 2026
@masaori335 masaori335 added the Backport Marked for backport for an LTS patch release label Jun 26, 2026
@masaori335 masaori335 changed the title from "[10.2.x] Self-Describing Binary Log Format (v3) (#13231" to "[10.2.x] Self-Describing Binary Log Format (v3) (#13231)" Jun 26, 2026
@masaori335 masaori335 added this to the 10.2.0 milestone Jun 26, 2026
* Self-describing binary log format (LogBuffer v3)

Publish each field's type in a per-segment schema so a generic reader can
decode a .blog from the file alone, without an embedded ATS symbol-to-type
table that must track the writer in lockstep. The per-field code is
LogField::Type serialized directly (now an enum class : uint8_t with INVALID=0
reserved and sINT..IP = 1..4 as the frozen wire codes); a static_assert pins
the values. This relies on each field's declared type matching its marshalled
framing, which the parent commit ("Fix mismatched sINT/dINT log field types")
establishes.
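
As a rough illustration of the frozen wire codes described above, the following sketch uses a stand-in enum (`FieldType`, not the actual `LogField::Type` declaration). The commit message fixes `INVALID=0` and `sINT..IP = 1..4`; the positions of `dINT` and `STRING` in between, and the helper `type_from_wire`, are assumptions made for this example.

```cpp
#include <cstdint>

// Illustrative stand-in for LogField::Type, not the ATS header itself.
// INVALID=0 and sINT..IP = 1..4 come from the commit message; the order of
// dINT and STRING between them is an assumption.
enum class FieldType : uint8_t {
  INVALID = 0,
  sINT    = 1,
  dINT    = 2,
  STRING  = 3,
  IP      = 4,
};

// Pin the wire codes so that reordering the enum fails to compile.
static_assert(static_cast<uint8_t>(FieldType::INVALID) == 0, "INVALID must stay 0");
static_assert(static_cast<uint8_t>(FieldType::sINT) == 1, "sINT wire code is frozen");
static_assert(static_cast<uint8_t>(FieldType::IP) == 4, "IP wire code is frozen");

// Hypothetical helper: map a type byte read from a per-segment schema to a
// FieldType, treating anything outside 1..4 as INVALID.
constexpr FieldType
type_from_wire(uint8_t code)
{
  return (code >= 1 && code <= 4) ? static_cast<FieldType>(code) : FieldType::INVALID;
}
```

Reserving 0 as `INVALID` means a zero-filled or truncated schema byte is never mistaken for a real field type.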

Readers (LogBufferIterator, logcat, logstats, the ASCII output paths) accept
both v2 and v3 segments, sizing the header read to the on-disk version, so a v3
build keeps decoding logs written by an older one. Integer values stay in host
byte order, as in v2 (no endianness change). The public TSLogType enum is given
the same values as LogField::Type so TSLogFieldRegister can static_cast between
them; static_asserts in InkAPI.cc (the only TU that sees both) pin the
alignment so a future reorder fails to compile.
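
The enum alignment described above could look roughly like the sketch below. Both enums are hypothetical stand-ins (`InternalType` for `LogField::Type`, `PublicLogType` for `TSLogType`), and the enumerator names on the public side are invented for illustration; only the idea of matching values guarded by `static_assert` comes from the commit message.

```cpp
#include <cstdint>

// Hypothetical stand-in for the internal LogField::Type.
enum class InternalType : uint8_t { INVALID = 0, sINT = 1, dINT = 2, STRING = 3, IP = 4 };

// Hypothetical stand-in for the public TSLogType; enumerator names are illustrative.
enum PublicLogType {
  PUBLIC_LOG_TYPE_SINT   = 1,
  PUBLIC_LOG_TYPE_DINT   = 2,
  PUBLIC_LOG_TYPE_STRING = 3,
  PUBLIC_LOG_TYPE_IP     = 4,
};

// These asserts live in the one translation unit that sees both enums, so a
// reorder on either side is a compile error rather than a silent mismatch.
static_assert(static_cast<int>(PUBLIC_LOG_TYPE_SINT) == static_cast<int>(InternalType::sINT), "sINT mismatch");
static_assert(static_cast<int>(PUBLIC_LOG_TYPE_DINT) == static_cast<int>(InternalType::dINT), "dINT mismatch");
static_assert(static_cast<int>(PUBLIC_LOG_TYPE_STRING) == static_cast<int>(InternalType::STRING), "STRING mismatch");
static_assert(static_cast<int>(PUBLIC_LOG_TYPE_IP) == static_cast<int>(InternalType::IP), "IP mismatch");

// With the values pinned, the conversion is a plain cast.
constexpr InternalType
to_internal(PublicLogType t)
{
  return static_cast<InternalType>(static_cast<uint8_t>(t));
}
```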

The writer version is per-LogObject: logging.yaml "binary_log_version: 2"
pins a binary log to the pre-v3 layout (no schema, shorter header) so a
not-yet-upgraded downstream parser keeps working during a migration; the
default is v3.
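
To make the "shorter header" point concrete, here is a minimal sketch of how a reader might size its header read from the on-disk version. The struct layouts are entirely hypothetical and do not reflect the real LogBuffer header; only the field names `byte_count`, `data_offset`, `field_count` and the notion of a schema offset appear in this PR.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical v2 header: no schema information.
struct HeaderV2 {
  uint32_t cookie;      // assumed magic value
  uint32_t version;     // segment version (2 or 3)
  uint32_t byte_count;  // total bytes in the segment
  uint32_t data_offset; // where the entries start
};

// Hypothetical v3 header: the v2 fields plus the per-segment schema location.
struct HeaderV3 {
  HeaderV2 base;
  uint32_t schema_offset; // where the per-field type codes start
  uint32_t field_count;   // number of fields described by the schema
};

// Return how many header bytes to read for a given on-disk version,
// or 0 for a version this reader does not understand.
constexpr std::size_t
header_size_for(uint32_t version)
{
  return version == 2 ? sizeof(HeaderV2) : version == 3 ? sizeof(HeaderV3) : 0;
}
```

A reader built this way would read the version field first, then fetch only as many header bytes as that version defines.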

Decoding untrusted .blog input is bounded: LogBufferIterator validates
data_offset and each entry against the segment, and the JSON decoder validates
the schema offset alignment and cross-checks field_count against the symbol
list.
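
The bounds checks described above might be sketched as follows. The function names, the parameter lists, and the idea of passing the required alignment as an argument are assumptions for illustration; the actual checks in LogBufferIterator and the JSON decoder may differ.

```cpp
#include <cstdint>

// data_offset must point past the header and stay inside the segment.
constexpr bool
data_offset_ok(uint32_t data_offset, uint32_t header_size, uint32_t byte_count)
{
  return data_offset >= header_size && data_offset <= byte_count;
}

// An entry must lie fully inside the segment. The length is compared with the
// remaining bytes so that offset + length can never wrap around.
constexpr bool
entry_ok(uint32_t entry_offset, uint32_t entry_len, uint32_t byte_count)
{
  return entry_offset <= byte_count && entry_len <= byte_count - entry_offset;
}

// The schema offset must be aligned, and the schema's field_count must agree
// with the number of symbols parsed from the field list.
constexpr bool
schema_ok(uint32_t schema_offset, uint32_t alignment, uint32_t field_count, uint32_t symbol_count)
{
  return alignment != 0 && schema_offset % alignment == 0 && field_count == symbol_count;
}
```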

* Address Copilot's comment

* Address Copilot's comment

* Cleanup

* Fix logcat for AuTest

* Range-check untrusted .blog offsets before pointer arithmetic

fmt_fieldlist() and fmt_fieldtypes() form a pointer from a header
offset read off disk; an out-of-range value makes the pointer
arithmetic undefined behavior even if never dereferenced. Guard
both against byte_count.
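
A minimal sketch of this kind of guard, assuming a helper named `checked_at` (hypothetical, not the code in fmt_fieldlist() or fmt_fieldtypes()): the offset is compared against byte_count first, and the pointer is only formed once the offset is known to be in range.

```cpp
#include <cstdint>

// Return a pointer to `offset` bytes into the segment, or nullptr when the
// on-disk offset is out of range. The comparison happens before any pointer
// arithmetic, so no out-of-bounds pointer is ever formed.
constexpr const char *
checked_at(const char *segment, uint32_t byte_count, uint32_t offset)
{
  return offset < byte_count ? segment + offset : nullptr;
}
```

Callers would treat a nullptr result as a corrupt or truncated segment and stop decoding it.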

(cherry picked from commit f31bfc5)

ATS 10.2.x downstream log parsers predate the v3 self-describing
format, so binary logs must default to the v2 layout. Split the single
LOG_SEGMENT_VERSION macro (which served as both "latest" and "default
writer") into LOG_SEGMENT_VERSION (3, still the max written/read and
the opt-in target) and LOG_SEGMENT_VERSION_DEFAULT (2, the writer
default). v3 stays fully readable and selectable per-object via
logging.yaml binary_log_version: 3.
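
The split between "latest" and "default writer" version could be sketched like this. The PR describes the two names as macros; they are written as `constexpr` constants here for brevity. The helper `writer_version`, the minimum version constant, and the convention that 0 means "binary_log_version not set" are assumptions for this example.

```cpp
#include <cstdint>

constexpr uint32_t LOG_SEGMENT_VERSION         = 3; // max written/read, opt-in target
constexpr uint32_t LOG_SEGMENT_VERSION_DEFAULT = 2; // writer default on 10.2.x
constexpr uint32_t LOG_SEGMENT_VERSION_MIN     = 2; // hypothetical lower bound

static_assert(LOG_SEGMENT_VERSION_DEFAULT >= LOG_SEGMENT_VERSION_MIN &&
                LOG_SEGMENT_VERSION_DEFAULT <= LOG_SEGMENT_VERSION,
              "default must be a version the build can write");

// Resolve the version a LogObject writes. `configured` is the value of
// binary_log_version from logging.yaml, with 0 standing for "not set"
// (an assumed convention). Returns 0 for an unsupported value.
constexpr uint32_t
writer_version(uint32_t configured)
{
  return configured == 0 ? LOG_SEGMENT_VERSION_DEFAULT :
         (configured >= LOG_SEGMENT_VERSION_MIN && configured <= LOG_SEGMENT_VERSION) ? configured :
                                                                                        0;
}
```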
@masaori335 masaori335 force-pushed the asf-10.2.x-blog-v3 branch from 5edfc34 to 201934b Compare June 26, 2026 05:26

Labels

Backport Marked for backport for an LTS patch release Logging
