[10.2.x] Self-Describing Binary Log Format (v3) (#13231) #13339
Open
masaori335 wants to merge 3 commits into
* Self-describing binary log format (LogBuffer v3)
Publish each field's type in a per-segment schema so a generic reader can
decode a .blog from the file alone, without an embedded ATS symbol-to-type
table that must track the writer in lockstep. The per-field code is
LogField::Type serialized directly (now an enum class : uint8_t with INVALID=0
reserved and sINT..IP = 1..4 as the frozen wire codes); a static_assert pins
the values. This relies on each field's declared type matching its marshalled
framing, which the parent commit ("Fix mismatched sINT/dINT log field types")
establishes.
Readers (LogBufferIterator, logcat, logstats, the ASCII output paths) accept
both v2 and v3 segments, sizing the header read to the on-disk version, so a v3
build keeps decoding logs written by an older one. Integer values stay in host
byte order, as in v2 (no endianness change). The public TSLogType enum is given
the same values as LogField::Type so TSLogFieldRegister can static_cast between
them; static_asserts in InkAPI.cc (the only TU that sees both) pin the
alignment so a future reorder fails to compile.
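
A minimal sketch of the pinning described above, not the actual ATS source. `FieldType` and `PublicFieldType` are hypothetical stand-ins for `LogField::Type` and `TSLogType`; the public enumerator names and the order of the two middle codes (`dINT`, `STRING`) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for LogField::Type. INVALID=0 and sINT..IP = 1..4
// come from the commit message; the middle two names are assumed.
enum class FieldType : uint8_t { INVALID = 0, sINT = 1, dINT = 2, STRING = 3, IP = 4 };

// Pin the wire codes: renumbering the enum now fails to compile.
static_assert(static_cast<uint8_t>(FieldType::INVALID) == 0, "INVALID must stay 0");
static_assert(static_cast<uint8_t>(FieldType::sINT) == 1, "sINT wire code is frozen");
static_assert(static_cast<uint8_t>(FieldType::dINT) == 2, "dINT wire code is frozen");
static_assert(static_cast<uint8_t>(FieldType::STRING) == 3, "STRING wire code is frozen");
static_assert(static_cast<uint8_t>(FieldType::IP) == 4, "IP wire code is frozen");

// Hypothetical stand-in for the public enum; enumerator names are invented here.
enum PublicFieldType { PUBLIC_TYPE_SINT = 1, PUBLIC_TYPE_DINT = 2, PUBLIC_TYPE_STRING = 3, PUBLIC_TYPE_IP = 4 };

// Pin the alignment between the two enums in the one place that sees both.
static_assert(static_cast<int>(PUBLIC_TYPE_SINT) == static_cast<int>(FieldType::sINT), "enum drift");
static_assert(static_cast<int>(PUBLIC_TYPE_DINT) == static_cast<int>(FieldType::dINT), "enum drift");
static_assert(static_cast<int>(PUBLIC_TYPE_STRING) == static_cast<int>(FieldType::STRING), "enum drift");
static_assert(static_cast<int>(PUBLIC_TYPE_IP) == static_cast<int>(FieldType::IP), "enum drift");

// With the values aligned, the conversion is a plain static_cast.
constexpr FieldType to_internal(PublicFieldType t)
{
  return static_cast<FieldType>(t);
}
```

The static_asserts cost nothing at run time; their only job is to turn a silent wire-format change into a build failure.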
The writer version is per-LogObject: logging.yaml "binary_log_version: 2"
pins a binary log to the pre-v3 layout (no schema, shorter header) so a
not-yet-upgraded downstream parser keeps working during a migration; the
default is v3.
Decoding untrusted .blog input is bounded: LogBufferIterator validates
data_offset and each entry against the segment, and the JSON decoder validates
the schema offset alignment and cross-checks field_count against the symbol
list.
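
The bounded walk over entries might look roughly like the sketch below. It does not reproduce LogBufferIterator; the entry framing (a `uint32_t` length prefix in host byte order that covers the whole entry) and the names `count_entries` and `append_entry` are assumptions chosen to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical framing: each entry starts with a uint32_t total length
// (host byte order) that includes the prefix itself.
std::optional<std::size_t> count_entries(const std::vector<uint8_t> &seg, uint32_t data_offset)
{
  const std::size_t size = seg.size();
  if (data_offset > size) {
    return std::nullopt; // data_offset points outside the segment
  }
  std::size_t pos = data_offset;
  std::size_t n   = 0;
  while (pos < size) {
    if (size - pos < sizeof(uint32_t)) {
      return std::nullopt; // not enough bytes left for a length prefix
    }
    uint32_t len;
    std::memcpy(&len, seg.data() + pos, sizeof(len));
    if (len < sizeof(uint32_t) || len > size - pos) {
      return std::nullopt; // entry is malformed or runs past the segment
    }
    pos += len;
    ++n;
  }
  return n;
}

// Test helper: append one entry with payload_len zero bytes of payload.
void append_entry(std::vector<uint8_t> &seg, uint32_t payload_len)
{
  uint32_t total = sizeof(uint32_t) + payload_len;
  uint8_t prefix[sizeof(uint32_t)];
  std::memcpy(prefix, &total, sizeof(total));
  seg.insert(seg.end(), prefix, prefix + sizeof(prefix));
  seg.insert(seg.end(), payload_len, uint8_t{0});
}
```

Every comparison is done on sizes before any read, and the remaining length is computed as `size - pos` rather than `pos + len` so the check itself cannot overflow.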
* Address Copilot's comment
* Address Copilot's comment
* Cleanup
* Fix logcat for AuTest
* Range-check untrusted .blog offsets before pointer arithmetic
fmt_fieldlist() and fmt_fieldtypes() form a pointer from a header
offset read off disk; an out-of-range value makes the pointer
arithmetic undefined behavior even if never dereferenced. Guard
both against byte_count.
(cherry picked from commit f31bfc5)
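
The range check in the last commit above can be illustrated with a small helper. This is a sketch under assumptions, not the code in fmt_fieldlist() or fmt_fieldtypes(): `checked_offset` is a hypothetical name, and rejecting `offset == byte_count` is a choice made here so the returned pointer always has at least one readable byte.

```cpp
#include <cassert>
#include <cstdint>

// Return base + offset only if the offset lies inside the buffer.
// The comparison happens on integers, before any pointer is formed,
// so an out-of-range on-disk value never reaches pointer arithmetic.
const char *checked_offset(const char *base, std::uint32_t byte_count, std::uint32_t offset)
{
  if (offset >= byte_count) {
    return nullptr; // untrusted offset is outside the segment
  }
  return base + offset;
}
```

Callers would treat a null result as a corrupt segment and stop decoding it.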
ATS 10.2.x downstream log parsers predate the v3 self-describing format, so binary logs must default to the v2 layout. Split the single LOG_SEGMENT_VERSION macro (which served as both "latest" and "default writer") into LOG_SEGMENT_VERSION (3, still the max written/read and the opt-in target) and LOG_SEGMENT_VERSION_DEFAULT (2, the writer default). v3 stays fully readable and selectable per-object via logging.yaml binary_log_version: 3.
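
One way to picture the split is the sketch below. It uses constexpr values where the change itself uses macros, and `MIN_WRITABLE_VERSION`, `resolve_writer_version`, and the rejection of out-of-range values are assumptions for illustration rather than the actual ATS behavior.

```cpp
#include <cassert>
#include <optional>

constexpr int LOG_SEGMENT_VERSION         = 3; // newest version written and read
constexpr int LOG_SEGMENT_VERSION_DEFAULT = 2; // writer default on this branch
constexpr int MIN_WRITABLE_VERSION        = 2; // hypothetical lower bound

// Resolve the writer version for one log object. `configured` models an
// optional binary_log_version setting; an empty value means "not set".
std::optional<int> resolve_writer_version(std::optional<int> configured)
{
  if (!configured) {
    return LOG_SEGMENT_VERSION_DEFAULT;
  }
  if (*configured < MIN_WRITABLE_VERSION || *configured > LOG_SEGMENT_VERSION) {
    return std::nullopt; // assumed policy: reject unknown versions
  }
  return configured;
}
```

Keeping "latest" and "default" as separate constants means a later branch can flip the default to 3 without touching the reader's upper bound.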
5edfc34 to 201934b
Backport of #13231, with the default binary log format set to v2 for compatibility.