Add a minimal msgpack serialiser for ccf#7866
Draft
cjen1-msft wants to merge 2 commits into microsoft:main
Contributor
Pull request overview
This PR introduces a minimal, header-only MessagePack encoder under src/msgpack/ (including fluentd in_forward EventTime ext type support) to enable more efficient log export as part of the broader observability/log-export work (#7858).
Changes:
- Added a C++20 header-only msgpack encoder (`write_*` primitives, container headers, and `FluentdEventTime`).
- Added unit tests (boundary tables + property tests) and differential tests using `nlohmann::json::from_msgpack` as an oracle.
- Added a libFuzzer harness and integrated the new unit/fuzz tests into the build.
Custom instructions used: `.github/copilot-instructions.md`, `.github/instructions/reviewing.instructions.md`
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| `src/msgpack/encode.h` | New header-only msgpack encoder API, error model, and Fluentd EventTime support |
| `src/msgpack/endian.h` | Big-endian writing helper for msgpack wire format |
| `src/msgpack/test/encode_test.cpp` | Property/boundary tests for smallest-format-wins encoding and EventTime layout |
| `src/msgpack/test/differential_test.cpp` | Differential encode-vs-nlohmann decode tests + fluentd message-mode pinned vector |
| `src/msgpack/test/gen.h` | Shared generator + script driver used by tests and fuzz harness |
| `src/msgpack/test/fuzz_script_test.cpp` | Deterministic/canned script tests pinning specific composite byte layouts |
| `src/msgpack/test/format_introspect.h` | Test helper to classify msgpack first-byte format families |
| `src/msgpack/test/msgpack_fuzz.cpp` | libFuzzer harness for encoder round-trip validation via nlohmann oracle |
| `CMakeLists.txt` | Registers msgpack_test unit test and msgpack_fuzz_test fuzz target |
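To make "smallest-format-wins" concrete, here is a minimal sketch for unsigned integers, based on the format bytes in the MessagePack specification. The names `sketch_put_be` and `sketch_write_uint` are hypothetical and are not taken from the PR; the PR's own `write_*` primitives may be structured differently.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the PR): append the low `bytes` bytes of `v`
// in big-endian (network) order, as the msgpack wire format requires.
inline void sketch_put_be(std::vector<uint8_t>& buf, uint64_t v, int bytes)
{
  for (int shift = (bytes - 1) * 8; shift >= 0; shift -= 8)
  {
    buf.push_back(static_cast<uint8_t>(v >> shift));
  }
}

// Hypothetical encoder (not from the PR): pick the shortest msgpack
// representation that can hold `v`.
inline void sketch_write_uint(std::vector<uint8_t>& buf, uint64_t v)
{
  if (v <= 0x7f)
  {
    buf.push_back(static_cast<uint8_t>(v)); // positive fixint, 1 byte total
  }
  else if (v <= 0xff)
  {
    buf.push_back(0xcc); // uint 8
    sketch_put_be(buf, v, 1);
  }
  else if (v <= 0xffff)
  {
    buf.push_back(0xcd); // uint 16
    sketch_put_be(buf, v, 2);
  }
  else if (v <= 0xffffffffULL)
  {
    buf.push_back(0xce); // uint 32
    sketch_put_be(buf, v, 4);
  }
  else
  {
    buf.push_back(0xcf); // uint 64
    sketch_put_be(buf, v, 8);
  }
}
```

The boundary values (127/128, 255/256, 65535/65536, and so on) are where an encoder like this is most likely to pick the wrong format, which is presumably why the PR's tests use boundary tables.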
Comment on lines +461 to +479

    case 8: // Array
    {
      // Once the stack is at the depth cap, force a nil to bound
      // adversarial inputs.
      const uint32_t n = r.u8() % 5;
      write_array_header(buf, n);
      stack.push_back(std::make_shared<OpenFrame>(
        OpenFrame{json::array(), n, std::nullopt}));
      // The new frame will be popped (when its `remaining` hits 0)
      // and then spliced into `frame` by the pop branch above.
      break;
    }
    case 9: // Map
    {
      const uint32_t n = r.u8() % 5;
      write_map_header(buf, n);
      stack.push_back(std::make_shared<OpenFrame>(
        OpenFrame{json::object(), n, std::nullopt}));
      break;

    // [256, 65535] -> bin 16 (3-byte header)
    // [65536, 2^32-1] -> bin 32 (5-byte header)
    // Throws MsgpackEncodeError(BIN_TOO_LARGE) for sizes >= 2^32.
    inline void write_bin(
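The size ranges in the comment above can be sketched as follows. This is a hedged illustration, not the PR's `write_bin`: the names `sketch_write_bin_header` and `sketch_write_bin` are hypothetical, `std::length_error` stands in for the PR's `MsgpackEncodeError(BIN_TOO_LARGE)`, and the bin 8 case (`0xc4`, 2-byte header for sizes up to 255) is taken from the MessagePack specification rather than from the excerpt.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the header part of write_bin: choose bin 8,
// bin 16 or bin 32 from the payload length and emit the length big-endian.
inline void sketch_write_bin_header(std::vector<uint8_t>& buf, uint64_t n)
{
  if (n <= 0xff)
  {
    buf.push_back(0xc4); // bin 8: 2-byte header
    buf.push_back(static_cast<uint8_t>(n));
  }
  else if (n <= 0xffff)
  {
    buf.push_back(0xc5); // bin 16: 3-byte header
    buf.push_back(static_cast<uint8_t>(n >> 8));
    buf.push_back(static_cast<uint8_t>(n));
  }
  else if (n <= 0xffffffffULL)
  {
    buf.push_back(0xc6); // bin 32: 5-byte header
    for (int shift = 24; shift >= 0; shift -= 8)
    {
      buf.push_back(static_cast<uint8_t>(n >> shift));
    }
  }
  else
  {
    // Assumption: the PR throws its own error type here instead.
    throw std::length_error("bin payload does not fit in bin 32");
  }
}

// Hypothetical full writer: header followed by the payload copied verbatim.
inline void sketch_write_bin(
  std::vector<uint8_t>& buf, const std::vector<uint8_t>& data)
{
  sketch_write_bin_header(buf, data.size());
  buf.insert(buf.end(), data.begin(), data.end());
}
```

Splitting the header from the payload copy lets the size limits be tested without allocating a multi-gigabyte buffer.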
Member
I'm fairly sure that's supported by clang 18, even if it's not technically in C++20, and I believe we depend on it in quite a few places already, so I would not make this change.
Comment on lines +432 to +436

    const size_t n = r.u8();
    std::vector<uint8_t> bytes;
    r.take(bytes, n);
    write_bin(buf, bytes);
    splice_into(frame, json::binary(bytes));
Comment on lines +465 to +468

    }
    std::vector<uint8_t> buf;
    write_bin(buf, data);
    CHECK(buf.size() == r.header_size + r.n);
Comment on lines +584 to +587

    };
    std::uniform_int_distribution<int> coin(0, 99);
    std::uniform_int_distribution<size_t> bp(0, std::size(s_boundaries) - 1);
    std::uniform_int_distribution<int64_t> any_s(

    STRING_TOO_LARGE = 1, // > 2^32-1 bytes
    BIN_TOO_LARGE = 2, // > 2^32-1 bytes
    MAP_TOO_LARGE = 3, // > 65535 elements (we cap at map16)
    INVALID_EVENT_TIME = 4, // nanoseconds >= 1_000_000_000
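As an illustration of the `INVALID_EVENT_TIME` case, the sketch below encodes a Fluentd EventTime using the fixext 8 layout from the Fluentd forward protocol: `0xd7`, ext type `0x00`, then seconds and nanoseconds as 32-bit big-endian integers. The names `SketchEventTime`, `sketch_put_u32_be` and `sketch_write_event_time` are hypothetical, and `std::invalid_argument` stands in for the PR's own error type.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical value type (the PR's FluentdEventTime may differ).
struct SketchEventTime
{
  uint32_t seconds; // seconds since the Unix epoch
  uint32_t nanoseconds; // must be < 1'000'000'000
};

// Hypothetical helper: append a 32-bit value in big-endian order.
inline void sketch_put_u32_be(std::vector<uint8_t>& buf, uint32_t v)
{
  for (int shift = 24; shift >= 0; shift -= 8)
  {
    buf.push_back(static_cast<uint8_t>(v >> shift));
  }
}

// Hypothetical encoder: fixext 8 with ext type 0, 10 bytes in total.
inline void sketch_write_event_time(
  std::vector<uint8_t>& buf, SketchEventTime t)
{
  if (t.nanoseconds >= 1'000'000'000u)
  {
    // Assumption: the PR reports INVALID_EVENT_TIME via its own exception.
    throw std::invalid_argument("nanoseconds must be below one second");
  }
  buf.push_back(0xd7); // fixext 8
  buf.push_back(0x00); // ext type 0 = EventTime
  sketch_put_u32_be(buf, t.seconds);
  sketch_put_u32_be(buf, t.nanoseconds);
}
```

Validating before writing any byte means a rejected value leaves the output buffer untouched.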
Comment on lines +377 to +379

    // The payload is copied verbatim — msgpack str is byte-array,
    // not text. We do not validate UTF-8 (the spec doesn't require it
    // and the wire format is opaque to byte content).
achamayou reviewed May 6, 2026

    }
    else
    {
      // std::byteswap is C++23-only; this hand-rolled swap keeps the
Member
Are we able to switch to C++23 on Azure Linux 3 already?
For #7858 we want to export more logs more efficiently.
I proposed that we should support msgpack export, and this is a minimal msgpack serialiser.
Here is a benchmark of the serialiser against nlohmann (json dump) and rapidjson (as an 'optimal baseline')