Skip to content

Add a minimal msgpack serialiser for ccf#7866

Draft
cjen1-msft wants to merge 2 commits intomicrosoft:mainfrom
cjen1-msft:msgpack
Draft

Add a minimal msgpack serialiser for ccf#7866
cjen1-msft wants to merge 2 commits intomicrosoft:mainfrom
cjen1-msft:msgpack

Conversation

@cjen1-msft
Copy link
Copy Markdown
Contributor

For #7858 we want to export more logs more efficiently.
I proposed that we should support msgpack export, and this is a minimal msgpack serialiser.

Here is a benchmark of the serialiser against nlohmann (json dump) and rapidjson (as an 'optimal baseline')

 Name (* = baseline)      |   Dim   |  Total ms |  ns/op  |Baseline| Ops/second
--------------------------|--------:|----------:|--------:|-------:|----------:
 json_dump *              |     100 |     1.631 |   16306 |      - |    61323.7
 rapidjson_write          |     100 |     0.142 |    1423 |  0.087 |   702622.2
 msgpack_encode           |     100 |     0.189 |    1894 |  0.116 |   527779.7
 json_dump *              |    1000 |    10.267 |   10266 |      - |    97401.4
 rapidjson_write          |    1000 |     1.886 |    1886 |  0.184 |   530199.1
 msgpack_encode           |    1000 |     2.280 |    2280 |  0.222 |   438575.1
 json_dump *              |   10000 |    97.549 |    9754 |      - |   102512.6
 rapidjson_write          |   10000 |    12.481 |    1248 |  0.128 |   801223.8
 msgpack_encode           |   10000 |    16.415 |    1641 |  0.168 |   609205.9

@cjen1-msft cjen1-msft requested a review from a team as a code owner May 6, 2026 08:54
Copilot AI review requested due to automatic review settings May 6, 2026 08:54
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a minimal, header-only MessagePack encoder under src/msgpack/ (including fluentd in_forward EventTime ext type support) to enable more efficient log export as part of the broader observability/log-export work (#7858).

Changes:

  • Added a C++20 header-only msgpack encoder (write_* primitives, container headers, and FluentdEventTime).
  • Added unit tests (boundary tables + property tests) and differential tests using nlohmann::json::from_msgpack as an oracle.
  • Added a libFuzzer harness and integrated the new unit/fuzz tests into the build.

Custom instructions used:

  • .github/copilot-instructions.md
  • .github/instructions/reviewing.instructions.md

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
src/msgpack/encode.h New header-only msgpack encoder API, error model, and Fluentd EventTime support
src/msgpack/endian.h Big-endian writing helper for msgpack wire format
src/msgpack/test/encode_test.cpp Property/boundary tests for smallest-format-wins encoding and EventTime layout
src/msgpack/test/differential_test.cpp Differential encode-vs-nlohmann decode tests + fluentd message-mode pinned vector
src/msgpack/test/gen.h Shared generator + script driver used by tests and fuzz harness
src/msgpack/test/fuzz_script_test.cpp Deterministic/canned script tests pinning specific composite byte layouts
src/msgpack/test/format_introspect.h Test helper to classify msgpack first-byte format families
src/msgpack/test/msgpack_fuzz.cpp libFuzzer harness for encoder round-trip validation via nlohmann oracle
CMakeLists.txt Registers msgpack_test unit test and msgpack_fuzz_test fuzz target

Comment thread src/msgpack/test/gen.h
Comment on lines +461 to +479
case 8: // Array
{
// Once the stack is at the depth cap, force a nil to bound
// adversarial inputs.
const uint32_t n = r.u8() % 5;
write_array_header(buf, n);
stack.push_back(std::make_shared<OpenFrame>(
OpenFrame{json::array(), n, std::nullopt}));
// The new frame will be popped (when its `remaining` hits 0)
// and then spliced into `frame` by the pop branch above.
break;
}
case 9: // Map
{
const uint32_t n = r.u8() % 5;
write_map_header(buf, n);
stack.push_back(std::make_shared<OpenFrame>(
OpenFrame{json::object(), n, std::nullopt}));
break;
Comment thread src/msgpack/encode.h
// [256, 65535] -> bin 16 (3-byte header)
// [65536, 2^32-1] -> bin 32 (5-byte header)
// Throws MsgpackEncodeError(BIN_TOO_LARGE) for sizes >= 2^32.
inline void write_bin(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fairly sure that's supported by clang 18, even if it's not technically in C++20, and I believe we depend on it in quite a few places already, so I would not make this change.

Comment thread src/msgpack/test/gen.h
Comment on lines +432 to +436
const size_t n = r.u8();
std::vector<uint8_t> bytes;
r.take(bytes, n);
write_bin(buf, bytes);
splice_into(frame, json::binary(bytes));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above

Comment on lines +465 to +468
}
std::vector<uint8_t> buf;
write_bin(buf, data);
CHECK(buf.size() == r.header_size + r.n);
Comment on lines +584 to +587
};
std::uniform_int_distribution<int> coin(0, 99);
std::uniform_int_distribution<size_t> bp(0, std::size(s_boundaries) - 1);
std::uniform_int_distribution<int64_t> any_s(
Comment thread src/msgpack/encode.h
STRING_TOO_LARGE = 1, // > 2^32-1 bytes
BIN_TOO_LARGE = 2, // > 2^32-1 bytes
MAP_TOO_LARGE = 3, // > 65535 elements (we cap at map16)
INVALID_EVENT_TIME = 4, // nanoseconds >= 1_000_000_000
Comment thread src/msgpack/encode.h
Comment on lines +377 to +379
// The payload is copied verbatim — msgpack str is byte-array,
// not text. We do not validate UTF-8 (the spec doesn't require it
// and the wire format is opaque to byte content).
@cjen1-msft cjen1-msft marked this pull request as draft May 6, 2026 10:33
Comment thread src/msgpack/endian.h
}
else
{
// std::byteswap is C++23-only; this hand-rolled swap keeps the
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we able to switch to C++23 on Azure Linux 3 already?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants