
[fix](zone-map) Avoid per-row Field temporaries in TypedZoneMapIndexWriter#62544

Open
csun5285 wants to merge 2 commits into apache:master from csun5285:opt-zonemapindex

Conversation

@csun5285
Contributor

@csun5285 csun5285 commented Apr 16, 2026

Problem

When writing nullable columns, append_nullable splits each block at null boundaries into small non-null runs.

| Input block | add_values calls |
|---|---|
| [1, 3, NULL, NULL, 4, 5, 6, NULL] | add_values([1,3], count=2) + add_values([4,5,6], count=3) |

Each add_values call built two heap-allocated Field temporaries just to compare
min/max, paying for an allocation, a memcpy, variant dispatch, and a destructor on every call.

Fix

Track page min/max as raw ValType (the same native type std::minmax_element already uses). Materialize Field only once at flush() time.

Benchmark (Release, 65536 rows/iter)

block_rows = post-null-split run length.

| Type | block_rows | Baseline (M items/s) | Optimized (M items/s) | Speedup |
|---|---|---|---|---|
| Int32 | 1 | 21.9 | 240.3 | 11.0× |
| Int32 | 4 | 75.9 | 353.0 | 4.7× |
| Int32 | 16 | 170.0 | 400.1 | 2.4× |
| Int32 | 64 | 261.9 | 394.4 | 1.5× |
| Int32 | 256 | 314.0 | 419.7 | 1.3× |
| Int32 | 1024 | 320.6 | 425.9 | 1.3× |
| Int32 | 4096 | 334.2 | 359.3 | 1.1× |
| Double | 1 | 16.7 | 181.9 | 10.9× |
| Double | 4 | 47.0 | 168.0 | 3.6× |
| Double | 16 | 120.0 | 310.0 | 2.6× |
| Double | 64 | 223.1 | 425.3 | 1.9× |
| Double | 256 | 315.1 | 469.8 | 1.5× |
| Double | 1024 | 358.9 | 477.1 | 1.3× |
| Double | 4096 | 373.8 | 483.3 | 1.3× |
| String | 1 | 8.0 | 59.2 | 7.4× |
| String | 4 | 21.7 | 57.9 | 2.7× |
| String | 16 | 41.3 | 60.7 | 1.5× |
| String | 64 | 62.9 | 80.6 | 1.3× |
| String | 256 | 75.2 | 82.0 | 1.1× |
| String | 1024 | 79.4 | 84.0 | 1.1× |
| String | 4096 | 80.4 | 83.4 | 1.0× |

Small runs (block_rows 1–16) improve 1.5×–11×; large runs show no regression.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@csun5285
Contributor Author

/review

@csun5285
Contributor Author

run buildall

Contributor

@github-actions bot left a comment


Findings

  1. be/src/storage/index/zone_map/zone_map_index.cpp: the new string min/max path now copies and compares the full StringRef in the hot path. Because _materialize_page_minmax() immediately goes through Field::create_field_from_olap_value(), bytes after MAX_ZONE_MAP_INDEX_SIZE are still truncated before any page/segment zone map is persisted. That means long STRING/VARCHAR values now pay O(full_value_size) copy/compare cost and keep those full copies alive until page flush without improving the serialized zone map, so this can regress the long-string / variant workloads the patch is trying to speed up.
  2. be/benchmark/benchmark_zone_map_index.hpp: make_writer() leaks the StorageField returned by StorageFieldFactory::create(*col) on every benchmark iteration. Since each benchmark loop constructs a fresh writer, the benchmark process accumulates leaked fields, which can skew timings or OOM long runs.

Critical Checkpoint Conclusions

  • Goal of the task: reduce TypedZoneMapIndexWriter::add_values() overhead. The numeric path improves that, but the new string path still has a blocking hot-path regression for long values, so the current patch does not fully accomplish the goal yet.
  • Scope/minimality: the production change is focused, and the benchmark addition is focused, but the benchmark currently has a resource-management bug.
  • Concurrency: no new cross-thread behavior is introduced here; writer state stays local to a single writer instance.
  • Lifecycle/static init: page-lifetime state (_page_min, _page_max, _arena) is self-contained inside the writer. I did not find a new lifecycle or static-initialization bug.
  • Config/compatibility: no new config, protocol, or storage-format compatibility concerns found.
  • Parallel code paths: I did not find another equivalent zone-map write path that also needed the same change.
  • Special conditions/comments: the new string-specific fast path is documented, but the current implementation keeps more data than the persisted zone map can ever use.
  • Tests: existing be/test/storage/segment/zone_map_index_test.cpp covers basic zone-map serialization, truncation, and float flag behavior, but no new test guards the long-string deferred-materialization path added here.
  • Observability: not applicable for this refactor.
  • Transactions/persistence/data correctness: I did not find a data-correctness regression in the currently covered paths.
  • Performance: blocked by the long-string full-copy/full-compare behavior above.
  • Other issues: the benchmark leak makes the evidence in the PR body unreliable until it is fixed.

Outdated comment threads: be/src/storage/index/zone_map/zone_map_index.cpp, be/benchmark/benchmark_zone_map_index.hpp
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@csun5285 force-pushed the opt-zonemapindex branch 2 times, most recently from 0b1cd15 to 4378af6 on April 16, 2026 at 03:44
@csun5285
Contributor Author

/review

Contributor

@github-actions bot left a comment


No blocking findings in the modified production path.

Review checkpoints:

  • Goal/task: The change achieves its goal of moving page min/max tracking to raw ValType and materializing Field only at flush() time. Existing be/test/storage/segment/zone_map_index_test.cpp already exercises repeated add_values()/flush() behavior across int, string, float special values, decimal, and date/time types, although there is no new targeted test for this exact hot path.
  • Scope/focus: The production change is small and localized to TypedZoneMapIndexWriter, with the benchmark kept separate.
  • Concurrency: No new concurrent path is introduced. This writer state is still page-local and used through the existing single-writer column-write call chain, so there are no new lock ordering or thread-safety concerns here.
  • Lifecycle: The new _page_min, _page_max, _page_has_minmax, and _arena state is reset on every flush(). For string types, bytes are copied into _arena, then materialized into Field before _arena.clear(), so I do not see a dangling-reference issue escaping the writer.
  • Config/compatibility: No new config, FE/BE protocol, or on-disk format change is introduced. ZoneMapPB serialization remains on the existing path.
  • Parallel code paths: The optimization is applied in the shared zone-map writer path used by the nullable-run split workload described in the PR, so I did not find a sibling production path in this abstraction that also needed the same change.
  • Special checks: String truncation behavior and float NaN/+Inf/-Inf flag handling remain aligned with the pre-change logic.
  • Observability/persistence/transactions/data writes: This change only affects in-memory min/max accumulation before the existing serialization path; I do not see new observability, persistence, transactionality, or data-visibility risks.
  • Performance: The direction is sound. The hot path removes repeated Field construction while keeping materialization at page flush.

Residual risk: I reviewed the code and existing unit coverage, but I did not run a BE build or tests in this runner, so the remaining risk is limited to unexecuted build/benchmark compilation rather than a production-path correctness concern.

@csun5285
Contributor Author

run buildall

…riter

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@csun5285
Contributor Author

run buildall

@csun5285
Contributor Author

/review

Contributor

@github-actions bot left a comment


No blocking issues found in this PR.

Critical checkpoints:

  • Goal / correctness: The change achieves its stated goal of removing per-call Field temporary construction from TypedZoneMapIndexWriter while preserving the existing zone-map serialization path. The existing unit tests in be/test/storage/segment/zone_map_index_test.cpp already exercise repeated add_values() calls, string truncation, pass-all pages, and multiple primitive/date/time types against the same writer logic.
  • Change scope: The modification is small and focused. It only adjusts the zone-map writer hot path and adds a benchmark wired into the benchmark binary.
  • Concurrency: No new concurrency is introduced. TypedZoneMapIndexWriter remains object-local writer state, and the PR does not change locking, thread interaction, or lifecycle ordering.
  • Lifecycle / initialization: No special lifecycle or static-initialization risk was introduced. The new _page_min, _page_max, and _page_has_minmax state is reset during flush().
  • Configuration: No new config item was added.
  • Compatibility: No FE/BE protocol change or on-disk format change was introduced. ZoneMapPB serialization/deserialization remains unchanged.
  • Parallel paths: I did not find another equivalent zone-map writer path that also needed the same change.
  • Special conditions: The string truncation behavior and float/double special-value handling follow the existing semantics of this module.
  • Test coverage: Existing unit coverage is relevant for this change, but no new automated test was added specifically for the deferred materialization path, and I did not run BE tests in this runner.
  • Observability: No additional logs or metrics are needed for this localized hot-path optimization.
  • Transaction / persistence / data writes: This touches the storage write path, but it does not alter transaction semantics or persisted metadata shape, so I do not see a replay/compatibility concern from this diff.
  • Performance: The optimization removes hot-path Field materialization and defers it to flush(). The added benchmark is aligned with the intended compaction/write workload, and I did not identify an obvious performance regression in the changed logic.
  • Other issues: None blocking found.

Residual risk:

  • I relied on static review plus the existing zone-map unit coverage; I did not build or run tests locally in this runner.

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 100.00% (45/45) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
|---|---|
| Function Coverage | 53.12% (20193/38013) |
| Line Coverage | 36.68% (190143/518336) |
| Region Coverage | 32.92% (147554/448233) |
| Branch Coverage | 34.06% (64610/189702) |

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (45/45) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
|---|---|
| Function Coverage | 73.61% (27400/37222) |
| Line Coverage | 57.30% (296095/516718) |
| Region Coverage | 54.27% (245503/452343) |
| Branch Coverage | 56.07% (106693/190270) |
