[fix](zone-map) Avoid per-row Field temporaries in TypedZoneMapIndexWriter#62544
[fix](zone-map) Avoid per-row Field temporaries in TypedZoneMapIndexWriter#62544csun5285 wants to merge 2 commits intoapache:masterfrom
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
|
run buildall |
There was a problem hiding this comment.
Findings
be/src/storage/index/zone_map/zone_map_index.cpp: the new string min/max path now copies and compares the fullStringRefin the hot path. Because_materialize_page_minmax()immediately goes throughField::create_field_from_olap_value(), bytes afterMAX_ZONE_MAP_INDEX_SIZEare still truncated before any page/segment zone map is persisted. That means longSTRING/VARCHARvalues now pay O(full_value_size) copy/compare cost and keep those full copies alive until page flush without improving the serialized zone map, so this can regress the long-string / variant workloads the patch is trying to speed up.be/benchmark/benchmark_zone_map_index.hpp:make_writer()leaks theStorageFieldreturned byStorageFieldFactory::create(*col)on every benchmark iteration. Since each benchmark loop constructs a fresh writer, the benchmark process accumulates leaked fields, which can skew timings or OOM long runs.
Critical Checkpoint Conclusions
- Goal of the task: reduce
TypedZoneMapIndexWriter::add_values()overhead. The numeric path improves that, but the new string path still has a blocking hot-path regression for long values, so the current patch does not fully accomplish the goal yet. - Scope/minimality: the production change is focused, and the benchmark addition is focused, but the benchmark currently has a resource-management bug.
- Concurrency: no new cross-thread behavior is introduced here; writer state stays local to a single writer instance.
- Lifecycle/static init: page-lifetime state (
_page_min,_page_max,_arena) is self-contained inside the writer. I did not find a new lifecycle or static-initialization bug. - Config/compatibility: no new config, protocol, or storage-format compatibility concerns found.
- Parallel code paths: I did not find another equivalent zone-map write path that also needed the same change.
- Special conditions/comments: the new string-specific fast path is documented, but the current implementation keeps more data than the persisted zone map can ever use.
- Tests: existing
be/test/storage/segment/zone_map_index_test.cppcovers basic zone-map serialization, truncation, and float flag behavior, but no new test guards the long-string deferred-materialization path added here. - Observability: not applicable for this refactor.
- Transactions/persistence/data correctness: I did not find a data-correctness regression in the currently covered paths.
- Performance: blocked by the long-string full-copy/full-compare behavior above.
- Other issues: the benchmark leak makes the evidence in the PR body unreliable until it is fixed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
0b1cd15 to
4378af6
Compare
|
/review |
There was a problem hiding this comment.
No blocking findings in the modified production path.
Review checkpoints:
- Goal/task: The change achieves its goal of moving page min/max tracking to raw
ValTypeand materializingFieldonly atflush()time. Existingbe/test/storage/segment/zone_map_index_test.cppalready exercises repeatedadd_values()/flush()behavior across int, string, float special values, decimal, and date/time types, although there is no new targeted test for this exact hot path. - Scope/focus: The production change is small and localized to
TypedZoneMapIndexWriter, with the benchmark kept separate. - Concurrency: No new concurrent path is introduced. This writer state is still page-local and used through the existing single-writer column-write call chain, so there are no new lock ordering or thread-safety concerns here.
- Lifecycle: The new
_page_min,_page_max,_page_has_minmax, and_arenastate is reset on everyflush(). For string types, bytes are copied into_arena, then materialized intoFieldbefore_arena.clear(), so I do not see a dangling-reference issue escaping the writer. - Config/compatibility: No new config, FE/BE protocol, or on-disk format change is introduced.
ZoneMapPBserialization remains on the existing path. - Parallel code paths: The optimization is applied in the shared zone-map writer path used by the nullable-run split workload described in the PR, so I did not find a sibling production path in this abstraction that also needed the same change.
- Special checks: String truncation behavior and float
NaN/+Inf/-Infflag handling remain aligned with the pre-change logic. - Observability/persistence/transactions/data writes: This change only affects in-memory min/max accumulation before the existing serialization path; I do not see new observability, persistence, transactionality, or data-visibility risks.
- Performance: The direction is sound. The hot path removes repeated
Fieldconstruction while keeping materialization at page flush.
Residual risk: I reviewed the code and existing unit coverage, but I did not run a BE build or tests in this runner, so the remaining risk is limited to unexecuted build/benchmark compilation rather than a production-path correctness concern.
|
run buildall |
…riter Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
4378af6 to
f1cf838
Compare
|
run buildall |
|
/review |
There was a problem hiding this comment.
No blocking issues found in this PR.
Critical checkpoints:
- Goal / correctness: The change achieves its stated goal of removing per-call
Fieldtemporary construction fromTypedZoneMapIndexWriterwhile preserving the existing zone-map serialization path. The existing unit tests inbe/test/storage/segment/zone_map_index_test.cppalready exercise repeatedadd_values()calls, string truncation, pass-all pages, and multiple primitive/date/time types against the same writer logic. - Change scope: The modification is small and focused. It only adjusts the zone-map writer hot path and adds a benchmark wired into the benchmark binary.
- Concurrency: No new concurrency is introduced.
TypedZoneMapIndexWriterremains object-local writer state, and the PR does not change locking, thread interaction, or lifecycle ordering. - Lifecycle / initialization: No special lifecycle or static-initialization risk was introduced. The new
_page_min,_page_max, and_page_has_minmaxstate is reset duringflush(). - Configuration: No new config item was added.
- Compatibility: No FE/BE protocol change or on-disk format change was introduced.
ZoneMapPBserialization/deserialization remains unchanged. - Parallel paths: I did not find another equivalent zone-map writer path that also needed the same change.
- Special conditions: The string truncation behavior and float/double special-value handling follow the existing semantics of this module.
- Test coverage: Existing unit coverage is relevant for this change, but no new automated test was added specifically for the deferred materialization path, and I did not run BE tests in this runner.
- Observability: No additional logs or metrics are needed for this localized hot-path optimization.
- Transaction / persistence / data writes: This touches the storage write path, but it does not alter transaction semantics or persisted metadata shape, so I do not see a replay/compatibility concern from this diff.
- Performance: The optimization removes hot-path
Fieldmaterialization and defers it toflush(). The added benchmark is aligned with the intended compaction/write workload, and I did not identify an obvious performance regression in the changed logic. - Other issues: None blocking found.
Residual risk:
- I relied on static review plus the existing zone-map unit coverage; I did not build or run tests locally in this runner.
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
Problem
When writing nullable columns,
append_nullablesplits each block at null boundaries into small non-null runs.[1, 3, NULL, NULL, 4, 5, 6, NULL]add_values([1,3], count=2)+add_values([4,5,6], count=3)Each
add_valuescall built two heap-allocatedFieldtemporaries just to comparemin/max — allocation, memcpy, variant dispatch, destructor — per call.
Fix
Track page min/max as raw
ValType(the same native typestd::minmax_elementalready uses). MaterializeFieldonly once atflush()time.Benchmark (Release, 65536 rows/iter)
block_rows= post-null-split run length.Small batches (1–16): 2×–11×. Large batches: no regression.
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)