
[fix](zone-map) Avoid per-row Field temporaries in TypedZoneMapIndexWriter#62544

Open
csun5285 wants to merge 2 commits into apache:master from csun5285:opt-zonemapindex

Conversation

@csun5285
Contributor

@csun5285 csun5285 commented Apr 16, 2026

Problem

When writing nullable columns, append_nullable splits each block at null boundaries into small non-null runs.

| Input block | add_values calls |
|---|---|
| [1, 3, NULL, NULL, 4, 5, 6, NULL] | add_values([1,3], count=2) + add_values([4,5,6], count=3) |

Each add_values call built two heap-allocated Field temporaries just to compare
min/max, paying for an allocation, a memcpy, variant dispatch, and a destructor on every call.

Fix

Track page min/max as raw ValType (the same native type std::minmax_element already uses). Materialize Field only once at flush() time.

Benchmark (Release, 65536 rows/iter)

block_rows = post-null-split run length.

| Type | block_rows | Baseline (M items/s) | Optimized (M items/s) | Speedup |
|---|---|---|---|---|
| Int32 | 1 | 21.9 | 240.3 | 11.0× |
| Int32 | 4 | 75.9 | 353.0 | 4.7× |
| Int32 | 16 | 170.0 | 400.1 | 2.4× |
| Int32 | 64 | 261.9 | 394.4 | 1.5× |
| Int32 | 256 | 314.0 | 419.7 | 1.3× |
| Int32 | 1024 | 320.6 | 425.9 | 1.3× |
| Int32 | 4096 | 334.2 | 359.3 | 1.1× |
| Double | 1 | 16.7 | 181.9 | 10.9× |
| Double | 4 | 47.0 | 168.0 | 3.6× |
| Double | 16 | 120.0 | 310.0 | 2.6× |
| Double | 64 | 223.1 | 425.3 | 1.9× |
| Double | 256 | 315.1 | 469.8 | 1.5× |
| Double | 1024 | 358.9 | 477.1 | 1.3× |
| Double | 4096 | 373.8 | 483.3 | 1.3× |
| String | 1 | 8.0 | 59.2 | 7.4× |
| String | 4 | 21.7 | 57.9 | 2.7× |
| String | 16 | 41.3 | 60.7 | 1.5× |
| String | 64 | 62.9 | 80.6 | 1.3× |
| String | 256 | 75.2 | 82.0 | 1.1× |
| String | 1024 | 79.4 | 84.0 | 1.1× |
| String | 4096 | 80.4 | 83.4 | 1.0× |

Small runs (block_rows 1–16) improve 1.5×–11×; large runs show no regression.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@csun5285
Contributor Author

/review

@csun5285
Contributor Author

run buildall

Contributor

@github-actions bot left a comment


Findings

  1. be/src/storage/index/zone_map/zone_map_index.cpp: the new string min/max path now copies and compares the full StringRef in the hot path. Because _materialize_page_minmax() immediately goes through Field::create_field_from_olap_value(), bytes after MAX_ZONE_MAP_INDEX_SIZE are still truncated before any page/segment zone map is persisted. That means long STRING/VARCHAR values now pay O(full_value_size) copy/compare cost and keep those full copies alive until page flush without improving the serialized zone map, so this can regress the long-string / variant workloads the patch is trying to speed up.
  2. be/benchmark/benchmark_zone_map_index.hpp: make_writer() leaks the StorageField returned by StorageFieldFactory::create(*col) on every benchmark iteration. Since each benchmark loop constructs a fresh writer, the benchmark process accumulates leaked fields, which can skew timings or OOM long runs.

Critical Checkpoint Conclusions

  • Goal of the task: reduce TypedZoneMapIndexWriter::add_values() overhead. The numeric path improves that, but the new string path still has a blocking hot-path regression for long values, so the current patch does not fully accomplish the goal yet.
  • Scope/minimality: the production change is focused, and the benchmark addition is focused, but the benchmark currently has a resource-management bug.
  • Concurrency: no new cross-thread behavior is introduced here; writer state stays local to a single writer instance.
  • Lifecycle/static init: page-lifetime state (_page_min, _page_max, _arena) is self-contained inside the writer. I did not find a new lifecycle or static-initialization bug.
  • Config/compatibility: no new config, protocol, or storage-format compatibility concerns found.
  • Parallel code paths: I did not find another equivalent zone-map write path that also needed the same change.
  • Special conditions/comments: the new string-specific fast path is documented, but the current implementation keeps more data than the persisted zone map can ever use.
  • Tests: existing be/test/storage/segment/zone_map_index_test.cpp covers basic zone-map serialization, truncation, and float flag behavior, but no new test guards the long-string deferred-materialization path added here.
  • Observability: not applicable for this refactor.
  • Transactions/persistence/data correctness: I did not find a data-correctness regression in the currently covered paths.
  • Performance: blocked by the long-string full-copy/full-compare behavior above.
  • Other issues: the benchmark leak makes the evidence in the PR body unreliable until it is fixed.

Outdated comment threads: be/src/storage/index/zone_map/zone_map_index.cpp, be/benchmark/benchmark_zone_map_index.hpp
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@csun5285 force-pushed the opt-zonemapindex branch 2 times, most recently from 0b1cd15 to 4378af6 on April 16, 2026 at 03:44
@csun5285
Contributor Author

/review

Contributor

@github-actions bot left a comment


No blocking findings in the modified production path.

Review checkpoints:

  • Goal/task: The change achieves its goal of moving page min/max tracking to raw ValType and materializing Field only at flush() time. Existing be/test/storage/segment/zone_map_index_test.cpp already exercises repeated add_values()/flush() behavior across int, string, float special values, decimal, and date/time types, although there is no new targeted test for this exact hot path.
  • Scope/focus: The production change is small and localized to TypedZoneMapIndexWriter, with the benchmark kept separate.
  • Concurrency: No new concurrent path is introduced. This writer state is still page-local and used through the existing single-writer column-write call chain, so there are no new lock ordering or thread-safety concerns here.
  • Lifecycle: The new _page_min, _page_max, _page_has_minmax, and _arena state is reset on every flush(). For string types, bytes are copied into _arena, then materialized into Field before _arena.clear(), so I do not see a dangling-reference issue escaping the writer.
  • Config/compatibility: No new config, FE/BE protocol, or on-disk format change is introduced. ZoneMapPB serialization remains on the existing path.
  • Parallel code paths: The optimization is applied in the shared zone-map writer path used by the nullable-run split workload described in the PR, so I did not find a sibling production path in this abstraction that also needed the same change.
  • Special checks: String truncation behavior and float NaN/+Inf/-Inf flag handling remain aligned with the pre-change logic.
  • Observability/persistence/transactions/data writes: This change only affects in-memory min/max accumulation before the existing serialization path; I do not see new observability, persistence, transactionality, or data-visibility risks.
  • Performance: The direction is sound. The hot path removes repeated Field construction while keeping materialization at page flush.

Residual risk: I reviewed the code and existing unit coverage, but I did not run a BE build or tests in this runner, so the remaining risk is limited to unexecuted build/benchmark compilation rather than a production-path correctness concern.

@csun5285
Contributor Author

run buildall

…riter

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@csun5285
Contributor Author

run buildall

@csun5285
Contributor Author

/review

Contributor

@github-actions bot left a comment


No blocking issues found in this PR.

Critical checkpoints:

  • Goal / correctness: The change achieves its stated goal of removing per-call Field temporary construction from TypedZoneMapIndexWriter while preserving the existing zone-map serialization path. The existing unit tests in be/test/storage/segment/zone_map_index_test.cpp already exercise repeated add_values() calls, string truncation, pass-all pages, and multiple primitive/date/time types against the same writer logic.
  • Change scope: The modification is small and focused. It only adjusts the zone-map writer hot path and adds a benchmark wired into the benchmark binary.
  • Concurrency: No new concurrency is introduced. TypedZoneMapIndexWriter remains object-local writer state, and the PR does not change locking, thread interaction, or lifecycle ordering.
  • Lifecycle / initialization: No special lifecycle or static-initialization risk was introduced. The new _page_min, _page_max, and _page_has_minmax state is reset during flush().
  • Configuration: No new config item was added.
  • Compatibility: No FE/BE protocol change or on-disk format change was introduced. ZoneMapPB serialization/deserialization remains unchanged.
  • Parallel paths: I did not find another equivalent zone-map writer path that also needed the same change.
  • Special conditions: The string truncation behavior and float/double special-value handling follow the existing semantics of this module.
  • Test coverage: Existing unit coverage is relevant for this change, but no new automated test was added specifically for the deferred materialization path, and I did not run BE tests in this runner.
  • Observability: No additional logs or metrics are needed for this localized hot-path optimization.
  • Transaction / persistence / data writes: This touches the storage write path, but it does not alter transaction semantics or persisted metadata shape, so I do not see a replay/compatibility concern from this diff.
  • Performance: The optimization removes hot-path Field materialization and defers it to flush(). The added benchmark is aligned with the intended compaction/write workload, and I did not identify an obvious performance regression in the changed logic.
  • Other issues: None blocking found.

Residual risk:

  • I relied on static review plus the existing zone-map unit coverage; I did not build or run tests locally in this runner.

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 100.00% (45/45) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
|---|---|
| Function Coverage | 53.12% (20193/38013) |
| Line Coverage | 36.68% (190143/518336) |
| Region Coverage | 32.92% (147554/448233) |
| Branch Coverage | 34.06% (64610/189702) |

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (45/45) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
|---|---|
| Function Coverage | 73.61% (27400/37222) |
| Line Coverage | 57.30% (296095/516718) |
| Region Coverage | 54.27% (245503/452343) |
| Branch Coverage | 56.07% (106693/190270) |
