Skip to content

[opt](function) optimize replace() / replace_empty() string functions with columnar fast path#62551

Open
Mryange wants to merge 2 commits intoapache:masterfrom
Mryange:opt-replace
Open

[opt](function) optimize replace() / replace_empty() string functions with columnar fast path#62551
Mryange wants to merge 2 commits intoapache:masterfrom
Mryange:opt-replace

Conversation

@Mryange
Copy link
Copy Markdown
Contributor

@Mryange Mryange commented Apr 16, 2026

What problem does this PR solve?

The previous replace(col, 'literal', 'literal') implementation processed each row independently:

  1. to_string() — copy row data into a temporary std::string
  2. std::string::find + std::string::replace — in-place realloc per match
  3. insert_data() — copy result back into ColumnString

This caused three heap allocations per row and poor cache utilization.

What was done

Added _replace_const_pattern() as a fast path that fires when both old_str and new_str are constants and old_str is non-empty (the common case in SQL queries). The fast path:

  • Works directly on ColumnString chars/offsets — zero per-row std::string allocation
  • Uses a two-level search strategy:
    • Level-1: memchr (glibc AVX512) scans for the needle's first byte; if absent, the entire row is bulk-copied with a single memcpy and the SSE4.1 loop is skipped entirely
    • Level-2: ASCIICaseSensitiveStringSearcher (SSE4.1, prebuilt once outside the row loop) handles full needle matching for rows that pass the pre-filter
  • Uses StringOP::push_value_string / push_empty_string for consistent offset management
  • Applies to both replace and replace_empty — when old_str is non-empty these two functions behave identically

Benchmark results (Release build, 192-core machine)

Case Old New Speedup
SmallStr (10k rows, 50B avg, hits) 1,717,467 ns 1,278,740 ns 1.34x
MedStr (5k rows, 200B avg, hits) 2,099,738 ns 1,247,593 ns 1.68x
LargeStr (1k rows, 1000B avg, hits) 1,230,671 ns 586,966 ns 2.10x
NoMatch (10k rows, no needle present) 378,851 ns 217,587 ns 1.74x

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented Apr 16, 2026

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one blocking issue in the new replace / replace_empty fast path.

Critical checkpoint conclusions:

  • Goal of the task / correctness / tests: The goal is to speed up replace and replace_empty when the pattern and replacement are constants. The optimization is not yet correct because the new path bypasses existing ColumnString overflow protection and can corrupt 32-bit offsets instead of failing cleanly on oversized results. Existing BE unit tests cover replace semantics and const-argument combinations, but check_function_all_arg_comb() executes const-argument cases one row at a time, so the new multi-row fast path is not directly covered.
  • Change size / focus: The change is small and focused on one BE function plus benchmarks.
  • Concurrency: No new concurrency is introduced here; this code runs within a single function evaluation and does not add shared mutable state or locking.
  • Lifecycle / static initialization: No special lifecycle or static initialization changes are involved.
  • Configuration items: None.
  • Compatibility / parallel paths: No FE/BE protocol or storage compatibility changes are involved. The optimization only affects the BE implementation of replace / replace_empty, so it must preserve the existing ColumnString contract exactly.
  • Special conditional checks: The fast-path guard itself is reasonable, but the helper it enters still has to preserve the old overflow/error behavior.
  • Test coverage: Existing regression and unit coverage do not directly exercise the new multi-row fast path with constant old_str and new_str.
  • Test result files: Not applicable; no test outputs were added.
  • Observability: Not applicable for this local string-function optimization.
  • Transaction / persistence / data writes: Not applicable.
  • FE/BE variable passing: Not applicable.
  • Performance: The approach is promising, but correctness must be fixed before the speedup can be relied on.

Please address the inline comment before merging.

const char* pos = row.data;
while (pos < row_end) {
const char* match = searcher.search(pos, row_end);
// Copy prefix before match
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fast path bypasses ColumnString::insert_data() / StringOP::push_value_string() and never calls ColumnString::check_chars_length() before growing dst_chars. The old implementation would raise STRING_OVERFLOW_IN_VEC_ENGINE once the result column exceeded 4 GiB, but this path will resize() past that and then truncate offsets when push_empty_string() stores chars.size() into the 32-bit offsets array. A query that expands strings significantly can therefore silently corrupt results instead of failing. Please preserve the overflow check before each manual resize() (or validate the final row growth up front) so the fast path matches the existing ColumnString contract.

@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented Apr 16, 2026

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blocking issues found on the current head (342af9d899662922b1203f9b24c96aee0a4d31c7). The earlier ColumnString overflow problem in the first revision has been fixed.

Critical checkpoint conclusions:

  • Goal of the task / correctness / tests: The goal is to optimize replace() / replace_empty() when old_str and new_str are constants. The current code achieves that goal and keeps the non-empty-pattern semantics aligned with the existing byte-oriented implementation. Existing BE unit tests for replace and replace_empty cover functional semantics and const-argument combinations, so the new branch is exercised, although there is still no dedicated multi-row fast-path test.
  • Change size / focus: The change is small and focused: one BE function fast path plus a benchmark.
  • Concurrency: No new concurrency is introduced. The code runs inside one vectorized function evaluation, adds no shared mutable state, and requires no locking.
  • Lifecycle / static initialization: No special lifecycle or static initialization changes are involved.
  • Configuration items: None.
  • Compatibility / parallel paths: No FE/BE protocol, storage-format, or rolling-upgrade compatibility issues are involved. The optimization is confined to the BE implementation, and the replace / replace_empty split is preserved by only taking the fast path when old_str is non-empty.
  • Special conditional checks: The fast-path guard is narrow and justified: both pattern and replacement must be constant, and the empty-pattern case still falls back to the existing implementation.
  • Test coverage: Existing unit coverage is reasonable for semantics and constness permutations. The new benchmark is useful for performance validation. I did not run BE tests in this review session.
  • Test result files: Not applicable; no regression result files were changed.
  • Observability: No additional observability is needed for this local string-function optimization.
  • Transaction / persistence / data writes / FE-BE variable passing: Not applicable.
  • Performance: The optimization is sensible. Building the searcher once per block and writing directly into ColumnString removes the per-row std::string allocation/copy path that dominated the old implementation.
  • Other issues: I did not find remaining blocking correctness issues in the updated patch.

Residual risk:

  • There is no dedicated multi-row correctness test that specifically targets the new constant-pattern fast path across several rows in one block. That is a test gap worth closing later, but it is not a blocker for this patch as written.

@Mryange
Copy link
Copy Markdown
Contributor Author

Mryange commented Apr 16, 2026

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 100.00% (52/52) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.13% (20193/38009)
Line Coverage 36.69% (190202/518342)
Region Coverage 32.94% (147651/448250)
Branch Coverage 34.08% (64651/189720)

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (52/52) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.72% (27438/37218)
Line Coverage 57.33% (296250/516724)
Region Coverage 54.39% (246043/452360)
Branch Coverage 56.17% (106883/190288)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants