[opt](function) optimize `replace()` / `replace_empty()` string functions with columnar fast path by Mryange · Pull Request #62551 · apache/doris

Mryange · 2026-04-16T07:01:10Z

What problem does this PR solve?

The previous replace(col, 'literal', 'literal') implementation processed each row independently:

to_string() — copy row data into a temporary std::string
std::string::find + std::string::replace — in-place realloc per match
insert_data() — copy result back into ColumnString

This caused three heap allocations per row and poor cache utilization.

What was done

Added _replace_const_pattern() as a fast path that fires when both old_str and new_str are constants and old_str is non-empty (the common case in SQL queries). The fast path:

Works directly on ColumnString chars/offsets — zero per-row std::string allocation
Uses a two-level search strategy:
- Level-1: memchr (glibc AVX512) scans for the needle's first byte; if absent, the entire row is bulk-copied with a single memcpy and the SSE4.1 loop is skipped entirely
- Level-2: ASCIICaseSensitiveStringSearcher (SSE4.1, prebuilt once outside the row loop) handles full needle matching for rows that pass the pre-filter
Uses StringOP::push_value_string / push_empty_string for consistent offset management
Applies to both replace and replace_empty — when old_str is non-empty these two functions behave identically

Benchmark results (Release build, 192-core machine)

Case	Old	New	Speedup
SmallStr (10k rows, 50B avg, hits)	1,717,467 ns	1,278,740 ns	1.34x
MedStr (5k rows, 200B avg, hits)	2,099,738 ns	1,247,593 ns	1.68x
LargeStr (1k rows, 1000B avg, hits)	1,230,671 ns	586,966 ns	2.10x
NoMatch (10k rows, no needle present)	378,851 ns	217,587 ns	1.74x

Release note

None

Check List (For Author)

Test
- Regression test
- Unit Test
- Manual test (add detailed scripts or steps below)
- No need to test or manual test. Explain why:
  - This is a refactor/code format and no logic has been changed.
  - Previous test can cover this change.
  - No code files have been changed.
  - Other reason
Behavior changed:
- No.
- Yes.
Does this need documentation?
- No.
- Yes.

Check List (For Reviewer who merge this PR)

Confirm the release note
Confirm test cases
Confirm document
Add branch pick label

hello-stephen · 2026-04-16T07:01:18Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

Mryange · 2026-04-16T07:02:13Z

/review

github-actions

I found one blocking issue in the new replace / replace_empty fast path.

Critical checkpoint conclusions:

Goal of the task / correctness / tests: The goal is to speed up replace and replace_empty when the pattern and replacement are constants. The optimization is not yet correct because the new path bypasses existing ColumnString overflow protection and can corrupt 32-bit offsets instead of failing cleanly on oversized results. Existing BE unit tests cover replace semantics and const-argument combinations, but check_function_all_arg_comb() executes const-argument cases one row at a time, so the new multi-row fast path is not directly covered.
Change size / focus: The change is small and focused on one BE function plus benchmarks.
Concurrency: No new concurrency is introduced here; this code runs within a single function evaluation and does not add shared mutable state or locking.
Lifecycle / static initialization: No special lifecycle or static initialization changes are involved.
Configuration items: None.
Compatibility / parallel paths: No FE/BE protocol or storage compatibility changes are involved. The optimization only affects the BE implementation of replace / replace_empty, so it must preserve the existing ColumnString contract exactly.
Special conditional checks: The fast-path guard itself is reasonable, but the helper it enters still has to preserve the old overflow/error behavior.
Test coverage: Existing regression and unit coverage do not directly exercise the new multi-row fast path with constant old_str and new_str.
Test result files: Not applicable; no test outputs were added.
Observability: Not applicable for this local string-function optimization.
Transaction / persistence / data writes: Not applicable.
FE/BE variable passing: Not applicable.
Performance: The approach is promising, but correctness must be fixed before the speedup can be relied on.

Please address the inline comment before merging.

github-actions · 2026-04-16T07:21:18Z

+            const char* pos = row.data;
+            while (pos < row_end) {
+                const char* match = searcher.search(pos, row_end);
+                // Copy prefix before match


This fast path bypasses ColumnString::insert_data() / StringOP::push_value_string() and never calls ColumnString::check_chars_length() before growing dst_chars. The old implementation would raise STRING_OVERFLOW_IN_VEC_ENGINE once the result column exceeded 4 GiB, but this path will resize() past that and then truncate offsets when push_empty_string() stores chars.size() into the 32-bit offsets array. A query that expands strings significantly can therefore silently corrupt results instead of failing. Please preserve the overflow check before each manual resize() (or validate the final row growth up front) so the fast path matches the existing ColumnString contract.

Mryange · 2026-04-16T07:47:45Z

/review

github-actions

No blocking issues found on the current head (342af9d899662922b1203f9b24c96aee0a4d31c7). The earlier ColumnString overflow problem in the first revision has been fixed.

Critical checkpoint conclusions:

Goal of the task / correctness / tests: The goal is to optimize replace() / replace_empty() when old_str and new_str are constants. The current code achieves that goal and keeps the non-empty-pattern semantics aligned with the existing byte-oriented implementation. Existing BE unit tests for replace and replace_empty cover functional semantics and const-argument combinations, so the new branch is exercised, although there is still no dedicated multi-row fast-path test.
Change size / focus: The change is small and focused: one BE function fast path plus a benchmark.
Concurrency: No new concurrency is introduced. The code runs inside one vectorized function evaluation, adds no shared mutable state, and requires no locking.
Lifecycle / static initialization: No special lifecycle or static initialization changes are involved.
Configuration items: None.
Compatibility / parallel paths: No FE/BE protocol, storage-format, or rolling-upgrade compatibility issues are involved. The optimization is confined to the BE implementation, and the replace / replace_empty split is preserved by only taking the fast path when old_str is non-empty.
Special conditional checks: The fast-path guard is narrow and justified: both pattern and replacement must be constant, and the empty-pattern case still falls back to the existing implementation.
Test coverage: Existing unit coverage is reasonable for semantics and constness permutations. The new benchmark is useful for performance validation. I did not run BE tests in this review session.
Test result files: Not applicable; no regression result files were changed.
Observability: No additional observability is needed for this local string-function optimization.
Transaction / persistence / data writes / FE-BE variable passing: Not applicable.
Performance: The optimization is sensible. Building the searcher once per block and writing directly into ColumnString removes the per-row std::string allocation/copy path that dominated the old implementation.
Other issues: I did not find remaining blocking correctness issues in the updated patch.

Residual risk:

There is no dedicated multi-row correctness test that specifically targets the new constant-pattern fast path across several rows in one block. That is a test gap worth closing later, but it is not a blocker for this patch as written.

Mryange · 2026-04-16T08:13:01Z

run buildall

hello-stephen · 2026-04-16T09:25:24Z

BE UT Coverage Report

Increment line coverage 100.00% (52/52) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	53.13% (20193/38009)
Line Coverage	36.69% (190202/518342)
Region Coverage	32.94% (147651/448250)
Branch Coverage	34.08% (64651/189720)

hello-stephen · 2026-04-16T10:21:44Z

BE Regression && UT Coverage Report

Increment line coverage 100.00% (52/52) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	73.72% (27438/37218)
Line Coverage	57.33% (296250/516724)
Region Coverage	54.39% (246043/452360)
Branch Coverage	56.17% (106883/190288)

第二次过了

upd

49f7cf5

github-actions bot previously requested changes Apr 16, 2026

View reviewed changes

review

342af9d

github-actions bot reviewed Apr 16, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[opt](function) optimize `replace()` / `replace_empty()` string functions with columnar fast path#62551

[opt](function) optimize `replace()` / `replace_empty()` string functions with columnar fast path#62551
Mryange wants to merge 2 commits intoapache:masterfrom
Mryange:opt-replace

Mryange commented Apr 16, 2026

Uh oh!

hello-stephen commented Apr 16, 2026

Uh oh!

Mryange commented Apr 16, 2026

Uh oh!

github-actions bot left a comment

Uh oh!

github-actions bot Apr 16, 2026

Uh oh!

Mryange commented Apr 16, 2026

Uh oh!

github-actions bot left a comment

Uh oh!

Mryange commented Apr 16, 2026

Uh oh!

hello-stephen commented Apr 16, 2026

Uh oh!

hello-stephen commented Apr 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Mryange commented Apr 16, 2026

What problem does this PR solve?

What was done

Benchmark results (Release build, 192-core machine)

Release note

Check List (For Author)

Check List (For Reviewer who merge this PR)

Uh oh!

hello-stephen commented Apr 16, 2026

Uh oh!

Mryange commented Apr 16, 2026

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Mryange commented Apr 16, 2026

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Uh oh!

Mryange commented Apr 16, 2026

Uh oh!

hello-stephen commented Apr 16, 2026

BE UT Coverage Report

Uh oh!

hello-stephen commented Apr 16, 2026

BE Regression && UT Coverage Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants