[opt](function) optimize replace() / replace_empty() string functions with columnar fast path#62551
[opt](function) optimize replace() / replace_empty() string functions with columnar fast path#62551Mryange wants to merge 2 commits intoapache:masterfrom
replace() / replace_empty() string functions with columnar fast path#62551Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
There was a problem hiding this comment.
I found one blocking issue in the new replace / replace_empty fast path.
Critical checkpoint conclusions:
- Goal of the task / correctness / tests: The goal is to speed up
replaceandreplace_emptywhen the pattern and replacement are constants. The optimization is not yet correct because the new path bypasses existingColumnStringoverflow protection and can corrupt 32-bit offsets instead of failing cleanly on oversized results. Existing BE unit tests coverreplacesemantics and const-argument combinations, butcheck_function_all_arg_comb()executes const-argument cases one row at a time, so the new multi-row fast path is not directly covered. - Change size / focus: The change is small and focused on one BE function plus benchmarks.
- Concurrency: No new concurrency is introduced here; this code runs within a single function evaluation and does not add shared mutable state or locking.
- Lifecycle / static initialization: No special lifecycle or static initialization changes are involved.
- Configuration items: None.
- Compatibility / parallel paths: No FE/BE protocol or storage compatibility changes are involved. The optimization only affects the BE implementation of
replace/replace_empty, so it must preserve the existingColumnStringcontract exactly. - Special conditional checks: The fast-path guard itself is reasonable, but the helper it enters still has to preserve the old overflow/error behavior.
- Test coverage: Existing regression and unit coverage do not directly exercise the new multi-row fast path with constant
old_strandnew_str. - Test result files: Not applicable; no test outputs were added.
- Observability: Not applicable for this local string-function optimization.
- Transaction / persistence / data writes: Not applicable.
- FE/BE variable passing: Not applicable.
- Performance: The approach is promising, but correctness must be fixed before the speedup can be relied on.
Please address the inline comment before merging.
| const char* pos = row.data; | ||
| while (pos < row_end) { | ||
| const char* match = searcher.search(pos, row_end); | ||
| // Copy prefix before match |
There was a problem hiding this comment.
This fast path bypasses ColumnString::insert_data() / StringOP::push_value_string() and never calls ColumnString::check_chars_length() before growing dst_chars. The old implementation would raise STRING_OVERFLOW_IN_VEC_ENGINE once the result column exceeded 4 GiB, but this path will resize() past that and then truncate offsets when push_empty_string() stores chars.size() into the 32-bit offsets array. A query that expands strings significantly can therefore silently corrupt results instead of failing. Please preserve the overflow check before each manual resize() (or validate the final row growth up front) so the fast path matches the existing ColumnString contract.
|
/review |
There was a problem hiding this comment.
No blocking issues found on the current head (342af9d899662922b1203f9b24c96aee0a4d31c7). The earlier ColumnString overflow problem in the first revision has been fixed.
Critical checkpoint conclusions:
- Goal of the task / correctness / tests: The goal is to optimize
replace()/replace_empty()whenold_strandnew_strare constants. The current code achieves that goal and keeps the non-empty-pattern semantics aligned with the existing byte-oriented implementation. Existing BE unit tests forreplaceandreplace_emptycover functional semantics and const-argument combinations, so the new branch is exercised, although there is still no dedicated multi-row fast-path test. - Change size / focus: The change is small and focused: one BE function fast path plus a benchmark.
- Concurrency: No new concurrency is introduced. The code runs inside one vectorized function evaluation, adds no shared mutable state, and requires no locking.
- Lifecycle / static initialization: No special lifecycle or static initialization changes are involved.
- Configuration items: None.
- Compatibility / parallel paths: No FE/BE protocol, storage-format, or rolling-upgrade compatibility issues are involved. The optimization is confined to the BE implementation, and the
replace/replace_emptysplit is preserved by only taking the fast path whenold_stris non-empty. - Special conditional checks: The fast-path guard is narrow and justified: both pattern and replacement must be constant, and the empty-pattern case still falls back to the existing implementation.
- Test coverage: Existing unit coverage is reasonable for semantics and constness permutations. The new benchmark is useful for performance validation. I did not run BE tests in this review session.
- Test result files: Not applicable; no regression result files were changed.
- Observability: No additional observability is needed for this local string-function optimization.
- Transaction / persistence / data writes / FE-BE variable passing: Not applicable.
- Performance: The optimization is sensible. Building the searcher once per block and writing directly into
ColumnStringremoves the per-rowstd::stringallocation/copy path that dominated the old implementation. - Other issues: I did not find remaining blocking correctness issues in the updated patch.
Residual risk:
- There is no dedicated multi-row correctness test that specifically targets the new constant-pattern fast path across several rows in one block. That is a test gap worth closing later, but it is not a blocker for this patch as written.
|
run buildall |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
What problem does this PR solve?
The previous
replace(col, 'literal', 'literal')implementation processed each row independently:to_string()— copy row data into a temporarystd::stringstd::string::find+std::string::replace— in-place realloc per matchinsert_data()— copy result back into ColumnStringThis caused three heap allocations per row and poor cache utilization.
What was done
Added
_replace_const_pattern()as a fast path that fires when bothold_strandnew_strare constants andold_stris non-empty (the common case in SQL queries). The fast path:ColumnStringchars/offsets — zero per-rowstd::stringallocationmemchr(glibc AVX512) scans for the needle's first byte; if absent, the entire row is bulk-copied with a singlememcpyand the SSE4.1 loop is skipped entirelyASCIICaseSensitiveStringSearcher(SSE4.1, prebuilt once outside the row loop) handles full needle matching for rows that pass the pre-filterStringOP::push_value_string/push_empty_stringfor consistent offset managementreplaceandreplace_empty— whenold_stris non-empty these two functions behave identicallyBenchmark results (Release build, 192-core machine)
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)