[refine](column) replace field-based nested default with column-level operation in ColumnVariant by Mryange · Pull Request #62547 · apache/doris

Mryange · 2026-04-16T03:52:15Z

What problem does this PR solve?

Issue Number: N/A

Problem Summary:

ColumnVariant::try_insert_default_from_nested was filling missing nested subcolumn rows by
converting the last row of a sibling leaf column to a Field, recursively walking the Field
tree via FieldVisitorReplaceScalars to replace scalar values with the type default, and then
inserting the resulting Field back into the column.

This approach was unnecessarily expensive (column → Field → Field tree traversal → column) and
inconsistent with the batch counterpart try_insert_many_defaults_from_nested, which operates
entirely at the column level using cut + clone_with_default_values.

The scalar visitor (FieldVisitorReplaceScalars) existed as duplicate dead code in both
column_variant.cpp and variant_util.cpp.

Changes

try_insert_default_from_nested: rewrite to use the same column-level approach as
try_insert_many_defaults_from_nested — leaf->data.cut(leaf_size - 1, 1).clone_with_default_values(field_info) — eliminating the Field round-trip and the dependency on get_default() / FieldVisitorReplaceScalars.
Remove FieldVisitorReplaceScalars from both column_variant.cpp and variant_util.cpp; neither file has any remaining callers.
Remove field_visitor unit test in column_variant_test.cpp that directly tested the now-deleted visitor class.

Release note

None

Check List (For Author)

Test
- Regression test
- Unit Test
- Manual test (add detailed scripts or steps below)
- No need to test or manual test. Explain why:
  - This is a refactor/code format and no logic has been changed.
  - Previous test can cover this change.
  - No code files have been changed.
  - Other reason
Behavior changed:
- No.
- Yes.
Does this need documentation?
- No.
- Yes.

Check List (For Reviewer who merge this PR)

Confirm the release note
Confirm test cases
Confirm document
Add branch pick label

Mryange · 2026-04-16T03:52:25Z

/review

github-actions · 2026-04-16T03:53:49Z

OpenCode automated review failed and did not complete.

Error: Review step was skipped (possibly timeout or cancelled)
Workflow run: https://github.com/apache/doris/actions/runs/24491092035

Please inspect the workflow logs and rerun the review after the underlying issue is resolved.

Mryange · 2026-04-16T04:15:07Z

/review

github-actions

I found a blocking behavioral regression in ColumnVariant::try_insert_default_from_nested.

Critical checkpoints:

Goal / correctness: The PR aims to refactor nested default synthesis to a column-level implementation. It does not fully preserve the previous behavior: missing nested siblings now use a synthesized row even when the donor sibling row is NULL, which changes the inserted value from the old insert_default() fallback to a copied/synthesized nested row. No test in this PR proves the null-row case still behaves correctly.
Change size / focus: The code deletion is small and focused, but it also removed the only direct coverage of the old transformation logic without replacing it with assertions for the changed path.
Concurrency: I did not find new concurrency or lock-order issues in the touched code.
Lifecycle / config / compatibility: No lifecycle, config, or wire/storage compatibility changes here.
Parallel paths: The batch helper try_insert_many_defaults_from_nested() already used the column-level clone path, but the single-row helper previously had an extra null guard. This PR made the two paths look similar by dropping that guard, which is the source of the regression.
Conditional checks: The removed last_field.is_null() check was behaviorally significant; removing it is not a mechanical cleanup.
Test coverage: Existing tests in column_variant_test.cpp do not assert the actual inserted row shape/nullability for try_insert_default_from_nested, and this PR deletes the visitor-only test without adding a replacement for the null-donor case.
Test results / observability / transactionality / FE-BE propagation: Not applicable for this change.
Performance: The new path is plausibly cheaper, but correctness regressed in the null-donor case.
Other issues: none beyond the above blocker and the missing targeted regression test.

Please restore the old null-row behavior (or add an explicit justification if semantics are intentionally changing) and add a targeted unit/regression test that covers a missing nested sibling when the donor nested leaf row is NULL.

github-actions · 2026-04-16T04:35:20Z

-    if (last_field.is_null()) {
+    size_t leaf_size = leaf->data.size();
+    if (leaf_size == 0) {
        return false;


This changes behavior for nullable nested siblings. Before this patch, try_insert_default_from_nested() called get_last_field() and returned false when the donor leaf row was NULL, so the caller fell back to entry->data.insert_default(). After the refactor we always cut the last donor row and rebuild a default-shaped subcolumn from it, which means a row like items.tags = null can now synthesize a nested sibling value for a missing items.id instead of keeping that sibling NULL too. That is a user-visible semantic change in the JSON/variant parse path, not just a refactor. Please preserve the old null guard here or add an explicit test showing the new behavior is intended.

github-actions · 2026-04-16T04:35:20Z

@@ -3082,30 +3082,6 @@ TEST_F(ColumnVariantTest, get_field_info_all_types) {
    }


The patch removes the only direct test around the deleted transformation logic, but it does not add any replacement coverage for the behavior that changed in try_insert_default_from_nested(). The existing try_insert_default_from_nested test only exercises the call and never asserts the inserted row contents/nullability. Please add a targeted test for the blocking case above: donor nested sibling row is NULL, target sibling path is missing, and we must still fall back to insert_default() rather than materializing a synthesized nested row.

Mryange · 2026-04-16T04:42:00Z

/review

Mryange · 2026-04-16T05:05:21Z

/review

github-actions

Findings

be/src/core/column/column_variant.cpp: try_insert_default_from_nested() now rebuilds the synthetic row through clone_with_default_values(FieldInfo), but the local FieldInfo only carries scalar_type_id and num_dimensions. For metadata-carrying nested types, especially decimal arrays copied through ColumnVariant::try_insert() / insert_from(), that loses the entry type's precision and scale and can fail while materializing the missing sibling row. The old single-row path inserted a typed default Field into the existing column and preserved that metadata.

Critical Checkpoints

Goal / correctness: The PR aims to replace the field round-trip with a column-level operation. It improves the fast path idea, but it does not preserve behavior for metadata-carrying nested types, and the current tests do not prove correctness for the changed path.
Scope / minimality: The change is localized, but it also changes the single-row nested-default semantics by routing it through a helper with different type-metadata behavior.
Concurrency: No new concurrency, locking, or lifecycle issues were introduced in the touched code.
Configuration / compatibility: No config, persistence, or FE/BE protocol compatibility changes are involved here.
Parallel code paths: try_insert_many_defaults_from_nested() already uses the column-level helper; this PR makes the single-row path share the same metadata-loss risk instead of keeping its previous typed-field behavior.
Tests: Coverage is insufficient for this behavior change. The remaining try_insert_default_from_nested test only smoke-calls the helper and does not assert the produced nested default values or typed metadata.
Observability / persistence / transactions: Not applicable for this refactor.
Performance: The intended optimization is reasonable, but correctness needs to be preserved before the single-row path switches implementations.

I did not find additional blocking issues beyond this regression after tracing the insert, parse, finalize, and copy paths around ColumnVariant nested defaults.

github-actions · 2026-04-16T05:29:08Z

    }

-    auto default_scalar = entry->data.get_least_common_type()->get_default();
+    FieldInfo field_info = {


This changes the single-row path from inserting a typed default Field into the existing entry column to rebuilding a temporary subcolumn from FieldInfo. That is not equivalent for metadata-carrying types. Here field_info only keeps scalar_type_id and num_dimensions; precision and scale stay at their defaults. If a missing nested sibling is ARRAY<DECIMAL(10,2)> and we hit this path through ColumnVariant::try_insert() or insert_from(), clone_with_default_values() reaches create_array_of_type(TYPE_DECIMAL..., 0, 0) and throws from create_decimal(). The old implementation used entry->data.get_least_common_type()->get_default() and inserted that typed field directly, so it preserved the decimal metadata. Please preserve the full type metadata here before switching the single-row path over to clone_with_default_values().

github-actions

No blocking issues found in this PR.

Critical checkpoint conclusions:

Goal / correctness: The change does achieve its stated goal of removing the per-row column -> Field -> visitor -> column round-trip and aligning try_insert_default_from_nested() with the existing column-level batch path in try_insert_many_defaults_from_nested().
Change size / focus: The modification is small and focused to the nested-default helper plus dead-code/test cleanup.
Concurrency: Not applicable. This path is synchronous column manipulation with no new shared-state or lock behavior.
Lifecycle / static init: Not applicable.
Config / compatibility / FE-BE protocol: Not applicable. No new config, protocol, or persisted format changes.
Parallel code paths: The main parallel path here is the batch helper; this PR improves consistency by making the single-row helper use the same column-level approach.
Conditional checks: The previous null-row guard is preserved with is_null_at(leaf_size - 1).
Tests: Existing unit coverage does touch try_insert_default_from_nested() and clone_with_default_values(), but the dedicated assertions around exact defaulted nested shapes are still fairly weak, so mixed-dimension and null-donor cases remain the main residual testing gap.
Observability / persistence / transactionality: Not applicable.
Performance: Likely improved, since the new path stays at the column level and removes unnecessary Field materialization and visitor traversal.

Overall summary: no critical blocking issues identified during review; residual risk is limited to test depth rather than an obvious correctness problem in the implementation.

Mryange · 2026-04-16T06:48:53Z

run buildall

第二个通过

hello-stephen · 2026-04-16T08:38:32Z

BE UT Coverage Report

Increment line coverage 0.00% (0/11) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	53.12% (20189/38005)
Line Coverage	36.68% (190091/518272)
Region Coverage	32.90% (147482/448210)
Branch Coverage	34.05% (64597/189694)

hello-stephen · 2026-04-16T08:56:20Z

BE Regression && UT Coverage Report

Increment line coverage 0.00% (0/11) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	73.73% (27439/37214)
Line Coverage	57.35% (296323/516654)
Region Coverage	54.35% (245839/452320)
Branch Coverage	56.13% (106788/190262)

Mryange · 2026-04-16T09:23:42Z

run beut

Mryange · 2026-04-16T10:38:20Z

run buildall

hello-stephen · 2026-04-16T10:42:44Z

BE Regression && UT Coverage Report

Increment line coverage 0.00% (0/11) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	73.73% (27439/37214)
Line Coverage	57.35% (296290/516654)
Region Coverage	54.35% (245827/452320)
Branch Coverage	56.12% (106782/190262)

Mryange · 2026-04-16T11:25:18Z

run buildall

hello-stephen · 2026-04-16T11:30:11Z

BE Regression && UT Coverage Report

Increment line coverage 0.00% (0/11) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	73.72% (27434/37214)
Line Coverage	57.33% (296183/516654)
Region Coverage	54.34% (245785/452320)
Branch Coverage	56.09% (106719/190262)

hello-stephen · 2026-04-16T12:39:46Z

BE UT Coverage Report

Increment line coverage 0.00% (0/11) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	53.12% (20189/38005)
Line Coverage	36.68% (190122/518272)
Region Coverage	32.96% (147729/448210)
Branch Coverage	34.07% (64633/189694)

hello-stephen · 2026-04-16T13:41:40Z

BE Regression && UT Coverage Report

Increment line coverage 90.91% (10/11) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	71.93% (26769/37214)
Line Coverage	54.94% (283866/516654)
Region Coverage	52.01% (235273/452320)
Branch Coverage	53.44% (101676/190262)

Mryange · 2026-04-16T15:09:24Z

/review

github-actions bot requested changes Apr 16, 2026

View reviewed changes

github-actions bot previously requested changes Apr 16, 2026

View reviewed changes

github-actions bot reviewed Apr 16, 2026

View reviewed changes

Mryange added 2 commits April 16, 2026 19:25

upd

bdc6f00

for review

1048ac1

Mryange force-pushed the refine-variant-dev-4.16 branch from 2cc10c2 to 1048ac1 Compare April 16, 2026 11:25

		@@ -3082,30 +3082,6 @@ TEST_F(ColumnVariantTest, get_field_info_all_types) {
		}

Conversation

Mryange commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What problem does this PR solve?

Changes

Release note

Check List (For Author)

Check List (For Reviewer who merge this PR)

Uh oh!

Mryange commented Apr 16, 2026

Uh oh!

github-actions bot commented Apr 16, 2026

Uh oh!

Mryange commented Apr 16, 2026

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Mryange commented Apr 16, 2026

Uh oh!

Mryange commented Apr 16, 2026

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Uh oh!

Mryange commented Apr 16, 2026

Uh oh!

hello-stephen commented Apr 16, 2026

BE UT Coverage Report

Uh oh!

hello-stephen commented Apr 16, 2026

BE Regression && UT Coverage Report

Uh oh!

Mryange commented Apr 16, 2026

Uh oh!

Mryange commented Apr 16, 2026

Uh oh!

hello-stephen commented Apr 16, 2026

BE Regression && UT Coverage Report

Uh oh!

Mryange commented Apr 16, 2026

Uh oh!

hello-stephen commented Apr 16, 2026

BE Regression && UT Coverage Report

Uh oh!

hello-stephen commented Apr 16, 2026

BE UT Coverage Report

Uh oh!

hello-stephen commented Apr 16, 2026

BE Regression && UT Coverage Report

Uh oh!

Mryange commented Apr 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Mryange commented Apr 16, 2026 •

edited

Loading