Skip to content

[fix](user_var)fix integer typing and prefer Variable.realExpression for argument/type resolution#62524

Draft
starocean999 wants to merge 5 commits intoapache:masterfrom
starocean999:master_0415
Draft

[fix](user_var)fix integer typing and prefer Variable.realExpression for argument/type resolution#62524
starocean999 wants to merge 5 commits intoapache:masterfrom
starocean999:master_0415

Conversation

@starocean999
Copy link
Copy Markdown
Contributor

@starocean999 starocean999 commented Apr 15, 2026

PR fixes two bugs:
(1) user/session variable integer typing was wrong when converted to Nereids literals;
(2) Ensure ExpressionTrait resolves argument expressions and types from a Variable's real expression when present, and falls back to the Variable itself when not. This prevents generating incorrect function signatures using Variable placeholders.

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@starocean999
Copy link
Copy Markdown
Contributor Author

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking findings:

  1. fe/fe-core/src/main/java/org/apache/doris/qe/ConnectContext.java: getLiteralForUserVar() still mishandles LARGEINT. SetUserDefinedVarOp.run() stores user variables via value.toLegacyLiteral(), and a Nereids LargeIntLiteral becomes org.apache.doris.analysis.LargeIntLiteral, not IntLiteral. The new case LARGEINT under literalExpr instanceof IntLiteral is therefore unreachable. A real user variable like SET @v = 9223372036854775808 still falls through to the final else and is converted to a string literal, so the advertised integer-typing fix is incomplete.
  2. fe/fe-core/src/test/java/org/apache/doris/nereids/rules/analysis/UserVariableAnalysisTest.java: testUserVarIntegerTypeBig() only covers BIGINT (Long.MAX_VALUE), so the current test suite does not prove the LARGEINT case above.

Critical checkpoint conclusions:

  • Goal of the task: Partially achieved. The early variable-replacement/type-coercion path looks fixed and has unit/regression coverage, but the integer-typing fix is incomplete because LARGEINT user variables are still converted incorrectly.
  • Small/clear/focused: Yes. The PR is narrowly scoped to FE user-variable analysis and tests.
  • Concurrency: Not applicable in the touched paths. No new shared-state or locking changes were introduced.
  • Lifecycle/static initialization: Not applicable.
  • Configuration changes: None.
  • Incompatible protocol/storage changes: None.
  • Parallel code paths: Checked the analysis, constant-folding, and SQL-cache variable paths. No other missing mirror path was found beyond the remaining LARGEINT legacy-literal conversion gap.
  • Special conditional checks: No blocking issue here.
  • Test coverage: Insufficient for the LARGEINT path; the new tests only prove INT/BIGINT typing and the early replacement fix.
  • Modified test results: The added regression output is plausible for the timing/coercion fix. No obvious .out mismatch was found from inspection.
  • Observability: Not needed for this change.
  • Transaction/persistence: Not applicable.
  • Data writes/modifications: Not applicable.
  • FE-BE transmitted variables: Not applicable.
  • Performance: No material concern in the touched code.
  • Other issues: None beyond the blocking correctness gap and its missing test coverage.

import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collections;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking: this LARGEINT arm is unreachable in the actual user-variable flow. SetUserDefinedVarOp.run() stores values via value.toLegacyLiteral(), and a Nereids LargeIntLiteral becomes org.apache.doris.analysis.LargeIntLiteral, not IntLiteral. That means SET @v = 9223372036854775808 still falls through to the final else here and is converted to a string literal instead of a Nereids LargeIntLiteral. Please handle legacy LargeIntLiteral explicitly (or reuse Literal.fromLegacyLiteral(literalExpr, literalExpr.getType()) for the numeric branches).`

}

@Test
public void testUserVarIntegerTypeBig() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Long.MAX_VALUE is still a legacy IntLiteral with BIGINT type, so this test never exercises the newly added LARGEINT path. Please add a case using a real legacy org.apache.doris.analysis.LargeIntLiteral (for example "9223372036854775808"), otherwise the remaining bug in getLiteralForUserVar() will not be caught.

@starocean999
Copy link
Copy Markdown
Contributor Author

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One blocking regression found.

  1. fe/fe-core/src/main/java/org/apache/doris/nereids/rules/analysis/ExpressionAnalyzer.java: visitUnboundVariable() now returns variable.getRealExpression() during analysis, so analyzed trees no longer contain Variable nodes at all. That breaks at least one existing semantic check: generated-column validation explicitly rejects variables only after analysis in CreateTableInfo.checkExpressionInGeneratedColumn() (fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/info/CreateTableInfo.java:1190-1195,1256-1261). With this patch, a generated column like c AS (@x + 1) is analyzed into a literal expression before that check runs, so the prohibition is silently bypassed. This is a real behavioral regression, not just a missed test, because the old flow kept Variable until the later VariableToLiteral rule.

Critical checkpoints:

  • Goal / correctness: Partially accomplished. The PR addresses user-variable typing and early type coercion, but it also introduces a generated-column correctness regression, so the overall goal is not safely met. Existing tests do not cover this path.
  • Scope / minimality: Not minimal enough. The fix changes the general analysis-time semantics of UnboundVariable, which affects unrelated callers beyond the reported bug.
  • Concurrency: No meaningful new concurrency concerns in the touched code.
  • Lifecycle: No special lifecycle/static-init issues identified.
  • Config: No config changes.
  • Compatibility: No FE/BE protocol or storage compatibility changes identified.
  • Parallel paths: Yes, there was already a parallel VariableToLiteral path; changing the earlier analysis path without revalidating all downstream Variable consumers caused the regression above.
  • Special conditions/comments: The added defensive null-check is harmless, but the behavior change needs a stronger invariant explanation because it affects analysis semantics globally.
  • Test coverage: Insufficient. Added FE/regression tests cover the motivating bug, but there is no negative test for contexts that must still reject variables after analysis, such as generated columns.
  • Test results: New regression output looks consistent for the added query.
  • Observability: No new observability needs for this small FE logic change.
  • Transaction/persistence: Not applicable.
  • Data writes/modifications: The generated-column regression can persist invalid metadata expressions, so this area is affected indirectly and needs correction before merge.
  • FE/BE variable passing: Not applicable.
  • Performance: No meaningful performance concern from the intended fix itself.
  • Other issues: None blocking beyond the semantic regression above.

Please keep Variable nodes available until all semantic validations that rely on them have run, or move/duplicate those validations to the parsed-expression phase before replacing variables with literals, and add a regression/unit test for generated-column rejection.

if (sqlCacheContext.isPresent()) {
sqlCacheContext.get().addUsedVariable(variable);
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes UnboundVariable analysis from producing a Variable node to producing its literal value immediately. That fixes the early type-coercion problem, but it also removes Variable from every analyzed tree, which breaks downstream semantic checks that intentionally inspect analyzed expressions for Variable. One concrete case is generated columns: CreateTableInfo analyzes the expression first and only then rejects instanceof Variable (checkExpressionInGeneratedColumn()), so after this change a generated column like c AS (@x + 1) can slip through as a plain literal expression instead of being rejected. The old flow kept the Variable until the later VariableToLiteral rule, so that check still worked. Please either preserve Variable until those validations finish, or move the generated-column variable rejection earlier and add coverage for it.

@starocean999
Copy link
Copy Markdown
Contributor Author

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed commit 97915f88ece394e9b2e95be119015e1c2a3c6fad.

Blocking issues found.

Critical checkpoint conclusions:

  • Goal / correctness: Partially met. The BIGINT/LARGEINT window_funnel case is covered, but the PR still leaves small-width user-variable typing broken and introduces an analyzer regression by erasing Variable nodes too early.
  • Small / focused change: Not fully. The SQL-cache bookkeeping change in ExpressionAnalyzer also changes the analyzed tree shape, which is broader than needed.
  • Concurrency: No concurrency concerns in the touched code.
  • Lifecycle / initialization: No special lifecycle or static-init issues.
  • Config: No new configuration.
  • Compatibility: No FE-BE/protocol/storage format change, but generated-column metadata correctness regresses because forbidden variables can now be folded into persisted expressions.
  • Parallel paths: VariableToLiteral already performs variable-to-literal replacement later in analysis; bypassing it here skips later checks that still expect Variable nodes.
  • Conditional checks: The added defensive null-check is harmless, but it is attached to the broader semantic regression above.
  • Test coverage: Insufficient for the changed behavior. The added tests cover the WINDOW_FUNNEL bigint case and SQL-cache recording, but they miss a negative generated-column case and an exact small-int case such as CURTIME(@p).
  • Test result updates: The updated regression output looks consistent with the covered case.
  • Observability: No additional observability is needed.
  • Transaction / persistence / data writes: A generated column created from a variable expression would persist the wrong analyzed expression, so this is a metadata correctness issue.
  • Performance: No material concerns.

Please address the inline comments before approving.

sqlCacheContext.get().addUsedVariable(variable);
}
}
return variable.getRealExpression();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning the literal here changes the analyzed tree too early. VariableToLiteral already replaces variables later in analysis, but callers like CreateTableInfo.generatedColumnCheck() analyze first and only then reject expr instanceof Variable (fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/info/CreateTableInfo.java:1187-1261). With this change, a generated-column expression containing @x is accepted and the current session value gets baked into the persisted expression instead of being rejected.

Suggested change
return variable.getRealExpression();
return variable;

import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collections;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still widens TINYINT/SMALLINT user variables to INTEGER, so exact-width signatures remain broken. SET @p = 3 stores a TINYINT via LogicalPlanBuilder.visitIntegerLiteral() and TinyIntLiteral.toLegacyLiteral(), but this branch still returns Literal.of((int) ...), so SELECT CURTIME(@p) is analyzed as current_time(INT) and fails against the existing current_time(TINYINT) signature in CurrentTime. If the goal is to preserve the original integer type, the TINYINT and SMALLINT cases need explicit byte/short handling too.

@starocean999
Copy link
Copy Markdown
Contributor Author

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found blocking issues in both the implementation and the newly added FE unit tests.

  1. The Variable->literal timing fix is still incomplete. ExpressionTrait now unwraps Variable for callers that go through getArguments*(), but BindExpression still binds functions before the later VariableToLiteral analyzer pass runs. Functions that validate raw child() nodes still hit the old timing bug. A concrete reachable example is SET @p = 3; SELECT utc_time(@p) (also utc_timestamp(@p) and char(@cs, 65)): ExpressionAnalyzer.visitUnboundFunction() calls TypeCoercionUtils.processBoundFunction() immediately, and those implementations still check child(0).isLiteral() / instanceof StringLikeLiteral, so they reject the Variable wrapper before the new helper is ever consulted.
  2. UserVariableAnalysisTest.testUserVarIntegerTypeSmall() asserts the wrong type. new IntLiteral(1L) is a legacy TINYINT, and the new ConnectContext logic now preserves that width, so this assertion expects the pre-fix behavior and will fail once FE UTs are run.
  3. UserVariableAnalysisTest.testVariableRecordedAndTypeCoercion() no longer matches the implementation. The patch removed SQL-cache recording from ExpressionAnalyzer.visitUnboundVariable(); recording now happens in ReplaceVariableByLiteral. This test only calls ExpressionAnalyzer.analyze(...), so usedVariables stays empty and the assertion will fail.

Critical checkpoints

  • Goal of current task: Partially achieved. Integer width preservation and the window_funnel regression path are covered, but the general Variable->literal timing issue is not fully fixed and the added FE UTs do not currently prove the intended behavior.
  • Modification size/focus: Mostly contained, but the cross-cutting ExpressionTrait change still leaves raw-child parallel paths untouched.
  • Concurrency: No concurrency, locking, or lifecycle concerns in the touched code.
  • Configuration / compatibility: No new config, storage-format, or FE-BE compatibility concerns were introduced here.
  • Parallel code paths: Not fully covered; functions that use raw child() / isLiteral() checks bypass the new helpers.
  • Special conditional checks: The integer-width switch itself is straightforward, but the remaining typed/user-variable paths are still uneven.
  • Test coverage: Regression coverage was added for window_funnel, but the new FE unit tests currently contain invalid expectations and do not cover the remaining raw-child function paths.
  • Test result modifications: regression-test/data/query_p0/set/test_user_var.out looks consistent with the added single-row aggregate case.
  • Observability / transaction / persistence / data-write concerns: Not applicable in this PR.

Environment note: I could not execute FE UTs in this runner because thirdparty/installed/bin/protoc is missing, and run-fe-ut.sh stops during generated-source setup.

return children();
ImmutableList.Builder<Expression> arguments = ImmutableList.builder();
for (Expression arg : children()) {
if (arg instanceof Variable && ((Variable) arg).getRealExpression() != null) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only fixes callers that go through getArguments*(), but BindExpression still binds functions before the later VariableToLiteral analyzer pass runs. That means any function that validates raw child() nodes still hits the same timing bug. A concrete reachable example is SET @p = 3; SELECT utc_time(@p) (also utc_timestamp(@p) and char(@cs, 65)): ExpressionAnalyzer.visitUnboundFunction() immediately calls TypeCoercionUtils.processBoundFunction(), and those implementations still check child(0).isLiteral() / instanceof StringLikeLiteral, so they reject the Variable wrapper before this helper is ever consulted. So this fixes helper-based callers like window_funnel, but not the general Variable->literal timing issue described in the PR.

// get nereids literal via ConnectContext helper
Literal l = ConnectContext.get().getLiteralForUserVar("a");
Assertions.assertNotNull(l);
Assertions.assertEquals(IntegerType.INSTANCE, l.getDataType());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new IntLiteral(1L) is a legacy TINYINT, not an INT (IntLiteral.init() picks the smallest fitting width). With the new ConnectContext logic this now becomes a TinyIntLiteral, so this assertion is expecting the pre-fix behavior and will fail once FE UTs are run. If the intent is to verify INT, construct the legacy literal with Type.INT; otherwise assert TinyIntType.

Assertions.assertNotNull(analyzed);

// sql cache context should have recorded the used variable
List<Variable> used = cascadesContext.getStatementContext().getSqlCacheContext().get().getUsedVariables();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test no longer matches the final implementation. The patch removed SQL-cache recording from ExpressionAnalyzer.visitUnboundVariable() and only ReplaceVariableByLiteral calls sqlCacheContext.addUsedVariable(...). ExpressionAnalyzer.analyze(cmp, rewriteContext) never runs that rewrite, so usedVariables stays empty here. To test SQL-cache recording you need to run the analyzer job / VariableToLiteral pass, or explicitly rewrite analyzed with ReplaceVariableByLiteral before asserting.

@starocean999 starocean999 changed the title [fix](user_var)fix integer typing and prevent Variable->literal timing issue in expression analysis [fix](user_var)fix integer typing and prefer Variable.realExpression for argument/type resolution Apr 17, 2026
@starocean999
Copy link
Copy Markdown
Contributor Author

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings:

  1. ConnectContext.getLiteralForUserVar() now depends on the stored legacy literal keeping its original primitive type, but the forwarded FE path still drops that information. FEOpExecutor.getForwardUserVariables() serializes an IntLiteral(Type.BIGINT/INT/SMALLINT/...) as INT_LITERAL with msg.type set, while ConnectProcessor.getLiteralExprFromThrift() reconstructs every INT_LITERAL as new IntLiteral(value) and ignores node.type. A query forwarded through an observer/follower can therefore still turn SET @v = CAST(1 AS BIGINT) into a TINYINT user variable before this new code runs, so the integer-typing bug is only fixed on the local FE path.
  2. The new ExpressionTrait behavior only helps call sites that already go through getArgument*() / getArguments*(). A lot of builtins still inspect raw child() nodes during checkLegalityBeforeTypeCoercion() / computeSignature(), and those checks run before the later VariableToLiteral analyzer rule. For example, SELECT UTC_TIMESTAMP(@p) still fails: visitUnboundVariable() produces Variable(realExpression=literal), visitUnboundFunction() builds UtcTimestamp(Variable), then TypeCoercionUtils.processBoundFunction() calls UtcTimestamp.checkLegalityBeforeTypeCoercion(), which rejects !child(0).isLiteral(). So the PR fixes the window_funnel-style helper path but leaves the same user-variable bug class in other functions.

Critical Checkpoints:

  • Goal of the task: partially met. The PR fixes local user-variable integer typing and helper-based argument/type resolution, and the added tests prove the window_funnel path, but it does not cover forwarded FE execution or functions that still validate raw child() nodes.
  • Scope/minimality: ConnectContext is focused; the ExpressionTrait change is broad and changes semantics for every function helper even though the remaining failures still come from bind-time raw-child checks.
  • Concurrency: no new concurrency-sensitive logic here.
  • Lifecycle/static initialization: not applicable.
  • Configuration: none.
  • Compatibility / parallel paths: not complete. The forwarded FE path for user variables and the raw-child() function-validation path are parallel code paths that remain inconsistent with the new behavior.
  • Special conditional checks: the new integer-type switch is understandable, but it assumes upstream type preservation that is not true on the forwarded path.
  • Test coverage: insufficient for the changed behavior. Local unit/regression coverage exists, but there is no coverage for FE forwarding or other builtin functions that still see Variable wrappers directly.
  • Test result changes: the regression output looks correct for the covered case.
  • Observability: no new observability needs.
  • Transaction/persistence/data-write concerns: not applicable.
  • FE-BE / FE-FE variable passing: incomplete; FE-to-FE forwarding still loses integer subtype information.
  • Performance: no blocking issue found, but the global ExpressionTrait rewrite adds work to frequently used helpers.
  • Other issues: none beyond the blockers above.

Overall opinion: request changes until the forwarded FE path preserves the original integer subtype and the remaining bind-time raw-child validation paths are handled consistently for user variables.

} else if (literalExpr instanceof IntLiteral) {
} else if (literalExpr instanceof IntLiteral || literalExpr instanceof LargeIntLiteral) {
// the value in the IntLiteral should be int, but now is long in old planner literalExpr
// so type coercion to generate right new planner int Literal
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fix still misses the forwarded-FE path. getLiteralForUserVar() now relies on the stored legacy literal keeping its original primitive type, but FEOpExecutor.getForwardUserVariables() serializes all of TINYINT/SMALLINT/INT/BIGINT as INT_LITERAL, and ConnectProcessor.getLiteralExprFromThrift() reconstructs that as new IntLiteral(value) while ignoring node.type. So SET @v = CAST(1 AS BIGINT) executed through an observer/follower still degrades to a TINYINT before this method runs. ExprToThriftVisitor already sends msg.type, so the real fix needs to preserve that type when rebuilding the legacy IntLiteral; otherwise the integer-typing bug remains on forwarded queries.

* getArguments.
*/
default List<Expression> getArguments() {
return children();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unwrapping Variable here only fixes callers that already use getArgument*() / getArguments*(). A lot of function legality/signature code still reads raw child() nodes before the later VariableToLiteral analyzer rule runs, so the same user-variable bug is still present there. Concrete example: SELECT UTC_TIMESTAMP(@p) with @p = 3 still fails, because visitUnboundVariable() returns Variable(realExpression=literal), visitUnboundFunction() builds UtcTimestamp(Variable), and then TypeCoercionUtils.processBoundFunction() immediately calls UtcTimestamp.checkLegalityBeforeTypeCoercion(), which rejects !child(0).isLiteral(). The same pattern exists in utc_time, uniform, width_bucket, etc. I think this needs to be fixed earlier in the bind/type-check path, or those raw-child checks need to be updated as well; otherwise this patch only fixes the window_funnel-style subset of functions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants