Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
15 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -459,6 +459,23 @@ trait CheckAnalysis extends LookupCatalog with QueryErrorsBase with PlanToString
messageParameters = Map("name" -> "IDENTIFIER", "expr" -> p.identifierExpr.sql)
)

case c: CacheTableAsSelect if c.tempViewName.resolved =>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The if c.tempViewName.resolved guard was added in commit 90f09a9 specifically to keep this invariant check from pre-empting the proper UNRESOLVED_COLUMN user-facing error when IDENTIFIER(<unresolved-col>) itself fails to resolve. The comment documents the intent well, but there's no test that pins this behavior — a future change that drops the guard (e.g. case c: CacheTableAsSelect => without the guard) would silently swap the user-facing error for a SparkException internal error, with nothing in CI to catch it.

Worth adding a regression test alongside the SPARK-46625 cache-table tests in ParametersSuite. Rough shape:

test("SPARK-46625: CACHE TABLE IDENTIFIER(<unresolved-col>) reports UNRESOLVED_COLUMN, not an internal error") {
  val ex = intercept[AnalysisException] {
    spark.sql(
      """CACHE TABLE IDENTIFIER(unresolved_col) AS SELECT 1 AS a""".stripMargin)
  }
  assert(ex.getCondition.startsWith("UNRESOLVED_COLUMN"))
  // and explicitly NOT the internal-error condition the invariant case throws
  assert(!ex.getMessage.contains("CacheTableAsSelect.tempViewName must be"))
}

That pins both halves of the design — the guard fires for unresolved IDENTIFIERs (so the catch-all LogicalPlan case below produces UNRESOLVED_COLUMN), AND the case below it stays the catch-all, not pre-empted by this branch.

// The parser builds `tempViewName` as either a `Literal[StringType]` (for direct
// identifiers and `IDENTIFIER('literal')`) or an `ExpressionWithUnresolvedIdentifier`
// that resolves to such a Literal. Validate the post-analysis shape so any future
// construction path that violates the invariant fails loudly here, not deep inside
// execution via `tempViewNameString`. The `resolved` guard ensures that when the
// IDENTIFIER expression itself failed to resolve (e.g. `IDENTIFIER(<unresolved-col>)`),
// we fall through to the catch-all `LogicalPlan` case so the user sees the proper
// `UNRESOLVED_COLUMN` error rather than an internal error.
c.tempViewName match {
case Literal(value, _: StringType) if value != null => // OK
case other =>
throw SparkException.internalError(
"CacheTableAsSelect.tempViewName must be a non-null string literal after " +
s"analysis, but got: ${other.sql}")
}

case operator: LogicalPlan =>
operator transformExpressionsDown {
case hof: HigherOrderFunction if hof.arguments.exists {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,8 +19,9 @@ package org.apache.spark.sql.catalyst.analysis

import scala.collection.mutable

import org.apache.spark.SparkException
import org.apache.spark.sql.catalyst.expressions.{Expression, SubqueryExpression, VariableReference}
import org.apache.spark.sql.catalyst.plans.logical.{CreateView, LogicalPlan}
import org.apache.spark.sql.catalyst.plans.logical.{CreateView, InsertIntoStatement, LogicalPlan, V2WriteCommand}
import org.apache.spark.sql.catalyst.rules.{Rule, RuleExecutor}
import org.apache.spark.sql.catalyst.trees.TreePattern._
import org.apache.spark.sql.errors.QueryCompilationErrors
Expand Down Expand Up @@ -70,6 +71,39 @@ class ResolveIdentifierClause(earlyBatches: Seq[RuleExecutor[LogicalPlan]#Batch]

executor.execute(p.planBuilder.apply(
IdentifierResolution.evalIdentifierExpr(p.identifierExpr), p.children))
// `InsertIntoStatement.table` and `V2WriteCommand.table` are non-child LogicalPlan slots
// (`child = query`), so the standard `resolveOperatorsUp` traversal never visits
// placeholders inside them. Materialize them explicitly. Only `InsertIntoStatement` and
// `OverwriteByExpression` carry a parse-time placeholder today, but matching the
// `V2WriteCommand` trait keeps the rule consistent across the family.
case i: InsertIntoStatement if i.table.isInstanceOf[PlanWithUnresolvedIdentifier] =>
val p = i.table.asInstanceOf[PlanWithUnresolvedIdentifier]
if (p.identifierExpr.resolved && p.childrenResolved) {
if (referredTempVars.isDefined) {
referredTempVars.get ++= collectTemporaryVariablesInLogicalPlan(p)
}
i.copy(table = executor.execute(p.planBuilder.apply(
IdentifierResolution.evalIdentifierExpr(p.identifierExpr), p.children)))
} else {
i
}
case w: V2WriteCommand if w.table.isInstanceOf[PlanWithUnresolvedIdentifier] =>
val p = w.table.asInstanceOf[PlanWithUnresolvedIdentifier]
if (p.identifierExpr.resolved && p.childrenResolved) {
if (referredTempVars.isDefined) {
referredTempVars.get ++= collectTemporaryVariablesInLogicalPlan(p)
}
executor.execute(p.planBuilder.apply(
IdentifierResolution.evalIdentifierExpr(p.identifierExpr), p.children)) match {
case nr: NamedRelation => w.withNewTable(nr)
case other =>
throw SparkException.internalError(
"PlanWithUnresolvedIdentifier in V2WriteCommand.table must materialize " +
s"into a NamedRelation, but got: ${other.getClass.getName}")
}
} else {
w
}
case other =>
other.transformExpressionsWithPruning(_.containsAnyPattern(UNRESOLVED_IDENTIFIER)) {
case e: ExpressionWithUnresolvedIdentifier if e.identifierExpr.resolved =>
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ package org.apache.spark.sql.catalyst.analysis

import org.apache.spark.SparkException
import org.apache.spark.sql.catalyst.expressions.{Expression, LeafExpression, SubqueryExpression, Unevaluable}
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, SupervisingCommand}
import org.apache.spark.sql.catalyst.plans.logical.{InsertIntoStatement, LogicalPlan, SupervisingCommand, V2WriteCommand}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.catalyst.trees.TreePattern.{COMMAND, PARAMETER, PARAMETERIZED_QUERY, TreePattern, UNRESOLVED_WITH}
import org.apache.spark.sql.errors.{QueryCompilationErrors, QueryErrorsBase}
Expand Down Expand Up @@ -179,9 +179,30 @@ object BindParameters extends Rule[LogicalPlan] with QueryErrorsBase {
p0.resolveOperatorsDownWithPruning(_.containsPattern(PARAMETER) && !stop) {
case p1 =>
stop = p1.isInstanceOf[ParameterizedQuery]
p1.transformExpressionsWithPruning(_.containsPattern(PARAMETER)) (f orElse {
case sub: SubqueryExpression => sub.withNewPlan(bind(sub.plan)(f))
})
// `InsertIntoStatement.table` and `V2WriteCommand.table` are non-child LogicalPlan
// slots, so the standard `resolveOperatorsDown` traversal never visits parameter
// markers inside them. Recurse explicitly so `INSERT ... IDENTIFIER(:p)` and
// `INSERT INTO IDENTIFIER(:p) REPLACE WHERE ...` resolve under the legacy
// parameter-substitution mode (SPARK-46625). Today only the `OverwriteByExpression`
// variant of `V2WriteCommand` is parser-built with a placeholder in `table`; the trait
// match keeps the rule consistent for any future analyzer-built node in the same shape.
val withBoundTable = p1 match {
case i: InsertIntoStatement if i.table.containsPattern(PARAMETER) =>
i.copy(table = bind(i.table)(f))
case w: V2WriteCommand if w.table.containsPattern(PARAMETER) =>
bind(w.table)(f) match {
case nr: NamedRelation => w.withNewTable(nr)
case other =>
throw SparkException.internalError(
"Parameter binding on V2WriteCommand.table must preserve " +
s"NamedRelation, but got: ${other.getClass.getName}")
}
case other => other
}
withBoundTable.transformExpressionsWithPruning(_.containsPattern(PARAMETER)) (
f orElse {
case sub: SubqueryExpression => sub.withNewPlan(bind(sub.plan)(f))
})
}
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -59,19 +59,35 @@ trait UnresolvedUnaryNode extends UnaryNode with UnresolvedNode
/**
* A logical plan placeholder that holds the identifier clause string expression. It will be
* replaced by the actual logical plan with the evaluated identifier string.
*
* Extends `NamedRelation` so it can occupy a `NamedRelation`-typed slot (e.g.
* `OverwriteByExpression.table`) directly at parse time, instead of wrapping the whole command.
*
* The parser always places this node inside the command's identifier slot (a child slot for
* DELETE/UPDATE/MERGE/CTAS/RTAS, or a non-child slot for `InsertIntoStatement.table` and
* `OverwriteByExpression.table` -- handled via explicit cases in `ResolveIdentifierClause` and
* `BindParameters`). It is never the substitution root of a `WITH ... <command>` subtree, so
* `CTEInChildren` semantics are not needed: any surrounding `WithCTE` produced by
* `CTESubstitution` targets the inner command directly.
*/
case class PlanWithUnresolvedIdentifier(
identifierExpr: Expression,
children: Seq[LogicalPlan],
planBuilder: (Seq[String], Seq[LogicalPlan]) => LogicalPlan)
extends UnresolvedNode {
extends UnresolvedNode with NamedRelation {

def this(identifierExpr: Expression, planBuilder: Seq[String] => LogicalPlan) = {
this(identifierExpr, Nil, (ident, _) => planBuilder(ident))
}

final override val nodePatterns: Seq[TreePattern] = Seq(PLAN_WITH_UNRESOLVED_IDENTIFIER)

// Placeholder name used by error paths that render `NamedRelation.name` for an unresolved
// table reference -- e.g. `SparkStrategies.extractTableNameForError` and the `r: NamedRelation`
// fallback in `QueryCompilationErrors`. Renders as the SQL text of the identifier expression
// (e.g. `IDENTIFIER(:p)` or `concat('a', 'b')`) so error messages remain informative.
override def name: String = identifierExpr.sql
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PlanWithUnresolvedIdentifier is now reachable as a NamedRelation via two error paths I traced: SparkStrategies.extractTableNameForError (sql/core/.../SparkStrategies.scala:1135) and the r: NamedRelation => toSQLId(r.name) fallback in QueryCompilationErrors.scala:3635. Both will render the SQL text of the unresolved expression (e.g. IDENTIFIER(:p) or concat('a', 'b')) as the "table name" in error messages. That's a reasonable fallback, but worth a one-line comment on this override noting that identifierExpr.sql is intentionally an unresolved-placeholder stand-in for those error paths.


override protected def withNewChildrenInternal(
newChildren: IndexedSeq[LogicalPlan]): LogicalPlan =
copy(identifierExpr, newChildren, planBuilder)
Expand Down
Loading