[SPARK-56939][SQL] Resolve deadlock between USE and function lookup#55977
[SPARK-56939][SQL] Resolve deadlock between USE and function lookup#55977srielau wants to merge 2 commits into
Conversation
### What changes were proposed in this pull request? Break the SessionCatalog/CatalogManager lock-order inversion that can deadlock concurrent `USE SCHEMA` / `USE CATALOG` and unqualified function resolution on the same SparkSession. - `CatalogManager.setCurrentNamespace` / `setCurrentCatalog`: snapshot the dispatch decision under the manager's intrinsic lock, run the `v1SessionCatalog` callbacks OUTSIDE the lock, then publish the new state under the lock again. This stops the "manager lock then catalog lock" arm of the cycle. - Add `CatalogManager.sessionFunctionKindsForUnqualifiedResolution()` that snapshots `(currentCatalog, currentNamespace, sessionPath)` in a single critical section. The `v1SessionCatalog.getCurrentDatabase` read needed for the default-namespace fallback is taken BEFORE the manager lock, so the helper never re-introduces the deadlock cycle while still avoiding torn-state observations under racing path updates. - Route `SessionCatalog.sessionFunctionKindsInResolutionOrder` and `FunctionResolution.isSessionBeforeBuiltinInPath` through that single helper, so the lookup loop and the `session-before-builtin` predicate share one consistent snapshot. - Tighten the doc comments on the affected methods to document the locking contract and prevent future regressions. ### Why are the changes needed? After SPARK-56750 wired `CatalogManager` into `SessionCatalog` as the live source for path-driven session function kinds, two paths form a lock-order inversion: - Arm 1 (`SessionCatalog.synchronized` -> `CatalogManager.synchronized`): unqualified function resolution evaluating the live PATH reaches into `CatalogManager.sqlResolutionPathEntries` (which synchronizes on the manager) while holding the catalog's intrinsic lock at peer call sites. - Arm 2 (`CatalogManager.synchronized` -> `SessionCatalog.synchronized`): `setCurrentNamespace` / `setCurrentCatalog` hold the manager's lock and then call back into `v1SessionCatalog.setCurrentDatabase*`, which synchronizes on `SessionCatalog`. Two threads sharing a SparkSession -- one running any SQL with an unqualified function reference, the other running `USE SCHEMA` / `USE CATALOG` -- can wedge on each other's intrinsic locks. The hazard is independent of `spark.sql.functionResolution.sessionOrder`: Arm 1 still acquires the manager lock just to read what the order is, and Arm 2 has nothing to do with order at all. ### Does this PR introduce _any_ user-facing change? No. This is a concurrency fix; serial behavior is unchanged. The only observable difference under contention is that the session no longer deadlocks. ### How was this patch tested? - New regression test in `SetPathSuite`, `SPARK-56939: concurrent USE SCHEMA and unqualified function lookups do not deadlock`, that follows the JIRA's exact repro: one thread alternates `USE SCHEMA s1` / `USE SCHEMA s2`, another runs unqualified `count(*)` queries. Without the fix the threads hang on each other's intrinsic locks; with the fix the test completes within the 30s budget. - `build/sbt 'sql/testOnly *SetPathSuite'` -- 60/60 pass. - `build/sbt 'catalyst/testOnly *SessionCatalogSuite *CatalogManagerSuite *FunctionResolution*'` -- 119/119 pass. - `build/sbt 'sql/testOnly *SQLFunctionSuite *SQLViewSuite'` -- 70/70 pass. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Cursor / Claude Opus 4.7
cloud-fan
left a comment
There was a problem hiding this comment.
Summary
Prior state and problem. Two paths between SessionCatalog and CatalogManager form a lock-order inversion on a shared SparkSession:
- Arm 2:
CatalogManager.setCurrentNamespace/setCurrentCatalogheld the manager's intrinsic lock and then called intov1SessionCatalog.setCurrentDatabase*, which takesSessionCatalog.synchronized. - Arm 1: peer call sites that hold
SessionCatalog.synchronizedreachCatalogManager.sqlResolutionPathEntries, which issynchronizedon the manager.
lookupBuiltinOrTempFunction was already kept un-synchronized (with an explicit comment at SessionCatalog.scala:2574) precisely because Arm 1 was anticipated — but Arm 2 was not. Two threads sharing a session — one running USE SCHEMA, the other any unqualified function reference — could wedge on each other's intrinsic locks, as reproduced in SPARK-56939.
Design approach. Snapshot-then-act split-lock pattern:
- For
setCurrentNamespace/setCurrentCatalog: snapshot the dispatch decision under the CM lock, run the v1 callback outside the lock, then re-acquire the lock to publish. - A new helper
sessionFunctionKindsForUnqualifiedResolutionconsolidates the lookup-side snapshot into a single critical section so the path-driven kinds order (a) doesn't re-enterSessionCatalogunder the manager lock and (b) doesn't observe a torn (catalog, namespace) tuple. It deliberately readsv1.getCurrentDatabasebefore the CM lock so the helper itself doesn't deadlock. - The two callers (
SessionCatalog.sessionFunctionKindsInResolutionOrderandFunctionResolution.isSessionBeforeBuiltinInPath) are routed through the helper so the predicate and the lookup loop share one snapshot.
Key design decisions made by this PR.
setCurrentCatalognow performsv1.setCurrentDatabase(default)before publishing_currentCatalogName(previously after). This trades pre-PR atomicity for slightly more robust error semantics (arequireDbExistsfailure no longer leaves CM with a half-published new catalog name).- The helper always reads
v1.getCurrentDatabaseeven when the value will be unused (when an explicit_currentNamespaceis present). Lifting the read inside the lock would re-introduce the deadlock cycle, as the comment notes.
Implementation sketch. Touches three files: CatalogManager (new helper + split locks on setCurrent*), SessionCatalog (route through the helper, with locking-contract doc), FunctionResolution (route through the helper). One regression test in SetPathSuite.
This is a clean, well-scoped concurrency fix and the new regression test is faithful to the JIRA repro. None of the items below are blockers; the inline comments are tightening, and the general observations are mostly forward-looking guards.
General observations (not anchored to changed lines)
-
Concurrent
USE CATALOG+USE SCHEMAsemantics. The PR description's "serial behavior is unchanged" is accurate, but the split-lock pattern admits new concurrent interleavings that the pre-PR atomic version did not. Two examples worth thinking through:- During
setCurrentCatalog, betweenv1.setCurrentDatabase(default)(L400) and the publish (L401–404), a reader ofcurrentNamespaceobserves(oldCatalog, v1.currentDb=default). If the old catalog is the session catalog, the user's previous namespace is briefly invisible — concurrent SQL analysis could miss relations in the prior namespace. The comment at L398–399 frames the new ordering as strictly better; in fairness it should acknowledge this symmetric torn-state window. - In
setCurrentNamespace, theisSessionsnapshot at L143 can drift ifsetCurrentCatalogswitches to a v2 catalog before the v1 callback runs —v1.setCurrentDatabaseWithNameCheckwould then still stompv1.currentDbeven though the active catalog is no longer session. Long-term state stays consistent (a later switch back to the session catalog resetsv1.currentDbtodefault), but the intermediate observation is novel.
These are arguably acceptable concurrency-mode trade-offs given the deadlock alternative, but the PR description currently addresses only the serial case.
- During
-
CatalogManager.reset()(L414) still holds CM lock acrossv1SessionCatalog.setCurrentDatabase— the same lock-order inversion the PR removes fromsetCurrent*. Tests-only andprivate[sql], so practically harmless, but the asymmetry leaves the contract uneven. Either apply the same split-lock pattern or document why reset is exempt. -
currentPathString(L235) still nests CM → SC. It'ssynchronizedon CM and transitively callsv1SessionCatalog.getCurrentDatabaseviacurrentNamespace. Safe today only because no current code path holds SC and then waits for CM. Worth a one-line doc note that this is the only remaining intentional CM→SC nest, so future changes that introduce SC→CM ordering elsewhere don't unintentionally resurrect the deadlock involvingCURRENT_PATH()readers. -
Stale references in
SessionCatalog. The comments atSessionCatalog.scala:2574–2578and:2731–2734citesetCurrentNamespaceas the deadlock counter-party. Post-PR that exact scenario no longer exists. The non-synchronization is still defensible as a forward guard; one option is to refresh the comments to reflect the new invariant (e.g., "kept non-synchronized so the SPARK-56939 CM↔SC ordering invariant survives") rather than describing the now-removed deadlock.
| * Snapshot the live PATH-derived [[SessionCatalog.SessionFunctionKind]] order used by | ||
| * unqualified function/table-function resolution. | ||
| * | ||
| * The kinds list is computed by reading the current catalog, current namespace, and the |
There was a problem hiding this comment.
Minor: this docstring states the (catalog, namespace, path) triple is observed atomically, but v1CurrentDb (read at L299) is intentionally outside the lock. The kinds extraction only consumes system.builtin / system.session entries, so a torn v1 read can't affect the result — but readers tracing this code may not see why the staleness is harmless. One extra clause acknowledging that v1's contribution to effectiveNs is intentionally racy (and explaining why it doesn't matter for the kinds list) would head off confusion.
There was a problem hiding this comment.
Good catch. Reworked the docstring (commit ecf913a) to spell out the contract: the kinds extraction in systemFunctionKindsFromPath only retains literal system.builtin / system.session entries, so even if v1CurrentDb lags by one USE SCHEMA, a stale CURRENT_SCHEMA expansion can change the path content but never the kinds list that this helper returns. Added that justification + a forward pointer to the deadlock cycle so future readers don't lift the v1 read inside the lock.
| if (needsSwitch) { | ||
| // Reset the current database of v1 `SessionCatalog` when switching current catalog, so that | ||
| // when we switch back to session catalog, the current namespace definitely is ["default"]. | ||
| // Run this BEFORE publishing the new catalog name so that if a reader observes the new |
There was a problem hiding this comment.
This justification only tells half the story. "If a reader observes the new catalog, the v1 state is already consistent with it" is true, but the symmetric case is now allowed: while the new name is not yet published, a concurrent reader sees (oldCatalog, v1.currentDb=default) — a torn state the pre-PR atomic version forbade. If the old catalog was the session catalog (the common case), the user's previous namespace is briefly invisible to that reader. Worth mentioning the trade-off so a future maintainer doesn't read this comment and conclude the new ordering is strictly better.
There was a problem hiding this comment.
Fair point -- the prior comment only told the half-story that the new ordering forbids. Expanded it (ecf913a) to also acknowledge the symmetric window: between v1.setCurrentDatabase(default) and the publish, a concurrent reader of currentNamespace sees (oldCatalog, v1.currentDb = default), which when the old catalog was the session catalog briefly hides the user's previous namespace. Framed both halves as a concurrency trade-off against the deadlock alternative so a future maintainer can't read the comment as "strictly better." Did the same for setCurrentNamespace's drifted isSession snapshot.
| barrier.await() | ||
| var i = 0 | ||
| while (i < budget && errors.isEmpty) { | ||
| sql(if ((i % 2) == 0) "USE SCHEMA spark_56939_s1" else "USE SCHEMA spark_56939_s2") |
There was a problem hiding this comment.
Both setCurrentNamespace and setCurrentCatalog were rewritten with the same split-lock pattern, but only USE SCHEMA is exercised here. A symmetric thread that toggles USE CATALOG between spark_catalog and a registered v2 catalog would close coverage of the setCurrentCatalog arm and guard against asymmetric regressions in one path or the other.
There was a problem hiding this comment.
Added a third thread (ecf913a) that toggles USE spark_56939_testcat <-> USE spark_catalog, exercising the setCurrentCatalog arm symmetrically with the existing USE SCHEMA thread. Registered an InMemoryCatalog under spark_56939_testcat in the test setup. The schema-thread now tolerates NoSuchNamespaceException since the catalog-thread can switch to the v2 testcat (which doesn't have those schemas) -- that's an expected interleaving outcome, not a deadlock symptom. Test completes in ~5s on the default config.
Aside on the syntax: USE CATALOG <name> isn't a real Spark SQL form; the grammar has just USE identifierReference and resolves a single identifier to a catalog if one is registered. So the test uses USE $v2Catalog / USE spark_catalog (parser-accepted) while still describing the operation as "USE CATALOG" in the test name/comments since that is the conceptual operation.
|
|
||
| assert(!useThread.isAlive, | ||
| "USE SCHEMA thread did not finish; lock-order inversion between SessionCatalog and " + | ||
| "CatalogManager likely returned (SPARK-56939).") |
There was a problem hiding this comment.
Minor wording: "likely returned" reads ambiguously (it can be parsed transitively). Suggest "likely regressed" — and the same change at L1045.
| "CatalogManager likely returned (SPARK-56939).") | |
| "CatalogManager likely regressed (SPARK-56939).") |
| "CatalogManager likely returned (SPARK-56939).") | ||
| assert(!lookupThread.isAlive, | ||
| "Lookup thread did not finish; lock-order inversion between SessionCatalog and " + | ||
| "CatalogManager likely returned (SPARK-56939).") |
There was a problem hiding this comment.
Same as above:
| "CatalogManager likely returned (SPARK-56939).") | |
| "CatalogManager likely regressed (SPARK-56939).") |
Per @cloud-fan's review on PR apache#55977: - Clarify the locking contract on `CatalogManager.sessionFunctionKindsForUnqualifiedResolution`: the `v1CurrentDb` read is intentionally racy w.r.t. a concurrent `USE SCHEMA`, and `systemFunctionKindsFromPath` only retains `system.builtin` / `system.session` entries -- so a stale `v1CurrentDb` cannot affect the returned kinds list. Move that justification into the method doc. - Expand the `setCurrentCatalog` and `setCurrentNamespace` comments to acknowledge the new concurrent torn-state windows the split-lock pattern admits (briefly invisible old namespace during a catalog switch; drifted `isSession` snapshot if a peer switches catalogs mid-call). Frame these explicitly as trade-offs against the deadlock alternative rather than unconditional improvements. - Apply the same split-lock pattern to `CatalogManager.reset()` so the locking contract is uniform across every CM mutator that calls back into `v1SessionCatalog`. `reset` is `private[sql]` and test-only today, but keeping the contract symmetric prevents future test helpers from reintroducing the SPARK-56939 cycle. - Document `currentPathString` as the only remaining intentional `CatalogManager -> SessionCatalog` nest in `CatalogManager`. It is safe today because the SPARK-56939 fix removed every reverse ordering; future changes that introduce a new SC->CM ordering must take this nest into account. - Refresh the doc comments on `SessionCatalog.lookupBuiltinOrTempTableFunction` and `lookupFunctionInfo` so they describe the post-SPARK-56939 invariant (catalog lock must never be held while reaching into `CatalogManager` from a function-resolution path) instead of citing the now-removed `setCurrentNamespace` counter-party. - Extend the regression test with a third thread that toggles between the session catalog and a registered v2 catalog, exercising the `setCurrentCatalog` arm symmetrically with the `setCurrentNamespace` arm. The `useSchemaThread` tolerates `NoSuchNamespaceException`: when the catalog thread happens to switch to the v2 testcat first, the v1 schemas don't exist there; that is an expected interleaving outcome, not a deadlock symptom. - Minor wording: "likely returned" -> "likely regressed" in the deadlock assertion messages. ### How was this patch tested? - `build/sbt 'sql/testOnly *SetPathSuite'` -- 60/60 pass (the extended regression test completes in ~5s). - `build/sbt 'catalyst/testOnly *SessionCatalogSuite *CatalogManagerSuite *FunctionResolution*'` -- 119/119 pass. - `build/sbt 'sql/testOnly *SQLFunctionSuite *SQLViewSuite'` -- 70/70 pass. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Cursor / Claude Opus 4.7
|
Thanks for the thorough review @cloud-fan! Pushed a followup commit (ecf913a) that addresses every item, including the three general observations from the review summary that weren't anchored to specific lines:
The regression test was extended with a third All suites green locally: |
What changes were proposed in this pull request?
Break the
SessionCatalog/CatalogManagerlock-order inversion that can deadlock concurrentUSE SCHEMA/USE CATALOGand unqualified function resolution on the sameSparkSession.CatalogManager.setCurrentNamespace/setCurrentCatalog: snapshot the dispatch decision under the manager's intrinsic lock, run thev1SessionCatalogcallbacks outside the lock, then publish the new state under the lock again. This stops the "manager lock then catalog lock" arm of the cycle.CatalogManager.sessionFunctionKindsForUnqualifiedResolution()that snapshots(currentCatalog, currentNamespace, sessionPath)in a single critical section. Thev1SessionCatalog.getCurrentDatabaseread needed for the default-namespace fallback is taken before the manager lock, so the helper never re-introduces the deadlock cycle while still avoiding torn-state observations under racing path updates.SessionCatalog.sessionFunctionKindsInResolutionOrderandFunctionResolution.isSessionBeforeBuiltinInPaththrough that single helper, so the lookup loop and thesession-before-builtinpredicate share one consistent snapshot.Why are the changes needed?
After SPARK-56750 wired
CatalogManagerintoSessionCatalogas the live source for path-driven session function kinds, two paths form a lock-order inversion:SessionCatalog.synchronized->CatalogManager.synchronized): unqualified function resolution evaluating the live PATH reaches intoCatalogManager.sqlResolutionPathEntries(which synchronizes on the manager) while holding the catalog's intrinsic lock at peer call sites.CatalogManager.synchronized->SessionCatalog.synchronized):setCurrentNamespace/setCurrentCataloghold the manager's lock and then call back intov1SessionCatalog.setCurrentDatabase*, which synchronizes onSessionCatalog.Two threads sharing a
SparkSession-- one running any SQL with an unqualified function reference, the other runningUSE SCHEMA/USE CATALOG-- can wedge on each other's intrinsic locks. The hazard is independent ofspark.sql.functionResolution.sessionOrder: Arm 1 still acquires the manager lock just to read what the order is, and Arm 2 has nothing to do with order at all. See SPARK-56939 for a standalone repro.Does this PR introduce any user-facing change?
No. This is a concurrency fix; serial behavior is unchanged. The only observable difference under contention is that the session no longer deadlocks.
How was this patch tested?
SetPathSuite,SPARK-56939: concurrent USE SCHEMA and unqualified function lookups do not deadlock, that follows the JIRA's exact repro: one thread alternatesUSE SCHEMA s1/USE SCHEMA s2, another runs unqualifiedcount(*)queries. Without the fix the threads hang on each other's intrinsic locks; with the fix the test completes within the 30s budget.build/sbt 'sql/testOnly *SetPathSuite'-- 60/60 pass.build/sbt 'catalyst/testOnly *SessionCatalogSuite *CatalogManagerSuite *FunctionResolution*'-- 119/119 pass.build/sbt 'sql/testOnly *SQLFunctionSuite *SQLViewSuite'-- 70/70 pass.Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor / Claude Opus 4.7