Skip to content

refactor(workflow-operator): remove dead MySQLSource exec path#6052

Open
Ma77Ball wants to merge 4 commits into
apache:mainfrom
Ma77Ball:refactor/mysql-source-retention
Open

refactor(workflow-operator): remove dead MySQLSource exec path#6052
Ma77Ball wants to merge 4 commits into
apache:mainfrom
Ma77Ball:refactor/mysql-source-retention

Conversation

@Ma77Ball

@Ma77Ball Ma77Ball commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

What changes were proposed in this PR?

  • Delete MySQLSourceOpExec and MySQLConnUtil, the executor and JDBC connection helper for the deprecated MySQL source operator, which have been unreachable dead code since the mysql-connector-java driver was dropped in chore(dependencies): drop mysql-connector-java #4386.
  • Reduce MySQLSourceOpDesc to a deserialization-only stub: getPhysicalOp now throws UnsupportedOperationException instead of wiring the deleted executor, and the MySQLConnUtil-based establishConn override is removed, while its fields, operatorInfo, and updatePort remain so legacy workflows still deserialize.
  • Add a comment at the LogicalOp type registration noting the type is retained non-executable for legacy-workflow deserialization only.
  • Update MySQLSourceOpDescSpec (flip the getPhysicalOp case to assert it throws, keep the deserialization coverage) and delete MySQLConnUtilSpec along with the util it covered.

Any related issues, documentation, discussions?

Closes: #6051

How was this PR tested?

  • Run sbt "WorkflowOperator/testOnly *MySQLSourceOpDescSpec" and expect 5 tests passing, including the flipped case asserting getPhysicalOp throws UnsupportedOperationException and the round-trip test confirming a MySQLSource node still deserializes into a MySQLSourceOpDesc.
  • Note: tests were run locally under JDK 17; under JDK 25 the sbt-jacoco 0.8.11 instrumentation step fails on a dependency class before any test runs, which is an environment/tooling limitation unrelated to this change.

Was this PR authored or co-authored using generative AI tooling?

Co-authored with Claude Opus 4.8 in compliance with ASF

@github-actions github-actions Bot added refactor Refactor the code common labels Jul 1, 2026
@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Automated Reviewer Suggestions

Based on the git blame history of the changed files, we recommend the following reviewers:

  • Contributors with relevant context: @PG1204, @bobbai00, @aglinxinyuan
    You can notify them by mentioning @PG1204, @bobbai00, @aglinxinyuan in a comment.

@codecov-commenter

codecov-commenter commented Jul 1, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.67%. Comparing base (24b587f) to head (87fdc15).

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #6052      +/-   ##
============================================
- Coverage     56.89%   56.67%   -0.22%     
- Complexity     3058     3064       +6     
============================================
  Files          1129     1124       -5     
  Lines         43802    43296     -506     
  Branches       4743     4666      -77     
============================================
- Hits          24922    24540     -382     
+ Misses        17449    17357      -92     
+ Partials       1431     1399      -32     
Flag Coverage Δ *Carryforward flag
access-control-service 70.00% <ø> (ø)
agent-service 44.59% <ø> (ø) Carriedforward from 7de1803
amber 58.76% <100.00%> (+0.11%) ⬆️
computing-unit-managing-service 0.00% <ø> (ø)
config-service 52.30% <ø> (ø)
file-service 62.81% <ø> (ø)
frontend 49.42% <ø> (-0.71%) ⬇️ Carriedforward from 7de1803
notebook-migration-service 78.57% <ø> (ø)
pyamber 90.51% <ø> (-0.64%) ⬇️ Carriedforward from 7de1803
python 90.69% <ø> (ø) Carriedforward from 7de1803
workflow-compiling-service 55.14% <ø> (ø)

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

⚠️ Benchmark changes need a look

🟢 4 better · 🔴 1 worse · ⚪ 10 noise (<±5%) · 0 without baseline

Compared against main 24b587f benchmarked on this same runner, so the delta is largely free of cross-runner hardware noise. The "7d avg" column still reflects the gh-pages dashboard. Treat <±5% as noise unless repeated.

Dashboard · Run

config throughput MB/s latency max Δ latest / 7d
🔴 bs=10 sw=10 sl=64 398 0.243 25,903/29,977/29,977 us 🟢 -14.5% / 🔴 +102.8%
🟢 bs=100 sw=10 sl=64 810 0.495 121,069/145,525/145,525 us 🟢 -17.7% / 🔴 +34.3%
bs=1000 sw=10 sl=64 917 0.56 1,094,605/1,118,945/1,118,945 us ⚪ within ±5% / 🔴 +9.7%
Baseline details

Latest main 24b587f from same runner

config metric PR latest main 7d avg Δ latest Δ 7d
bs=10 sw=10 sl=64 throughput 398 tuples/sec 415 tuples/sec 770.95 tuples/sec -4.1% -48.4%
bs=10 sw=10 sl=64 MB/s 0.243 MB/s 0.253 MB/s 0.471 MB/s -4.0% -48.4%
bs=10 sw=10 sl=64 p50 25,903 us 23,405 us 12,775 us +10.7% +102.8%
bs=10 sw=10 sl=64 p95 29,977 us 35,073 us 15,286 us -14.5% +96.1%
bs=10 sw=10 sl=64 p99 29,977 us 35,073 us 18,795 us -14.5% +59.5%
bs=100 sw=10 sl=64 throughput 810 tuples/sec 797 tuples/sec 976.93 tuples/sec +1.6% -17.1%
bs=100 sw=10 sl=64 MB/s 0.495 MB/s 0.487 MB/s 0.596 MB/s +1.6% -17.0%
bs=100 sw=10 sl=64 p50 121,069 us 119,606 us 102,557 us +1.2% +18.1%
bs=100 sw=10 sl=64 p95 145,525 us 176,847 us 108,383 us -17.7% +34.3%
bs=100 sw=10 sl=64 p99 145,525 us 176,847 us 115,249 us -17.7% +26.3%
bs=1000 sw=10 sl=64 throughput 917 tuples/sec 930 tuples/sec 1,009 tuples/sec -1.4% -9.1%
bs=1000 sw=10 sl=64 MB/s 0.56 MB/s 0.568 MB/s 0.616 MB/s -1.4% -9.1%
bs=1000 sw=10 sl=64 p50 1,094,605 us 1,073,709 us 997,695 us +1.9% +9.7%
bs=1000 sw=10 sl=64 p95 1,118,945 us 1,111,462 us 1,036,731 us +0.7% +7.9%
bs=1000 sw=10 sl=64 p99 1,118,945 us 1,111,462 us 1,069,334 us +0.7% +4.6%
Raw CSV
config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,502.55,200,128000,398,0.243,25902.68,29976.70,29976.70
1,100,10,64,20,2468.46,2000,1280000,810,0.495,121068.96,145525.00,145525.00
2,1000,10,64,20,21805.50,20000,12800000,917,0.560,1094605.49,1118944.75,1118944.75

@aglinxinyuan aglinxinyuan requested a review from Copilot July 1, 2026 23:11

@aglinxinyuan aglinxinyuan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we remove Desc as well?

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR cleans up the deprecated MySQL source operator implementation under common/workflow-operator by removing unreachable execution/JDBC code while keeping a minimal descriptor so legacy workflows containing a MySQLSource node can still deserialize.

Changes:

  • Deleted the dead execution path (MySQLSourceOpExec) and JDBC helper (MySQLConnUtil) for the deprecated MySQL source operator.
  • Converted MySQLSourceOpDesc.getPhysicalOp into an explicit non-executable stub by throwing UnsupportedOperationException.
  • Updated logical-op type registration and unit tests to reflect “deserialization-only” retention and the new non-executable behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/source/sql/mysql/MySQLSourceOpExec.scala Removed unreachable MySQL source executor implementation.
common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/source/sql/mysql/MySQLConnUtil.scala Removed MySQL JDBC URL/connection helper no longer usable without the driver.
common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/source/sql/mysql/MySQLSourceOpDesc.scala Made descriptor explicitly non-executable by throwing in getPhysicalOp.
common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/LogicalOp.scala Added an inline note that MySQLSource type registration is retained only for legacy deserialization.
common/workflow-operator/src/test/scala/org/apache/texera/amber/operator/source/sql/mysql/MySQLSourceOpDescSpec.scala Updated tests to assert getPhysicalOp throws for the non-executable stub.
common/workflow-operator/src/test/scala/org/apache/texera/amber/operator/source/sql/mysql/MySQLConnUtilSpec.scala Removed tests for the deleted JDBC utility.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 30 to 35
override def getPhysicalOp(
workflowId: WorkflowIdentity,
executionId: ExecutionIdentity
): PhysicalOp =
PhysicalOp
.sourcePhysicalOp(
workflowId,
executionId,
this.operatorIdentifier,
OpExecWithClassName(
"org.apache.texera.amber.operator.source.sql.mysql.MySQLSourceOpExec",
objectMapper.writeValueAsString(this)
)
)
.withInputPorts(operatorInfo.inputPorts)
.withOutputPorts(operatorInfo.outputPorts)
.withPropagateSchema(
SchemaPropagationFunc(_ => Map(operatorInfo.outputPorts.head.id -> sourceSchema()))
)
throw new UnsupportedOperationException("MySQL Source operator is no longer executable.")

Comment on lines 66 to +70
"MySQLSourceOpDesc.getPhysicalOp" should
"wire the MySQL exec as a source op with no input port and one output port" in {
"throw because the operator is no longer executable" in {
val d = new MySQLSourceOpDesc
val physical = d.getPhysicalOp(DEFAULT_WORKFLOW_ID, DEFAULT_EXECUTION_ID)
physical.opExecInitInfo match {
case OpExecWithClassName(className, _) =>
className shouldBe "org.apache.texera.amber.operator.source.sql.mysql.MySQLSourceOpExec"
case other => fail(s"expected OpExecWithClassName, got $other")
}
physical.inputPorts.keySet shouldBe empty
physical.outputPorts.keySet shouldBe d.operatorInfo.outputPorts.map(_.id).toSet
an[UnsupportedOperationException] should be thrownBy
d.getPhysicalOp(DEFAULT_WORKFLOW_ID, DEFAULT_EXECUTION_ID)
@Yicong-Huang

Yicong-Huang commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

wait wait I thought we temporarily disabled mysql op due to lib license issue, and pending for adding it back.

See this discussion for details #4393 (comment)

@bobbai00 we really should have left an issue about the next step for mysql operator.

Why don't we remove Desc as well?

@aglinxinyuan it is for backward compatibility. we almost cannot remove an operator desc currently, removing any would cause existing workflow in a incorrect state. for mysql, if we plan to have a different exec implementation (using another lib or jdbc), then it will be good to keep desc for now.

@chenlica

chenlica commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

I may have mistakenly thought that we wanted to remove this MySQL support. Let's go with the result of the discussion #4393 (comment).

@Ma77Ball

Ma77Ball commented Jul 2, 2026

Copy link
Copy Markdown
Contributor Author

@Yicong-Huang, should I remove this PR or keep it open (it will remove the excess code left by MySQL)? I can open the issue afterward. However, if you believe this code will be reused in the future, I can remove the pr.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common refactor Refactor the code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove dead MySQLSource exec path and document deserialization-only retention

6 participants