refactor(workflow-operator): remove dead MySQLSource exec path#6052
refactor(workflow-operator): remove dead MySQLSource exec path#6052Ma77Ball wants to merge 4 commits into
Conversation
Automated Reviewer SuggestionsBased on the
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #6052 +/- ##
============================================
- Coverage 56.89% 56.67% -0.22%
- Complexity 3058 3064 +6
============================================
Files 1129 1124 -5
Lines 43802 43296 -506
Branches 4743 4666 -77
============================================
- Hits 24922 24540 -382
+ Misses 17449 17357 -92
+ Partials 1431 1399 -32
*This pull request uses carry forward flags. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
| config | throughput | MB/s | latency | max Δ latest / 7d | |
|---|---|---|---|---|---|
| 🔴 | bs=10 sw=10 sl=64 | 398 | 0.243 | 25,903/29,977/29,977 us | 🟢 -14.5% / 🔴 +102.8% |
| 🟢 | bs=100 sw=10 sl=64 | 810 | 0.495 | 121,069/145,525/145,525 us | 🟢 -17.7% / 🔴 +34.3% |
| ⚪ | bs=1000 sw=10 sl=64 | 917 | 0.56 | 1,094,605/1,118,945/1,118,945 us | ⚪ within ±5% / 🔴 +9.7% |
Baseline details
Latest main 24b587f from same runner
| config | metric | PR | latest main | 7d avg | Δ latest | Δ 7d |
|---|---|---|---|---|---|---|
| bs=10 sw=10 sl=64 | throughput | 398 tuples/sec | 415 tuples/sec | 770.95 tuples/sec | -4.1% | -48.4% |
| bs=10 sw=10 sl=64 | MB/s | 0.243 MB/s | 0.253 MB/s | 0.471 MB/s | -4.0% | -48.4% |
| bs=10 sw=10 sl=64 | p50 | 25,903 us | 23,405 us | 12,775 us | +10.7% | +102.8% |
| bs=10 sw=10 sl=64 | p95 | 29,977 us | 35,073 us | 15,286 us | -14.5% | +96.1% |
| bs=10 sw=10 sl=64 | p99 | 29,977 us | 35,073 us | 18,795 us | -14.5% | +59.5% |
| bs=100 sw=10 sl=64 | throughput | 810 tuples/sec | 797 tuples/sec | 976.93 tuples/sec | +1.6% | -17.1% |
| bs=100 sw=10 sl=64 | MB/s | 0.495 MB/s | 0.487 MB/s | 0.596 MB/s | +1.6% | -17.0% |
| bs=100 sw=10 sl=64 | p50 | 121,069 us | 119,606 us | 102,557 us | +1.2% | +18.1% |
| bs=100 sw=10 sl=64 | p95 | 145,525 us | 176,847 us | 108,383 us | -17.7% | +34.3% |
| bs=100 sw=10 sl=64 | p99 | 145,525 us | 176,847 us | 115,249 us | -17.7% | +26.3% |
| bs=1000 sw=10 sl=64 | throughput | 917 tuples/sec | 930 tuples/sec | 1,009 tuples/sec | -1.4% | -9.1% |
| bs=1000 sw=10 sl=64 | MB/s | 0.56 MB/s | 0.568 MB/s | 0.616 MB/s | -1.4% | -9.1% |
| bs=1000 sw=10 sl=64 | p50 | 1,094,605 us | 1,073,709 us | 997,695 us | +1.9% | +9.7% |
| bs=1000 sw=10 sl=64 | p95 | 1,118,945 us | 1,111,462 us | 1,036,731 us | +0.7% | +7.9% |
| bs=1000 sw=10 sl=64 | p99 | 1,118,945 us | 1,111,462 us | 1,069,334 us | +0.7% | +4.6% |
Raw CSV
config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,502.55,200,128000,398,0.243,25902.68,29976.70,29976.70
1,100,10,64,20,2468.46,2000,1280000,810,0.495,121068.96,145525.00,145525.00
2,1000,10,64,20,21805.50,20000,12800000,917,0.560,1094605.49,1118944.75,1118944.75
aglinxinyuan
left a comment
There was a problem hiding this comment.
Why don't we remove Desc as well?
There was a problem hiding this comment.
Pull request overview
This PR cleans up the deprecated MySQL source operator implementation under common/workflow-operator by removing unreachable execution/JDBC code while keeping a minimal descriptor so legacy workflows containing a MySQLSource node can still deserialize.
Changes:
- Deleted the dead execution path (
MySQLSourceOpExec) and JDBC helper (MySQLConnUtil) for the deprecated MySQL source operator. - Converted
MySQLSourceOpDesc.getPhysicalOpinto an explicit non-executable stub by throwingUnsupportedOperationException. - Updated logical-op type registration and unit tests to reflect “deserialization-only” retention and the new non-executable behavior.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/source/sql/mysql/MySQLSourceOpExec.scala | Removed unreachable MySQL source executor implementation. |
| common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/source/sql/mysql/MySQLConnUtil.scala | Removed MySQL JDBC URL/connection helper no longer usable without the driver. |
| common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/source/sql/mysql/MySQLSourceOpDesc.scala | Made descriptor explicitly non-executable by throwing in getPhysicalOp. |
| common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/LogicalOp.scala | Added an inline note that MySQLSource type registration is retained only for legacy deserialization. |
| common/workflow-operator/src/test/scala/org/apache/texera/amber/operator/source/sql/mysql/MySQLSourceOpDescSpec.scala | Updated tests to assert getPhysicalOp throws for the non-executable stub. |
| common/workflow-operator/src/test/scala/org/apache/texera/amber/operator/source/sql/mysql/MySQLConnUtilSpec.scala | Removed tests for the deleted JDBC utility. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| override def getPhysicalOp( | ||
| workflowId: WorkflowIdentity, | ||
| executionId: ExecutionIdentity | ||
| ): PhysicalOp = | ||
| PhysicalOp | ||
| .sourcePhysicalOp( | ||
| workflowId, | ||
| executionId, | ||
| this.operatorIdentifier, | ||
| OpExecWithClassName( | ||
| "org.apache.texera.amber.operator.source.sql.mysql.MySQLSourceOpExec", | ||
| objectMapper.writeValueAsString(this) | ||
| ) | ||
| ) | ||
| .withInputPorts(operatorInfo.inputPorts) | ||
| .withOutputPorts(operatorInfo.outputPorts) | ||
| .withPropagateSchema( | ||
| SchemaPropagationFunc(_ => Map(operatorInfo.outputPorts.head.id -> sourceSchema())) | ||
| ) | ||
| throw new UnsupportedOperationException("MySQL Source operator is no longer executable.") | ||
|
|
| "MySQLSourceOpDesc.getPhysicalOp" should | ||
| "wire the MySQL exec as a source op with no input port and one output port" in { | ||
| "throw because the operator is no longer executable" in { | ||
| val d = new MySQLSourceOpDesc | ||
| val physical = d.getPhysicalOp(DEFAULT_WORKFLOW_ID, DEFAULT_EXECUTION_ID) | ||
| physical.opExecInitInfo match { | ||
| case OpExecWithClassName(className, _) => | ||
| className shouldBe "org.apache.texera.amber.operator.source.sql.mysql.MySQLSourceOpExec" | ||
| case other => fail(s"expected OpExecWithClassName, got $other") | ||
| } | ||
| physical.inputPorts.keySet shouldBe empty | ||
| physical.outputPorts.keySet shouldBe d.operatorInfo.outputPorts.map(_.id).toSet | ||
| an[UnsupportedOperationException] should be thrownBy | ||
| d.getPhysicalOp(DEFAULT_WORKFLOW_ID, DEFAULT_EXECUTION_ID) |
|
wait wait I thought we temporarily disabled mysql op due to lib license issue, and pending for adding it back. See this discussion for details #4393 (comment) @bobbai00 we really should have left an issue about the next step for mysql operator.
@aglinxinyuan it is for backward compatibility. we almost cannot remove an operator desc currently, removing any would cause existing workflow in a incorrect state. for mysql, if we plan to have a different exec implementation (using another lib or jdbc), then it will be good to keep desc for now. |
|
I may have mistakenly thought that we wanted to remove this MySQL support. Let's go with the result of the discussion #4393 (comment). |
|
@Yicong-Huang, should I remove this PR or keep it open (it will remove the excess code left by MySQL)? I can open the issue afterward. However, if you believe this code will be reused in the future, I can remove the pr. |
What changes were proposed in this PR?
MySQLSourceOpExecandMySQLConnUtil, the executor and JDBC connection helper for the deprecated MySQL source operator, which have been unreachable dead code since themysql-connector-javadriver was dropped in chore(dependencies): dropmysql-connector-java#4386.MySQLSourceOpDescto a deserialization-only stub:getPhysicalOpnow throwsUnsupportedOperationExceptioninstead of wiring the deleted executor, and theMySQLConnUtil-basedestablishConnoverride is removed, while its fields,operatorInfo, andupdatePortremain so legacy workflows still deserialize.LogicalOptype registration noting the type is retained non-executable for legacy-workflow deserialization only.MySQLSourceOpDescSpec(flip thegetPhysicalOpcase to assert it throws, keep the deserialization coverage) and deleteMySQLConnUtilSpecalong with the util it covered.Any related issues, documentation, discussions?
Closes: #6051
How was this PR tested?
sbt "WorkflowOperator/testOnly *MySQLSourceOpDescSpec"and expect 5 tests passing, including the flipped case assertinggetPhysicalOpthrowsUnsupportedOperationExceptionand the round-trip test confirming aMySQLSourcenode still deserializes into aMySQLSourceOpDesc.Was this PR authored or co-authored using generative AI tooling?
Co-authored with Claude Opus 4.8 in compliance with ASF