[CALCITE-7422] Support large plan optimization mode for HepPlanner #4803
zhuwenzhuang wants to merge 5 commits into apache:main
Conversation
A better iterator implementation of DFS/BFS is needed. I will optimize this later (unavailable for the next two weeks).
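One common way to get a DFS iterator without recursion is an explicit stack. The sketch below is not the Calcite implementation, only a minimal illustration of the idea; the class name, the integer vertex type, and the adjacency-map representation are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

/** Minimal non-recursive DFS iterator over an adjacency list (illustrative only). */
public class DfsIterator implements Iterator<Integer> {
  private final Map<Integer, List<Integer>> adj;
  private final Deque<Integer> stack = new ArrayDeque<>();
  private final Set<Integer> visited = new HashSet<>();

  public DfsIterator(Map<Integer, List<Integer>> adj, int root) {
    this.adj = adj;
    stack.push(root);
  }

  @Override public boolean hasNext() {
    // Discard stack entries that were already visited via another path.
    while (!stack.isEmpty() && visited.contains(stack.peek())) {
      stack.pop();
    }
    return !stack.isEmpty();
  }

  @Override public Integer next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    int v = stack.pop();
    visited.add(v);
    List<Integer> children = adj.getOrDefault(v, List.of());
    // Push children in reverse so the first child is visited first.
    for (int i = children.size() - 1; i >= 0; i--) {
      if (!visited.contains(children.get(i))) {
        stack.push(children.get(i));
      }
    }
    return v;
  }
}
```

Because the iterator materializes nothing up front, it avoids the per-pass list rebuild that a recursive buildList-style traversal pays.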
Key optimizations of large plan mode:
1. Reusable graph: avoid re-initialization.
2. Efficient traversal: skip stable subtrees.
3. Fine-grained GC.

Usage: see the comments of HepPlanner().

Perf result of LargePlanBenchmark:

Match Order   Union Num   Node Count   Rule Transforms   Time (ms)
-----------------------------------------------------------------
ARBITRARY          1000         4000              6006        1043
ARBITRARY          3000        12000             18006        1306
ARBITRARY         10000        40000             60006        3655
ARBITRARY         30000       120000            180006       13040
DEPTH_FIRST        1000         4000              6006         347
DEPTH_FIRST        3000        12000             18006        1068
DEPTH_FIRST       10000        40000             60006        4165
DEPTH_FIRST       30000       120000            180006       12898
BOTTOM_UP          1000         4000              6006        1145
BOTTOM_UP          3000        12000             18006       10152
TOP_DOWN           1000         4000              6006        1193
TOP_DOWN           3000        12000             18006        8074
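The "skip stable subtree" optimization can be sketched in a few lines: once a pass over a subtree fires no rules, the subtree is marked stable, and later passes skip it until something inside it changes. This is a minimal illustration of the idea, not the actual HepPlanner code; all names are hypothetical, and real parent-pointer invalidation is omitted.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of skipping stable subtrees across repeated rule passes. */
public class StableSkipTraversal {
  static class Node {
    final List<Node> children = new ArrayList<>();
    boolean stable = false;  // true when no rule has fired here or below since the last pass
  }

  /** Visits only subtrees that may still change; returns the number of nodes visited. */
  static int visit(Node node) {
    if (node.stable) {
      return 0;  // whole subtree unchanged since last pass: skip it entirely
    }
    int count = 1;
    for (Node c : node.children) {
      count += visit(c);
    }
    node.stable = true;  // treat as stable until a rule fires in it again
    return count;
  }

  /** Called when a rule fires at this node (ancestor invalidation omitted for brevity). */
  static void markChanged(Node node) {
    node.stable = false;
  }
}
```

The second pass over an unchanged plan then costs O(1) per stable root instead of O(n) over the whole graph, which is where the DEPTH_FIRST numbers in the table above get their near-linear scaling.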
private boolean enableFiredRulesCache = false;

private boolean largePlanMode = false;
Is the optimization somehow worse for small plans?
If it's always faster, there should not be such a flag, so you should document it as "To be removed in the future".
If small plans are worse in this mode, the flag should remain, but it still should be documented somehow (e.g., how to choose whether to set it).
The optimization is always better, for plans at any scale.
I will add the comment "To be removed in the future".
I'm not very familiar with this optimization; I only took a quick look, but it seems to be quite effective. I do have one concern, though: should we add a specific test to verify the correctness of the execution plan? Alternatively, could we simply remove the largePlanMode configuration entirely (or set it to true for the purpose of comparative testing) and let all existing test cases validate the plan's correctness?
- LargePlanBenchmark includes a test that compares the rule match counts of different match orders; it is a specific test to verify correctness.
- I think this also depends on how the community views the specific compatibility impact of this change. The optimization still guarantees that planning does not stop until no more rules can be matched.
However, if a user has some hacky rules that rely on the traversal order strictly following a fixed pattern, the optimization result may change for those users. I believe such users are actually using HepPlanner incorrectly. If everyone thinks the risk of enabling this by default is acceptable, I would prefer to turn this optimization on by default.
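The guarantee "planning does not stop until no more rules can be matched" can be illustrated with a toy rewrite system: for a confluent rule set, the fixpoint is the same whichever match is tried first, while rules that are sensitive to match order are exactly the ones that could observe a different result. This is a hedged sketch with hypothetical names, not Calcite code; the single rule here collapses "aa" to "a".

```java
import java.util.function.UnaryOperator;

/** Toy demonstration that a confluent rule set reaches the same
 * fixpoint regardless of where matches are tried first. */
public class FixpointDemo {
  // Rule "aa" -> "a", applied at the leftmost match.
  static String rewriteLeftmost(String s) {
    int i = s.indexOf("aa");
    return i < 0 ? s : s.substring(0, i) + s.substring(i + 1);
  }

  // The same rule, applied at the rightmost match instead.
  static String rewriteRightmost(String s) {
    int i = s.lastIndexOf("aa");
    return i < 0 ? s : s.substring(0, i) + s.substring(i + 1);
  }

  // Apply one rewrite step until nothing changes any more.
  static String toFixpoint(UnaryOperator<String> step, String s) {
    String prev;
    do {
      prev = s;
      s = step.apply(s);
    } while (!s.equals(prev));
    return s;
  }
}
```

Both strategies normalize "baaab" to "bab"; a rule set without this confluence property is the kind that could break if the traversal order changes, which matches the compatibility concern raised above.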
And if the type digest flag is false by default, the main performance bottleneck is type digest hashing and comparison.
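When digest hashing and comparison dominate, the usual remedy is to compute the digest once per node and cache it, so repeated comparisons of the same subtree avoid re-hashing it. The sketch below is a minimal illustration of that idea with hypothetical names; it is not the actual Calcite type-digest code, and the digest format shown is deliberately simplistic.

```java
/** Illustrative sketch of lazily caching a recursively computed type digest. */
public class DigestCache {
  static int computations = 0;  // counts how many digests were actually computed

  static class TypeNode {
    final String name;
    final TypeNode[] fields;
    private String digest;  // computed on first use, then reused

    TypeNode(String name, TypeNode... fields) {
      this.name = name;
      this.fields = fields;
    }

    String digest() {
      if (digest == null) {
        computations++;
        // Simplistic structural digest: name followed by child digests.
        StringBuilder sb = new StringBuilder(name).append('(');
        for (TypeNode f : fields) {
          sb.append(f.digest()).append(',');
        }
        digest = sb.append(')').toString();
      }
      return digest;
    }
  }
}
```

With sharing, a type referenced from many places is hashed once instead of once per comparison, which is why enabling the digest cache moves fireRule off the profile's hot path.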
Thank you for the explanation. Regarding the first question, my concern is whether a single case can effectively cover a wide range of scenarios. As for the second question, while it is true that HepPlanner supports several traversal strategies, I believe that the current optimization should not interfere with these traversal methods—since they are currently supported, we must ensure backward compatibility. If a specific traversal strategy were to be adversely affected, shouldn't we consider disabling this optimization in such instances? I would appreciate it if you could let me know whether my understanding is correct.
1. Maybe I can run all tests with large plan mode/rule cache/type digest enabled, and see the results.
2. Not adversely affected. If someone uses the match limit and match order to do match-order-sensitive things, it might break.
For the first point, I'm concerned that a one-time review of the results doesn't establish a standardized process for future iterations. For the second point, I want to ask whether this optimization can avoid these issues, or if it can automatically fall back to a degraded mode with certain constraints when users employ such methods. Is this feasible to implement?
I agree with @xiedeyantu for 1., it's OK for now but unfortunately it won't protect from future regressions.
For 2., I think I got what you mean, but if you could provide a concrete example, it would help make sure we are aligned; backwards compatibility is a major concern for libraries like Calcite.
Thank you for the review. @xiedeyantu @asolimando
I added regression coverage for HepPlanner-related cases.
I also added a concrete compatibility-sensitive case in HepPlannerTest.
If you make changes, please use new commits for now.
- Add Javadoc for largePlanMode field with "To be removed in the future" note
- Rename clear() to clearRules() for clarity in multiphase optimization
- Update usage example to use clearRules() and fix "graph is reused" to "graph is preserved"
- Remove UnsupportedOperationException in findBestExp() for large plan mode
- Fix comment about garbage collection in getGraphIterator()
- Add comment about caching fired rule before constructing HepRuleCall
- Fix "different with" to "different from" in assertion error messages
mihaibudiu
left a comment
I have approved, but please consider adding the two extra comments I have suggested.
Motivation and details: https://issues.apache.org/jira/browse/CALCITE-7422
Before:
LargePlanBenchmark:100 : 1s
LargePlanBenchmark:1000 : 9s
LargePlanBenchmark:10000 : pretty slow, cannot measure.
CPU Profiler Result (enable fired rules cache and disable large plan mode):


fireRule(RelOptRuleCall ruleCall) takes 2% of CPU time.

Mem Profiler Result (enable fired rules cache and disable large plan mode):
After (enable fired rules cache and large plan mode):
LargePlanBenchmark:100 : 1s
LargePlanBenchmark:1000 : 2s
LargePlanBenchmark:10000 : about 60s
Benchmark                               (unionNum)  Mode  Cnt      Score  Error  Units
LargePlanBenchmark.testLargeUnionPlan          100  avgt         256.561         ms/op
LargePlanBenchmark.testLargeUnionPlan         1000  avgt        1616.421         ms/op
LargePlanBenchmark.testLargeUnionPlan        10000  avgt       53393.727         ms/op

CPU Profiler Result:
fireRule(RelOptRuleCall ruleCall) takes 11% of CPU time. There is still lots of room for CPU optimization.

Mem Profiler Result:

(avoid buildListRecurse/collectGarbage, smaller memory peak size)