[CALCITE-7422] Support large plan optimization mode for HepPlanner #4803

Open

zhuwenzhuang wants to merge 5 commits into apache:main from zhuwenzhuang:large_plan_mode

Conversation

@zhuwenzhuang (Contributor) commented Feb 24, 2026

Motivation and details: https://issues.apache.org/jira/browse/CALCITE-7422
Before:
LargePlanBenchmark:100 : 1s
LargePlanBenchmark:1000 : 9s
LargePlanBenchmark:10000 : too slow to measure.

CPU profiler result (fired-rules cache enabled, large plan mode disabled):
fireRule(RelOptRuleCall ruleCall) takes 2% of CPU time.
Memory profiler result (fired-rules cache enabled, large plan mode disabled): see attached screenshot.

After (fired-rules cache and large plan mode enabled):
LargePlanBenchmark:100 : 1s
LargePlanBenchmark:1000 : 2s
LargePlanBenchmark:10000 : about 60s

Benchmark                              (unionNum)  Mode  Cnt  Score      Error  Units
LargePlanBenchmark.testLargeUnionPlan         100  avgt         256.561         ms/op
LargePlanBenchmark.testLargeUnionPlan        1000  avgt        1616.421         ms/op
LargePlanBenchmark.testLargeUnionPlan       10000  avgt       53393.727         ms/op
CPU profiler result:
fireRule(RelOptRuleCall ruleCall) takes 11% of CPU time. There is still a lot of room for CPU optimization.

Memory profiler result:
(avoids buildListRecurse/collectGarbage; smaller peak memory size)
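As a rough illustration of the fired-rules cache idea (a sketch with invented names, not Calcite's actual implementation), a planner can remember which (vertex version, rule) pairs it has already tried and skip re-firing a rule on an unchanged vertex:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a fired-rules cache: a rule fires at most once per
// vertex version; rewriting a vertex bumps its version and re-enables rules.
public class FiredRulesCache {
  // Key = (vertex identity, vertex version, rule name).
  private final Set<String> fired = new HashSet<>();

  private static String key(int vertexId, long vertexVersion, String ruleName) {
    return vertexId + ":" + vertexVersion + ":" + ruleName;
  }

  /** Returns true the first time a (vertex-version, rule) pair is seen. */
  public boolean shouldFire(int vertexId, long vertexVersion, String ruleName) {
    return fired.add(key(vertexId, vertexVersion, ruleName));
  }

  public static void main(String[] args) {
    FiredRulesCache cache = new FiredRulesCache();
    // First attempt fires; the retry on the same version is skipped.
    System.out.println(cache.shouldFire(1, 0, "UnionMergeRule")); // true
    System.out.println(cache.shouldFire(1, 0, "UnionMergeRule")); // false
    // After the vertex is rewritten (version bump), the rule may fire again.
    System.out.println(cache.shouldFire(1, 1, "UnionMergeRule")); // true
  }
}
```

The cache trades a small amount of memory per (vertex, rule) attempt for skipping repeated, fruitless rule invocations, which is where the CPU savings above come from.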

@zhuwenzhuang zhuwenzhuang force-pushed the large_plan_mode branch 6 times, most recently from d87d57a to d4ae34e Compare February 24, 2026 15:35
@zhuwenzhuang zhuwenzhuang marked this pull request as draft February 24, 2026 15:38
@zhuwenzhuang zhuwenzhuang force-pushed the large_plan_mode branch 2 times, most recently from 6369ba1 to 6d9ab05 Compare February 25, 2026 06:12
@zhuwenzhuang (Contributor, Author)

A better iterator implementation of DFS/BFS is needed. I will optimize this later (unavailable for the next two weeks).

@zhuwenzhuang zhuwenzhuang changed the title [CALCITE-7422] Support large plan optimizaion mode for HepPlanner [CALCITE-7422] Support large plan optimization mode for HepPlanner Mar 1, 2026
@zhuwenzhuang zhuwenzhuang force-pushed the large_plan_mode branch 7 times, most recently from 4e77147 to 3870056 Compare March 12, 2026 09:46
@zhuwenzhuang (Contributor, Author) commented Mar 12, 2026

After the default depth-first iterator was replaced by HepVertexIterator:

LargePlanBenchmark:10000 takes 3.6s [10,000 unions (40,000 rel nodes), large plan mode + rule cache + ARBITRARY match order]:

1. CPU profiler result:
   1.1. fireRule(RelOptRuleCall ruleCall) takes 70% of CPU.
   1.2. garbageCollection's removeEdge(V source, V target) takes 23% of CPU.

2. Memory profiler result:
   Memory is primarily used by rules, with rare use by the planner/iterator itself.

LargePlanBenchmark:100000 takes 68.741 s [400k rel nodes, same configuration]

All perf results of LargePlanBenchmark:
NOTE: NodeCount/RuleTransforms are estimates derived from the test's scale.

MatchOrder   UnionNum  NodeCount  RuleTransforms  Time (ms)
ARBITRARY       1,000      4,000           6,006      1,043
ARBITRARY       3,000     12,000          18,006      1,306
ARBITRARY      10,000     40,000          60,006      3,655
ARBITRARY      30,000    120,000         180,006     13,040
DEPTH_FIRST     1,000      4,000           6,006        347
DEPTH_FIRST     3,000     12,000          18,006      1,068
DEPTH_FIRST    10,000     40,000          60,006      4,165
DEPTH_FIRST    30,000    120,000         180,006     12,898
BOTTOM_UP       1,000      4,000           6,006      1,145
BOTTOM_UP       3,000     12,000          18,006     10,152
TOP_DOWN        1,000      4,000           6,006      1,193
TOP_DOWN        3,000     12,000          18,006      8,074

@zhuwenzhuang zhuwenzhuang force-pushed the large_plan_mode branch 2 times, most recently from 90c7cd5 to 9ed2932 Compare March 12, 2026 10:31
@zhuwenzhuang zhuwenzhuang marked this pull request as ready for review March 12, 2026 11:02
Key optimizations of large plan mode:

1. Reusable graph: avoid re-initialization.
2. Efficient traversal: skip stable subtrees.
3. Fine-grained garbage collection.

Usage: see the Javadoc of HepPlanner().
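The "skip stable subtree" idea can be sketched roughly as follows (invented names, not Calcite's actual HepVertexIterator): each node records the pass in which it was last rewritten, and a DFS prunes any subtree that is entirely unchanged since the previous pass.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of stable-subtree skipping during plan traversal.
public class StableSubtreeDfs {
  static class Node {
    final String name;
    final List<Node> inputs = new ArrayList<>();
    long lastChangedPass;      // pass number of the last rewrite of this node
    long subtreeChangedPass;   // max lastChangedPass over the whole subtree

    Node(String name, long changedPass) {
      this.name = name;
      this.lastChangedPass = changedPass;
      this.subtreeChangedPass = changedPass;
    }
  }

  /** Visits only nodes in subtrees that changed after {@code lastPass}. */
  static List<String> visitChanged(Node root, long lastPass) {
    List<String> visited = new ArrayList<>();
    Deque<Node> stack = new ArrayDeque<>();
    stack.push(root);
    while (!stack.isEmpty()) {
      Node n = stack.pop();
      if (n.subtreeChangedPass <= lastPass) {
        continue; // whole subtree is stable since the last pass: skip it
      }
      visited.add(n.name);
      for (Node input : n.inputs) {
        stack.push(input);
      }
    }
    return visited;
  }
}
```

In a wide union plan like the benchmark's, most subtrees stabilize after a few passes, so a traversal that prunes them touches only the still-changing fringe instead of all 40k+ nodes per pass.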

Perf result of LargePlanBenchmark:
Match Order     Union Num  Node Count      Rule Transforms Time (ms)
--------------------------------------------------------------------
ARBITRARY       1000       4000            6006            1043
ARBITRARY       3000       12000           18006           1306
ARBITRARY       10000      40000           60006           3655
ARBITRARY       30000      120000          180006          13040
DEPTH_FIRST     1000       4000            6006            347
DEPTH_FIRST     3000       12000           18006           1068
DEPTH_FIRST     10000      40000           60006           4165
DEPTH_FIRST     30000      120000          180006          12898
BOTTOM_UP       1000       4000            6006            1145
BOTTOM_UP       3000       12000           18006           10152
TOP_DOWN        1000       4000            6006            1193
TOP_DOWN        3000       12000           18006           8074
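The "fine-grained GC" point can be sketched like this (an invented illustration, not Calcite's code): when a rewrite detaches a vertex, decrement parent counts from that vertex and free only what became orphaned, instead of running a whole-graph reachability sweep such as collectGarbage after every change.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of fine-grained garbage collection in a plan graph.
public class FineGrainedGc {
  static class Vertex {
    final String name;
    final List<Vertex> inputs = new ArrayList<>();
    int parentCount;
    Vertex(String name) { this.name = name; }
  }

  /** Collects {@code removed} and everything it transitively orphans. */
  static List<String> collect(Vertex removed) {
    List<String> freed = new ArrayList<>();
    Deque<Vertex> worklist = new ArrayDeque<>();
    worklist.push(removed);
    while (!worklist.isEmpty()) {
      Vertex v = worklist.pop();
      freed.add(v.name);
      for (Vertex input : v.inputs) {
        if (--input.parentCount == 0) {
          worklist.push(input); // orphaned by this removal: free it too
        }
      }
    }
    return freed;
  }
}
```

The cost is proportional to the size of the discarded subtree rather than the whole graph, which matches the smaller memory peaks reported above.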

private boolean enableFiredRulesCache = false;

private boolean largePlanMode = false;
Contributor:

Is the optimization somehow worse for small plans?
If it's always faster, there should not be such a flag, so you should document it as "To be removed in the future".
If small plans are worse in this mode, the flag should remain, but it still should be documented somehow (e.g., how to choose whether to set it).

Contributor (Author):

The optimization is always better for plans at any scale.
I will add the comment "To be removed in the future".

Member:

I'm not very familiar with this optimization; I only took a quick look, but it seems to be quite effective. I do have one concern, though: should we add a specific test to verify the correctness of the execution plan? Alternatively, could we simply remove the largePlanMode configuration entirely (or set it to true for the purpose of comparative testing) and let all existing test cases validate the plan's correctness?

Contributor (Author):

1. LargePlanBenchmark includes a test that compares rule match counts across the different match orders; it is a specific test to verify correctness.
2. I think this also depends on how the community views the specific compatibility impact of this change. The optimization still guarantees that planning does not stop until no more rules can be matched.
   However, if a user has hacky rules that rely on the traversal order strictly following a fixed pattern, the optimization result may change for those users. I believe such users are actually using HepPlanner incorrectly. If everyone thinks the risk of enabling this by default is acceptable, I would prefer to turn this optimization on by default.

Contributor (Author):

And when the type digest flag is false (the default), the main performance bottleneck is type digest hashing and comparison.
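One common way to blunt a digest hashing/comparison bottleneck (a generic illustration, not Calcite's implementation) is to intern digest strings so equal digests share one instance, making comparison an identity check and letting String's cached hashCode be computed once:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: canonicalize digest strings. Java's String caches its
// hashCode after the first computation, so one shared instance per distinct
// digest avoids re-hashing equal-but-distinct strings on every comparison.
public class DigestInterner {
  private final Map<String, String> pool = new ConcurrentHashMap<>();

  public String intern(String digest) {
    return pool.computeIfAbsent(digest, d -> d);
  }
}
```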

Member:

Thank you for the explanation. Regarding the first question, my concern is whether a single case can effectively cover a wide range of scenarios. As for the second question, while it is true that HepPlanner supports several traversal strategies, I believe that the current optimization should not interfere with these traversal methods—since they are currently supported, we must ensure backward compatibility. If a specific traversal strategy were to be adversely affected, shouldn't we consider disabling this optimization in such instances? I would appreciate it if you could let me know whether my understanding is correct.

Contributor (Author):

1. Maybe I can run all tests with large plan mode/rule cache/type digest enabled, and see the results.
2. Not adversely affected. If someone uses the match limit and match order to do match-order-sensitive work, it might break.

Member:

For the first point, I'm concerned that a one-time review of the results doesn't establish a standardized process for future iterations. For the second point, I want to ask whether this optimization can avoid these issues, or if it can automatically fall back to a degraded mode with certain constraints when users employ such methods. Is this feasible to implement?

Member:

I agree with @xiedeyantu for 1., it's OK for now but unfortunately it won't protect from future regressions.

For 2., I think I got what you mean, but if you could provide a concrete example, it would help making sure we are aligned, backwards compatibility is a major concern for libraries like Calcite.

Contributor (Author):

Thank you for the review. @xiedeyantu @asolimando

I added regression coverage for HepPlanner-related cases.
I also added a concrete compatibility-sensitive case in HepPlannerTest.

@mihaibudiu (Contributor):

If you make changes, please use new commits for now.

- Add Javadoc for largePlanMode field with "To be removed in the future" note
- Rename clear() to clearRules() for clarity in multiphase optimization
- Update usage example to use clearRules() and fix "graph is reused" to "graph is preserved"
- Remove UnsupportedOperationException in findBestExp() for large plan mode
- Fix comment about garbage collection in getGraphIterator()
- Add comment about caching fired rule before constructing HepRuleCall
- Fix "different with" to "different from" in assertion error messages
@mihaibudiu (Contributor) left a comment:

I have approved, but please consider adding the two extra comments I have suggested.

@mihaibudiu mihaibudiu added the LGTM-will-merge-soon Overall PR looks OK. Only minor things left. label Mar 18, 2026