[VL] Reduce Velox hash shuffle partition buffer memory by evicting large partitions after split#12089
wankunde wants to merge 4 commits into
Conversation
Run Gluten Clickhouse CI on x86
Can you explain a bit more about this? According to the code logic, https://github.com/apache/gluten/blob/main/cpp/velox/shuffle/VeloxHashShuffleWriter.cc#L1389-L1391
When the shuffle writer is in
marin-ma left a comment
Thanks. Some minor comments on the configuration.
.createWithDefault(0.25)
val COLUMNAR_SHUFFLE_EVICT_PARTITION_SIZE =
  buildConf("spark.gluten.sql.columnar.shuffle.evictPartitionSize")
I would suggest using `spark.gluten.sql.columnar.shuffle.partitionBufferEvictThreshold` for this configuration.
  "For Velox hash shuffle writer, evict partition buffers larger than this threshold " +
  "after splitting an input batch.")
.intConf
.createWithDefault(256 * 1024)
Can you set the default value to -1 to disable this eviction behaviour by default?
| spark.gluten.sql.columnar.shuffle.codecBackend | &lt;undefined&gt; | |
| spark.gluten.sql.columnar.shuffle.compression.threshold | 100 | If the number of rows in a batch falls below this threshold, copy all buffers into one buffer before compressing. |
| spark.gluten.sql.columnar.shuffle.dictionary.enabled | false | Enable dictionary encoding in hash-based shuffle. |
| spark.gluten.sql.columnar.shuffle.evictPartitionSize | 262144 | For the Velox hash shuffle writer, evict partition buffers larger than this threshold after splitting an input batch. |
Please also update the documentation.
LGTM. cc @FelixYBW Do you have any comments?
What changes are proposed in this pull request?
Why is this PR needed?
In the `doSplit()` method, the VeloxHashShuffleWriter reclaims memory when memory is full, but the `evictPartitionBuffersMinSize()` method is skipped because of the `splitState_ == kInit` condition. As a result, even when there are large shuffle partition buffers, we cannot evict them.

Changes in this PR:
This change adds a configurable threshold for the Velox hash shuffle writer to evict large partition buffers immediately after splitting an input batch.
When enabled, the hash shuffle writer estimates each partition buffer’s allocated capacity after split and evicts partitions whose buffered data exceeds the threshold.
This reduces retained partition buffer memory before processing the next input RowVector.
The new config is:
spark.gluten.sql.columnar.shuffle.evictPartitionSize
Default value: 256 * 1024 bytes.
How was this patch tested?
This change fixed some OOM cases in our production environment.
Was this patch authored or co-authored using generative AI tooling?
NO