
@dxqb dxqb commented Dec 21, 2025

What does this PR do?

Most recent models (Qwen, Chroma, Z-Image, ...) use variable-length captions and therefore require attention masking when a batch with size > 1 mixes captions of different lengths.
torch SDPA only uses its internal flash-attention kernel when no attention mask is passed. Otherwise it falls back to a different kernel that is significantly slower, especially at long sequence lengths.
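For illustration only (not part of the PR), a minimal sketch that makes the fallback visible: timing torch SDPA on a padded batch with and without a boolean key-padding mask. Shapes, dtypes, and sequence lengths are assumptions chosen for the example.

```python
# Sketch: compare SDPA with and without an attention mask on a padded batch.
# Assumed shapes/lengths; run on a CUDA GPU with bf16 support.
import torch
import torch.nn.functional as F

def time_sdpa(q, k, v, mask=None, iters=20):
    # crude CUDA timing helper (one warmup call, then averaged events)
    F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
    torch.cuda.synchronize()
    start, end = torch.cuda.Event(enable_timing=True), torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # ms per call

B, H, S, D = 2, 24, 8192, 128
q = torch.randn(B, H, S, D, device="cuda", dtype=torch.bfloat16)
k, v = torch.randn_like(q), torch.randn_like(q)

# boolean key-padding mask broadcast to (B, 1, 1, S): True = keep token
lengths = torch.tensor([8192, 2048], device="cuda")
keep = torch.arange(S, device="cuda")[None, :] < lengths[:, None]
mask = keep[:, None, None, :]

print("no mask   :", time_sdpa(q, k, v), "ms")         # flash kernel eligible
print("with mask :", time_sdpa(q, k, v, mask), "ms")   # falls back to a slower kernel
```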

This PR implements an attention backend that splits the attention batch into individual samples. Even though attention then has to be called once per sample, this is still faster than masked attention (tested up to batch size 8).
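A rough sketch of the idea (under assumed shapes, not the PR's actual code): run unmasked SDPA once per sample, trimmed to that sample's true sequence length, instead of one masked call over the whole batch.

```python
# Sketch: per-sample unmasked attention instead of one masked batched call.
import torch
import torch.nn.functional as F

def split_attention(q, k, v, seq_lens):
    # q, k, v: (batch, heads, seq, dim); seq_lens: per-sample valid lengths
    out = torch.zeros_like(q)
    for i, n in enumerate(seq_lens):
        out[i : i + 1, :, :n] = F.scaled_dot_product_attention(
            q[i : i + 1, :, :n], k[i : i + 1, :, :n], v[i : i + 1, :, :n]
        )  # no attn_mask -> SDPA is free to pick its flash kernel
    return out
```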

This PR also lays the groundwork for efficiently using "flash varlen" and other varlen attention backends, which are already implemented but not in an efficient way (see code comment).
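For context, a hedged sketch of the varlen calling convention that flash-attn exposes (`flash_attn.flash_attn_varlen_func`; the exact signature may vary by version): tokens from all samples are packed into one `(total_tokens, heads, dim)` tensor and boundaries are passed as cumulative sequence lengths, so no per-element mask is ever materialized. The helper below is purely illustrative.

```python
# Sketch: packing variable-length samples for a varlen attention call.
# Requires CUDA tensors in fp16/bf16 and the flash-attn package.
import torch
from flash_attn import flash_attn_varlen_func

def varlen_attention(q_list, k_list, v_list):
    # each list entry: (seq_i, heads, dim) for one sample; seq_i may differ
    lens = torch.tensor([q.shape[0] for q in q_list], dtype=torch.int32, device="cuda")
    cu_seqlens = torch.nn.functional.pad(lens.cumsum(0, dtype=torch.int32), (1, 0))
    q, k, v = (torch.cat(t, dim=0) for t in (q_list, k_list, v_list))
    return flash_attn_varlen_func(
        q, k, v,
        cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
        max_seqlen_q=int(lens.max()), max_seqlen_k=int(lens.max()),
    )
```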

This PR is based on @kashif's and @cdutr's work in this PR: #12702
To see only my changes, relative to their PR, follow this link: https://github.com/dxqb/diffusers/pull/4/files

Benchmarks

Training benchmarks using OneTrainer; training at higher resolutions benefits in particular:
[training benchmark chart]

Inference benchmark using the diffusers Qwen example script (but with regional compilation):
Inference benefits when comparing apples to apples, i.e. batch size 2 for CFG. However, the current pipeline already avoids attention masks by calling the transformer twice at batch size 1, so the practical improvement for inference is small:
[inference benchmark chart]

Who can review?

@yiyixuxu and @sayakpaul
CC @kashif and @cdutr for feedback

Contains #12892.

kashif and others added 30 commits November 23, 2025 18:02
- Remove seq_lens parameter from dispatch_attention_fn
- Update varlen backends to extract seqlens from masks (see the seqlens sketch after this commit list)
- Update QwenImage to pass 2D joint_attention_mask
- Fix native backend to handle 2D boolean masks
- Fix sage_varlen seqlens_q to match seqlens_k for self-attention

Note: sage_varlen is still producing black images; needs further investigation
Enhances documentation with comprehensive performance insights for the QwenImage pipeline
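As context for the commit notes above, a hedged illustration of the seqlens-from-mask step they describe: given a 2D boolean padding mask of shape `(batch, seq)` with `True` marking valid tokens, per-sample lengths and varlen-style `cu_seqlens` follow directly. The values are made up for the example; this is not the PR's code.

```python
# Sketch: deriving per-sample lengths and cu_seqlens from a 2D boolean mask.
import torch

mask = torch.tensor([[True, True, True, False],
                     [True, True, False, False]])
seqlens = mask.sum(dim=1, dtype=torch.int32)                                  # tensor([3, 2])
cu_seqlens = torch.nn.functional.pad(seqlens.cumsum(0, dtype=torch.int32), (1, 0))
print(seqlens, cu_seqlens)  # [3, 2] and [0, 3, 5]
```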
dxqb changed the title from "Split attention backend" to "Split attention backends" on Dec 26, 2025
dxqb commented Dec 26, 2025

  • added backends to split flash attention (usage sketch below). This is especially useful for Windows users, because native torch SDPA does not use flash attention internally on Windows
  • tried to add Z-Image support but ran into this issue: Z-Image text sequence length issue #12893
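A hedged usage sketch of selecting an attention backend on a diffusers model: `set_attention_backend` exists in recent diffusers versions with the attention dispatcher, but the backend string added by this PR (shown here as "flash_split") is an assumption; check the PR diff for the actual name.

```python
# Sketch: picking an attention backend on the transformer of a diffusers pipeline.
import torch
from diffusers import QwenImagePipeline

pipe = QwenImagePipeline.from_pretrained(
    "Qwen/Qwen-Image", torch_dtype=torch.bfloat16
).to("cuda")
pipe.transformer.set_attention_backend("flash_split")  # hypothetical backend id
```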
