[AOTI-CUDA] qwen3.5-35B-A3B prefill int8 perf through MoE #18949

@digantdesai

Description

Take the fused MoE op from
bf16 @ (int4 --> bf16) = fp32
to
(bf16 --> int8) @ (int4 --> int8) = int32
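
For illustration only, here is a minimal pure-Python sketch of the two dtype flows for a single expert matmul. The issue only states the dtype transitions, so everything else here is an assumption: the symmetric per-row activation scales, the per-output-channel weight scales, and the function names (`quantize_rows_int8`, `expert_matmul_int8`, `expert_matmul_reference`) are all hypothetical and not taken from the AOTI-CUDA kernel. Python floats stand in for bf16/fp32 and Python ints stand in for int8/int32.

```python
def quantize_rows_int8(x):
    """Symmetric per-row quantization of activations to the int8 range (assumed scheme)."""
    q_rows, scales = [], []
    for row in x:
        amax = max(abs(v) for v in row)
        scale = amax / 127.0 if amax > 0 else 1.0
        # Round to nearest and clamp to [-127, 127].
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def matmul(a, b):
    """Plain a[M][K] @ b[K][N]; works for both float and integer inputs."""
    k_dim, n_dim = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(k_dim)) for j in range(n_dim)]
            for row in a]


def expert_matmul_reference(x, w_int4, w_scales):
    """Current flow: bf16 @ (int4 --> bf16) = fp32. Dequantize weights, then float matmul."""
    w_deq = [[w * w_scales[j] for j, w in enumerate(row)] for row in w_int4]
    return matmul(x, w_deq)


def expert_matmul_int8(x, w_int4, w_scales):
    """Proposed flow: (bf16 --> int8) @ (int4 --> int8) = int32, then rescale."""
    xq, x_scales = quantize_rows_int8(x)           # bf16 --> int8 (lossy)
    wq = [[int(w) for w in row] for row in w_int4]  # int4 --> int8 (lossless widening)
    acc = matmul(xq, wq)                            # integer accumulation (int32 on device)
    # Fold both scales back in to recover a float output.
    return [[acc[i][j] * x_scales[i] * w_scales[j] for j in range(len(w_scales))]
            for i in range(len(acc))]


# Tiny example: 2 tokens, K=3, N=2; int4 weights live in [-8, 7].
x = [[1.0, -2.0, 0.5], [0.0, 4.0, -1.0]]
w_int4 = [[1, -8], [7, 2], [-3, 0]]
w_scales = [0.1, 0.05]

ref = expert_matmul_reference(x, w_int4, w_scales)
out = expert_matmul_int8(x, w_int4, w_scales)
max_abs_err = max(abs(r - o) for rr, oo in zip(ref, out) for r, o in zip(rr, oo))
```

In this sketch the int4 to int8 widening is exact, so the only extra error relative to the reference comes from quantizing the activations; `max_abs_err` gives a quick way to eyeball that on a small input.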

Metadata

Assignees

Labels

No labels

Projects

Status

In progress

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
