MoE Communicator design doc #818

Draft: Binyang2014 wants to merge 25 commits into feature/ep from binyli/ep

Conversation

Binyang2014 (Contributor) commented Jun 16, 2026

Add API doc for MoE communication

Binyang2014 marked this pull request as ready for review June 16, 2026 15:55
Binyang2014 marked this pull request as draft June 19, 2026 22:19
jeseszhang1010 pushed a commit that referenced this pull request Jun 26, 2026
…high-level API

Unify the high-level MoECommunicator to select its backend from
MoECommunicatorConfig.mode:
- mode="ll": low-latency (EXPERT_MAJOR) via MoERuntime (reused from binyli/ep,
  PR #818). The LL runtime is built lazily, so a build that only binds the HT
  Buffer can still use mode="ht" without MoERuntime being present.
- mode="ht": high-throughput (FLAT) via the DeepEP-style Buffer; intranode vs
  internode is auto-selected from the RDMA buffer-size hint.
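
The mode-based selection and the lazy construction described above can be sketched roughly as follows. This is not the PR's code: `SketchConfig`, `SketchCommunicator`, the factory arguments, and the `rdma_buffer_bytes` field are hypothetical names, and the rule that a nonzero RDMA hint means internode is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class SketchConfig:
    """Hypothetical stand-in for MoECommunicatorConfig; only `mode` comes from the text."""
    mode: str = "ht"            # "ll" (low-latency) or "ht" (high-throughput)
    rdma_buffer_bytes: int = 0  # assumed name for the RDMA buffer-size hint


class SketchCommunicator:
    """Selects a backend from config.mode and builds it only on first use."""

    def __init__(self, config: SketchConfig,
                 ll_factory: Callable[[], Any],
                 ht_factory: Callable[[], Any]) -> None:
        if config.mode not in ("ll", "ht"):
            raise ValueError(f"unknown mode: {config.mode!r}")
        self.config = config
        self._ll_factory = ll_factory
        self._ht_factory = ht_factory
        self._backend: Optional[Any] = None

    @property
    def internode(self) -> bool:
        # Assumption: a nonzero RDMA buffer-size hint selects the internode path.
        return self.config.rdma_buffer_bytes > 0

    def backend(self) -> Any:
        # Lazy construction: the factory for the unselected mode is never
        # called, so a missing LL runtime cannot break mode="ht".
        if self._backend is None:
            if self.config.mode == "ll":
                self._backend = self._ll_factory()
            else:
                self._backend = self._ht_factory()
        return self._backend
```

Because the factory for the unselected mode is never invoked, a build without the low-latency runtime would fail only if `mode="ll"` is actually requested.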

dispatch() gains an optional previous_handle argument that reuses the routing
layout from a prior dispatch with identical topk_ids. A cached intranode dispatch
also skips notify_dispatch's host-side counter wait, so a benchmark can isolate
the on-GPU dispatch-kernel cost (the NCCL-EP ep_bench convention).
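
The handle-reuse idea can be illustrated with a small pure-Python sketch. Everything here is hypothetical (`SketchHandle`, `SketchDispatcher`, and the `layout_builds` counter are not part of the PR), and the "layout" is reduced to per-expert token counts. The only point is that passing `previous_handle` skips the layout computation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class SketchHandle:
    """Hypothetical cached routing layout: token count per expert."""
    tokens_per_expert: Tuple[int, ...]


class SketchDispatcher:
    def __init__(self, num_experts: int) -> None:
        self.num_experts = num_experts
        self.layout_builds = 0  # how many times the layout was recomputed

    def _build_layout(self, topk_ids: Sequence[Sequence[int]]) -> SketchHandle:
        # Stand-in for the expensive step (in the real system, the layout
        # computation plus the host-side counter wait).
        self.layout_builds += 1
        counts = [0] * self.num_experts
        for token_experts in topk_ids:
            for expert in token_experts:
                counts[expert] += 1
        return SketchHandle(tuple(counts))

    def dispatch(self, topk_ids: Sequence[Sequence[int]],
                 previous_handle: Optional[SketchHandle] = None) -> SketchHandle:
        # The caller is responsible for passing a handle that was produced
        # from identical topk_ids; this sketch does not verify that.
        if previous_handle is not None:
            return previous_handle
        return self._build_layout(topk_ids)
```

In a benchmark loop, the first call would run without a handle to build the layout, and the timed iterations would pass that handle back in so that only the dispatch work itself is measured.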

Rewrite the intranode/internode HT benchmark loops to drive the public
MoECommunicator(mode="ht") API instead of raw Buffer calls. Export MoERuntime.

Validated on 1 node x 4 GB200 GPUs: correctness PASS; dispatch/combine match the
raw-Buffer baseline under identical env (no high-level overhead).