Proposal
MiniMax-M2.7's config.json declares MTP architecture parameters (use_mtp: true, num_mtp_modules: 3, mtp_transformer_layers: 1), but the published safetensors contain no trained MTP weights: 0 of the ~1369 tensor keys in the upstream FP8 release match the mtp.* pattern.
Are MTP weights planned for release? If yes, what is the rough timeline?
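For reference, the missing-weights check above amounts to scanning the checkpoint's tensor key names for an mtp prefix. A minimal sketch (key names below are illustrative, not taken from the actual checkpoint; in practice the keys come from model.safetensors.index.json or safe_open(...).keys()):

```python
import re

# Match keys like "mtp.0.proj.weight" or "model.mtp.1.norm.weight".
MTP_PATTERN = re.compile(r"(^|\.)mtp\.")

def count_mtp_keys(keys):
    """Return how many tensor keys belong to MTP modules."""
    return sum(1 for k in keys if MTP_PATTERN.search(k))

# Illustrative key names standing in for the real ~1369-key index:
example_keys = [
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.mlp.experts.0.w1.weight",
    "lm_head.weight",
]
print(count_mtp_keys(example_keys))  # 0 -> no MTP weights present
```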
Why it matters
Without MTP, M2.7 is limited to single-token autoregressive decoding. With trained MTP heads plus speculative decoding, peer models such as Qwen3.5-397B-A17B (which ships trained MTP) reach roughly 2–3× the decode throughput of plain autoregressive decoding, making them substantially more competitive than M2.7 on self-hosted inference latency despite their larger active-parameter count.
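As a back-of-the-envelope illustration only (the 2–3× figure above is an external benchmark claim, not derived here), the usual speculative-decoding model predicts expected accepted tokens per target forward pass from the draft length k and per-token acceptance rate a, ignoring draft-head overhead:

```python
def expected_speedup(a: float, k: int) -> float:
    """Expected tokens accepted per target forward pass with k draft
    tokens and per-token acceptance rate a (draft cost ignored):
    (1 - a**(k+1)) / (1 - a), i.e. the geometric series 1 + a + ... + a**k."""
    return (1 - a ** (k + 1)) / (1 - a)

# Example: 3 draft tokens at 80% acceptance gives ~2.95x, consistent
# with the 2-3x range reported for MTP-based speculative decoding.
print(round(expected_speedup(0.8, 3), 2))  # 2.95
```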
Related
Alternative
If trained MTP heads are not planned, publishing a small distilled MiniMax-M2.7 variant (~1–3B parameters) sharing the M2.7 tokenizer would let the community train EAGLE-style draft heads independently. Currently every published MiniMax-family model (M1 / M2 / M2.1 / M2.5 / M2.7 / Text-01) is 229B+ parameters, leaving no realistic draft-model option for tokenizer-aligned speculative decoding.
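The tokenizer-alignment requirement mentioned above is strict: speculative decoding needs the draft and target models to share an identical token-to-id mapping. A minimal sketch of the check (vocab dicts here are toy stand-ins; in practice they would come from each model's tokenizer, e.g. AutoTokenizer.get_vocab()):

```python
def tokenizers_aligned(target_vocab: dict, draft_vocab: dict) -> bool:
    """True iff both models map exactly the same tokens to the same ids,
    which is what tokenizer-aligned speculative decoding requires."""
    return target_vocab == draft_vocab

# Toy vocabularies (hypothetical, for illustration):
target      = {"<s>": 0, "hello": 1, "world": 2}
draft_same  = {"<s>": 0, "hello": 1, "world": 2}
draft_other = {"<s>": 0, "world": 1, "hello": 2}  # same tokens, wrong ids

print(tokenizers_aligned(target, draft_same))   # True
print(tokenizers_aligned(target, draft_other))  # False
```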