[Request]: MTP weights for speculative decoding #16

@adurham

Description

Proposal

MiniMax-M2.7's config.json declares MTP architecture parameters (use_mtp: true, num_mtp_modules: 3, mtp_transformer_layers: 1), but the published safetensors contain no trained MTP weights: in the upstream FP8 release, none of the ~1369 tensor keys match the mtp.* pattern.
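For reproducibility, here is the kind of check I ran, sketched against a toy weight map (the `mtp.` key prefix and the `weight_map` layout follow common Hugging Face `model.safetensors.index.json` conventions; the actual upstream tensor naming may differ):

```python
def count_mtp_keys(weight_map):
    """Return (total_keys, mtp_keys) for a safetensors index weight_map dict."""
    keys = list(weight_map)
    mtp = [k for k in keys if "mtp." in k]
    return len(keys), len(mtp)

# Toy index mimicking the shape of model.safetensors.index.json;
# the real release has ~1369 entries and, per the report, zero mtp.* keys.
toy_weight_map = {
    "model.layers.0.self_attn.q_proj.weight": "model-00001.safetensors",
    "model.layers.0.mlp.gate_proj.weight": "model-00001.safetensors",
    "lm_head.weight": "model-00002.safetensors",
}
total, mtp = count_mtp_keys(toy_weight_map)
print(total, mtp)  # all keys counted, none matching mtp.*
```

Running the same scan over the real index file (load it with `json.load` and pass `index["weight_map"]`) is how the 0-of-~1369 figure above was obtained.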

Are MTP weights planned for release? If yes, what is the rough timeline?

Why it matters

Without MTP, M2.7 is limited to single-token autoregressive decoding. With trained MTP heads + speculative decoding, peer models like Qwen3.5-397B-A17B (which ships trained MTP) achieve roughly 2–3× decode throughput, making them substantially more competitive than M2.7 for self-hosted inference latency despite their larger active-parameter count.
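For context, the draft-then-verify loop behind that speedup can be sketched as follows. This is a toy greedy-decoding illustration of speculative decoding in general, not MiniMax's MTP design or any engine's actual implementation; `speculative_step` and both model callables are hypothetical:

```python
def speculative_step(target_next, draft_next, ctx, k):
    """One speculative-decoding step under greedy decoding.

    target_next/draft_next: callables mapping a token-ID list to the next ID.
    Returns the tokens accepted this step: up to k drafted tokens plus one
    token from the target (a bonus token if all drafts match, else the
    target's correction at the first mismatch).
    """
    # Draft phase: the cheap model proposes k tokens autoregressively.
    proposal, d_ctx = [], list(ctx)
    for _ in range(k):
        t = draft_next(d_ctx)
        proposal.append(t)
        d_ctx.append(t)

    # Verify phase: the target checks each drafted token in order.
    accepted, v_ctx = [], list(ctx)
    for t in proposal:
        expected = target_next(v_ctx)
        if expected != t:
            accepted.append(expected)  # target's correction; stop here
            return accepted
        accepted.append(t)
        v_ctx.append(t)
    accepted.append(target_next(v_ctx))  # bonus token: all k drafts matched
    return accepted
```

When the draft agrees with the target, each step emits k+1 tokens instead of 1, which is where the 2–3× decode-throughput figures for MTP-equipped peers come from (real systems verify all k positions in a single batched forward pass, which this sequential toy elides).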

Alternative

If trained MTP heads are not planned, publishing a small distilled MiniMax-M2.7 variant (~1–3B parameters) sharing the M2.7 tokenizer would let the community train EAGLE-style draft heads independently. Currently every published MiniMax-family model (M1 / M2 / M2.1 / M2.5 / M2.7 / Text-01) is 229B+ parameters, leaving no realistic draft-model option for tokenizer-aligned speculative decoding.
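The tokenizer-alignment requirement can be made concrete with a small pre-flight check: drafted token IDs are only meaningful to the verifier if both models map tokens to identical IDs. Shown here with toy vocab dicts (in practice one would compare the `get_vocab()` outputs of the two models' tokenizers; the function name is hypothetical):

```python
def tokenizers_aligned(target_vocab, draft_vocab):
    """True iff every draft token maps to the same ID in the target vocab.

    target_vocab/draft_vocab: dicts of token string -> token ID, as returned
    by a Hugging Face tokenizer's get_vocab().
    """
    return all(target_vocab.get(tok) == idx for tok, idx in draft_vocab.items())

# Toy example: a draft vocab that is an ID-preserving subset is aligned.
target = {"the": 0, "cat": 1, "sat": 2}
print(tokenizers_aligned(target, {"the": 0, "cat": 1}))   # aligned
print(tokenizers_aligned(target, {"the": 1}))             # ID mismatch
```

A distilled 1–3B model sharing the M2.7 tokenizer would pass this check by construction, which is what makes it a viable EAGLE-style draft base.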
