fix: prevent prefill starvation under high decode load#4532

Open
grimoire wants to merge 4 commits into InternLM:main from grimoire:less-prefill-waiting
Conversation

@grimoire
Collaborator

Summary

  • Under high utilization (running + ready >= 50% of max_batches) with small waiting requests (total tokens < max_prefill_token_num), do_prefill_default would always choose decode, starving the waiting requests indefinitely
  • Add a consecutive-decode counter (_decode_count) that forces a prefill after prefill_interval (default 16) decode rounds when requests are waiting
  • The counter resets only when a prefill actually produces inputs, not merely when one is attempted, so the guard cannot be "burned" by failed allocation attempts
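The guard described above can be sketched as follows. This is a hypothetical simplification, not the actual lmdeploy code: the names `_decode_count`, `prefill_interval`, and the utilization condition mirror the PR description, but `SchedulerState` and the surrounding policy are assumed for illustration.

```python
# Hypothetical sketch of the starvation guard; names mirror the PR
# (_decode_count, prefill_interval) but the scheduler state and default
# policy are simplified assumptions, not the lmdeploy implementation.
from dataclasses import dataclass


@dataclass
class SchedulerState:
    num_running: int
    num_ready: int
    num_waiting: int
    max_batches: int


@dataclass
class InputsMaker:
    prefill_interval: int = 16  # forced-prefill threshold (PR default)
    _decode_count: int = 0      # consecutive decode rounds so far

    def do_prefill(self, state: SchedulerState) -> bool:
        """Return True if the next round should attempt prefill."""
        # Starvation guard: after `prefill_interval` consecutive decode
        # rounds with requests still waiting, force a prefill attempt.
        if state.num_waiting > 0 and self._decode_count >= self.prefill_interval:
            return True
        # Default policy (simplified): prefer decode under high utilization,
        # which is what previously starved small waiting requests.
        high_load = (state.num_running + state.num_ready) >= state.max_batches // 2
        return not high_load and state.num_waiting > 0

    def record_round(self, prefill_produced_inputs: bool) -> None:
        """Update the counter after a round completes."""
        if prefill_produced_inputs:
            # Reset only when prefill actually produced inputs, so a
            # failed allocation attempt does not "burn" the guard.
            self._decode_count = 0
        else:
            self._decode_count += 1
```

Under a high-load state the default branch keeps choosing decode, but once `_decode_count` reaches the threshold the guard overrides it until a prefill actually lands.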

Copilot AI review requested due to automatic review settings April 16, 2026 05:04
Contributor

Copilot AI left a comment

Pull request overview

This PR updates the PyTorch engine input scheduling policy to prevent prefill starvation when decode load is high, by introducing a “max consecutive decode rounds” guard that forces a prefill attempt when requests are waiting.

Changes:

  • Add InputsMakerConfig.max_prefill_gap (sourced from engine.engine_config.prefill_interval) to control how many consecutive decode rounds are allowed before forcing prefill.
  • Track consecutive decode rounds via InputsMakerAsync._decode_count, forcing prefill once the threshold is reached while requests are waiting.
  • Reset the counter only when a prefill actually produces inputs (not merely when prefill is attempted).

@grimoire grimoire changed the title ix: prevent prefill starvation under high decode load fix: prevent prefill starvation under high decode load Apr 16, 2026
Contributor

Copilot AI left a comment

Pull request overview

Prevents waiting requests from being starved by continuous decode selection under high decode load in the PyTorch engine’s input-making/scheduling logic.

Changes:

  • Add prefill_interval to InputsMakerConfig (default 16) and plumb it from engine.engine_config.
  • Track consecutive decode decisions via _decode_count and force a prefill once the interval is reached while requests are waiting.
  • Reset _decode_count only when a non-decoding ModelInputs is actually produced.
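The reset rule in the last bullet is the subtle part: a prefill *attempt* that fails to allocate produces no inputs and must not reset the counter. A hypothetical illustration, assuming a `ModelInputs` type with an `is_decoding` flag (the real lmdeploy class differs):

```python
# Hypothetical illustration of the reset rule. `ModelInputs.is_decoding`
# is an assumed stand-in for the real lmdeploy type.
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelInputs:
    is_decoding: bool


def update_decode_count(decode_count: int, inputs: Optional[ModelInputs]) -> int:
    """Return the new consecutive-decode count after one round."""
    # A failed prefill attempt yields no inputs (None) and must not
    # reset the guard; only an actual non-decoding batch resets it.
    if inputs is not None and not inputs.is_decoding:
        return 0
    return decode_count + 1
```

Tying the reset to produced, non-decoding inputs rather than to the attempt means the forced-prefill guard stays armed across allocation failures.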

Comment thread lmdeploy/pytorch/engine/inputs_maker.py Outdated
Comment thread lmdeploy/pytorch/engine/inputs_maker.py
