[BugFix] Refine the preparation of cpu and storage cache #5777
Conversation
Thanks for your contribution!
Pull request overview
This PR refactors the cache preparation logic to unify the handling of CPU and storage cache within the prefix_cache_manager module. The main motivation is to consolidate scattered cache preparation code and eliminate duplication between CPU and storage cache handling.
Key Changes:
- Integrated storage cache matching and preparation logic into the `request_match_blocks` method in `prefix_cache_manager` (see the sketch after this list)
- Removed the separate `get_storage_cached_blocks` method from `resource_manager_v1` (though calls to it remain in some code paths outside this diff)
- Split the `gpu_cpu_cache_prepare_time` metric into separate `cpu_cache_prepare_time` and `storage_cache_prepare_time` metrics for better observability
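The sketch below is a minimal, hypothetical illustration of the consolidated matching flow described above: a single entry point matches CPU-resident blocks and then storage-backed blocks, timing each phase into the two new metrics. The helper names (`match_cpu`, `match_storage`), the `MatchResult` container, and the function name are illustrative assumptions, not the actual FastDeploy implementation.

```python
# Illustrative sketch only -- names are assumptions, not the real
# prefix_cache_manager code.
import time
from dataclasses import dataclass, field


@dataclass
class MatchResult:
    matched_block_ids: list = field(default_factory=list)
    cpu_cache_prepare_time: float = 0.0      # replaces gpu_cpu_cache_prepare_time
    storage_cache_prepare_time: float = 0.0  # newly split-out metric


def request_match_blocks_sketch(match_cpu, match_storage):
    """Match CPU-resident blocks first, then storage-backed blocks,
    recording a separate preparation time for each phase."""
    result = MatchResult()

    start = time.perf_counter()
    result.matched_block_ids.extend(match_cpu())
    result.cpu_cache_prepare_time = time.perf_counter() - start

    start = time.perf_counter()
    result.matched_block_ids.extend(match_storage())
    result.storage_cache_prepare_time = time.perf_counter() - start

    return result


if __name__ == "__main__":
    # Dummy matchers stand in for the real CPU/storage cache lookups.
    demo = request_match_blocks_sketch(lambda: [1, 2], lambda: [7])
    print(demo)
```

Keeping the two timings separate means a slow storage tier shows up in `storage_cache_prepare_time` without inflating the CPU-cache number, which is the observability gain the split is aiming for.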
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| fastdeploy/engine/sched/resource_manager_v1.py | Removed duplicate storage cache logic and unused import; updated get_prefix_cached_blocks to handle new consolidated metrics |
| fastdeploy/cache_manager/prefix_cache_manager.py | Integrated storage cache matching into request_match_blocks; added storage cache preparation logic with proper block allocation and recycling; updated return value of issue_prefetch_storage_task to return count instead of list |
| fastdeploy/engine/request.py | Renamed metric field from gpu_cpu_cache_prepare_time to cpu_cache_prepare_time to better reflect its purpose |
| fastdeploy/cache_manager/cache_metrics.py | Added storage cache token tracking throughout metrics calculation and reporting |
| fastdeploy/cache_manager/cache_transfer_manager.py | Improved debug logging by showing counts instead of full lists for better performance |
| benchmarks/benchmark_serving.py | Updated to use new cpu_cache_prepare_time metric name |
| benchmarks/backend_request_func.py | Updated to use new cpu_cache_prepare_time metric name |
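As a small illustration of the logging change described for fastdeploy/cache_manager/cache_transfer_manager.py, here is a hedged sketch of logging block counts rather than whole lists; the logger name, function name, and message text are assumptions, not the actual code.

```python
# Hypothetical example of the debug-logging pattern described above:
# log the number of transferred blocks instead of dumping the full list,
# which avoids formatting very large lists on hot paths.
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("cache_transfer_manager_demo")  # name is illustrative


def log_transfer(swap_blocks):
    # Before: logger.debug(f"swapped blocks: {swap_blocks}")  -- expensive for long lists.
    # After: only the count is interpolated into the message.
    logger.debug("swapped %d blocks", len(swap_blocks))


log_transfer(list(range(10000)))
```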
Codecov Report

❌ Patch coverage is

| Coverage Diff | develop | #5777 | +/- |
|---|---|---|---|
| Coverage | ? | 66.71% | |
| Files | ? | 347 | |
| Lines | ? | 44354 | |
| Branches | ? | 6810 | |
| Hits | ? | 29591 | |
| Misses | ? | 12578 | |
| Partials | ? | 2185 | |
Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
Motivation
Unify the preparation of the CPU and storage cache.
Modifications
The prefix_cache_manager module.
Usage or Command
Unchanged.
Accuracy Tests
Unit tests will be added after the CI image is updated.
Checklist
- Add at least one of the following tags to the PR title: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run pre-commit before commit.
- If the PR targets the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.