Pod Ring preset#251
Open
AtlantaPepsi wants to merge 3 commits intoROCm:candidatefrom
Open
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
Adds a new “podring” preset and centralizes several scheduling helper utilities so they can be reused across presets.
Changes:
- Added a new
PodRingPresetto run intra-pod ring transfers (optionally with NIC queue-pair transfers) and print per-group summaries. - Moved common helper routines (
StrideGenerate,RoundRobinSchedule,CombinationSchedule) intoTransferBench::Utils. - Updated existing presets to call the new
Utils::helper implementations.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| src/client/Utilities.hpp | Adds shared scheduling / indexing helpers used by multiple presets. |
| src/client/Presets/Presets.hpp | Registers the new podring preset and includes its header. |
| src/client/Presets/PodRing.hpp | New preset implementing ring transfers within pod subgroups. |
| src/client/Presets/PodPeerToPeer.hpp | Switches round-robin scheduling call to Utils::RoundRobinSchedule. |
| src/client/Presets/PodAllToAll.hpp | Removes local stride helper and uses Utils::StrideGenerate. |
| src/client/Presets/NicPeerToPeer.hpp | Removes local scheduling helpers and uses Utils::RoundRobinSchedule / Utils::CombinationSchedule. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+194
to
+197
| double gfxMax = std::numeric_limits<double>::min(); | ||
| double nicMin = std::numeric_limits<double>::max(); | ||
| double nicAvg = 0.0; | ||
| double nicMax = std::numeric_limits<double>::min(); |
There was a problem hiding this comment.
For initializing max values, std::numeric_limits<double>::min() is the smallest positive normalized value, not the most-negative value. If a bandwidth can be 0.0, gfxMax/nicMax would incorrectly remain ~2e-308. Use std::numeric_limits<double>::lowest() (or initialize to -std::numeric_limits<double>::infinity() / 0.0) for max initializers.
Suggested change
| double gfxMax = std::numeric_limits<double>::min(); | |
| double nicMin = std::numeric_limits<double>::max(); | |
| double nicAvg = 0.0; | |
| double nicMax = std::numeric_limits<double>::min(); | |
| double gfxMax = std::numeric_limits<double>::lowest(); | |
| double nicMin = std::numeric_limits<double>::max(); | |
| double nicAvg = 0.0; | |
| double nicMax = std::numeric_limits<double>::lowest(); |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
We need a intra-pod ring preset, similar to
nicringspreset, to simulate potential patterns used by RCCLTechnical Details
Similar to
poda2apreset, we have the option to reorder all detectable devices according to user-input stride, then divide reordered devices into subgroups of user specified size. Each subgroup will be a ring.Test Plan
Test Result
Example: on 2 nodes each with 4 GPU
Submission Checklist