Adding smoketest preset for simple correctness tests#266
Adding smoketest preset for simple correctness tests#266gilbertlee-amd wants to merge 4 commits intoROCm:candidatefrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new smoketest preset to TransferBench to run a small suite of correctness-focused DMA/GFX transfer checks across common patterns (H2D/D2H/D2D, broadcast, gather, all-to-all), with environment-variable knobs to control sizes, subexecutor counts, and test selection.
Changes:
- Introduces a new
smoketestpreset implementation and registers it in the preset dispatch map. - Adds EnvVars helpers for parsing string arrays and for printing string vectors.
- Updates changelog; includes a small validation-path tweak and whitespace cleanup.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| src/client/Presets/SmokeTest.hpp | New smoketest preset logic, env-var knobs, and tabular output. |
| src/client/Presets/Presets.hpp | Registers the new smoketest preset. |
| src/client/EnvVars.hpp | Adds string-array env-var parsing + string-vector formatting helper. |
| src/header/TransferBench.hpp | Adjusts validation mismatch handling in ValidateAllTransfers(). |
| src/client/Utilities.hpp | Removes trailing whitespace. |
| CHANGELOG.md | Documents the new preset. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| sprintf(transferStr, "-1 (R%d%c%d R%d%c%d R%d%c%d %d %lu)", | ||
| rank, MemTypeStr[isH2D ? cpuMemType : gpuMemType], isH2D ? cpuIdx : gpuIdx, | ||
| rank, ExeTypeStr[exeType], gpuIdx, | ||
| rank, MemTypeStr[isH2D ? gpuMemType : cpuMemType], isH2D ? gpuIdx : cpuIdx, | ||
| numSubExec, numBytes); | ||
| } else if (isD2D_RW || isD2D_RR) { | ||
| // Copy from this GPU to "next" GPU | ||
| int dstRank = rank, dstGpuIdx = gpuIdx + 1; | ||
| if (dstGpuIdx >= GetNumExecutors(exeType, dstRank)) { | ||
| dstGpuIdx = 0; | ||
| dstRank = (rank+1) % numRanks; | ||
| } | ||
| sprintf(transferStr, "-1 (R%d%c%d R%d%c%d R%d%c%d %d %lu)", | ||
| rank, MemTypeStr[gpuMemType], gpuIdx, | ||
| isD2D_RW ? rank : dstRank, ExeTypeStr[exeType], isD2D_RW ? gpuIdx : dstGpuIdx, | ||
| dstRank, MemTypeStr[gpuMemType], dstGpuIdx, | ||
| numSubExec, numBytes); | ||
| } else if (isBroadcast) { | ||
| // Split up the number of CUs across all Transfers | ||
| sprintf(transferStr, "-1 (R%d%c%d R%d%c%d R*%c* %d %lu)", | ||
| rank, MemTypeStr[gpuMemType], gpuIdx, | ||
| rank, ExeTypeStr[exeType], gpuIdx, | ||
| MemTypeStr[gpuMemType], | ||
| numSubExec, numBytes); | ||
| } else if (isGather) { | ||
| // Split up the number of CUs across all Transfers | ||
| sprintf(transferStr, "-1 (R*%c* R%d%c%d R%d%c%d %d %lu)", | ||
| MemTypeStr[gpuMemType], | ||
| rank, ExeTypeStr[exeType], gpuIdx, | ||
| rank, MemTypeStr[gpuMemType], gpuIdx, | ||
| numSubExec, numBytes); | ||
| } else if (isAllToAll) { | ||
| // Split up the number of CUs across all Transfers | ||
| sprintf(transferStr, "-1 (R%d%c%d R%d%c%d R*%c* %d %lu)", | ||
| rank, MemTypeStr[gpuMemType], gpuIdx, | ||
| rank, ExeTypeStr[exeType], gpuIdx, | ||
| MemTypeStr[gpuMemType], | ||
| numSubExec, numBytes); |
There was a problem hiding this comment.
transferStr is a fixed 128-byte buffer, but the code uses sprintf() repeatedly to build strings that include ranks/indices/subexecutor counts and numBytes. This risks buffer overflow if values grow (or formatting changes). Please switch to snprintf() (and handle truncation) or build the string with std::string/ostringstream.
| for (int rank = 0; allPass && rank < numRanks; rank++) { | ||
| int numGpus = GetNumExecutors(exeType, rank); | ||
| for (int gpuIdx = 0; allPass && gpuIdx < numGpus; gpuIdx++) { | ||
| if (isH2D | isD2H) { |
There was a problem hiding this comment.
In the host<->device branch the condition uses bitwise OR (isH2D | isD2H). With booleans this happens to work, but it’s easy to misread and doesn’t short-circuit. Please use logical OR (||) for clarity/intent.
| if (isH2D | isD2H) { | |
| if (isH2D || isD2H) { |
| std::string GetStr(std::vector<std::string> const& varnameList) const { | ||
| std::string result = ""; | ||
| for (int i = 0; i < varnameList.size(); i++) { | ||
| if (i) result += ","; | ||
| result += varnameList[i]; | ||
| } | ||
| return result; | ||
| } |
There was a problem hiding this comment.
The new GetStr(std::vectorstd::string ...) loop uses int i < varnameList.size(). This can trigger signed/unsigned warnings. Consider using size_t for the index or a range-based loop, consistent with typical STL usage.
| @@ -3805,7 +3805,6 @@ static bool IsConfiguredGid(union ibv_gid const& gid) | |||
| transferIdx, i, dstIdx, t.dsts[dstIdx].memRank, expected[i], output[i]}; | |||
| } | |||
| } | |||
There was a problem hiding this comment.
In ValidateAllTransfers(), if memcmp(output, expected, t.numBytes) is non-zero but the per-element loop doesn’t find a mismatch (e.g., +0.0 vs -0.0 compare equal as floats but differ by bytes), we now fall through and treat the transfer as valid. Please restore a fallback error return after the loop (or compare elements by bytes/bitwise) so any memcmp mismatch is always reported as an error.
| } | |
| } | |
| return {ERR_FATAL, "Transfer %d: Destination %d on rank %d differs from expected output at the byte level, but no float-value mismatch was found", | |
| transferIdx, dstIdx, t.dsts[dstIdx].memRank}; |
| static std::vector<std::string> GetEnvVarStrArray(std::string const& varname, std::vector<std::string> const& defaultValue) | ||
| { | ||
| if (getenv(varname.c_str())) { | ||
| std::vector<std::string> values; | ||
| char* arrayStr = getenv(varname.c_str()); | ||
| char* token = strtok(arrayStr, ","); | ||
| while (token) { | ||
| values.push_back(token); | ||
| token = strtok(NULL, ","); | ||
| } | ||
| return values; |
There was a problem hiding this comment.
GetEnvVarStrArray() tokenizes the buffer returned by getenv() with strtok(), which mutates the underlying environment string and can cause surprising side effects (subsequent getenv("SIZE_LIST") calls will see a truncated value). It’s also not thread-safe. Please copy the getenv() value into a local std::string (or vector) and split it without modifying the environment buffer (e.g., manual split / stringstream / strtok_r on a copy).
| if (Utils::HasDuplicateHostname()) { | ||
| Utils::Print("[WARN] It is recommended to run TransferBench with one rank per host to avoid potential aliasing of executors\n"); | ||
| } | ||
| return 0; |
There was a problem hiding this comment.
SmokeTestPreset() prints a warning when tests fail, but it always returns 0. For CI/scripting use, the preset should return a non-zero exit code when any test fails (e.g., return testsFailed ? 1 : 0).
| return 0; | |
| return testsFailed ? 1 : 0; |
Motivation
Adds a simple smoketest preset into TransferBench to perform simple correctness tests. Knobs exist to allow for more thorough tests, however defaults are chosen to try to keep run-times long.
Technical Details
This preset tests DMA/GFX copies between host and device, between two devices, as well as broadcast / gather and all to all.
It supports multi-rank, and has knobs for controlling the Transfer sizes to sweep (SIZE_LIST), as well as how many GFX subexecutors to use (GFX_SE_LIST). For performance reasons, SE_MAX_BYTES is used to limit the total amount of bytes a single GFX Subexecutor (CU/WGP) will Transfers - for example skipping tests copying 1GB using only 1 CU. TEST_LISTS can be used to specify a subset of tests to run, and accepts ranges such as 1-7. Copies are done in parallel
Test Plan
Testing of the various knobs were tried, as well as fault-injection to confirm that failures would be shown.
Test Result
Sample output: