Skip to content

Adding smoketest preset for simple correctness tests#266

Open
gilbertlee-amd wants to merge 4 commits intoROCm:candidatefrom
gilbertlee-amd:SmokeTest
Open

Adding smoketest preset for simple correctness tests#266
gilbertlee-amd wants to merge 4 commits intoROCm:candidatefrom
gilbertlee-amd:SmokeTest

Conversation

@gilbertlee-amd
Copy link
Copy Markdown
Collaborator

Motivation

Adds a simple smoketest preset into TransferBench to perform simple correctness tests. Knobs exist to allow for more thorough tests, however defaults are chosen to try to keep run-times long.

Technical Details

This preset tests DMA/GFX copies between host and device, between two devices, as well as broadcast / gather and all to all.
It supports multi-rank, and has knobs for controlling the Transfer sizes to sweep (SIZE_LIST), as well as how many GFX subexecutors to use (GFX_SE_LIST). For performance reasons, SE_MAX_BYTES is used to limit the total amount of bytes a single GFX Subexecutor (CU/WGP) will Transfers - for example skipping tests copying 1GB using only 1 CU. TEST_LISTS can be used to specify a subset of tests to run, and accepts ranges such as 1-7. Copies are done in parallel

Test Plan

Testing of the various knobs were tried, as well as fault-injection to confirm that failures would be shown.

Test Result

Sample output:

[smoketest-preset Related]
CPU_MEM_TYPE         =            0 : Using default CPU (0=default, 1=coherent, 2=non-coherent, 3=uncached, 4=unpinned)
GFX_SE_LIST          =            2 : Testing GFX with subexecutor counts: 1,256
GPU_MEM_TYPE         =            0 : Using default GPU (0=default, 1=fine-grained, 2=uncached, 3=managed)
SIZE_LIST            =            3 : Transfer sizes tested: 1K,16M,256M
SE_MAX_BYTES         =         128M : Each SubExecutor can work on at most 134217728 bytes
TEST_LIST            =           14 : Running all tests

Running tests on 8 GPUs total across 1 rank(s)

                                                | GFX | GFX |
| Name                      | Test | DMA | Test | 001 | 256 |
|---------------------------|------|-----|------|-----|-----|
| Copy (H2D)                |  01  | *** |  08  | **. | .** |
| Copy (D2H)                |  02  | *** |  09  | **. | .** |
| Copy (D2D) (Remote Write) |  03  | *** |  10  | **. | .** |
| Copy (D2D) (Remote Read ) |  04  | *** |  11  | **. | .** |
| Broadcast  (One to All)   |  05  | *** |  12  | **. | .** |
| Gather     (All to One)   |  06  | *** |  13  | **. | .** |
| All To All                |  07  | *** |  14  | **. | .** |
|---------------------------|------|-----|------|-----|-----|

All tests passed


Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new smoketest preset to TransferBench to run a small suite of correctness-focused DMA/GFX transfer checks across common patterns (H2D/D2H/D2D, broadcast, gather, all-to-all), with environment-variable knobs to control sizes, subexecutor counts, and test selection.

Changes:

  • Introduces a new smoketest preset implementation and registers it in the preset dispatch map.
  • Adds EnvVars helpers for parsing string arrays and for printing string vectors.
  • Updates changelog; includes a small validation-path tweak and whitespace cleanup.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/client/Presets/SmokeTest.hpp New smoketest preset logic, env-var knobs, and tabular output.
src/client/Presets/Presets.hpp Registers the new smoketest preset.
src/client/EnvVars.hpp Adds string-array env-var parsing + string-vector formatting helper.
src/header/TransferBench.hpp Adjusts validation mismatch handling in ValidateAllTransfers().
src/client/Utilities.hpp Removes trailing whitespace.
CHANGELOG.md Documents the new preset.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +106 to +143
sprintf(transferStr, "-1 (R%d%c%d R%d%c%d R%d%c%d %d %lu)",
rank, MemTypeStr[isH2D ? cpuMemType : gpuMemType], isH2D ? cpuIdx : gpuIdx,
rank, ExeTypeStr[exeType], gpuIdx,
rank, MemTypeStr[isH2D ? gpuMemType : cpuMemType], isH2D ? gpuIdx : cpuIdx,
numSubExec, numBytes);
} else if (isD2D_RW || isD2D_RR) {
// Copy from this GPU to "next" GPU
int dstRank = rank, dstGpuIdx = gpuIdx + 1;
if (dstGpuIdx >= GetNumExecutors(exeType, dstRank)) {
dstGpuIdx = 0;
dstRank = (rank+1) % numRanks;
}
sprintf(transferStr, "-1 (R%d%c%d R%d%c%d R%d%c%d %d %lu)",
rank, MemTypeStr[gpuMemType], gpuIdx,
isD2D_RW ? rank : dstRank, ExeTypeStr[exeType], isD2D_RW ? gpuIdx : dstGpuIdx,
dstRank, MemTypeStr[gpuMemType], dstGpuIdx,
numSubExec, numBytes);
} else if (isBroadcast) {
// Split up the number of CUs across all Transfers
sprintf(transferStr, "-1 (R%d%c%d R%d%c%d R*%c* %d %lu)",
rank, MemTypeStr[gpuMemType], gpuIdx,
rank, ExeTypeStr[exeType], gpuIdx,
MemTypeStr[gpuMemType],
numSubExec, numBytes);
} else if (isGather) {
// Split up the number of CUs across all Transfers
sprintf(transferStr, "-1 (R*%c* R%d%c%d R%d%c%d %d %lu)",
MemTypeStr[gpuMemType],
rank, ExeTypeStr[exeType], gpuIdx,
rank, MemTypeStr[gpuMemType], gpuIdx,
numSubExec, numBytes);
} else if (isAllToAll) {
// Split up the number of CUs across all Transfers
sprintf(transferStr, "-1 (R%d%c%d R%d%c%d R*%c* %d %lu)",
rank, MemTypeStr[gpuMemType], gpuIdx,
rank, ExeTypeStr[exeType], gpuIdx,
MemTypeStr[gpuMemType],
numSubExec, numBytes);
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

transferStr is a fixed 128-byte buffer, but the code uses sprintf() repeatedly to build strings that include ranks/indices/subexecutor counts and numBytes. This risks buffer overflow if values grow (or formatting changes). Please switch to snprintf() (and handle truncation) or build the string with std::string/ostringstream.

Copilot uses AI. Check for mistakes.
for (int rank = 0; allPass && rank < numRanks; rank++) {
int numGpus = GetNumExecutors(exeType, rank);
for (int gpuIdx = 0; allPass && gpuIdx < numGpus; gpuIdx++) {
if (isH2D | isD2H) {
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the host<->device branch the condition uses bitwise OR (isH2D | isD2H). With booleans this happens to work, but it’s easy to misread and doesn’t short-circuit. Please use logical OR (||) for clarity/intent.

Suggested change
if (isH2D | isD2H) {
if (isH2D || isD2H) {

Copilot uses AI. Check for mistakes.
Comment thread src/client/EnvVars.hpp
Comment on lines +597 to +604
std::string GetStr(std::vector<std::string> const& varnameList) const {
std::string result = "";
for (int i = 0; i < varnameList.size(); i++) {
if (i) result += ",";
result += varnameList[i];
}
return result;
}
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new GetStr(std::vectorstd::string ...) loop uses int i < varnameList.size(). This can trigger signed/unsigned warnings. Consider using size_t for the index or a range-based loop, consistent with typical STL usage.

Copilot uses AI. Check for mistakes.
@@ -3805,7 +3805,6 @@ static bool IsConfiguredGid(union ibv_gid const& gid)
transferIdx, i, dstIdx, t.dsts[dstIdx].memRank, expected[i], output[i]};
}
}
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In ValidateAllTransfers(), if memcmp(output, expected, t.numBytes) is non-zero but the per-element loop doesn’t find a mismatch (e.g., +0.0 vs -0.0 compare equal as floats but differ by bytes), we now fall through and treat the transfer as valid. Please restore a fallback error return after the loop (or compare elements by bytes/bitwise) so any memcmp mismatch is always reported as an error.

Suggested change
}
}
return {ERR_FATAL, "Transfer %d: Destination %d on rank %d differs from expected output at the byte level, but no float-value mismatch was found",
transferIdx, dstIdx, t.dsts[dstIdx].memRank};

Copilot uses AI. Check for mistakes.
Comment thread src/client/EnvVars.hpp
Comment on lines +541 to +551
static std::vector<std::string> GetEnvVarStrArray(std::string const& varname, std::vector<std::string> const& defaultValue)
{
if (getenv(varname.c_str())) {
std::vector<std::string> values;
char* arrayStr = getenv(varname.c_str());
char* token = strtok(arrayStr, ",");
while (token) {
values.push_back(token);
token = strtok(NULL, ",");
}
return values;
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetEnvVarStrArray() tokenizes the buffer returned by getenv() with strtok(), which mutates the underlying environment string and can cause surprising side effects (subsequent getenv("SIZE_LIST") calls will see a truncated value). It’s also not thread-safe. Please copy the getenv() value into a local std::string (or vector) and split it without modifying the environment buffer (e.g., manual split / stringstream / strtok_r on a copy).

Copilot uses AI. Check for mistakes.
if (Utils::HasDuplicateHostname()) {
Utils::Print("[WARN] It is recommended to run TransferBench with one rank per host to avoid potential aliasing of executors\n");
}
return 0;
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SmokeTestPreset() prints a warning when tests fail, but it always returns 0. For CI/scripting use, the preset should return a non-zero exit code when any test fails (e.g., return testsFailed ? 1 : 0).

Suggested change
return 0;
return testsFailed ? 1 : 0;

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants