Adding smoketest preset for simple correctness tests by gilbertlee-amd · Pull Request #266 · ROCm/TransferBench

gilbertlee-amd · 2026-04-25T10:21:34Z

Motivation

Adds a simple smoketest preset into TransferBench to perform simple correctness tests. Knobs exist to allow for more thorough tests, however defaults are chosen to try to keep run-times long.

Technical Details

This preset tests DMA/GFX copies between host and device, between two devices, as well as broadcast / gather and all to all.
It supports multi-rank, and has knobs for controlling the Transfer sizes to sweep (SIZE_LIST), as well as how many GFX subexecutors to use (GFX_SE_LIST). For performance reasons, SE_MAX_BYTES is used to limit the total amount of bytes a single GFX Subexecutor (CU/WGP) will Transfers - for example skipping tests copying 1GB using only 1 CU. TEST_LISTS can be used to specify a subset of tests to run, and accepts ranges such as 1-7. Copies are done in parallel

Test Plan

Testing of the various knobs were tried, as well as fault-injection to confirm that failures would be shown.

Test Result

Sample output:

[smoketest-preset Related]
CPU_MEM_TYPE         =            0 : Using default CPU (0=default, 1=coherent, 2=non-coherent, 3=uncached, 4=unpinned)
GFX_SE_LIST          =            2 : Testing GFX with subexecutor counts: 1,256
GPU_MEM_TYPE         =            0 : Using default GPU (0=default, 1=fine-grained, 2=uncached, 3=managed)
SIZE_LIST            =            3 : Transfer sizes tested: 1K,16M,256M
SE_MAX_BYTES         =         128M : Each SubExecutor can work on at most 134217728 bytes
TEST_LIST            =           14 : Running all tests

Running tests on 8 GPUs total across 1 rank(s)

                                                | GFX | GFX |
| Name                      | Test | DMA | Test | 001 | 256 |
|---------------------------|------|-----|------|-----|-----|
| Copy (H2D)                |  01  | *** |  08  | **. | .** |
| Copy (D2H)                |  02  | *** |  09  | **. | .** |
| Copy (D2D) (Remote Write) |  03  | *** |  10  | **. | .** |
| Copy (D2D) (Remote Read ) |  04  | *** |  11  | **. | .** |
| Broadcast  (One to All)   |  05  | *** |  12  | **. | .** |
| Gather     (All to One)   |  06  | *** |  13  | **. | .** |
| All To All                |  07  | *** |  14  | **. | .** |
|---------------------------|------|-----|------|-----|-----|

All tests passed

Copilot

Pull request overview

Adds a new smoketest preset to TransferBench to run a small suite of correctness-focused DMA/GFX transfer checks across common patterns (H2D/D2H/D2D, broadcast, gather, all-to-all), with environment-variable knobs to control sizes, subexecutor counts, and test selection.

Changes:

Introduces a new smoketest preset implementation and registers it in the preset dispatch map.
Adds EnvVars helpers for parsing string arrays and for printing string vectors.
Updates changelog; includes a small validation-path tweak and whitespace cleanup.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
src/client/Presets/SmokeTest.hpp	New smoketest preset logic, env-var knobs, and tabular output.
src/client/Presets/Presets.hpp	Registers the new `smoketest` preset.
src/client/EnvVars.hpp	Adds string-array env-var parsing + string-vector formatting helper.
src/header/TransferBench.hpp	Adjusts validation mismatch handling in `ValidateAllTransfers()`.
src/client/Utilities.hpp	Removes trailing whitespace.
CHANGELOG.md	Documents the new preset.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-25T14:42:38Z

+          sprintf(transferStr, "-1 (R%d%c%d R%d%c%d R%d%c%d %d %lu)",
+                  rank, MemTypeStr[isH2D ? cpuMemType : gpuMemType], isH2D ? cpuIdx : gpuIdx,
+                  rank, ExeTypeStr[exeType], gpuIdx,
+                  rank, MemTypeStr[isH2D ? gpuMemType : cpuMemType], isH2D ? gpuIdx : cpuIdx,
+                  numSubExec, numBytes);
+        } else if (isD2D_RW || isD2D_RR) {
+          // Copy from this GPU to "next" GPU
+          int dstRank = rank, dstGpuIdx = gpuIdx + 1;
+          if (dstGpuIdx >= GetNumExecutors(exeType, dstRank)) {
+            dstGpuIdx = 0;
+            dstRank = (rank+1) % numRanks;
+          }
+          sprintf(transferStr, "-1 (R%d%c%d R%d%c%d R%d%c%d %d %lu)",
+                  rank, MemTypeStr[gpuMemType], gpuIdx,
+                  isD2D_RW ? rank : dstRank, ExeTypeStr[exeType], isD2D_RW ? gpuIdx : dstGpuIdx,
+                  dstRank, MemTypeStr[gpuMemType], dstGpuIdx,
+                  numSubExec, numBytes);
+        } else if (isBroadcast) {
+          // Split up the number of CUs across all Transfers
+          sprintf(transferStr, "-1 (R%d%c%d R%d%c%d R*%c* %d %lu)",
+                  rank, MemTypeStr[gpuMemType], gpuIdx,
+                  rank, ExeTypeStr[exeType], gpuIdx,
+                  MemTypeStr[gpuMemType],
+                  numSubExec, numBytes);
+        } else if (isGather) {
+          // Split up the number of CUs across all Transfers
+          sprintf(transferStr, "-1 (R*%c* R%d%c%d R%d%c%d %d %lu)",
+                  MemTypeStr[gpuMemType],
+                  rank, ExeTypeStr[exeType], gpuIdx,
+                  rank, MemTypeStr[gpuMemType], gpuIdx,
+                  numSubExec, numBytes);
+        } else if (isAllToAll) {
+          // Split up the number of CUs across all Transfers
+          sprintf(transferStr, "-1 (R%d%c%d R%d%c%d R*%c* %d %lu)",
+                  rank, MemTypeStr[gpuMemType], gpuIdx,
+                  rank, ExeTypeStr[exeType], gpuIdx,
+                  MemTypeStr[gpuMemType],
+                  numSubExec, numBytes);


transferStr is a fixed 128-byte buffer, but the code uses sprintf() repeatedly to build strings that include ranks/indices/subexecutor counts and numBytes. This risks buffer overflow if values grow (or formatting changes). Please switch to snprintf() (and handle truncation) or build the string with std::string/ostringstream.

Copilot · 2026-04-25T14:42:39Z

+    for (int rank = 0; allPass && rank < numRanks; rank++) {
+      int numGpus = GetNumExecutors(exeType, rank);
+      for (int gpuIdx = 0; allPass && gpuIdx < numGpus; gpuIdx++) {
+        if (isH2D | isD2H) {


In the host<->device branch the condition uses bitwise OR (isH2D | isD2H). With booleans this happens to work, but it’s easy to misread and doesn’t short-circuit. Please use logical OR (||) for clarity/intent.

Suggested change

if (isH2D | isD2H) {

if (isH2D || isD2H) {

Copilot · 2026-04-25T14:42:39Z

+  std::string GetStr(std::vector<std::string> const& varnameList) const {
+    std::string result = "";
+    for (int i = 0; i < varnameList.size(); i++) {
+      if (i) result += ",";
+      result += varnameList[i];
+    }
+    return result;
+  }


The new GetStr(std::vectorstd::string ...) loop uses int i < varnameList.size(). This can trigger signed/unsigned warnings. Consider using size_t for the index or a range-based loop, consistent with typical STL usage.

Copilot · 2026-04-25T14:42:39Z

@@ -3805,7 +3805,6 @@ static bool IsConfiguredGid(union ibv_gid const& gid)
                transferIdx, i, dstIdx, t.dsts[dstIdx].memRank, expected[i], output[i]};
            }
          }


In ValidateAllTransfers(), if memcmp(output, expected, t.numBytes) is non-zero but the per-element loop doesn’t find a mismatch (e.g., +0.0 vs -0.0 compare equal as floats but differ by bytes), we now fall through and treat the transfer as valid. Please restore a fallback error return after the loop (or compare elements by bytes/bitwise) so any memcmp mismatch is always reported as an error.

Suggested change

}

}

return {ERR_FATAL, "Transfer %d: Destination %d on rank %d differs from expected output at the byte level, but no float-value mismatch was found",

transferIdx, dstIdx, t.dsts[dstIdx].memRank};

Copilot · 2026-04-25T14:42:39Z

+  static std::vector<std::string> GetEnvVarStrArray(std::string const& varname, std::vector<std::string> const& defaultValue)
+  {
+    if (getenv(varname.c_str())) {
+      std::vector<std::string> values;
+      char* arrayStr = getenv(varname.c_str());
+      char* token = strtok(arrayStr, ",");
+      while (token) {
+        values.push_back(token);
+        token = strtok(NULL, ",");
+      }
+      return values;


GetEnvVarStrArray() tokenizes the buffer returned by getenv() with strtok(), which mutates the underlying environment string and can cause surprising side effects (subsequent getenv("SIZE_LIST") calls will see a truncated value). It’s also not thread-safe. Please copy the getenv() value into a local std::string (or vector) and split it without modifying the environment buffer (e.g., manual split / stringstream / strtok_r on a copy).

Copilot · 2026-04-25T14:42:39Z

+  if (Utils::HasDuplicateHostname()) {
+    Utils::Print("[WARN] It is recommended to run TransferBench with one rank per host to avoid potential aliasing of executors\n");
+  }
+  return 0;


SmokeTestPreset() prints a warning when tests fail, but it always returns 0. For CI/scripting use, the preset should return a non-zero exit code when any test fails (e.g., return testsFailed ? 1 : 0).

Suggested change

return 0;

return testsFailed ? 1 : 0;

gilbertlee-amd added 4 commits April 25, 2026 04:45

Adding smoketest preset for simple correctness tests

4b85f28

Adjusting broadcast/allgather

6d94b95

Switching to using P to denote pass to make it easier to copy/paste

2a41b36

Changing display again

14d65da

gilbertlee-amd requested review from a team as code owners April 25, 2026 10:21

nileshnegi requested a review from Copilot April 25, 2026 14:37

Copilot started reviewing on behalf of nileshnegi April 25, 2026 14:38 View session

Copilot AI reviewed Apr 25, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adding smoketest preset for simple correctness tests#266

Adding smoketest preset for simple correctness tests#266
gilbertlee-amd wants to merge 4 commits intoROCm:candidatefrom
gilbertlee-amd:SmokeTest

gilbertlee-amd commented Apr 25, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 25, 2026

Uh oh!

Copilot AI Apr 25, 2026

Uh oh!

Copilot AI Apr 25, 2026

Uh oh!

Copilot AI Apr 25, 2026

Uh oh!

Copilot AI Apr 25, 2026

Uh oh!

Copilot AI Apr 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

-          }
+          }
+          return {ERR_FATAL, "Transfer %d: Destination %d on rank %d differs from expected output at the byte level, but no float-value mismatch was found",
+            transferIdx, dstIdx, t.dsts[dstIdx].memRank};

Conversation

gilbertlee-amd commented Apr 25, 2026

Motivation

Technical Details

Test Plan

Test Result

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants