CI: build relocatable packages via TheRock SDK #260
Adds a CI workflow that builds DEB/RPM/TGZ packages of TransferBench against the TheRock nightly ROCm SDK, modeled on the ROCmValidationSuite packaging workflow. Packages install to /opt/rocm/extras-<MAJOR> with $ORIGIN-relative RPATH so they are relocatable.

- build_packages_local.sh: single source of truth for both local and CI builds. Detects Ubuntu vs AlmaLinux/manylinux, installs deps, fetches TheRock SDK tarball, configures CMake with relocatable RPATH and the new BUILD_RELOCATABLE_PACKAGE option, builds, and invokes CPack for DEB/RPM/TGZ.
- .github/workflows/build-relocatable-packages.yml: parallel Ubuntu 22.04 + manylinux_2_28 jobs triggered on push, PR, daily cron, and workflow_dispatch. OIDC-based S3 upload gated on AWS_S3_BUCKET being set; apt/yum repo metadata generated for non-PR builds. Build report artifact summarizes S3 paths.
- .github/workflows/README_BUILD_PACKAGES.md: workflow docs covering triggers, local usage, S3 layout, IAM trust policy, and apt/yum install snippets.
- CMakeLists.txt: new BUILD_RELOCATABLE_PACKAGE option that bypasses rocm_install/rocm_create_package, names the package amdrocm<MAJOR>-transferbench, and honors caller-set install prefix and CPACK_*_PACKAGE_RELEASE env vars. Default cmake .. behavior is unchanged.

Co-Authored-By: Claude Opus 4 <[email protected]>
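The distro detection step described for build_packages_local.sh can be illustrated with a small Python sketch. This is not the script's code (the script is shell, and how it detects the distro is not shown here). It assumes detection is based on the `ID` and `ID_LIKE` fields of `/etc/os-release`; the function names `parse_os_release` and `package_family` are hypothetical.

```python
def parse_os_release(text):
    """Parse os-release style KEY=VALUE lines into a dict (quotes stripped)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def package_family(fields):
    """Map a distro to the native package format built for it.

    Assumption: Ubuntu-like systems build DEB, AlmaLinux-like systems build RPM,
    matching the build-ubuntu and build-manylinux jobs in this PR.
    """
    ids = [fields.get("ID", "")] + fields.get("ID_LIKE", "").split()
    if "ubuntu" in ids or "debian" in ids:
        return "deb"
    if any(name in ids for name in ("almalinux", "rhel", "centos", "fedora")):
        return "rpm"
    raise ValueError("unsupported distro: %r" % fields.get("ID"))


if __name__ == "__main__":
    ubuntu = 'ID=ubuntu\nID_LIKE=debian\nVERSION_ID="22.04"\n'
    alma = 'ID="almalinux"\nID_LIKE="rhel centos fedora"\nVERSION_ID="8.10"\n'
    print(package_family(parse_os_release(ubuntu)))
    print(package_family(parse_os_release(alma)))
```

Both jobs also emit a TGZ, so only the native format depends on the detected family.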
The tarballs are published as therock-dist-linux-<family>-<version>.tar.gz, not <family>-<version>.tar.gz as I had it. Also, TheRock does not publish a per-family LATEST.txt, so the auto-fetch path now lists the bucket via S3 ListObjectsV2 and picks the highest version key with `sort -V`. Updates the pinned fallback to a version that actually exists on the bucket today (7.13.0a20260423). Co-Authored-By: Claude Opus 4 <[email protected]>
The bucket includes non-release ad-hoc builds (e.g. therock-dist-linux-gfx94X-dcgpu-ADHOCBUILD-7.0.0rc20250625.tar.gz) which `sort -V | tail -1` was selecting because 'A' lexically sorts after digits. The downstream `printf '%02d'` then crashed trying to parse `ADHOCBUILD-7` as the ROCm major. Restrict the auto-fetch grep to keys matching <prefix><MAJOR>.<MINOR>.<patch+suffix>.tar.gz so only properly versioned releases are considered. Co-Authored-By: Claude Opus 4 <[email protected]>
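The filtering rule from the two commits above can be sketched in Python. This is an illustration of the idea, not the script's shell pipeline: the function name `pick_latest_key`, the exact regular expression, and the second sample key are assumptions, and the tuple ordering used here is only an approximation of `sort -V`.

```python
import re

# Key prefix for one GPU family, taken from the ad-hoc key quoted above.
PREFIX = "therock-dist-linux-gfx94X-dcgpu-"

# Accept only <prefix><MAJOR>.<MINOR>.<patch+suffix>.tar.gz
KEY_RE = re.compile(
    re.escape(PREFIX) + r"(\d+)\.(\d+)\.(\d+)([0-9A-Za-z.+~-]*)\.tar\.gz"
)


def version_key(match):
    """Sort key: numeric major/minor/patch, then the suffix as a plain string."""
    major, minor, patch, suffix = match.groups()
    return (int(major), int(minor), int(patch), suffix)


def pick_latest_key(keys):
    """Return the highest properly versioned key, or None if nothing matches."""
    matches = [m for m in (KEY_RE.fullmatch(k) for k in keys) if m]
    if not matches:
        return None
    return max(matches, key=version_key).group(0)


KEYS = [
    PREFIX + "7.12.0a20260401.tar.gz",  # hypothetical older nightly
    PREFIX + "7.13.0a20260423.tar.gz",
    PREFIX + "ADHOCBUILD-7.0.0rc20250625.tar.gz",
]

if __name__ == "__main__":
    print(max(KEYS))              # plain string ordering selects the ad-hoc build
    print(pick_latest_key(KEYS))  # the filtered selection ignores it
```

Because the pattern requires a digit immediately after the prefix, a key such as `...-ADHOCBUILD-7.0.0rc20250625.tar.gz` never reaches the version comparison, so the major-version parsing downstream only ever sees numeric input.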
Pull request overview
Adds CI + local tooling to build relocatable TransferBench packages (DEB/RPM/TGZ) against the TheRock nightly ROCm SDK, including a new CMake switch to use direct CPack-based packaging and new GitHub Actions workflow/docs.
Changes:
- Add `build_packages_local.sh` to fetch a TheRock SDK tarball, configure/build, and emit DEB/RPM/TGZ via CPack.
- Add `BUILD_RELOCATABLE_PACKAGE` CMake option to bypass `rocm_create_package` and produce `amdrocm<MAJOR>-transferbench` packages.
- Add GitHub Actions workflow + README to build/upload artifacts (and optionally publish to S3 with repo metadata).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| build_packages_local.sh | New local/CI build script that installs deps, fetches TheRock SDK, builds, and runs CPack. |
| CMakeLists.txt | Adds relocatable packaging mode with direct CPack configuration and new option flag. |
| .github/workflows/build-relocatable-packages.yml | New workflow to build packages on Ubuntu + manylinux and optionally upload to S3. |
| .github/workflows/README_BUILD_PACKAGES.md | Documentation for triggers, local usage, and S3 repo layout/consumption. |
The <barrier> header is C++20-only and missing from manylinux_2_28's libstdc++ (gcc 8.5 baseline), breaking the relocatable RPM build. No std::barrier / std::latch usage exists in the codebase, so the include is dead. Co-Authored-By: Claude Opus 4 <[email protected]>
- Gate S3 upload + repo metadata steps on github.repository ==
'ROCm/TransferBench' so forks that set AWS_S3_BUCKET don't try
unauthenticated uploads.
- Add EUID root check to build_packages_local.sh up front (clearer
error than failing later in apt/dnf).
- Rename TAROBALL_BASE -> TARBALL_BASE (typo).
- Sanitize PKG_RELEASE: collapse non-alphanumerics into dots so
feature-branch names stay valid in DEB/RPM release fields.
- Drop ${ROCM_PATH}/lib from RPATH_LIST so the ephemeral SDK
download path is not embedded into the packaged binary.
- Make CPACK_RPM_EXCLUDE_FROM_AUTO_FILELIST_ADDITION track the
actual CPACK_PACKAGING_INSTALL_PREFIX instead of hard-coded
/opt/rocm/extras-N paths.
Co-Authored-By: Claude Opus 4 <[email protected]>
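The PKG_RELEASE sanitization from the list above can be sketched as follows. This is a Python illustration of the stated rule (collapse non-alphanumerics into dots), not the script's shell code; the function name `sanitize_release` and the sample branch names are hypothetical.

```python
import re


def sanitize_release(raw):
    """Collapse every run of non-alphanumeric characters into a single dot."""
    return re.sub(r"[^0-9A-Za-z]+", ".", raw)


if __name__ == "__main__":
    for name in ("feature/my-branch_42", "fix--rpath//v2", "main"):
        print(name, "->", sanitize_release(name))
```

Mapping slashes, hyphens, and underscores to dots keeps a branch-derived value free of characters such as `-` and `/`, which are not safe in DEB/RPM release fields.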
manylinux_2_28's libstdc++ (gcc 8.5) doesn't transitively pull
<iomanip>, <numeric>, or <limits> the way newer libstdc++ does.
Add direct includes where the symbols are used:
- src/client/Utilities.hpp: <iomanip> for std::setprecision/setfill/setw
- src/client/Presets/NicPeerToPeer.hpp: <numeric> for std::iota
- src/client/Presets/AllToAll{,N}.hpp: <limits> for std::numeric_limits
Modern toolchains compile fine without these because <iostream>,
<vector>, etc. happen to drag them in; older standard libraries
don't. Fixes the relocatable RPM build under manylinux_2_28.
Co-Authored-By: Claude Opus 4 <[email protected]>
manylinux_2_28's gcc 8.x baseline ships std::filesystem in a separate library libstdc++fs that must be linked explicitly. On gcc 9+ libstdc++fs exists as a backward-compat stub, so linking it unconditionally when found is harmless on newer toolchains. Resolves the link-time failures:
  ld.lld: error: undefined symbol: std::filesystem::status(...)
  ld.lld: error: undefined symbol: std::filesystem::canonical(...)
Co-Authored-By: Claude Opus 4 <[email protected]>
Do you mind moving the header file changes to a different PR?
find_library(stdc++fs) doesn't search gcc-private libdirs like /usr/lib/gcc/x86_64-redhat-linux/8/, where AlmaLinux 8 / manylinux_2_28 ships libstdc++fs.a. The compiler-driven `-lstdc++fs` does. On gcc 9+ toolchains this resolves to a no-op stub archive — harmless. Verified locally by reproducing the manylinux_2_28 build inside quay.io/pypa/manylinux_2_28_x86_64 (see repro_manylinux.sh). Co-Authored-By: Claude Opus 4 <[email protected]>
manylinux_2_28 build fixes: summary
Getting the
1. Remove unused
Closing this PR. Will create 2 separate PRs for this.
Closing in favor of two narrower PRs:
No work is being thrown away; the same commits are being re-landed on the two new branches in cleaner form. The source branch
Summary
- Adds `.github/workflows/build-relocatable-packages.yml` and `build_packages_local.sh` to build relocatable DEB/RPM/TGZ packages of TransferBench against the TheRock nightly ROCm SDK.
- Packages install to `/opt/rocm/extras-<MAJOR>` with `$ORIGIN`-relative RPATH.
- New `BUILD_RELOCATABLE_PACKAGE` CMake option that switches packaging away from `rocm_create_package` and uses `amdrocm<MAJOR>-transferbench` naming. Default `cmake ..` behavior is unchanged.
- Adds `.github/workflows/README_BUILD_PACKAGES.md` documenting triggers, local usage, S3 path layout, IAM trust policy, and apt/yum install snippets.

This PR opens primarily to exercise the new workflow in CI. The `build-ubuntu` and `build-manylinux` jobs should both succeed; S3 upload is gated on the `AWS_S3_BUCKET` repo variable being set, so it will skip cleanly until that is configured.

Test plan
- `build-ubuntu` job builds DEB + TGZ and uploads them as artifacts
- `build-manylinux` job builds RPM + TGZ and uploads them as artifacts
- `release-summary` job produces a `build-report` artifact
- `sudo -E ./build_packages_local.sh` produces relocatable packages

🤖 Generated with Claude Code