
CI: build relocatable packages via TheRock SDK #262

Open
thananon wants to merge 7 commits into develop from users/thananon/therock-packages-workflow-v2


@thananon
Contributor

Summary

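Adds a GitHub Actions workflow and a build script that produce relocatable TransferBench packages (DEB/RPM/TGZ) against TheRock SDK. This workflow is intended to be the primary path for continuous integration and nightly/release builds of TransferBench packages. The installed tree can be moved because the binary uses an $ORIGIN-relative RPATH, but the packages ship only the TransferBench binary and declare runtime dependencies on hsa-rocr and numactl, so target machines still need the ROCm/HSA runtime.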

Prerequisite: #261 (manylinux portability fixes) is merged, so builds succeed on manylinux_2_28 (AlmaLinux 8) with its gcc 8-era libstdc++.

Implementation

  • .github/workflows/build-relocatable-packages.yml: Two-job workflow (Ubuntu 22.04 → DEB/TGZ, manylinux_2_28 → RPM/TGZ) that fetches the latest TheRock SDK tarball, builds with -DBUILD_RELOCATABLE_PACKAGE=ON, and uploads the packages as GitHub Actions artifacts and, optionally, to S3 (authenticated via OIDC).
  • build_packages_local.sh: Standalone script that downloads TheRock SDK tarballs from S3 by ROCm version and GPU family (gfx94X-dcgpu, gfx950-dcgpu, gfx110X-all, gfx120X-all, gfx1151), extracts them, drives CMake with relocatable RPATH, and produces amdrocm<MAJOR>-transferbench packages.
  • CMake additions: New BUILD_RELOCATABLE_PACKAGE option that bypasses rocm_install/rocm_create_package and drives CPack directly with version-namespaced package naming (amdrocm7-transferbench) and relocatable RPATH.

S3 and OIDC

The workflow uploads packages to S3, authenticating with GitHub OIDC (no long-lived credentials). S3 paths:

  • Nightly/develop: s3://.../nightly/transferbench/{deb,rpm,tar} with apt/yum repo metadata.
  • Release branches: s3://.../release/transferbench/{deb,rpm,tar} with apt/yum metadata.
  • Pull requests: s3://.../transferbench/<branch>/<run-number>/{ubuntu-22.04,manylinux_2_28} (no repo metadata).

S3 upload runs only in the ROCm/TransferBench repository and requires AWS_S3_BUCKET (organization variable) and AWS_ROLE_ARN (organization secret).
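The upload rules above can be condensed into a small decision function. This is a minimal sketch in bash, not the workflow's actual logic: `should_upload_to_s3` and `s3_subpath` are hypothetical names, the bucket root (elided as `s3://.../` above) and the per-job `{ubuntu-22.04,manylinux_2_28}` suffix are left out, and treating scheduled runs as nightly is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the S3 upload rules above; the real workflow expresses
# them as `if:` conditions and step logic, which may differ in detail.

# Succeeds only in the upstream repository with both AWS settings present.
# Assumption: AWS_ROLE_ARN is visible as an environment variable here.
should_upload_to_s3() {
  [ "${GITHUB_REPOSITORY:-}" = "ROCm/TransferBench" ] &&
    [ -n "${AWS_S3_BUCKET:-}" ] &&
    [ -n "${AWS_ROLE_ARN:-}" ]
}

# Prints the path below the (elided) bucket root for EVENT BRANCH RUN_NUMBER.
# Assumption: scheduled runs publish to the nightly location, like develop.
s3_subpath() {
  local event="$1" branch="$2" run="$3"
  case "$event:$branch" in
    pull_request:*)            printf 'transferbench/%s/%s\n' "$branch" "$run" ;;
    schedule:* | push:develop) printf 'nightly/transferbench\n' ;;
    push:release/*)            printf 'release/transferbench\n' ;;
    *)                         return 1 ;;  # not covered by the layout above
  esac
}
```

In the workflow, `event` would come from `github.event_name`, `branch` from `github.head_ref` for pull requests (or `github.ref_name` for pushes), and `run` from `github.run_number`.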

Supported GPU Families

TheRock SDK tarballs are fetched per GPU family:

  • gfx94X-dcgpu: gfx940, gfx941, gfx942, gfx943 (MI300 series)
  • gfx950-dcgpu: gfx950 (MI350 series, CDNA 4)
  • gfx110X-all: gfx1100, gfx1101, gfx1102 (RDNA3 client)
  • gfx120X-all: gfx1200, gfx1201 (RDNA4)
  • gfx1151: gfx1151 (Strix Halo APU)

The workflow defaults to gfx94X-dcgpu (the MI300-series data-center family); other families can be selected through the workflow_dispatch input.
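Given the tarball naming scheme described in the commits below (`therock-dist-linux-<family>-<version>.tar.gz`), selecting a family amounts to validating it and substituting it into that name. A minimal sketch, assuming the family comes from the `GPU_FAMILY` environment variable as in the local invocation under Testing; `therock_tarball_name` is hypothetical, the bucket URL is deliberately omitted, and the function only builds a name without checking that the version is published for that family.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the TheRock tarball name for $GPU_FAMILY
# (default gfx94X-dcgpu) and the ROCm version given as $1.
therock_tarball_name() {
  local family="${GPU_FAMILY:-gfx94X-dcgpu}" version="$1"
  case "$family" in
    gfx94X-dcgpu | gfx950-dcgpu | gfx110X-all | gfx120X-all | gfx1151) ;;
    *)
      echo "unsupported GPU family: $family" >&2
      return 1
      ;;
  esac
  printf 'therock-dist-linux-%s-%s.tar.gz\n' "$family" "$version"
}

therock_tarball_name 7.13.0a20260423
# with GPU_FAMILY unset, prints therock-dist-linux-gfx94X-dcgpu-7.13.0a20260423.tar.gz
```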

Trigger Conditions

  • Push to develop, mainline, release/**, and users/thananon/therock-packages-workflow-v2 (the last is temporary, for pre-merge validation of this PR, and will be removed after merge)
  • Pull request to develop or mainline
  • Daily scheduled run at 13:00 UTC (5:00 AM PST / 6:00 AM PDT)
  • Manual via workflow_dispatch with optional ROCm version/GPU family override
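GitHub Actions evaluates `schedule` cron expressions in UTC, so the local Pacific time of the daily run shifts with daylight saving. A quick check, assuming the workflow's cron entry is `0 13 * * *` and GNU `date` is available; `pacific_time_of_utc` is a hypothetical helper, and the POSIX TZ rule encodes current US DST dates so no tzdata lookup is needed.

```bash
#!/usr/bin/env bash
# Converts a UTC wall-clock time to US Pacific time using a POSIX TZ rule:
# PST is UTC-8, PDT (2nd Sunday of March to 1st Sunday of November) is UTC-7.
pacific_time_of_utc() {
  TZ='PST8PDT,M3.2.0,M11.1.0' date -d "$1 UTC" +%H:%M
}

pacific_time_of_utc '2026-01-15 13:00'   # prints 05:00 (PST)
pacific_time_of_utc '2026-04-23 13:00'   # prints 06:00 (PDT)
```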

Testing

Local invocation (no GitHub Actions):

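```bash
# Run as root: the script installs dependencies via apt/dnf and checks EUID up front.
export GPU_FAMILY=gfx94X-dcgpu BUILD_TYPE=Release
./build_packages_local.sh
# DEB is produced on Ubuntu and RPM on AlmaLinux/manylinux, so one of these globs will not match.
ls build/*.{deb,rpm,tar.gz}
```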

The workflow triggers on push to this PR's branch (users/thananon/therock-packages-workflow-v2) so the first CI run will validate the full build→package→upload flow before the PR is merged.

thananon and others added 5 commits April 23, 2026 18:31
Adds a CI workflow that builds DEB/RPM/TGZ packages of TransferBench
against the TheRock nightly ROCm SDK, modeled on the
ROCmValidationSuite packaging workflow. Packages install to
/opt/rocm/extras-<MAJOR> with $ORIGIN-relative RPATH so they are
relocatable.

- build_packages_local.sh: single source of truth for both local and
  CI builds. Detects Ubuntu vs AlmaLinux/manylinux, installs deps,
  fetches TheRock SDK tarball, configures CMake with relocatable
  RPATH and the new BUILD_RELOCATABLE_PACKAGE option, builds, and
  invokes CPack for DEB/RPM/TGZ.
- .github/workflows/build-relocatable-packages.yml: parallel
  Ubuntu 22.04 + manylinux_2_28 jobs triggered on push, PR, daily
  cron, and workflow_dispatch. OIDC-based S3 upload gated on
  AWS_S3_BUCKET being set; apt/yum repo metadata generated for
  non-PR builds. Build report artifact summarizes S3 paths.
- .github/workflows/README_BUILD_PACKAGES.md: workflow docs covering
  triggers, local usage, S3 layout, IAM trust policy, and apt/yum
  install snippets.
- CMakeLists.txt: new BUILD_RELOCATABLE_PACKAGE option that bypasses
  rocm_install/rocm_create_package, names the package
  amdrocm<MAJOR>-transferbench, and honors caller-set install prefix
  and CPACK_*_PACKAGE_RELEASE env vars. Default cmake .. behavior is
  unchanged.

Co-Authored-By: Claude Opus 4

The tarballs are published as:
  therock-dist-linux-<family>-<version>.tar.gz

not <family>-<version>.tar.gz as I had it. Also TheRock does not
publish a per-family LATEST.txt, so the auto-fetch path now lists
the bucket via S3 ListObjectsV2 and picks the highest version key
with `sort -V`. Updates the pinned fallback to a version that
actually exists on the bucket today (7.13.0a20260423).

Co-Authored-By: Claude Opus 4

The bucket includes non-release ad-hoc builds (e.g.
therock-dist-linux-gfx94X-dcgpu-ADHOCBUILD-7.0.0rc20250625.tar.gz)
which `sort -V | tail -1` was selecting because 'A' lexically
sorts after digits. The downstream `printf '%02d'` then crashed
trying to parse `ADHOCBUILD-7` as the ROCm major.

Restrict the auto-fetch grep to keys matching
  <prefix><MAJOR>.<MINOR>.<patch+suffix>.tar.gz

so only properly versioned releases are considered.

Co-Authored-By: Claude Opus 4

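The auto-fetch filtering described in the last two commits can be sketched as follows, assuming the S3 keys have already been listed one per line (for example from a ListObjectsV2 response). `pick_latest_tarball` and `rocm_major_of` are hypothetical names and the exact regex in build_packages_local.sh may differ; the point is that requiring a numeric MAJOR and MINOR right after the prefix keeps ADHOCBUILD-style keys away from `sort -V` and from the `printf '%02d'` step.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the auto-fetch filter; not the script's actual code.
# stdin: one S3 object key per line. $1: key prefix, assumed to contain no
# regex metacharacters (true for names like therock-dist-linux-gfx94X-dcgpu-).
pick_latest_tarball() {
  local prefix="$1"
  # Keep only <prefix><MAJOR>.<MINOR>.<patch+suffix>.tar.gz, so keys such as
  # ...-ADHOCBUILD-7.0.0rc20250625.tar.gz never reach the version sort.
  grep -E "^${prefix}[0-9]+\.[0-9]+\.[0-9A-Za-z]+\.tar\.gz$" | sort -V | tail -n 1
}

# Extracts the ROCm major from a selected key and zero-pads it with printf '%02d'.
rocm_major_of() {
  local key="$1" prefix="$2"
  local version="${key#"$prefix"}"   # e.g. 7.13.0a20260423.tar.gz
  printf '%02d\n' "${version%%.*}"
}

printf '%s\n' \
  therock-dist-linux-gfx94X-dcgpu-ADHOCBUILD-7.0.0rc20250625.tar.gz \
  therock-dist-linux-gfx94X-dcgpu-7.13.0a20260423.tar.gz |
  pick_latest_tarball therock-dist-linux-gfx94X-dcgpu-
# prints therock-dist-linux-gfx94X-dcgpu-7.13.0a20260423.tar.gz
```

`sort -V` compares the numeric fields numerically, so 7.13.x ranks above 7.9.x, which a plain lexical sort would get wrong.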
- Gate S3 upload + repo metadata steps on github.repository ==
  'ROCm/TransferBench' so forks that set AWS_S3_BUCKET don't try
  unauthenticated uploads.
- Add EUID root check to build_packages_local.sh up front (clearer
  error than failing later in apt/dnf).
- Rename TAROBALL_BASE -> TARBALL_BASE (typo).
- Sanitize PKG_RELEASE: collapse non-alphanumerics into dots so
  feature-branch names stay valid in DEB/RPM release fields.
- Drop ${ROCM_PATH}/lib from RPATH_LIST so the ephemeral SDK
  download path is not embedded into the packaged binary.
- Make CPACK_RPM_EXCLUDE_FROM_AUTO_FILELIST_ADDITION track the
  actual CPACK_PACKAGING_INSTALL_PREFIX instead of hard-coded
  /opt/rocm/extras-N paths.

Co-Authored-By: Claude Opus 4
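Two of the fixes in this commit are easy to check in isolation. The sketch below is illustrative only: `sanitize_pkg_release` and `rpath_references_dir` are hypothetical names, not functions from build_packages_local.sh, and the sed expression is one way to implement the described collapse, not necessarily the script's.

```bash
#!/usr/bin/env bash
# Hypothetical sketches of two fixes from the commit above.

# Collapse every run of non-alphanumeric characters into a single dot so a
# branch name can be used in DEB/RPM release fields.
sanitize_pkg_release() {
  printf '%s\n' "$1" | sed -E 's/[^A-Za-z0-9]+/./g'
}

# Succeeds (returns 0) if any entry of a colon-separated RPATH/RUNPATH lies
# under DIR, e.g. the temporary directory where the SDK tarball was extracted.
rpath_references_dir() {
  local rpath="$1" dir="${2%/}" entry
  local -a entries=()
  IFS=: read -r -a entries <<< "$rpath"
  for entry in "${entries[@]}"; do
    case "$entry" in
      "$dir" | "$dir"/*) return 0 ;;
    esac
  done
  return 1
}

sanitize_pkg_release users/thananon/therock-packages-workflow-v2
# prints users.thananon.therock.packages.workflow.v2
```

For a built binary, `readelf -d <binary>` prints an `(RPATH)` or `(RUNPATH)` entry whose bracketed value can be passed as the first argument, with the SDK extraction directory as the second; a return status of 0 would mean the ephemeral path leaked into the package.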

Copilot AI left a comment


Pull request overview

Adds a GitHub Actions-based CI path (and a local build script) to produce relocatable TransferBench packages using TheRock ROCm SDK, with a new CMake packaging mode that drives CPack directly.

Changes:

  • Added build_packages_local.sh to download TheRock SDK, configure relocatable RPATH, build, and run CPack to emit DEB/RPM/TGZ.
  • Added .github/workflows/build-relocatable-packages.yml to build packages on Ubuntu 22.04 and manylinux_2_28, upload artifacts, and optionally publish to S3 via OIDC with repo metadata generation.
  • Updated CMakeLists.txt with a BUILD_RELOCATABLE_PACKAGE option to bypass rocm_create_package and set CPack metadata/package naming.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| build_packages_local.sh | Local/CI build driver: installs deps, downloads TheRock SDK, configures relocatable install/RPATH, builds + runs CPack. |
| CMakeLists.txt | Adds relocatable packaging mode and new package naming/CPack settings. |
| .github/workflows/build-relocatable-packages.yml | CI workflow to build/package on Ubuntu + manylinux, upload artifacts, and optionally push to S3 with OIDC + repo metadata. |
| .github/workflows/README_BUILD_PACKAGES.md | Documentation for the new workflow/script, triggers, S3 layout, and usage instructions. |


Comment thread CMakeLists.txt
Comment thread CMakeLists.txt
Comment thread .github/workflows/build-relocatable-packages.yml
Comment thread .github/workflows/build-relocatable-packages.yml
Comment thread .github/workflows/README_BUILD_PACKAGES.md Outdated
Packages declare runtime deps on hsa-rocr/numactl and only ship the
TransferBench binary; the install tree is movable via $ORIGIN RPATH but
target systems still need the ROCm/HSA runtime. Update README so the
claim matches the actual packaging behavior.
Comment thread build_packages_local.sh
```
-DCMAKE_VERBOSE_MAKEFILE=ON
-DBUILD_RELOCATABLE_PACKAGE=ON
-DBUILD_LOCAL_GPU_TARGET_ONLY=OFF
-DENABLE_NIC_EXEC=ON
```
Collaborator


I don't think we want this ON by default.
@gilbertlee-amd, thoughts?

Comment thread build_packages_local.sh
```
build-essential cmake git curl tar xz-utils ca-certificates pkg-config \
python3 python3-pip \
libnuma-dev libibverbs-dev rdma-core ibverbs-providers \
libopenmpi-dev openmpi-bin \
```
Collaborator


Is this aligned with how TheRock sets/expects MPI dependencies for other packages?
