
Phoenix GPU: bump nvhpc to 25.5 and use system openmpi #1393

Closed
sbryngelson wants to merge 2 commits into MFlowCode:master from sbryngelson:phoenix-nvhpc-25.5

Conversation

@sbryngelson
Member

Summary

  • Update the Phoenix GPU module set: nvhpc/24.5 → nvhpc/25.5, drop hpcx/2.19-cuda, add the system openmpi module.

Test plan

  • source ./mfc.sh load -c p -m g succeeds on Phoenix
  • ./mfc.sh build --gpu acc -j 8 succeeds on Phoenix GPU node
  • Run a representative GPU case to confirm runtime

@sbryngelson sbryngelson marked this pull request as ready for review May 2, 2026 14:32
@qodo-code-review
Contributor

qodo-code-review Bot commented May 2, 2026

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (0)



Remediation recommended

1. Unpinned OpenMPI for NVHPC 🐞 Bug ☼ Reliability
Description
p-gpu now loads an unversioned openmpi module while exporting FC=nvfortran, but MPI is enabled
by default and CMake requires MPI::MPI_Fortran, which depends on a Fortran MPI installation
compatible with the active compiler. Because openmpi is not versioned/compiler-qualified here
(unlike other NVHPC module sets), the resulting toolchain can become non-reproducible and may fail
to configure/build if the default OpenMPI is not NVHPC-compatible.
Code

toolchain/modules[40]

+p-gpu python/3.12.5 nvhpc/25.5 cuda/12.1.1 openmpi
Evidence
The load mechanism concatenates *-all and *-gpu tokens and passes them directly to `module load`. Phoenix GPU explicitly sets `FC=nvfortran`, while MFC enables MPI by default and the CMake
build unconditionally requires MPI Fortran when MPI is enabled. Therefore the module list must
reliably provide an MPI implementation whose Fortran modules/libs are compatible with NVHPC; using
an unpinned openmpi token makes that selection ambiguous and less reproducible than other NVHPC
GPU entries that encode the compiler/variant in the module selection.

toolchain/modules[37-42]
toolchain/bootstrap/modules.sh[106-115]
toolchain/mfc/state.py[13-23]
toolchain/mfc/cli/docs_gen.py[93-98]
toolchain/mfc/build.py[414-422]
CMakeLists.txt[562-571]
toolchain/modules[9-13]
toolchain/modules[105-113]
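As a quick sanity check of the concern above, one could inspect which compiler the MPI Fortran wrapper actually resolves to after loading the modules. This is a hypothetical diagnostic, not part of MFC's toolchain; the helper name `check_wrapper_line` is an assumption.

```shell
#!/bin/sh
# Hypothetical diagnostic: after `module load ... openmpi`, check which
# compiler backs the MPI Fortran wrapper. An unpinned openmpi module may
# resolve to a GCC-built MPI whose Fortran modules nvfortran cannot use.
check_wrapper_line() {
    # $1 is the first line of `mpifort --version`
    case "$1" in
        *nvfortran*) echo "compatible" ;;
        *)           echo "incompatible" ;;
    esac
}

if command -v mpifort >/dev/null 2>&1; then
    echo "mpifort backend: $(check_wrapper_line "$(mpifort --version 2>/dev/null | head -n 1)")"
else
    echo "mpifort not found; CMake's MPI::MPI_Fortran would fail to configure"
fi
```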

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Phoenix GPU now loads a generic `openmpi` module while forcing `FC=nvfortran`, but MPI is enabled by default and the build requires a compiler-compatible MPI Fortran installation. Leaving `openmpi` unversioned/compiler-agnostic makes the toolchain selection ambiguous and can break configuration/build depending on what `openmpi` resolves to.
### Issue Context
- The module loader concatenates `p-all` + `p-gpu` and calls `module load` on the resulting tokens.
- The build enables MPI by default and CMake requires `MPI::MPI_Fortran`.
- Other NVHPC GPU entries encode the MPI/compiler coupling (e.g., `openmpi/...-nvhpc...` or explicitly using NVHPC MPI wrapper binaries).
### Fix Focus Areas
- toolchain/modules[37-41]
### Suggested changes
- Replace `openmpi` with the exact Phoenix system module that is built for NVHPC 25.5 (e.g., `openmpi/<ver>-nvhpc25.5` if that’s how Phoenix names it), OR
- Switch Phoenix GPU to use the MPI wrapper compilers from the intended MPI stack (set `CC/CXX/FC` to `mpicc/mpicxx/mpifort` paths for the NVHPC-compatible MPI), similar to how `h-gpu` is pinned to NVHPC comm_libs wrappers.
- Keep the module list version-pinned where possible for reproducible CI and user builds.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
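For illustration, the first suggested change might look like the sketch below in toolchain/modules. The `<ver>` placeholder and the naming pattern are assumptions; the real module name must be taken from `module avail openmpi` on Phoenix.

```
# Hypothetical pinned entry (replace <ver> with the actual NVHPC-built
# OpenMPI version that `module avail` reports on Phoenix):
p-gpu     python/3.12.5 nvhpc/25.5 cuda/12.1.1 openmpi/<ver>-nvhpc25.5
```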



Advisory comments

2. Redundant python module token 🐞 Bug ⚙ Maintainability
Description
p-gpu repeats python/3.12.5 even though p-all already loads it, and the loader concatenates
both lists, so Python is redundantly passed to module load. This increases drift risk if one line
is updated later but the other is not.
Code

toolchain/modules[40]

+p-gpu python/3.12.5 nvhpc/25.5 cuda/12.1.1 openmpi
Evidence
Phoenix defines python/3.12.5 in p-all and again in p-gpu; the module loader concatenates
p-all and p-gpu tokens and loads them as a single list, so Python will be requested twice.

toolchain/modules[37-41]
toolchain/bootstrap/modules.sh[106-112]
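The concatenation behavior described above can be sketched as follows. This is a minimal, self-contained illustration with an assumed awk-based parser and sample data mirroring the PR's entry, not the actual modules.sh implementation.

```shell
#!/bin/sh
# Hypothetical sketch of the loader: gather the <slug>-all and <slug>-gpu
# tokens from a modules file and issue a single `module load` over both.
modules_file=$(mktemp)
cat > "$modules_file" <<'EOF'
p-all python/3.12.5
p-gpu python/3.12.5 nvhpc/25.5 cuda/12.1.1 openmpi
EOF

slug="p"
all_tokens=$(awk -v k="${slug}-all" '$1 == k { $1 = ""; print }' "$modules_file")
gpu_tokens=$(awk -v k="${slug}-gpu" '$1 == k { $1 = ""; print }' "$modules_file")
ELEMENTS="$all_tokens $gpu_tokens"

# Both lists carry python/3.12.5, so the duplicate token reaches
# `module load`, and the final token is the unpinned `openmpi`.
echo "module load$ELEMENTS"
rm -f "$modules_file"
```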

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Phoenix GPU (`p-gpu`) redundantly lists `python/3.12.5` even though it is already included in `p-all`. The loader combines `p-all` and `p-gpu`, so the duplicate is unnecessary and can lead to future drift.
### Issue Context
The loader builds `ELEMENTS` as `(<slug>-all) + (<slug>-gpu)` and then runs a single `module load` over the resulting module tokens.
### Fix Focus Areas
- toolchain/modules[37-41]
- toolchain/bootstrap/modules.sh[106-112]
### Suggested change
- Remove `python/3.12.5` from the `p-gpu` module list (keep it only in `p-all`).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Previous review results

Review updated until commit 717a8b3


@coderabbitai
Contributor

coderabbitai Bot commented May 2, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4c23dd4f-670f-4f96-b510-b9b42ba9320b

📥 Commits

Reviewing files that changed from the base of the PR and between 1033ff6 and 717a8b3.

📒 Files selected for processing (1)
  • toolchain/modules

📝 Walkthrough

Walkthrough

The p-gpu module stack definition in toolchain/modules was updated to use NVHPC version 25.5 instead of 24.5. The hpcx/2.19-cuda component was removed from the module list and replaced with openmpi (without a specific version specified). The remaining components in the stack (python/3.12.5 and cuda/12.1.1) and subsequent configuration directives remain unchanged.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Description check (⚠️ Warning): The description includes a clear summary of changes and a test plan but lacks the required template structure (issue reference, type of change, and checklist items). Resolution: add the missing template sections as specified in the repository template: issue reference (Fixes #...), type-of-change checkbox, testing details, and completion checklist items.
✅ Passed checks (4 passed)
  • Title check: The title directly and concisely describes the main changes: updating nvhpc from 24.5 to 25.5 and switching to the system openmpi instead of hpcx.
  • Docstring Coverage: No functions found in the changed files; docstring coverage check skipped.
  • Linked Issues check: Skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: Skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Review rate limit: 0/1 reviews remaining; refill in 60 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

@qodo-code-review
Contributor

qodo-code-review Bot commented May 2, 2026

Persistent review updated to latest commit 717a8b3

@codecov

codecov Bot commented May 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.75%. Comparing base (02eb4ec) to head (f716f51).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1393   +/-   ##
=======================================
  Coverage   64.75%   64.75%           
=======================================
  Files          71       71           
  Lines       18721    18721           
  Branches     1551     1551           
=======================================
+ Hits        12122    12123    +1     
+ Misses       5641     5640    -1     
  Partials      958      958           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


@sbryngelson
Member Author

nvhpc 25.5 is working but about 15% slower than 24.5

@sbryngelson sbryngelson closed this May 2, 2026