Remove padding from scales for hipBLASlt calls #442

Open
alextmagro wants to merge 5 commits into dev from mxfp8_scale_unpadding

Conversation

@alextmagro
Contributor

Removes padding from the scale vectors used mainly for MXFP8 before they are passed to hipBLASLt.
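For context, a minimal out-of-place sketch of the compaction involved (the kernel in this PR works in place on scale_dptr; the uint8_t element type assumes e8m0-encoded MXFP8 scales, and the names here are illustrative, not the PR's actual code):

#include <hip/hip_runtime.h>
#include <cstdint>

// Copy the top-left unpadded_rows x unpadded_cols region of a row-major
// padded scale matrix into a contiguous, unpadded buffer.
__global__ void unpad_scales_sketch(const uint8_t* padded, uint8_t* unpadded,
                                    int unpadded_rows, int unpadded_cols,
                                    int padded_cols) {
  const int col = blockIdx.x * blockDim.x + threadIdx.x;
  const int row = blockIdx.y * blockDim.y + threadIdx.y;
  if (row < unpadded_rows && col < unpadded_cols) {
    unpadded[row * unpadded_cols + col] = padded[row * padded_cols + col];
  }
}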

if (params.m % 16 || params.n % 16) {
GTEST_SKIP() << "MXFP8 requires M & N to be multiples of 16";
}
if (params.k % 128) {
Collaborator

Is this a hipBLASLt limitation?

Contributor Author

Yes, these are the values that the hipBLASLt team provided to us. I tested just in case, but nothing smaller than 128 works for k.
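For reference, a minimal sketch of the runtime guard these constraints imply, assuming the codebase's NVTE_CHECK macro (the placement is illustrative):

// Hypothetical guard at the MXFP8 GEMM entry point, reflecting the
// dimension constraints discussed above.
NVTE_CHECK(m % 16 == 0 && n % 16 == 0,
           "MXFP8 GEMM requires M and N to be multiples of 16");
NVTE_CHECK(k % 128 == 0, "MXFP8 GEMM requires K to be a multiple of 128");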

NVTE_DIM_CHECK(chunk_height > 0 && chunk_width > 0, "Attempted to get empty tensor chunk");
NVTE_DIM_CHECK(chunk_height <= height && chunk_width <= width,
"Attempted to get out-of-bounds tensor chunk");
#ifndef __HIP_PLATFORM_AMD__
Collaborator

I think this file is not currently compiled for ROCm - it is for UB

Contributor Author

Yes. I can move it to the UB PR if you prefer.

unpad_mxfp8_scales_kernel<<<blocks, threads, 0, stream>>>(
    scale_dptr, unpadded_rows, unpadded_cols, padded_cols);

NVTE_CHECK_CUDA(hipStreamSynchronize(stream));
Collaborator

If we're using the same stream for the GEMM, the synchronize creates an unnecessary bubble.
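A sketch of the stream-ordered alternative: launches on the same HIP stream execute in submission order, so the GEMM enqueued afterwards already waits for the unpad kernel and the host-side synchronize can be dropped (error handling elided):

unpad_mxfp8_scales_kernel<<<blocks, threads, 0, stream>>>(
    scale_dptr, unpadded_rows, unpadded_cols, padded_cols);
// No hipStreamSynchronize needed: the hipBLASLt GEMM enqueued on the same
// `stream` will not begin until the kernel above has completed.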

// dimensions (with matrix interpreted in row-major order).

unpad_mxfp8_checkpoint(A, is_A_transposed, m, n, k, stream);

Collaborator

It is probably not the best place for this call; it does not update the input parameters.
Also, what will the code behaviour be if we call GEMM twice with the same input? The first call will unpad the scale tensor and the second will do the same, right?
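One way to make the call idempotent would be to skip the compaction when the tensor's recorded layout already matches the unpadded dimensions; a hypothetical sketch (the field names are illustrative, not this PR's actual API):

// Hypothetical guard: unpad only once per tensor.
if (scale_tensor.cols != unpadded_cols) {
  unpad_mxfp8_scales_kernel<<<blocks, threads, 0, stream>>>(
      scale_tensor.dptr, unpadded_rows, unpadded_cols, scale_tensor.cols);
  scale_tensor.cols = unpadded_cols;  // record the compacted layout
}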

Contributor Author

We talked about this during standup, including the issues it would create in Python; I am currently working on moving the logic over there.

Contributor Author

I have moved the check and unpadding to generic_gemm in PyTorch. This provides a more robust way to check before calling the GEMMs, and it directly modifies the PyTorch tensors.

@alextmagro requested a review from ipanfilo on February 6, 2026 at 23:41