
feat(ops): add RmsNorm with Iluvatar, NVIDIA, CPU backends and fp16/bf16 support #6

Open
zhangyue207 wants to merge 5 commits into feat/dev-infra from feat/dev-rmsnorm-cuda

Conversation

@zhangyue207

  • Add 'RmsNorm' operator with 'CPU', 'NVIDIA', and 'Iluvatar' implementations
  • Support fp32/fp16/bf16 on NVIDIA and Iluvatar; fp32 only on CPU
  • Add shared CUDA kernel (kernel.cuh) and backend-specific wrappers
  • Extend generate_wrappers.py and CMake for RmsNorm
  • Add tests covering backends and dtypes
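For reference, the operator semantics described above can be sketched in pure Python. This is only our reading of the RmsNorm math (normalize by the root mean square over the last dimension, then scale by the weight), not the PR's actual CPU/CUDA implementation:

```python
import math

def rms_norm(x, w, epsilon=1e-6):
    """RMSNorm over a flat vector: y_i = x_i / sqrt(mean(x^2) + eps) * w_i.

    x and w are lists of floats of equal length; a sketch of the math only,
    not the kernel code added in this PR.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + epsilon)
    return [v / rms * g for v, g in zip(x, w)]

# With unit weights, RMSNorm only rescales x; relative magnitudes are kept.
y = rms_norm([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

The fp16/bf16 paths in the PR would additionally accumulate the mean of squares in a higher-precision compute type, which this sketch glosses over.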

@zhangyue207 zhangyue207 changed the base branch from master to feat/dev-infra March 2, 2026 02:52
@zhangyue207 zhangyue207 force-pushed the feat/dev-rmsnorm-cuda branch from ea03f0f to 10187f4 Compare March 2, 2026 03:22
@zhangyue207 zhangyue207 changed the title feat(ops): add RmsNorm with Iluvatar, NVIDIA, CPU backends and fp16/bf16 support feat(ops): add 'RmsNorm' with Iluvatar, NVIDIA, CPU backends and fp16/bf16 support Mar 2, 2026
@zhangyue207 zhangyue207 changed the title feat(ops): add 'RmsNorm' with Iluvatar, NVIDIA, CPU backends and fp16/bf16 support feat(ops): add RmsNorm with Iluvatar, NVIDIA, CPU backends and fp16/bf16 support Mar 2, 2026
@zhangyue207
Author

zhangyue207 commented Mar 2, 2026

Iluvatar

root@iluvatar:/workspace/InfiniOps# pytest
==================================== test session starts ====================================
platform linux -- Python 3.10.18, pytest-9.0.2, pluggy-1.6.0
rootdir: /workspace/InfiniOps
configfile: pyproject.toml
plugins: anyio-4.9.0, cov-7.0.0, xdist-3.8.0, typeguard-4.4.4
collected 572 items                                                                         

tests/test_add.py ....................................                                [  6%]
tests/test_gemm.py .................................................................. [ 17%]
..................................................................................... [ 32%]
..................................................................................... [ 47%]
..................................................................................... [ 62%]
..................................................................................... [ 77%]
..................................................................................... [ 92%]
.........                                                                             [ 93%]
tests/test_rms_norm.py ....................................                           [100%]

==================================== 572 passed in 1.61s ====================================

@zhangyue207
Author

Nvidia

(python3.10) zhangyue@server:~/InfiniOps$ pytest
========================================== test session starts ==========================================
platform linux -- Python 3.10.19, pytest-9.0.2, pluggy-1.6.0
rootdir: /home/zhangyue/InfiniOps
configfile: pyproject.toml
plugins: xdist-3.8.0, cov-7.0.0
collected 572 items                                                                                     

tests/test_add.py ....................................                                            [  6%]
tests/test_gemm.py .............................................................................. [ 19%]
................................................................................................. [ 36%]
................................................................................................. [ 53%]
................................................................................................. [ 70%]
................................................................................................. [ 87%]
..................................                                                                [ 93%]
tests/test_rms_norm.py ....................................                                       [100%]

========================================== 572 passed in 2.20s ==========================================


# NVIDIA and Iluvatar are parallel backends; only one GPU backend at a time.
if(WITH_NVIDIA AND WITH_ILUVATAR)
message(FATAL_ERROR "WITH_NVIDIA and WITH_ILUVATAR cannot both be ON. Build one GPU backend at a time.")
Collaborator

Use Markdown syntax: "`WITH_NVIDIA` and `WITH_ILUVATAR` cannot both be `ON`. Build one GPU backend at a time."

find_package(CUDAToolkit REQUIRED)
endif()

# Iluvatar: CUDA-compatible device, uses clang++ with -x ivcore (not nvcc).
Collaborator

Use Markdown syntax:

# Iluvatar: CUDA-compatible device, uses `clang++` with `-x ivcore` (not `nvcc`).
# Reference: `InfiniCore` `xmake/iluvatar.lua`.

if(NOT WITH_NVIDIA)
enable_language(CUDA)
find_package(CUDAToolkit REQUIRED)
set(ILUVATAR_ARCH "ivcore20" CACHE STRING "Iluvatar GPU architecture")
Collaborator

I haven't done much development on Iluvatar, but I recall that Add and Gemm compiled fine without these back then. Could you briefly explain what these two blocks of code do and why they were introduced?

from tests.utils import Payload, empty_strided, randn_strided


def _rms_norm(x, w, out, *, epsilon=1e-6):
Collaborator

`_rms_norm` and `_torch_rms_norm` are private, so they should be placed later in the file, i.e. after `test_rms_norm`.

return out


def _torch_rms_norm(x, w, out, *, epsilon=1e-6):
Collaborator

You can use `torch.nn.functional.rms_norm` directly.


} // namespace

template <unsigned int BLOCK_SIZE, typename Tcompute, typename Tdata,

struct NvidiaBackend {
using stream_t = cudaStream_t;

static constexpr auto setDevice = [](int) {};
Collaborator

Why is this empty here?

Collaborator

The naming and ordering in this file should be changed to align with PyTorch, as discussed earlier.

CUDA_STANDARD_REQUIRED ON)
endif()

# Iluvatar: CUDA-compatible device; -x ivcore and flags from top-level CMakeLists.txt
Collaborator

This comment line can be removed.

target_link_libraries(infiniops PUBLIC CUDA::cudart CUDA::cublas CUDA::cuda_driver)

set_target_properties(infiniops PROPERTIES CUDA_STANDARD 17
CUDA_STANDARD_REQUIRED ON)
Collaborator

Align `CUDA_STANDARD_REQUIRED` with `CUDA_STANDARD`, though admittedly we don't have a fixed standard for this formatting yet.
