XoRL

High-performance distributed training for LLMs — RL, SFT, MoE, and beyond.

🚀 Installation · ⚡ Quick Start · 📚 Documentation


🔍 Overview

XoRL is a distributed training framework designed for large language models with composable parallelism and flexible training modes.

Two training modes:

  • Local — torchrun-based training for offline SFT and pretraining
  • Server — REST API-driven training for online RL loops where an external orchestrator (e.g. xorl_client) controls the training loop

Parallelism strategies — mix and match freely:

| Strategy | Description |
| --- | --- |
| FSDP2 | Fully sharded data parallelism (PyTorch native) |
| Tensor Parallel | Column/row weight sharding across GPUs |
| Pipeline Parallel | Interleaved 1F1B schedule across stages |
| Context Parallel | Ring attention + Ulysses sequence parallel |
| Expert Parallel | MoE expert sharding via DeepEP |
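These strategies compose: a job's world size must tile exactly across the chosen parallel degrees. A minimal sanity-check sketch in plain Python (the helper and argument names are hypothetical, not part of the xorl API):

```python
def check_parallel_layout(world_size: int, dp: int = 1, tp: int = 1,
                          pp: int = 1, cp: int = 1) -> bool:
    """True if the data/tensor/pipeline/context degrees exactly
    tile the available GPUs (a common composability constraint)."""
    return dp * tp * pp * cp == world_size

# e.g. 64 GPUs as 4-way data x 4-way tensor x 2-way pipeline x 2-way context
assert check_parallel_layout(64, dp=4, tp=4, pp=2, cp=2)
```

In many MoE setups expert parallelism reuses the data-parallel ranks, which is why it is omitted from this simple product check.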

Fine-tuning methods — full weights, LoRA, and QLoRA (int4/nvfp4/block_fp8), all FSDP2-compatible.
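To illustrate why LoRA is cheap, the adapter parameterizes the weight update as a low-rank product on top of a frozen base weight. A NumPy sketch (illustrative only — dimensions chosen arbitrarily, not xorl's implementation):

```python
import numpy as np

d, r, alpha = 1024, 8, 16                 # hidden dim, adapter rank, scaling
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))           # frozen base weight
A = rng.standard_normal((r, d)) * 0.01    # trainable down-projection
B = np.zeros((d, r))                      # trainable up-projection, zero-init

# Effective weight: base plus scaled low-rank update.
# With B zero-initialized, training starts exactly at the base model.
W_eff = W + (alpha / r) * (B @ A)

full_params = d * d                       # 1,048,576 trainable in full FT
lora_params = 2 * r * d                   # 16,384 (~1.6% of full)
```

QLoRA follows the same structure but stores W in a quantized format (int4/nvfp4/block_fp8 here) while A and B stay in higher precision.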


🚀 Installation

```shell
git clone --recurse-submodules git@github.com:togethercomputer/xorl.git
cd xorl
uv sync
```

Already cloned without --recurse-submodules? Run git submodule update --init --recursive.

See the installation guide for full setup including optional dependencies (DeepEP, Flash Attention).

⚡ Quick Start

```shell
# Local training on 8 GPUs
torchrun --nproc_per_node=8 -m xorl.cli.train examples/local/dummy/configs/full/qwen3_8b.yaml
```

See the quick start guide for more examples including MoE, server training, and LoRA.


📚 Documentation

| Topic | Link |
| --- | --- |
| Parallelism | Overview |
| MoE & DeepEP | MoE docs |
| LoRA / QLoRA | Adapters |
| Server training | Server docs |
| Config reference | Local · Server |

🧠 Supported Models

| Model | Type | HuggingFace ID |
| --- | --- | --- |
| Qwen3 | Dense | Qwen/Qwen3-8B, Qwen/Qwen3-32B, ... |
| Qwen3-MoE | Mixture-of-Experts | Qwen/Qwen3-30B-A3B, Qwen/Qwen3-235B-A22B, ... |

Models are loaded directly from HuggingFace checkpoints — no preprocessing needed. See the supported models page for details.


🤝 Contributing

See CONTRIBUTING.md for development setup, coding conventions, and how to run tests.
