Complete documentation for madengine, an AI model automation and distributed benchmarking platform.
| Guide | Description |
|---|---|
| Installation | Complete installation instructions |
| Usage Guide | Commands, configuration, and examples (`--skip-model-run`) |
| Configuration | Advanced configuration options (includes run log error pattern scan) |
| Batch Build | Selective builds with batch manifests |
| Deployment | Kubernetes and SLURM deployment |
| Launchers | Multi-node training frameworks |
| Profiling | Performance analysis tools |
| Contributing | How to contribute to madengine |
| CLI Reference | Complete command-line options and examples |
The architecture diagram (Orchestration, Infrastructure, and Launcher layers) is in the main README. Summary:
- CLI Layer - User interface with 5 commands (discover, build, run, report, database)
- Model Discovery - Find and validate models from MAD package
- Orchestration - BuildOrchestrator & RunOrchestrator manage workflows
- Execution Targets - Local Docker, Kubernetes Jobs, or SLURM Jobs
- Distributed Launchers - Training (torchrun, DeepSpeed, Megatron-LM, TorchTitan, Primus) and Inference (vLLM, SGLang)
- Performance Output - CSV/JSON results with metrics
- Post-Processing - Report generation (HTML/Email) and database upload (MongoDB)
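The performance output described above (CSV/JSON results with metrics) can be post-processed with standard tooling. A minimal sketch in Python, assuming a hypothetical CSV layout (the column names `model`, `status`, `performance`, and `metric` are illustrative, not madengine's actual schema):

```python
import csv
import io

# Hypothetical results CSV for illustration only; the real madengine
# output columns may differ.
sample = """model,status,performance,metric
llama2-7b,SUCCESS,1234.5,tokens/sec
resnet50,SUCCESS,987.0,images/sec
"""

def summarize(csv_text):
    """Collect performance numbers for successful runs, keyed by model name."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return {
        row["model"]: float(row["performance"])
        for row in rows
        if row["status"] == "SUCCESS"
    }

print(summarize(sample))
```

A summary like this could feed a custom dashboard alongside the built-in HTML/email reports and MongoDB upload.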
- Main Repository: https://github.com/ROCm/madengine
- MAD Package: https://github.com/ROCm/MAD
- Issues: https://github.com/ROCm/madengine/issues
- ROCm Documentation: https://rocm.docs.amd.com/
- Run a model locally → Installation → Usage Guide
- Deploy to Kubernetes → Configuration → Deployment
- Deploy to SLURM → Configuration → Deployment
- Build multiple models selectively (CI/CD) → Batch Build
- Profile model performance → Profiling
- Multi-node distributed training → Launchers → Deployment
- Contribute to madengine → Contributing
madengine operates within the MAD (Model Automation and Dashboarding) ecosystem. The MAD package contains:
- Model definitions (`models.json`)
- Execution scripts (`run.sh`)
- Docker configurations
- Data provider configurations (`data.json`)
- Credentials (`credential.json`)
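To make the model-definition idea concrete, here is an illustrative sketch of what an entry in `models.json` might look like. The field names (`name`, `dockerfile`, `scripts`, `tags`) are assumptions for illustration; consult the MAD repository for the actual schema:

```json
{
  "name": "dummy_model",
  "dockerfile": "docker/dummy.Dockerfile",
  "scripts": "scripts/dummy/run.sh",
  "tags": ["example"]
}
```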
madengine - Modern CLI with:
- Rich terminal output
- Distributed deployment support (K8s, SLURM)
- Build/run separation
- Manifest-based execution
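Manifest-based execution means a build step can emit a manifest that a later run step consumes, which is what enables the build/run separation listed above. A hypothetical manifest fragment, purely illustrative (field names are assumptions, not madengine's actual format):

```json
{
  "models": ["dummy_model"],
  "image": "registry.example.com/madengine/dummy_model:latest",
  "built_at": "2024-01-01T00:00:00Z"
}
```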
- Local - Docker containers on local machine
- Kubernetes - Cloud-native container orchestration
- SLURM - HPC cluster job scheduling
- torchrun - PyTorch DDP/FSDP
- deepspeed - ZeRO optimization
- megatron - Large transformers (K8s + SLURM)
- torchtitan - LLM pre-training
- vllm - LLM inference
- sglang - Structured generation
This documentation follows these principles:
- Task-oriented - Organized by what users want to accomplish
- Progressive disclosure - Start simple, add complexity as needed
- Examples first - Show working examples before explaining details
- Consistent naming - Files follow a simple naming pattern (no prefixes)
- Up-to-date - Reflects current implementation (v2.0)
Documentation improvements are welcome! Please:
- Keep examples working and tested
- Use consistent formatting and style
- Update cross-references when moving content
- Mark deprecated content clearly
- Follow the existing structure
See Contributing Guide for details.
madengine is licensed under the MIT License. See LICENSE for details.