a2go helps you run open-source AI models on your own hardware — locally, on a cloud GPU, or on a Mac.
- There are dozens of open-source models (LLMs, image generation, audio/TTS) in different quantizations. Figuring out which ones actually fit your GPU, how much VRAM they really need, and what performance you'll get is a pain: the information is scattered across Hugging Face model cards, Reddit, GitHub issues, and trial and error.
- a2go bundles all of that into a web configurator. Pick your GPU and it shows you exactly what fits, with real VRAM breakdowns (weights + KV cache + overhead, not just the file size), tokens-per-second benchmarks measured on actual hardware, and a live memory gauge that updates as you combine models.
- Every model in the registry has been tested on real GPUs. The numbers aren't theoretical; they account for things model cards never mention, like compute graph buffers and runtime overhead.
- When you're done picking, it generates a copy-paste deploy command: Docker for Linux/Windows/Runpod, MLX commands for Mac. No config files to write, no flags to guess.
- It supports multi-model setups (run an LLM, image generation, and audio on the same GPU and see if it all fits), multi-GPU splitting, platform-aware variants (GGUF for NVIDIA, MLX for Apple Silicon), and context-length sliders that show the real memory cost.
- The whole point: we already tested all of this so you don't have to. Stop downloading models that don't fit, stop guessing at flags, just pick and deploy.
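The "weights + KV cache + overhead" breakdown above can be approximated by hand. Below is a minimal sketch of the arithmetic — the function name and all the example figures (quantization bytes-per-weight, layer counts, overhead) are hypothetical illustrations, not a2go's actual registry data:

```python
def estimate_vram_gb(params_b: float, bytes_per_weight: float,
                     layers: int, kv_heads: int, head_dim: int,
                     context: int, overhead_gb: float = 1.0) -> float:
    """Rough VRAM estimate: weights + KV cache + fixed runtime overhead.

    The KV cache stores one key and one value vector per layer per token:
    2 * layers * kv_heads * head_dim * context * 2 bytes (fp16).
    """
    weights_gb = params_b * bytes_per_weight  # params_b = billions of params
    kv_gb = 2 * layers * kv_heads * head_dim * context * 2 / 1e9
    return weights_gb + kv_gb + overhead_gb

# Example: a 7B model at ~0.56 bytes/weight (roughly Q4), 32 layers,
# 8 KV heads of dim 128, 8k context -- all hypothetical figures.
print(round(estimate_vram_gb(7, 0.56, 32, 8, 128, 8192), 2))
```

This is why the file size alone undersells the footprint: the KV cache grows linearly with context length, which is what the context-length slider is surfacing.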
Image: runpod/a2go:latest (~7 GB compressed)
- Pick models at a2go.run — select your GPU and the site shows what fits
- Read the security guide — OpenClaw agents can execute shell commands, read/write files, and fetch URLs on your machine, so understand what you're running: Security Guide
- Deploy — the site generates a ready-to-use command (Docker or MLX)
- Access the UI — `http://localhost:18789/?token=<A2GO_AUTH_TOKEN>`
On Runpod the URL is `https://<pod-id>-18789.proxy.runpod.net/?token=<A2GO_AUTH_TOKEN>`.
First time: approve device pairing when prompted (SSH into the machine, run `openclaw devices list`, then `openclaw devices approve <requestId>`).
The site generates this — or run it directly:
```shell
docker run --gpus all \
  -e A2GO_CONFIG='{"llm":"unsloth/GLM-4.7-Flash-GGUF","audio":"LiquidAI/LFM2.5-Audio-1.5B-GGUF"}' \
  -e A2GO_AUTH_TOKEN=changeme \
  -e A2GO_API_KEY=changeme \
  -p 8000:8000 -p 8080:8080 -p 18789:18789 \
  -v a2go-models:/workspace \
  runpod/a2go:latest
```

Models download on first start and persist on the volume.
| Variable | Description | Default |
|---|---|---|
| `A2GO_CONFIG` | JSON config — models to load | `{}` (auto-detect) |
| `A2GO_AUTH_TOKEN` | Web UI + API auth token | `changeme` |
| `A2GO_API_KEY` | LLM API key (OpenAI-compatible endpoint) | `changeme` |
| `TELEGRAM_BOT_TOKEN` | Enable Telegram bot integration | — |
| `GITHUB_TOKEN` | GitHub auth for Claude Code | — |
Model names are case-insensitive. Use HuggingFace repo names or short IDs.
When A2GO_CONFIG is {} (the default), the container reads your GPU's VRAM via nvidia-smi and picks a default LLM that fits automatically. Useful when you just want something running without choosing.
| Port | Service |
|---|---|
| 8000/http | LLM API (OpenAI-compatible) |
| 8001/http | Media server (image gen, TTS — internal) |
| 8080/http | Media proxy + web UI |
| 18789/http | OpenClaw control UI + chat |
| 22/tcp | SSH |
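Since the LLM API on port 8000 is OpenAI-compatible, any OpenAI-style client should work against it. A minimal standard-library sketch is below — the `/v1/chat/completions` path and `model` field are assumed to follow the standard OpenAI schema rather than confirmed a2go specifics, and the network call is wrapped so the snippet degrades gracefully when no server is running:

```python
import json
import urllib.request

def chat_request(prompt: str, api_key: str = "changeme",
                 base: str = "http://localhost:8000") -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for the a2go LLM API."""
    payload = {
        "model": "default",  # assumption: the server routes to the loaded LLM
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )

req = chat_request("Say hello in one word.")
print(req.full_url)
try:
    with urllib.request.urlopen(req, timeout=10) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
except OSError as exc:
    print(f"server not reachable: {exc}")
```

The `Authorization` header carries `A2GO_API_KEY`; the web UI on port 18789 uses `A2GO_AUTH_TOKEN` instead.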
```shell
a2go models                                 # List available models
a2go fit                                    # Show what fits on this GPU
a2go presets                                # List preset profiles
a2go registry status                        # Registry source + cache info
a2go tool image-generate --prompt "A cat"   # Generate image
a2go tool text-to-speech "Hello world"      # Text to speech
a2go tool speech-to-text audio.wav          # Speech to text
```

`/workspace/` is persistent storage that survives pod restarts. All models and config live here.
| Path | Purpose |
|---|---|
| `/workspace/.openclaw/openclaw.json` | Main config — auto-generated on first boot, editable |
| `/workspace/openclaw/IDENTITY.md` | Agent identity — create your own to customize personality |
| `/workspace/openclaw/AGENTS.md` | Agent instructions & skills — create your own to add capabilities |
The entrypoint only generates openclaw.json if it doesn't exist, so your edits are safe across restarts.
On a Mac, you don't use Docker. Instead, you run model servers natively using Apple's MLX framework, which is optimized for Apple Silicon.
Select macOS on a2go.run and the site generates the exact commands for your selected models. Here's what the flow looks like:
```shell
# 1. Create a virtual environment
python3 -m venv ~/.a2go/venv
source ~/.a2go/venv/bin/activate

# 2. Install engines (the site tells you which ones you need)
pip install mlx-lm     # for LLM models
pip install mlx-audio  # for audio models

# 3. Start servers (each in a separate terminal)
python -m mlx_lm.server --model <repo> --host 0.0.0.0 --port 8000
python -m mlx_audio.server --host 0.0.0.0 --port 8001
```

This only starts the model servers. To connect OpenClaw, you need a config file that tells the agent framework where to find them. The site generates this for you — save it as `~/.openclaw/openclaw.json`.
For installing the agent framework itself (OpenClaw gateway + UI), see the OpenClaw install docs.
Not all models have MLX variants — the site will tell you which ones do.
- a2go.run — model configurator
- Security Guide — trust model, access control, hardening
- OpenClaw — the agent framework
- Runpod — GPU cloud
- Contributing models
