
Add SGLang Ray inference example#40

Open
xyuzh wants to merge 7 commits into main from sglang_ray

Conversation

@xyuzh
Contributor

@xyuzh xyuzh commented Feb 11, 2026

Add offline and online inference drivers with Dockerfile and Anyscale job configs for running SGLang on Ray.

xyuzh and others added 5 commits February 10, 2026 19:02
Add offline and online inference drivers with Dockerfile and Anyscale job configs for running SGLang on Ray.
…stness

- Dockerfile: use sglang[all]==0.5.8 + sgl-kernel==0.3.21 instead of git fork
- Drivers: add logging, named placement groups, exit codes, better error handling
- Job configs: add NCCL_DEBUG, fix submit path comment
- README: add How It Works, Troubleshooting, local run examples

Co-Authored-By: Claude Opus 4.6 <[email protected]>
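The driver-hardening items above (logging, exit codes, better error handling) can be sketched as a minimal stdlib-only skeleton. This is an illustration of the pattern, not the PR's actual driver code: `run_inference` is a hypothetical stand-in for the SGLang/Ray workload, and the real drivers also create named placement groups.

```python
import logging
import sys


def setup_logging() -> logging.Logger:
    # Configure a module-level logger once; the guard keeps re-imports
    # or repeated init from attaching duplicate handlers.
    logger = logging.getLogger("driver_offline")
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


def main(run_inference) -> int:
    # Wrap the workload so failures surface as a nonzero process exit
    # code (which the job scheduler can act on) instead of an
    # unhandled traceback.
    logger = setup_logging()
    try:
        run_inference()
        return 0
    except Exception:
        logger.exception("inference driver failed")
        return 1
```

A real entry point would call `sys.exit(main(...))` so the exit code propagates to the Anyscale job.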
- Rename sglang_ray_inference -> sglang_inference
- Batch inference (job.yaml + driver_offline.py) fully working with
  multi-node TP=4, PP=2 using SGLang's use_ray=True mode
- Ray Serve deployment (service.yaml + serve.py) uses same pattern as
  official Ray LLM SGLang integration with signal monkey-patching
- Add query.py script for testing the service
- Simplify configuration with environment variables

The serving example is still being validated with multi-replica
autoscaling. Single replica works; investigating occasional timeouts
with multiple replicas.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Signed-off-by: Robert Nishihara <[email protected]>
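The "simplify configuration with environment variables" item above could look like this minimal sketch. The variable names other than `MODEL_PATH` (which appears in the job config below) are assumptions mirroring the multi-node TP=4, PP=2 setup, not the example's actual interface:

```python
import os


def load_config(env=None) -> dict:
    # Read settings from the environment with defaults; TP_SIZE and
    # PP_SIZE are hypothetical knobs standing in for however the real
    # drivers receive their parallelism settings.
    if env is None:
        env = os.environ
    return {
        "model_path": env.get("MODEL_PATH", "Qwen/Qwen3-1.7B"),
        "tp_size": int(env.get("TP_SIZE", "4")),
        "pp_size": int(env.get("PP_SIZE", "2")),
    }
```

Keeping all tunables behind one `load_config()` call means the same driver runs locally and under an Anyscale job's `env_vars` block without code changes.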
worker_nodes:
- instance_type: g5.12xlarge # 4x A10G
min_nodes: 4
max_nodes: 8
Contributor Author
I see the min_nodes max_nodes settings are different for offline and serve config, is there a reason for this?

Contributor

Your fix is correct. 2 and 8 are right, since the replicas autoscale from 1 to 4.
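Under that sizing (each replica spans 2 GPU nodes, replicas autoscale 1-4), the serve config's worker pool would look roughly like the following sketch of the Anyscale config, not the exact file:

```yaml
worker_nodes:
  - instance_type: g5.12xlarge  # 4x A10G per node
    min_nodes: 2   # one replica needs 2 nodes (TP=4, PP=2)
    max_nodes: 8   # headroom for 4 replicas at 2 nodes each
```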

instance_type: m5.2xlarge # CPU-only head
worker_nodes:
- instance_type: g5.12xlarge # 4x A10G
min_nodes: 4
Contributor Author

Shouldn't we only need 2 nodes here?

max_nodes: 8

env_vars:
MODEL_PATH: "Qwen/Qwen3-1.7B"
Contributor Author

@xyuzh xyuzh Mar 8, 2026

Have you succeeded with the 30B model?

xyuzh added 2 commits March 8, 2026 19:47
- Switch from Engine to sglang.srt.ray.engine.RayEngine
- Upgrade base image to ray 2.54.0 and install from sglang main branch
- Update default model to Qwen3.5-27B
- Add threaded engine init with warmup in serve.py to avoid event loop conflicts
- Fix node counts in job/service configs (4 -> 2 worker nodes)
- Remove threading and signal monkey-patching from serve.py; use
  RayEngine directly with async_generate
- Add Dockerfile step to patch ray.serve replica.py with two-phase
  init support (compatible with Ray 2.54.0)
- Improve logging setup to avoid duplicate handlers
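The serve.py change described above (dropping the init thread and calling the engine's async generate API directly on the serving event loop) follows this shape. `StubEngine` is a stand-in for `sglang.srt.ray.engine.RayEngine`, whose real constructor arguments and response format are not shown here:

```python
import asyncio


class StubEngine:
    # Hypothetical stand-in for sglang.srt.ray.engine.RayEngine; the
    # real engine exposes an async generate API driven by the same
    # event loop that serves requests.
    async def async_generate(self, prompt: str, sampling_params: dict) -> dict:
        await asyncio.sleep(0)  # yield to the loop, as a real engine would
        return {"text": f"echo: {prompt}"}


class Deployment:
    def __init__(self, engine):
        # The engine is created and used on the running event loop.
        # With no separate init thread there is no second loop, so the
        # event-loop conflict the earlier commit worked around goes away.
        self.engine = engine

    async def __call__(self, prompt: str) -> str:
        out = await self.engine.async_generate(
            prompt, {"max_new_tokens": 16}
        )
        return out["text"]


async def demo() -> str:
    dep = Deployment(StubEngine())
    return await dep("hello")
```

In the real deployment, Ray Serve awaits `Deployment.__call__` per request, so `async_generate` calls from many requests interleave on one loop without threads or signal patching.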
