add dmrlet - lightweight node agent for Docker Model Runner #627
ericcurtin wants to merge 1 commit into main
Conversation
Hey - I've found 4 issues and left some high-level feedback:

- In `pkg/dmrlet/container/manager.go`, `NewManager` takes a `containerd` address but unconditionally selects a Docker CLI runtime and ignores the address, which is confusing given the daemon config and README; consider either wiring the address into an actual containerd-based runtime or renaming/removing the parameter to match the current behavior.
- In `daemon.scaleUp` you ignore errors from `container.NewSpecBuilder` and `modelStore.GetModelPath` (using `_` for the error), which can lead to creating containers with an empty model path or an unsupported backend; propagate or handle these errors so scaling up fails fast instead of silently misconfiguring replicas (see the sketch after this list).
- The daemon client's error handling in `pkg/dmrlet/daemon/api.go` (`Client.Serve`) assumes the error body is JSON-decodable into a string, but the server uses `http.Error` (plain text), so the decode will likely fail and drop the real message; consider reading the body as raw bytes for non-200 responses and returning that content directly in the error.
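As a rough illustration of the second point, here is a minimal fail-fast sketch; it assumes `GetModelPath` and `NewSpecBuilder` each return a value plus an error, and that the deployment fields are named `Name`, `Model`, and `Backend` — those are assumptions from the review text, not the PR's actual code.

```go
// Hypothetical sketch only: identifiers are taken from the review comments
// above and may not match the PR exactly.
modelPath, err := d.modelStore.GetModelPath(deployment.Model)
if err != nil {
	return fmt.Errorf("scale up %s: resolving model path: %w", deployment.Name, err)
}

builder, err := container.NewSpecBuilder(deployment.Backend)
if err != nil {
	return fmt.Errorf("scale up %s: unsupported backend %q: %w", deployment.Name, deployment.Backend, err)
}
// ... continue building the replica spec from modelPath and builder ...
```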
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `pkg/dmrlet/container/manager.go`, `NewManager` takes a `containerd` address but unconditionally selects a Docker CLI runtime and ignores the address, which is confusing given the daemon config and README; consider either wiring the address into an actual containerd-based runtime or renaming/removing the parameter to match the current behavior.
- In `daemon.scaleUp` you ignore errors from `container.NewSpecBuilder` and `modelStore.GetModelPath` (using `_` for the error), which can lead to creating containers with an empty model path or an unsupported backend; propagate or handle these errors so scaling up fails fast instead of silently misconfiguring replicas.
- The daemon client’s error handling in `pkg/dmrlet/daemon/api.go` (`Client.Serve`) assumes the error body is JSON-decodable into a string, but the server uses `http.Error` (plain text), so the decode will likely fail and drop the real message; consider reading the body as raw bytes for non-200 responses and returning that content directly in the error.
## Individual Comments
### Comment 1
<location> `pkg/dmrlet/daemon/daemon.go:590-593` </location>
<code_context>
+ return d.logAggregator.StreamLogs(context.Background(), deployment.Containers[0], lines, follow)
+}
+
+func (d *Daemon) allocatePort() int {
+ port := d.nextPort
+ d.nextPort++
+ return port
+}
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Port allocation is not concurrency-safe and can race between Serve/scaleUp calls
allocatePort updates d.nextPort without synchronization. Serve calls it under d.mu, but scaleUp calls it without holding the lock, so concurrent calls can race and assign duplicate ports. Please protect nextPort with d.mu (or an atomic), or move port assignment under a shared lock so all callers synchronize consistently.
</issue_to_address>
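One way to address this, sketched under the assumption that `nextPort` can be changed to a `sync/atomic` counter — which also avoids re-locking `d.mu` in callers such as `Serve` that may already hold it, since Go mutexes are not reentrant:

```go
// Assumes the Daemon field changes from `nextPort int` to
// `nextPort atomic.Int64` (sync/atomic, Go 1.19+), initialized at daemon
// startup to the base of the 30000-30999 range mentioned in the PR.
func (d *Daemon) allocatePort() int {
	// Add returns the new value, so subtract 1 to hand out the pre-increment port.
	return int(d.nextPort.Add(1) - 1)
}
```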
### Comment 2
<location> `pkg/dmrlet/daemon/api.go:500-503` </location>
<code_context>
+ }
+ defer resp.Body.Close()
+
+ if resp.StatusCode != http.StatusOK {
+ return c.fetchStatsEndpoint(ctx, endpoint)
+ }
</code_context>
<issue_to_address>
**issue (bug_risk):** Client.Serve error handling assumes JSON string body but server uses http.Error with plain text
Because the server uses http.Error for non-200 responses, the body is plain text. Decoding it as JSON into a string will usually fail and drop the real error message. Instead, read resp.Body as raw bytes and surface that content in the error (falling back to resp.Status if the body is empty), and apply the same pattern to the other client methods that ignore the error body.
</issue_to_address>
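A possible shape for the non-200 path, assuming `resp` is the `*http.Response` already obtained in `Client.Serve` (adjust the return to match Serve's actual signature; `io`, `fmt`, and `strings` need to be imported):

```go
if resp.StatusCode != http.StatusOK {
	// The server writes plain text via http.Error, so surface the raw body.
	body, _ := io.ReadAll(resp.Body)
	msg := strings.TrimSpace(string(body))
	if msg == "" {
		msg = resp.Status // fall back to e.g. "500 Internal Server Error"
	}
	return fmt.Errorf("daemon returned %d: %s", resp.StatusCode, msg)
}
```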
### Comment 3
<location> `README.md:427` </location>
<code_context>
+| Feature | Kubernetes | dmrlet |
+|---------|------------|--------|
+| Multi-GPU setup | Device plugins + node selectors + resource limits YAML | `dmrlet serve llama3 --gpus all` |
+| Config overhead | 50+ lines YAML minimum | Zero YAML, CLI-only |
+| Time to first inference | Minutes (pod scheduling, image pull) | Seconds (model already local) |
+| Model management | External (mount PVCs, manage yourself) | Integrated with Docker Model Runner store |
</code_context>
<issue_to_address>
**suggestion (typo):** Consider adding "of" for smoother grammar in this table entry.
Change “50+ lines YAML minimum” to “50+ lines of YAML minimum” or “at least 50 lines of YAML” for clearer grammar.
```suggestion
| Config overhead | 50+ lines of YAML minimum | Zero YAML, CLI-only |
```
</issue_to_address>
### Comment 4
<location> `README.md:502` </location>
<code_context>
+# DAEMON: running
+# SOCKET: /var/run/dmrlet.sock
+#
+# GPUS:
+# GPU 0: NVIDIA A100 80GB 81920MB (in use: llama3.2)
+# GPU 1: NVIDIA A100 80GB 81920MB (available)
</code_context>
<issue_to_address>
**issue (typo):** Typo: "GPUS" should be "GPUs".
In the status example, change the header label from "GPUS" to "GPUs" to use the correct plural form.
```suggestion
# GPUs:
```
</issue_to_address>
Code Review
This pull request introduces dmrlet, a new container orchestrator for AI inference. The changes are extensive, adding a new CLI tool and several backend packages for managing containers, GPUs, services, and more. The overall architecture is well-designed, with clear separation of concerns between components like the daemon, container manager, GPU allocator, and service registry.
My review focuses on improving the robustness and correctness of the implementation. I've identified a few high-priority issues, including the use of the docker CLI instead of the Docker Go SDK, which can be brittle, and some bugs in the API client related to log streaming and error handling. I've also included some medium-severity suggestions to address potential race conditions, incomplete features, and hardcoded values.
Overall, this is a great addition with a solid foundation. Addressing these points will make dmrlet more reliable and maintainable.
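On the docker CLI point, a minimal self-contained sketch of the equivalent flow with the Docker Engine Go SDK (`github.com/docker/docker/client`); the image name, labels, and container name below are placeholders rather than the PR's actual configuration, and it assumes a recent SDK version where `container.StartOptions` is the start-options type:

```go
package main

import (
	"context"
	"log"

	"github.com/docker/docker/api/types/container"
	"github.com/docker/docker/client"
)

func main() {
	ctx := context.Background()

	// Negotiate the API version so the client works against older daemons.
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// Create the inference container; config values here are illustrative only.
	created, err := cli.ContainerCreate(ctx,
		&container.Config{
			Image:  "docker/model-runner-backend:latest", // placeholder image name
			Labels: map[string]string{"dmrlet.managed": "true"},
		},
		&container.HostConfig{AutoRemove: true},
		nil, nil, "dmrlet-example")
	if err != nil {
		log.Fatal(err)
	}

	if err := cli.ContainerStart(ctx, created.ID, container.StartOptions{}); err != nil {
		log.Fatal(err)
	}
	log.Printf("started %s", created.ID)
}
```

Compared with shelling out to the CLI, the SDK returns typed errors and structured responses instead of text that has to be parsed, at the cost of pinning a module version.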
ericcurtin force-pushed from f451361 to 2022f0d
dmrlet is a "Kubelet for AI" that runs inference containers directly with zero YAML overhead. It provides a simple CLI to serve models: dmrlet serve ai/smollm2 # Pulls model, starts inference container, exposes OpenAI API Key features: - Reuses existing pkg/distribution for model management - containerd integration for container lifecycle - GPU detection and passthrough (NVIDIA/AMD) - Auto port allocation (30000-30999 range) - Health checking with configurable timeout - Backend auto-detection (llama-server for GGUF, vLLM for safetensors) Commands: serve, stop, list, pull, version Signed-off-by: Eric Curtin <[email protected]>
dmrlet is a "Kubelet for AI" that runs inference containers directly
with zero YAML overhead. It provides a simple CLI to serve models:
dmrlet serve ai/smollm2
Pulls model, starts inference container, exposes OpenAI API
Key features:
Commands: serve, stop, list, pull, version
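For context on the backend auto-detection bullet, a purely hypothetical sketch of what format-based selection could look like; the function name and detection rules are illustrative and not taken from the PR's implementation:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// detectBackend is a hypothetical helper: it maps a model file's extension to
// an inference backend, mirroring the commit message's description
// (llama-server for GGUF, vLLM for safetensors).
func detectBackend(modelPath string) (string, error) {
	switch strings.ToLower(filepath.Ext(modelPath)) {
	case ".gguf":
		return "llama-server", nil
	case ".safetensors":
		return "vllm", nil
	default:
		return "", fmt.Errorf("cannot infer backend for %q", modelPath)
	}
}

func main() {
	for _, p := range []string{"smollm2.gguf", "model.safetensors"} {
		backend, err := detectBackend(p)
		fmt.Println(p, "->", backend, err)
	}
}
```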