Kobe Chen kobe0938

I work on agents and agent evaluation; previously, I worked on agent infrastructure.

23 followers · 12 following

Stanford University
Santa Clara

kobe0938/README.md

Hi, I'm Kobe 👋

🚀 Currently Maintaining/Contributing

Terminal Bench — Benchmark for LLMs on complex terminal tasks. [paper]
SkillsBench — Evaluating how well skills work and how effective agents are at using them. [paper]
LMCache — The fastest KV cache layer for LLMs. [paper]
OT Agent — Open-source terminal agent from the Open Thoughts team. [blog] [blog]
ClawsBench — Benchmark for claw-like agents. [paper]

🛠️ Previous Projects

Agents & Evaluation

Harbor — Agent evaluation framework and RL environment toolkit. Contributor.
lmcache-agent-trace — Agent application, benchmark, and workload traces for LLM serving research.
claude-code-tracing — Tracing tooling for Claude Code agent runs. [blog]

LLM Inference & Serving Infra

vLLM / production-stack — High-throughput LLM inference engine and its K8s-native serving stack. Contributor.
inference-engine-arena — Postman & Chatbot Arena for inference benchmarking. (Open-sourced ~3 months before SemiAnalysisAI/InferenceX.)
cacheserve — KV-cache-aware serving experiments. [paper]
lmcache-trace-analysis / mooncake-trace-replayer — Trace analysis & replay for inference workloads.

Others

Continuum — Multi-turn LLM agent scheduling with KV-cache time-to-live for efficient serving. Contributor. [paper]
VidGen — Diffusion + autoregressive models for interactive video/game generation (Diffusive AI).
LAG — Research experiments.
citation-verifier — Verifying citations produced by LLM agents (TypeScript).

Pinned

LMCache/LMCache Public

Supercharge Your LLM with the Fastest KV Cache Layer

Python 8k 1.1k

vllm-project/production-stack Public

vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization

Python 2.3k 391

LMCache/lmcache-agent-trace Public

Agent application/benchmark/workload traces should be placed here.

Python 10 5

Inference-Engine-Arena/inference-engine-arena Public archive

Postman & Chatbot Arena for inference benchmarking.

Python 14