Develop #175

Open
solderzzc wants to merge 4 commits into master from develop

Conversation

@solderzzc
Member

No description provided.

solderzzc and others added 4 commits March 20, 2026 23:53
…anity check

- Add MODEL_FAMILIES config table with per-model API params and server flags
- Add getModelApiParams() helper to inject reasoning_effort:none for Mistral
- Add delta.thinking fallback in streaming loop to capture thinking tokens
- Add streaming sanity check before benchmark run (detects empty-token loops)
- Add test-model-config.cjs with 17 unit tests for model detection logic
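
The `delta.thinking` fallback and the streaming sanity check listed above could look roughly like the sketch below. It assumes OpenAI-style streaming chunks (`choices[0].delta`). The names `extractDeltaText` and `checkStream` and the `maxEmptyChunks` threshold are hypothetical, chosen for illustration; they are not the functions in this PR.

```javascript
// Hypothetical helper: pull the text out of one streamed chunk.
// Assumes OpenAI-style chunks ({ choices: [{ delta: {...} }] }). Only the
// `delta.thinking` field name is taken from the commit message.
function extractDeltaText(chunk) {
  const delta = chunk?.choices?.[0]?.delta ?? {};
  if (typeof delta.content === 'string' && delta.content.length > 0) {
    return { kind: 'content', text: delta.content };
  }
  // Fallback: some models stream their tokens under `thinking` instead.
  if (typeof delta.thinking === 'string' && delta.thinking.length > 0) {
    return { kind: 'thinking', text: delta.thinking };
  }
  return { kind: 'empty', text: '' };
}

// Hypothetical sanity check: consume chunks and flag a stream that keeps
// emitting chunks without any tokens (an "empty-token loop").
function checkStream(chunks, maxEmptyChunks = 50) {
  let content = '';
  let thinking = '';
  let emptyRun = 0; // consecutive chunks that carried no text
  for (const chunk of chunks) {
    const { kind, text } = extractDeltaText(chunk);
    if (kind === 'empty') {
      emptyRun += 1;
      if (emptyRun >= maxEmptyChunks) {
        return { ok: false, reason: 'empty-token loop', content, thinking };
      }
      continue;
    }
    emptyRun = 0;
    if (kind === 'content') content += text;
    else thinking += text;
  }
  const ok = content.length + thinking.length > 0;
  return { ok, reason: ok ? null : 'no tokens received', content, thinking };
}
```

Running a check like this once before the benchmark would let the run abort early rather than scoring a model that never emits tokens.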

Add Mistral Small 4 (119B, IQ1_M + Q2_K_XL), NVIDIA Nemotron-3-Nano
(4B + 30B), Liquid LFM2 (1.2B + 24B), Qwen3.5-9B BF16, and
Qwen3.5-27B Q8_K_XL to the benchmark paper.

Key updates:
- Abstract: 7→16 models, 5 families, best local now 95.8%
- Models Under Test table: grouped by family, 16 rows
- Overall Scorecard: full 16-model ranking
- Key Finding 3: quantization precision > parameter count
- Conclusion: Qwen3.5-27B Q8 at 95.8%, Mistral-119B at 89.6%

Add Nemotron and LFM2 model families to MODEL_FAMILIES with
minTemperature: 1.0 — these models reject temperature < 1.0 with
HTTP 400. The benchmark now clamps temperature to the family minimum
before sending the request.

- Refactor getModelApiParams → getModelFamily (returns full config)
- Add resolveTemperature logic in llmCall params builder
- Update test-model-config.cjs: 27 tests including temperature clamp
- Fix Mistral serverFlags to match current llm-server-manager.cjs
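
A minimal sketch of the family lookup and temperature clamp described in this commit, under stated assumptions: the regex patterns, the field layout of `MODEL_FAMILIES`, and the `buildParams` wrapper are hypothetical. Only the names `MODEL_FAMILIES`, `getModelFamily`, `resolveTemperature`, `minTemperature: 1.0`, and `reasoning_effort: none` come from the commit messages. Server flags are omitted because their values are not shown here.

```javascript
// Illustrative config table; patterns and field layout are assumptions.
const MODEL_FAMILIES = [
  { name: 'mistral', match: /mistral/i, apiParams: { reasoning_effort: 'none' } },
  { name: 'nemotron', match: /nemotron/i, minTemperature: 1.0 },
  { name: 'lfm2', match: /lfm2/i, minTemperature: 1.0 },
];

// Return the full family config for a model name, or null if none matches.
function getModelFamily(modelName) {
  return MODEL_FAMILIES.find((f) => f.match.test(modelName || '')) || null;
}

// Clamp the requested temperature up to the family minimum, if one is set.
function resolveTemperature(requested, family) {
  const min =
    family && typeof family.minTemperature === 'number' ? family.minTemperature : null;
  if (min === null || typeof requested !== 'number') return requested;
  return Math.max(requested, min);
}

// Hypothetical params builder: merge family API params, then clamp temperature.
function buildParams(modelName, base) {
  const family = getModelFamily(modelName);
  const params = { ...base, ...(family && family.apiParams ? family.apiParams : {}) };
  if (typeof base.temperature === 'number') {
    params.temperature = resolveTemperature(base.temperature, family);
  }
  return params;
}
```

With this layout, a request for a Nemotron or LFM2 model at temperature 0.2 would be sent at 1.0, which avoids the HTTP 400 described above. Models with no matching family pass through unchanged.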