[WIP] tool-call: experimental migration of all parsers to peg-parser infra (w/ better test coverage) #18353
base: master
Conversation
Holy...
I'll have to look more closely, but nothing particular stands out as disagreeable.
Is this in reference to the raw string parameters that circumvent …? I intend to add a schema optimization step that should help a bit. Unfortunately, I'm starting to think we might have to loosen the grammar constraints and rely more on the agent feedback loop if model creators no longer want to emit JSON.
@ochafik actually I've been working on an alternative approach - an automatic PEG parser generator based on differential template output analysis that will handle like 90% of all templates out of the box - then we only have to handle the really hard/weird/atypical ones.
@aldehir Sorry no, I meant for the handling of …

Without a clean way to escape XML arg end blocks in string arguments (and corresponding fine-tuning), all the XML formats seem inherently fragile, unless we accept they contain a JSON string (which I'd done in my first Qwen3 parser). XML-esque escaping of … Alternatively we could introduce token-based parsing (e.g. …).

Re: string args, I've added a `schema_or_raw_string_until` helper to apply the same treatment across the afflicted parsers.
@pwilkin this sounds super exciting, I wanted to do this in Minja at some point but never finished my prototype / got desperate as I realized how diverse the practices were in Jinja templates 😓

But more importantly... lots of small models output tool calls with an alarmingly high creativity, not at all guessable by just analyzing templates (Qwen 2.5 comes to mind: I had to sample many tool calls to see what it comes up with in practice, hence the horribly long and accommodating list of triggers in the Hermes 2 grammar & parser). Not to mention outright broken templates. (Could you integrate alternative syntaxes by having multiple templates feed into the same diff engine?)

Anyway, would love to see what you come up with! (Oh and btw, feel free to host this in https://github.com/ochafik/minja if it needs too much testing infra for llama.cpp: it already has all the templates, could be a good fit / playground.)
Oh, interesting. I thought this would be handled by delegating to …

I came to the same conclusions with the XML formats. The vLLM tool call parsing seems affected by the same problems. I'm not sure there's much more we can do.
I worry this will impact model performance. I've noticed that with Devstral (not an XML format), heavy constraining is detrimental to its performance. I'd rather accept that it will fail when …

> // GBNF doesn't have token-awareness, so same as text-based until

It does now (#17816). I was slowly working my way to making the parser token-aware, but for the same reasons--i.e. the argument delimiter is not a unique token--it became a low priority. I mainly added it to handle the grammar-enforced reasoning budget.
Force-pushed from e8aa015 to 427afe6
There's a desire to refactor the common chat API (ref #18215). I was working on this, but given this PR and @pwilkin's effort, I'll wait to see what direction we go. Otherwise merging will be a challenge. This PR accomplishes part of what I had in mind: splitting the parsers into separate files. I'd also like to maintain a stateful object during an entire generation request. That would let us remove the serialization mechanics in the PEG parser and implement a variant of incremental Packrat parsing (https://ohmjs.org/pubs/sle2017/incremental-packrat-parsing.pdf).
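For readers unfamiliar with Packrat parsing, here is a toy sketch of the memoization idea it builds on (generic code with invented names, not from this PR; the incremental variant in the linked paper additionally reuses and repairs the memo table as new input is appended):

```cpp
// Toy illustration: packrat parsing memoizes each (rule, position) result so
// backtracking never re-parses the same span. A long-lived parse context could
// keep this table across streamed chunks of a generation.
#include <cctype>
#include <cstdio>
#include <map>
#include <string>
#include <utility>

struct MemoEntry { bool ok; size_t end; };

struct ParseCtx {
    std::string input;                                  // grows as tokens stream in
    std::map<std::pair<int, size_t>, MemoEntry> memo;   // keyed by (rule id, start pos)
};

// Rule 0: a run of digits. Returns true and advances `pos` on success.
static bool parse_digits(ParseCtx & ctx, size_t & pos) {
    const auto key = std::make_pair(0, pos);
    const auto it  = ctx.memo.find(key);
    if (it != ctx.memo.end()) {                         // reuse previous work
        if (it->second.ok) pos = it->second.end;
        return it->second.ok;
    }
    size_t p = pos;
    while (p < ctx.input.size() && std::isdigit((unsigned char) ctx.input[p])) p++;
    const bool ok = p > pos;
    ctx.memo[key] = {ok, p};
    if (ok) pos = p;
    return ok;
}

int main() {
    ParseCtx ctx;
    ctx.input = "12345,";
    size_t pos = 0;
    std::printf("digits: %d, end=%zu\n", parse_digits(ctx, pos), pos);
    // A second attempt at the same position hits the memo table instead of re-scanning.
    size_t pos2 = 0;
    std::printf("memoized: %d, end=%zu\n", parse_digits(ctx, pos2), pos2);
    return 0;
}
```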
@aldehir We could experiment w/ a bit of jinja hackery: ideally we'd want models to think of escaping …

Longer term we should probably start a page about best practices for template writers / finetuning folks (how to pick your tool call syntax and make it work reliably by using the right special tokens, how to escape things properly, etc.).
Sounds fab, just note that in the OG tool calls support PR it was pointed out that sharing objects between the two sides could be a bit iffy w.r.t. threading.
Not a fan either. My hope is model creators will wise up and add some way to disambiguate. It looks like Deepseek v3.2 has addressed this to a degree.
The parser is thread-safe, as it should not be mutated after creation. The parse context should share the same thread context as the accumulated generation string. I think we're good w.r.t. thread safety. I have a few nits, but I'll wait until you're done iterating.
@aldehir It should look a bit less gruesome now (sorry for the moving target!). I just wanna make another pass at the few parsers I couldn't trivially refactor to use the new helpers (…).
I’m not a fan of this level of abstraction. I had a similar concern with the XML parsing implementation: it requires changes to these builders for every new model that’s close but needs a slight tweak. If an abstraction has to be modified for every new case, it’s not really abstracting much. Even more so for the generic tool call parser.
@aldehir Agree on the principle, although it can be a good way to focus on the diffs, esp. in the building phase. I've now inlined …

Also, made tests prettier (split each parser into its own lil test file). There's lots of failures I had to skip; you can run them all (incl. skipped) w/ …
Hmm, this kind of helper is what @pwilkin wanted as well, so I guess I'm out-voted. We'll keep it as is then. Awesome. I'll start reviewing the changes, but it might take me a couple of days. It's a big PR!
Core PEG parser infrastructure improvements:
- Replace string AST tags with integer tag IDs for type safety and faster dispatch
- Mapper functions defined as lambdas to reduce boilerplate
- Add token_tag and literal_tag helper methods for token-aware parsing
- Improve GBNF grammar generation for optional repetitions and lazy mode
- Make SPACE_RULE optional in JSON schema grammar generation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Migrate 28 chat template formats to the unified PEG parser approach:
- Each format gets its own module in common/chat-parsers/
- Shared internal header for common patterns and utilities
- XML toolcall infrastructure for formats using XML-style tool calls
- Updated routing in chat.cpp and chat-parser.cpp

Formats migrated: Apertus, Apriel 1.5, Command R7B, DeepSeek R1/V3.1, Firefunction V2, Functionary V3.1/V3.2, Generic, GLM 4.5, GPT-OSS, Granite, Hermes 2 Pro, Kimi K2, LFM2, Llama 3.x, Magistral, MiniMax M2, Ministral 3, Mistral Nemo, Nemotron V2/V3, Qwen3 Coder XML, Seed OSS, Xiaomi MIMO.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add systematic streaming test coverage for all 28 parser formats:
- Needle streaming tests inject unique markers into each semantic field
- Tests verify incremental parsing at every character boundary
- Cross-check declared capabilities against minja's runtime detection
- 21 formats x 6+ scenarios = 126+ regression tests

The needle technique catches:
- Content loss or truncation (both needles must be present)
- Out-of-order emission and buffering bugs (N1 before N2 per pair)
- Function name splitting during streaming (atomic tool names)
- Tool argument regression (args never get shorter)
- Key ordering violations (key N finishes before key N+1)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add feature flag infrastructure for safe migration:
- --experimental-new-parsers CLI flag (defaults to off)
- LLAMA_USE_NEW_PARSERS env var for testing
- Dual-path testing: legacy parsers active by default
- Server integration for runtime parser selection

Also includes:
- Nemotron Nano template fix
- Granite template rename for consistency
- Test model fetching updates

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
When tool_choice is "required", the model MUST output tool calls without any preceding content. This change ensures that the following parsers enforce this constraint:
- apertus.cpp
- glm-4-5.cpp
- lfm2.cpp
- minimax-m2.cpp
- nemotron-v2.cpp
- nemotron-v3.cpp
- qwen3-coder-xml.cpp

Previously, these parsers would accept content before tool calls even in required mode, allowing the model to bypass the tool call requirement. Now, when require_tools is true, the parser only matches tool calls (with optional reasoning blocks where supported).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Consolidates the common pattern in XML-based chat parsers for handling schema-based parameter values:
- For string schemas: use until[_max](delimiter, maxLength?)
- For non-string schemas: use schema() with optional space wrapping

This reduces boilerplate across glm-4-5, minimax-m2, qwen3-coder-xml, and seed-oss parsers (removes ~99 lines, adds ~94 lines with docs).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
These templates' parsers don't allow content in tool_choice=required mode, but the test was generating content because tool_required_allows_content defaulted to true.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Make native mapper create tool calls lazily on TOOL_NAME, not TOOL_OPEN
- Buffer pending TOOL_ID when it comes before TOOL_NAME (Command R7B format)
- Add is_partial check for TOOL_ID to skip incomplete IDs
- Fix test_parser_with_streaming to distinguish new vs updated tool calls
- Integrate test_parser_with_streaming into needle test suite
- Improve parse error messages with context

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Fixes GBNF grammar generation for templates with Unicode tokens. DeepSeek R1/V3 use fullwidth vertical line (U+FF5C) and lower one eighth block (U+2581) in their special tokens. The byte-based trie was generating invalid UTF-8 prefixes in exclusion patterns.

Changes:
- Add unicode_trie that works with code points instead of bytes
- Update gbnf_escape_codepoint to handle non-ASCII characters
- Add tests for DeepSeek-style Unicode token formats

Enables: deepseek_r1:experimental, deepseek_v3_1:experimental

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
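A standalone illustration of the underlying issue (not code from this PR): byte-level prefixes of a multi-byte UTF-8 token are not themselves valid UTF-8, which is why the exclusion trie has to operate on code points.

```cpp
// Byte prefixes of a multi-byte UTF-8 token are truncated code points, so a
// byte-based "anything but this token" trie can emit invalid UTF-8 patterns.
#include <cstdio>
#include <string>

int main() {
    // UTF-8 encoding of U+FF5C (fullwidth vertical line), the character used
    // in DeepSeek-style special tokens.
    const std::string tok = "\xEF\xBD\x9C";

    for (unsigned char c : tok) std::printf("%02X ", c);   // prints: EF BD 9C
    std::printf("\n");

    // A byte-based trie produces the prefixes "EF" and "EF BD" while building
    // exclusion alternatives; both cut a code point in half, so the generated
    // GBNF patterns contained invalid UTF-8. A code-point trie never splits them.
    const std::string truncated = tok.substr(0, 2);
    std::printf("prefix length: %zu bytes (incomplete code point)\n", truncated.size());
    return 0;
}
```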
Two fixes for streaming/partial parsing:

1. Skip partial TOOL_CLOSE nodes to prevent premature argument closing during streaming. Without this, arguments would be closed before seeing the full closing tag.

2. Wrap tool name+delimiter in p.atomic() to prevent TOOL_NAME emission on prefix match. This fixes the case where "special_function" would match as complete when input is "special_function_with_opt", causing tool count to decrease when more input arrives.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
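A generic illustration of the prefix-match hazard in point 2 (standalone code with invented names, not the PR's combinators): a known tool name should only be reported once the delimiter after it has been seen.

```cpp
// Naive prefix matching reports "special_function" as soon as the stream
// starts with it, even though the model may still be emitting
// "special_function_with_opt". Requiring the delimiter makes the match atomic.
#include <cstdio>
#include <string_view>

static bool tool_name_complete(std::string_view input, std::string_view name, char delim) {
    if (input.substr(0, name.size()) != name) return false;   // name not (yet) present
    return input.size() > name.size() && input[name.size()] == delim;
}

int main() {
    // Still ambiguous: only a prefix of a longer name has arrived -> not emitted.
    std::printf("%d\n", tool_name_complete("special_function_wi", "special_function", '\n'));
    // Delimiter seen -> the name is complete and can be emitted.
    std::printf("%d\n", tool_name_complete("special_function\n{\"a\":1}", "special_function", '\n'));
    return 0;
}
```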
- Move magistral detection before ministral_3 in template matching
(both use [TOOL_CALLS][ARGS] patterns)
- Add "Unsloth" check to distinguish magistral templates
- Rewrite PEG parser to match actual template format:
[TOOL_CALLS]name[ARGS]{...} (not JSON array with ids)
- Update test to reflect format doesn't include tool call ids
Enables: magistral:legacy, magistral:experimental
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Add p.space() before tool call patterns in parsers to handle models that output leading whitespace or newlines before tool calls. This is a mechanical fix applied consistently across parsers.

Affected parsers: apertus, apriel-1-5, command-r7b, deepseek-v3-1, functionary-v3-1-llama-3-1, generic, gpt-oss, granite, hermes-2-pro, kimi-k2, lfm2, llama-3-x, mistral-nemo, nemotron-v2, xiaomi-mimo

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Add `p.choice({with_tools, content_only})` pattern to parsers that were
requiring tool calls when tools were provided, even for content-only responses.
Files changed:
- glm-4-5.cpp: Add content_only fallback
- minimax-m2.cpp: Add content_only fallback
- ministral-3.cpp: Add content_only fallback + atomic wrapper for tool name
Enables: glm_4_5:experimental, minimax_m2:experimental, ministral_3:both
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Use `additional_stops` instead of inline `consume_eos()` pattern
- Add `p.space()` after JSON args to handle model's trailing whitespace
- Restructure parallel tool calls handling for cleaner parsing
- Add test case matching server test scenario

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Fix `param_ends` to not include leading `\n` (already consumed by `space()` in value parser). Simplifies content-only path.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Kimi template splits messages into hist_msgs (up to last non-tool-call assistant) and suffix_msgs (after). Both get `<think></think>` tags, but:
- hist_msgs: reasoning_content is discarded (empty think tags)
- suffix_msgs: reasoning_content is preserved

The needle tests use a single assistant message which becomes the "last non-tool-call assistant" and goes to hist_msgs, so reasoning is discarded.

- Mark `supports_disable_thinking=No` since think tags are always output
- Skip run_template_test_suite for experimental impl (needle tests incompatible with this message splitting)

Enables: kimi_k2:experimental

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Fix `p.chars("0-9")` to `p.chars("[0-9]", 1, 10)` - the first argument
is a regex character class pattern, not a range string. Also specify
min/max repetitions (1-10 digits for tool call ID).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add NEW_PARSERS_UNSUPPORTED dict to document templates with known issues when using experimental parsers in server tests:
- LFM2: requires special system message marker
- Llama 3.x: builtin tools need custom TOOL_ARG_NAME handling
- Functionary v3.2: python tool allows raw code fallback
- Nemotron v3: tiny model generates invalid parameter structure
- GPT-OSS: tiny model generates unparseable content
- Kimi K2: tiny model generates format that fails to parse

Also in test-chat.cpp:
- Change test name separator from `_` to `:` for easier grep
- Add skip logic for force_disable_thinking scenarios

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
When repeat(p, min, max) is called with max=0, return eps() instead of creating a repetition parser. This avoids issues with parsers that have no valid matches.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
The legacy lfm2 parser requires a "force json schema." marker in the system message to enable tool call grammar. Skip run_template_test_suite for legacy mode since it uses generic inputs without this marker. The explicit tests in test-lfm2.cpp still run and cover the legacy parser behavior with the proper marker.

Enables: lfm2:legacy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Upstream now defaults message content to empty string instead of null, which adds "content": "" to JSON output after tool_calls. Update both the PEG grammar and test expectation to handle this.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Force-pushed from bf5cb42 to c4ff3e4
The test_peg_parser helper was always setting experimental_new_parsers=true, which didn't match the impl parameter being used for other test decisions. Now it respects the impl parameter like other test helpers. This enables nemotron_v3:legacy which was previously failing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Force-pushed from 7afae43 to ae718c2
I was testing ggml-org/llama.cpp#18353 and `test-chat` is unbearably slow on MSVC (always has been). The cause is MSVC's `std::regex` engine. The `regex_search()` calls take a considerable amount of time during tokenization.

I added some profiling and tested with the [Apertus-8B-Instruct.jinja](https://huggingface.co/swiss-ai/Apertus-8B-Instruct-2509/blob/main/chat_template.jinja) template:

```
[Profile] parseExpression: 4.93307 s (102 calls)
[Profile] non_text_regex: 0.0153389 s (319 calls)
[Profile] block_keyword_regex: 0.0048339 s (216 calls)
[Profile] block_close_regex: 0.0068446 s (216 calls)
[Profile] block_open_regex: 0.0817274 s (535 calls)
[Profile] expr_close_regex: 0.0035717 s (102 calls)
[Profile] expr_open_regex: 0.0604591 s (637 calls)
[Profile] comment_regex: 3.11919 s (639 calls)
[Profile] tokenize: 18.1967 s
Parsed template from: .\Apertus-8B-Instruct.jinja
```

**18.2s** to parse.

This PR adds `std::regex_constants::match_continuous` to the `regex_search()` calls. This flag instructs the regex engine to only match from the start of the iterator, which aligns with the existing `match.position() == 0` check.

```
[Profile] parseExpression: 0.0225291 s (102 calls)
[Profile] non_text_regex: 0.013297 s (319 calls)
[Profile] block_keyword_regex: 0.0039072 s (216 calls)
[Profile] block_close_regex: 0.0052241 s (216 calls)
[Profile] block_open_regex: 0.0081572 s (535 calls)
[Profile] expr_close_regex: 0.0025207 s (102 calls)
[Profile] expr_open_regex: 0.0064239 s (637 calls)
[Profile] comment_regex: 0.0076846 s (639 calls)
[Profile] tokenize: 0.117085 s
Parsed template from: .\Apertus-8B-Instruct.jinja
```

**0.12s**, much better.
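For readers unfamiliar with the flag, here is a minimal standalone example of what `match_continuous` changes (illustration only, not the minja code): without it, `regex_search()` scans the whole input for a match anywhere; with it, the search only succeeds when the match begins at the first character.

```cpp
// Demonstrates std::regex_constants::match_continuous: the flag anchors the
// search to the start of the sequence, so the engine does not keep scanning
// the rest of the input when the pattern is not at position 0.
#include <cstdio>
#include <regex>
#include <string>

int main() {
    const std::string text = "some plain text {{ expr }}";
    const std::regex  expr_open(R"(\{\{)");
    std::smatch m;

    const bool anywhere = std::regex_search(text, m, expr_open);
    const bool at_start = std::regex_search(text, m, expr_open,
                                            std::regex_constants::match_continuous);

    std::printf("anywhere: %d, at start only: %d\n", anywhere, at_start); // 1, 0
    return 0;
}
```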
TL;DR: it's a lot, but there's a lot more testing than before.
Building on the PEG parser infrastructure introduced in #17136 by @aldehir, this is an experimental effort to migrate all chat template formats to the unified PEG approach.
Why migrate?

The current monolithic `common/chat.cpp` has grown to ~25 ad-hoc parser implementations that are difficult to maintain. Lots of parsing bugs are hard to reproduce and diagnose (esp. if the user wasn't in `--verbose` mode). The PEG infrastructure offers a cleaner path forward, w/ strong guarantees (modulo bugs) that what is allowed to be generated should be parseable.
How to Test
Changes:
- `common/chat-parsers/*.cpp` - 28 modular parser implementations
- `--experimental-new-parsers` - defaults to off, nothing changes by default

New "Needle" Streaming Tests
Existing streaming tests (`tools/server/tests/unit/test_tool_call.py`) require loading real models and cover only a subset of formats. This PR adds systematic coverage for all 21 formats without the model-loading overhead.

This migration was designed to be safe through systematic test constraints:
21 formats x 6+ scenarios = up to 126 regression tests (some scenarios filtered based on format capabilities)
Each format tests:
- Every semantic field (`content`, `reasoning_content`, tool `name` & `arguments`) is streamed
- `reasoning_content` + `tool_call` / `content` (or 3 of them when the template supports it)
- `json_schema` + `reasoning_content`

How Needle Tests Work
The "needle" technique injects unique marker pairs into each semantic field. For example, in Hermes 2 Pro format with thinking and a tool call:
The test parses this message at every character boundary (simulating streaming), and verifies that both needles of each pair are present and emitted in order, tool names appear atomically, and content/arguments never shrink as more input arrives.
This aims to prove parsers are truly incremental: partial input produces partial output, fields stream in proper order, and nothing is buffered unnecessarily.
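A condensed sketch of the per-prefix check (hypothetical `ParsedMsg`/`parse` stand-ins; the real suite drives each format's actual PEG parser):

```cpp
// Sketch of the needle-test invariants: every prefix of the message is parsed,
// emitted fields may only grow, tool names are atomic, and needle 1 of a pair
// must precede needle 2 in the final parse.
#include <cassert>
#include <functional>
#include <string>
#include <string_view>

struct ParsedMsg {
    std::string content;
    std::string tool_name;   // either empty or the complete name, never partial
    std::string tool_args;
};

using ParseFn = std::function<ParsedMsg(std::string_view)>;

static void check_streaming_invariants(const std::string & full, const ParseFn & parse) {
    const ParsedMsg final_msg = parse(full);
    ParsedMsg prev;
    for (size_t n = 0; n <= full.size(); n++) {
        const ParsedMsg cur = parse(std::string_view(full).substr(0, n));
        assert(cur.content.size()   >= prev.content.size());               // content never shrinks
        assert(cur.tool_args.size() >= prev.tool_args.size());             // args never shrink
        assert(cur.tool_name.empty() || cur.tool_name == final_msg.tool_name); // atomic tool names
        prev = cur;
    }
    const size_t n1 = final_msg.content.find("CONTENT-N1");
    const size_t n2 = final_msg.content.find("CONTENT-N2");
    assert(n1 != std::string::npos && n2 != std::string::npos && n1 < n2); // both needles, in order
}

int main() {
    // Trivial stand-in parser that treats everything as plain content; the
    // real tests plug each format's parser in here instead.
    const ParseFn passthrough = [](std::string_view prefix) {
        return ParsedMsg{std::string(prefix), "", ""};
    };
    check_streaming_invariants("hello CONTENT-N1 world CONTENT-N2 bye", passthrough);
    return 0;
}
```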
Known Limitations
The PEG implementation has gaps vs legacy (TBC):
- `allOf`/`anyOf`/`$ref` patterns not fully handled
- `until_max` has a weird implementation (maybe we just drop maxLength on XML formats)

Proposed Migration Plan
- `--experimental-new-parsers` …
- `common/chat-parser.cpp`: ~28 legacy parser functions (~900 lines)
- `common/chat.cpp`: ~19 legacy init functions (~600 lines)
- `common/chat-peg-parser.cpp/.h`: class-based builders/mappers (~220 lines)
- `common/chat-parser-xml-toolcall.cpp/.h`: XML grammar builder (~900 lines); new PEG parsers generate grammars directly from their parser definitions

TODOs before undrafting
Follow up work
- `supports_tool_call_id` - Whether tool calls include IDs
- `reasoning_requires_tools` - Whether thinking mode only works with tools
- `tools_emit_content_with_calls` - Whether tool calls can include content
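A hypothetical sketch of how such per-format capability flags might be declared (field names taken from the list above; the actual representation is still to be decided):

```cpp
// Hypothetical per-format capability flags; not part of this PR.
struct chat_format_caps {
    bool supports_tool_call_id          = false; // tool calls include IDs
    bool reasoning_requires_tools       = false; // thinking mode only works with tools
    bool tools_emit_content_with_calls  = false; // tool calls can include content
};
```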