
FlexibleLiteralChoiceTerminal.parse() should offer the exact-choice match, not only the greedy shape regex #491

Description

@SJrX

Summary

The new list-of-successes engine (Combinator.parse(), #467) diverges from the original SyntacticMatch/SemanticMatch engine for FlexibleLiteralChoiceTerminal. This must be resolved before the new engine can become the default (it currently ships behind the ExperimentalSettings.useGrammarParseEngine flag).

The divergence

FlexibleLiteralChoiceTerminal matches a token's shape leniently (so a wrong value can still be located/highlighted) and marks it valid only when it equals one of the exact choices.

  • Old engine (SyntacticMatch/SemanticMatch): checks the exact choices via startsWith first, and only falls back to the lenient shape regex if no exact choice matches.
  • New engine (parse()): runs only the lenient shape regex (syntaticMatch.matchAt), takes that greedy match, then sets valid = choices.any { it == text }.

So given FlexibleLiteralChoiceTerminal("abc", "defg") and input "abcxx":

| engine | matched text | valid? | result |
|--------|--------------|--------|--------|
| old | abc | yes | semantic match, end=3 |
| new | abcx (regex [a-z]{1,4} greedy) | no (abcx ∉ choices) | semantic: no match; syntactic: abcx |
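The two orderings can be reproduced in isolation with a minimal sketch. Everything here is a hypothetical stand-in rather than the project's code: `MatchOutcome`, `oldStyleMatch`, and `newStyleMatch` are invented names, and the old-engine stand-in simply takes the first choice that matches, which may not be how the real engine orders its choices.

```kotlin
// Hypothetical stand-ins for illustration; none of these names exist in the project.
data class MatchOutcome(val text: String?, val valid: Boolean)

val demoChoices = listOf("abc", "defg")
val demoShape = Regex("[a-z]{1,4}") // the lenient shape regex from the example

// Old-engine ordering: exact choices via startsWith first, shape regex only as a fallback.
fun oldStyleMatch(input: String, offset: Int): MatchOutcome {
    val exact = demoChoices.firstOrNull { input.startsWith(it, offset) }
    if (exact != null) return MatchOutcome(exact, valid = true)
    // No exact choice is a prefix here, so the lenient match cannot equal a choice.
    return MatchOutcome(demoShape.matchAt(input, offset)?.value, valid = false)
}

// New-engine ordering: greedy shape regex only, then compare the text to the choices.
fun newStyleMatch(input: String, offset: Int): MatchOutcome {
    val text = demoShape.matchAt(input, offset)?.value
    return MatchOutcome(text, valid = text != null && demoChoices.any { it == text })
}
```

With input "abcxx" at offset 0, `oldStyleMatch` yields text "abc" with valid = true, while `newStyleMatch` yields text "abcx" with valid = false, which is the divergence described above.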

This is captured today in GrammarParseTest.testFlexibleLiteralChoiceTerminalMatchesSemanticallyFirstLongest, which asserts the current (divergent) new-engine behaviour with a comment, so the difference is visible rather than silent.

Why it usually doesn't bite (but still should be fixed)

In the real grammars, FlexibleLiteralChoiceTerminal tokens are delimiter-terminated (whitespace, EOF, etc.), so the greedy regex stops at the delimiter and lands on the exact token anyway. The pathological case is an exact choice that is a strict prefix of a longer run of same-class characters with no delimiter in between, as in "abcxx" above. The bug is latent, but it is still a correctness gap that breaks the parity guarantee.
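A small sketch of why delimiters mask the problem, assuming the same shape regex `[a-z]{1,4}` and the choices from the example. `greedyShapeOutcome` is a hypothetical helper, not part of the project:

```kotlin
// Hypothetical helper: run only the greedy shape regex at offset 0 and
// report the matched text plus whether it equals an exact choice.
val laxShape = Regex("[a-z]{1,4}")
val exactChoices = setOf("abc", "defg")

fun greedyShapeOutcome(input: String): Pair<String?, Boolean> {
    val text = laxShape.matchAt(input, 0)?.value
    return text to (text != null && text in exactChoices)
}
```

For "abc xx" the space stops the regex, so the result is ("abc", true). For "abcxx" nothing stops it before four characters, so the result is ("abcx", false).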

Proposed fix

In the spirit of list-of-successes, parse() should offer every way the terminal can match at the offset, rather than only the single greedy regex match: a Parse for each exact choice that matches (valid = true), plus the greedy shape match (valid = choices.any { it == text }). validate() already prefers the first fully-valid full parse, so offering the exact-choice match lets the valid branch win. That restores parity with the old engine and matches how LiteralChoiceTerminal.parse() already offers every matching choice.
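One possible shape for this, as a sketch under stated assumptions rather than the actual implementation: `ParseSketch` and `FlexibleChoiceSketch` are hypothetical, and the real Parse type and parse() signature in the project may look quite different.

```kotlin
// Hypothetical model of a parse result; the project's real Parse type may differ.
data class ParseSketch(val text: String, val end: Int, val valid: Boolean)

// Hypothetical terminal that offers every way it can match at an offset.
class FlexibleChoiceSketch(
    private val shape: Regex,
    private val choices: List<String>,
) {
    fun parse(input: String, offset: Int): List<ParseSketch> {
        val results = mutableListOf<ParseSketch>()
        // One valid parse per exact choice that matches at the offset.
        for (choice in choices) {
            if (input.startsWith(choice, offset)) {
                results += ParseSketch(choice, offset + choice.length, valid = true)
            }
        }
        // The greedy shape match, valid only when it equals an exact choice.
        val greedy = shape.matchAt(input, offset)?.value
        if (greedy != null) {
            val isChoice = choices.any { it == greedy }
            results += ParseSketch(greedy, offset + greedy.length, valid = isChoice)
        }
        // When the greedy match is itself an exact choice, both branches
        // produce the same parse; keep a single copy.
        return results.distinct()
    }
}
```

For `FlexibleChoiceSketch(Regex("[a-z]{1,4}"), listOf("abc", "defg"))` and input "abcxx", this offers a valid parse for "abc" ending at 3, followed by an invalid parse for "abcx" ending at 4. A consumer that prefers the first valid parse would then pick "abc", as the old engine does.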

Acceptance

  • FlexibleLiteralChoiceTerminal.parse() offers the exact-choice match(es) in addition to the greedy shape match.
  • GrammarParseTest.testFlexibleLiteralChoiceTerminalMatchesSemanticallyFirstLongest is updated to assert parity with GrammarTest (semantic match = abc, end=3) instead of the divergence.
  • Full suite passes under both engines (./gradlew test and ./gradlew test -Dsystemd.unit.grammarParseEngine=true).

Found while porting GrammarTest to the new engine in PR #490.
