DFA: compact contexts after compilation, 54-85% memory reduction#1978
Merged
Add matchDFAToAST, a DFA matcher that produces a structural MatchAST instead of using slot-based value computation. Uses minimal munch with backtracking: wildcards consume as few tokens as possible, decision points are recorded when literals are preferred over wildcards.

Key changes:
- dfa.ts: MatchAST types (TokenMatchNode, WildcardMatchNode, etc.)
- dfaMatcher.ts: matchDFAToAST, matchDFAToASTWithSplitting, evaluateMatchAST (bottom-up value computation from grammar ValueNodes), isRuntimeChecked (fixes wildcard type checked/unchecked at match time)
- nfaDfaParity.spec.ts: AST parity tests covering unchecked wildcards, two-wildcard rules, number/Ordinal/Cardinal entities, priority
- dfaBenchmark.spec.ts: Added AST matcher timing + 5 real-world agent grammars (list, desktop, calendar, weather, browser)

Benchmark results across 6 grammars:
- DFA state ratio: 0.17x–0.54x (fewer states than NFA)
- DFA memory: 1.5x–3.2x larger per state (transitions + captureInfo)
- AST matcher: ~88x avg speedup (matched), ~158x (unmatched) vs NFA

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
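To make "minimal munch with backtracking" concrete, here is a minimal sketch over a flat list of pattern parts rather than a DFA. The `Part` type and `matchMinimal` function are hypothetical names for illustration and are not the project's matchDFAToAST; the sketch also omits the recording of decision points.

```typescript
// Hypothetical pattern shape for illustration only.
type Part =
    | { kind: "literal"; token: string }
    | { kind: "wildcard"; name: string };

// Returns captured wildcard values, or undefined when there is no match.
function matchMinimal(
    parts: Part[],
    tokens: string[],
    pi = 0,
    ti = 0,
): Record<string, string> | undefined {
    if (pi === parts.length) {
        // All parts consumed: succeed only if all tokens are consumed too.
        return ti === tokens.length ? {} : undefined;
    }
    const part = parts[pi];
    if (part.kind === "literal") {
        if (tokens[ti] !== part.token) return undefined;
        return matchMinimal(parts, tokens, pi + 1, ti + 1);
    }
    // Wildcard: try the shortest capture first (one token), and extend
    // it only when the rest of the pattern fails to match (backtracking).
    for (let end = ti + 1; end <= tokens.length; end++) {
        const rest = matchMinimal(parts, tokens, pi + 1, end);
        if (rest !== undefined) {
            return { [part.name]: tokens.slice(ti, end).join(" "), ...rest };
        }
    }
    return undefined;
}
```

For a pattern like `play $(track) by $(artist)`, the track wildcard first tries a single token and grows only until the literal "by" can match after it.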
In AGR, "string" and "wildcard" are synonyms for untyped wildcards. The NFA interpreter and completion code already checked for both, but the DFA compiler and slot-based matcher only excluded "string". This caused $(track:wildcard) to be incorrectly marked as a checked entity.

Fixed in: dfaCompiler.ts (compile-time isChecked), dfaMatcher.ts (runtime entryIsChecked + entity conversion guard), nfaCompiler.ts (deprecated compileWildcardPart path).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
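The shape of the fix can be illustrated with a small predicate. This is a sketch only: `isCheckedTypeName` is a hypothetical helper, and treating every other type name as checked is an assumption made for the example.

```typescript
// Hypothetical helper; the real checks live in dfaCompiler.ts and dfaMatcher.ts.
// Both "string" and "wildcard" denote an untyped (unchecked) wildcard.
const UNCHECKED_TYPE_NAMES = ["string", "wildcard"];

function isCheckedTypeName(typeName: string): boolean {
    // Assumption: any other type name (e.g. an entity type) is checked.
    return UNCHECKED_TYPE_NAMES.indexOf(typeName) === -1;
}
```

Checking only for "string", as the earlier DFA code did, would report `isCheckedTypeName("wildcard")` as true, which is the bug described above.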
…tor tests

- Add O(1) first-token rejection to DFA/AST matchers (WeakMap-cached index built from DFA start state transitions). Unmatched requests now reject as fast as NFA+index (~0.02μs).
- Fix weather grammar: remove double-quoted phrases (AGR parser treats quotes as literal chars), switch $(location:string) to wildcard.
- Update grammarGenerator tests to match current output: wildcard not string, Cardinal not number, bare variable names in value expressions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
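A first-token rejection index of the kind described here might look roughly like the following. The `Dfa` shape and both function names are hypothetical and simplified; the sketch assumes tokens are already normalized and that a wildcard on the start state disables early rejection.

```typescript
// Hypothetical minimal DFA shape for illustration.
interface Dfa {
    startTransitions: Map<string, number>; // literal first tokens -> state id
    startHasWildcard: boolean;             // start state accepts any token
}

// Cache keyed by DFA object; entries are collected along with the DFA.
const firstTokenCache = new WeakMap<Dfa, Set<string> | null>();

function firstTokenIndex(dfa: Dfa): Set<string> | null {
    let index = firstTokenCache.get(dfa);
    if (index === undefined) {
        // null means "cannot reject by first token" (wildcard at start).
        index = dfa.startHasWildcard
            ? null
            : new Set(dfa.startTransitions.keys());
        firstTokenCache.set(dfa, index);
    }
    return index;
}

// True when the request cannot match, decided with one set lookup.
function canRejectEarly(dfa: Dfa, tokens: string[]): boolean {
    const index = firstTokenIndex(dfa);
    if (index === null || tokens.length === 0) return false;
    return !index.has(tokens[0]);
}
```

The index is built once per DFA, so each subsequent rejection costs a single `Set.has` call regardless of grammar size.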
…r nested RulesPart

The merge overwrote evaluateMatchAST with a version that assumed rule.value exists directly on the matched rule. But for grammars with alternatives like <Start> = <play> | <stop>, the DFA AST matcher inlines alternatives (producing token/wildcard nodes directly, not ruleRef nodes), so the value expression lives on the nested alternative inside the RulesPart, not on the wrapper rule.

Restores findValueExpression + matchesRuleStructure to search through nested RulesPart structures and find the correct value expression by structural comparison. Also removes duplicate exports in index.ts from the merge.

All 1102 tests pass (167 parity, 28 suites).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Execution contexts (DFAExecutionContext[]) were 48-77% of DFA memory but only needed during compilation, not at match time. After building the DFA:
- Extract ruleIndex and activeRuleIndices directly onto DFAState
- Free the contexts arrays via DFABuilder.compact()
- Update all matchers to read from state.ruleIndex instead of contexts[]

Results (DFA KB before -> after, % reduction):
- player: 80.6 -> 14.8 (82%)
- desktop: 677.1 -> 135.4 (80%)
- weather: 131.1 -> 19.9 (85%)
- list: 88.8 -> 28.3 (68%)
- calendar: 41.8 -> 19.2 (54%)
- browser: 46.8 -> 14.7 (69%)

DFA now uses 0.28-0.80x the NFA's memory while being 30-700x faster. All 1102 tests pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nsitions

Standard DFA subset construction requires that for input symbol `a`, the target DFA state includes ALL reachable NFA states: both literal `a` transitions AND wildcard transitions. The previous code separated them, causing false negatives in dfaAccepts() when a literal token overlapped with a wildcard in a different grammar alternative.

Example: browser grammar `click (on)? (the)? link $(keywords:wildcard)` vs `click (on)? $(keywords:wildcard)`. Input "click on the sign up link" was rejected because state 48 (after matching "the" as literal) lost the wildcard path.

Also adds 87 real agent grammar value parity tests covering player, desktop, calendar, weather, browser, and list grammars.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
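The subset-construction requirement can be sketched as a single `move` step that unions literal and wildcard targets. The `NfaState` shape and `move` function are hypothetical and simplified (no epsilon closure, no capture bookkeeping); they are not the project's compiler code.

```typescript
// Hypothetical NFA shape for illustration only.
interface NfaState {
    literal: Map<string, number[]>; // token -> target NFA state ids
    wildcard: number[];             // targets reachable on any token
}

// NFA states reachable from `from` on `token`. Literal targets and
// wildcard targets are unioned, so the resulting DFA state keeps the
// wildcard path even when a literal transition also matches.
function move(nfa: NfaState[], from: number[], token: string): number[] {
    const targets = new Set<number>();
    for (const id of from) {
        for (const t of nfa[id].literal.get(token) ?? []) targets.add(t);
        for (const t of nfa[id].wildcard) targets.add(t);
    }
    return Array.from(targets).sort((a, b) => a - b);
}
```

If the literal and wildcard targets were instead placed in separate DFA states, following the literal edge for "the" would drop the wildcard alternative, which matches the false negative described above.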
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Execution contexts (`DFAExecutionContext[]`) were 48-77% of all DFA memory but only needed during compilation, not at match time. This PR extracts the needed fields (`ruleIndex`, `activeRuleIndices`) directly onto `DFAState`, then frees the contexts arrays via `DFABuilder.compact()`.

Depends on #1976 (currently in merge queue).
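The compaction step described in this summary could look roughly like the sketch below. The `Context` and `State` shapes, the `bestContextIndex` field, and the `compactStates` function are hypothetical simplifications; the actual `DFAState` and `DFABuilder.compact()` in dfa.ts may differ.

```typescript
// Hypothetical, simplified shapes; the real types carry more fields.
interface Context {
    ruleIndex: number;
    slotValues: unknown[]; // stand-in for compile-time-only bookkeeping
}

interface State {
    id: number;
    contexts?: Context[];         // needed only during compilation
    bestContextIndex?: number;    // index into contexts (assumed field)
    ruleIndex?: number;           // kept for match time
    activeRuleIndices?: number[]; // kept for completions
}

// Copy the fields matchers need onto each state, then drop the contexts
// so the arrays can be garbage collected.
function compactStates(states: State[]): void {
    for (const state of states) {
        const contexts = state.contexts;
        if (contexts === undefined) continue; // already compacted
        if (state.bestContextIndex !== undefined) {
            state.ruleIndex = contexts[state.bestContextIndex].ruleIndex;
        }
        // De-duplicate rule indices, preserving first-seen order.
        const active: number[] = [];
        for (const c of contexts) {
            if (active.indexOf(c.ruleIndex) === -1) active.push(c.ruleIndex);
        }
        state.activeRuleIndices = active;
        delete state.contexts;
        delete state.bestContextIndex;
    }
}
```

After compaction, a matcher reads `state.ruleIndex` directly instead of indexing into a per-state contexts array.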
Space: Before vs After
DFA now uses 0.28–0.80x the NFA's memory (previously 1.5–2.9x).
Speed: DFA/AST vs NFA (μs/call, 1000 iterations)
Avg matched speedup: ~96x. Avg unmatched speedup: ~1161x.
Changes
- `dfa.ts`: Add `ruleIndex`, `activeRuleIndices` to `DFAState`; remove `contextIndex` from `bestPriority`; add `DFABuilder.compact()` static method
- `dfaCompiler.ts`: Call `DFABuilder.compact(dfa)` after build
- `dfaMatcher.ts`: Replace all `contexts[bestPriority.contextIndex]` lookups with direct `state.ruleIndex`; use `state.activeRuleIndices` for completions

Test plan
🤖 Generated with Claude Code