DFA: compact contexts after compilation, 54-85% memory reduction by steveluc · Pull Request #1978 · microsoft/TypeAgent

steveluc · 2026-03-03T23:36:35Z

Summary

Execution contexts (DFAExecutionContext[]) accounted for 48-77% of all DFA memory but were needed only during compilation, not at match time. This PR copies the fields the matchers need (ruleIndex, activeRuleIndices) directly onto DFAState, then frees the contexts arrays via DFABuilder.compact().
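
The compaction idea can be sketched roughly as follows. This is a minimal illustration, not the code in dfa.ts: only the names ruleIndex, activeRuleIndices and contexts come from this PR, while the interface layouts, the compileTimeData and bestContextIndex fields, and the compactSketch function are hypothetical stand-ins.

```typescript
// Hypothetical, simplified shapes; the real types in dfa.ts are richer.
interface ExecutionContextSketch {
    ruleIndex: number;
    // Stand-in for the compile-time bookkeeping that dominates memory.
    compileTimeData: number[];
}

interface DFAStateSketch {
    id: number;
    accepting: boolean;
    // Present only until compaction.
    contexts?: ExecutionContextSketch[];
    bestContextIndex?: number;
    // Populated by compaction and read at match time.
    ruleIndex?: number;
    activeRuleIndices?: number[];
}

// Copy the fields needed at match time onto each state, then drop the
// per-state context arrays so they can be garbage collected.
function compactSketch(states: DFAStateSketch[]): void {
    for (const state of states) {
        const contexts = state.contexts;
        if (contexts === undefined) continue; // already compacted

        if (state.accepting && state.bestContextIndex !== undefined) {
            const best = contexts[state.bestContextIndex];
            if (best !== undefined) state.ruleIndex = best.ruleIndex;
        }

        // Deduplicated, sorted list of rules still alive in this state.
        const seen = new Set<number>();
        for (const ctx of contexts) seen.add(ctx.ruleIndex);
        const active: number[] = [];
        seen.forEach((r) => active.push(r));
        state.activeRuleIndices = active.sort((a, b) => a - b);

        delete state.contexts;
        delete state.bestContextIndex;
    }
}
```

After a pass like this, a matcher reads state.ruleIndex directly instead of indexing into a contexts array, which is the lookup change the PR describes for dfaMatcher.ts.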

Depends on #1976 (currently in merge queue).

Space: Before vs After

Grammar	DFA States	Before (KB)	After (KB)	Reduction	DFA / NFA
player	62	80.6	14.8	82%	0.28x
list	120	88.8	28.3	68%	0.70x
desktop	724	677.1	135.4	80%	0.51x
calendar	99	41.8	19.2	54%	0.80x
weather	84	131.1	19.9	85%	0.44x
browser	76	46.8	14.7	69%	0.51x

DFA now uses 0.28–0.80x the NFA's memory (previously 1.5–2.9x).
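
As a quick sanity check, the Reduction column can be recomputed from the Before and After columns. The snippet below is only an illustrative calculation; the names dfaSizesKB and reductionPercent are made up for this example.

```typescript
// Before/after DFA sizes in KB, copied from the table above.
const dfaSizesKB: Array<[string, number, number]> = [
    ["player", 80.6, 14.8],
    ["list", 88.8, 28.3],
    ["desktop", 677.1, 135.4],
    ["calendar", 41.8, 19.2],
    ["weather", 131.1, 19.9],
    ["browser", 46.8, 14.7],
];

// Percentage of memory removed, rounded to the nearest whole percent.
function reductionPercent(beforeKB: number, afterKB: number): number {
    return Math.round((1 - afterKB / beforeKB) * 100);
}

const reductions = dfaSizesKB.map(
    ([grammar, before, after]) => `${grammar}: ${reductionPercent(before, after)}%`,
);
// → ["player: 82%", "list: 68%", "desktop: 80%", "calendar: 54%", "weather: 85%", "browser: 69%"]
```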

Speed: DFA/AST vs NFA (μs/call, 1000 iterations)

Grammar	Request	NFA	NFA+idx	DFA/AST	AST speedup
player	pause	41.2	1.0	1.5	27x
player	play Shake It Off by Taylor...	238.5	190.2	3.5	68x
desktop	open chrome	120.2	22.9	0.6	189x
desktop	tile notepad and calculator	263.3	139.5	1.0	278x
weather	forecast for Chicago...	1397.6	1183.3	2.0	694x
browser	open google.com	60.6	7.6	0.6	99x
unmatched	install visual studio	80.6	0.1	0.02	3222x

Avg matched speedup: ~96x. Avg unmatched speedup: ~1161x.

Changes

dfa.ts: Add ruleIndex, activeRuleIndices to DFAState; remove contextIndex from bestPriority; add DFABuilder.compact() static method
dfaCompiler.ts: Call DFABuilder.compact(dfa) after build
dfaMatcher.ts: Replace all contexts[bestPriority.contextIndex] lookups with direct state.ruleIndex; use state.activeRuleIndices for completions

Test plan

All 1102 local tests pass (28 suites, 167 parity tests)
Policy check passes (2659 checks)
Benchmark confirms space reduction and no speed regression

🤖 Generated with Claude Code

Add matchDFAToAST, a DFA matcher that produces a structural MatchAST instead of using slot-based value computation. Uses minimal munch with backtracking: wildcards consume as few tokens as possible, decision points are recorded when literals are preferred over wildcards.

Key changes:
- dfa.ts: MatchAST types (TokenMatchNode, WildcardMatchNode, etc.)
- dfaMatcher.ts: matchDFAToAST, matchDFAToASTWithSplitting, evaluateMatchAST (bottom-up value computation from grammar ValueNodes), isRuntimeChecked (fixes wildcard type checked/unchecked at match time)
- nfaDfaParity.spec.ts: AST parity tests covering unchecked wildcards, two-wildcard rules, number/Ordinal/Cardinal entities, priority
- dfaBenchmark.spec.ts: Added AST matcher timing + 5 real-world agent grammars (list, desktop, calendar, weather, browser)

Benchmark results across 6 grammars:
- DFA state ratio: 0.17x–0.54x (fewer states than NFA)
- DFA memory: 1.5x–3.2x larger per state (transitions + captureInfo)
- AST matcher: ~88x avg speedup (matched), ~158x (unmatched) vs NFA

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
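
To illustrate what "minimal munch with backtracking" means, here is a small sketch over a flat pattern rather than a DFA. It is not matchDFAToAST: the PatternPart and Captures types and the minimalMunch function are hypothetical, and the sketch assumes a wildcard must consume at least one token.

```typescript
// Hypothetical flat pattern: a sequence of literal tokens and wildcards.
type PatternPart =
    | { kind: "literal"; token: string }
    | { kind: "wildcard"; variable: string };

type Captures = Record<string, string[]>;

// Match tokens[t..] against pattern[p..]. Each wildcard first tries to
// consume a single token and only grows when the rest of the pattern
// fails to match (backtracking).
function minimalMunch(
    pattern: PatternPart[],
    tokens: string[],
    p = 0,
    t = 0,
): Captures | undefined {
    if (p === pattern.length) {
        // Succeed only if all input was consumed.
        return t === tokens.length ? {} : undefined;
    }
    const part = pattern[p];
    if (part === undefined) return undefined;

    if (part.kind === "literal") {
        if (tokens[t] !== part.token) return undefined;
        return minimalMunch(pattern, tokens, p + 1, t + 1);
    }

    // Wildcard: shortest capture first.
    for (let end = t + 1; end <= tokens.length; end++) {
        const rest = minimalMunch(pattern, tokens, p + 1, end);
        if (rest !== undefined) {
            return { ...rest, [part.variable]: tokens.slice(t, end) };
        }
    }
    return undefined;
}
```

For a pattern like `play $(track) by $(artist)` and the input "play stand by me by ben e king", this strategy stops the first wildcard at the earliest "by", so track captures "stand" and artist captures "me by ben e king". Whether the real matcher resolves such ambiguity the same way depends on details not shown in this PR description.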

In AGR, "string" and "wildcard" are synonyms for untyped wildcards. The NFA interpreter and completion code already checked for both, but the DFA compiler and slot-based matcher only excluded "string". This caused $(track:wildcard) to be incorrectly marked as a checked entity.

Fixed in:
- dfaCompiler.ts (compile-time isChecked)
- dfaMatcher.ts (runtime entryIsChecked + entity conversion guard)
- nfaCompiler.ts (deprecated compileWildcardPart path)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
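
The fix amounts to treating both type names as untyped in every place that decides whether a wildcard is a checked entity. A minimal sketch of such a predicate, with the hypothetical names UNTYPED_WILDCARD_TYPES and isCheckedTypeSketch (the real isChecked and entryIsChecked code is not shown in this PR description):

```typescript
// Type names that AGR treats as untyped wildcards, per the commit message.
const UNTYPED_WILDCARD_TYPES = new Set<string>(["string", "wildcard"]);

// Returns true when a wildcard's declared type names a checked entity.
// Assumption: a wildcard with no declared type counts as untyped.
function isCheckedTypeSketch(typeName: string | undefined): boolean {
    if (typeName === undefined) return false;
    return !UNTYPED_WILDCARD_TYPES.has(typeName);
}
```

Keeping the synonym set in one place avoids the kind of drift described above, where one code path excluded only "string".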

…tor tests

- Add O(1) first-token rejection to DFA/AST matchers (WeakMap-cached index built from DFA start state transitions). Unmatched requests now reject as fast as NFA+index (~0.02μs).
- Fix weather grammar: remove double-quoted phrases (AGR parser treats quotes as literal chars), switch $(location:string) to wildcard.
- Update grammarGenerator tests to match current output: wildcard not string, Cardinal not number, bare variable names in value expressions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
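
The first-token rejection can be pictured as a per-DFA set of tokens that may legally start a match, cached in a WeakMap so it is built once and released together with the DFA. The sketch below assumes a simplified start-state shape; StartStateSketch, DfaSketch, firstTokenCache and canStartWith are hypothetical names, and how the real index treats wildcard-initial rules is an assumption.

```typescript
// Hypothetical, simplified view of a DFA start state.
interface StartStateSketch {
    // Literal token -> target state id.
    literalTransitions: Map<string, number>;
    // True if the start state can consume an arbitrary token.
    hasWildcardTransition: boolean;
}

interface DfaSketch {
    start: StartStateSketch;
}

// null means "any first token may match" (the start state has a wildcard).
const firstTokenCache = new WeakMap<DfaSketch, Set<string> | null>();

function canStartWith(dfa: DfaSketch, firstToken: string): boolean {
    let index = firstTokenCache.get(dfa);
    if (index === undefined) {
        if (dfa.start.hasWildcardTransition) {
            index = null;
        } else {
            const tokens = new Set<string>();
            dfa.start.literalTransitions.forEach((_target, token) => tokens.add(token));
            index = tokens;
        }
        firstTokenCache.set(dfa, index);
    }
    // A single set lookup rejects requests whose first token starts no rule.
    return index === null || index.has(firstToken);
}
```

Because the cache is keyed weakly on the DFA object, no explicit invalidation is needed when a grammar is recompiled into a new DFA.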

…r nested RulesPart

The merge overwrote evaluateMatchAST with a version that assumed rule.value exists directly on the matched rule. But for grammars with alternatives like <Start> = <play> | <stop>, the DFA AST matcher inlines alternatives (producing token/wildcard nodes directly, not ruleRef nodes), so the value expression lives on the nested alternative inside the RulesPart, not on the wrapper rule.

Restores findValueExpression + matchesRuleStructure to search through nested RulesPart structures and find the correct value expression by structural comparison.

Also removes duplicate exports in index.ts from the merge.

All 1102 tests pass (167 parity, 28 suites).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
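
The structural search described above might look something like the following. This is a heavily simplified sketch under assumed types: PartSketch, RuleSketch and MatchNodeSketch are hypothetical, values are plain strings here, and the functions carry a Sketch suffix because the real findValueExpression and matchesRuleStructure signatures are not shown in this PR description.

```typescript
// Hypothetical grammar and match-node shapes.
type PartSketch =
    | { type: "string"; tokens: string[] }
    | { type: "wildcard"; variable: string }
    | { type: "rules"; alternatives: RuleSketch[] };

interface RuleSketch {
    parts: PartSketch[];
    value?: string; // stand-in for a value expression
}

type MatchNodeSketch =
    | { type: "token"; token: string }
    | { type: "wildcard"; variable: string };

// True when a flat rule lines up one-to-one with the matched nodes.
function matchesRuleStructureSketch(rule: RuleSketch, nodes: MatchNodeSketch[]): boolean {
    let i = 0;
    for (const part of rule.parts) {
        if (part.type === "rules") return false; // only flat rules compare directly
        if (part.type === "wildcard") {
            const node = nodes[i++];
            if (node === undefined || node.type !== "wildcard" || node.variable !== part.variable) return false;
        } else {
            for (const tok of part.tokens) {
                const node = nodes[i++];
                if (node === undefined || node.type !== "token" || node.token !== tok) return false;
            }
        }
    }
    return i === nodes.length;
}

// Look on the rule itself first, then descend into nested alternatives.
function findValueExpressionSketch(rule: RuleSketch, nodes: MatchNodeSketch[]): string | undefined {
    if (rule.value !== undefined && matchesRuleStructureSketch(rule, nodes)) return rule.value;
    for (const part of rule.parts) {
        if (part.type !== "rules") continue;
        for (const alt of part.alternatives) {
            const found = findValueExpressionSketch(alt, nodes);
            if (found !== undefined) return found;
        }
    }
    return undefined;
}
```

With a wrapper rule such as <Start> = <play> | <stop>, the wrapper has no value of its own, so the search falls through to the alternative whose parts line up with the matched token and wildcard nodes.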

Execution contexts (DFAExecutionContext[]) were 48-77% of DFA memory but only needed during compilation, not at match time.

After building the DFA:
- Extract ruleIndex and activeRuleIndices directly onto DFAState
- Free the contexts arrays via DFABuilder.compact()
- Update all matchers to read from state.ruleIndex instead of contexts[]

Results (DFA KB before -> after, % reduction):
- player: 80.6 -> 14.8 (82%)
- desktop: 677.1 -> 135.4 (80%)
- weather: 131.1 -> 19.9 (85%)
- list: 88.8 -> 28.3 (68%)
- calendar: 41.8 -> 19.2 (54%)
- browser: 46.8 -> 14.7 (69%)

DFA now uses 0.28-0.80x the NFA's memory while being 30-700x faster. All 1102 tests pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…nsitions

Standard DFA subset construction requires that for input symbol `a`, the target DFA state includes ALL reachable NFA states: both literal `a` transitions AND wildcard transitions. The previous code separated them, causing false negatives in dfaAccepts() when a literal token overlapped with a wildcard in a different grammar alternative.

Example: browser grammar `click (on)? (the)? link $(keywords:wildcard)` vs `click (on)? $(keywords:wildcard)`. Input "click on the sign up link" was rejected because state 48 (after matching "the" as literal) lost the wildcard path.

Also adds 87 real agent grammar value parity tests covering player, desktop, calendar, weather, browser, and list grammars.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
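
The subset-construction rule stated above can be sketched for a single input token as follows. The NfaStateSketch shape and the dfaTargetFor function are hypothetical, and epsilon closure is left out to keep the example short.

```typescript
// Hypothetical NFA state: literal edges keyed by token, plus wildcard edges
// that can consume any token.
interface NfaStateSketch {
    literal: Map<string, number[]>;
    wildcard: number[];
}

// Compute the set of NFA states reached from `current` on `token`.
// The key point: wildcard targets are merged into the same target set as
// literal targets instead of being tracked as a separate transition.
function dfaTargetFor(nfa: NfaStateSketch[], current: number[], token: string): number[] {
    const targets = new Set<number>();
    for (const id of current) {
        const state = nfa[id];
        if (state === undefined) continue;
        for (const t of state.literal.get(token) ?? []) targets.add(t);
        for (const t of state.wildcard) targets.add(t);
    }
    const out: number[] = [];
    targets.forEach((t) => out.push(t));
    return out.sort((a, b) => a - b);
}
```

If the wildcard targets were dropped whenever a literal edge matched, an input token such as "the" would commit to the literal alternative and lose the path in which it is part of a wildcard capture, which is the false negative the commit describes.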

steveluc and others added 6 commits March 3, 2026 12:55

Merge remote-tracking branch 'origin/main' into dfa-ast-parity-benchmark

58512ba

Merge remote-tracking branch 'origin/main' into dfa-space-optimization

30462c1

DFA: assert improved completion — 'by' suggested after 'play the'

2634e26

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
