perf: extract AST nodes for all languages in native engine#340
perf: extract AST nodes for all languages in native engine#340carlos-alm wants to merge 1 commit intomainfrom
Conversation
Add walk_ast_nodes_with_config() to helpers.rs with per-language configs for node type mappings (new, throw, await, string, regex). Each non-JS extractor now calls this during extract(), producing astNodes for Python, Go, Rust, Java, C#, Ruby, and PHP. On the JS side, buildAstNodes() now checks symbols.astNodes first (all languages) before falling back to WASM tree walk (JS/TS/TSX only). This eliminates the WASM dependency for AST extraction when using the native engine. Expected impact: astMs drops from ~651ms to ~50ms (DB inserts only) for native builds once the binary is compiled. Impact: 14 functions changed, 11 affected
|
Claude finished @carlos-alm's task —— View job PR Review: AST Nodes Native ExtractionI'm conducting a thorough review of this performance optimization PR following the structured methodology: Review Progress
Phase 1: Root Cause Analysis ✅Problem Being Solved: The current system has a "double-parse" performance bottleneck where AST nodes (new, throw, await, string, regex) are extracted twice:
This results in ~651ms spent in the Why This Approach: The solution eliminates duplication by:
Justification: This is a legitimate performance optimization addressing real inefficiency. Expected improvement: ~651ms → ~50ms (92% reduction). This is part 1 of 4 in a systematic double-parse elimination plan. Phase 2: Configuration Change Detection ✅No configuration changes detected. All modified files are implementation code:
No linting rules, build configs, test thresholds, or other risky configuration changes. Phase 3: Test Integrity ✅Test changes are strengthening, not weakening:
The test approach is proper - adding coverage for new functionality without relaxing existing assertions. Phase 4: Code Quality Review ✅Implementation Quality: High ✅ Positive Aspects:
🔍 Minor Observations:
🎯 Specific Technical Validation:
📋 Language Configuration Accuracy:
Root Cause AnalysisProblem Being Solved: Legitimate performance bottleneck in double AST parsing Critical ConcernsNone identified. This is a well-implemented performance optimization with proper testing. Final Recommendation
CI Dependencies: Success depends on |
Greptile SummaryThis PR moves AST node extraction (new, throw, await, string, regex) from JavaScript/WASM to the native Rust engine for 7 non-JS languages (Python, Go, Rust, Java, C#, Ruby, PHP), as part 1 of 4 in the double-parse elimination optimization plan. Key changes:
Expected performance impact: ~651ms → ~50ms for the Issue found:
Confidence Score: 4/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Parse file with tree-sitter] --> B{Language type?}
B -->|Non-JS languages| C[Native Rust extractor]
B -->|JS/TS/TSX| D[Native Rust extractor]
C --> E[walk_node for symbols]
C --> F[walk_ast_nodes_with_config]
F --> G[Extract new/throw/await/string/regex]
G --> H[Populate symbols.astNodes]
D --> I[walk_node for symbols]
D --> J[walk_ast_nodes for JS]
J --> K[Populate symbols.astNodes]
H --> L[buildAstNodes in ast.js]
K --> L
L --> M{symbols.astNodes exists?}
M -->|Yes| N[Use native astNodes directly]
M -->|No, JS/TS/TSX| O[Fallback: WASM tree walk]
M -->|No, other lang| P[Skip AST nodes]
N --> Q[Insert into DB]
O --> Q
P --> Q
style F fill:#90EE90
style G fill:#90EE90
style H fill:#90EE90
style N fill:#FFD700
style O fill:#FFB6C1
Last reviewed commit: 6bc4a05 |
| await_types: &["await_expression"], | ||
| string_types: &["string_literal", "verbatim_string_literal"], | ||
| regex_types: &[], | ||
| quote_chars: &['"', '@'], |
There was a problem hiding this comment.
C# verbatim strings (@"text") may lose @ characters from the string content itself. For example, @"test@" would incorrectly become test instead of test@ because trim_end_matches removes all trailing @ and " characters. The @ is a prefix modifier, not a delimiter, so it should only be stripped from the start. Consider handling verbatim strings separately or removing @ from quote_chars.
Summary
walk_ast_nodes_with_config()to Rusthelpers.rswith per-language AST node type configs (LangAstConfigstruct)walk_ast_nodes_with_config()duringextract(), producingastNodesnativelybuildAstNodes()inast.jsnow checkssymbols.astNodesfirst for all languages, falling back to WASM tree walk only for JS/TS/TSX when no native data is presentLanguage coverage
raise_statementawaitstringinterpreted_string_literal,raw_string_literalawait_expressionstring_literal,raw_string_literalobject_creation_expressionthrow_statementstring_literalobject_creation_expressionthrow_statement,throw_expressionawait_expressionstring_literal,verbatim_string_literalstringregexobject_creation_expressionthrow_expressionstring,encapsed_stringExpected performance impact
Once the native binary is compiled with these changes:
astMsphase: ~651ms → ~50ms (DB inserts only, no WASM tree walk needed)Test plan
buildAstNodesacceptsastNodesfor non-JS files (.py, .java)build-native.ymlcompiles successfully with new Rust code