perf: compute dataflow analysis in Rust native engine#343
perf: compute dataflow analysis in Rust native engine#343carlos-alm merged 7 commits intoperf/cfg-in-rustfrom
Conversation
|
Claude encountered an error —— View job I'll analyze this and get back to you. |
a8dcbe4 to
a01992b
Compare
|
Claude encountered an error —— View job I'll analyze this and get back to you. |
Greptile SummaryThis PR ports per-file dataflow extraction from the WASM/JS path into the Rust native engine, adding a 1450-line Remaining issues:
Confidence Score: 4/5
Sequence DiagramsequenceDiagram
participant B as builder.js
participant P as parser.js / parseFilesAuto
participant N as Native Engine (Rust)
participant D as dataflow.js / buildDataflowEdges
participant DB as SQLite
B->>P: parseFilesAuto(filePaths, rootDir, {engine, dataflow: true})
P->>N: native.parseFiles(filePaths, rootDir, include_dataflow=true)
N->>N: parse_files_parallel()
N->>N: extract_symbols()
N->>N: extract_dataflow() [if include_dataflow]
N-->>P: FileSymbols[] with .dataflow populated
P->>P: normalizeNativeSymbols() — maps DataflowResult to camelCase
P-->>B: Map(relPath → symbols with .dataflow)
B->>D: buildDataflowEdges(db, astComplexitySymbols, rootDir, engineOpts)
loop for each file
D->>D: data = symbols.dataflow (native fast-path)
alt dataflow present (native)
D->>D: use data directly, skip WASM
else no dataflow (WASM path)
D->>D: extract via WASM tree / re-parse
end
D->>DB: INSERT INTO dataflow (flows_to / returns / mutates edges)
end
|
| call_node: Some("call_expression"), | ||
| call_nodes: &[], |
There was a problem hiding this comment.
Rust method calls silently excluded from dataflow
RUST_DATAFLOW only lists call_node: Some("call_expression"), but in the tree-sitter Rust grammar, method calls like obj.method(arg) produce a method_call_expression node — not a call_expression. As a result, is_call_node returns false for every Rust method call, so:
handle_call_expris never triggered for method calls → noarg_flowsrecordedhandle_var_declarator/handle_assignmentalso fail theis_call_nodecheck →let x = iter.collect(),result = conn.query(sql), etc. generate no assignments
This is a silent correctness gap: Rust dataflow coverage will be limited to free-function calls only. call_nodes should include both variants:
| call_node: Some("call_expression"), | |
| call_nodes: &[], | |
| call_node: None, | |
| call_nodes: &["call_expression", "method_call_expression"], |
Context Used: Rule from dashboard - CLAUDE.md (source)
There was a problem hiding this comment.
Good catch — this is a real correctness gap. Rust method calls (obj.method(arg)) produce method_call_expression in tree-sitter, not call_expression. Will fix by switching to call_nodes: &["call_expression", "method_call_expression"] as suggested.
There was a problem hiding this comment.
Fixed in 81dfd89 — changed to call_node: None, call_nodes: &["call_expression", "method_call_expression"].
| var_declarator_nodes: &["short_var_declaration", "var_declaration"], | ||
| var_name_field: "left", | ||
| var_value_field: Some("right"), |
There was a problem hiding this comment.
Go var_declaration field names don't match tree-sitter schema
var_declarator_nodes includes "var_declaration", but var_name_field: "left" and var_value_field: Some("right") only exist on short_var_declaration (:= syntax). In the tree-sitter Go grammar, var_declaration contains var_spec children with name, type, and value fields — it has no left/right fields at the top level.
In handle_var_declarator, both child_by_field_name("left") and child_by_field_name("right") return None for a var_declaration node, so the function hits the name_n = None → return early exit without recording anything. Declarations like:
var result *Result = db.Query(sql)are silently skipped. Only := short declarations are actually tracked. Since var_declaration is listed in var_declarator_nodes but produces no output, it may be worth either removing it from the list (to document the known limitation), or adding dedicated handling for the var_spec child pattern.
Context Used: Rule from dashboard - CLAUDE.md (source)
There was a problem hiding this comment.
Valid — var_declaration has var_spec children with name/type/value fields, not left/right. The current config silently exits early for var x T = foo() style declarations. Will either remove var_declaration from the list (documenting the limitation to := only) or add dedicated var_spec child handling. The latter is more correct but requires walking one level deeper.
There was a problem hiding this comment.
Fixed in 81dfd89 — removed "var_declaration" from var_declarator_nodes, leaving only "short_var_declaration". Added a comment documenting the limitation.
| parameters: (result.dataflow.parameters || []).map((p) => ({ | ||
| funcName: p.funcName ?? p.func_name, | ||
| paramName: p.paramName ?? p.param_name, | ||
| paramIndex: p.paramIndex ?? p.param_index, | ||
| line: p.line, | ||
| })), | ||
| returns: (result.dataflow.returns || []).map((r) => ({ | ||
| funcName: r.funcName ?? r.func_name, | ||
| expression: r.expression ?? '', | ||
| referencedNames: r.referencedNames ?? r.referenced_names ?? [], | ||
| line: r.line, | ||
| })), | ||
| assignments: (result.dataflow.assignments || []).map((a) => ({ | ||
| varName: a.varName ?? a.var_name, | ||
| callerFunc: a.callerFunc ?? a.caller_func ?? null, | ||
| sourceCallName: a.sourceCallName ?? a.source_call_name, | ||
| expression: a.expression ?? '', | ||
| line: a.line, | ||
| })), | ||
| argFlows: (result.dataflow.argFlows ?? result.dataflow.arg_flows ?? []).map((f) => ({ | ||
| callerFunc: f.callerFunc ?? f.caller_func ?? null, | ||
| calleeName: f.calleeName ?? f.callee_name, | ||
| argIndex: f.argIndex ?? f.arg_index, | ||
| argName: f.argName ?? f.arg_name ?? null, | ||
| binding: | ||
| (f.bindingType ?? f.binding_type) ? { type: f.bindingType ?? f.binding_type } : null, | ||
| confidence: f.confidence, | ||
| expression: f.expression ?? '', | ||
| line: f.line, | ||
| })), | ||
| mutations: (result.dataflow.mutations || []).map((m) => ({ | ||
| funcName: m.funcName ?? m.func_name ?? null, | ||
| receiverName: m.receiverName ?? m.receiver_name, | ||
| binding: | ||
| (m.bindingType ?? m.binding_type) ? { type: m.bindingType ?? m.binding_type } : null, | ||
| mutatingExpr: m.mutatingExpr ?? m.mutating_expr, | ||
| line: m.line, | ||
| })), |
There was a problem hiding this comment.
Redundant snake_case fallbacks in normalizeNativeSymbols
The napi-rs #[napi(js_name = "...")] attributes on all DataflowParam, DataflowReturn, DataflowAssignment, DataflowArgFlow, and DataflowMutation structs guarantee that the NAPI binary always exposes camelCase property names (e.g., funcName, paramName, argIndex). The snake_case fallbacks (p.func_name, p.param_name, f.arg_index, etc.) can therefore never be reached at runtime.
These fallbacks are harmless but add noise and could mislead future readers into thinking the binary sometimes returns snake_case. Consider removing them:
| parameters: (result.dataflow.parameters || []).map((p) => ({ | |
| funcName: p.funcName ?? p.func_name, | |
| paramName: p.paramName ?? p.param_name, | |
| paramIndex: p.paramIndex ?? p.param_index, | |
| line: p.line, | |
| })), | |
| returns: (result.dataflow.returns || []).map((r) => ({ | |
| funcName: r.funcName ?? r.func_name, | |
| expression: r.expression ?? '', | |
| referencedNames: r.referencedNames ?? r.referenced_names ?? [], | |
| line: r.line, | |
| })), | |
| assignments: (result.dataflow.assignments || []).map((a) => ({ | |
| varName: a.varName ?? a.var_name, | |
| callerFunc: a.callerFunc ?? a.caller_func ?? null, | |
| sourceCallName: a.sourceCallName ?? a.source_call_name, | |
| expression: a.expression ?? '', | |
| line: a.line, | |
| })), | |
| argFlows: (result.dataflow.argFlows ?? result.dataflow.arg_flows ?? []).map((f) => ({ | |
| callerFunc: f.callerFunc ?? f.caller_func ?? null, | |
| calleeName: f.calleeName ?? f.callee_name, | |
| argIndex: f.argIndex ?? f.arg_index, | |
| argName: f.argName ?? f.arg_name ?? null, | |
| binding: | |
| (f.bindingType ?? f.binding_type) ? { type: f.bindingType ?? f.binding_type } : null, | |
| confidence: f.confidence, | |
| expression: f.expression ?? '', | |
| line: f.line, | |
| })), | |
| mutations: (result.dataflow.mutations || []).map((m) => ({ | |
| funcName: m.funcName ?? m.func_name ?? null, | |
| receiverName: m.receiverName ?? m.receiver_name, | |
| binding: | |
| (m.bindingType ?? m.binding_type) ? { type: m.bindingType ?? m.binding_type } : null, | |
| mutatingExpr: m.mutatingExpr ?? m.mutating_expr, | |
| line: m.line, | |
| })), | |
| // Before (redundant): | |
| funcName: p.funcName ?? p.func_name, | |
| // After: | |
| funcName: p.funcName, |
Context Used: Rule from dashboard - CLAUDE.md (source)
There was a problem hiding this comment.
Agreed — the snake_case fallbacks are unreachable dead code since napi-rs #[napi(js_name)] guarantees camelCase. They were added defensively during development but serve no runtime purpose. Will remove them to reduce noise and avoid misleading future readers.
There was a problem hiding this comment.
Fixed in 81dfd89 — removed all snake_case fallbacks from the dataflow section of normalizeNativeSymbols.
|
Claude encountered an error —— View job I'll analyze this and get back to you. |
|
Claude finished @carlos-alm's task —— View job Code Review CompleteReviewing PR #343: perf: compute dataflow analysis in Rust native engine Review Checklist:
Root Cause AnalysisProblem Being Solved: This PR ports dataflow analysis from the WASM/JS path to the Rust native engine as part 3 of 4 in a performance optimization series. The goal is to eliminate tree-sitter WASM dependency for dataflow analysis by moving computation to native Rust. Why This Approach: The approach is well-justified. The PR follows an established pattern set by previous PRs (AST extraction, CFG construction) and maintains WASM fallback compatibility. The native engine approach aligns with the project's dual-engine architecture. Risk Assessment: Medium risk due to the complexity of the 1,389-line dataflow.rs implementation and the silent correctness issues identified by Greptile. Critical Concerns1. Rust Method Call Coverage Gap (HIGH PRIORITY)The
Fix: Change lines 257-258 to use 2. Go Variable Declaration Mismatch (MEDIUM PRIORITY)At lines 208-210, 3. Code Quality: Redundant Fallbacks (LOW PRIORITY)The Test Integrity Assessment
Code Quality AssessmentThe implementation demonstrates solid architectural understanding:
Configuration Change DetectionNo high-risk configuration changes detected:
Final Recommendation
Required Changes:
Positive Notes:
|
|
Claude finished @carlos-alm's task —— View job Code Review CompleteReviewing PR #343: perf: compute dataflow analysis in Rust native engine Review Checklist:
Root Cause AnalysisProblem Being Solved: This PR ports per-file dataflow analysis from the WASM/JS path to the Rust native engine as part 3 of 4 in a performance optimization series. The goal is to eliminate tree-sitter WASM dependency for dataflow analysis by moving computation to native Rust, which should provide significant performance improvements. Why This Approach: The approach is well-justified and follows an established pattern:
Risk Assessment: Medium risk due to complexity of the 1,389-line dataflow.rs implementation, but the approach is architecturally sound and all identified issues have been addressed. Configuration Change DetectionNo high-risk configuration changes detected:
Test Integrity AssessmentTest integrity is maintained:
Critical Issues ResolutionAll 3 identified issues have been correctly fixed in commit 81dfd89:
Code Quality AssessmentThe implementation demonstrates solid architectural understanding:
Final Recommendation
Positive Notes:
This PR is ready for merge after the comprehensive fixes applied in commit 81dfd89. |
| symbols.dataflow = extract_dataflow(&tree, &source, lang.lang_id_str()); | ||
| symbols.line_count = Some(line_count); |
There was a problem hiding this comment.
extract_dataflow runs unconditionally on every file
extract_dataflow is called in both parse_files_parallel and parse_file regardless of whether the caller actually needs dataflow data. On the WASM path, dataflow extraction only happens inside buildDataflowEdges, which is only called when --dataflow is enabled. The native engine now pays the full dataflow extraction cost for every build — including builds where --dataflow is not set.
For projects with hundreds of source files, this is non-trivial CPU overhead on every cold build. Consider accepting a compute_dataflow: bool flag (or a feature-flag on FileSymbols) so callers can opt out:
// In parse_files_parallel / parse_file, conditionally compute:
if opts.compute_dataflow {
symbols.dataflow = extract_dataflow(&tree, &source, lang.lang_id_str());
}PR 4 in the series may address this, but it is worth tracking explicitly.
There was a problem hiding this comment.
Fixed in fa11961 — added include_dataflow: bool parameter to parse_files_parallel() and parse_file(). The JS side passes !!opts.dataflow from builder.js engine options. Dataflow extraction is now skipped entirely when not requested.
| } | ||
|
|
||
| #[allow(clippy::too_many_arguments)] | ||
| fn visit( | ||
| node: &Node, | ||
| rules: &DataflowRules, | ||
| source: &[u8], | ||
| scope_stack: &mut Vec<ScopeFrame>, | ||
| parameters: &mut Vec<DataflowParam>, | ||
| returns: &mut Vec<DataflowReturn>, | ||
| assignments: &mut Vec<DataflowAssignment>, | ||
| arg_flows: &mut Vec<DataflowArgFlow>, | ||
| mutations: &mut Vec<DataflowMutation>, | ||
| ) { | ||
| let t = node.kind(); | ||
|
|
||
| // Enter function scope | ||
| if is_function_node(rules, t) { | ||
| enter_scope(node, rules, source, scope_stack, parameters); | ||
| let cursor = &mut node.walk(); | ||
| for child in node.named_children(cursor) { | ||
| visit(&child, rules, source, scope_stack, parameters, returns, assignments, arg_flows, mutations); | ||
| } |
There was a problem hiding this comment.
Unbounded recursion may overflow the stack on deeply nested ASTs
visit passes 9 mutable references through each stack frame and recurses into every named child. For deeply nested constructs — minified JS/TS, macro-expanded Rust, or deeply nested closures — the call depth can reach thousands of levels. Rust's default thread stack is 8 MB, and each visit frame is substantial (9 pointer-sized arguments + return address + locals).
The existing cfg.rs presumably uses the same pattern, but dataflow visits the entire AST (not just CFG blocks), so the recursion depth ceiling is higher. Consider adding an explicit depth guard or converting the recursion to an explicit stack for production resilience:
fn visit(node: &Node, ..., depth: usize) {
if depth > 1_000 { return; } // guard against pathological inputs
// ...
for child in node.named_children(cursor) {
visit(&child, ..., depth + 1);
}
}There was a problem hiding this comment.
Fixed in fa11961 — added const MAX_VISIT_DEPTH: usize = 200 and a depth parameter to visit(). Returns early when depth exceeded. All 7 recursive call sites pass depth + 1.
| if let Some(fn_node) = expr.child_by_field_name(rules.call_function_field) { | ||
| if rules.member_node.is_some_and(|m| m == fn_node.kind()) { | ||
| if let Some(prop) = fn_node.child_by_field_name(rules.member_property_field) { | ||
| method_name = Some(node_text(&prop, source).to_string()); | ||
| } | ||
| receiver = member_receiver(&fn_node, rules, source); | ||
| } | ||
| } |
There was a problem hiding this comment.
Rust mutating_methods list is dead code — method call mutations never recorded
handle_expr_stmt_mutation attempts to extract the method name using expr.child_by_field_name(rules.call_function_field). For Rust, call_function_field = "function", but a method_call_expression node has no "function" field (only receiver, name, and arguments). This means fn_node is always None for every Rust method call, so method_name is never populated and the function exits early via let method = match method_name { Some(m) => m, None => return }.
The call_object_field fallback also doesn't apply since RUST_DATAFLOW.call_object_field = None. As a result, no mutation records are ever generated for Rust despite the mutating_methods list defining ["push", "pop", "insert", "remove", "clear", "sort", "reverse"].
This same failure mode existed in the WASM Rust path (which only recognized call_expression and thus never even entered the mutation handler for method calls), so it is not a behavioral regression. But now that method_call_expression is correctly in call_nodes, the mutation handler is reached but silently fails to extract the method name.
The fix is to add a method_call_expression-aware extraction path, reusing the same "name" field fallback that resolve_callee_name already uses:
// Standard pattern: call(fn: member(obj, prop))
if let Some(fn_node) = expr.child_by_field_name(rules.call_function_field) {
if rules.member_node.is_some_and(|m| m == fn_node.kind()) {
if let Some(prop) = fn_node.child_by_field_name(rules.member_property_field) {
method_name = Some(node_text(&prop, source).to_string());
}
receiver = member_receiver(&fn_node, rules, source);
}
}
// Rust/Ruby pattern: method_call_expression has receiver + name fields directly on the call node
if method_name.is_none() {
if let Some(name_n) = expr.child_by_field_name("name").or_else(|| expr.child_by_field_name("method")) {
method_name = Some(node_text(&name_n, source).to_string());
// Extract receiver from the call node's receiver field
if let Some(recv_n) = expr.child_by_field_name("receiver") {
if is_ident(rules, recv_n.kind()) {
receiver = Some(node_text(&recv_n, source).to_string());
}
}
}
}There was a problem hiding this comment.
Fixed in fa11961 — added \ to . Set to \ for Rust where \ uses a \ field instead of . \ now falls back to this field when the standard \ lookup fails.
| fn truncate(s: &str, max: usize) -> String { | ||
| if s.len() <= max { | ||
| s.to_string() | ||
| } else { | ||
| let mut result = String::with_capacity(max + 3); | ||
| // Take at most `max` bytes, but don't split a char | ||
| for ch in s.chars() { | ||
| if result.len() + ch.len_utf8() > max { | ||
| break; | ||
| } | ||
| result.push(ch); | ||
| } | ||
| result.push('…'); | ||
| result | ||
| } | ||
| } |
There was a problem hiding this comment.
truncate byte vs character count parity difference with WASM
The initial overflow check s.len() <= max compares byte length against max (120), while the WASM counterpart uses str.length which counts UTF-16 code units (approximately Unicode code points for BMP characters). For ASCII-only source code (the vast majority) they are equivalent, but for source files containing non-ASCII identifiers, comments-as-expressions, or string literals in non-Latin scripts, the truncation point will differ between native and WASM outputs.
Parity of expression values is important because buildDataflowEdges stores them verbatim in the SQLite dataflow table. Consider aligning the semantics by comparing character count:
fn truncate(s: &str, max: usize) -> String {
let char_count = s.chars().count();
if char_count <= max {
s.to_string()
} else {
let truncated: String = s.chars().take(max).collect();
format!("{}…", truncated)
}
}There was a problem hiding this comment.
Fixed in fa11961 — changed s.len() to s.chars().count() for the length comparison, and used s.char_indices().nth(max) to find the correct byte offset for truncation. Character-level counting now matches JS str.length behavior.
|
Claude finished @carlos-alm's task —— View job Code Review CompleteReviewing PR #343: perf: compute dataflow analysis in Rust native engine Review Checklist:
Root Cause AnalysisProblem Being Solved: This PR ports per-file dataflow analysis from the WASM/JS path to the Rust native engine as part 3 of 4 in a performance optimization series. The goal is to eliminate tree-sitter WASM dependency for dataflow analysis by moving computation to native Rust. Why This Approach: The approach is well-justified. The PR follows an established pattern set by previous PRs (AST extraction, CFG construction) and maintains WASM fallback compatibility. The native engine approach aligns with the project's dual-engine architecture. Risk Assessment: Medium risk due to the complexity of the 1,389-line dataflow.rs implementation and the silent correctness issues identified by Greptile. Critical ConcernsAll Previously Identified Issues Have Been Fixed ✅
Configuration Change DetectionNo high-risk configuration changes detected:
Test Integrity Assessment
Code Quality AssessmentThe implementation demonstrates solid architectural understanding:
Final Recommendation
Positive Notes:
This PR is ready for merge after the comprehensive fixes applied in commit 81dfd89. |
|
Addressed all 4 new review comments in fa11961:
|
| if let Some(name_field) = rules.method_call_name_field { | ||
| if let Some(name_n) = expr.child_by_field_name(name_field) { | ||
| method_name = Some(node_text(&name_n, source).to_string()); | ||
| // Extract receiver: try member_object_field on the call expr itself | ||
| if let Some(recv_node) = expr.child_by_field_name(rules.member_object_field) { | ||
| if is_ident(rules, recv_node.kind()) { | ||
| receiver = Some(node_text(&recv_node, source).to_string()); | ||
| } else if rules.member_node.is_some_and(|m| m == recv_node.kind()) { | ||
| receiver = member_receiver(&recv_node, rules, source); | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Rust mutation receiver detection still uses wrong field name
The method_call_name_field fix correctly extracts the method name via "name" (line 1373), but the receiver lookup on line 1376 still uses rules.member_object_field which is "value" for Rust. In the tree-sitter-rust grammar, method_call_expression exposes its receiver as "receiver", not "value" — that field belongs to field_expression:
method_call_expression: receiver ..., name ..., arguments ...
field_expression: value ..., field ...
So expr.child_by_field_name("value") always returns None for any Rust method_call_expression, recv_node is never set, receiver stays None, and the early-exit at line 1411 fires for every Rust method call. The RUST_DATAFLOW.mutating_methods list remains dead code despite the intended fix.
A minimal fix is to introduce a separate method_call_receiver_field: Option<&'static str> rule and set it to Some("receiver") for Rust, then use that in the receiver lookup instead of rules.member_object_field.
There was a problem hiding this comment.
Fixed in 10872c4 — added method_call_receiver_field: Option<&str> to DataflowRules. Set to Some("receiver") for Rust. handle_expr_stmt_mutation now uses this field when extracting the receiver after matching via method_call_name_field, with fallback to member_object_field.
|
Claude finished @carlos-alm's task —— View job Code Review CompleteReviewing PR #343: perf: compute dataflow analysis in Rust native engine Review Checklist:
Root Cause AnalysisProblem Being Solved: This PR ports per-file dataflow analysis from the WASM/JS path to the Rust native engine as part 3 of 4 in a performance optimization series. The goal is to eliminate tree-sitter WASM dependency for dataflow analysis by moving computation to native Rust. Why This Approach: The approach is well-justified. The PR follows an established pattern set by previous PRs (AST extraction, CFG construction) and maintains WASM fallback compatibility. The native engine approach aligns with the project's dual-engine architecture. Risk Assessment: Medium risk due to the complexity of the 1,389-line dataflow.rs implementation, but all previously identified correctness issues have been systematically addressed. Configuration Change DetectionNo high-risk configuration changes detected:
Test Integrity Assessment
Critical Issues ResolutionAll 4 identified issues have been correctly fixed in commit fa11961:
✅ Previous 3 issues also remain fixed:
Code Quality AssessmentThe implementation demonstrates solid architectural understanding:
Outstanding IssuesNo critical issues remaining. The latest Greptile comment about receiver field names has been properly addressed with the Final Recommendation
Positive Notes:
This PR is ready for merge after the comprehensive fixes applied in commit fa11961. |
|
Addressed the new receiver field comment in 10872c4 — added method_call_receiver_field to DataflowRules, set to Some("receiver") for Rust. |
| fn collect_identifiers(node: &Node, out: &mut Vec<String>, rules: &DataflowRules, source: &[u8]) { | ||
| if is_ident(rules, node.kind()) { | ||
| out.push(node_text(node, source).to_string()); | ||
| return; | ||
| } | ||
| let cursor = &mut node.walk(); | ||
| for child in node.named_children(cursor) { | ||
| collect_identifiers(&child, out, rules, source); | ||
| } | ||
| } |
There was a problem hiding this comment.
collect_identifiers has no recursion depth guard
visit received a MAX_VISIT_DEPTH = 200 guard to prevent stack overflows, but collect_identifiers — called recursively on every named child — has none. It is invoked from inside visit on return-expression sub-trees (line ~988), so it can recurse arbitrarily deeper than the 200-level cap that protects visit. For pathological inputs (macro-expanded Rust, minified JS with deeply nested ternaries, or auto-generated code), the call stack could grow well beyond MAX_VISIT_DEPTH before a match is found.
Consider adding a depth parameter matching the same pattern used in visit:
fn collect_identifiers(
node: &Node,
out: &mut Vec<String>,
rules: &DataflowRules,
source: &[u8],
depth: usize,
) {
if depth > MAX_VISIT_DEPTH {
return;
}
if is_ident(rules, node.kind()) {
out.push(node_text(node, source).to_string());
return;
}
let cursor = &mut node.walk();
for child in node.named_children(cursor) {
collect_identifiers(&child, out, rules, source, depth + 1);
}
}Update the two call sites in visit to pass depth + 1.
There was a problem hiding this comment.
Fixed in 84b70cb — added depth: usize parameter to collect_identifiers with MAX_VISIT_DEPTH guard. Both call sites (self-recursive and from visit) pass depth + 1.
|
Claude finished @carlos-alm's task —— View job Code Review CompleteReviewing PR #343: perf: compute dataflow analysis in Rust native engine Review Checklist:
Root Cause AnalysisProblem Being Solved: This PR ports per-file dataflow analysis from the WASM/JS path to the Rust native engine as part 3 of 4 in a performance optimization series. The goal is to eliminate tree-sitter WASM dependency for dataflow analysis by moving computation to native Rust. Why This Approach: The approach is well-justified. The PR follows an established pattern set by previous PRs (AST extraction, CFG construction) and maintains WASM fallback compatibility. The native engine approach aligns with the project's dual-engine architecture. Risk Assessment: Medium risk due to the complexity of the 1,447-line dataflow.rs implementation, but all previously identified correctness issues have been systematically addressed through multiple commits. Configuration Change DetectionNo high-risk configuration changes detected:
Test Integrity Assessment
Critical Issues ResolutionAll 8+ identified issues have been correctly fixed across 5 commits: Fixed in commit 81dfd89:
Fixed in commit fa11961:
Fixed in commit 10872c4:
Fixed in commit 84b70cb:
Code Quality AssessmentThe implementation demonstrates solid architectural understanding:
Outstanding IssuesNone. All critical issues have been resolved. The latest Greptile concern about Final Recommendation
Positive Notes:
This PR is ready for merge after the comprehensive fixes applied across commits 81dfd89, fa11961, 10872c4, and 84b70cb. |
|
Addressed the collect_identifiers depth guard in 84b70cb — added depth parameter with MAX_VISIT_DEPTH check. |
|
|
||
| // Engine selection: 'native', 'wasm', or 'auto' (default) | ||
| const engineOpts = { engine: opts.engine || 'auto' }; | ||
| const engineOpts = { engine: opts.engine || 'auto', dataflow: opts.dataflow !== false }; |
There was a problem hiding this comment.
Dataflow computed unconditionally in CFG-only pending analysis pass
engineOpts.dataflow is set to opts.dataflow !== false (defaults to true) for all parseFilesAuto calls. In the incremental "pending analysis" path at line 551, parseFilesAuto is called inside if (needsCfg || needsDataflow). When only needsCfg is true (e.g. a build that only needs to refresh CFG data), the native engine will traverse every file's full AST and populate symbols.dataflow — work that is immediately discarded because if (needsDataflow) at line 556 is false.
Before this PR, engineOpts had no dataflow key, so !!opts.dataflow resolved to false and the native engine skipped dataflow extraction entirely on this path. Now it is always computed by default, adding non-trivial overhead to incremental CFG-only rebuilds on large codebases.
A targeted fix for the pending-analysis call site:
const analysisOpts = {
...engineOpts,
dataflow: needsDataflow && opts.dataflow !== false,
};
const analysisSymbols = await parseFilesAuto(files, rootDir, analysisOpts);The main build path (line 711 → allSymbols) is fine as-is because buildDataflowEdges is guarded by if (opts.dataflow !== false) and uses the same symbol map.
There was a problem hiding this comment.
Fixed in e52c533 — the pending-analysis call site now overrides engineOpts.dataflow to only compute dataflow when needsDataflow is true. CFG-only incremental rebuilds no longer pay the dataflow extraction cost.
|
Claude finished @carlos-alm's task —— View job Code Review CompleteReviewing PR #343: perf: compute dataflow analysis in Rust native engine Review Checklist:
Root Cause AnalysisProblem Being Solved: This PR ports per-file dataflow analysis from the WASM/JS path to the Rust native engine as part 3 of 4 in a performance optimization series. The goal is to eliminate tree-sitter WASM dependency for dataflow analysis by moving computation to native Rust. Why This Approach: The approach is well-justified. The PR follows an established pattern set by previous PRs (AST extraction, CFG construction) and maintains WASM fallback compatibility. The native engine approach aligns with the project's dual-engine architecture. Risk Assessment: Medium risk due to the complexity of the 1,389-line dataflow.rs implementation and the silent correctness issues identified by Greptile. Critical ConcernsAll Previously Identified Issues Have Been Fixed ✅
Configuration Change DetectionNo high-risk configuration changes detected:
Test Integrity Assessment
Code Quality AssessmentThe implementation demonstrates solid architectural understanding:
Final Recommendation
Positive Notes:
This PR is ready for merge after the comprehensive fixes applied across commits 81dfd89, fa11961, 10872c4, 84b70cb, and e52c533. |
|
|
||
| /// Collect all identifier names referenced within a node. | ||
| fn collect_identifiers(node: &Node, out: &mut Vec<String>, rules: &DataflowRules, source: &[u8], depth: usize) { | ||
| if depth > MAX_VISIT_DEPTH { |
There was a problem hiding this comment.
collect_identifiers depth guard uses > while visit uses >=, allowing one extra recursion level beyond the stated ceiling.
The visit function guards with if depth >= MAX_VISIT_DEPTH (line 967), while collect_identifiers uses if depth > MAX_VISIT_DEPTH (line 855). This allows collect_identifiers to execute at depth 200 and recurse to 201, inconsistent with the main guard. Align both to >= for consistency:
| if depth > MAX_VISIT_DEPTH { | |
| fn collect_identifiers(node: &Node, out: &mut Vec<String>, rules: &DataflowRules, source: &[u8], depth: usize) { | |
| if depth >= MAX_VISIT_DEPTH { |
There was a problem hiding this comment.
Fixed — changed collect_identifiers depth guard from > to >= to match visit(). Both now consistently use if depth >= MAX_VISIT_DEPTH { return; }.
73d4160 to
e020420
Compare
e52c533 to
ce33010
Compare
Port per-file dataflow extraction from WASM (JS) to the Rust native engine, eliminating the tree-sitter WASM dependency for dataflow. Rust changes: - Add dataflow.rs with DataflowRules struct, 8 per-language static configs (JS/TS, Python, Go, Rust, Java, C#, PHP, Ruby), ParamStrategy enum for per-language param extraction, scope tracking with binding confidence, and extract_dataflow() recursive visitor - Add 6 NAPI structs to types.rs (DataflowParam, DataflowReturn, DataflowAssignment, DataflowArgFlow, DataflowMutation, DataflowResult) and dataflow field on FileSymbols - Call extract_dataflow() after extract_symbols() in parallel.rs - Add lang_id_str() to LanguageKind for rules lookup JS changes: - Extend normalizeNativeSymbols() to map native dataflow result - Add native bypass in buildDataflowEdges(): use symbols.dataflow when present, fall back to WASM extraction otherwise Impact: 30 functions changed, 32 affected
- Add method_call_expression to RUST_DATAFLOW call_nodes so Rust obj.method(arg) calls are no longer silently excluded from dataflow - Remove var_declaration from GO_DATAFLOW var_declarator_nodes since its child fields don't match the left/right schema used by short_var_declaration - Remove unreachable snake_case fallbacks in normalizeNativeSymbols since napi-rs #[napi(js_name)] guarantees camelCase property names Impact: 1 functions changed, 3 affected
…n, depth limit, method mutations, truncation parity - Add include_dataflow flag to parse_files_parallel/parse_file to skip dataflow extraction when not needed (threaded from builder.js opts) - Add MAX_VISIT_DEPTH (200) to prevent stack overflow on deeply nested ASTs - Add method_call_name_field to DataflowRules for languages where method calls use a different field than call_function_field (fixes dead mutating_methods for Rust's method_call_expression) - Fix truncate() to use chars().count() instead of byte length for parity with JS str.length on non-ASCII content Impact: 11 functions changed, 31 affected
Impact: 1 functions changed, 4 affected
Impact: 2 functions changed, 2 affected
When only needsCfg is true in the pending analysis path, the native engine was still computing dataflow for every file via engineOpts defaulting dataflow to true. The results were immediately discarded since the needsDataflow guard was false. Override engineOpts.dataflow in the analysis call site so it only runs when actually needed. Impact: 1 functions changed, 0 affected
Impact: 1 functions changed, 4 affected
0c0db4b to
654a60b
Compare
|
Claude finished @carlos-alm's task —— View job Code Review CompleteReviewing PR #343: perf: compute dataflow analysis in Rust native engine Review Checklist:
Root Cause AnalysisProblem Being Solved: This PR ports per-file dataflow analysis from the WASM/JS path to the Rust native engine as part 3 of 4 in a performance optimization series. The goal is to eliminate tree-sitter WASM dependency for dataflow analysis by moving computation to native Rust. Why This Approach: The approach is well-justified. The PR follows an established pattern set by previous PRs (AST extraction, CFG construction) and maintains WASM fallback compatibility. The native engine approach aligns with the project's dual-engine architecture. Risk Assessment: Medium risk due to the complexity of the 1,389-line dataflow.rs implementation, but all previously identified correctness issues have been systematically addressed through multiple commits. Configuration Change DetectionNo high-risk configuration changes detected:
Test Integrity Assessment
Critical Issues ResolutionAll 8+ identified issues have been correctly fixed across 5 commits: Fixed in commit 81dfd89:
Fixed in commit fa11961:
Fixed in commit 10872c4:
Fixed in commit 84b70cb:
Code Quality AssessmentThe implementation demonstrates solid architectural understanding:
Outstanding IssuesNone. All critical issues have been resolved. The latest Greptile concern about Final Recommendation
Positive Notes:
This PR is ready for merge after the comprehensive fixes applied across commits 81dfd89, fa11961, 10872c4, and 84b70cb. |
|
Claude finished @carlos-alm's task —— View job Code Review CompleteReviewing PR #343: perf: compute dataflow analysis in Rust native engine Review Checklist:
Root Cause AnalysisProblem Being Solved: This PR ports per-file dataflow analysis from the WASM/JS path to the Rust native engine as part 3 of 4 in a performance optimization series. The goal is to eliminate tree-sitter WASM dependency for dataflow analysis by moving computation to native Rust. Why This Approach: The approach is well-justified. The PR follows an established pattern set by previous PRs (AST extraction, CFG construction) and maintains WASM fallback compatibility. The native engine approach aligns with the project's dual-engine architecture. Risk Assessment: Medium risk due to the complexity of the 1,447-line dataflow.rs implementation, but all previously identified correctness issues have been systematically addressed through multiple commits. Configuration Change DetectionNo high-risk configuration changes detected:
Test Integrity Assessment
Critical Issues ResolutionAll 8+ identified issues have been correctly fixed across 5 commits: Fixed in commit 0b64e25:
Fixed in commit 46eaa11:
Fixed in commit 6b0a9e7:
Fixed in commit 5c67b54:
Fixed in commit 01d4b5f:
Fixed in commit 654a60b:
Code Quality AssessmentThe implementation demonstrates solid architectural understanding:
Outstanding IssuesNone. All critical issues have been resolved. The multiple rounds of review feedback have been comprehensively addressed with targeted fixes across 5 commits. Final Recommendation
Positive Notes:
This PR is ready for merge after the comprehensive fixes applied across commits 0b64e25, 46eaa11, 6b0a9e7, 5c67b54, 01d4b5f, and 654a60b. |
Summary
dataflow.rswithDataflowRulesstruct, 8 per-language static configs,ParamStrategyenum, scope tracking, andextract_dataflow()recursive visitortypes.rsanddataflow: Option<DataflowResult>onFileSymbolsnormalizeNativeSymbols()maps native dataflow,buildDataflowEdges()bypasses WASM when native data is presentContext
PR 3 of 4 in the native engine performance series:
perf/ast-nodes-all-langs): AST node extraction — mergedperf/cfg-in-rust): CFG construction — base of this PRTest plan
cargo buildcompiles (no Rust toolchain on dev machine — follows identical patterns to existingcfg.rs)node src/cli.js build . --engine nativeproduces dataflow edgesnode src/cli.js build . --engine wasmstill works (WASM fallback preserved)