Skip to content

perf: fix v3.0.1 build regression (14.1 → ~5.8 ms/file)#325

Merged
carlos-alm merged 1 commit intomainfrom
fix/build-perf-regression
Mar 4, 2026
Merged

perf: fix v3.0.1 build regression (14.1 → ~5.8 ms/file)#325
carlos-alm merged 1 commit intomainfrom
fix/build-perf-regression

Conversation

@carlos-alm
Copy link
Contributor

Summary

  • Eliminate redundant WASM parsing: Complexity was clearing _tree after each file, forcing CFG and dataflow to each re-create parsers and re-parse all files via WASM. Removed that nullification (builder already clears trees after all phases), added ensureWasmTrees() for a single shared pre-parse pass, and memoized createParsers() so WASM grammars only load once.
  • Extend incremental filtering to CFG/dataflow: The existing astComplexitySymbols filter (which excludes reverse-dep-only files) is now also applied to CFG and dataflow phases, skipping unchanged files that only needed edge rebuilding.
  • Report wasmPreMs in phase timing for visibility into the pre-parse cost.

Benchmarks (self-build, 172 files, native engine)

Metric v3.0.1 (before) After Change
ms/file (phases) 14.1 ~5.8 -59%
CFG phase ~450ms ~250ms -44%
Dataflow phase ~450ms ~255ms -43%
Complexity phase ~8ms ~8ms unchanged
WASM pre-parse N/A (hidden in CFG+dataflow) ~287ms (single pass) saves ~287ms

Test plan

  • All 194 relevant tests pass (build, builder, complexity, parser, cfg, dataflow)
  • Pre-existing parity failures (3) confirmed unrelated to these changes
  • Full build produces correct CFG (cfg buildGraph --format mermaid) and dataflow (dataflow buildGraph)
  • Incremental build tested with real file change — CFG/dataflow correctly scoped to changed files
  • diff-impact --staged -T shows clean 4-function impact

Three optimizations to recover build speed after CFG/dataflow default-on:

1. Eliminate redundant WASM parsing (biggest win):
   - Remove complexity.js clearing _tree after each file (builder already
     clears all trees after all phases complete)
   - Add ensureWasmTrees() in parser.js for a single WASM pre-parse pass
     before CFG/dataflow, replacing two independent parse passes
   - Memoize createParsers() so repeated calls return cached parsers

2. Filter CFG/dataflow to changed files only:
   - Build changedSymbols map excluding reverse-dep-only files
   - Pass changedSymbols (not allSymbols) to buildCFGData/buildDataflowEdges
   - Reverse-dep files only need edge rebuilding, not AST re-analysis

3. Report wasmPreMs in phase timing output for visibility

Impact: 4 functions changed, 28 affected

Impact: 4 functions changed, 10 affected
@claude
Copy link

claude bot commented Mar 4, 2026

Claude finished @carlos-alm's task —— View job


Code Review Complete

Reviewing PR: perf: fix v3.0.1 build regression (14.1 → ~5.8 ms/file)

Todo List:

  • Analyze root cause and verify problem statement
  • Review technical approach for eliminating redundant WASM parsing
  • Examine changes to incremental filtering extension
  • Check for configuration changes or test modifications
  • Assess code quality and potential issues
  • Provide final rating and recommendation

Root Cause Analysis

Problem Being Solved: 59% performance regression (5.8ms → 14.1ms/file) introduced when CFG/dataflow analysis became default. Root causes identified:

  1. Complexity phase cleared _tree after each file, forcing CFG/dataflow to re-parse
  2. CFG and dataflow each created independent parsers and parsed all files separately
  3. Incremental builds didn't filter reverse-dependency-only files from CFG/dataflow

Why This Approach: The solution directly targets each wasteful operation:

  • Eliminates redundant parsing via shared pre-parse pass and parser memoization
  • Extends incremental filtering to expensive analysis phases
  • Adds timing visibility for WASM pre-parse cost

Risk Assessment: LOW RISK - Changes are surgical optimizations that preserve existing behavior while eliminating waste.

Technical Implementation Review

Strengths:

  1. Parser Memoization (src/parser.js:41-43, 72-73, 102): Clean caching prevents expensive WASM grammar reloading
  2. Shared Pre-parse (src/parser.js:121-159): New ensureWasmTrees() performs single parsing pass for CFG+dataflow to share
  3. Incremental Filtering Extension (src/builder.js:1275-1293, 1337, 1349): Correctly excludes reverse-dep-only files from AST re-analysis
  4. Tree Lifecycle Fix (removed src/complexity.js:1772-1774): Stops premature clearing; builder already handles cleanup
  5. Observability (src/builder.js:1322-1330, 1449): Adds wasmPreMs timing for performance monitoring

Code Quality:

  • Changes are minimal and focused (72 additions, 8 deletions across 3 files)
  • No configuration changes or test modifications detected
  • Comments updated appropriately to reflect new behavior
  • Error handling preserved in new ensureWasmTrees() function

Performance Logic:

  • Before: CFG parses all files → dataflow parses all files (2× redundant parsing)
  • After: Single pre-parse → CFG+dataflow share trees (1× parsing total)
  • Incremental builds now scope expensive analysis to only changed files

Critical Concerns

None identified. This is a well-executed performance optimization with:

  • Clear problem identification and targeted solutions
  • Minimal code surface area changes
  • No behavioral changes to output or functionality
  • Comprehensive benchmarks demonstrating 59% speedup
  • All tests passing per PR description

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: High-quality performance fix that directly addresses identified regression with minimal risk. Missing 5th star only due to lack of accompanying performance regression tests to prevent future similar issues.

The implementation is technically sound, well-documented, and delivers the promised performance improvements through targeted waste elimination.

@greptile-apps
Copy link

greptile-apps bot commented Mar 4, 2026

Greptile Summary

This PR delivers a substantial 59% performance improvement (14.1 → 5.8 ms/file) by eliminating redundant WASM parsing across build phases.

Key changes:

  • Removed premature _tree clearing in complexity.js that forced CFG and dataflow to re-parse all files independently
  • Added parser memoization in createParsers() to avoid reloading WASM grammars multiple times
  • Introduced ensureWasmTrees() pre-parse pass so CFG and dataflow share a single parse operation
  • Extended incremental build filtering (previously only AST/complexity) to also skip CFG/dataflow for unchanged reverse-dep-only files
  • Added wasmPreMs timing metric for observability

The optimization is well-designed and maintains correctness while significantly reducing build times. Trees are still properly released at the end of the build process for garbage collection.

Confidence Score: 5/5

  • This PR is safe to merge with no issues found
  • Clean performance optimization with sound architectural decisions: parser memoization is safe (stateless), tree sharing eliminates redundant work without correctness impact, incremental filtering extension is logically correct, and proper GC cleanup remains intact
  • No files require special attention

Important Files Changed

Filename Overview
src/builder.js Extended incremental filtering to CFG/dataflow phases, added WASM pre-parse pass to eliminate redundant parsing, added wasmPreMs timing metric
src/complexity.js Removed premature _tree clearing that was forcing CFG/dataflow to re-parse all files
src/parser.js Added parser memoization to avoid reloading WASM grammars, added ensureWasmTrees() for single shared pre-parse pass

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph "Before (v3.0.1)"
        A1[Complexity Phase] --> B1[Parse files via WASM]
        B1 --> C1[Compute metrics]
        C1 --> D1[Clear _tree]
        D1 --> E1[CFG Phase]
        E1 --> F1[Re-parse files via WASM]
        F1 --> G1[Build CFG]
        G1 --> H1[Dataflow Phase]
        H1 --> I1[Re-parse files via WASM again]
        I1 --> J1[Build dataflow]
    end
    
    subgraph "After (v3.0.1 fix)"
        A2[ensureWasmTrees] --> B2[Parse once via memoized parsers]
        B2 --> C2[Complexity Phase]
        C2 --> D2[Use cached _tree]
        D2 --> E2[CFG Phase]
        E2 --> F2[Reuse _tree from Complexity]
        F2 --> G2[Dataflow Phase]
        G2 --> H2[Reuse _tree from Complexity]
        H2 --> I2[Clear _tree after all phases]
    end
Loading

Last reviewed commit: 5915af4

@carlos-alm carlos-alm merged commit b906f6d into main Mar 4, 2026
9 of 16 checks passed
@carlos-alm carlos-alm deleted the fix/build-perf-regression branch March 4, 2026 05:51
@github-actions github-actions bot locked and limited conversation to collaborators Mar 4, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant