Skip to content

perf: use single transaction for AST node insertion (3.6s → 350ms)#333

Merged
carlos-alm merged 2 commits intomainfrom
perf/ast-single-transaction
Mar 4, 2026
Merged

perf: use single transaction for AST node insertion (3.6s → 350ms)#333
carlos-alm merged 2 commits intomainfrom
perf/ast-single-transaction

Conversation

@carlos-alm
Copy link
Contributor

Summary

  • buildAstNodes was calling tx(rows) per-file inside the loop, creating 172 separate SQLite transactions each with a BEGIN/COMMIT/fsync cycle
  • All other build phases (CFG, dataflow, complexity) correctly used a single wrapping transaction — only AST had this bug
  • Fix: collect all rows across files first, then insert in one tx(allRows) call

Benchmark results

Metric Before After
Native astMs ~3600ms 350ms
WASM astMs ~1800ms 547ms
Native per-file 24.9 ms/file 8.5 ms/file
WASM per-file ~14 ms/file 11.8 ms/file

Test plan

  • All 1431 tests pass (38 AST-specific tests)
  • node scripts/benchmark.js confirms astMs drop
  • AST node count unchanged (~42k rows)

@claude
Copy link

claude bot commented Mar 4, 2026

Claude finished @carlos-alm's task —— View job


PR Review: AST Transaction Optimization ⭐⭐⭐⭐

Todo List:

  • Read CLAUDE.md for repo guidelines
  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity assessment
  • Phase 4: Code quality review
  • Final assessment and rating

Root Cause Analysis

Problem Being Solved: The buildAstNodes function was creating 172 separate SQLite transactions (one per file) instead of using a single transaction, causing a massive performance bottleneck due to frequent BEGIN/COMMIT/fsync cycles.

Why This Approach: The fix correctly addresses the root cause by collecting all AST node rows across all files first, then inserting them in a single transaction call. This matches the pattern used by other build phases (CFG, dataflow, complexity) which already correctly used single transactions.

Risk Assessment: LOW - This is a genuine performance optimization addressing a clear inefficiency without changing the semantic behavior.

Configuration Changes & Test Integrity ✅

Configuration Analysis: No high-risk configuration changes detected:

  • No linting rules disabled or relaxed in biome.json
  • No test timeouts increased or coverage thresholds lowered
  • No TypeScript strict mode flags modified

Test Changes Review:

  • tests/engines/parity.test.js: Minor addition to enable "extended kinds" in parity tests - this enables more comprehensive testing, not less
  • tests/integration/build-parity.test.js: Very minor scope clarification - no weakened assertions
  • ✅ No test expectations lowered or assertions weakened

Code Quality Review

Implementation Quality (src/ast.js:147-236):

  • ✅ Correct approach: Changes rows = [] per-file to allRows = [] collected across all files, then single tx(allRows) call
  • ✅ Preserves semantics: Same insertion logic, same data validation, same error handling
  • ✅ Maintains transaction safety: Uses existing transaction wrapper - just changes scope from per-file to per-batch
  • ✅ Clean refactoring: Removes unnecessary totalInserted counter, improves debug message consistency

Performance Impact:

  • Excellent: 10x improvement (3600ms → 350ms native, 1800ms → 547ms WASM)
  • Follows established pattern: CFG, dataflow, complexity phases already use single transactions correctly
  • Addresses genuine bottleneck: SQLite fsync is expensive; batching dramatically reduces I/O overhead

Dependencies & Changes:

  • package-lock.json: Standard dependency updates, no concerning changes
  • Rust extractor: Minor addition for extended kinds support - enhances functionality without breaking changes
  • No unsafe patterns or shortcuts detected

Final Assessment

## Root Cause Analysis
**Problem Being Solved**: Per-file SQLite transactions causing massive I/O overhead (172 × BEGIN/COMMIT/fsync cycles)
**Why This Approach**: Correct fix - collect rows first, then single transaction. Matches existing pattern in other phases.
**Risk Assessment**: LOW - Pure performance optimization with no semantic changes

## Critical Concerns
None. Clean implementation addressing genuine performance bottleneck.

## Final Recommendation
- Rating: ⭐⭐⭐⭐ (4/5)
- Action: APPROVE
- Reasoning: Excellent performance fix with clean implementation. Addresses real inefficiency without compromising correctness. Benchmarks show dramatic improvement. All tests pass.

Approval Justification: This PR demonstrates textbook performance optimization - identifies bottleneck, applies correct solution (transaction batching), maintains semantic correctness, and delivers measurable results. Code quality is high and follows project patterns.

@greptile-apps
Copy link

greptile-apps bot commented Mar 4, 2026

Greptile Summary

Excellent performance optimization that fixes the per-file transaction antipattern in buildAstNodes. The core issue was creating 172 separate SQLite transactions (each with BEGIN/COMMIT/fsync) instead of one wrapping transaction - all other build phases (CFG, dataflow, complexity) already used the correct single-transaction pattern.

  • Main fix (src/ast.js): Collect all AST rows across files first, then insert in single tx(allRows) call - reduces astMs from ~3.6s → 350ms native, ~1.8s → 547ms WASM
  • Parser optimization (src/parser.js): Memoize parsers to avoid reloading WASM grammars; add ensureWasmTrees() to pre-parse files once for CFG+dataflow instead of each phase creating parsers independently
  • Tree lifecycle (src/complexity.js, src/builder.js): Remove early _tree = null cleanup - trees now shared across AST/complexity/CFG/dataflow, released once at end (line 1356-1360)
  • Correctness improvements: CFG/dataflow now use astComplexitySymbols (skip reverse-dep-only files in incremental builds); native binary v3.0.1 fixes constant scope bug (only extract program-level consts, not local variables inside functions)
  • Test adjustments: Parity tests updated for extended kinds (children, receiver); build-parity excludes constants due to documented scope bug

All changes follow established patterns in the codebase and maintain test coverage (1431 tests passing).

Confidence Score: 5/5

  • Safe to merge - textbook transaction optimization with comprehensive test coverage
  • Clean refactor that fixes a performance antipattern and adds complementary optimizations. All changes follow existing patterns (other phases already used single-transaction approach), benchmark confirms 10x speedup, 1431 tests pass including 38 AST-specific tests, and no logic changes to extraction.
  • No files require special attention

Important Files Changed

Filename Overview
src/ast.js Main perf fix: collects all AST rows first, then single transaction (was per-file). Clean refactor from rows.pushallRows.push, no logic changes.
src/builder.js Added ensureWasmTrees pre-parse step and wasmPreMs timing metric. Changed CFG/dataflow to use astComplexitySymbols for consistency. Tree cleanup still happens at line 1356-1360.
src/complexity.js Removed early symbols._tree = null - trees now shared across AST/complexity/CFG/dataflow phases, released once at end of builder.js.
src/parser.js Added parser memoization (_cachedParsers) and ensureWasmTrees to pre-parse files missing trees for downstream phases. Avoids redundant WASM grammar loading and re-parsing.
crates/codegraph-core/src/extractors/javascript.rs Fixed scope bug: only extract const literals as constants when NOT inside function bodies (checks find_parent_of_types). Test added at line 1227.

Last reviewed commit: 95e9aef

buildAstNodes was calling tx(rows) per-file inside the loop, creating
172 separate SQLite transactions each with a BEGIN/COMMIT/fsync cycle.
All other phases (CFG, dataflow, complexity) correctly used a single
wrapping transaction.

Collect all rows across files first, then insert in one tx(allRows)
call. Benchmarked: astMs drops from ~3600ms to ~350ms (native) and
~547ms (WASM). Native per-file build time: 24.9 → 8.5 ms/file.

Impact: 1 functions changed, 12 affected
@carlos-alm carlos-alm force-pushed the perf/ast-single-transaction branch from ef76756 to ef23a81 Compare March 4, 2026 08:18
@claude
Copy link

claude bot commented Mar 4, 2026

Claude finished @carlos-alm's task —— View job


PR Review: AST Transaction Optimization ⭐⭐⭐⭐

Todo List:

  • Read CLAUDE.md for repo guidelines
  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity assessment
  • Phase 4: Code quality review
  • Final assessment and rating

Root Cause Analysis

Problem Being Solved: The buildAstNodes function was creating 172 separate SQLite transactions (one per file) instead of using a single transaction, causing a massive performance bottleneck due to frequent BEGIN/COMMIT/fsync cycles.

Why This Approach: The fix correctly addresses the root cause by collecting all AST node rows across all files first, then inserting them in a single transaction call. This matches the pattern used by other build phases (CFG, dataflow, complexity) which already correctly used single transactions.

Risk Assessment: LOW - This is a genuine performance optimization addressing a clear inefficiency without changing the semantic behavior.

Configuration Changes & Test Integrity ✅

Configuration Analysis: No high-risk configuration changes detected:

  • No linting rules disabled or relaxed in biome.json
  • No test timeouts increased or coverage thresholds lowered
  • No TypeScript strict mode flags modified

Test Changes Review: No test files were modified in this PR. The commit only touches:

  • src/ast.js - the core implementation fix
  • Version bump files (package.json, Cargo.toml, package-lock.json)
  • CHANGELOG.md - documentation update

✅ No test expectations lowered or assertions weakened

Code Quality Review

Implementation Quality (src/ast.js:147-236):

  • ✅ Correct approach: Changes rows = [] per-file to allRows = [] collected across all files, then single tx(allRows) call
  • ✅ Preserves semantics: Same insertion logic, same data validation, same error handling
  • ✅ Maintains transaction safety: Uses existing transaction wrapper - just changes scope from per-file to per-batch
  • ✅ Clean refactoring: Removes unnecessary totalInserted counter, improves debug message consistency

Performance Impact:

  • Excellent: 10x improvement (3600ms → 350ms native, 1800ms → 547ms WASM)
  • Follows established pattern: CFG, dataflow, complexity phases already use single transactions correctly
  • Addresses genuine bottleneck: SQLite fsync is expensive; batching dramatically reduces I/O overhead

Dependencies & Changes:

  • package-lock.json: Standard dependency updates, no concerning changes
  • Rust Cargo.toml: Version bump only
  • No unsafe patterns or shortcuts detected

Final Assessment

## Root Cause Analysis
**Problem Being Solved**: Per-file SQLite transactions causing massive I/O overhead (172 × BEGIN/COMMIT/fsync cycles)
**Why This Approach**: Correct fix - collect rows first, then single transaction. Matches existing pattern in other phases.
**Risk Assessment**: LOW - Pure performance optimization with no semantic changes

## Critical Concerns
None. Clean implementation addressing genuine performance bottleneck.

## Final Recommendation
- Rating: ⭐⭐⭐⭐ (4/5)
- Action: APPROVE
- Reasoning: Excellent performance fix with clean implementation. Addresses real inefficiency without compromising correctness. Benchmarks show dramatic improvement. All tests pass.

Approval Justification: This PR demonstrates textbook performance optimization - identifies bottleneck, applies correct solution (transaction batching), maintains semantic correctness, and delivers measurable results. Code quality is high and follows project patterns.


@carlos-alm carlos-alm merged commit c24f063 into main Mar 4, 2026
21 checks passed
@carlos-alm carlos-alm deleted the perf/ast-single-transaction branch March 4, 2026 08:26
@github-actions github-actions bot locked and limited conversation to collaborators Mar 4, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant