When a task involves multiple steps (e.g., implement + commit + PR), complete ALL steps in sequence without stopping. If creating a branch, committing, and opening a PR, finish the entire chain.
Always commit after every turn. Don't wait for the user to ask — if you made changes, commit them before responding.
msgvault is an offline Gmail archive tool that exports and stores email data locally with full-text search capabilities. The goal is to archive 20+ years of Gmail data from multiple accounts, make it searchable, and eventually delete emails from Gmail once safely archived.
Single-binary Go application:
msgvault/
├── cmd/msgvault/ # CLI entrypoint
│ └── cmd/ # Cobra commands
├── internal/ # Core packages
│ ├── tui/ # Bubble Tea TUI
│ ├── query/ # DuckDB query engine over Parquet
│ ├── store/ # SQLite database access
│ ├── deletion/ # Deletion staging and manifest
│ ├── gmail/ # Gmail API client
│ ├── sync/ # Sync orchestration
│ ├── oauth/ # OAuth2 flows (browser + device)
│ └── mime/ # MIME parsing
│
├── go.mod # Go module
└── Makefile # Build targets
# Build
make build # Debug build
make build-release # Release build (optimized)
make install # Install to ~/.local/bin or GOPATH
make test # Run tests
make lint # Run linter
# CLI usage
./msgvault init-db # Initialize database
./msgvault add-account you@gmail.com # Browser OAuth
./msgvault add-account you@gmail.com --headless # Device flow
./msgvault sync-full you@gmail.com --limit 100 # Sync with limit
./msgvault sync-full you@gmail.com --after 2024-01-01 # Sync date range
./msgvault sync-incremental you@gmail.com # Incremental sync
# TUI and analytics
./msgvault tui # Launch TUI
./msgvault tui --account you@gmail.com # Filter by account
./msgvault build-cache # Build Parquet cache
./msgvault build-cache --full-rebuild # Full rebuild
./msgvault stats # Show archive stats
# Maintenance
./msgvault repair-encoding # Fix UTF-8 encoding issuesroot.go- Cobra root command, config loadingsyncfull.go- Full sync command implementationsyncincremental.go- Incremental sync commandtui.go- TUI command, cache auto-buildbuild_cache.go- Parquet cache builder (DuckDB)repair_encoding.go- UTF-8 encoding repair
tui/model.go- Bubble Tea TUI model and update logictui/view.go- View rendering with lipgloss stylingquery/engine.go- DuckDB query engine over Parquet filesquery/sqlite.go- SQLite query engine (fallback)store/store.go- SQLite database operationsstore/schema.sql- Core SQLite schemastore/schema_sqlite.sql- FTS5 virtual tabledeletion/manifest.go- Deletion staging and manifest generationgmail/client.go- Gmail API client with rate limitingoauth/oauth.go- OAuth2 flows (browser + device)sync/sync.go- Sync orchestration, MIME parsingmime/parse.go- MIME message parsing
j/kor↑/↓- Navigate rowsEnter- Drill down into selectionEscorBackspace- Go backTab- Cycle views (Senders → Sender Names → Recipients → Recipient Names → Domains → Labels → Time)s- Cycle sort field (Name → Count → Size)r- Reverse sort directiont- Jump to Time view (cycle granularity when already in Time)a- Filter by accountf- Filter by attachmentsSpace- Toggle selectionA- Select all visiblex- Clear selectiond- Stage selected for deletionD- Stage all messages matching current filter/- Search?- Helpq- Quit
Core tables:
sources- Gmail accounts with history_id for incremental syncconversations- Gmail thread abstractionmessages- Message metadata, foreign key to conversationmessage_raw- Raw MIME blob (zlib compressed)labels/message_labels- Gmail labels (many-to-many)participants/message_recipients- From/To/Cc/Bcc addressesattachments- Attachment metadata with content-hash deduplicationmessages_fts- FTS5 virtual tablesync_runs/sync_checkpoints- Sync state for resumability
Schema files in internal/store/:
schema.sql- Core SQLite schemaschema_sqlite.sql- FTS5 virtual table
The TUI uses denormalized Parquet files for fast aggregate queries (~3000x faster than SQLite JOINs).
~/.msgvault/
├── msgvault.db # SQLite: System of record
└── analytics/ # Parquet: Aggregate analytics
├── messages/year=*/ # Partitioned by year
└── _last_sync.json # Incremental sync state
Workflow:
- Sync emails:
./msgvault sync-full you@gmail.com - Launch TUI:
./msgvault tui(auto-builds cache if needed)
Parquet schema:
- Denormalized:
from_email,from_domain,to_emails[],labels[], etc. - Partitioned by
yearfor efficient time-range queries - Compact: small fraction of SQLite size (excludes message bodies)
The TUI automatically builds/updates the Parquet cache on launch when new messages are detected.
- Gmail Sync: Full/incremental sync, OAuth (browser + headless), rate limiting, resumable checkpoints
- MIME Parsing: Subject, body (text/HTML), attachments, charset detection
- Parquet ETL: DuckDB-based SQLite → Parquet export with incremental updates
- Query Engine: DuckDB over Parquet for fast aggregate analytics
- TUI: Full-featured TUI with drill-down navigation, search, selection, deletion staging
- UTF-8 Repair: Comprehensive encoding repair for all string fields
- Deletion Execution: Execute staged deletions via Gmail API (trash or permanent delete)
- App-level encryption: Encrypt database and attachments at rest
- Web UI: Browser-based interface
./msgvault init-db
./msgvault add-account you@gmail.com
./msgvault sync-full you@gmail.com --after 2024-12-01 --before 2024-12-15
./msgvault tuiSync is read-only - no modifications to Gmail.
After making any Go code changes, always run go fmt ./... and go vet ./... before committing. Stage ALL resulting changes, including formatting-only files.
When committing changes, always stage ALL modified files (including formatting, generated files, and ancillary changes). Run git diff and git status before committing to ensure nothing is left unstaged.
All code must pass formatting and linting checks before commit. A pre-commit hook is available to enforce this automatically:
make setup-hooks # Enable pre-commit hook (fmt + lint)
make test # Run tests
make fmt # Format code (go fmt)
make lint # Run linter (golangci-lint)
go vet ./... # Check for issuesStandards:
- Default gofmt configuration
- Use
errorreturn values, wrap with context usingfmt.Errorf - Table-driven tests
- Use Bubble Tea for TUI, lipgloss for styling
- DuckDB for Parquet queries, go-duckdb driver
- SQLite via marcboeker/go-duckdb for cache building, mattn/go-sqlite3 for store
- Context-based cancellation for long operations
- Route all DB operations through
Storestruct - Charset detection via gogs/chardet, encoding via golang.org/x/text/encoding
-
Never use SELECT DISTINCT with JOINs - Use EXISTS subqueries instead (becomes semi-joins)
-
EXISTS is faster (stops at first match) and avoids duplicates at the source
-
Example - instead of:
SELECT DISTINCT m.id FROM messages m JOIN message_recipients mr ON mr.message_id = m.id WHERE mr.recipient_type = 'from' AND ...
Use:
SELECT m.id FROM messages m WHERE EXISTS ( SELECT 1 FROM message_recipients mr WHERE mr.message_id = m.id AND mr.recipient_type = 'from' AND ... )
-
Never JOIN or scan
message_bodiesin list/aggregate/search queries — this table is separated frommessagesspecifically to keep the messages B-tree small for fast scans. Only accessmessage_bodiesvia direct PK lookup (WHERE message_id = ?) when displaying a single message detail view. For text search, use FTS5 (messages_fts); if FTS is unavailable, searchsubject/snippetonly.
All data defaults to ~/.msgvault/:
~/.msgvault/config.toml- Configuration file~/.msgvault/msgvault.db- SQLite database~/.msgvault/attachments/- Content-addressed attachment storage~/.msgvault/tokens/- OAuth tokens per account~/.msgvault/analytics/- Parquet cache files
Override with MSGVAULT_HOME environment variable.
[data]
# data_dir = "~/custom/path"
[oauth]
client_secrets = "/path/to/client_secret.json"
[sync]
rate_limit_qps = 5