# How Forge Works
import { Aside } from '@astrojs/starlight/components';
## The big picture
```
Your repo (files on disk)
         │
         ▼  forge index .
┌───────────────────────────────────┐
│ Indexing pipeline                 │
│                                   │
│ 1. File scanner                   │
│    Walk repo tree, apply          │
│    .gitignore + language filters  │
│                                   │
│ 2. tree-sitter AST parser         │
│    Parse each source file into    │
│    a concrete syntax tree         │
│                                   │
│ 3. Symbol extractor               │
│    Functions, classes, types,     │
│    interfaces, imports, exports   │
│                                   │
│ 4. Dependency edge builder        │
│    Resolve imports → file paths   │
│    Build directed graph in SQLite │
│                                   │
│ 5. (optional) Tantivy indexer     │
│    Full-text search over code     │
│    chunks with camelCase-aware    │
│    tokenization                   │
│                                   │
│ 6. (optional) Git ingester        │
│    Commit history + blame data    │
│    into SQLite                    │
└───────────────────────────────────┘
         │
         ▼  stored locally
┌───────────────────┐
│ ~/.forge/<id>/    │
│                   │
│ forge.db          │ ← AST graph, symbols, imports, health cache
│ search/           │ ← Tantivy full-text index
│ git.db            │ ← Commit history (when --with-git)
└───────────────────┘
         │
         ▼  forge serve .
┌───────────────────────────────────┐
│ MCP stdio server                  │
│                                   │
│ JSON-RPC 2.0 over stdin/stdout    │
│                                   │
│ On connect: inject server         │
│ instructions into agent context   │
│                                   │
│ On tool call: query index,        │
│ return structured JSON            │
└───────────────────────────────────┘
         │
         ▼  MCP tools
┌──────────────────────────────────────────────┐
│ AI agent                                     │
│ (Claude Code, Cursor, Windsurf, ...)         │
│                                              │
│ forge_prepare          → plan refactors      │
│ forge_search           → find code by concept│
│ forge_trace_dependents → find callers        │
│ forge_health_check     → find problems       │
│ ... (18 more tools)                          │
└──────────────────────────────────────────────┘
```

## Phase 1: Indexing
`forge index .` runs the indexing pipeline. It's designed to run incrementally:
on the first run it processes every file; on subsequent runs it checks modification
times and only re-processes changed files.
### What gets parsed
Forge uses tree-sitter grammars to parse source files into concrete syntax trees (CSTs). From each CST, Forge extracts:
- Functions and methods — name, line range, parameter list, return type (where inferrable)
- Classes and structs — name, implemented interfaces, parent class
- TypeScript/Rust types and interfaces — name, field list
- Import statements — the import path, what’s being imported (named vs default vs namespace)
- Export statements — what’s being exported, whether it’s a re-export
Every extracted symbol is stored in SQLite with its file path, line number, and
language. Every import edge is stored as a directed edge: `(source_file, target_file, specifier)`.
### What doesn't get parsed (at AST level)
Files in languages Forge doesn't have tree-sitter grammars for — Ruby, Java, C, C++,
Swift, Kotlin, etc. — are still indexed by the full-text search layer (if
`--with-search` was used) and appear in file-level search results. They just don't
contribute to the dependency graph or symbol table.
### Incremental vs full index
`forge index .` is incremental by default. It reads the last-modified timestamp of
each file and skips unchanged files. This makes daily-use re-indexing fast (typically
seconds for a repo with <10 changed files).
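The skip logic amounts to comparing each file's on-disk mtime with the mtime recorded at the last index run. A minimal sketch, where the `last_indexed` dict stands in for Forge's `files` table:

```python
import os

def files_to_reindex(paths: list[str], last_indexed: dict[str, float]) -> list[str]:
    """Return only the files whose on-disk mtime differs from the recorded
    one. New files (no recorded mtime) are always included."""
    changed = []
    for path in paths:
        mtime = os.stat(path).st_mtime
        if last_indexed.get(path) != mtime:
            changed.append(path)  # new or modified since last index run
    return changed
```

A full index is then just the degenerate case where the recorded-mtime table is empty, so every file is "changed".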
`forge index . --full` wipes the SQLite database and re-indexes from scratch. Use
this when you've changed Forge's config, added a new language, or suspect the index
has drifted from the actual files.
## Phase 2: Storage
Forge uses three storage layers:
### SQLite (forge.db)
The primary store. Contains:
- `files` — every indexed file path, language, last-modified time
- `symbols` — every extracted function/class/type with file + line
- `imports` — dependency edges: source_file → target_file
- `exports` — named exports per file
- `health_findings` — cached results from the last `forge health` run
- `heartbeat` — license validation cache (added v1.3.0)
SQLite was chosen because it’s zero-dependency, handles concurrent readers, and gives Forge’s dependency graph queries predictable performance.
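The tables above can be sketched with Python's built-in `sqlite3` module. The column names here are guesses for illustration, not Forge's actual schema, but they show why a reverse dependency lookup is a single query over the edge table:

```python
import sqlite3

# Illustrative schema only; Forge's real table and column names may differ.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE files   (path TEXT PRIMARY KEY, language TEXT, mtime REAL);
CREATE TABLE symbols (name TEXT, kind TEXT, file TEXT, line INTEGER);
CREATE TABLE imports (source_file TEXT, target_file TEXT, specifier TEXT);
""")
db.execute("INSERT INTO files   VALUES ('src/app.ts', 'typescript', 0)")
db.execute("INSERT INTO symbols VALUES ('startServer', 'function', 'src/app.ts', 12)")
db.execute("INSERT INTO imports VALUES ('src/app.ts', 'src/db.ts', './db')")

# "Who imports src/db.ts?" is one lookup on the directed edge table.
dependents = [row[0] for row in db.execute(
    "SELECT source_file FROM imports WHERE target_file = 'src/db.ts'")]
```

Because edges are stored pre-resolved, neither direction of the graph walk requires re-parsing any source file.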
### Tantivy full-text index (search/)
Tantivy is a Rust-native full-text search library (similar to Lucene). When you
run `--with-search`, Forge chunks each source file into function/class/block
segments and indexes them. The tokenizer is camelCase-aware: searching for `payment`
matches `processPayment`, `PaymentService`, and `payment_handler` without wildcards.
The search index is built only on request because it adds ~10–30 seconds to the initial index time and consumes more disk space (~20% of repo size in search index).
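The idea behind camelCase-aware tokenization can be sketched with a regular expression that splits identifiers on case humps and underscores. This is a conceptual sketch of the technique, not Tantivy's actual tokenizer, which differs in detail:

```python
import re

def code_tokens(identifier: str) -> list[str]:
    """Split an identifier into lowercase search tokens.
    Handles camelCase, PascalCase, snake_case, and acronym runs
    (e.g. HTTPServer -> ['http', 'server'])."""
    parts = re.findall(
        r"[A-Z]+(?![a-z])"   # acronym run not followed by lowercase (HTTP in HTTPServer)
        r"|[A-Z][a-z]*"      # one capitalized word (Payment)
        r"|[a-z]+"           # lowercase run (process)
        r"|\d+",             # digit run
        identifier,
    )
    return [p.lower() for p in parts]

# 'payment' appears as a token for all three spellings:
code_tokens("processPayment")   # ['process', 'payment']
code_tokens("PaymentService")   # ['payment', 'service']
code_tokens("payment_handler")  # ['payment', 'handler']
```

Indexing these sub-tokens is what lets a plain query for `payment` hit all three identifiers with no wildcard expansion at query time.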
### Git data (embedded in forge.db)
When `--with-git` is used, Forge reads the git object database with the `gix` crate
and stores commit metadata and blame records in SQLite. This powers:
- `forge_git_history` — last N commits touching a file
- `forge_git_blame` — per-line author and commit for a file range
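Once blame data is cached per line in SQLite, a `forge_git_blame` call over a file range reduces to a range query. A hypothetical shape of that cache (the table and column names are invented for illustration):

```python
import sqlite3

# Hypothetical blame cache layout; Forge's real schema may differ.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE blame
    (file TEXT, line INTEGER, author TEXT, commit_id TEXT)""")
db.executemany("INSERT INTO blame VALUES (?, ?, ?, ?)", [
    ("src/app.ts", 10, "alice", "a1b2c3"),
    ("src/app.ts", 11, "bob",   "d4e5f6"),
])

# Per-line author for a requested line range: no git subprocess needed.
rows = db.execute(
    "SELECT line, author FROM blame"
    " WHERE file = ? AND line BETWEEN ? AND ? ORDER BY line",
    ("src/app.ts", 10, 11)).fetchall()
```

The point of ingesting at index time is that every later blame or history lookup is a local SQLite read rather than a walk of the git object database.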
## Phase 3: MCP server
`forge serve .` starts the MCP server. It listens on stdin and writes to stdout
using JSON-RPC 2.0. The MCP client (Claude Code, etc.) launches Forge as a
subprocess via the config in `.mcp.json`.
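Concretely, the MCP stdio transport exchanges newline-delimited JSON-RPC 2.0 messages. The sketch below shows what a tool call might look like on the wire; the tool name comes from this page, but the exact argument shape is an assumption:

```python
import json

# One tool-call request as the client writes it to Forge's stdin.
# "tools/call" is the standard MCP method for invoking a server tool.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "forge_search",
        "arguments": {"query": "payment"},  # assumed argument shape
    },
}
wire_line = json.dumps(request) + "\n"  # one JSON message per line

# Forge replies on stdout with a result carrying the same id.
response = json.loads(
    '{"jsonrpc": "2.0", "id": 1, "result": {"content": []}}'
)
```

Running over stdio (rather than HTTP) means the client fully owns the server's lifecycle: Forge starts when the session starts and exits when the client closes the pipe.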
### Server instructions (the key behavior)
At MCP connection time (the `initialize` handshake), Forge injects a block of
behavioral instructions into the agent’s system prompt. These instructions teach
the agent:
- Call `forge_prepare` before modifying any file
- Call `forge_validate` after edits are complete
- Use `forge_understand` when encountering unfamiliar code
- Use specific tools for targeted lookups vs workflow composites for typical tasks
This is why Forge’s behavior in Claude Code is automatic without user prompting. The agent learns the correct usage pattern from Forge’s own server instructions at the start of every session.
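MCP provides a dedicated channel for this: the `initialize` result may carry an `instructions` string that the client folds into the agent's context. The payload below is a sketch with wording paraphrased from this page, not Forge's actual text, and the version numbers are placeholders:

```python
import json

# Sketch of an MCP initialize result; the `instructions` field is the
# mechanism by which server-side usage rules reach the agent's context.
initialize_result = {
    "protocolVersion": "2024-11-05",
    "serverInfo": {"name": "forge", "version": "0.0.0"},  # placeholder version
    "capabilities": {"tools": {}},
    "instructions": (  # paraphrased, not Forge's real instruction text
        "Call forge_prepare before modifying any file. "
        "Call forge_validate after edits are complete. "
        "Use forge_understand when encountering unfamiliar code."
    ),
}
payload = json.dumps(initialize_result)
```

Because the instructions travel with the handshake, every new session re-teaches the agent the usage pattern without any per-project prompt configuration.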
### Tool dispatch
When the agent calls a tool, Forge:
- Validates the input against the tool's JSON Schema
- Runs the appropriate query against SQLite (and/or Tantivy)
- Returns a structured JSON result
- Logs the call to `~/.forge/stats.json` (local only, never transmitted)
All queries run in-process — there are no sub-processes or HTTP calls during normal tool operation.
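The dispatch steps above can be sketched as a small in-process loop. Everything here is hypothetical scaffolding (the tool registry, the minimal schema check, the in-memory stats dict standing in for `~/.forge/stats.json`):

```python
# Hypothetical tool registry; the schema check is reduced to required keys.
TOOLS = {
    "forge_search": {
        "schema": {"required": ["query"]},
        "run": lambda args: {"matches": []},  # stands in for the SQLite/Tantivy query
    },
}

def dispatch(name: str, arguments: dict, stats: dict) -> dict:
    """Validate input, run the query in-process, log the call, return JSON."""
    tool = TOOLS[name]
    missing = [k for k in tool["schema"]["required"] if k not in arguments]
    if missing:
        return {"error": f"missing required argument(s): {missing}"}
    result = tool["run"](arguments)        # in-process: no subprocess, no HTTP
    stats[name] = stats.get(name, 0) + 1   # stands in for ~/.forge/stats.json
    return {"result": result}
```

Keeping dispatch entirely in-process is what makes per-call latency a function of the SQLite query alone.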
## Why local-first matters
Every architecture decision in Forge optimizes for local-first operation:
- SQLite over a network database — no latency, no authentication, no network dependency for queries
- Static binary with bundled runtime — no Node.js, no Python, no JVM to install or version-manage
- Incremental index — re-indexing fits into a normal development loop without blocking the agent
The tradeoff is that Forge doesn’t sync across machines by default. Each machine has its own index. Team tier adds CI cache support for sharing indexes across CI runs.