
How Forge Works


Your repo (files on disk)
▼ forge index .
┌───────────────────────────────────┐
│ Indexing pipeline │
│ │
│ 1. File scanner │
│ Walk repo tree, apply │
│ .gitignore + language filters │
│ │
│ 2. tree-sitter AST parser │
│ Parse each source file into │
│ a concrete syntax tree │
│ │
│ 3. Symbol extractor │
│ Functions, classes, types, │
│ interfaces, imports, exports │
│ │
│ 4. Dependency edge builder │
│ Resolve imports → file paths │
│ Build directed graph in SQLite│
│ │
│ 5. (optional) Tantivy indexer │
│ Full-text search over code │
│ chunks with camelCase aware │
│ tokenization │
│ │
│ 6. (optional) Git ingester │
│ Commit history + blame data │
│ into SQLite │
└───────────────────────────────────┘
▼ stored locally
┌───────────────────┐
│ ~/.forge/<id>/ │
│ │
│ forge.db │ ← AST graph, symbols, imports, health cache
│ search/ │ ← Tantivy full-text index
│ git.db │ ← Commit history (when --with-git)
└───────────────────┘
▼ forge serve .
┌───────────────────────────────────┐
│ MCP stdio server │
│ │
│ JSON-RPC 2.0 over stdin/stdout │
│ │
│ On connect: inject server │
│ instructions into agent context │
│ │
│ On tool call: query index, │
│ return structured JSON │
└───────────────────────────────────┘
▼ MCP tools
┌────────────────────────────────────────┐
│ AI agent │
│ (Claude Code, Cursor, Windsurf, ...) │
│ │
│ forge_prepare → plan refactors │
│ forge_search → find code by concept │
│ forge_trace_dependents → find callers│
│ forge_health_check → find problems │
│ ... (18 more tools) │
└────────────────────────────────────────┘

forge index . runs the indexing pipeline. It’s designed to be run incrementally — on first run it processes every file; on subsequent runs it checks modification times and only re-processes changed files.

Forge uses tree-sitter grammars to parse source files into concrete syntax trees (CSTs). From each CST, Forge extracts:

  • Functions and methods — name, line range, parameter list, return type (where inferrable)
  • Classes and structs — name, implemented interfaces, parent class
  • TypeScript/Rust types and interfaces — name, field list
  • Import statements — the import path, what’s being imported (named vs default vs namespace)
  • Export statements — what’s being exported, whether it’s a re-export

Every extracted symbol is stored in SQLite with its file path, line number, and language. Every import edge is stored as a directed edge: (source_file, target_file, specifier).
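The storage model above can be sketched in a few lines. This is an illustrative schema using Python's built-in sqlite3 module; Forge's actual table and column names may differ:

```python
import sqlite3

# Hypothetical schema mirroring the symbol/import model described above;
# Forge's real column names may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE symbols (
    name TEXT, kind TEXT, file TEXT, line INTEGER, language TEXT
);
CREATE TABLE imports (
    source_file TEXT, target_file TEXT, specifier TEXT
);
""")

# One extracted function and one resolved import edge.
conn.execute("INSERT INTO symbols VALUES (?, ?, ?, ?, ?)",
             ("processPayment", "function", "src/billing.ts", 42, "typescript"))
conn.execute("INSERT INTO imports VALUES (?, ?, ?)",
             ("src/app.ts", "src/billing.ts", "./billing"))

# "What does src/app.ts depend on?" becomes a single indexed lookup.
deps = conn.execute(
    "SELECT target_file FROM imports WHERE source_file = ?",
    ("src/app.ts",)).fetchall()
print(deps)  # [('src/billing.ts',)]
```

Storing edges as plain (source, target) rows keeps dependency questions answerable with ordinary SQL rather than a graph database.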

Files in languages Forge doesn’t have tree-sitter grammars for — Ruby, Java, C, C++, Swift, Kotlin, etc. — are still indexed by the full-text search layer (if --with-search was used) and appear in file-level search results. They just don’t contribute to the dependency graph or symbol table.

forge index . is incremental by default. It reads the last-modified timestamp of each file and skips unchanged files. This makes daily-use re-indexing fast (typically seconds for a repo with <10 changed files).
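The mtime-based skip can be sketched as follows. This is a simplified model, not Forge's actual code; the `files` table mirrors the schema described later in this page:

```python
import os
import sqlite3
import tempfile

# Sketch of mtime-based incremental indexing (illustrative; Forge's
# real implementation is in Rust and its schema may differ).
def files_needing_reindex(conn, paths):
    """Return only the paths whose on-disk mtime is newer than the recorded one."""
    stale = []
    for path in paths:
        mtime = os.path.getmtime(path)
        row = conn.execute(
            "SELECT last_modified FROM files WHERE path = ?", (path,)
        ).fetchone()
        if row is None or mtime > row[0]:
            stale.append(path)
    return stale

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, last_modified REAL)")

with tempfile.NamedTemporaryFile(suffix=".ts", delete=False) as f:
    f.write(b"export const x = 1;\n")
    path = f.name

# First run: the file is unknown, so it must be indexed.
assert files_needing_reindex(conn, [path]) == [path]

# Record its mtime; second run: nothing to do.
conn.execute("INSERT INTO files VALUES (?, ?)", (path, os.path.getmtime(path)))
assert files_needing_reindex(conn, [path]) == []
os.unlink(path)
```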

forge index . --full wipes the SQLite database and re-indexes from scratch. Use this when you’ve changed Forge’s config, added a new language, or suspect the index has drifted from the actual files.

Forge uses three storage layers:

forge.db is the primary store. It contains:

  • files — every indexed file path, language, last-modified time
  • symbols — every extracted function/class/type with file + line
  • imports — dependency edges: source_file → target_file
  • exports — named exports per file
  • health_findings — cached results from the last forge health run
  • heartbeat — license validation cache (added v1.3.0)

SQLite was chosen because it’s zero-dependency, handles concurrent readers, and gives Forge’s dependency graph queries predictable performance.
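One of those predictable queries is the transitive graph walk behind a tool like forge_trace_dependents, which SQLite handles with a recursive common table expression. A sketch (illustrative data and column names):

```python
import sqlite3

# Transitive-dependents walk over the imports table: the kind of query
# a tool like forge_trace_dependents runs (sketch, not Forge's actual SQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE imports (source_file TEXT, target_file TEXT)")
conn.executemany("INSERT INTO imports VALUES (?, ?)", [
    ("src/app.ts", "src/routes.ts"),
    ("src/routes.ts", "src/billing.ts"),
    ("src/cli.ts", "src/billing.ts"),
])

# Everything that directly or transitively imports src/billing.ts.
dependents = conn.execute("""
    WITH RECURSIVE callers(f) AS (
        SELECT source_file FROM imports WHERE target_file = ?
        UNION
        SELECT i.source_file FROM imports i JOIN callers c ON i.target_file = c.f
    )
    SELECT f FROM callers ORDER BY f
""", ("src/billing.ts",)).fetchall()
print([f for (f,) in dependents])
# → ['src/app.ts', 'src/cli.ts', 'src/routes.ts']
```

The `UNION` (rather than `UNION ALL`) deduplicates visited files, so the walk terminates even if the import graph contains cycles.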

The search/ directory holds a Tantivy index. Tantivy is a Rust-native full-text search library (similar to Lucene). When you run with --with-search, Forge chunks each source file into function/class/block segments and indexes them. The tokenizer is camelCase-aware: searching for payment matches processPayment, PaymentService, and payment_handler without wildcards.

The search index is built only on request because it adds ~10–30 seconds to the initial index time and consumes more disk space (~20% of repo size in search index).
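The camelCase-aware split can be approximated with a regex. This is a sketch of the idea; Tantivy's actual tokenizer is implemented in Rust and may split identifiers differently:

```python
import re

# Sketch of camelCase-aware tokenization (regex approximation of the idea;
# not Tantivy's actual tokenizer).
def code_tokens(identifier):
    """Split an identifier on camelCase humps, underscores, and digits."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z]+|[a-z]+|\d+", identifier)
    return [p.lower() for p in parts]

print(code_tokens("processPayment"))   # ['process', 'payment']
print(code_tokens("PaymentService"))   # ['payment', 'service']
print(code_tokens("payment_handler"))  # ['payment', 'handler']

# A query for "payment" matches all three once both sides are tokenized.
assert all("payment" in code_tokens(s)
           for s in ["processPayment", "PaymentService", "payment_handler"])
```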

When --with-git is used, Forge reads the git object database with the gix crate and stores commit metadata and blame records in git.db (SQLite). This powers:

  • forge_git_history — last N commits touching a file
  • forge_git_blame — per-line author and commit for a file range

forge serve . starts the MCP server. It listens on stdin and writes to stdout using JSON-RPC 2.0. The MCP client (Claude Code, etc.) launches Forge as a subprocess via the config in .mcp.json.
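A typical client entry follows the standard .mcp.json shape; the server name and working directory below are illustrative:

```json
{
  "mcpServers": {
    "forge": {
      "command": "forge",
      "args": ["serve", "."]
    }
  }
}
```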

At MCP connection time (the initialize handshake), Forge injects a block of behavioral instructions into the agent’s system prompt. These instructions teach the agent:

  • Call forge_prepare before modifying any file
  • Call forge_validate after edits are complete
  • Use forge_understand when encountering unfamiliar code
  • Use the specific tools for targeted lookups, and the composite workflow tools for typical multi-step tasks

This is why Forge’s behavior in Claude Code is automatic without user prompting. The agent learns the correct usage pattern from Forge’s own server instructions at the start of every session.
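In MCP, this injection rides on the initialize handshake: the server's initialize result carries an `instructions` field that the client folds into the agent's context. A sketch of such a response (the field names follow the MCP spec; the version string and instruction text here are invented):

```python
import json

# Sketch of an MCP initialize result carrying server instructions.
# Field names follow the MCP spec; the values are illustrative.
initialize_result = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "protocolVersion": "2024-11-05",
        "serverInfo": {"name": "forge", "version": "0.0.0"},
        "capabilities": {"tools": {}},
        # The client is expected to surface this to the agent.
        "instructions": "Call forge_prepare before modifying any file. "
                        "Call forge_validate after edits are complete.",
    },
}
# Serialized, this is one JSON-RPC message written to stdout.
wire = json.dumps(initialize_result)
print("forge_prepare" in wire)
```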

When the agent calls a tool, Forge:

  1. Validates the input against the tool’s JSON Schema
  2. Runs the appropriate query against SQLite (and/or Tantivy)
  3. Returns a structured JSON result
  4. Logs the call to ~/.forge/stats.json (local only, never transmitted)

All queries run in-process — there are no sub-processes or HTTP calls during normal tool operation.

Every architecture decision in Forge optimizes for local-first operation:

  • SQLite over a network database — no latency, no authentication, no network dependency for queries
  • Static binary with bundled runtime — no Node.js, no Python, no JVM to install or version-manage
  • Incremental index — re-indexing fits into a normal development loop without blocking the agent

The tradeoff is that Forge doesn’t sync across machines by default. Each machine has its own index. Team tier adds CI cache support for sharing indexes across CI runs.