
How Forge Works


Your repo (files on disk)
▼ forge index .
┌───────────────────────────────────┐
│ Indexing pipeline │
│ │
│ 1. File scanner │
│ Walk repo tree, apply │
│ .gitignore + language filters │
│ │
│ 2. tree-sitter AST parser │
│ Parse each source file into │
│ a concrete syntax tree │
│ │
│ 3. Symbol extractor │
│ Functions, classes, types, │
│ interfaces, imports, exports │
│ │
│ 4. Dependency edge builder │
│ Resolve imports → file paths │
│ Build directed graph in SQLite│
│ │
│ 5. (optional) Tantivy indexer │
│ Full-text search over code │
│ chunks with camelCase aware │
│ tokenization │
│ │
│ 6. (optional) Git ingester │
│ Commit history + blame data │
│ into SQLite │
└───────────────────────────────────┘
▼ stored locally
┌───────────────────┐
│ ~/.forge/<id>/ │
│ │
│ forge.db │ ← AST graph, symbols, imports, health cache
│ search/ │ ← Tantivy full-text index
│ git.db │ ← Commit history (when --with-git)
└───────────────────┘
▼ forge serve .
┌───────────────────────────────────┐
│ MCP stdio server │
│ │
│ JSON-RPC 2.0 over stdin/stdout │
│ │
│ On connect: inject server │
│ instructions into agent context │
│ │
│ On tool call: query index, │
│ return structured JSON │
└───────────────────────────────────┘
▼ MCP tools
┌────────────────────────────────────────┐
│ AI agent │
│ (Claude Code, Cursor, Windsurf, ...) │
│ │
│ forge_prepare → plan refactors │
│ forge_search → find code by concept │
│ forge_trace_dependents → find callers│
│ forge_health_check → find problems │
│ ... (18 more tools) │
└────────────────────────────────────────┘

forge index . runs the indexing pipeline. It’s designed to be run incrementally — on first run it processes every file; on subsequent runs it checks modification times and only re-processes changed files.

Forge uses tree-sitter grammars to parse source files into concrete syntax trees (CSTs). From each CST, Forge extracts:

  • Functions and methods — name, line range, parameter list, return type (where inferrable)
  • Classes and structs — name, implemented interfaces, parent class
  • TypeScript/Rust types and interfaces — name, field list
  • Import statements — the import path, what’s being imported (named vs default vs namespace)
  • Export statements — what’s being exported, whether it’s a re-export

Every extracted symbol is stored in SQLite with its file path, line number, and language. Every import edge is stored as a directed edge: (source_file, target_file, specifier).
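The storage model above can be sketched in a few lines. This is an illustrative schema using Python's built-in sqlite3 module; Forge's actual table and column names may differ:

```python
import sqlite3

# Hypothetical schema mirroring the symbol/import model described above;
# Forge's real column names may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE symbols (
    name TEXT, kind TEXT, file TEXT, line INTEGER, language TEXT
);
CREATE TABLE imports (
    source_file TEXT, target_file TEXT, specifier TEXT
);
""")

# One extracted function and one resolved import edge.
conn.execute("INSERT INTO symbols VALUES (?, ?, ?, ?, ?)",
             ("processPayment", "function", "src/billing.ts", 42, "typescript"))
conn.execute("INSERT INTO imports VALUES (?, ?, ?)",
             ("src/app.ts", "src/billing.ts", "./billing"))

# "What does src/app.ts depend on?" becomes a single indexed lookup.
deps = conn.execute(
    "SELECT target_file FROM imports WHERE source_file = ?",
    ("src/app.ts",)).fetchall()
print(deps)  # [('src/billing.ts',)]
```

Storing edges as plain (source, target) rows keeps dependency questions answerable with ordinary SQL rather than a graph database.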

Files in languages Forge doesn’t have tree-sitter grammars for — Ruby, Java, C, C++, Swift, Kotlin, etc. — are still indexed by the full-text search layer (if --with-search was used) and appear in file-level search results. They just don’t contribute to the dependency graph or symbol table.

forge index . is incremental by default. It reads the last-modified timestamp of each file and skips unchanged files. This makes daily-use re-indexing fast (typically seconds for a repo with <10 changed files).
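The mtime-based skip can be sketched as follows. This is a simplified model, not Forge's actual code; the `files` table mirrors the schema described later in this page:

```python
import os
import sqlite3
import tempfile

# Sketch of mtime-based incremental indexing (illustrative; Forge's
# real implementation is in Rust and its schema may differ).
def files_needing_reindex(conn, paths):
    """Return only the paths whose on-disk mtime is newer than the recorded one."""
    stale = []
    for path in paths:
        mtime = os.path.getmtime(path)
        row = conn.execute(
            "SELECT last_modified FROM files WHERE path = ?", (path,)
        ).fetchone()
        if row is None or mtime > row[0]:
            stale.append(path)
    return stale

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, last_modified REAL)")

with tempfile.NamedTemporaryFile(suffix=".ts", delete=False) as f:
    f.write(b"export const x = 1;\n")
    path = f.name

# First run: the file is unknown, so it must be indexed.
assert files_needing_reindex(conn, [path]) == [path]

# Record its mtime; second run: nothing to do.
conn.execute("INSERT INTO files VALUES (?, ?)", (path, os.path.getmtime(path)))
assert files_needing_reindex(conn, [path]) == []
os.unlink(path)
```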

forge index . --full wipes the SQLite database and re-indexes from scratch. Use this when you’ve changed Forge’s config, added a new language, or suspect the index has drifted from the actual files.

Forge uses three storage layers:

forge.db is the primary store. It contains:

  • files — every indexed file path, language, last-modified time
  • symbols — every extracted function/class/type with file + line
  • imports — dependency edges: source_file → target_file
  • exports — named exports per file
  • health_findings — cached results from the last forge health run
  • heartbeat — license validation cache (added v1.3.0)

SQLite was chosen because it’s zero-dependency, handles concurrent readers, and gives Forge’s dependency graph queries predictable performance.
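One of those predictable queries is the transitive graph walk behind a tool like forge_trace_dependents, which SQLite handles with a recursive common table expression. A sketch (illustrative data and column names):

```python
import sqlite3

# Transitive-dependents walk over the imports table: the kind of query
# a tool like forge_trace_dependents runs (sketch, not Forge's actual SQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE imports (source_file TEXT, target_file TEXT)")
conn.executemany("INSERT INTO imports VALUES (?, ?)", [
    ("src/app.ts", "src/routes.ts"),
    ("src/routes.ts", "src/billing.ts"),
    ("src/cli.ts", "src/billing.ts"),
])

# Everything that directly or transitively imports src/billing.ts.
dependents = conn.execute("""
    WITH RECURSIVE callers(f) AS (
        SELECT source_file FROM imports WHERE target_file = ?
        UNION
        SELECT i.source_file FROM imports i JOIN callers c ON i.target_file = c.f
    )
    SELECT f FROM callers ORDER BY f
""", ("src/billing.ts",)).fetchall()
print([f for (f,) in dependents])
# → ['src/app.ts', 'src/cli.ts', 'src/routes.ts']
```

The `UNION` (rather than `UNION ALL`) deduplicates visited files, so the walk terminates even if the import graph contains cycles.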

The search/ directory holds a Tantivy index. Tantivy is a Rust-native full-text search library (similar to Lucene). When you run with --with-search, Forge chunks each source file into function/class/block segments and indexes them. The tokenizer is camelCase-aware: searching for payment matches processPayment, PaymentService, and payment_handler without wildcards.

The search index is built only on request because it adds ~10–30 seconds to the initial index time and consumes more disk space (~20% of repo size in search index).
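The camelCase-aware split can be approximated with a regex. This is a sketch of the idea; Tantivy's actual tokenizer is implemented in Rust and may split identifiers differently:

```python
import re

# Sketch of camelCase-aware tokenization (regex approximation of the idea;
# not Tantivy's actual tokenizer).
def code_tokens(identifier):
    """Split an identifier on camelCase humps, underscores, and digits."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z]+|[a-z]+|\d+", identifier)
    return [p.lower() for p in parts]

print(code_tokens("processPayment"))   # ['process', 'payment']
print(code_tokens("PaymentService"))   # ['payment', 'service']
print(code_tokens("payment_handler"))  # ['payment', 'handler']

# A query for "payment" matches all three once both sides are tokenized.
assert all("payment" in code_tokens(s)
           for s in ["processPayment", "PaymentService", "payment_handler"])
```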

When --with-git is used, Forge reads the git object database with the gix crate and stores commit metadata and blame records in git.db (SQLite). This powers:

  • forge_git_history — last N commits touching a file
  • forge_git_blame — per-line author and commit for a file range

forge serve . starts the MCP server. It listens on stdin and writes to stdout using JSON-RPC 2.0. The MCP client (Claude Code, etc.) launches Forge as a subprocess via the config in .mcp.json.
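A typical client entry follows the standard .mcp.json shape; the server name and working directory below are illustrative:

```json
{
  "mcpServers": {
    "forge": {
      "command": "forge",
      "args": ["serve", "."]
    }
  }
}
```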

At MCP connection time (the initialize handshake), Forge injects a block of behavioral instructions into the agent’s system prompt. These instructions teach the agent:

  • Call forge_prepare before modifying any file
  • Call forge_validate after edits are complete
  • Use forge_understand when encountering unfamiliar code
  • Use the specific tools for targeted lookups, and the composite workflow tools for typical multi-step tasks

This is why Forge’s behavior in Claude Code is automatic without user prompting. The agent learns the correct usage pattern from Forge’s own server instructions at the start of every session.
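In MCP, this injection rides on the initialize handshake: the server's initialize result carries an `instructions` field that the client folds into the agent's context. A sketch of such a response (the field names follow the MCP spec; the version string and instruction text here are invented):

```python
import json

# Sketch of an MCP initialize result carrying server instructions.
# Field names follow the MCP spec; the values are illustrative.
initialize_result = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "protocolVersion": "2024-11-05",
        "serverInfo": {"name": "forge", "version": "0.0.0"},
        "capabilities": {"tools": {}},
        # The client is expected to surface this to the agent.
        "instructions": "Call forge_prepare before modifying any file. "
                        "Call forge_validate after edits are complete.",
    },
}
# Serialized, this is one JSON-RPC message written to stdout.
wire = json.dumps(initialize_result)
print("forge_prepare" in wire)
```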

When the agent calls a tool, Forge:

  1. Validates the input against the tool’s JSON Schema
  2. Runs the appropriate query against SQLite (and/or Tantivy)
  3. Returns a structured JSON result
  4. Logs the call to ~/.forge/stats.json (local only, never transmitted)

All queries run in-process — there are no sub-processes or HTTP calls during normal tool operation.

Every architecture decision in Forge optimizes for local-first operation:

  • SQLite over a network database — no latency, no authentication, no network dependency for queries
  • Static binary with bundled runtime — no Node.js, no Python, no JVM to install or version-manage
  • Incremental index — re-indexing fits into a normal development loop without blocking the agent

The tradeoff is that Forge doesn’t sync across machines by default. Each machine has its own index. Team tier adds CI cache support for sharing indexes across CI runs.