AST Parsing


Why AST instead of regex?

Forge uses AST (Abstract Syntax Tree) parsing for all structural analysis — dependency edges, symbol extraction, dead export detection, and pattern search. This is a deliberate choice with real consequences.

The regex failure mode

Consider this code:

// This is a comment: import { foo } from './utils'
const str = "import { bar } from './helpers'";
import { baz } from './services';

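An unanchored regex like import .* from '(.*)' would match all three lines. Two of them are not real imports: one is in a comment, one is in a string literal. Anchoring the pattern with ^ rules out these two lines, but it then misses imports that span several lines and still matches import-like text at the start of a line inside a block comment or template literal.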

Multiply this by 50,000 files and you get a dependency graph full of phantom edges that don’t represent actual code relationships.

An AST parser reads the file the same way the compiler does. It sees that the first line is a comment node, the second is a variable declaration whose value is a string literal, and only the third is an import declaration. Only the real import becomes an edge.
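The contrast can be reproduced in a few lines. The sketch below is a Python analog of the three-line snippet, using the standard library's re and ast modules rather than tree-sitter. The module names are hypothetical, and this illustrates the idea only; it is not Forge's implementation.

```python
import ast
import re

# Python analog of the three-line snippet (hypothetical module names).
SOURCE = '''\
# This is a comment: from utils import foo
text = "from helpers import bar"
from services import baz
'''

# Naive text matching: anything that looks like "from X import" counts.
NAIVE = re.compile(r"from (\w+) import")


def regex_edges(source):
    """Return every module name the regex finds, real import or not."""
    return NAIVE.findall(source)


def ast_edges(source):
    """Return module names taken from real import statements only."""
    edges = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module:
            edges.append(node.module)
        elif isinstance(node, ast.Import):
            edges.extend(alias.name for alias in node.names)
    return edges


print(regex_edges(SOURCE))  # ['utils', 'helpers', 'services']
print(ast_edges(SOURCE))    # ['services']
```

The regex reports two phantom edges; the syntax tree contains exactly one import node.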

What AST parsing enables

Precise import edges mean:

Accurate dependent counts: forge_trace_dependents returns the files that actually import a module, not grep matches
Real dead export detection: an export with 0 AST-level consumers is actually unused, not just lexically unmatched
Circular dependency detection: cycle detection only gives correct results if the graph is accurate
Structural pattern search: forge_pattern_search matches code structure, not text. The pattern async function $NAME($$$) { $$$BODY } matches every async function declaration regardless of whitespace or formatting differences
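To make the first three points concrete, here is a minimal sketch of the corresponding graph queries over a toy edge list. The file names, the EDGES and EXPORTS tables, and the function names are all hypothetical; they stand in for whatever data model Forge uses internally.

```python
from collections import defaultdict

# Hypothetical edge list: (importing file, imported file, imported symbol).
EDGES = [
    ("app.ts", "utils.ts", "foo"),
    ("utils.ts", "services.ts", "baz"),
    ("services.ts", "app.ts", "start"),
]
# Hypothetical export table: file -> exported symbols.
EXPORTS = {
    "utils.ts": {"foo", "unused"},
    "services.ts": {"baz"},
    "app.ts": {"start"},
}


def dependents(target):
    """Files that import `target` directly."""
    return sorted({src for src, dst, _ in EDGES if dst == target})


def dead_exports():
    """Exports that no edge consumes, grouped by file."""
    used = defaultdict(set)
    for _, dst, symbol in EDGES:
        used[dst].add(symbol)
    return {f: sorted(syms - used[f]) for f, syms in EXPORTS.items() if syms - used[f]}


def find_cycle():
    """Return one import cycle as a list of files, or None (depth-first search)."""
    graph = defaultdict(list)
    for src, dst, _ in EDGES:
        graph[src].append(dst)
    state = {}  # file -> "visiting" or "done"
    stack = []

    def visit(node):
        state[node] = "visiting"
        stack.append(node)
        for nxt in graph[node]:
            if state.get(nxt) == "visiting":
                # Back edge: the cycle is the stack from nxt onward.
                return stack[stack.index(nxt):] + [nxt]
            if nxt not in state:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        stack.pop()
        state[node] = "done"
        return None

    for start in list(graph):
        if start not in state:
            cycle = visit(start)
            if cycle:
                return cycle
    return None


print(dependents("utils.ts"))  # ['app.ts']
print(dead_exports())          # {'utils.ts': ['unused']}
print(find_cycle())            # ['app.ts', 'utils.ts', 'services.ts', 'app.ts']
```

Each query is only as good as EDGES: a single phantom edge from a comment or string would hide a dead export or report a cycle that does not exist.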

The tree-sitter advantage

Forge uses tree-sitter grammars. tree-sitter is:

Error-tolerant: it parses files with syntax errors and returns a syntax tree with error nodes marking the broken regions rather than failing entirely. This means indexing continues on real-world repos that have files with minor syntax issues.
Fast: tree-sitter parsers are generated, table-driven LR parsers written in C, not hand-written recursive-descent code. Parsing a 1,000-line file typically takes a few milliseconds.
Incremental: tree-sitter supports incremental re-parsing (updating only the changed subtree when a file is modified). Forge uses this in --watch mode.
Multi-language: the same API works for all supported languages. Forge’s symbol extractor code is language-agnostic; the grammar handles language specifics.

Supported languages (v1.4.0)

Language	AST parsing	Import edges	Export tracking	Symbol extraction	Pattern search
TypeScript	Yes	Yes	Yes	Functions, classes, interfaces, types	Yes
JavaScript	Yes	Yes	Yes	Functions, classes	Yes
Python	Yes	Yes	Yes (module-level)	Functions, classes	Yes
Rust	Yes	Yes (use declarations)	Yes (pub items)	Functions, structs, traits, enums	Yes
Go	Yes	Yes	Yes (exported identifiers)	Functions, structs, interfaces	Yes

All other languages are indexed for full-text search only. They appear in forge_search results but don’t contribute to the dependency graph.

What “import edges” means per language

The way imports work differs by language, so Forge handles each slightly differently:

TypeScript / JavaScript:

import { foo } from './utils';     // named import → edge to ./utils
import type { Bar } from './types'; // type-only import → same edge
export { baz } from './other';      // re-export → edge to ./other

All three produce edges. Dynamic import() calls are tracked if the path is a string literal.

Python:

from .utils import foo          # relative import → edge to utils module
import services.auth            # absolute import → edge to services/auth
from typing import Optional     # stdlib → no edge (not in repo)

Relative and project-internal imports produce edges. Standard library and third-party package imports (no local file found) are recorded but don’t create edges in the dependency graph.
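The rule can be sketched as follows, under loud assumptions: the repo root is the import root, REPO_FILES is a hypothetical file listing, and the function names are made up for illustration. It uses the standard library's ast module, not tree-sitter, and ignores cases such as from . import submodule.

```python
import ast
import posixpath

# Hypothetical set of files in the indexed repo.
REPO_FILES = {"pkg/__init__.py", "pkg/main.py", "pkg/utils.py", "services/auth.py"}

SOURCE = '''\
from .utils import foo
import services.auth
from typing import Optional
'''


def import_targets(importer, source):
    """Yield (import name, repo-relative module path) for each import statement."""
    package_dir = posixpath.dirname(importer)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                yield alias.name, alias.name.replace(".", "/")
        elif isinstance(node, ast.ImportFrom):
            parts = node.module.split(".") if node.module else []
            if node.level == 0:
                yield node.module, "/".join(parts)
            else:
                # level 1 is ".", level 2 is "..", and so on.
                base = package_dir
                for _ in range(node.level - 1):
                    base = posixpath.dirname(base)
                yield "." * node.level + (node.module or ""), posixpath.join(base, *parts)


def resolve(importer, source):
    """Map each import to a repo file (an edge) or None (recorded, no edge)."""
    edges = {}
    for name, path in import_targets(importer, source):
        hits = [c for c in (path + ".py", path + "/__init__.py") if c in REPO_FILES]
        edges[name] = hits[0] if hits else None
    return edges


print(resolve("pkg/main.py", SOURCE))
# {'.utils': 'pkg/utils.py', 'services.auth': 'services/auth.py', 'typing': None}
```

The None entry for typing corresponds to an import that is recorded but contributes no edge.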

Rust:

use crate::utils::foo;          // crate-internal → edge
use super::parent_mod;          // super → edge
use std::collections::HashMap;  // std → no edge
use tokio::runtime;             // external crate → no edge

crate:: and super:: imports produce edges. External crate dependencies are noted but don’t appear in the dependency graph.

Go:

import "myproject/internal/auth"  // project-internal → edge
import "github.com/user/pkg"      // external → no edge
import "fmt"                       // stdlib → no edge

Only imports that match files in the indexed repo produce edges.
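For Go, one plausible way to apply this rule is a prefix check against the module path, sketched below in Python. MODULE_PATH and REPO_PACKAGES are hypothetical values (a real implementation would presumably read the module path from go.mod), and the sketch treats a Go import as naming a package directory rather than a single file.

```python
# Hypothetical module path and package directories found in the repo.
MODULE_PATH = "myproject"
REPO_PACKAGES = {"internal/auth", "cmd/server"}


def go_import_edge(import_path):
    """Return the repo-relative package directory, or None for stdlib/external imports."""
    prefix = MODULE_PATH + "/"
    if not import_path.startswith(prefix):
        return None
    package_dir = import_path[len(prefix):]
    return package_dir if package_dir in REPO_PACKAGES else None


for path in ["myproject/internal/auth", "github.com/user/pkg", "fmt"]:
    print(path, "->", go_import_edge(path))
# myproject/internal/auth -> internal/auth
# github.com/user/pkg -> None
# fmt -> None
```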

SCIP: upgrading to compiler-resolved edges

tree-sitter builds a syntax tree but doesn’t run the compiler, so import edge resolution is heuristic: Forge maps import paths to files using naming conventions and tsconfig.json / pyproject.toml path maps where available.

For projects that need compiler-precise edges, Forge supports SCIP ingestion:

# Generate a SCIP index with your language toolchain
# (rust-analyzer, scip-python, scip-typescript, etc.)
# Then ingest it:
forge ingest-scip ./index.scip

After SCIP ingestion, the affected import edges are upgraded from heuristic to compiler-resolved. This is especially valuable for TypeScript projects with complex path aliases or Python projects with namespace packages.

See How-To: Multi-language Repo for SCIP setup per language.

Limits and caveats

Alias resolution: TypeScript path aliases defined in tsconfig.json (@/components/* → src/components/*) are resolved when Forge finds the tsconfig.json at index time. Dynamic aliases (runtime module resolution tricks) cannot be resolved statically.
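As an illustration of static alias resolution, the sketch below expands a tsconfig-style paths entry and probes a file listing. BASE_URL, PATHS, REPO_FILES, EXTENSIONS, and resolve_alias are hypothetical names chosen for this example; Forge's actual lookup order and extension list may differ.

```python
import posixpath

# Hypothetical excerpt of tsconfig.json "compilerOptions".
BASE_URL = "."
PATHS = {"@/components/*": ["src/components/*"]}
# Hypothetical repo listing and probe order.
REPO_FILES = {"src/components/Button.tsx", "src/components/nav/index.ts"}
EXTENSIONS = [".ts", ".tsx", "/index.ts", "/index.tsx"]


def resolve_alias(specifier):
    """Map an aliased import specifier to a repo file, or None if nothing matches."""
    for pattern, targets in PATHS.items():
        prefix, star, suffix = pattern.partition("*")
        if star:
            fits = (specifier.startswith(prefix) and specifier.endswith(suffix)
                    and len(specifier) >= len(prefix) + len(suffix))
            if not fits:
                continue
            # Text matched by the wildcard.
            captured = specifier[len(prefix):len(specifier) - len(suffix)]
        elif specifier == pattern:
            captured = ""
        else:
            continue
        for target in targets:
            expanded = target.replace("*", captured)
            base = posixpath.normpath(posixpath.join(BASE_URL, expanded))
            for ext in EXTENSIONS:
                if base + ext in REPO_FILES:
                    return base + ext
    return None


print(resolve_alias("@/components/Button"))  # src/components/Button.tsx
print(resolve_alias("@/components/nav"))     # src/components/nav/index.ts
print(resolve_alias("lodash"))               # None
```

Everything here is decided from files on disk, which is why aliases that only exist at runtime are out of reach.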

Dynamic imports: require() calls and import() expressions where the path is computed at runtime (e.g., require('./plugins/' + name)) are not tracked as edges.
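The distinction comes down to whether the argument is a string literal node in the syntax tree. The sketch below shows the same check on a Python analog (importlib.import_module instead of require()), using the standard library's ast module; the plugin names and the function name are hypothetical.

```python
import ast

# Python analog of require('./plugins/core') vs require('./plugins/' + name).
SOURCE = '''
import importlib
core = importlib.import_module("plugins.core")
extra = importlib.import_module("plugins." + name)
'''


def dynamic_import_targets(source):
    """Return (literal targets, number of computed targets that cannot be tracked)."""
    tracked, skipped = [], 0
    for node in ast.walk(ast.parse(source)):
        is_import_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "import_module"
            and node.args
        )
        if not is_import_call:
            continue
        arg = node.args[0]
        if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
            tracked.append(arg.value)  # path known statically
        else:
            skipped += 1  # path computed at runtime
    return tracked, skipped


print(dynamic_import_targets(SOURCE))  # (['plugins.core'], 1)
```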

Monorepos: Forge indexes one repo at a time. Cross-package edges (one package importing from a sibling package in a monorepo) are tracked within the same index if all packages are under the indexed root. See How-To: Configure Monorepo.

Re-exports: export { foo } from './bar' creates an edge to ./bar and marks foo as a re-export, so foo in ./bar counts as consumed and is not reported as a dead export.