pi-repo-baby
Repository map generator for Pi — gives the agent structural awareness of any codebase via Tree-sitter
Package details
Install pi-repo-baby from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:pi-repo-baby- Package
pi-repo-baby- Version
2.3.1- Published
- May 14, 2026
- Downloads
- 248/mo · 248/wk
- Author
- k2-888
- License
- MIT
- Types
- extension
- Size
- 42.1 KB
- Dependencies
- 0 dependencies · 3 peers
Pi manifest JSON
{
"extensions": [
"./index.ts"
],
"image": "https://raw.githubusercontent.com/k2-888/pi-repo-baby/main/screenshot.png"
}Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
Repo Baby is a Pi extension that gives the agent an explore_codebase
tool — a Tree-sitter–powered structural map of any codebase. 19 languages, cross-file
reference ranking, zero injection, zero setup.
Table of Contents
- What It Does
- When to Use (and When Not To)
- Quick Start
- Usage
- Supported Languages
- How It Works
- Comparison
- Architecture
- Design Decisions
- License
What It Does
The agent calls explore_codebase and gets a ranked structural map:
- rich/console.py:
class Console (line 581) ← 104 files
class ConsoleOptions (line 113) ← 32 files
class Group (line 450) ← 13 files
function group (line 483) ← 6 files
- rich/progress.py:
function open (line 372) ← 36 files
class Progress (line 1061) ← 6 files
class TextColumn (line 616) ← 4 files
class BarColumn (line 646) ← 4 files
- rich/style.py:
class Style (line 40) ← 44 files
class StyleStack (line 765) ← 6 files
Symbols are ranked by cross-file reference count. The more files that reference a
symbol, the higher it appears. Entry points (main, App, index) get a boost.
Test files and dunder methods (__init__, __str__) are filtered out.
The agent uses this to jump straight to the right file instead of chaining ls →
find → rg → read. After making edits, it calls explore_codebase again to
verify the structure is intact.
When to Use (and When Not To)
explore_codebase is built for large, unfamiliar codebases. If you're
opening a PR against an open-source project you've never seen, onboarding to a
monorepo, tracing a bug through a legacy system, or auditing a codebase for
security review — the map saves 5–15 exploration turns by telling the agent
exactly where everything lives.
| Environment | Use it? | Why |
|---|---|---|
| Open-source projects (first contribution) | ✅ | You don't know the layout. The map shows entry points, core classes, and cross-file call chains in one call. |
| Monorepos (50+ packages) | ✅ | The dependency graph spans dozens of directories. ls can't show you that auth.ts is referenced by 47 files across 6 packages. |
| Enterprise codebases | ✅ | Hundreds of files, years of accretion. The ranking surfaces the files that actually matter instead of the ones that happen to sort first alphabetically. |
| Legacy systems (no docs) | ✅ | No README, no architecture diagram. The map IS the documentation — ranked symbols with reference counts. |
| Code review / security audit | ✅ | "Find every file that touches authentication" — the map shows the call graph before you read a single line. |
| Refactoring (cross-cutting) | ✅ | "What breaks if I rename this class?" The ref count tells you exactly how many files reference it. |
| After making edits | ✅ | Call explore_codebase to verify your changes landed correctly and no symbols were orphaned. |
| Your own project (you know it) | ❌ | You already know where everything is. The map adds no information. |
| 3-file scripts / utilities | ❌ | ls shows you everything in one command. The map is overhead. |
| Single-file edits in a known file | ❌ | You're going straight to read + edit. The map is an extra call with no payoff. |
To disable: /repo-baby off hides the tool from the agent entirely.
/repo-baby on brings it back. Use off when you're in familiar territory
or working on a single file — no point paying the ~10s generation cost for zero benefit.
The extension loads by default. If most of your work is small projects, set it to start disabled and toggle on for big ones:
/repo-baby off # default off for quick work
/repo-baby on # toggle on when you clone something big
Quick Start
git clone https://github.com/k2-888/pi-repo-baby ~/.pi/agent/extensions/repo-baby
That's it. The extension auto-installs tree-sitter-language-pack into its own venv/
on first session. You'll see a toast notification during install, then it's
silent forever. No pip install, no manual steps.
Usage
| Command | Effect |
|---|---|
/repo-baby |
Show usage and current state |
/repo-baby on |
Enable the tool (default) |
/repo-baby off |
Disable — tool hidden from agent |
/repo-baby status |
Show enabled state + dependency health |
/repo-baby doctor |
Re-check Python/Tree-sitter deps |
/repo-baby refresh |
Reminder: explore_codebase gives a fresh snapshot |
The agent calls explore_codebase on its own — typically as its first action when
entering a codebase, and again after edits. Three mechanisms guide adoption:
promptSnippet— one-liner in the agent'sAvailable tools:listpromptGuidelines— behavioral rules in theGuidelines:section- Mid-turn steer nudge — if the agent chains 2+
ls/fd/findexploration commands without usingexplore_codebase, a reminder message is injected
Supported Languages
All 19 languages via Tree-sitter, bundled in tree-sitter-language-pack. No regex.
| Language | Extensions |
|---|---|
| Python | .py |
| JavaScript | .js |
| TypeScript | .ts |
| TSX | .tsx |
| Go | .go |
| Rust | .rs |
| Ruby | .rb |
| Java | .java |
| C | .c, .h |
| C++ | .cpp, .cc, .cxx, .hpp, .hxx |
| C# | .cs |
| PHP | .php |
| Kotlin | .kt, .kts |
| Swift | .swift |
| Scala | .scala, .sc |
| Bash | .sh, .bash |
| SQL | .sql |
| Lua | .lua |
| Terraform / HCL | .tf, .tfvars, .hcl |
How It Works
File Discovery
Prefers git ls-files (respects .gitignore). Falls back to os.walk with
ignore rules for node_modules, .venv, dist, vendor, etc.
Symbol Extraction
Each file is parsed with the appropriate Tree-sitter grammar. The AST is walked
to extract function, class, method, interface, struct, trait, and impl declarations.
Class/module context is tracked so methods display as ClassName.method().
Dunder methods (__init__, __str__, __repr__) and config files (YAML, JSON,
Markdown, HTML, CSS, TOML) are excluded — they add noise without signal.
Ranking
Uses in-degree reference counting: for each symbol, counts how many other files contain its name. A word-boundary tokenizer scans each file once (O(file_size)), then set-intersection with symbol names produces reference counts.
Core structural types (classes, interfaces) get a 1.5× boost. Test files get a 20× demotion.
Output
Symbols are sorted by reference count (descending) then grouped by file. Output
is trimmed to the token budget. Reference counts are shown inline so the agent
can see why something ranks high without verifying with rg.
Comparison
vs RAG / Embeddings
| RAG | Repo Baby | |
|---|---|---|
| Infrastructure | Vector DB, embeddings model, chunking | Single Python script |
| Cost | API calls, storage, re-indexing | Free, deterministic, local |
| Freshness | Must re-index after changes | On-demand regeneration (~0.5s file discovery) |
| Signal | Semantic chunks, retrieval misses possible | Exact symbol map with call graph |
| Failure mode | Silent retrieval gaps | Binary — map exists or it doesn't |
Architecture
index.ts (TypeScript) ──pi.exec()──→ repo-baby.py --path <cwd> --token-budget <N>
│ │
│ explore_codebase tool │ git ls-files OR os.walk
│ /repo-baby command │ Tree-sitter parse each file
│ session_start → ensureDeps() │ Word-tokenizer reference counting
│ tool_call → exploration tracker │ In-degree ranking + formatting
│ tool_execution_end → steer nudge │
│ ↓
└── All paths call the same script stdout → returned to agent
index.ts (~335 lines)
explore_codebasetool withpromptSnippetandpromptGuidelines/repo-babyslash command with tab-completionensureDeps()— one-time auto-install of tree-sitter-language-pack- Exploration tracking — steer nudge after 2+ bash exploration commands
repo-baby.py (~620 lines)
- File discovery:
git ls-files→os.walkfallback - Symbol extraction: Tree-sitter AST walk with class/module context
- Ranking: single-pass word tokenizer + set-intersection reference counting
- Formatting: token-budget-aware output with inline reference counts
Design Decisions
| Decision | Rationale |
|---|---|
| Tool agency over injection | Pi agents have tools. Pushing data into context fights the architecture. |
| Tree-sitter only | If the parser isn't available, skip silently. No fallback, no ambiguity. |
| Auto-install deps | User never sees a dep command. Extension handles its own requirements. |
| In-degree ranking | Simpler than PageRank, equally effective, no networkx dependency. |
| Reference counts in output | ← 104 files tells the agent why something ranks high. Builds trust. |
| No caching (yet) | Regenerates from scratch every call (~10s for 200+ files). Freshness guaranteed; speed improvements planned. |
| Code-only discovery | YAML, JSON, Markdown, HTML, CSS, TOML excluded — config noise dilutes signal. |
| Test file demotion (20×) | Production code surfaces first. Tests still appear if they have genuine cross-file importance. |
| Dunder filter | __init__, __str__, __repr__ are boilerplate. Filtered at extraction time. |
| Deduplication | Same-name symbols in one file appear once. No function open × 3. |
| Action-verb name | explore_codebase sits naturally alongside read, write, edit, bash. |
License
MIT

