pi-auggie-router
Opinionated sub-agent router for Pi: tightly couples SKILL.md execution with the Augment Code (auggie) Context Engine.
Package details
Install pi-auggie-router from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:pi-auggie-router
- Package: pi-auggie-router
- Version: 1.4.1
- Published: Jun 7, 2026
- Downloads: 843/mo · 8/wk
- Author: ngsoftware
- License: MIT
- Types: extension
- Size: 511 KB
- Dependencies: 1 dependency · 1 peer
Pi manifest JSON
{
"extensions": [
"./extension.ts"
]
}
Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
pi-auggie-router
Opinionated /skill: sub-agent router for the Pi framework. Tightly couples Anthropic-style SKILL.md execution with the Augment Code (auggie) Context Engine via MCP.
New to pi-auggie-router? Start with the Getting Started guide for a step-by-step walkthrough of your first skill workflow.
Installing via pi install (pi.dev bridge)? Use the slash form /skill <name>; the colon form /skill:<name> falls through unintercepted under the extension bridge. See Getting Started.
pi-auggie-router intercepts /skill:<name> commands inside a Pi host,
parses the matching SKILL.md, runs a 2-pass Actor/Judge brief loop
against a cheap routing model, then dispatches the work to an isolated Pi
sub-agent that is forced to retrieve workspace context through Augment Code's
codebase-retrieval MCP tool. The main thread stays clean — the user sees
their command and the synthesized result, nothing else.
Why
Out of the box, /skill execution dumps full file blobs into context, blows
out token budgets, and produces inconsistent retrieval. This router takes the
opposite stance: a single hardcoded path through Augment's semantic engine,
strict timeouts, and a payload ceiling that forces the model to refine its
own queries instead of vomiting megabytes back into the loop.
This package is vendor-locked on purpose. It will not run without a
working local auggie install.
The main agent learns how to use the router automatically — the
extension installs a before_agent_start hook that injects a
versioned ## pi-auggie-router block into the system prompt on
every turn, so the agent picks the right skill, uses the correct
invocation syntax, and respects the bridge limitations without any
user-side configuration. See Auto-injected agent system
prompt for details and opt-out.
Installation
npm install pi-auggie-router
Requires Node ≥ 20.6 and a working Augment Code CLI
(auggie account status must exit 0).
Mounting it inside a Pi host
import { createRouter } from "pi-auggie-router";
const router = createRouter(piHost);
// Later, on shutdown:
router.dispose();
piHost must satisfy the PiHost contract exported from this package:
| Method | Purpose |
|---|---|
| postSystemMessage | Append [System]: … lines to the visible thread. |
| postAssistantMessage | Append the sub-agent's final synthesized output to the thread. |
| setInputLocked | Disable / re-enable the user's main editor while a skill runs. |
| getRecentMessages(n) | Return the last n chat messages for Actor brief assembly. |
| callLLM(opts) | Cheap routing-class call used by Actor + Judge. |
| runSubAgent(opts) | Spin up an isolated Pi agent with MCP servers + middleware attached. |
| onUserInput(cb) | Invoked for every user input; return {cancel:true} to swallow. |
| onBeforeMessage(cb) | Invoked before a typed message is sent; used for the Q&A fallback. |
| resolveWorkspacePath | Resolve paths inside the active workspace (for .pi/skills/...). |
| resolveHomePath | Resolve paths inside ~ (for ~/.pi/agent/skills/...). |
| log (optional) | Structured logger. |
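For unit tests it can help to have an in-memory stand-in for the host. The sketch below is a hypothetical rendering of the contract: the method names come from the table above, but every parameter and return shape (ChatMessage, the opts objects, InputHook, and the FakeHost helper itself) is an assumption. In real code, type against the PiHost export from the package.

```typescript
// Hypothetical PiHost shape: method names from the README table, all
// parameter/return types are assumptions for illustration only.
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

type InputHook = (text: string) => { cancel: boolean } | undefined;

interface PiHostSketch {
  postSystemMessage(text: string): void;
  postAssistantMessage(text: string): void;
  setInputLocked(locked: boolean): void;
  getRecentMessages(n: number): ChatMessage[];
  callLLM(opts: { model: string; messages: ChatMessage[] }): Promise<string>;
  runSubAgent(opts: { model: string; systemPrompt: string; userPrompt: string }): Promise<string>;
  onUserInput(cb: InputHook): void;
  onBeforeMessage(cb: InputHook): void;
  resolveWorkspacePath(...segments: string[]): string;
  resolveHomePath(...segments: string[]): string;
  log?(event: Record<string, unknown>): void;
}

// In-memory fake host that records what the router would post to the thread.
class FakeHost implements PiHostSketch {
  readonly thread: string[] = [];
  locked = false;
  private readonly history: ChatMessage[];
  private readonly inputHooks: InputHook[] = [];

  constructor(history: ChatMessage[]) {
    this.history = history;
  }
  postSystemMessage(text: string): void { this.thread.push(text); }
  postAssistantMessage(text: string): void { this.thread.push(text); }
  setInputLocked(locked: boolean): void { this.locked = locked; }
  getRecentMessages(n: number): ChatMessage[] { return n > 0 ? this.history.slice(-n) : []; }
  async callLLM(): Promise<string> { return "{}"; }
  async runSubAgent(): Promise<string> { return "sub-agent result"; }
  onUserInput(cb: InputHook): void { this.inputHooks.push(cb); }
  onBeforeMessage(): void {}
  resolveWorkspacePath(...segments: string[]): string { return ["/workspace", ...segments].join("/"); }
  resolveHomePath(...segments: string[]): string { return ["/home/user", ...segments].join("/"); }

  // Simulate the user typing; true means some hook swallowed the input.
  type(text: string): boolean {
    return this.inputHooks.some((hook) => hook(text)?.cancel === true);
  }
}
```

A fake like this lets you drive the router's input interception and thread output without a running Pi host.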
Configuration
All knobs live under auggieRouter in .pi/settings.json:
{
"auggieRouter": {
"defaultProvider": "openrouter",
"routingModel": "anthropic/claude-3-5-haiku",
"historyWindow": 20,
"maxJudgeIterations": 2,
"routingTimeoutMs": 60000,
"qaTimeoutMs": 300000,
"totalTimeoutMs": 300000,
"inactivityTimeoutMs": 60000,
"subAgentTemperature": 0.0,
"overflowCeilingBytes": 25000,
"auggieBinPath": "auggie",
"allowedProviderPrefixes": [],
"executionRouting": {
"enabled": false,
"preference": "balanced",
"surfaceDecision": false,
"skillModelPolicy": "pin",
"models": {
"cheap": "anthropic/claude-3-5-haiku",
"balanced": "anthropic/claude-3-5-sonnet",
"frontier": "anthropic/claude-3-7-sonnet"
}
},
"debugPromptPrefixHash": false,
"outputSanitizer": {
"enabled": true,
"finalOutputMaxChars": 120000,
"stripToolTraces": true
},
"contextBudgets": {
"enabled": false,
"overflowCeilingBytes": {
"cheap": 15000,
"balanced": 25000,
"frontier": 50000
}
},
"historyAssembly": {
"strategy": "recent",
"headMessages": 2,
"tailMessages": 12,
"middleMode": "marker",
"maxCharsPerMessage": 10000,
"maxTotalChars": 60000
},
"contextMemory": {
"enabled": false,
"maxEntries": 8,
"maxBytesPerRun": 1000000,
"previewHeadChars": 4000,
"previewTailChars": 4000
},
"parallelSubagents": {
"enabled": false,
"maxSubagents": 3,
"perWorkerOutputCharCap": 8000
},
"executionTrace": {
"enabled": true,
"maxResultPreviewChars": 2000,
"traceDirectory": ".pi/traces"
},
"traceObservability": {
"enabled": true,
"showReportAfterExecution": false,
"degradationAlertEnabled": true,
"degradationConsecutiveFailures": 3,
"degradationAlertCooldownHours": 24,
"reportMaxTraces": 10,
"reportMaxInlineTraces": 5,
"regressionWindowSize": 10,
"maxTracesPerSkill": 20
},
"promptInjection": {
"enabled": true
}
}
}
Note on data exposure: the routing-class model (default claude-3-5-haiku via OpenRouter) sees the last historyWindow chat messages. If your chat may contain secrets, point routingModel at a self-hosted gateway or trim historyWindow.
Defaults match the values shown above. Only defaultProvider is expected to
change in normal use; everything else is opinionated for a reason.
Adaptive Execution Model Routing
By default, the router resolves the execution model from the SKILL.md
model: frontmatter field (or a built-in fallback). This means easy tasks
and hard tasks run on the same static model.
Adaptive execution routing adds a lightweight model-selection step: after the
Actor/Judge loop classifies the task by complexity and risk, the router picks
an appropriate model from a configurable cheap / balanced / frontier pool.
The selection is sticky for the entire /skill run — one model is chosen
before the sub-agent starts and never changes mid-execution.
Disabled by default. Existing behavior is preserved unless you explicitly opt in.
Enabling adaptive routing
The smallest opt-in is:
{
"auggieRouter": {
"executionRouting": {
"enabled": true,
"surfaceDecision": true
}
}
}
Omitted fields use the defaults shown in the main configuration block above.
How it works
- The Judge (already running in the Actor/Judge loop) classifies the task with an executionRoute that includes tier, complexity, risk, confidence, and a reason.
- The router applies a preference adjustment (see below) and safety floors; for example, architecture_change tasks always use frontier.
- Exactly one model is selected from the configured pool and passed to the sub-agent. No mid-run re-routing occurs.
- Route metadata is never injected into the sub-agent system prompt, preserving prompt-cache efficiency.
Settings
| Setting | Values | Default | Purpose |
|---|---|---|---|
| enabled | true / false | false | Turn adaptive routing on. |
| preference | preferCheap / balanced / preferBest | balanced | Cost-vs-quality bias. |
| surfaceDecision | true / false | false | Show the selected model in the [System] execution message. Routed decisions include the tier; pinned/fallback decisions name their source. |
| skillModelPolicy | pin / ignore | pin | How the SKILL.md model: field interacts with routing. |
| models.cheap | model ID | anthropic/claude-3-5-haiku | Model for read-only / low-complexity tasks. |
| models.balanced | model ID | anthropic/claude-3-5-sonnet | Model for scoped edits and medium-complexity tasks. |
| models.frontier | model ID | anthropic/claude-3-7-sonnet | Model for multi-file / architecture / high-risk tasks. |
All configured model IDs pass through mapModel(...) so defaultProvider
and allowedProviderPrefixes continue to apply.
Preference adjustment
| Preference | Behavior |
|---|---|
| balanced | Use the base tier chosen by the Judge. |
| preferCheap | Downgrade balanced → cheap when complexity is medium, risk is read_only or small_edit, and confidence ≥ 0.7. Never downgrades high-risk tasks. |
| preferBest | Upgrade cheap edit tasks to balanced. Upgrade unknown-risk tasks to frontier. |
Safety floors (always enforced)
Regardless of preference:
- architecture_change always routes to frontier.
- multi_file_edit never routes below balanced.
- unknown risk with confidence < 0.5 never routes below balanced.
- If the Judge did not pass (Q&A was needed), the minimum tier is balanced.
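The preference table and safety floors combine into a single tier decision. A minimal sketch, assuming the risk and complexity labels used in this README and reading "cheap edit tasks" as risk small_edit or multi_file_edit; the type and function names are hypothetical, not package exports:

```typescript
// Hypothetical sketch: preference adjustment first, then safety floors.
type Tier = "cheap" | "balanced" | "frontier";
type Risk = "read_only" | "small_edit" | "multi_file_edit" | "architecture_change" | "unknown";
type Preference = "preferCheap" | "balanced" | "preferBest";

interface ExecutionRoute {
  tier: Tier;
  complexity: "low" | "medium" | "high";
  risk: Risk;
  confidence: number;
}

const RANK: Record<Tier, number> = { cheap: 0, balanced: 1, frontier: 2 };

// Raise `tier` to `floor` if it is below it; never lowers a tier.
function atLeast(tier: Tier, floor: Tier): Tier {
  return RANK[tier] >= RANK[floor] ? tier : floor;
}

function effectiveTier(route: ExecutionRoute, preference: Preference, judgePassed: boolean): Tier {
  let tier = route.tier;

  // 1. Preference adjustment ("balanced" keeps the Judge's base tier).
  if (preference === "preferCheap") {
    const lowRisk = route.risk === "read_only" || route.risk === "small_edit";
    if (tier === "balanced" && route.complexity === "medium" && lowRisk && route.confidence >= 0.7) {
      tier = "cheap";
    }
  } else if (preference === "preferBest") {
    const isEdit = route.risk === "small_edit" || route.risk === "multi_file_edit";
    if (tier === "cheap" && isEdit) tier = "balanced";
    if (route.risk === "unknown") tier = "frontier";
  }

  // 2. Safety floors run last, so no preference can undercut them.
  if (route.risk === "architecture_change") tier = "frontier";
  if (route.risk === "multi_file_edit") tier = atLeast(tier, "balanced");
  if (route.risk === "unknown" && route.confidence < 0.5) tier = atLeast(tier, "balanced");
  if (!judgePassed) tier = atLeast(tier, "balanced");
  return tier;
}
```

Applying the floors after the preference step is the simplest way to make them "always enforced": whatever the preference did, the floors get the last word.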
Skill model policy
| Policy | Behavior |
|---|---|
| pin | If SKILL.md has model:, use it exactly as before. Only tasks without a pinned model go through adaptive routing. Safest default. |
| ignore | Ignore SKILL.md model: and always route from the pool. Useful for team-level cost control. |
Missing pool entries
If the selected tier has no model configured, the router walks a fallback chain:
- Missing cheap → balanced → frontier
- Missing balanced → frontier → cheap
- Missing frontier → balanced → cheap
If nothing in the pool resolves, the router falls back to legacy model
resolution so the skill still runs. When no minimum tier is active, this may
use SKILL.md model: through mapModel(skill.rawModel, ...). When the Judge
did not pass and the minimum tier is balanced, fallback uses the default
balanced model instead of a potentially cheaper pinned model.
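The fallback chain and the legacy escape hatch can be sketched as a table lookup. This assumes the legacy model has already been resolved (for example through mapModel(skill.rawModel, ...)); selectFromPool and the "legacy-fallback" source label are hypothetical names:

```typescript
// Hypothetical sketch of pool fallback followed by legacy resolution.
type Tier = "cheap" | "balanced" | "frontier";
type Pool = Partial<Record<Tier, string>>;

// Order in which tiers are tried when the selected one is missing.
const FALLBACK_CHAIN: Record<Tier, Tier[]> = {
  cheap: ["cheap", "balanced", "frontier"],
  balanced: ["balanced", "frontier", "cheap"],
  frontier: ["frontier", "balanced", "cheap"],
};

interface Selection {
  tier: Tier;
  model: string;
  source: "execution-routing" | "legacy-fallback";
}

function selectFromPool(
  tier: Tier,
  pool: Pool,
  legacyModel: string,          // e.g. the result of mapModel(skill.rawModel, ...)
  defaultBalancedModel: string,
  judgePassed: boolean,
): Selection {
  for (const candidate of FALLBACK_CHAIN[tier]) {
    const model = pool[candidate];
    if (model) return { tier: candidate, model, source: "execution-routing" };
  }
  // Nothing in the pool resolved: keep the skill running via legacy
  // resolution, but a failed Judge never drops below the balanced default.
  return {
    tier: "balanced", // neutral sentinel for non-routed decisions
    model: judgePassed ? legacyModel : defaultBalancedModel,
    source: "legacy-fallback",
  };
}
```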
Observability
When surfaceDecision=true, the execution message includes the selected model.
For routed decisions it includes the tier; for pinned or fallback decisions it
names the source instead of showing the neutral balanced sentinel:
[System]: ⚙️ Executing /skill:refactor using balanced model openrouter/anthropic/claude-3-5-sonnet. Reason: route balanced (complexity=medium, risk=small_edit, confidence=0.82)
[System]: ⚙️ Executing /skill:refactor using SKILL.md model openrouter/anthropic/claude-3-7-sonnet. Reason: skillModelPolicy=pin; SKILL.md model honoured.
[System]: ⚙️ Executing /skill:refactor using fallback model openrouter/anthropic/claude-3-5-sonnet. Reason: Execution-routing pool unavailable; used legacy default model resolution.
When surfaceDecision=false (default), the existing minimal message is shown:
[System]: ⚙️ Executing /skill:refactor (Auggie semantic retrieval running...)
Every skill execution emits a structured log via host.log:
{
"event": "auggie-router.execution-route",
"skill": "refactor",
"tier": "balanced",
"model": "openrouter/anthropic/claude-3-5-sonnet",
"source": "execution-routing",
"complexity": "medium",
"risk": "small_edit",
"confidence": 0.82,
"routeTier": "balanced",
"effectiveTier": "balanced"
}
No user prompt content, chat history, or secrets are logged.
Security-relevant settings
| Setting | Default | Purpose |
|---|---|---|
| auggieBinPath | "auggie" | Absolute path to the auggie binary. Override to avoid $PATH lookup attacks in shared environments. |
| allowedProviderPrefixes | [] (allow all) | A non-empty array restricts which provider prefixes a SKILL.md model: field may resolve to. E.g. ["openrouter"] blocks evil-provider/vendor/model. |
All numeric settings are validated against safe ranges; an invalid value is dropped (so the default applies) and a warning is logged instead of an error being raised. See source (config.ts) for exact bounds.
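The drop-and-warn behaviour can be illustrated with a small validator. The bounds below are placeholders invented for this sketch, not the values in config.ts, and validateNumeric is a hypothetical helper:

```typescript
// Hypothetical sketch: bounds are placeholders, NOT the real config.ts values.
interface NumericRange {
  min: number;
  max: number;
  integer?: boolean;
}

const HYPOTHETICAL_RANGES: Record<string, NumericRange> = {
  historyWindow: { min: 1, max: 200, integer: true },
  maxJudgeIterations: { min: 0, max: 2, integer: true },
  subAgentTemperature: { min: 0, max: 2 },
  overflowCeilingBytes: { min: 1_000, max: 1_000_000, integer: true },
};

// Returns only valid numeric settings; invalid ones are dropped with a
// warning so the caller falls back to the defaults.
function validateNumeric(raw: Record<string, unknown>, warn: (msg: string) => void): Record<string, number> {
  const valid: Record<string, number> = {};
  for (const [key, range] of Object.entries(HYPOTHETICAL_RANGES)) {
    if (!(key in raw)) continue; // omitted: default applies, nothing to warn about
    const value = raw[key];
    if (
      typeof value === "number" &&
      Number.isFinite(value) &&
      value >= range.min &&
      value <= range.max &&
      (!range.integer || Number.isInteger(value))
    ) {
      valid[key] = value;
    } else {
      warn(`auggieRouter.${key}: ignoring out-of-range value ${String(value)}`);
    }
  }
  return valid;
}
```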
Skill model: translation
The model: field in a skill's frontmatter is translated through
mapModel(rawModel, defaultProvider, allowedProviderPrefixes):
| Frontmatter model | Resolved gateway ID |
|---|---|
| claude-3-7-sonnet | openrouter/anthropic/claude-3-7-sonnet |
| anthropic/claude-3-5-haiku | openrouter/anthropic/claude-3-5-haiku |
| openrouter/anthropic/claude-3-5-sonnet | (unchanged; already fully qualified) |
| (missing) | openrouter/anthropic/claude-3-5-sonnet (fallback) |
When allowedProviderPrefixes is set (e.g. ["openrouter"]), a fully-qualified
model whose provider prefix isn't in the list throws DisallowedProviderError and
execution is aborted. This prevents a malicious SKILL.md from routing requests
to an untrusted provider.
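One way to reproduce the translation table is to count path segments. This is a sketch under stated assumptions, not the package's mapModel: a bare name is assumed to be an Anthropic model, "vendor/model" gets the default provider prefixed, and three or more segments count as fully qualified. The error class is a local stand-in, and checking the provider of every resolved ID (not only fully qualified ones) is a choice of this sketch:

```typescript
// Local stand-in for the package's DisallowedProviderError.
class DisallowedProviderError extends Error {}

const FALLBACK_MODEL = "anthropic/claude-3-5-sonnet";

// Hypothetical segment-count heuristic matching the table above.
function mapModelSketch(
  rawModel: string | undefined,
  defaultProvider: string,
  allowedProviderPrefixes: string[],
): string {
  const model = rawModel?.trim() || FALLBACK_MODEL;
  const segments = model.split("/");
  let resolved: string;
  if (segments.length === 1) {
    resolved = `${defaultProvider}/anthropic/${model}`; // assumption: bare names are Anthropic
  } else if (segments.length === 2) {
    resolved = `${defaultProvider}/${model}`;
  } else {
    resolved = model; // already fully qualified
  }
  const provider = resolved.split("/")[0];
  if (allowedProviderPrefixes.length > 0 && !allowedProviderPrefixes.includes(provider)) {
    throw new DisallowedProviderError(`Provider "${provider}" is not allowed`);
  }
  return resolved;
}
```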
Final-output sanitization
The router sanitizes the sub-agent's final text before posting it to the main
thread. This keeps internal tool traces, MCP envelopes, and runaway retrieval
dumps out of the user's chat history — which also keeps future historyWindow
slices clean.
| Setting | Default | Purpose |
|---|---|---|
| outputSanitizer.enabled | true | Master switch. When false, sub-agent output passes through unchanged. |
| outputSanitizer.finalOutputMaxChars | 120000 | Hard cap on final answer characters. The truncation marker counts against the budget. Set to 0 to disable the cap. |
| outputSanitizer.stripToolTraces | true | Remove fenced blocks labeled tool_use, tool_result, mcp, codebase-retrieval, auggie, scratchpad, internal (and -/_ variants), plus bare {"jsonrpc":...} / {"tool_use_id":...} / {"tool_call_id":...} lines. |
The sanitizer is conservative: legitimate ts, js, json, py, sh, etc.
fenced code blocks are preserved. A bare {"type":...} JSON line is not
considered a trace (too common in legitimate answers).
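The sanitizer rules above amount to a small line-oriented state machine. A minimal sketch, assuming the truncation marker text "[output truncated]" (invented here) and treating "bare" trace lines as lines outside any fenced block; sanitizeSketch is a hypothetical name:

```typescript
// Hypothetical sketch of conservative final-output sanitization.
const FENCE = "`".repeat(3); // built at runtime so the snippet can live inside a fence
const TRACE_LABELS = new Set(["tool_use", "tool_result", "mcp", "codebase_retrieval", "auggie", "scratchpad", "internal"]);
const TRACE_LINE = /^\s*\{\s*"(jsonrpc|tool_use_id|tool_call_id)"\s*:/;
const TRUNCATION_MARKER = "\n[output truncated]"; // assumption: real marker text may differ

interface SanitizeResult {
  text: string;
  removedSections: number;
  truncated: boolean;
}

function sanitizeSketch(input: string, finalOutputMaxChars: number): SanitizeResult {
  const kept: string[] = [];
  let removedSections = 0;
  let mode: "text" | "keep" | "drop" = "text";

  for (const line of input.split("\n")) {
    const trimmed = line.trim();
    if (mode === "drop") {
      if (trimmed === FENCE) mode = "text"; // end of a trace block
      continue;
    }
    if (mode === "keep") {
      kept.push(line); // legitimate code block: preserved verbatim
      if (trimmed === FENCE) mode = "text";
      continue;
    }
    if (trimmed.startsWith(FENCE)) {
      // Normalize "-" to "_" so codebase-retrieval / tool-use variants match.
      const label = trimmed.slice(FENCE.length).trim().toLowerCase().replace(/-/g, "_");
      if (TRACE_LABELS.has(label)) {
        mode = "drop";
        removedSections++;
        continue;
      }
      mode = "keep";
      kept.push(line);
      continue;
    }
    if (TRACE_LINE.test(line)) {
      removedSections++; // bare JSON-RPC / tool-id line outside any fence
      continue;
    }
    kept.push(line);
  }

  let text = kept.join("\n");
  let truncated = false;
  if (finalOutputMaxChars > 0 && text.length > finalOutputMaxChars) {
    // The marker counts against the budget, so the result never exceeds the cap.
    const room = Math.max(0, finalOutputMaxChars - TRUNCATION_MARKER.length);
    text = (text.slice(0, room) + TRUNCATION_MARKER).slice(0, finalOutputMaxChars);
    truncated = true;
  }
  return { text, removedSections, truncated };
}
```

Tracking a "keep" mode is what lets a JSON-RPC example inside a legitimate json or ts block survive while the same line in free text is removed.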
When the sanitizer removes or truncates anything, it emits a counts-only log:
{
"event": "auggie-router.output-sanitized",
"skill": "refactor",
"removedSections": 2,
"truncated": false,
"originalChars": 18204,
"finalChars": 17612
}
Removed content is never logged.
Auto-injected agent system prompt
Skills are registered as ordinary Pi commands — the main agent can see
them in its command palette — but out of the box the main agent has no
idea that /skill <name> is the preferred way to do focused work,
that /skill:<name> (colon form) silently falls through as plain text,
or that the extension bridge leaves the input unlocked while a skill
runs. Without those rules, the agent falls back to inline handling,
re-reads files into context, and occasionally types the colon form
thinking it's the canonical syntax.
To make the main agent use the router correctly without requiring every
user to maintain a hand-written APPEND_SYSTEM.md, the extension
versioned with this package installs a before_agent_start hook
that appends a ## pi-auggie-router block to the system prompt on
every turn. The block content lives in
src/agentPrompt.ts (compiled to
dist/agentPrompt.js) and ships with the package — when you run
npm update pi-auggie-router, the rules update automatically. No
user-side maintenance required.
The injected block teaches the agent:
- When to delegate to a skill vs. doing the work inline
- Correct invocation syntax (/skill <name> <task>, never /skill:<name>)
- Hard rules: don't pre-load files, don't re-execute the sub-agent's work, don't invoke the picker UI, don't invoke the trace commands on the user's behalf
- How to write a good task description (specific file path, desired outcome, constraints; these feed the Actor/Judge brief loop)
- The three bridge limitations to respect (input not locked, Q&A fallback broken, tool traces stripped)
- Failure handling: surface [System Error]: ... lines verbatim, suggest concrete next steps, never silently retry
What the agent sees
The block is injected after Pi's default prompt is fully assembled, so the agent receives it in addition to whatever else Pi has loaded (AGENTS.md, context files, the user's own APPEND_SYSTEM.md, etc.). The block is identical on every turn — there's no per-session diversification, no rotation, no LLM-generated variations.
Versioning and updates
The block is a string constant in the package source. When the rules change:
- The string in src/agentPrompt.ts is updated.
- A new version of pi-auggie-router is published.
- Users run npm update pi-auggie-router (or pi update pi-auggie-router).
- The next agent turn picks up the new rules. No restart, no config edit, no manual file sync.
Source of truth — and how to inspect it
The block is plain text in
src/agentPrompt.ts:
import { AGENT_PROMPT_BLOCK } from "pi-auggie-router";
console.log(AGENT_PROMPT_BLOCK);
Or after install, peek at the compiled file:
cat node_modules/pi-auggie-router/dist/agentPrompt.js | head -80
Disabling the injection
To opt out, set auggieRouter.promptInjection.enabled to false in
.pi/settings.json (workspace) or ~/.pi/settings.json (global):
{
"auggieRouter": {
"promptInjection": {
"enabled": false
}
}
}
With the hook disabled, the agent sees only the default Pi system
prompt. You can still teach it the router's conventions by writing
your own rules into .pi/APPEND_SYSTEM.md (project) or
~/.pi/agent/APPEND_SYSTEM.md (global).
Why not just write a ~/.pi/agent/APPEND_SYSTEM.md?
Two reasons:
- No user maintenance. A hand-written file drifts out of sync with the router's actual behavior as the package is upgraded. A versioned block in the package source never drifts.
- No duplication. Pi auto-loads APPEND_SYSTEM.md files. If we also injected the same content via the hook, the rules would appear twice in the system prompt. The hook is the single injection point.
If you want a visible file you can edit, the block source is
src/agentPrompt.ts — copy from there into
your own APPEND_SYSTEM.md and disable the hook. You now have a
hand-maintained copy that the router respects but does not duplicate.
Context budgets by execution tier
When contextBudgets.enabled is true, the sub-agent's Auggie overflow ceiling
is selected from a per-tier pool instead of the static top-level
overflowCeilingBytes. A low-risk read-only task gets a smaller payload window
than a high-risk architecture refactor. The selected ceiling is sticky for the
whole sub-agent run.
| Setting | Default | Purpose |
|---|---|---|
| contextBudgets.enabled | false | Master switch. When false, the top-level overflowCeilingBytes is used as before. |
| contextBudgets.overflowCeilingBytes.cheap | 15000 | Ceiling for read-only / low-complexity work. |
| contextBudgets.overflowCeilingBytes.balanced | 25000 | Ceiling for scoped edits / medium-complexity work. |
| contextBudgets.overflowCeilingBytes.frontier | 50000 | Ceiling for multi-file / architecture / high-risk work. |
Tier selection rule. When adaptive execution routing produced the model
(selection.source === "execution-routing"), the model's tier drives the
budget — the actual runtime tier after preference + safety floors + pool
fallback. For pinned SKILL.md model: runs or legacy fallback, the Judge's
classification (route.tier) is used as the only meaningful signal; with
adaptive routing disabled this collapses to "balanced" and the budget is
effectively static.
Missing tier fallback. If a tier is omitted from the pool, the router uses
the top-level overflowCeilingBytes instead. A partial pool is intentional —
no implicit backfill from defaults.
When enabled, every run emits a structured log:
{
"event": "auggie-router.context-budget",
"skill": "refactor",
"tier": "cheap",
"overflowCeilingBytes": 15000,
"source": "tier"
}
source is one of "static" (disabled), "tier" (tier hit a configured
value), or "tier-fallback" (tier missing from pool, used top-level ceiling).
No prompts, history, or user content are logged.
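The tier-selection and missing-tier rules reduce to a few branches. A minimal sketch, assuming the selection source string "execution-routing" from the log example; the function and parameter names are hypothetical, while the three source values match the log above:

```typescript
// Hypothetical sketch of per-tier overflow-ceiling selection.
type Tier = "cheap" | "balanced" | "frontier";

interface ContextBudgets {
  enabled: boolean;
  overflowCeilingBytes: Partial<Record<Tier, number>>;
}

interface BudgetDecision {
  tier: Tier;
  overflowCeilingBytes: number;
  source: "static" | "tier" | "tier-fallback";
}

function selectOverflowCeiling(
  budgets: ContextBudgets,
  topLevelCeiling: number,
  selectionSource: string, // e.g. "execution-routing"
  runtimeTier: Tier,       // tier after preference, floors and pool fallback
  judgeTier: Tier,         // route.tier; "balanced" when routing is disabled
): BudgetDecision {
  const tier = selectionSource === "execution-routing" ? runtimeTier : judgeTier;
  if (!budgets.enabled) {
    return { tier, overflowCeilingBytes: topLevelCeiling, source: "static" };
  }
  const configured = budgets.overflowCeilingBytes[tier];
  if (configured === undefined) {
    // Partial pools are intentional: no implicit backfill from defaults.
    return { tier, overflowCeilingBytes: topLevelCeiling, source: "tier-fallback" };
  }
  return { tier, overflowCeilingBytes: configured, source: "tier" };
}
```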
Note: history/routing-prompt budgets are intentionally NOT tier-driven yet — history must be assembled before the Judge knows the tier. See the next section for the (separate) history-assembly knob.
Chat-history assembly
The Actor/Judge loop pulls recent messages via host.getRecentMessages(historyWindow).
With long sessions, the earliest goal-setting messages can fall out of the
window — increasing historyWindow solves that but bloats every routing call.
historyAssembly provides an explicit reducer between getRecentMessages and
brief construction. Two strategies:
| Strategy | Behaviour |
|---|---|
| recent (default) | Pass the host-provided window through unchanged. Legacy behaviour. |
| headTail | Keep the first headMessages and last tailMessages of the window. Drop the middle or replace it with an explicit marker. Apply per-message and total char caps. |
| Setting | Default | Purpose |
|---|---|---|
| historyAssembly.strategy | "recent" | "recent" or "headTail". |
| historyAssembly.headMessages | 2 | Leading messages preserved (only used by headTail). |
| historyAssembly.tailMessages | 12 | Trailing messages preserved (only used by headTail). |
| historyAssembly.middleMode | "marker" | "marker" inserts a [history-omitted-middle: N message(s), ~M chars] system message; "omit" drops the middle silently. |
| historyAssembly.maxCharsPerMessage | 10000 | Per-message char cap. 0 disables. |
| historyAssembly.maxTotalChars | 60000 | Total assembled char cap. 0 disables. |
Total-cap eviction order when content exceeds maxTotalChars:
- Any [history-omitted-middle: …] marker is dropped first; it's already a placeholder for absent content, so losing it costs nothing real.
- Interior messages are dropped from the geometric middle outwards. The first and last entries are preserved as anchors.
- If only the two anchors remain and the total still exceeds the cap, the last anchor's content is truncated to fit (including the truncation marker) within the cap.
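The headTail strategy and the eviction order above can be sketched as follows. The message shape, option object, and the " …[truncated]" marker text are assumptions; only the [history-omitted-middle: …] format is taken from the settings table:

```typescript
// Hypothetical sketch of headTail assembly with the documented eviction order.
interface Msg {
  role: "user" | "assistant" | "system";
  content: string;
}

interface HeadTailOptions {
  headMessages: number;
  tailMessages: number;
  middleMode: "marker" | "omit";
  maxCharsPerMessage: number; // 0 disables
  maxTotalChars: number;      // 0 disables
}

const TRUNC = " …[truncated]"; // assumption: real marker text may differ
const MARKER_PREFIX = "[history-omitted-middle:";

// Truncate to at most `max` chars, counting the marker against the budget.
function truncateTo(s: string, max: number): string {
  if (s.length <= max) return s;
  return (s.slice(0, Math.max(0, max - TRUNC.length)) + TRUNC).slice(0, max);
}

const totalChars = (msgs: Msg[]) => msgs.reduce((n, m) => n + m.content.length, 0);

function assembleHeadTail(recent: Msg[], o: HeadTailOptions): Msg[] {
  let out: Msg[];
  if (recent.length <= o.headMessages + o.tailMessages) {
    out = [...recent];
  } else {
    const head = recent.slice(0, o.headMessages);
    const middle = recent.slice(o.headMessages, recent.length - o.tailMessages);
    const tail = recent.slice(recent.length - o.tailMessages); // avoids the slice(-0) pitfall
    const marker: Msg = {
      role: "system",
      content: `${MARKER_PREFIX} ${middle.length} message(s), ~${totalChars(middle)} chars]`,
    };
    out = o.middleMode === "marker" ? [...head, marker, ...tail] : [...head, ...tail];
  }

  if (o.maxCharsPerMessage > 0) {
    out = out.map((m) => ({ ...m, content: truncateTo(m.content, o.maxCharsPerMessage) }));
  }
  if (o.maxTotalChars <= 0) return out;

  // 1. The omitted-middle marker goes first.
  if (totalChars(out) > o.maxTotalChars) {
    out = out.filter((m) => !(m.role === "system" && m.content.startsWith(MARKER_PREFIX)));
  }
  // 2. Drop interior messages from the geometric middle outwards; keep both anchors.
  while (totalChars(out) > o.maxTotalChars && out.length > 2) {
    out.splice(Math.floor(out.length / 2), 1);
  }
  // 3. Only anchors left and still over budget: truncate the last anchor to fit.
  if (totalChars(out) > o.maxTotalChars && out.length > 0) {
    const last = out[out.length - 1];
    const room = Math.max(0, o.maxTotalChars - (totalChars(out) - last.content.length));
    out[out.length - 1] = { ...last, content: truncateTo(last.content, room) };
  }
  return out;
}
```

Removing index floor(n/2) repeatedly always hits an interior entry while n > 2, so the first and last messages survive until step 3.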
Host limitation. The current PiHost API only exposes
getRecentMessages(N). "Head" therefore means the earliest entries inside
that window — not necessarily the true start of the session. A future host
API could surface session-start messages directly.
The router still applies the existing 10 000-char-per-message safety
truncation inside buildActorMessages / buildJudgeMessages. If the
assembler's maxCharsPerMessage already cut content, that pass is a no-op.
Prompt-prefix cache stability
The sub-agent system prompt is built deterministically from the skill instructions and an optional appendix only — no dynamic data (selected model, execution route, brief, user goal) enters the prefix. This invariant maximizes provider prompt-cache hit rate across repeated invocations of the same skill.
Use buildSubAgentSystemPrompt({ skillInstructions, appendix }) from the public
API to compute the same prefix in tests or tooling.
For regression detection, enable hash-only debug logging:
{
"auggieRouter": {
"debugPromptPrefixHash": true
}
}
When enabled, every sub-agent run emits:
{
"event": "auggie-router.prompt-prefix",
"skill": "refactor",
"sha256": "…64 hex chars…",
"bytes": 12345
}
Only the SHA-256 hash and byte length are logged — never the prompt text. A changing hash across runs of the same skill (with identical appendix) signals an accidental cache-busting regression.
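The hash-only event can be reproduced with node:crypto to compare prefixes across runs. buildPrefixSketch is a hypothetical stand-in whose join format is assumed; use the package's buildSubAgentSystemPrompt to get the real prefix:

```typescript
import { createHash } from "node:crypto";
import { Buffer } from "node:buffer";

// Hypothetical stand-in for buildSubAgentSystemPrompt: static inputs only
// (no model, route, brief or user goal). The "\n\n" join is an assumption.
function buildPrefixSketch(skillInstructions: string, appendix?: string): string {
  return appendix ? `${skillInstructions}\n\n${appendix}` : skillInstructions;
}

interface PrefixEvent {
  event: "auggie-router.prompt-prefix";
  skill: string;
  sha256: string;
  bytes: number;
}

// Log only the digest and byte length, never the prompt text.
function prefixEvent(skill: string, prefix: string): PrefixEvent {
  return {
    event: "auggie-router.prompt-prefix",
    skill,
    sha256: createHash("sha256").update(prefix, "utf8").digest("hex"),
    bytes: Buffer.byteLength(prefix, "utf8"),
  };
}
```

Because the prefix depends only on static inputs, two runs of the same skill with the same appendix should yield identical digests; a mismatch points at dynamic data leaking into the prefix.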
Overflow context memory
By default, oversized Auggie codebase-retrieval payloads are blocked and the
sub-agent is told to refine its query. When contextMemory.enabled is true,
those oversized payloads are instead stored in an execution-scoped temp store
and the replacement message includes:
- an overflow handle such as overflow_1,
- the original byte size,
- a bounded head/tail preview.
During the same sub-agent run, the router attaches a small context-memory MCP
server with two tools:
| Tool | Purpose |
|---|---|
| context-memory.list | List stored overflow entries by metadata only. |
| context-memory.read | Read a bounded character slice for a known overflow handle. |
The MCP read surface is intentionally narrow: context-memory.read caps each
slice at 32 000 characters and accepts only generated handles matching
overflow_<n>. The temp store is disposed after the sub-agent resolves or
rejects, so there is no cross-run memory.
| Setting | Default | Purpose |
|---|---|---|
| contextMemory.enabled | false | Master switch. When false, legacy overflow replacement text is used. |
| contextMemory.maxEntries | 8 | Maximum stored overflow payloads per skill execution. |
| contextMemory.maxBytesPerRun | 1000000 | Cumulative byte cap per skill execution. |
| contextMemory.previewHeadChars | 4000 | Characters from the beginning included in the replacement preview. |
| contextMemory.previewTailChars | 4000 | Characters from the end included in the replacement preview. |
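An execution-scoped store with the limits above might look like this. The class and method names, the replacement-text format, and falling back to the legacy message once maxEntries or maxBytesPerRun is exhausted are all assumptions of the sketch; the 32 000-character slice cap and the overflow_<n> handle format come from the text above:

```typescript
import { Buffer } from "node:buffer";

// Hypothetical sketch of the per-run overflow store behind context-memory.*.
const READ_SLICE_CAP = 32_000;
const HANDLE_RE = /^overflow_\d+$/;
const LEGACY_MESSAGE = "Result too large. Please refine your codebase-retrieval query to be more specific.";

interface OverflowLimits {
  maxEntries: number;
  maxBytesPerRun: number;
  previewHeadChars: number;
  previewTailChars: number;
}

class OverflowStoreSketch {
  private readonly entries = new Map<string, { bytes: number; payload: string }>();
  private readonly limits: OverflowLimits;
  private usedBytes = 0;
  private counter = 0;

  constructor(limits: OverflowLimits) {
    this.limits = limits;
  }

  // Returns the replacement text the sub-agent sees instead of the payload.
  store(payload: string): string {
    const bytes = Buffer.byteLength(payload, "utf8");
    if (this.entries.size >= this.limits.maxEntries || this.usedBytes + bytes > this.limits.maxBytesPerRun) {
      return LEGACY_MESSAGE; // assumption: limits exhausted, behave like the legacy path
    }
    const handle = `overflow_${++this.counter}`;
    this.entries.set(handle, { bytes, payload });
    this.usedBytes += bytes;
    const { previewHeadChars: head, previewTailChars: tail } = this.limits;
    const preview =
      payload.length <= head + tail
        ? payload
        : `${payload.slice(0, head)}\n…\n${payload.slice(payload.length - tail)}`;
    return `[${handle}: ${bytes} bytes stored; read with context-memory.read]\n${preview}`;
  }

  // context-memory.list: metadata only, never payload text.
  list(): { handle: string; bytes: number }[] {
    return [...this.entries].map(([handle, e]) => ({ handle, bytes: e.bytes }));
  }

  // context-memory.read: bounded slice for a generated handle only.
  read(handle: string, offset: number, length: number): string {
    if (!HANDLE_RE.test(handle)) throw new Error("Invalid overflow handle.");
    const entry = this.entries.get(handle);
    if (!entry) throw new Error("Unknown overflow handle.");
    const start = Math.max(0, offset);
    return entry.payload.slice(start, start + Math.min(Math.max(0, length), READ_SLICE_CAP));
  }

  // Called when the sub-agent resolves or rejects: no cross-run memory.
  dispose(): void {
    this.entries.clear();
    this.usedBytes = 0;
  }
}
```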
Parallel sub-agent runner API
runParallelSubagents(...) is exported for hosts or advanced integrations that
want to split an already-known task into explicit independent subtasks. The
main /skill router does not automatically decompose user requests.
The feature is disabled by default and refuses to run unless
parallelSubagents.enabled=true. Each worker sub-agent receives the same stable
skill system prompt plus one compact subtask brief in its user prompt, its own
Auggie MCP stack, and isolated context-memory plumbing when enabled. Worker
outputs are capped before deterministic synthesis, so no extra LLM call is
needed to combine results.
| Setting | Default | Purpose |
|---|---|---|
| parallelSubagents.enabled | false | Master switch for the explicit runner API. |
| parallelSubagents.maxSubagents | 3 | Maximum concurrent workers allowed by settings. Caller overrides are bounded by this cap. |
| parallelSubagents.perWorkerOutputCharCap | 8000 | Default cap for each worker's final text. 0 disables the cap. |
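The bounded-concurrency and deterministic-synthesis ideas can be shown without any Pi machinery. runBoundedSketch, its options object, and the "## Subtask N" synthesis format are hypothetical; the real runParallelSubagents signature may differ:

```typescript
// Hypothetical sketch: bounded worker lanes, per-worker caps, no-LLM synthesis.
type Worker = (subtaskBrief: string) => Promise<string>;

interface ParallelOptions {
  enabled: boolean;
  maxSubagents: number;
  perWorkerOutputCharCap: number; // 0 disables the cap
  requestedConcurrency?: number;  // caller override, bounded by maxSubagents
}

async function runBoundedSketch(subtasks: string[], worker: Worker, opts: ParallelOptions): Promise<string> {
  if (!opts.enabled) throw new Error("parallelSubagents.enabled is false");
  const limit = Math.max(1, Math.min(opts.requestedConcurrency ?? opts.maxSubagents, opts.maxSubagents));
  const results: string[] = new Array(subtasks.length);
  let next = 0;

  // Each lane claims the next unclaimed subtask until none remain.
  async function lane(): Promise<void> {
    while (next < subtasks.length) {
      const i = next++;
      const text = await worker(subtasks[i]);
      const cap = opts.perWorkerOutputCharCap;
      results[i] = cap > 0 ? text.slice(0, cap) : text;
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, subtasks.length) }, () => lane()));

  // Deterministic synthesis: input order, fixed headings, no extra model call.
  return results.map((r, i) => `## Subtask ${i + 1}\n${r}`).join("\n\n");
}
```

Writing results by index rather than completion order keeps the synthesized output stable even when workers finish in a different order on each run.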
Execution trace persistence
When executionTrace.enabled is true (the default), the router captures a
full transcript of every sub-agent execution and persists it to
.pi/traces/<skillName>_<timestamp>.json. Each trace contains:
- Skill metadata: name, model, brief, execution route
- Tool calls: server name, tool name, args, result preview (capped at maxResultPreviewChars), blocked flag, timestamp
- Outcome: final text and stopped reason
This data powers trace observability for skill debugging (see
docs/PRD-trace-observability.md):
classifying trace outcomes, detecting skill degradation, and surfacing
actionable reports to users.
| Setting | Default | Purpose |
|---|---|---|
| executionTrace.enabled | true | Master switch for trace capture and persistence. |
| executionTrace.maxResultPreviewChars | 2000 | Max characters kept from each tool result in the trace. Full payloads are not stored (could be megabytes of codebase content). |
| executionTrace.traceDirectory | ".pi/traces" | Directory (relative to workspace root) where trace JSON files are stored. |
Old traces are automatically cleaned up after each run: count-based retention
keeps at most maxTracesPerSkill (default 20) trace files per skill, deleting
the oldest first. Use cleanupTraces(dir, opts) from the public API to trigger
cleanup manually.
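Count-based per-skill retention is easy to express as a pure function over file names. This sketch assumes a numeric (for example epoch-millisecond) timestamp in <skillName>_<timestamp>.json; the real format may differ, and tracesToDelete is a hypothetical helper, not the package's cleanupTraces:

```typescript
// Hypothetical sketch: decide which trace files to delete, newest N kept per skill.
const TRACE_FILE = /^(.+)_(\d+)\.json$/; // assumption: numeric timestamp suffix

function tracesToDelete(fileNames: string[], maxTracesPerSkill: number): string[] {
  const bySkill = new Map<string, { file: string; ts: number }[]>();
  for (const file of fileNames) {
    const match = TRACE_FILE.exec(file);
    if (!match) continue; // ignore unrelated files
    const [, skill, ts] = match; // greedy (.+) keeps underscores inside skill names
    const list = bySkill.get(skill) ?? [];
    list.push({ file, ts: Number(ts) });
    bySkill.set(skill, list);
  }
  const doomed: string[] = [];
  for (const list of bySkill.values()) {
    list.sort((a, b) => b.ts - a.ts); // newest first
    doomed.push(...list.slice(maxTracesPerSkill).map((e) => e.file));
  }
  return doomed;
}
```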
Trace Observability
When traceObservability.enabled is true (the default), the router adds a
lightweight observability layer on top of execution traces:
classifying outcomes, detecting degradation, and surfacing actionable reports.
Commands
| Command | Purpose |
|---|---|
| /skill:trace-report <name> | Show a summary report for a skill's recent execution history: outcome distribution, common failure signals, trend line, and recent traces. |
| /skill:trace-view <filename> | Show a detailed tool-call timeline for a single trace file: timestamps, args, result sizes, duration, and final text. |
These commands are intercepted before Pi's default /skill handler runs.
Settings
| Setting | Default | Purpose |
|---|---|---|
| traceObservability.enabled | true | Master switch for classification, degradation alerts, and reports. |
| traceObservability.showReportAfterExecution | false | Show a compact mini-report (last 3 traces) after every skill execution. |
| traceObservability.degradationAlertEnabled | true | Emit a system message when a skill fails N consecutive times after prior success. |
| traceObservability.degradationConsecutiveFailures | 3 | Consecutive failures required to trigger a degradation alert. |
| traceObservability.degradationAlertCooldownHours | 24 | Minimum hours between repeated alerts for the same skill. |
| traceObservability.reportMaxTraces | 10 | Maximum traces loaded and classified for on-demand reports. |
| traceObservability.reportMaxInlineTraces | 5 | Maximum recent traces shown inline; larger datasets truncate with a count. |
| traceObservability.regressionWindowSize | 10 | Number of historical traces examined for regression detection. |
| traceObservability.maxTracesPerSkill | 20 | Maximum trace files retained per skill (count-based cleanup, oldest deleted first). |
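The degradation rule (N consecutive failures after a prior success, with a cooldown) can be sketched as below. The two-value Outcome type and the mutable AlertState object are simplifications invented for the sketch; the router's own outcome classification may be richer:

```typescript
// Hypothetical sketch of the degradation-alert decision for one skill.
type Outcome = "success" | "failure";

interface AlertState {
  lastAlertAt?: number; // epoch ms of the last alert for this skill
}

function shouldAlert(
  outcomes: Outcome[], // oldest → newest
  consecutiveFailures: number,
  cooldownHours: number,
  now: number,
  state: AlertState,
): boolean {
  const n = consecutiveFailures;
  if (n <= 0 || outcomes.length <= n) return false;
  if (!outcomes.slice(-n).every((o) => o === "failure")) return false;
  if (!outcomes.slice(0, -n).includes("success")) return false; // "after prior success"
  if (state.lastAlertAt !== undefined && now - state.lastAlertAt < cooldownHours * 3_600_000) return false;
  state.lastAlertAt = now;
  return true;
}
```

Requiring an earlier success distinguishes a regression from a skill that never worked, which a report rather than an alert should surface.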
Skip-Judge mode
Setting maxJudgeIterations to 0 skips the Judge entirely. The Actor
produces a brief, the rubric auto-passes, and the default execution route
is used. This eliminates the verification overhead for simple or
well-known skills:
{
"auggieRouter": {
"maxJudgeIterations": 0
}
}
Execution flow
- Intercept: onUserInput matches /skill:trace-view, /skill:trace-report, then ^/skill:([a-zA-Z0-9_-]+), swallows the input, and prevents Pi's default skill handler from running.
- Locate & parse: looks for SKILL.md in .pi/skills/<name>/ first, then ~/.pi/agent/skills/<name>/. Frontmatter is parsed with gray-matter; only model: is honoured.
- 2-pass Actor/Judge loop: drafts a {userGoal, constraints, knownContext} brief, scores it against a binary rubric, and rewrites once if any boolean is false. Hard cap = 2 passes. When maxJudgeIterations=0, the Judge is skipped entirely (Actor only, auto-pass).
- Q&A fallback: if the second pass still fails, the Judge's missingRequirementQuestion is posted to the user. The next typed message is intercepted via onBeforeMessage, appended to the brief as a clarification, and execution resumes.
- Auggie pre-flight: auggie account status is spawned silently. Any non-zero exit aborts with [System Error]: Cannot execute skill. Augment daemon is offline or unauthenticated.
- Execution model selection: the router computes the sub-agent model. When adaptive routing is disabled (default), this is the legacy mapModel(skill.rawModel, ...) path. When enabled, the Judge's executionRoute is combined with preference, safety floors, and the configured pool to select exactly one model. The selection is sticky for the entire run and never injected into the sub-agent prompt.
- Context budget selection: if contextBudgets.enabled=true, the router chooses a per-tier overflow ceiling for the run. Otherwise it uses the static top-level overflowCeilingBytes.
- Sub-agent execution: the input editor is locked, a [System]: ⚙️ Executing … marker is posted, and an isolated Pi sub-agent runs at temperature: 0.0 with the auggie MCP attached over stdio. If contextMemory.enabled=true, the execution-scoped context-memory MCP is attached too. If executionTrace.enabled=true, a trace middleware captures every tool call. The sub-agent's prompt is appended with the AUGGIE_DIRECTIVE: "Use the codebase-retrieval MCP tool for workspace context." Structured route, context-budget, and optional prompt-prefix logs are emitted at this point.
- Overflow middleware: every oversized auggie/codebase-retrieval response is blocked. With context memory disabled, it is replaced with "Result too large. Please refine your codebase-retrieval query to be more specific." With context memory enabled, the payload is stored execution-locally and the replacement includes an overflow handle plus a bounded preview.
- Resolution: the final sub-agent text is sanitized according to outputSanitizer and posted to the main thread, the editor is unlocked, and the state machine resets to idle.
- Trace persistence: if executionTrace.enabled, the trace store is finalized with the sub-agent's output and persisted to .pi/traces/. Old traces are cleaned up by count-based per-skill retention (keep the newest maxTracesPerSkill, default 20; oldest deleted first). A structured auggie-router.execution-trace log event is emitted.
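The Actor/Judge step of the flow above can be sketched with injected callbacks standing in for the routing-model calls made through host.callLLM. The Brief fields come from the text; the LoopDeps shape, the rubric-as-boolean-map representation, and how maxJudgeIterations=1 behaves are assumptions:

```typescript
// Hypothetical sketch of the 2-pass Actor/Judge loop with skip-Judge mode.
interface Brief {
  userGoal: string;
  constraints: string[];
  knownContext: string[];
}
type Rubric = Record<string, boolean>;

interface LoopDeps {
  actor(feedback?: Rubric): Promise<Brief>;
  judge(brief: Brief): Promise<{ rubric: Rubric; missingRequirementQuestion?: string }>;
}

type LoopResult =
  | { status: "pass"; brief: Brief }
  | { status: "needsUser"; brief: Brief; question: string };

async function actorJudgeLoop(deps: LoopDeps, maxJudgeIterations: number): Promise<LoopResult> {
  let brief = await deps.actor();
  if (maxJudgeIterations <= 0) return { status: "pass", brief }; // skip-Judge: auto-pass
  const passes = Math.min(maxJudgeIterations, 2); // hard cap = 2 passes
  let question = "";
  for (let pass = 1; pass <= passes; pass++) {
    const verdict = await deps.judge(brief);
    if (Object.values(verdict.rubric).every(Boolean)) return { status: "pass", brief };
    question = verdict.missingRequirementQuestion ?? "";
    if (pass < passes) brief = await deps.actor(verdict.rubric); // rewrite once
  }
  return { status: "needsUser", brief, question }; // hand off to the Q&A fallback
}
```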
State machine
idle ──/skill:──▶ evaluating ──pass──▶ executing ──done──▶ idle
│
└─fail×2─▶ waitingForUser ──answer──▶ executing
Only one skill can be in flight at a time. New /skill: commands while busy
get a [System]: Router busy warning.
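The diagram translates directly into a transition table. A minimal sketch, assuming the event names, the exact busy-message text, and that a /skill: command arriving while busy is swallowed rather than passed to Pi; the trace commands, which are matched before the generic pattern, are left out:

```typescript
// Hypothetical sketch of the single-flight router state machine.
type RouterState = "idle" | "evaluating" | "executing" | "waitingForUser";
type RouterEvent = "skill" | "pass" | "failTwice" | "answer" | "done";

const TRANSITIONS: Record<RouterState, Partial<Record<RouterEvent, RouterState>>> = {
  idle: { skill: "evaluating" },
  evaluating: { pass: "executing", failTwice: "waitingForUser" },
  waitingForUser: { answer: "executing" },
  executing: { done: "idle" },
};

const SKILL_COMMAND = /^\/skill:([a-zA-Z0-9_-]+)/;

class RouterStateSketch {
  state: RouterState = "idle";
  activeSkill: string | undefined;
  readonly systemMessages: string[] = [];

  // Mirrors onUserInput: returns { cancel: true } to swallow the input.
  onUserInput(text: string): { cancel: boolean } {
    const match = SKILL_COMMAND.exec(text.trim());
    if (!match) return { cancel: false }; // not a skill command: let Pi handle it
    if (this.state !== "idle") {
      this.systemMessages.push("[System]: Router busy");
      return { cancel: true };
    }
    this.activeSkill = match[1];
    this.dispatch("skill");
    return { cancel: true };
  }

  dispatch(event: RouterEvent): void {
    const next = TRANSITIONS[this.state][event];
    if (next === undefined) throw new Error(`Illegal transition: ${this.state} + ${event}`);
    this.state = next;
    if (next === "idle") this.activeSkill = undefined;
  }
}
```

Throwing on any transition missing from the table makes illegal sequences (for example "done" while still evaluating) fail loudly instead of leaving the router stuck.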
Operational defaults
| Knob | Default | Why |
|---|---|---|
| Routing engine | anthropic/claude-3-5-haiku | Cheap and Anthropic-aligned for routing. |
| History window | 20 messages | Enough for context, not enough to drown the brief. |
| Total timeout | 300 s | Hard kill prevents runaway billing. |
| MCP inactivity timeout | 60 s | Stops OpenRouter loops when a model hangs. |
| Sub-agent temperature | 0.0 | Mandatory for rigid tool usage. |
| Overflow ceiling | 25 000 B | Forces query refinement, not context dumping. |
| Auggie binary path | "auggie" | Relies on $PATH by default; override for security. |
| Allowed provider prefixes | [] (allow all) | Restrict to known providers to prevent model redirection. |
| Adaptive routing | disabled | Backwards-compatible opt-in. |
| Adaptive preference | balanced | Neutral cost-quality bias. |
| Skill model policy | pin | Preserve existing SKILL.md model: behavior. |
| Surface routing decision | false | Keep default UI minimal. |
| Output sanitizer | enabled | Keeps tool traces out of the main chat. |
| Context budgets | disabled | Static overflow ceiling unless explicitly enabled. |
| History assembly | recent | Preserve legacy history behavior by default. |
| Context memory | disabled | Legacy overflow replacement unless opted in. |
| Parallel sub-agents | disabled | Explicit advanced API only. |
Security model
Trust boundary: workspace filesystem
pi-auggie-router loads SKILL.md files from the workspace (.pi/skills/)
and the user's home directory (~/.pi/agent/skills/). The markdown body of
a skill file is injected verbatim into LLM prompts (routing model and
sub-agent). This means:
- Any process that can write to .pi/skills/*/SKILL.md effectively controls the sub-agent's system prompt (prompt injection via filesystem).
- A malicious model: value in SKILL.md frontmatter can redirect execution to a different provider. Use allowedProviderPrefixes to restrict this.
- Do not commit SKILL.md files from untrusted sources without review.
Data sent to LLM providers
The routing model (claude-3-5-haiku by default) sees:
- The skill's markdown instructions.
- The last historyWindow chat messages (truncated to 10 000 chars each).
- Actor/Judge JSON payloads.
- Judge routing metadata requests/outputs, including executionRoute (tier, complexity, risk, confidence, reason).
The sub-agent does not receive route metadata in its system or user prompt;
route decisions are surfaced only through host system messages and structured
logs. If your chat may contain secrets, point routingModel at a self-hosted
gateway or reduce historyWindow.
Path resolution
Skill names are validated against [a-zA-Z0-9_-]+ — no dots, slashes, or
path traversal sequences. Error messages omit filesystem paths to prevent
information leakage.
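Name validation and the lookup order from the execution flow fit in a few lines. The real router resolves paths through host.resolveWorkspacePath and host.resolveHomePath; this sketch uses node:path and node:os for illustration, and skillCandidatePaths is a hypothetical helper:

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical sketch: validate the skill name, then list lookup paths in order.
const SKILL_NAME = /^[a-zA-Z0-9_-]+$/;

function skillCandidatePaths(name: string, workspaceRoot: string, homeDir: string = os.homedir()): string[] {
  if (!SKILL_NAME.test(name)) {
    // Deliberately generic: no filesystem path in the message.
    throw new Error("Invalid skill name.");
  }
  return [
    path.join(workspaceRoot, ".pi", "skills", name, "SKILL.md"),    // workspace first
    path.join(homeDir, ".pi", "agent", "skills", name, "SKILL.md"), // then home
  ];
}
```

Because the pattern is anchored at both ends and excludes dots and slashes, names such as "../etc" are rejected before any path is built.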
Sub-process spawning
The router spawns the auggie binary for pre-flight checks and as an MCP
server. By default it relies on $PATH lookup; set auggieBinPath to an
absolute path to eliminate this attack surface. stderr from auggie account status
is redacted for common secret patterns (API keys, Bearer tokens, hex strings)
before being surfaced in the UI.
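A pre-flight check of this kind can be written with spawnSync and no shell, so the binary path is never interpreted by a shell. The redaction patterns and the 15 s timeout below are illustrative assumptions, not the package's values:

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical secret patterns; the package's actual list is not documented here.
const SECRET_PATTERNS: RegExp[] = [
  /Bearer\s+[A-Za-z0-9._~+\/-]+=*/g,             // Authorization bearer tokens
  /\b(?:sk|pk|api|key)[-_][A-Za-z0-9]{16,}\b/gi, // prefixed API keys
  /\b[0-9a-f]{32,}\b/gi,                         // long hex strings
];

function redact(text: string): string {
  return SECRET_PATTERNS.reduce((acc, re) => acc.replace(re, "[REDACTED]"), text);
}

// Spawn `<binPath> account status` directly (no shell); only exit code 0 passes.
function preflight(binPath = "auggie", args: string[] = ["account", "status"]): { ok: boolean; stderr: string } {
  const result = spawnSync(binPath, args, { encoding: "utf8", shell: false, timeout: 15_000 });
  const ok = result.error === undefined && result.status === 0;
  return { ok, stderr: redact(result.stderr ?? "") };
}
```

Passing an absolute auggieBinPath as binPath removes the $PATH lookup entirely; a missing binary surfaces as result.error and fails the check.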
Development
npm install
npm run build # compile to dist/
npm run lint # tsc --noEmit
npm test # node --test via tsx loader
License
MIT