@callumvass/forgeflow-dev
Dev pipeline for Pi — TDD implementation, code review, architecture, and Datadog investigations.
Package details
Install @callumvass/forgeflow-dev from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:@callumvass/forgeflow-dev- Package
@callumvass/forgeflow-dev- Version
0.36.0- Published
- Apr 12, 2026
- Downloads
- 7,261/mo · 128/wk
- Author
- callumvass
- License
- MIT
- Types
- extension, skill
- Size
- 1.1 MB
- Dependencies
- 0 dependencies · 5 peers
Pi manifest JSON
{
"extensions": [
"./extensions"
],
"skills": [
"./skills"
],
"agents": [
"./agents"
]
}Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
@callumvass/forgeflow-dev
Development commands for Pi. Use this package when you already have implementation-ready issues and want forgeflow to plan, build, review, and open PRs.
Install
npx pi install @callumvass/forgeflow-dev
Start here
Implement one issue
/implement 42
Implement all auto-generated issues
/implement-all
This works best after the PM package has already produced good greenfield issues with a clear technical direction — usually one explicit initial scaffold/bootstrap issue followed by dependent feature slices.
Commands
| Command | What it does |
|---|---|
/implement |
Implement one issue using plan → TDD → refactor → review |
/implement-last |
Repeat the last /implement target and flags |
/implement-all |
Loop through all open auto-generated and architecture issues |
/review |
Review a PR or branch: blocking defects plus advisory architecture and refactor notes |
/review-last |
Repeat the last /review target and strictness |
/review-current-pr |
Review the current PR without typing its number |
/review-lite |
Review a PR or branch in strict mode: blocking defects only |
/architecture |
Analyse the repo for structural friction and create RFC issues |
/skill-scan |
Scan common skill locations and explain which installed skills forgeflow would recommend |
/skill-recommend |
Recommend missing skills from skills.sh that fit the current repo and stage |
/atlassian-login |
Authenticate to an OAuth-enabled Atlassian MCP server |
/atlassian-status |
Show Atlassian MCP auth status |
/atlassian-logout |
Remove stored Atlassian MCP credentials |
/atlassian-read |
Read a Jira issue or Confluence page by URL |
/datadog-login |
Authenticate to an OAuth-enabled Datadog MCP server |
/datadog-status |
Show Datadog MCP auth status |
/datadog-logout |
Remove stored Datadog MCP credentials |
/datadog |
Resolve a Lambda from repo code and run a Datadog MCP investigation |
What the dev pipeline expects
Forgeflow-dev assumes the issue already names the intended user-visible slice and any project-shaping stack decisions.
On greenfield projects, that usually means the issue already tells the agent things like:
- framework/runtime
- testing baseline
- auth/provider choice if relevant
- persistence approach if relevant
The planner and implementor now treat those choices as binding rather than improvising a different stack.
Pi UX improvements
The Pi extension now exposes a more guided workflow around the core dev pipelines.
Highlights:
/implement,/review, and/review-liteopen guided pickers when you run them without args/implement-last,/review-last, and/review-current-prcover the common repeat flows- live runs show richer status, footer, elapsed time, stage descriptions, and cost
/stagesandCtrl+Shift+Sopen a drill-down view of the latest pipeline- completed
/implementand/reviewruns offer follow-up actions such as opening the PR, copying output into the editor, or queueing the next review command
These UX helpers do not change /implement-all behaviour. It still runs autonomously once started.
/implement in plain English
/implement runs a build chain, then a review chain.
Build chain
- planner — explores the codebase and writes a sequenced test plan
- architecture-reviewer — checks for obvious structural drift
- implementor — implements with strict TDD
- refactorer — cleans up once behaviour is in place
Review chain
- code-reviewer — cold-start review of the diff
- review-judge — validates findings
- fix-findings — fixes accepted findings when needed
At the end, forgeflow opens or updates a PR for human review.
/review in plain English
Standalone /review is broader than the review chain inside /implement.
- code-reviewer — cold-start defect review of the diff
- review-judge — validates blocking findings
- architecture-reviewer — advisory architecture and boundary pass on the changed area
- refactor-reviewer — advisory refactor pass on the changed area
If you are reviewing a GitHub PR in the UI, forgeflow can draft gh review comments for the blocking findings, then let you edit, approve, or skip them before anything is posted.
If you want the old narrow behaviour, use /review-lite or /review --strict. That runs only the blocking review chain (code-reviewer → review-judge) and skips the advisory architecture/refactor passes.
/implement-all in plain English
/implement-all loops through all open auto-generated and architecture issues in dependency order.
On greenfield repos, that usually means it should take an explicit scaffold/bootstrap issue first, then move onto the dependent feature slices.
For each issue it:
- plans
- implements
- reviews
- opens/updates a PR
- waits for CI
- fixes failed CI up to three times
- merges if successful
Boundary-aware implementation
Forgeflow prefers feature-oriented structure over flat-root sprawl. Each issue should name one owning boundary, and the planner/implementor treat that as a gate.
That means:
- prefer extending an existing feature/domain folder
- create a small public entry point when a new boundary is needed (
index.ts,__init__.py,routes.rb, or equivalent) - treat generic roots such as
src/,app/,server/,client/,test/, andtests/as roots, not owning boundaries - avoid
utils/,helpers/,misc/, andlib/junk drawers - block if one issue really spans multiple owning boundaries
On greenfield repos, the first slice should establish a real feature/domain boundary beneath the broad source root so /implement-all does not normalise flat sibling files as the project grows.
Library and framework behaviour
On greenfield work, forgeflow should not invent bespoke framework/auth/test plumbing when the issue already chose a direction.
Examples:
- if the issue says Vue/Nuxt, do not hand-build DOM strings
- if the issue says Phoenix LiveView, do not switch to a JS-heavy UI stack
- if the issue says Clerk or ASP.NET Core Identity, do not replace it with custom auth code
If a necessary project-shaping choice is missing and the repo does not already establish one, the implementor should block instead of making an arbitrary decision.
Auto skill loading
Forgeflow now shortlists relevant skills before /implement, /implement-all, /review, /review-lite, and /architecture.
It reads skills in place from common agent locations rather than asking teams to migrate them into a forgeflow-specific directory.
Discovery covers common local/global roots such as:
.agents/skills/.pi/skills/.claude/skills/.copilot/skills/.codex/skills/.opencode/skills/- package-local
pi.skillsroots declared in workspacepackage.jsonfiles (for examplepackages/dev/skills/) ~/.agents/skills/~/.pi/agent/skills/~/.claude/skills/~/.copilot/skills/~/.codex/skills/~/.opencode/skills/
Selection is repo-aware. Forgeflow looks at things like:
- workspace/package dependencies in
package.json - .NET solutions and project files (
.sln,*.csproj) mix.exs,pyproject.toml,go.mod,Cargo.toml- changed files for review runs
Use /skill-scan to inspect what was discovered and which skills forgeflow would use at each stage. The default output is now concise and human-readable; add --verbose when you want scanned roots, signals, collisions, and other diagnostics. After the deterministic scan, forgeflow runs a strict skill-judge pass that rejects weak matches driven only by broad stack overlap.
Use /skill-recommend to look beyond the repo and query skills.sh for missing skills worth adding. By default it now considers all supported stages; pass --for <stage> when you want to narrow the recommendation to one stage. The default output is a concise shortlist; add --verbose when you want queries, roots, signals, and diagnostics. Forgeflow generates repo-aware search queries from the detected stack, dedupes the results locally, enriches shortlisted external candidates with descriptions from skills add --list, ranks them against the same repo signals it uses for auto-loading, then lets the same strict skill-judge distil the final list down to genuinely relevant installs. Recommendations include install commands so you can add the skill and let future forgeflow runs auto-load it. The human-readable report now puts top installs first and, if a discovered local skill is malformed, diagnostics include the exact SKILL.md path so you can fix the offending file directly.
Examples:
/skill-scan
/skill-scan --verbose
/skill-recommend
/skill-recommend --for review --branch feat/ui
/skill-recommend --for architecture --path apps/web --limit 5 --verbose
Configuration
Optional config lives in:
- project:
.forgeflow.json - global:
~/.pi/agent/forgeflow.json
Example:
{
"agents": {
"planner": { "thinkingLevel": "high" },
"implementor": { "thinkingLevel": "medium" },
"code-reviewer": { "thinkingLevel": "high" },
"review-judge": { "thinkingLevel": "high" }
},
"skills": {
"enabled": true,
"extraPaths": ["~/company-shared-skills", "./tools/skills"],
"maxSelected": 4
},
"sessions": {
"persist": true,
"archiveRuns": 20,
"archiveMaxAge": 30
}
}
Valid thinkingLevel values: off, minimal, low, medium, high, xhigh.
Set sessions.persist to false for sensitive projects where you do not want sub-agent session files written to disk.
Sessions
Sub-agent stages now run through the Pi SDK rather than by spawning nested Pi processes.
Forgeflow still persists deterministic stage sessions under .forgeflow/run/<runId>/, so runs stay resumable and fork lineage stays inspectable on disk.
Each persisted stage appends a hidden handoff note with its final output plus the main files, searches, edits, and shell commands it used. Downstream forked stages inherit that note and are prompted to use it before re-reading the same files.
Successful runs are archived; failed runs are left in place for inspection.
Atlassian MCP
With Atlassian MCP configured, you can:
- implement a Jira issue directly:
/implement PROJ-123 - read a Jira issue or Confluence page:
/atlassian-read <url> - check auth state:
/atlassian-status - clear stored auth:
/atlassian-logout
Environment variables:
export ATLASSIAN_MCP_URL=https://your-atlassian-mcp.example.com/mcp
export ATLASSIAN_MCP_REDIRECT_URI=http://127.0.0.1:33389/callback
# Optional when your MCP server requires a pre-registered OAuth client
export ATLASSIAN_MCP_CLIENT_ID=...
export ATLASSIAN_MCP_CLIENT_SECRET=...
# Optional OAuth scope override
export ATLASSIAN_MCP_SCOPE="read:jira-work read:confluence-content.all"
# Optional site hint for Jira URL generation, multi-site setups,
# and MCP servers that expose separate Jira/Confluence cloud resources for one site
export ATLASSIAN_URL=https://yourcompany.atlassian.net
Then run:
/atlassian-login
/atlassian-status
If the browser callback completes but you need to retry straight away, you can rerun /atlassian-login immediately without waiting for the local OAuth callback port to drain.
Datadog MCP
Forgeflow can route freeform runtime investigations through an OAuth-enabled Datadog MCP server.
Environment variables:
export DATADOG_MCP_URL=https://your-datadog-mcp.example.com/mcp
export DATADOG_MCP_REDIRECT_URI=http://127.0.0.1:33390/callback
# Optional when your MCP server requires a pre-registered OAuth client
export DATADOG_MCP_CLIENT_ID=...
export DATADOG_MCP_CLIENT_SECRET=...
# Optional OAuth scope override
export DATADOG_MCP_SCOPE="metrics logs traces"
Then run:
/datadog-login
/datadog "investigate why the billing lambda is slow in prod"
/datadog "give me p50 p95 and p99 for the billing lambda over the last 24h"
If the browser callback completes but you need to retry straight away, you can rerun /datadog-login immediately without waiting for the local OAuth callback port to drain.
Forgeflow works best with a Datadog MCP server that exposes metric query, metric search/context, and log search tools. The /datadog pipeline now discovers likely metric names and tag keys before reporting no data, rather than assuming only AWS Lambda runtime metric shapes. It also understands Datadog MCP text responses that wrap results in <JSON_DATA> / <YAML_DATA> blocks.
See also
- Root README:
../../README.md - PM package:
../pm/README.md