pi-ultra-compact

Advanced compaction extension and skill for Pi with automatic threshold-based compaction and critical context preservation

Install pi-ultra-compact from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-ultra-compact

Package: pi-ultra-compact
Version: 0.8.0
Published: Jun 17, 2026
Downloads: 2,428/mo · 2,428/wk
Author: realvendex
License: MIT
Types: extension, skill
Size: 87.3 KB
Dependencies: 0 dependencies · 3 peers

Pi manifest JSON

{
  "extensions": [
    "./extensions"
  ],
  "skills": [
    "./skills"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-ultra-compact

Advanced compaction extension and skill for Pi with automatic threshold-based compaction and support for 200+ models.

Features

/ultracompact command for manual compaction
Threshold auto-adapts to the model's context window (60-80% of max)
200+ models supported - OpenAI, Anthropic, Google, DeepSeek, Meta, Mistral, Qwen, and more
Graduated Eviction (4 levels) - strips reasoning, bulk outputs, artifacts, then messages
Generational Compaction - micro (fast, no LLM) at 60-90%, full at 90%+
Preemptive Trigger - fires before the next turn, so compaction latency is not paid during user turns
Cache-Aware Compaction - immutable summary blocks keep the prompt cache warm
Circuit Breaker - after 3 consecutive failures, falls back to lossy truncation so the session survives
Hierarchical summarization with entropy-based information extraction
Critical context preservation - goals, decisions, errors, file paths
Extension + Skill - works as both a Pi extension and a skill
Smart model switching - remembers per-model thresholds and preserves custom settings
Conversation structure detection - identifies turns, phases, and progress
Multi-pass summarization - progressive compression with quality scoring
LLM-based summarization - optional AI-powered compression (useLLM config)
Content-aware token counting - dynamic ratios for code, prose, and whitespace
Compact section templates - shorter headers and condensed formatting save 10-15% more tokens
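
To illustrate what "content-aware token counting" can look like, here is a minimal sketch. It is not the package's implementation: the function names, the line classifier, and the characters-per-token ratios are all placeholders chosen for illustration.

```typescript
// Hypothetical ratios (characters per token); the package's real values are not documented here.
const CHARS_PER_TOKEN = { prose: 4, code: 3, whitespace: 8 } as const;

type Kind = keyof typeof CHARS_PER_TOKEN;

// Crude classifier (an assumption): braces, semicolons, equals signs, arrows,
// or leading indentation suggest code; blank lines count as whitespace.
function classifyLine(line: string): Kind {
  if (line.trim() === "") return "whitespace";
  return /[{};=]|=>|^\s{2,}\S/.test(line) ? "code" : "prose";
}

// Estimate tokens line by line, applying the ratio for each line's kind.
function estimateTokens(text: string): number {
  if (text === "") return 0;
  let tokens = 0;
  for (const line of text.split("\n")) {
    const kind = classifyLine(line);
    // +1 accounts for the newline that split() removed.
    tokens += (line.length + 1) / CHARS_PER_TOKEN[kind];
  }
  return Math.ceil(tokens);
}
```

With these placeholder ratios, a line of code is counted as more tokens than a prose line of the same length, which is the behaviour the feature name suggests.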

Installation

pi install npm:pi-ultra-compact

Quick Start

After installing the package and restarting Pi, run:

/ultracompact

This triggers a manual compaction.

Auto-compaction runs when context usage exceeds the auto-detected threshold (60-80% of your model's context window).

Supported Models

Provider	Models	Context Window
OpenAI	GPT-5/5.1/5.2, GPT-4.1, GPT-4o, o3, o4-mini	8K - 1M tokens
Anthropic	Claude 4.5/4.0/3.7/3.5/3	200K tokens
Google	Gemini 2.5/2.0/1.5, Gemma 3/2	32K - 2M tokens
DeepSeek	V4 Pro, V3, V2.5, R1	64K - 1M tokens
Meta	Llama 4, 3.3, 3.1, 3, 2	4K - 1M tokens
Mistral	Medium 3.5, Large 3, Small 4, Codestral	32K - 256K tokens
Qwen	Qwen3, Qwen2.5, Qwen2	32K - 128K tokens
Microsoft	Phi-4, Phi-3, Phi-2	2K - 32K tokens
xAI	Grok 3, Grok 2	8K - 131K tokens
Cohere	Command R+	128K tokens
Yi	Yi-1.5, Yi-34B	4K - 200K tokens

How It Works

Three-Tier System

Preemptive check (every turn): Projects the next turn's token usage. If the projection exceeds 60% of the context window, micro-compaction is triggered.
Micro-compaction (60-90% usage): Strips reasoning blocks and bulk tool outputs. No LLM call. Runs in microseconds.
Full compaction (90%+ usage): Graduated eviction first reduces the input, then structured summarization produces the final compacted context.
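
The tier decision described above can be sketched as a small function. This is an illustrative sketch, not the extension's code: `selectTier`, `Watermarks`, and `README_WATERMARKS` are hypothetical names, and the 0.6 and 0.9 boundaries are taken from the tier descriptions (the settings table lists a default `preemptiveWatermark` of 0.70, so the boundaries are passed as parameters).

```typescript
type Tier = "none" | "micro" | "full";

// Hypothetical container for the tier boundaries, as fractions of the context window.
interface Watermarks {
  micro: number;
  full: number;
}

// Boundaries as stated in the tier descriptions above.
const README_WATERMARKS: Watermarks = { micro: 0.6, full: 0.9 };

// Project usage after the next turn and pick the compaction tier.
function selectTier(
  currentTokens: number,
  estimatedNextTurnTokens: number,
  contextWindow: number,
  marks: Watermarks = README_WATERMARKS,
): Tier {
  if (contextWindow <= 0) throw new RangeError("contextWindow must be positive");
  const projected = (currentTokens + estimatedNextTurnTokens) / contextWindow;
  if (projected >= marks.full) return "full";
  if (projected > marks.micro) return "micro";
  return "none";
}
```

For a 200K-token window, 120,000 current tokens plus an estimated 20,000 for the next turn projects to 70% usage, which lands in the micro tier.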

Eviction Levels

Level	What it strips	When
1	Assistant thinking/reasoning blocks	Always (harmless removal)
2	Bulk tool outputs (>100 lines, >5K chars)	Most sessions
3	All non-error tool results	Heavy sessions
4	Oldest non-protected messages	Only when necessary
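
The four levels in the table can be sketched as a loop that escalates only while the conversation is still over budget. Everything here is an assumption made for illustration: the `Message` shape, the 4-characters-per-token estimate, the placeholder strings, keeping error outputs at level 2, and treating system and user messages as the protected set.

```typescript
type Role = "system" | "user" | "assistant" | "tool";

interface Message {
  role: Role;
  content: string;
  reasoning?: string; // assistant thinking block, if any
  isError?: boolean;  // tool result that reports an error
}

// Rough estimate (assumption): about 4 characters per token.
const estimate = (m: Message): number =>
  Math.ceil((m.content.length + (m.reasoning?.length ?? 0)) / 4);
const totalTokens = (ms: Message[]): number => ms.reduce((n, m) => n + estimate(m), 0);

// "Bulk" per the table: more than 100 lines or more than 5K characters.
const isBulk = (m: Message): boolean =>
  m.role === "tool" && (m.content.split("\n").length > 100 || m.content.length > 5000);

function stripReasoning(m: Message): Message {
  const copy: Message = { role: m.role, content: m.content };
  if (m.isError) copy.isError = true;
  return copy;
}

function evict(messages: Message[], budget: number): { messages: Message[]; level: number } {
  // Level 1 always runs: drop reasoning blocks.
  let current = messages.map(stripReasoning);
  let level = 1;
  if (totalTokens(current) > budget) {
    level = 2; // bulk tool outputs (error outputs kept, by assumption)
    current = current.map((m) =>
      isBulk(m) && !m.isError ? { ...m, content: "[bulk output removed]" } : m);
  }
  if (totalTokens(current) > budget) {
    level = 3; // all non-error tool results
    current = current.map((m) =>
      m.role === "tool" && !m.isError ? { ...m, content: "[tool result removed]" } : m);
  }
  if (totalTokens(current) > budget) {
    level = 4; // oldest non-protected messages first
    const kept = [...current];
    let i = 0;
    while (totalTokens(kept) > budget && i < kept.length) {
      const m = kept[i];
      if (m !== undefined && (m.role === "assistant" || m.role === "tool")) kept.splice(i, 1);
      else i++;
    }
    current = kept;
  }
  return { messages: current, level };
}
```

The function returns the highest level it had to reach, which makes it easy to log how aggressive a given compaction was.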

Safety Systems

Snapshot-rollback: Messages are deep-copied before compaction. If anything fails, the original is preserved.
Circuit breaker: After 3 consecutive failures, falls back to lossy truncation (keep system + last 10 turns).
User messages inviolable: Never stripped regardless of token pressure.
Cache-aware mode: Previous summaries stay immutable — only new content pays prefill cost.
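
Snapshot-rollback and the circuit breaker can be combined in one wrapper, sketched below. This is an illustration under stated assumptions, not the extension's code: `CompactionGuard` and `Msg` are hypothetical names, a "turn" is assumed to start at each user message, messages are assumed to be JSON-serializable, and the breaker in this sketch stays open once tripped.

```typescript
interface Msg {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

class CompactionGuard {
  private failures = 0;
  private readonly maxFailures: number;
  private readonly keepTurns: number;

  constructor(maxFailures = 3, keepTurns = 10) {
    this.maxFailures = maxFailures;
    this.keepTurns = keepTurns;
  }

  // Runs `compact` on a deep copy; on failure the untouched original is returned.
  run(messages: Msg[], compact: (working: Msg[]) => Msg[]): Msg[] {
    if (this.failures >= this.maxFailures) return this.lossyTruncate(messages);
    const working = JSON.parse(JSON.stringify(messages)) as Msg[];
    try {
      const result = compact(working);
      this.failures = 0; // only consecutive failures count
      return result;
    } catch {
      this.failures += 1;
      return this.failures >= this.maxFailures
        ? this.lossyTruncate(messages)
        : messages;
    }
  }

  // Lossy fallback: keep system messages plus the last `keepTurns` turns.
  private lossyTruncate(messages: Msg[]): Msg[] {
    const userIndexes: number[] = [];
    messages.forEach((m, i) => {
      if (m.role === "user") userIndexes.push(i);
    });
    const start = userIndexes.length > this.keepTurns
      ? (userIndexes[userIndexes.length - this.keepTurns] ?? 0)
      : 0;
    return messages.filter((m, i) => m.role === "system" || i >= start);
  }
}
```

Because the compaction function only ever sees the copy, a throw partway through cannot leave the live conversation half-modified.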

Configuration

Default settings work out of the box. The extension auto-detects your model and sets appropriate thresholds.

Default Settings

Setting	Default	Description
`thresholdTokens`	Auto (60-80% of context)	When to trigger compaction
`keepPercentage`	30%	Percentage of context to keep
`maxKeepTokens`	30,000	Maximum tokens to keep
`autoCompact`	true	Enable automatic compaction
`cacheAware`	false	Immutable summary blocks (saves API costs)
`maxEvictionLevel`	FULL_REMOVAL	Max eviction aggressiveness
`outputHeadroom`	4,096	Tokens reserved for LLM response
`circuitBreakerMaxFailures`	3	Failures before lossy truncation
`preemptiveWatermark`	0.70	Preemptive trigger level
`hardWatermark`	0.95	Reactive fallback level
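
For reference, the defaults in the table can be restated as a typed object. The README does not show where these settings are supplied, so this is only a sketch: the interface name, the field types, the `"auto"` literal, and the way `keepPercentage` and `maxKeepTokens` are combined in `keepBudget` are assumptions.

```typescript
// Hypothetical shape: field names and default values come from the settings table;
// the types and how Pi reads them are assumptions.
interface UltraCompactConfig {
  thresholdTokens: "auto" | number;
  keepPercentage: number; // fraction of the context window (0.30 = 30%)
  maxKeepTokens: number;
  autoCompact: boolean;
  cacheAware: boolean;
  maxEvictionLevel: string;
  outputHeadroom: number;
  circuitBreakerMaxFailures: number;
  preemptiveWatermark: number;
  hardWatermark: number;
}

const DEFAULTS: UltraCompactConfig = {
  thresholdTokens: "auto",
  keepPercentage: 0.3,
  maxKeepTokens: 30_000,
  autoCompact: true,
  cacheAware: false,
  maxEvictionLevel: "FULL_REMOVAL",
  outputHeadroom: 4_096,
  circuitBreakerMaxFailures: 3,
  preemptiveWatermark: 0.7,
  hardWatermark: 0.95,
};

// Assumed interaction: keep a percentage of the window, capped by maxKeepTokens.
function keepBudget(contextWindow: number, cfg: UltraCompactConfig = DEFAULTS): number {
  return Math.min(Math.round(contextWindow * cfg.keepPercentage), cfg.maxKeepTokens);
}
```

Under that assumption, a 200K-token window is capped at 30,000 kept tokens, while a 32K-token window keeps 9,600.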

Commands

Command	Description
`/ultracompact`	Trigger manual ultra-compact compaction

Model Examples

# Works with any model - threshold auto-adapts
# Claude Opus: 160,000 tokens (80% of 200K)
# GPT-5: 320,000 tokens (80% of 400K)
# Gemini 2.5 Pro: 800,000 tokens (80% of 1M)
# DeepSeek V4 Pro: 800,000 tokens (80% of 1M)
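
The arithmetic behind these examples is a fixed fraction of the context window. A minimal sketch follows; `autoThreshold` is a hypothetical name, and clamping the fraction to the 60-80% range mentioned under Features is an assumption about how the range is applied.

```typescript
// Compute a compaction threshold as a fraction of the model's context window.
// The fraction is clamped to the 60-80% range described in the feature list.
function autoThreshold(contextWindow: number, fraction = 0.8): number {
  const clamped = Math.min(0.8, Math.max(0.6, fraction));
  return Math.round(contextWindow * clamped);
}

// Context window sizes taken from the examples above.
const examples: Array<[string, number]> = [
  ["Claude Opus", 200_000],
  ["GPT-5", 400_000],
  ["Gemini 2.5 Pro", 1_000_000],
];

for (const [model, contextWindow] of examples) {
  console.log(`${model}: ${autoThreshold(contextWindow)} tokens`);
}
```

The loop reproduces the 160,000, 320,000, and 800,000 token thresholds listed in the examples.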

Compatibility

Works with any Pi-compatible model
Compatible with gentle-engram (Engram memory backup)
Compatible with gentle-pi (SDD/OpenSpec)
No conflicts with Pi's default compaction

Changelog

See CHANGELOG.md for full version history.

v0.8.0 - Generational Compaction + Safety Systems

Graduated Eviction — 4-level content stripping (reasoning → bulk → artifacts → full)
Generational Compaction — micro (60-90%, no LLM) + full (90%+) tiers
Preemptive Trigger — fires at 70% watermark by projecting next turn
Cache-Aware Mode — immutable summary blocks preserve prompt cache
Snapshot-Rollback + Circuit Breaker — session never dies from bad compaction
66 tests, 100% pass rate — zero regressions

v0.7.0 - Compact Templates & LLM Summarization

Compact section templates - shorter headers save 10-15% tokens across all conversations
LLM-based summarization - optional LLM-powered semantic compression
Content-aware token estimation - dynamic ratios for code/prose/whitespace
66 tests, 100% pass rate - including 13 new effectiveness benchmarks
generateSummary is now async - supports LLM callback integration

v0.6.0 - Algorithm Enhancement Release

Major improvements to compaction quality and performance:

Smart model switching - per-model threshold memory, preserves custom settings
Conversation structure detection - identifies turns, phases, progress
Enhanced critical extraction - progress indicators, questions, user preferences
Multi-pass summarization - 3-pass compression with quality scoring
Token estimation cache - LRU cache for 3x faster performance
100% test pass rate - 43 unit tests + 17 performance benchmarks

v0.5.0 - Audit & Stability Release

This release fixes 18 issues found via a comprehensive 5-agent audit:

3 Critical regex bugs fixed - \b word boundaries on all patterns, no more false matches
Startup model detection fixed - correct threshold from boot
Custom thresholds preserved - across model switches
Null safety - guards on all message-consuming methods
53-test Jest suite - comprehensive coverage
Dead code removed - 329-line .disabled file deleted, unused typebox dep removed

Troubleshooting

Extension not loading

Restart Pi after installation
Check that pi install npm:pi-ultra-compact completed successfully

Wrong threshold detected

The extension auto-detects your model from Pi config
Ensure your model is in the supported list (200+ models)
Run /ultracompact manually to see detected model and threshold in the logs

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'feat: add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Pi - The AI coding agent