thincontext
Drop-in middleware to compress LLM context before it hits the API
Package details
Install thincontext from npm and Pi will load the resources declared by the package manifest.
```sh
$ pi install npm:thincontext
```

- Package: thincontext
- Version: 1.0.3
- Published: Apr 7, 2026
- Downloads: 298/mo · 7/wk
- Author: omarpa
- License: MIT
- Types: extension
- Size: 156.3 KB
- Dependencies: 1 dependency · 3 peers
Pi manifest JSON

```json
{
  "extensions": [
    "./extensions/pi"
  ]
}
```

Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
thincontext
Drop-in TypeScript middleware that compresses LLM context before it hits the API.
Every agent re-sends the same file reads, system prompts, and tool outputs on every turn. Thincontext sits in the middle and removes the redundancy — transparently, without changing your message format.
Agent → ContextCompressor.compress(messages) → LLM API
Node.js ≥ 18 · TypeScript · ESM + CJS
Install
```sh
npm install thincontext
```
Quick start
```ts
import { ContextCompressor } from 'thincontext'

const compressor = new ContextCompressor()
const { messages, stats } = await compressor.compress(myMessages)
console.log(`${stats.savedTokens} tokens saved (${((1 - stats.compressionRatio) * 100).toFixed(1)}%)`)
```
No configuration is needed for the default hash-based dedup behaviour; providing `embed` and `summarize` functions unlocks the full pipeline.
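For instance, wiring in the bundled adapters (documented under Adapters below) enables every stage; a minimal sketch, assuming the relevant API keys are set:

```ts
import { ContextCompressor } from 'thincontext'
import { openaiEmbed } from 'thincontext/embeddings/openai'
import { anthropicSummarize } from 'thincontext/summarize/anthropic'

// With embed and summarize supplied, all five stages described below can run.
const compressor = new ContextCompressor({
  budget: 8000,
  embed: openaiEmbed({ apiKey: process.env.OPENAI_API_KEY! }),
  summarize: anthropicSummarize({ apiKey: process.env.ANTHROPIC_API_KEY! }),
})
```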
What it does
Five compression stages can run in sequence on every compress() call:
| Stage | What it does | Requires |
|---|---|---|
| Summarizer | Decays old conversation turns: verbatim → summary → dropped | `summarize` fn |
| Deduplicator | Skips system/tool content the LLM already saw this session | nothing (hash) or `embed` fn (semantic) |
| Chunker | Extracts only relevant lines from large code/document context | `embed` fn |
| ReferenceCompressor | Replaces repeated large blocks with short `[ref:...]` tokens | nothing |
| BudgetManager | Drops lowest-priority messages to fit a hard token budget | nothing |
Each stage activates only when its dependencies are provided; the compressor degrades gracefully.
In practice
In a typical coding agent session where the same files are read across multiple turns:
- first read of a large file: content is normalised and passed through
- subsequent turns: the full file content can be replaced with a short reference or duplicate marker
- older conversation turns (if `summarize` is configured): progressively compressed to short summaries, then dropped
In real Pi testing, thincontext produced meaningful savings on repeated tool-heavy turns, but not on every turn.
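A minimal sketch of that repeated-read pattern (the message shape mirrors the priority examples later in this README; the role used here is illustrative, and whether dedup fires on the very next turn depends on the dedup window):

```ts
import { readFileSync } from 'node:fs'
import { ContextCompressor } from 'thincontext'

const compressor = new ContextCompressor()
const bigFile = readFileSync('src/index.ts', 'utf8')

// Turn 1: first sight of the file content, normalised and passed through.
const turn1 = await compressor.compress([
  { role: 'system', content: bigFile },
])

// A later turn: the compressor has already seen this content, so it can be
// replaced with a short reference or duplicate marker.
const turnN = await compressor.compress([
  { role: 'system', content: bigFile },
])
console.log(turnN.stats.savedTokens) // nonzero when dedup fires
```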
Important: savings are opportunistic, not guaranteed
Thincontext does not guarantee token savings on every turn.
A Pi footer like:

```
🗜 -0% chars
```

can be completely normal even when the extension is installed and working.
Helps most when
- the agent reads the same files repeatedly across turns
- the agent produces the same or very similar tool output multiple times
- there are large tool results that exceed the truncation limit
- repeated outputs are old enough to pass the dedup window
Helps less when
- most output is new and unique
- the session is dominated by fresh writes/edits
- the repeated content is still too recent to deduplicate
- tool outputs are already short
- protected modification history must remain visible
Why you may see 0% savings
Some turns are mostly made of:
- one-off `bash` output
- fresh `read` results
- recent `edit`/`write` operations
- unique install logs or error logs
In those cases, thincontext may correctly decide that there is little or nothing safe to compress.
Options
```ts
new ContextCompressor({
  budget: 8000,              // hard token budget enforced by BudgetManager
  embed: myEmbedFn,          // enables semantic dedup and the Chunker
  summarize: mySummarizeFn,  // enables the Summarizer's turn decay
  countTokens: myTokenFn,    // custom token counter (default: cl100k_base)
  dedup: {
    strategy: 'hash',
    threshold: 0.92,
    maxVectors: 5000,
  },
  summarization: {
    keepLastFull: 5,
    summarizeBeyond: 10,
  },
  chunking: {
    maxLines: 50,
    contextLines: 5,
    minLines: 100,
  },
})
```
Adapters
Adapters ship as separate entrypoints — zero impact on the core bundle if unused.
Embedding
```ts
import { openaiEmbed } from 'thincontext/embeddings/openai'
import { localEmbed } from 'thincontext/embeddings/local'

const compressor = new ContextCompressor({
  embed: openaiEmbed({ apiKey: process.env.OPENAI_API_KEY! }),
  // or: embed: await localEmbed()
})
```
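You can also bring your own embedding function. The expected signature is not spelled out in this README, so the sketch below assumes a batch interface of `(texts: string[]) => Promise<number[][]>` and a hypothetical HTTP embedding service; check the package's exported types before relying on it:

```ts
import { ContextCompressor } from 'thincontext'

// Assumed signature: a batch of strings in, one vector per string out.
async function myEmbed(texts: string[]): Promise<number[][]> {
  const res = await fetch('https://embeddings.example.com/embed', {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ texts }),
  })
  const { vectors } = await res.json()
  return vectors
}

const compressor = new ContextCompressor({ embed: myEmbed })
```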
Summarization
```ts
import { anthropicSummarize } from 'thincontext/summarize/anthropic'
import { openaiSummarize } from 'thincontext/summarize/openai'

const compressor = new ContextCompressor({
  summarize: anthropicSummarize({ apiKey: process.env.ANTHROPIC_API_KEY! }),
})
```
Message conversion
```ts
import { fromOpenAI } from 'thincontext/adapters/openai'
import { fromAnthropic } from 'thincontext/adapters/anthropic'
```
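The import names suggest these convert provider-native message arrays into thincontext's message shape before compression; a hypothetical usage sketch:

```ts
import { ContextCompressor } from 'thincontext'
import { fromOpenAI } from 'thincontext/adapters/openai'

const openaiMessages = [
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'Summarise this repo.' },
]

const compressor = new ContextCompressor()
const { messages } = await compressor.compress(fromOpenAI(openaiMessages))
```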
Message priorities
Tag messages to control how BudgetManager handles token pressure:
```ts
const messages = [
  { role: 'system', content: 'You are...', priority: 'critical' },
  { role: 'user', content: ragChunk, priority: 'low' },
  { role: 'assistant', content: lastReply, priority: 'high' },
]
```
Priorities: 'critical' · 'high' · 'normal' · 'low'
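Reusing the tagged messages above, a sketch of what the priorities buy you under a hard budget (exactly which messages get dropped is up to BudgetManager; the only stated guarantee is lowest-priority-first):

```ts
const compressor = new ContextCompressor({ budget: 4000 })

// Under token pressure, 'low' messages (the RAG chunk) are dropped before
// 'high' (the last reply), and 'critical' (the system prompt) goes last.
const { messages: fitted, stats } = await compressor.compress(messages)
console.log(`kept ${fitted.length} of ${messages.length} messages`)
```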
Session persistence
State (seen hashes, summary cache, ref table) lives in memory and survives across compress() calls.
```ts
const snapshot = compressor.export()
const compressor2 = ContextCompressor.restore(snapshot, { budget: 8000 })
```
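To carry that state across process restarts, persist the snapshot yourself; this sketch assumes the exported snapshot is JSON-serializable:

```ts
import { readFileSync, writeFileSync } from 'node:fs'

// At the end of a session, save the compressor state...
writeFileSync('session.json', JSON.stringify(compressor.export()))

// ...and in a new process, restore it with fresh options.
const restored = ContextCompressor.restore(
  JSON.parse(readFileSync('session.json', 'utf8')),
  { budget: 8000 },
)
```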
Integrations
Pi agent
Install as a Pi package — the extension is bundled inside the npm package:
```sh
pi install npm:thincontext
```
Or add to your ~/.pi/agent/settings.json:
```json
{
  "packages": ["npm:thincontext"]
}
```
The extension hooks Pi's context event to compress messages before every LLM call, with tool result deduplication and a live footer:
```
🗜 -72% chars
```
Commands inside Pi:
```
/thincontext on|off|reset|budget <n>|lines <n>|dedup-after <turns>|debug
```
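For example, to set a hard budget, raise the line limit (which presumably maps to the `maxToolLines` default below), and turn on diagnostics:

```
/thincontext budget 8000
/thincontext lines 500
/thincontext debug
```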
Pi-specific notes
Current defaults are conservative:
- `maxToolLines = 300`
- `dedupAfterTurns = 2`
- recent `edit`/`write` tool results are protected from budget dropping
Known limitations:
- `bash` writes such as `sed -i` or `echo > file` are not reliably detected as modification records
- truncation can hide important information that appears late in very long output
- token estimates shown by the extension are approximate; Pi's own usage counters are more trustworthy
- a given turn may show no savings even when the extension is working correctly
Claude Code
No context interception hook exists in Claude Code's interactive CLI — there is no equivalent to Pi's context event that fires before each LLM call.
The thincontext library still works for custom SDK/wrapper workflows, but a true drop-in Claude Code CLI plugin equivalent to the Pi extension is not currently possible with the available integration surface.
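A hedged sketch of such a wrapper around the Anthropic SDK (the model id is illustrative, and depending on your message shapes you may need the adapters above, or to strip thincontext-specific fields before sending):

```ts
import Anthropic from '@anthropic-ai/sdk'
import { ContextCompressor } from 'thincontext'

const client = new Anthropic()
const compressor = new ContextCompressor({ budget: 8000 })

async function chat(history: { role: 'user' | 'assistant'; content: string }[]) {
  // Compress the running history before every call, as the Pi extension does.
  const { messages } = await compressor.compress(history)
  return client.messages.create({
    model: 'claude-sonnet-4-5', // illustrative model id
    max_tokens: 1024,
    messages: messages as Anthropic.MessageParam[],
  })
}
```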
Token counting for Claude
The default token counter uses cl100k_base, which is GPT-4's tokenizer; Claude tokenizes differently, so expect some variance in the estimates. See docs/token-counting.md for custom counter guidance.
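Until you wire up something better, a crude character-based counter can be swapped in. The assumed signature is `(text: string) => number` (check the package's types), and chars/4 is only a rough heuristic for English text:

```ts
const compressor = new ContextCompressor({
  // Very rough: ~4 characters per token for typical English text.
  countTokens: (text: string) => Math.ceil(text.length / 4),
})
```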
What this is not
- not an LLM proxy
- not a RAG system
- not model-specific
- not a browser library
Publishing
The repo includes a GitLab pipeline that:
- runs typecheck/tests on pushes
- publishes to npm on version tags like `v1.0.0`
After publish, users can install with:
```sh
npm install thincontext
```
or in Pi:
```sh
pi install npm:thincontext
```
Development
```sh
npm ci
npm run typecheck
npm test
npm run build
```
License
MIT