thincontext

Drop-in middleware to compress LLM context before it hits the API

Package details

Install thincontext from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:thincontext
Package
thincontext
Version
1.0.3
Published
Apr 7, 2026
Downloads
298/mo · 7/wk
Author
omarpa
License
MIT
Types
extension
Size
156.3 KB
Dependencies
1 dependency · 3 peers
Pi manifest JSON
{
  "extensions": [
    "./extensions/pi"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

thincontext

Drop-in TypeScript middleware that compresses LLM context before it hits the API.

Every agent re-sends the same file reads, system prompts, and tool outputs on every turn. Thincontext sits in the middle and removes the redundancy — transparently, without changing your message format.

Agent → ContextCompressor.compress(messages) → LLM API

Node.js ≥ 18 · TypeScript · ESM + CJS


Install

npm install thincontext

Quick start

import { ContextCompressor } from 'thincontext'

const compressor = new ContextCompressor()

const { messages, stats } = await compressor.compress(myMessages)

console.log(`${stats.savedTokens} tokens saved (${((1 - stats.compressionRatio) * 100).toFixed(1)}%)`)

No configuration is needed for the default hash-based dedup behaviour. Provide embed and summarize functions to unlock the full pipeline (see Options below).


What it does

Five compression stages can run in sequence on every compress() call:

  • Summarizer: decays old conversation turns (verbatim → summary → dropped). Requires: summarize fn
  • Deduplicator: skips system/tool content the LLM already saw this session. Requires: nothing (hash) or embed fn (semantic)
  • Chunker: extracts only relevant lines from large code/document context. Requires: embed fn
  • ReferenceCompressor: replaces repeated large blocks with short [ref:...] tokens. Requires: nothing
  • BudgetManager: drops lowest-priority messages to fit a hard token budget. Requires: nothing

Each stage activates only when its dependencies are provided — the compressor degrades gracefully.
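
As a rough mental model (hypothetical helper names, not the library's internals), the staged behaviour looks like this:

// Conceptual sketch only. The helper functions are hypothetical;
// stage names follow the table above, hooks are the constructor options.
async function pipelineSketch(messages: Message[], opts: Options): Promise<Message[]> {
  let out = messages
  if (opts.summarize) out = await summarizeOldTurns(out)  // Summarizer
  out = dedupe(out, opts.embed)                           // Deduplicator: hash always, semantic with embed
  if (opts.embed) out = await chunkLargeContext(out)      // Chunker
  out = replaceRepeatsWithRefs(out)                       // ReferenceCompressor
  if (opts.budget) out = fitToBudget(out, opts.budget)    // BudgetManager
  return out
}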

In practice

In a typical coding agent session where the same files are read across multiple turns:

  • first read of a large file: content is normalised and passed through
  • subsequent turns: the full file content can be replaced with a short reference or duplicate marker
  • older conversation turns (if summarize is configured): progressively compressed to short summaries, then dropped

In real Pi testing, thincontext produced meaningful savings on repeated tool-heavy turns, but not on every turn.
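
For example, compressing the same file read on two turns of one session (a sketch; the role name and exact savings are illustrative):

const compressor = new ContextCompressor()

// Turn 1: first read of a large file is normalised and passed through.
const turn1 = await compressor.compress([
  { role: 'tool', content: bigFileContents },
])

// Turn 2: identical content can be replaced with a short duplicate marker
// once it is old enough to pass the dedup window.
const turn2 = await compressor.compress([
  { role: 'tool', content: bigFileContents },
])

console.log(turn2.stats.savedTokens) // > 0 when dedup applies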


Important: savings are opportunistic, not guaranteed

Thincontext does not guarantee token savings on every turn.

A Pi footer like:

🗜 -0% chars

can be completely normal even when the extension is installed and working.

Helps most when

  • the agent reads the same files repeatedly across turns
  • the agent produces the same or very similar tool output multiple times
  • there are large tool results that exceed the truncation limit
  • repeated outputs are old enough to pass the dedup window

Helps less when

  • most output is new and unique
  • the session is dominated by fresh writes/edits
  • the repeated content is still too recent to deduplicate
  • tool outputs are already short
  • protected modification history must remain visible

Why you may see 0% savings

Some turns are mostly made of:

  • one-off bash output
  • fresh read results
  • recent edit / write operations
  • unique install logs or error logs

In those cases, thincontext may correctly decide that there is little or nothing safe to compress.


Options

new ContextCompressor({
  budget: 8000,              // hard token budget enforced by BudgetManager
  embed: myEmbedFn,          // enables semantic dedup and the Chunker
  summarize: mySummarizeFn,  // enables the Summarizer
  countTokens: myTokenFn,    // custom token counter (see "Token counting for Claude")

  dedup: {
    strategy: 'hash',        // 'hash' by default; semantic matching needs embed
    threshold: 0.92,         // similarity cutoff for semantic dedup
    maxVectors: 5000,        // cap on stored embedding vectors
  },

  summarization: {
    keepLastFull: 5,         // turns kept verbatim
    summarizeBeyond: 10,     // older turns are summarized, then dropped
  },

  chunking: {
    maxLines: 50,            // lines extracted per relevant region
    contextLines: 5,         // surrounding lines kept for context
    minLines: 100,           // inputs shorter than this are left whole
  },
})

Adapters

Adapters ship as separate entrypoints — zero impact on the core bundle if unused.

Embedding

import { openaiEmbed } from 'thincontext/embeddings/openai'
import { localEmbed } from 'thincontext/embeddings/local'

const compressor = new ContextCompressor({
  embed: openaiEmbed({ apiKey: process.env.OPENAI_API_KEY! }),
  // or: embed: await localEmbed()
})

Summarization

import { anthropicSummarize } from 'thincontext/summarize/anthropic'
import { openaiSummarize } from 'thincontext/summarize/openai'

const compressor = new ContextCompressor({
  summarize: anthropicSummarize({ apiKey: process.env.ANTHROPIC_API_KEY! }),
})

Message conversion

import { fromOpenAI } from 'thincontext/adapters/openai'
import { fromAnthropic } from 'thincontext/adapters/anthropic'
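
A hedged usage sketch, assuming the adapters map an array of provider-format messages to thincontext messages:

// Assumed signature: provider messages in, thincontext messages out.
const converted = fromOpenAI(openaiChatMessages)
const { messages: compressed } = await compressor.compress(converted)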

Message priorities

Tag messages to control how BudgetManager handles token pressure:

const messages = [
  { role: 'system', content: 'You are...', priority: 'critical' },
  { role: 'user', content: ragChunk, priority: 'low' },
  { role: 'assistant', content: lastReply, priority: 'high' },
]

Priorities: 'critical' · 'high' · 'normal' · 'low'
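
Under token pressure, BudgetManager drops the lowest priorities first. A small sketch using the messages above:

const tight = new ContextCompressor({ budget: 2000 })

const { messages: kept } = await tight.compress(messages)
// The 'low'-priority RAG chunk is the first candidate to be dropped;
// the 'critical' system prompt is retained.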


Session persistence

State (seen hashes, summary cache, ref table) lives in memory and persists across compress() calls within a session. Use export() and restore() to carry it further:

const snapshot = compressor.export()
const compressor2 = ContextCompressor.restore(snapshot, { budget: 8000 })
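
To carry state across processes, persist the snapshot yourself (a sketch, assuming the snapshot is JSON-serializable):

import { writeFileSync, readFileSync } from 'node:fs'

writeFileSync('session.json', JSON.stringify(compressor.export()))

const restored = ContextCompressor.restore(
  JSON.parse(readFileSync('session.json', 'utf8')),
  { budget: 8000 },
)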

Integrations

Pi agent

Install as a Pi package — the extension is bundled inside the npm package:

pi install npm:thincontext

Or add to your ~/.pi/agent/settings.json:

{
  "packages": ["npm:thincontext"]
}

The extension hooks Pi's context event to compress messages before every LLM call, with tool result deduplication and a live footer:

🗜 -72% chars

Commands inside Pi:

/thincontext on|off|reset|budget <n>|lines <n>|dedup-after <turns>|debug
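
For example:

/thincontext budget 16000
/thincontext dedup-after 3
/thincontext debug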

Pi-specific notes

Current defaults are conservative:

  • maxToolLines = 300
  • dedupAfterTurns = 2
  • recent edit/write tool results are protected from budget dropping

Known limitations:

  • bash writes such as sed -i or echo > file are not reliably detected as modification records
  • truncation can hide important information that appears late in very long output
  • token estimates shown by the extension are approximate; Pi's own usage counters are more trustworthy
  • a given turn may show no savings even when the extension is working correctly

Claude Code

No context interception hook exists in Claude Code's interactive CLI — there is no equivalent to Pi's context event that fires before each LLM call.

The thincontext library still works for custom SDK/wrapper workflows, but a true drop-in Claude Code CLI plugin equivalent to the Pi extension is not currently possible with the available integration surface.
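
For example, a minimal wrapper sketch (assuming compress() preserves the role/content message shape; the model id and the cast are placeholders):

import Anthropic from '@anthropic-ai/sdk'
import { ContextCompressor } from 'thincontext'

const client = new Anthropic() // reads ANTHROPIC_API_KEY from the environment
const compressor = new ContextCompressor({ budget: 8000 })

async function send(messages: { role: 'user' | 'assistant'; content: string }[]) {
  const { messages: compressed } = await compressor.compress(messages)
  return client.messages.create({
    model: 'claude-sonnet-4-5', // substitute your model id
    max_tokens: 1024,
    messages: compressed as any, // assumes the shapes stay compatible
  })
}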


Token counting for Claude

The default token counting uses cl100k_base, GPT-4's tokenizer, so expect some variance for Claude. See docs/token-counting.md for custom counter guidance.
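
To supply a tighter counter, pass countTokens (a sketch; the string → number signature is assumed):

const compressor = new ContextCompressor({
  // Crude heuristic (~3.5 characters per token for English-heavy text);
  // swap in a tokenizer-backed counter for real accuracy.
  countTokens: (text: string) => Math.ceil(text.length / 3.5),
})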


What this is not

  • not an LLM proxy
  • not a RAG system
  • not model-specific
  • not a browser library

Publishing

The repo includes a GitLab pipeline that:

  • runs typecheck/tests on pushes
  • publishes to npm on version tags like v1.0.0

After publish, users can install with:

npm install thincontext

or in Pi:

pi install npm:thincontext

Development

npm ci
npm run typecheck
npm test
npm run build

License

MIT