thincontext

Drop-in middleware to compress LLM context before it hits the API

Package details

Install thincontext from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:thincontext
Package
thincontext
Version
1.0.3
Published
Apr 7, 2026
Downloads
298/mo · 7/wk
Author
omarpa
License
MIT
Types
extension
Size
156.3 KB
Dependencies
1 dependency · 3 peers
Pi manifest JSON
{
  "extensions": [
    "./extensions/pi"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

thincontext

Drop-in TypeScript middleware that compresses LLM context before it hits the API.

Every agent re-sends the same file reads, system prompts, and tool outputs on every turn. Thincontext sits in the middle and removes the redundancy — transparently, without changing your message format.

Agent → ContextCompressor.compress(messages) → LLM API

Node.js ≥ 18 · TypeScript · ESM + CJS


Install

npm install thincontext

Quick start

import { ContextCompressor } from 'thincontext'

const compressor = new ContextCompressor()

const { messages, stats } = await compressor.compress(myMessages)

console.log(`${stats.savedTokens} tokens saved (${((1 - stats.compressionRatio) * 100).toFixed(1)}%)`)

No configuration is needed for the default hash-based dedup behaviour. Provide embed and summarize functions to unlock the full pipeline (see Options below).


What it does

Five compression stages can run in sequence on every compress() call:

  • Summarizer: decays old conversation turns (verbatim → summary → dropped). Requires: summarize fn
  • Deduplicator: skips system/tool content the LLM already saw this session. Requires: nothing (hash) or embed fn (semantic)
  • Chunker: extracts only relevant lines from large code/document context. Requires: embed fn
  • ReferenceCompressor: replaces repeated large blocks with short [ref:...] tokens. Requires: nothing
  • BudgetManager: drops lowest-priority messages to fit a hard token budget. Requires: nothing

Each stage activates only when its dependencies are provided — the compressor degrades gracefully.
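
As a rough mental model (hypothetical helper names, not the library's internals), the staged behaviour looks like this:

// Conceptual sketch only. The helper functions are hypothetical;
// stage names follow the table above, hooks are the constructor options.
async function pipelineSketch(messages: Message[], opts: Options): Promise<Message[]> {
  let out = messages
  if (opts.summarize) out = await summarizeOldTurns(out)  // Summarizer
  out = dedupe(out, opts.embed)                           // Deduplicator: hash always, semantic with embed
  if (opts.embed) out = await chunkLargeContext(out)      // Chunker
  out = replaceRepeatsWithRefs(out)                       // ReferenceCompressor
  if (opts.budget) out = fitToBudget(out, opts.budget)    // BudgetManager
  return out
}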

In practice

In a typical coding agent session where the same files are read across multiple turns:

  • first read of a large file: content is normalised and passed through
  • subsequent turns: the full file content can be replaced with a short reference or duplicate marker
  • older conversation turns (if summarize is configured): progressively compressed to short summaries, then dropped

In real Pi testing, thincontext produced meaningful savings on repeated tool-heavy turns, but not on every turn.
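
For example, compressing the same file read on two turns of one session (a sketch; the role name and exact savings are illustrative):

const compressor = new ContextCompressor()

// Turn 1: first read of a large file is normalised and passed through.
const turn1 = await compressor.compress([
  { role: 'tool', content: bigFileContents },
])

// Turn 2: identical content can be replaced with a short duplicate marker
// once it is old enough to pass the dedup window.
const turn2 = await compressor.compress([
  { role: 'tool', content: bigFileContents },
])

console.log(turn2.stats.savedTokens) // > 0 when dedup applies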


Important: savings are opportunistic, not guaranteed

Thincontext does not guarantee token savings on every turn.

A Pi footer like:

🗜 -0% chars

can be completely normal even when the extension is installed and working.

Helps most when

  • the agent reads the same files repeatedly across turns
  • the agent produces the same or very similar tool output multiple times
  • there are large tool results that exceed the truncation limit
  • repeated outputs are old enough to pass the dedup window

Helps less when

  • most output is new and unique
  • the session is dominated by fresh writes/edits
  • the repeated content is still too recent to deduplicate
  • tool outputs are already short
  • protected modification history must remain visible

Why you may see 0% savings

Some turns are mostly made of:

  • one-off bash output
  • fresh read results
  • recent edit / write operations
  • unique install logs or error logs

In those cases, thincontext may correctly decide that there is little or nothing safe to compress.


Options

new ContextCompressor({
  budget: 8000,              // hard token budget enforced by BudgetManager
  embed: myEmbedFn,          // enables semantic dedup and the Chunker
  summarize: mySummarizeFn,  // enables the Summarizer
  countTokens: myTokenFn,    // custom token counter (see "Token counting for Claude")

  dedup: {
    strategy: 'hash',        // 'hash' by default; semantic matching needs embed
    threshold: 0.92,         // similarity cutoff for semantic dedup
    maxVectors: 5000,        // cap on stored embedding vectors
  },

  summarization: {
    keepLastFull: 5,         // turns kept verbatim
    summarizeBeyond: 10,     // older turns are summarized, then dropped
  },

  chunking: {
    maxLines: 50,            // lines extracted per relevant region
    contextLines: 5,         // surrounding lines kept for context
    minLines: 100,           // inputs shorter than this are left whole
  },
})

Adapters

Adapters ship as separate entrypoints — zero impact on the core bundle if unused.

Embedding

import { openaiEmbed } from 'thincontext/embeddings/openai'
import { localEmbed } from 'thincontext/embeddings/local'

const compressor = new ContextCompressor({
  embed: openaiEmbed({ apiKey: process.env.OPENAI_API_KEY! }),
  // or: embed: await localEmbed()
})

Summarization

import { anthropicSummarize } from 'thincontext/summarize/anthropic'
import { openaiSummarize } from 'thincontext/summarize/openai'

const compressor = new ContextCompressor({
  summarize: anthropicSummarize({ apiKey: process.env.ANTHROPIC_API_KEY! }),
})

Message conversion

import { fromOpenAI } from 'thincontext/adapters/openai'
import { fromAnthropic } from 'thincontext/adapters/anthropic'
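
A hedged usage sketch, assuming the adapters map an array of provider-format messages to thincontext messages:

// Assumed signature: provider messages in, thincontext messages out.
const converted = fromOpenAI(openaiChatMessages)
const { messages: compressed } = await compressor.compress(converted)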

Message priorities

Tag messages to control how BudgetManager handles token pressure:

const messages = [
  { role: 'system', content: 'You are...', priority: 'critical' },
  { role: 'user', content: ragChunk, priority: 'low' },
  { role: 'assistant', content: lastReply, priority: 'high' },
]

Priorities: 'critical' · 'high' · 'normal' · 'low'
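
Under token pressure, BudgetManager drops the lowest priorities first. A small sketch using the messages above:

const tight = new ContextCompressor({ budget: 2000 })

const { messages: kept } = await tight.compress(messages)
// The 'low'-priority RAG chunk is the first candidate to be dropped;
// the 'critical' system prompt is retained.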


Session persistence

State (seen hashes, summary cache, ref table) lives in memory and persists across compress() calls within a session. Use export() and restore() to carry it further:

const snapshot = compressor.export()
const compressor2 = ContextCompressor.restore(snapshot, { budget: 8000 })
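
To carry state across processes, persist the snapshot yourself (a sketch, assuming the snapshot is JSON-serializable):

import { writeFileSync, readFileSync } from 'node:fs'

writeFileSync('session.json', JSON.stringify(compressor.export()))

const restored = ContextCompressor.restore(
  JSON.parse(readFileSync('session.json', 'utf8')),
  { budget: 8000 },
)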

Integrations

Pi agent

Install as a Pi package — the extension is bundled inside the npm package:

pi install npm:thincontext

Or add to your ~/.pi/agent/settings.json:

{
  "packages": ["npm:thincontext"]
}

The extension hooks Pi's context event to compress messages before every LLM call, with tool result deduplication and a live footer:

🗜 -72% chars

Commands inside Pi:

/thincontext on|off|reset|budget <n>|lines <n>|dedup-after <turns>|debug
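
For example:

/thincontext budget 16000
/thincontext dedup-after 3
/thincontext debug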

Pi-specific notes

Current defaults are conservative:

  • maxToolLines = 300
  • dedupAfterTurns = 2
  • recent edit/write tool results are protected from budget dropping

Known limitations:

  • bash writes such as sed -i or echo > file are not reliably detected as modification records
  • truncation can hide important information that appears late in very long output
  • token estimates shown by the extension are approximate; Pi's own usage counters are more trustworthy
  • a given turn may show no savings even when the extension is working correctly

Claude Code

No context interception hook exists in Claude Code's interactive CLI — there is no equivalent to Pi's context event that fires before each LLM call.

The thincontext library still works for custom SDK/wrapper workflows, but a true drop-in Claude Code CLI plugin equivalent to the Pi extension is not currently possible with the available integration surface.
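
For example, a minimal wrapper sketch (assuming compress() preserves the role/content message shape; the model id and the cast are placeholders):

import Anthropic from '@anthropic-ai/sdk'
import { ContextCompressor } from 'thincontext'

const client = new Anthropic() // reads ANTHROPIC_API_KEY from the environment
const compressor = new ContextCompressor({ budget: 8000 })

async function send(messages: { role: 'user' | 'assistant'; content: string }[]) {
  const { messages: compressed } = await compressor.compress(messages)
  return client.messages.create({
    model: 'claude-sonnet-4-5', // substitute your model id
    max_tokens: 1024,
    messages: compressed as any, // assumes the shapes stay compatible
  })
}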


Token counting for Claude

The default token counting uses cl100k_base, GPT-4's tokenizer, so expect some variance for Claude. See docs/token-counting.md for custom counter guidance.
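
To supply a tighter counter, pass countTokens (a sketch; the string → number signature is assumed):

const compressor = new ContextCompressor({
  // Crude heuristic (~3.5 characters per token for English-heavy text);
  // swap in a tokenizer-backed counter for real accuracy.
  countTokens: (text: string) => Math.ceil(text.length / 3.5),
})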


What this is not

  • not an LLM proxy
  • not a RAG system
  • not model-specific
  • not a browser library

Publishing

The repo includes a GitLab pipeline that:

  • runs typecheck/tests on pushes
  • publishes to npm on version tags like v1.0.0

After publish, users can install with:

npm install thincontext

or in Pi:

pi install npm:thincontext

Development

npm ci
npm run typecheck
npm test
npm run build

License

MIT