@offbynan/pi-cursor-provider

Pi extension providing access to Cursor models via OAuth and a local OpenAI-compatible gRPC proxy

Packages

Package details

extension

Install @offbynan/pi-cursor-provider from npm and Pi will load the resources declared by the package manifest.

npm repo home report

$ pi install npm:@offbynan/pi-cursor-provider

Package: @offbynan/pi-cursor-provider
Version: 0.3.0
Published: May 17, 2026
Downloads: 78/mo · 78/wk
Author: offbynan
License: MIT
Types: extension
Size: 625.8 KB
Dependencies: 1 dependency · 2 peers

Pi manifest JSON

{
  "extensions": [
    "./index.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-cursor-provider

This fork improves on the upstream in ten areas: image support, correct pi -p exit behaviour, removal of dead eviction code, accurate per-model context window inference, post-compaction session sync, context window scaling when Cursor enforces a tighter cap, per-model cost estimation, model deduplication with reasoning-effort mapping, thinking-tag filtering, and structured debug logging. See the sections below for details.

Pi extension that provides access to Cursor models via OAuth authentication and a local OpenAI-compatible proxy.

Forked from ndraiman/pi-cursor-provider.

Changes vs upstream

Image support

This fork extends the proxy to handle images in OpenAI-style image_url content parts:

Base64 images — data:image/png;base64,... payloads are extracted from the request, stored as blobs in Cursor's protobuf format, and forwarded to the upstream API.
Multi-turn state — images are tracked per conversation turn and threaded correctly through session checkpoints, forks, and resumes.
Transparent to callers — no API changes; just include standard image_url content parts in your messages as you would with any OpenAI-compatible client.

The upstream repo does not support images at all — they are silently ignored or cause request failures. This fork handles them properly end-to-end.

`pi -p` exit fix

The upstream repo causes pi -p (non-interactive mode) to hang indefinitely after printing a response. Two bugs were responsible:

Empty end-stream body misclassified as error. Cursor's Connect end-stream frame often has a 0-byte body. JSON.parse("") throws, so the proxy took the error path even on clean completions.
Bridge never unref'd on error path. bridge.end() and bridge.unref() were only called in the success branch. On the error path the h2-bridge child process stayed ref'd, blocking process exit.

This fork fixes both: empty and non-JSON end-stream bodies are treated as success, and the bridge is always unref'd regardless of the outcome.

Removed dead eviction code

The upstream proxy included a 30-minute TTL eviction mechanism (evictStaleConversations, CONVERSATION_TTL_MS, sessionScoped, lastAccessMs). All conversations created by pi include a session ID, permanently exempting them from TTL eviction, so this code was never reachable. This fork removes it.

Accurate per-model context window inference

Cursor's GetUsableModels RPC does not return context window sizes, so the upstream proxy hardcodes 200 k for every model. This fork exports an inferContextWindow(id) function that derives the correct window from known model families:

Family	Window
Claude 4.6 Sonnet / Opus	1 M
All other Claude	200 k
Gemini 2.5 / 3.x	1 M
GPT nano / mini variants	128 k
GPT-5.5+	1 M
GPT-5.x (other)	400 k
Grok 4	256 k
Kimi K2.x	262 k
Anything with `-1m` suffix	1 M
Unknown / Composer	200 k

This ensures pi uses the right compaction thresholds and token budget for each model.

Post-compaction session sync

When pi compacts its message list (the session_compact lifecycle event), the proxy's cached conversation checkpoint still reflects the full pre-compaction conversation. Continuing without clearing that cache would cause a history mismatch, forcing an expensive full reconstruction on the next request.

This fork listens for session_compact and eagerly clears the stored checkpoint for the affected session, so both sides stay in sync at zero extra cost.

Context window scaling when Cursor enforces a tighter cap

Cursor sometimes enforces a tighter context window at runtime than what the model ID implies (for example, capping Gemini at 200 k even though we registered 1 M). In that case the raw usedTokens from Cursor's ConversationTokenDetails would appear far below pi's compaction threshold, so pi would never compact — then Cursor would eventually error with a context-overflow.

This fork reads maxTokens from ConversationTokenDetails and, when Cursor's cap is tighter than the inferred window, scales total_tokens proportionally:

total_tokens = round(usedTokens × piWindow / cursorWindow)

That makes pi's compaction threshold fire at the right time relative to the window Cursor is actually enforcing.

Per-model cost estimation

The upstream repo provides no cost data, so pi cannot show per-turn cost estimates for Cursor models.

This fork ships a detailed cost table (input / output / cache-read / cache-write prices in $/M tokens) covering every current model family — Claude 4.x, GPT-5.x, Gemini 2.5/3.x, Grok 4, Kimi K2, and Composer — plus a pattern-based fallback for variants not yet in the table. Pi uses this data to display cost estimates after each turn.

Model deduplication with reasoning-effort mapping

Cursor's GetUsableModels RPC can return dozens of near-duplicate IDs that differ only by effort suffix (e.g. gpt-5.4-low, gpt-5.4-medium, gpt-5.4-high, gpt-5.4-xhigh). The upstream passes all of these through verbatim, producing a cluttered model list where the user must manually pick the right suffix and pi's reasoning-effort setting is ignored.

This fork deduplicates them: model variants that share the same base ID and differ only by effort suffix are collapsed into a single entry with supportsReasoningEffort: true and an effort map keyed by pi's reasoning levels (minimal / low / medium / high / xhigh). Pi's thinking-level setting then drives the effort suffix automatically, and the model list stays manageable. See the Model Mapping section for the full deduplication rules.

Thinking-tag filtering

Some models (notably certain Gemini variants) emit reasoning content inline with the response, wrapped in tags like <think>, <thinking>, <reasoning>, or <thought>. The upstream passes this through as raw text, polluting the main response with unrendered XML tags.

This fork detects and strips these tags in the proxy's stream processor, routing the extracted content to the reasoning_content SSE field so pi renders it as structured reasoning rather than as part of the assistant's reply.

Structured debug logging

The upstream has no observability. This fork adds opt-in JSONL event logging (set PI_CURSOR_PROVIDER_DEBUG=1) covering every stage of a request: HTTP ingress, message parsing, checkpoint reads/writes, bridge lifecycle, tool call pauses, tool result resumes, and stream completion. A bundled debug:timeline script converts a raw log file into a compact human-readable timeline for diagnosing proxy behaviour.

npm run debug:timeline -- --latest

How it works

pi  →  openai-completions  →  localhost:PORT/v1/chat/completions
                                      ↓
                              proxy.ts (HTTP server)
                                      ↓
                              h2-bridge.mjs (Node HTTP/2)
                                      ↓
                              api2.cursor.sh gRPC

PKCE OAuth — browser-based login to Cursor, no client secret needed
Model discovery — queries Cursor's GetUsableModels gRPC endpoint
Local proxy — translates OpenAI /v1/chat/completions to Cursor's protobuf/HTTP2 Connect protocol
Tool routing — rejects Cursor's native tools, exposes pi's tools via MCP

Install

# Via pi install
pi install npm:@offbynan/pi-cursor-provider

# Or manually
git clone https://github.com/offbynan/pi-cursor-provider ~/.pi/agent/extensions/cursor-provider
cd ~/.pi/agent/extensions/cursor-provider
npm install

Usage

/login cursor     # authenticate via browser
/model            # select a Cursor model

Model Mapping

Cursor exposes many model variants that encode effort level (low, medium, high, xhigh, max, none) and speed (-fast) or thinking (-thinking) in the model ID. This extension deduplicates them so pi's reasoning effort setting controls the effort level.

How it works

Each raw Cursor model ID is parsed into components:

{base}-{effort}[-fast|-thinking]

Examples:

Raw Cursor ID	Base	Effort	Variant
`gpt-5.4-medium`	`gpt-5.4`	`medium`	—
`gpt-5.4-high-fast`	`gpt-5.4`	`high`	`-fast`
`claude-4.6-opus-max-thinking`	`claude-4.6-opus`	`max`	`-thinking`
`gpt-5.1-codex-max-high`	`gpt-5.1-codex-max`	`high`	—
`composer-2`	`composer-2`	—	—

Models sharing the same (base, variant) with ≥2 effort levels and a sensible default (medium or no-suffix) are collapsed into a single entry with supportsReasoningEffort: true. Pi's thinking level maps to the effort suffix:

Pi Level	Cursor Suffix
`minimal`	`none` (if available) or `low`
`low`	`low`
`medium`	`medium` or no suffix (default)
`high`	`high`
`xhigh`	`max` (Claude) or `xhigh` (GPT)

The proxy inserts the effort before -fast/-thinking:

pi selects: gpt-5.4-fast  +  effort: high  →  Cursor receives: gpt-5.4-high-fast
pi selects: gpt-5.4       +  effort: medium  →  Cursor receives: gpt-5.4-medium
pi selects: composer-2     +  (no effort)     →  Cursor receives: composer-2

When a group is collapsed, the proxy registers one model with supportsReasoningEffort: true and an internal effort map (see table above).

Collapsed when Cursor returns either:

Multiple effort suffixes for the same (base, -fast, -thinking) group, or
A single variant whose parsed effort suffix is non-empty (for example only claude-4.5-opus-high is listed). The suffix is removed from the displayed ID so Pi's reasoning-effort setting supplies it.

Left as-is (raw Cursor ID on that row, supportsReasoningEffort: false) when the group has one variant and the parsed effort suffix is empty—typically IDs with no effort segment, such as composer-2, gemini-3.1-pro, or kimi-k2.5.

Disabling the mapping

To see all raw Cursor model variants without dedup:

PI_CURSOR_RAW_MODELS=1 pi

Session Management

The proxy maintains conversation state per pi session, enabling multi-turn conversations with Cursor models while preserving forks, tool continuations, and interruptions correctly.

How it works

Session tracking — pi's session ID is injected into requests via a before_provider_request hook. The proxy keys bridge state and stored conversation state from that real session ID.
Checkpoints — Cursor returns a conversation checkpoint after completed turns. The proxy stores that checkpoint, plus the completed-turn count and a fingerprint of the completed structured history, and reuses it only when the incoming history still matches.
Session-scoped state — real pi session state is kept in memory until explicit cleanup or process restart. Anonymous fallback state can still be TTL-evicted.
Lifecycle cleanup — session state is cleaned up on pi lifecycle events such as session switch, fork, /tree, and shutdown.

Tool continuations

When Cursor pauses for a tool call, the proxy keeps the live upstream bridge open and waits for pi to send the tool result on the next request. That tool result is sent back into the same in-flight Cursor run, so the tool continuation stays part of the original user turn instead of inflating completed history.

Interruptions

If the client disconnects or interrupts a turn mid-stream, the proxy cancels the upstream Cursor run and does not commit the pending checkpoint. Checkpoints are only committed after a turn finishes successfully.

Session fork

When you navigate back in pi's session tree and branch from an earlier point, the proxy discards the stored checkpoint whenever the completed history no longer matches the stored checkpoint metadata. That includes both:

completed turn count mismatches, and
same-depth branch changes detected via completed-history fingerprint mismatch.

After discarding a stale checkpoint, the proxy reconstructs proper protobuf conversation turns from the message history pi sends, so Cursor sees the actual conversation structure at the fork point.

Session resume

Conversation state is stored in memory. If the proxy restarts, checkpoints are lost. On the next request, pi sends the full conversation history, and the proxy reconstructs structured protobuf turns from that history instead of relying on an inline plaintext fallback.

That reconstruction preserves:

assistant messages
tool calls
tool results
final assistant text after tool results

Requirements

Pi
Node.js >= 18
Active Cursor subscription

Development

npm install
npm test

Debug log timeline

When PI_CURSOR_PROVIDER_DEBUG=1 is enabled, the proxy writes timestamped JSONL logs to os.tmpdir() by default. You can turn a log into a compact human-readable timeline with:

npm run debug:timeline -- --latest
npm run debug:timeline -- /path/to/pi-cursor-provider-debug-2026-04-08T14-06-07-565Z-41184.log

Add --json if you want the parsed summary as JSON instead of formatted text.

Credits

OAuth flow and gRPC proxy adapted from opencode-cursor by Ephraim Duncan.

pi-cursor-provider

Changes vs upstream

Image support

pi -p exit fix

Removed dead eviction code

Accurate per-model context window inference

Post-compaction session sync

Context window scaling when Cursor enforces a tighter cap

Per-model cost estimation

Model deduplication with reasoning-effort mapping

Thinking-tag filtering

Structured debug logging

How it works

Install

Usage

Model Mapping

How it works

Disabling the mapping

Session Management

How it works

Tool continuations

Interruptions

Session fork

Session resume

Requirements

Development

Debug log timeline

Credits

`pi -p` exit fix