pi-qwen-mode-proxy
Sampling mode proxy for Qwen models served via llama.cpp — switch between thinking, coding, and instruct modes
Package details
Install pi-qwen-mode-proxy from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:pi-qwen-mode-proxy
- Package: pi-qwen-mode-proxy
- Version: 1.0.1
- Published: Apr 25, 2026
- Downloads: 259/mo · 259/wk
- Author: darthrax
- License: MIT
- Types: extension
- Size: 10.9 KB
- Dependencies: 0 dependencies · 1 peer
Pi manifest JSON
{
"extensions": [
"./extensions"
]
}
Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
pi-qwen-mode-proxy
Sampling mode proxy extension for pi that intercepts OpenAI-completions API requests to a llama.cpp server and injects mode-specific sampling parameters for Qwen models (tested with Qwen 3.6 27B). The parameters are taken from the recommended settings on the model page (https://huggingface.co/Qwen/Qwen3.6-27B).
Modes
| Parameter | Thinking | Coding | Instruct |
|---|---|---|---|
| temperature | 1.0 | 0.6 | 0.7 |
| top_p | 0.95 | 0.95 | 0.80 |
| top_k | 20 | 20 | 20 |
| min_p | 0.0 | 0.0 | 0.0 |
| presence_penalty | 0.0 | 0.0 | 1.5 |
| repetition_penalty | 1.0 | 1.0 | 1.0 |
- Thinking — Creative, exploratory tasks. High temperature for diverse output.
- Coding — Precise, deterministic coding tasks. Lower temperature for consistent results.
- Instruct — Instruction-following with a presence penalty to encourage topic variety.
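These defaults map naturally to a per-mode lookup table. A minimal TypeScript sketch with the values copied from the table above (the `Mode` type and `MODE_PARAMS` name are illustrative, not taken from the extension source):

```typescript
// Per-mode sampling defaults, copied from the table above.
type Mode = "thinking" | "coding" | "instruct";

const MODE_PARAMS: Record<Mode, Record<string, number>> = {
  thinking: { temperature: 1.0, top_p: 0.95, top_k: 20, min_p: 0.0, presence_penalty: 0.0, repetition_penalty: 1.0 },
  coding:   { temperature: 0.6, top_p: 0.95, top_k: 20, min_p: 0.0, presence_penalty: 0.0, repetition_penalty: 1.0 },
  instruct: { temperature: 0.7, top_p: 0.80, top_k: 20, min_p: 0.0, presence_penalty: 1.5, repetition_penalty: 1.0 },
};
```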
Installation
npm
pi install npm:pi-qwen-mode-proxy
git
pi install git:github.com/YOUR_USERNAME/pi-qwen-mode-proxy
local
pi install /path/to/pi-qwen-mode-proxy
Usage
Requires a llama.cpp server serving a Qwen model, registered as the llamacpp provider in ~/.pi/agent/models.json:
{
"providers": {
"llamacpp": {
"baseUrl": "http://10.10.10.11:8080/v1",
"api": "openai-completions",
"apiKey": "llamacpp",
"compat": {
"supportsDeveloperRole": true,
"supportsReasoningEffort": true,
"thinkingFormat": "qwen-chat-template"
},
"models": [
{
"id": "llamacpp",
"name": "(Local AI)",
"reasoning": true,
"input": ["text"],
"contextWindow": 131072,
"maxTokens": 16384,
"cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 }
}
]
}
}
}
Commands
/mode Show current mode and parameters
/mode thinking Switch to thinking mode
/mode coding Switch to coding mode
/mode instruct Switch to instruct mode
The current mode is displayed in the status bar footer and persists across /reload via session storage.
How It Works
The extension hooks into pi's before_provider_request event, which fires after pi builds the OpenAI chat completions payload but before it's sent over the network. When the target model is llamacpp, the handler injects the six sampling parameters (temperature, top_p, top_k, min_p, presence_penalty, repetition_penalty) corresponding to the active mode.
No custom provider or streaming implementation is needed — the extension works as a lightweight interceptor on top of pi's built-in openai-completions provider.
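In sketch form, the interception looks roughly like the following. The event name (before_provider_request), the TARGET_MODEL filter, and the six injected parameters come from this README; the registration call (`pi.on`) and the request shape are assumptions and will differ from pi's actual extension API.

```typescript
// Illustrative sketch only: pi.on and the request shape are assumed,
// not pi's real extension API. MODE_PARAMS is the per-mode lookup
// table sketched in the Modes section above.
const TARGET_MODEL = "llamacpp";   // model ID filter (see Configuration)
let activeMode: Mode = "coding";   // updated by the /mode command

pi.on("before_provider_request", (request: { model: string; body: Record<string, unknown> }) => {
  // Leave requests for other providers/models untouched.
  if (request.model !== TARGET_MODEL) return;

  // Merge the active mode's sampling parameters into the
  // already-built OpenAI chat completions payload.
  Object.assign(request.body, MODE_PARAMS[activeMode]);
});
```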
Configuration
The model ID filter defaults to llamacpp. If your provider/model uses a different ID, edit extensions/index.ts and change the TARGET_MODEL constant.
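For example, if your models.json entry used the ID qwen3-local instead (a hypothetical ID for illustration), the filter in extensions/index.ts would become:

```typescript
// Hypothetical model ID; match whatever "id" your models.json entry uses.
const TARGET_MODEL = "qwen3-local";
```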
License
MIT