pi-thinking-only-guard
Auto-recover trapped tool calls from thinking blocks for Qwen3.6 and similar models
Package details
Install pi-thinking-only-guard from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:pi-thinking-only-guard
- Package: pi-thinking-only-guard
- Version: 0.1.0
- Published: Jun 14, 2026
- Downloads: not available
- Author: reluxa
- License: MIT
- Types: extension
- Size: 14.3 KB
- Dependencies: 0 dependencies · 0 peers
Pi manifest JSON
{
"extensions": [
"./thinking-only-guard.ts"
]
}
Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
Qwen3.6 Thinking-Only Guard — Pi Extension
Problem
Qwen3.6-27B (and similar thinking-capable models via providers like airouter) sometimes places
tool calls inside the reasoning_content (thinking block) instead of as proper tool_calls in the API response.
- `finish_reason: "stop"`: the model thinks it is done
- Thinking content contains `<tool_call>...</tool_call>` blocks
- No actual `tool_calls` in the response, so pi does not execute them
- No text content, so the user sees an empty or thinking-only response
Known issue: sgl-project/sglang#27021
Root Cause
Provider emits:
reasoning_content: "<tool_call>
<function=read>
<parameter=path>
/home/reluxa/.profile
</parameter>
</function>
</tool_call>"
content: ""
finish_reason: "stop"
Pi's OpenAI-completions parser puts thinking in [type: "thinking"], finds no tool calls,
and the turn ends. The model "stopped" from its perspective.
Solution: thinking-only-guard.ts
A pi extension that detects this pattern during live streaming and sends the trapped tool call(s) back to the model so it can execute them properly.
Files
| File | Purpose |
|---|---|
| `~/.pi/agent/extensions/thinking-only-guard.ts` | The extension |
| `~/.pi/agent/extensions/tests/thinking-only-guard.test.js` | Unit tests (14 tests) |
How It Works
- `message_update`: accumulates `thinking_delta` tokens into `lastThinking`.
- `message_end`: checks whether the completed assistant message matches the pattern:
  - `toolCallCount >= 1` (one or more `<tool_call>` blocks in thinking)
  - `hasText === false` (no `type: "text"` in content array)
  - `hasRealToolCalls === false` (no `type: "toolCall"` in content array)
  - `sawThinkingDelta === true` (only fires during live streaming, not session replay)
- If matched, extracts the exact tool call block(s) from thinking and sends a follow-up:
  Your last response had N tool call(s) inside your thinking block. Please execute them now:
  <function=read> /home/reluxa/.profile
- `turn_end`: resets the retry counter.
- Max 2 retries per turn before giving up.
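The extraction and follow-up steps above can be sketched roughly as follows. This is an illustrative sketch, not the extension's actual source: the function names `extractToolCallBlocks` and `buildFollowUp` are hypothetical, and the exact layout of the follow-up text is an assumption based on the example message shown above.

```typescript
/**
 * Returns the inner text of every complete <tool_call>...</tool_call> block
 * found in the accumulated thinking text. (Hypothetical helper name.)
 */
function extractToolCallBlocks(thinking: string): string[] {
  // A fresh regex per call, so the global lastIndex always starts at 0.
  const pattern = /<tool_call>([\s\S]*?)<\/tool_call>/g;
  const blocks: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(thinking)) !== null) {
    blocks.push((match[1] ?? "").trim());
  }
  return blocks;
}

/**
 * Builds the follow-up text sent back to the model, or null when the
 * thinking text contains no trapped tool calls. (Assumed message layout.)
 */
function buildFollowUp(thinking: string): string | null {
  const blocks = extractToolCallBlocks(thinking);
  if (blocks.length === 0) return null;
  const header =
    `Your last response had ${blocks.length} tool call(s) inside your thinking block. ` +
    `Please execute them now:`;
  return [header, ...blocks].join("\n\n");
}
```

The non-greedy `[\s\S]*?` keeps each match to a single block, so several trapped calls in one thinking block are reported separately.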
Trigger Conditions
| Condition | Must be |
|---|---|
| Tool calls in thinking | >= 1 |
| Text blocks in content | 0 |
| Real toolCall entries | 0 |
| Live streaming | Yes |
| Retry count | < maxRetries (2) |
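Read together, the conditions in this table amount to a single predicate. The sketch below is one way to express it. The `ContentBlock` and `GuardState` shapes and the `shouldTrigger` name are hypothetical, modeled on the names used in this README rather than on Pi's actual types.

```typescript
// Hypothetical content-block shapes, based on the type names in this README.
type ContentBlock =
  | { type: "thinking"; thinking: string }
  | { type: "text"; text: string }
  | { type: "toolCall"; name: string };

// Hypothetical per-turn state tracked by the guard.
interface GuardState {
  sawThinkingDelta: boolean; // true only when deltas arrived during live streaming
  retryCount: number; // follow-ups already sent in this turn
}

const maxRetries = 2;

/** Counts complete <tool_call>...</tool_call> blocks in the thinking text. */
function countToolCallBlocks(thinking: string): number {
  const matches = thinking.match(/<tool_call>[\s\S]*?<\/tool_call>/g);
  return matches === null ? 0 : matches.length;
}

/** Returns true only when every trigger condition from the table holds. */
function shouldTrigger(
  content: ContentBlock[],
  lastThinking: string,
  state: GuardState
): boolean {
  const toolCallCount = countToolCallBlocks(lastThinking);
  const hasText = content.some((block) => block.type === "text");
  const hasRealToolCalls = content.some((block) => block.type === "toolCall");
  return (
    toolCallCount >= 1 &&
    !hasText &&
    !hasRealToolCalls &&
    state.sawThinkingDelta &&
    state.retryCount < maxRetries
  );
}
```

Because `sawThinkingDelta` is part of the predicate, a replayed session with the same content array does not re-trigger the guard.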
Configuration
Editable at the top of the extension file.
| Setting | Default | Notes |
|---|---|---|
| `maxRetries` | 2 | Max auto-continue per turn |
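As a rough illustration of how a per-turn retry limit like this can behave, the sketch below models the counter as a small class. The `RetryBudget` name and its methods are hypothetical. The extension itself may simply keep a plain counter variable next to the `maxRetries` constant.

```typescript
const maxRetries = 2; // default from the table above

/** Hypothetical helper that tracks how many auto-continues a turn has used. */
class RetryBudget {
  private used = 0;

  /** Consumes one retry if any remain and reports whether it was allowed. */
  tryConsume(): boolean {
    if (this.used >= maxRetries) return false;
    this.used += 1;
    return true;
  }

  /** Restores the full budget, as the turn_end handler is described as doing. */
  reset(): void {
    this.used = 0;
  }
}
```

With the default of 2, the first two calls to `tryConsume()` succeed and the third is refused until `reset()` runs.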
Running Tests
node ~/.pi/agent/extensions/tests/thinking-only-guard.test.js
14 tests: single call, multiple calls, with text, with real toolCalls, plain thinking, empty thinking, extract N calls
Session Replay Results
Scanned 763 thinking-only messages across ~/.pi/agent/sessions/--home-reluxa--/.
Of these, 4 would have triggered the guard (entries 50556cf1, 968d54e2, bf309c6e, 051b0538).
Design Decisions
Why send back as user message?
The turn is already finalized by message_end — pi will not re-scan for tool calls.
Sending the trapped call back triggers a fresh turn where the model executes it.
Why not modify the message in-place?
message_end can return { message } to replace it, but the turn is already done.
Converting thinking to text only makes the calls visible, not executable.
Why not a custom provider wrapper?
Technically possible: intercept the raw streaming chunks and restructure reasoning_content into tool_calls. However, this takes significantly more engineering and is fragile because it depends on provider internals. The current approach works and costs one extra turn.
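To give a sense of what the restructuring step in a provider wrapper would involve, the sketch below parses one trapped block in the format shown under Root Cause and converts it to an OpenAI-style `tool_calls` entry. It is not part of this extension: the names `parseTrappedCall` and `toOpenAiToolCall` and the `recovered_` id prefix are hypothetical, every parameter value is assumed to be a plain string, and intercepting the stream itself is left out entirely.

```typescript
interface ParsedToolCall {
  name: string;
  arguments: Record<string, string>; // assumption: every parameter is a plain string
}

/** Parses one <function=NAME>...</function> block; returns null if none is found. */
function parseTrappedCall(block: string): ParsedToolCall | null {
  const fn = /<function=([^>\s]+)>([\s\S]*?)<\/function>/.exec(block);
  if (fn === null) return null;

  const args: Record<string, string> = {};
  const paramPattern = /<parameter=([^>\s]+)>([\s\S]*?)<\/parameter>/g;
  let param: RegExpExecArray | null;
  while ((param = paramPattern.exec(fn[2] ?? "")) !== null) {
    args[param[1] ?? ""] = (param[2] ?? "").trim();
  }
  return { name: fn[1] ?? "", arguments: args };
}

/** Converts a parsed call to the OpenAI chat-completions tool_calls entry shape. */
function toOpenAiToolCall(call: ParsedToolCall, index: number) {
  return {
    id: `recovered_${index}`, // hypothetical id scheme
    type: "function" as const,
    function: {
      name: call.name,
      arguments: JSON.stringify(call.arguments), // the API carries arguments as a JSON string
    },
  };
}
```

Even this small parser has to guess at argument types and at the tag grammar, which is the kind of provider-specific coupling that makes the wrapper approach fragile.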