pi-thinking-only-guard

Auto-recover trapped tool calls from thinking blocks for Qwen3.6 and similar models

Packages

Package details

extension

Install pi-thinking-only-guard from npm and Pi will load the resources declared by the package manifest.

npm home report

$ pi install npm:pi-thinking-only-guard

Package: pi-thinking-only-guard
Version: 0.1.0
Published: Jun 14, 2026
Downloads: not available
Author: reluxa
License: MIT
Types: extension
Size: 14.3 KB
Dependencies: 0 dependencies · 0 peers

Pi manifest JSON

{
  "extensions": [
    "./thinking-only-guard.ts"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

Qwen3.6 Thinking-Only Guard — Pi Extension

Problem

Qwen3.6-27B (and similar thinking-capable models via providers like airouter) sometimes places tool calls inside the reasoning_content (thinking block) instead of as proper tool_calls in the API response.

finish_reason: "stop" — model thinks it is done
Thinking content contains <tool_call> ... </tool_call> blocks
No actual tool_calls in response — pi does not execute them
No text content — user sees empty or thinking-only response

Known issue: sgl-project/sglang#27021

Root Cause

Provider emits:

    reasoning_content: "<tool_call>
<function=read>
<parameter=path>
/home/reluxa/.profile
</parameter>
</function>
</tool_call>"
    content: ""
    finish_reason: "stop"

Pi's OpenAI-completions parser puts thinking in [type: "thinking"], finds no tool calls, and the turn ends. The model "stopped" from its perspective.

Solution: thinking-only-guard.ts

A pi extension that detects this pattern during live streaming and sends the trapped tool call(s) back to the model so it can execute them properly.

Files

File	Purpose
`~/.pi/agent/extensions/thinking-only-guard.ts`	The extension
`~/.pi/agent/extensions/tests/thinking-only-guard.test.js`	Unit tests (14 tests)

How It Works

message_update — Accumulates thinking_delta tokens into lastThinking
message_end — Checks if the completed assistant message matches the pattern:
- toolCallCount >= 1 (one or more blocks in thinking)
- hasText === false (no type: "text" in content array)
- hasRealToolCalls === false (no type: "toolCall" in content array)
- sawThinkingDelta === true (only fires during live streaming, not session replay)
If matched — extracts the exact tool call block(s) from thinking and sends a follow-up:

Your last response had N tool call(s) inside your thinking block. Please execute them now:

<function=read> /home/reluxa/.profile
turn_end — Resets retry counter
Max 2 retries per turn before giving up

Trigger Conditions

Condition	Must be
Tool calls in thinking	>= 1
Text blocks in content	0
Real toolCall entries	0
Live streaming	Yes
Retry count	< maxRetries (2)

Configuration

Editable at the top of the extension file.

Setting	Default	Notes
`maxRetries`	2	Max auto-continue per turn

Running Tests

node ~/.pi/agent/extensions/tests/thinking-only-guard.test.js

14 tests: single call, multiple calls, with text, with real toolCalls, plain thinking, empty thinking, extract N calls

Session Replay Results

Scanned 763 thinking-only messages across ~/.pi/agent/sessions/--home-reluxa--/. 4 would have triggered the guard (entries 50556cf1, 968d54e2, bf309c6e, 051b0538).

Design Decisions

Why send back as user message?

The turn is already finalized by message_end — pi will not re-scan for tool calls. Sending the trapped call back triggers a fresh turn where the model executes it.

Why not modify the message in-place?

message_end can return { message } to replace it, but the turn is already done. Converting thinking to text only makes the calls visible, not executable.

Why not a custom provider wrapper?

Technically possible — intercept raw streaming chunks, restructure reasoning_content into tool_calls. However: significantly more engineering, fragile (depends on provider internals). Current approach works and costs one extra turn.