oh-my-tps
Tiny live TTFT and TPS readouts for the Pi coding agent.
Package details
Install oh-my-tps from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:oh-my-tps
- Package: oh-my-tps
- Version: 0.1.1
- Published: Jun 10, 2026
- Downloads: not available
- Author: enderliquid
- License: MIT
- Types: extension
- Size: 21.2 KB
- Dependencies: 1 dependency · 1 peer
Pi manifest JSON
{
  "extensions": [
    "./extensions"
  ]
}
Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
Oh My TPS
English | 简体中文
Install
npm package
pi install npm:oh-my-tps
Git repository
pi install git:github.com/EnderLiquid/oh-my-tps
What it does
oh-my-tps does one thing:
it adds a tiny live speed readout to the Pi TUI so you can see first-token latency and output speed while the model is responding.
- τ: TTFT, time to first token, in seconds
- Δ: TPS, tokens per second
What it looks like:
τ0.8 Δ48.6
That's it.
Ten characters. It just works.
The rest of this README covers the details, but at this point you already know how to use it.
Reading the numbers
You will see readings like these in the TUI footer area:
τ0.8 Δ48.6
τ1.1 Δ49.7L
τ0.8A Δ52.4A
Suffixes:
- A: Average, the average final value across recent requests
- L: Last, the final value from the previous request
A quick way to read them:
- τ0.8 Δ48.6: the response is currently streaming; TTFT was about 0.8s and the current live TPS estimate is about 48.6
- τ1.1 Δ49.7L: the request has been sent, but streaming has not started yet; TTFT is still counting, so the extension shows the previous request's final TPS as a reference
- τ0.8A Δ52.4A: Pi is currently idle, so the extension shows the recent average performance
The live Δ shown during streaming is an estimate. The final Δ shown after the response ends is more trustworthy.
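The readout format can be sketched as a small formatter. This is an illustrative sketch, not the extension's actual code: `formatReadout` and `Suffix` are hypothetical names, and the one-decimal rounding and the `…` / `?` placeholders are inferred from the sample readings in this README.

```typescript
// Hypothetical suffix type: "" for live values, "A" for average, "L" for last.
type Suffix = "" | "A" | "L";

// Builds a readout such as "τ0.8 Δ48.6". A null value means "not known yet".
function formatReadout(
  ttftSeconds: number | null,
  tps: number | null,
  ttftSuffix: Suffix = "",
  tpsSuffix: Suffix = "",
): string {
  // One decimal place is an assumption based on the examples in this README.
  const tau = ttftSeconds === null ? "…" : ttftSeconds.toFixed(1) + ttftSuffix;
  const delta = tps === null ? "?" : tps.toFixed(1) + tpsSuffix;
  return `τ${tau} Δ${delta}`;
}
```

For example, `formatReadout(1.1, 49.7, "", "L")` produces the waiting-state reading with the previous request's TPS as a reference.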
How it works
This section is for people who want to know what the extension is actually measuring.
State machine
Internally, the extension moves through four phases:
- waiting: the request has been sent and is waiting for the first token
- streaming: the assistant is actively streaming output
- settled: the response has finished
- idle: the turn is over and the extension is showing historical values
Example:
idle τ… Δ? (before the very first request)
-> waiting(req1) τ0.2 Δ? (first request in the prompt, no usable average yet, τ updates every 200ms)
...
-> idle τ1.5A Δ50.0A (idle, with historical averages available)
-> waiting(req1) τ0.2 Δ50.0A (first request in the prompt, shows average as baseline while waiting)
-> streaming(req1) τ1.3 Δ51.0 (live Δ updates, τ is now locked)
-> settled(req1) τ1.3 Δ52.0 (final Δ locked, τ locked)
-> waiting(req2+) τ0.2 Δ52.0L (second or later request in the same prompt, uses last final value as baseline)
-> streaming(req2+) τ1.7 Δ49.0 (live Δ updates, τ locked)
-> settled(req2+) τ1.7 Δ49.5 (final Δ locked, τ locked)
-> idle τ1.5A Δ50.0A
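The trace above can be modeled as a small transition table. This is a minimal sketch under stated assumptions: the phase names come from this README, while the event names (`requestSent`, `firstUpdate`, `responseEnded`, `turnEnded`) and the `nextPhase` helper are hypothetical and may not match the extension's internals.

```typescript
type Phase = "idle" | "waiting" | "streaming" | "settled";

// Hypothetical event names, chosen for illustration only.
type PhaseEvent = "requestSent" | "firstUpdate" | "responseEnded" | "turnEnded";

// Allowed transitions, mirroring the example trace above.
const transitions: Record<Phase, Partial<Record<PhaseEvent, Phase>>> = {
  idle: { requestSent: "waiting" },
  waiting: { firstUpdate: "streaming" },
  streaming: { responseEnded: "settled" },
  // After settling, either another request starts or the turn ends.
  settled: { requestSent: "waiting", turnEnded: "idle" },
};

// Returns the next phase; events that do not apply leave the phase unchanged.
function nextPhase(current: Phase, event: PhaseEvent): Phase {
  return transitions[current][event] ?? current;
}
```

Ignoring events that do not apply, rather than throwing, is a design choice for this sketch: a UI readout should keep rendering even if events arrive out of order.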
Where τ comes from
τ is straightforward.
Once a provider request is sent, the extension enters waiting and refreshes the elapsed time every 200ms. The moment the first assistant streaming update arrives, that time delta is locked in as the TTFT for the request.
So in practice:
- during waiting, τ keeps increasing
- once streaming begins, τ stops changing
- the τ shown later in settled, the historical τ reused before the next request, and the τ that contributes to idle averages are all based on that final locked TTFT
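The count-then-lock behavior of τ can be sketched like this. `TtftTimer` is a hypothetical helper, not a class from the extension. Timestamps are passed in as plain millisecond values so the logic stays independent of any particular clock or of the 200ms refresh timer.

```typescript
// Hypothetical helper: tracks time to first token for a single request.
class TtftTimer {
  private readonly sentAtMs: number;
  private lockedMs: number | null = null;

  constructor(sentAtMs: number) {
    this.sentAtMs = sentAtMs;
  }

  // Call when the first streaming update arrives; later calls are ignored.
  lock(nowMs: number): void {
    if (this.lockedMs === null) {
      this.lockedMs = nowMs - this.sentAtMs;
    }
  }

  // Seconds to display: still counting while waiting, frozen once locked.
  seconds(nowMs: number): number {
    const ms = this.lockedMs ?? (nowMs - this.sentAtMs);
    return ms / 1000;
  }
}
```

A refresh loop would call `seconds(now)` on each tick during waiting, and the same call keeps returning the locked value once streaming has begun.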
Where live Δ and final Δ come from
These two values come from different sources, and that distinction matters.
Live Δ
During streaming, the provider does not continuously tell Pi exactly how many new output tokens just arrived. That means live Δ has to be estimated locally.
The current implementation does this:
- Take all assistant text that has streamed so far for the current response
- Estimate how many tokens that text roughly corresponds to with tokenx
- Divide that estimate by the elapsed streaming time
In other words, live Δ is essentially:
estimated output tokens so far / elapsed streaming time so far
It is not the provider's real-time token truth. It is a local approximation meant for UI feedback.
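As a rough sketch of that calculation, the function below divides an estimated token count by the elapsed streaming time. The names `liveTps` and `roughTokenEstimate` are hypothetical. The four-characters-per-token estimator is only a stand-in so the example runs without dependencies. It is not how tokenx works, and an estimator backed by tokenx could be passed in instead.

```typescript
// Stand-in estimator (assumption): roughly four characters per token.
// It only keeps this sketch self-contained; it is not the tokenx heuristic.
function roughTokenEstimate(text: string): number {
  return Math.ceil(text.length / 4);
}

// Live TPS = estimated output tokens so far / elapsed streaming seconds so far.
function liveTps(
  streamedText: string,
  streamStartMs: number,
  nowMs: number,
  estimate: (text: string) => number = roughTokenEstimate,
): number | null {
  const elapsedSeconds = (nowMs - streamStartMs) / 1000;
  // No meaningful rate exists until some streaming time has elapsed.
  if (elapsedSeconds <= 0) return null;
  return estimate(streamedText) / elapsedSeconds;
}
```

Because the whole accumulated text is re-estimated on each update, early values are based on very little text and very little time, which is one reason the live number is noisy at the start of a response.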
Final settled Δ
When the response ends, if the provider returns usage.output, the extension uses that to compute the final TPS:
final output tokens / total streaming time
This is usually more trustworthy than live Δ, because it is based on the provider's final reported output token count rather than a local estimate.
If a provider or a specific response does not return usable output token data, the extension falls back to the last live estimate as a best-effort display value.
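The settle-time choice between the provider's count and the local estimate might look like the sketch below. `settledTps` is a hypothetical function, and treating a missing or non-positive token count as "not usable" is an assumption about what the README means by usable data.

```typescript
// Picks the final TPS for a finished response.
// outputTokens: the provider-reported output token count, if any.
// lastLiveTps: the last local estimate shown during streaming, if any.
function settledTps(
  outputTokens: number | undefined,
  streamingSeconds: number,
  lastLiveTps: number | null,
): number | null {
  // Assumption: a count is usable when it is present and positive.
  if (outputTokens !== undefined && outputTokens > 0 && streamingSeconds > 0) {
    return outputTokens / streamingSeconds;
  }
  // Best-effort fallback when the provider gave no usable count.
  return lastLiveTps;
}
```

For instance, 260 reported output tokens over 5 seconds of streaming settles at 52 TPS, regardless of what the live estimate last showed.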
Why live and final values can differ
1. Token estimation is heuristic
tokenx is not an exact tokenizer. It is a lightweight heuristic estimator. That is why it works well for fast UI updates: it is small, fast, and easy to run on every streaming update. The tradeoff is obvious: it is not designed to match every model family exactly.
tokenx is designed and benchmarked mainly against GPT-style tokenization and English text. With other model families, or with output that contains non-English text, the live estimate can drift further from the final settled value.
2. Streaming itself is uneven
Model output does not arrive in the UI as a perfectly uniform token-by-token stream. The observed readout is affected by things like:
- the provider's own SSE / chunk flush strategy
- how Pi receives and surfaces updates
- structural changes caused by thinking blocks, tool calls, and normal text appearing together
So live Δ typically behaves like this:
- unstable at first, then gradually settles
- often approaches the final settled value, but does not perfectly match it
Average value A
In the current implementation, A means the average final performance across the most recent 5 provider requests.
- average τ: the average final TTFT across those recent requests
- average Δ: the average settled TPS across those recent requests
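A fixed-size window over recent requests is one straightforward way to produce these averages. The sketch below is illustrative: `RecentAverages` and `RequestResult` are hypothetical names, and only the window size of 5 comes from this README.

```typescript
// Final values recorded for one finished provider request.
interface RequestResult {
  ttftSeconds: number;
  tps: number;
}

// Hypothetical helper: keeps the most recent results and averages them.
class RecentAverages {
  private readonly windowSize: number;
  private results: RequestResult[] = [];

  constructor(windowSize = 5) {
    this.windowSize = windowSize;
  }

  // Adds a finished request and evicts the oldest one when the window is full.
  record(result: RequestResult): void {
    this.results.push(result);
    if (this.results.length > this.windowSize) {
      this.results.shift();
    }
  }

  // Returns the mean TTFT and TPS, or null when nothing has been recorded.
  average(): RequestResult | null {
    const count = this.results.length;
    if (count === 0) return null;
    let ttftSum = 0;
    let tpsSum = 0;
    for (const r of this.results) {
      ttftSum += r.ttftSeconds;
      tpsSum += r.tps;
    }
    return { ttftSeconds: ttftSum / count, tps: tpsSum / count };
  }
}
```

Returning null for an empty window matches the situation before the very first request, when there is no usable average to show yet.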
How to interpret the data
A good rule of thumb is:
- τ: highly useful
- settled / average Δ: the most useful numbers when comparing results
- live Δ: reflects real-time trend and perceived speed
Where it fits
- useful as a rough quantitative reference for LLM latency and speed
- useful for quickly spotting obviously slow requests in long Pi sessions
- not meant for strict model benchmarking
License
MIT License