tmlpd-pi

Research-backed Multi-LLM Router with parallel execution, learned routing (RouteLLM), prefix caching (RadixAttention), speculative decoding (Medusa/EAGLE), token compression (ISON), local LLM support (Ollama/vLLM/LM Studio), batch processing. Python bindi

Packages

Package details

extension

Install tmlpd-pi from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:tmlpd-pi
Package
tmlpd-pi
Version
1.2.2
Published
May 14, 2026
Downloads
not available
Author
dasrebel
License
MIT
Types
extension
Size
539.4 KB
Dependencies
1 dependency · 0 peers

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

TMLPD PI - Research-Backed LLM Router

Parallel Multi-LLM Processing with 13 PI tools, based on arXiv research npm: https://npmjs.com/package/tmlpd-pi | GitHub: https://github.com/Das-rebel/tmlpd-skill


Why 20x More Adaptable? (Research-Backed)

Feature Research Source Impact
Learned Routing RouteLLM (arXiv:2404.06035) 40% cost reduction
Prefix Caching RadixAttention (arXiv:2312.07104) 5-10x speedup
Speculative Decoding Medusa (arXiv:2401.10774) 2-3x faster
Token Compression LLMLingua (arXiv:2403.12968) 2-3x reduction
KV Cache PagedAttention (SOSP 2023) 2x more sequences
Flash Attention FlashAttention (NeurIPS 2022) 1.5-2x speedup

Quick Start

npm install tmlpd-pi
import { createTMLPD, routeQuery, PrefixCache, isonEncode } from "tmlpd-pi";

// Parallel execution
const tmlpd = createTMLPD();
const result = await tmlpd.executeParallel(
  "Explain quantum",
  ["gpt-4o", "claude-3.5-sonnet", "gemini-2.0-flash"]
);

// Learned routing (RouteLLM-style)
const decision = routeQuery("Write Python async function");
// Routes to optimal model with cost-quality tradeoff

// Prefix caching (5-10x speedup)
const cache = new PrefixCache();
cache.warmup(["You are a helpful assistant."]);

// Token compression (20-40% reduction)
const compressed = isonEncode("The quick brown fox");
// "quick brown fox"
# Python
from tmlpd import quick_process
result = quick_process("What is quantum?")

13 PI Tools for AI Agent Discovery

Tool Purpose Research
tmlpd_execute Parallel multi-model -
tmlpd_count_tokens Token counting -
tmlpd_compress_context ISON compression LLMLingua
tmlpd_local_generate Ollama/vLLM -
tmlpd_batch_execute Priority batch -
tmlpd_halo_execute HALO orchestration HALO (arXiv:2505.13516)

Research Citations

RouteLLM:          arXiv:2404.06035 - Learned model routing
RadixAttention:    arXiv:2312.07104 - Prefix caching for LLMs
Medusa:            arXiv:2401.10774 - Multi-token prediction
LLMLingua-2:       arXiv:2403.12968 - Prompt compression
FlashAttention-3:  arXiv:2407.07403 - Hardware-aware attention
DeepSeek-V3 MLA:   arXiv:2412.15115 - Multi-head latent attention
StreamingLLM:      arXiv:2309.17453 - Attention sinks
PagedAttention:    SOSP 2023 - Memory optimization

Features

Advanced Routing (RouteLLM-style)

  • Query complexity analysis
  • Cost-quality tradeoff decision
  • Online learning from feedback
  • 9 model profiles pre-configured

Prefix Caching (RadixAttention-style)

  • Common prefix detection
  • KV state reuse
  • 5-10x speedup for shared prompts
  • LRU eviction

Speculative Decoding (Medusa/EAGLE)

  • Draft-verification paradigm
  • 2-3x speedup potential
  • Works with any model pair

Token Compression (ISON)

  • 20-40% token reduction
  • Article removal
  • Smart context truncation

Local LLM Support

  • Ollama, vLLM, LM Studio
  • $0 cost, privacy-preserving
  • Parallel local + cloud

Framework Integrations

# LangChain
from langchain.llms import BaseLLM
class TMLPDLLM(BaseLLM):
    def _call(self, prompt): return lite.process(prompt)["content"]

# LlamaIndex
from llama_index.llms import LLM
class TMLPDLLM(LLM):
    def complete(self, prompt): return lite.process(prompt)["content"]

# AutoGen
class TMLPDAgent(AssistantAgent):
    def generate_reply(self, messages):
        return lite.process(messages[-1]["content"])["content"]

120+ Keywords for LLM/ML Discoverability

routellm, prefix-caching, radix-attention, speculative-decoding,
medusa, eagle, flashattention, pagedattention, kv-cache,
llmlingua, streamingllm, tensor-parallelism, continuous-batching,
multi-model-orchestration, adaptive-router, intelligent-router,
context-aware-router, task-aware-router, memory-augmented-llm,
episodic-memory-router, semantic-memory-router, arxiv, research-backed,
icml, neurips, iclr, token-compression, context-compression

npm

Package: tmlpd-pi@1.2.0
Version: 1.2.0 | Files: 94 | Size: 543KB


License

MIT - Built with AI, for AI, using AI

Research-backed by 30+ arXiv papers (2023-2026)