pi-arxivist

Fetch arXiv papers as Markdown (pi extension)

Install pi-arxivist from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-arxivist

Package: pi-arxivist
Version: 0.1.3
Published: Jun 16, 2026
Downloads: not available
Author: lhufo
License: MIT
Types: extension
Size: 50.4 KB
Dependencies: 1 dependency · 2 peers

Pi manifest JSON
{
  "extensions": [
    "./dist/index.js"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-arxivist

Fetch arXiv papers as clean Markdown, right inside pi. Zero config, zero system dependencies.

arXiv provides LaTeX source tarballs for most papers. fetch_arxiv downloads the source, flattens \input/\include references, and converts the result to Markdown via pandoc. No PDF extraction, no garbled math, no lost structure.

Install

pi install npm:pi-arxivist

Usage

fetch_arxiv 1203.6859
fetch_arxiv https://arxiv.org/abs/1203.6859
fetch_arxiv https://arxiv.org/pdf/1203.6859

Accepts bare IDs, abstract URLs, or PDF URLs.
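
As a rough illustration of how the three input forms can be reduced to one ID, here is a minimal Python sketch. It is not the extension's own code (which ships as JavaScript). The names normalize_arxiv_id and source_url are hypothetical, and support for version suffixes (v2) and old-style IDs (hep-th/9901001) is an assumption of this sketch, not something the README promises.

```python
import re

# New-style IDs such as 1203.6859 and old-style IDs such as hep-th/9901001.
# Accepting a trailing version (v2) is an assumption of this sketch.
_NEW_ID = r"\d{4}\.\d{4,5}(?:v\d+)?"
_OLD_ID = r"[a-z\-]+(?:\.[A-Z]{2})?/\d{7}(?:v\d+)?"
_ID_RE = re.compile(rf"^(?:{_NEW_ID}|{_OLD_ID})$")
# Abstract and PDF URLs; a trailing ".pdf" or "/" is tolerated.
_URL_RE = re.compile(r"^https?://(?:www\.)?arxiv\.org/(?:abs|pdf)/(.+?)(?:\.pdf)?/?$")


def normalize_arxiv_id(ref: str) -> str:
    """Reduce a bare ID, abstract URL, or PDF URL to a plain arXiv ID."""
    ref = ref.strip()
    match = _URL_RE.match(ref)
    candidate = match.group(1) if match else ref
    if not _ID_RE.match(candidate):
        raise ValueError(f"not a recognizable arXiv reference: {ref!r}")
    return candidate


def source_url(arxiv_id: str) -> str:
    """Build the source tarball URL (arxiv.org/e-print/<id>) for an ID."""
    return f"https://arxiv.org/e-print/{arxiv_id}"


if __name__ == "__main__":
    for ref in ("1203.6859",
                "https://arxiv.org/abs/1203.6859",
                "https://arxiv.org/pdf/1203.6859"):
        print(source_url(normalize_arxiv_id(ref)))
```

All three usage lines above resolve to the same e-print URL, which is the address the tool downloads from.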

What it returns

  • Title, authors, abstract — extracted from the document metadata (pandoc handles nested braces, \thanks footnotes)
  • Body as Markdown — math preserved as $...$ / $$...$$, unknown LaTeX commands passed through as raw TeX
  • Output path — full paper at output/paper.md inside the cache directory
  • Preamble path — macro definitions extracted to preamble.tex so you can inspect them on demand

The tool truncates output to fit context limits. Use read on the output path for the rest.
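
One plausible way to truncate while still pointing at the full file is sketched below in Python. The README does not state the actual limit, unit, or notice wording, so max_chars, the character-based budget, and the message text are all assumptions, and truncate_for_context is a hypothetical name.

```python
def truncate_for_context(markdown: str, output_path: str, max_chars: int = 20_000) -> str:
    """Clip markdown to roughly max_chars and append a pointer to the full file."""
    if len(markdown) <= max_chars:
        return markdown
    # Prefer cutting at a paragraph break so a sentence or formula is not split.
    cut = markdown.rfind("\n\n", 0, max_chars)
    if cut <= 0:
        cut = max_chars  # no paragraph break in range: fall back to a hard cut
    notice = f"\n\n[Output truncated. Read the full paper at {output_path}]"
    return markdown[:cut] + notice
```

Cutting at a blank line keeps display math and list items intact in the part that is returned; the remainder stays available at the output path.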

How it works

  1. Downloads the source tarball from arxiv.org/e-print/<id>
  2. Extracts with tar
  3. Finds the main .tex file (heuristic: first file with \documentclass)
  4. Recursively resolves \input/\include commands into a single flat document
  5. Splits preamble from body, writes preamble to preamble.tex
  6. Extracts metadata (title, authors, abstract) via pandoc's JSON AST
  7. Converts body to Markdown via the official pandoc WASM binary
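
Steps 3 to 5 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the extension's implementation: the extracted tarball is modeled as a dict mapping file names to contents, "first file" is taken to mean first in sorted order, include paths are resolved relative to the archive root, commented-out \input lines are not filtered, and the function names are hypothetical.

```python
import posixpath
import re

# Matches \input{name} and \include{name}, but not \includegraphics{...},
# because the opening brace must follow the command name directly.
INCLUDE_RE = re.compile(r"\\(?:input|include)\{([^}]+)\}")


def find_main_tex(files: dict) -> str:
    """Step 3: return the first .tex file (sorted order) containing \\documentclass."""
    for name in sorted(files):
        if name.endswith(".tex") and "\\documentclass" in files[name]:
            return name
    raise FileNotFoundError("no .tex file with \\documentclass found")


def flatten(name: str, files: dict, _stack: tuple = ()) -> str:
    """Step 4: recursively inline \\input/\\include targets into one document."""
    if name in _stack:
        raise ValueError("circular include: " + " -> ".join(_stack + (name,)))

    def replace(match):
        target = posixpath.normpath(match.group(1).strip())
        # LaTeX lets authors omit the .tex extension.
        if target not in files and target + ".tex" in files:
            target += ".tex"
        if target not in files:
            return match.group(0)  # leave unresolved references untouched
        return flatten(target, files, _stack + (name,))

    return INCLUDE_RE.sub(replace, files[name])


def split_preamble(document: str) -> tuple:
    """Step 5: split at \\begin{document} into (preamble, body)."""
    head, marker, tail = document.partition("\\begin{document}")
    if not marker:
        return "", document  # no marker: treat everything as body
    return head, marker + tail
```

In this sketch the preamble string is what would be written to preamble.tex, and the body is what would be handed to pandoc for the Markdown conversion.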

No system pandoc or LaTeX distribution needed.