pi-arxivist

Fetch arxiv papers as Markdown (pi extension)

Install pi-arxivist from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-arxivist
Package: pi-arxivist
Version: 0.1.7
Published: Jun 18, 2026
Downloads: 687/mo · 687/wk
Author: lhufo
License: MIT
Types: extension
Size: 62.7 KB
Dependencies: 2 dependencies · 3 peers
Pi manifest JSON
{
  "extensions": [
    "./dist/index.js"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-arxivist

Fetch arXiv papers as clean Markdown, right inside pi. Zero config, zero system dependencies.

arXiv provides LaTeX source tarballs for most papers. The fetch_arxiv tool downloads that source, flattens \input/\include references into a single document, and converts the result to Markdown via pandoc. No PDF extraction, no garbled math, no lost structure.

Install

pi install npm:pi-arxivist

Usage

fetch_arxiv 1203.6859
fetch_arxiv https://arxiv.org/abs/1203.6859
fetch_arxiv https://arxiv.org/pdf/1203.6859

Accepts bare IDs, abstract URLs, or PDF URLs.
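As an illustration only, here is a minimal TypeScript sketch of how the three input forms could be reduced to a bare ID and turned into the e-print URL used in step 1 of "How it works". The names normalizeArxivId and ePrintUrl are hypothetical, not pi-arxivist's API, and the accepted ID shapes (new-style YYMM.NNNN or YYMM.NNNNN with an optional version suffix, plus old-style IDs such as hep-th/9901001) are assumptions about what the tool accepts.

```typescript
// Hypothetical helper, not pi-arxivist's API: reduce the accepted input forms to a bare ID.
// Assumed ID shapes: new-style YYMM.NNNN or YYMM.NNNNN (optional vN suffix) and
// old-style archive IDs such as hep-th/9901001 or math.AG/0601001.
const NEW_STYLE_ID = /^\d{4}\.\d{4,5}(v\d+)?$/;
const OLD_STYLE_ID = /^[a-z-]+(\.[A-Z]{2})?\/\d{7}(v\d+)?$/;

function normalizeArxivId(input: string): string {
  let id = input.trim();
  // Strip an abstract or PDF URL prefix, e.g. https://arxiv.org/abs/1203.6859
  const url = id.match(/^https?:\/\/(?:www\.)?arxiv\.org\/(?:abs|pdf)\/(.+)$/);
  if (url) id = url[1];
  id = id.replace(/\.pdf$/, ""); // PDF URLs may carry a .pdf suffix
  if (!NEW_STYLE_ID.test(id) && !OLD_STYLE_ID.test(id)) {
    throw new Error(`Not a recognizable arXiv ID: ${input}`);
  }
  return id;
}

// Step 1 of "How it works": the source tarball lives at arxiv.org/e-print/<id>.
function ePrintUrl(id: string): string {
  return `https://arxiv.org/e-print/${id}`;
}
```

Rejecting input that matches neither pattern up front keeps a malformed argument from surfacing later as a confusing download error.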

What it returns

  • paper.md — full paper in the cache directory, math preserved as $...$ / $$...$$
  • meta.json — full frontmatter as JSON (title, abstract, authors, etc.)
  • preamble.tex — macro definitions that pandoc couldn't process, extracted for inspection
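The two auxiliary files can be approximated with plain string processing. In this hypothetical TypeScript sketch, splitFrontmatter separates the YAML metadata block that pandoc writes at the top of standalone Markdown output (turning it into meta.json would additionally need a YAML parser), and extractPreambleMacros collects single-line macro definitions that appear before \begin{document}. The real tool keeps only the macros pandoc could not process; that filtering is not reproduced here, and multi-line definitions are ignored.

```typescript
// Hypothetical helper for meta.json: split off the YAML block pandoc writes at the
// top of standalone Markdown output. Parsing the YAML itself needs a YAML library.
function splitFrontmatter(markdown: string): { frontmatter: string | null; body: string } {
  const lines = markdown.split("\n");
  if (lines[0] !== "---") return { frontmatter: null, body: markdown };
  // A YAML metadata block closes with a line of "---" or "...".
  const end = lines.findIndex((line, i) => i > 0 && (line === "---" || line === "..."));
  if (end === -1) return { frontmatter: null, body: markdown };
  return {
    frontmatter: lines.slice(1, end).join("\n"),
    body: lines.slice(end + 1).join("\n").replace(/^\n+/, ""), // drop blank lines after the block
  };
}

// Hypothetical helper for preamble.tex: keep single-line macro definitions found
// before \begin{document}. Multi-line definitions are not handled, and this does
// not check which macros pandoc actually failed to process.
const MACRO_LINE = /^\s*\\(newcommand|renewcommand|providecommand|def|DeclareMathOperator)\b/;

function extractPreambleMacros(tex: string): string {
  const start = tex.indexOf("\\begin{document}");
  const preamble = start === -1 ? tex : tex.slice(0, start);
  return preamble.split("\n").filter((line) => MACRO_LINE.test(line)).join("\n");
}
```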

The tool truncates its response to fit context limits. Use read on the paper.md path for the full paper.
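A minimal sketch of this kind of truncation, assuming a character budget: truncateForContext, maxChars, and the notice wording are hypothetical, and pi-arxivist's actual limit and message may differ. It cuts the paper text at the last line break within the budget and appends a pointer to the full file so the agent knows where to read the rest.

```typescript
// Hypothetical sketch: cap the paper text handed back to the agent at maxChars and
// point to the full file. The budget and notice wording are assumptions.
function truncateForContext(text: string, maxChars: number, fullPath: string): string {
  if (text.length <= maxChars) return text;
  // Prefer cutting at the last line break at or before the budget.
  const cut = text.lastIndexOf("\n", maxChars);
  const head = text.slice(0, cut > 0 ? cut : maxChars);
  const omitted = text.length - head.length;
  return `${head}\n\n[Truncated: ${omitted} more characters. Full paper: ${fullPath}]`;
}
```

Cutting at a line break keeps the last returned line intact instead of ending mid-sentence.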

How it works

  1. Downloads the source tarball from arxiv.org/e-print/<id>
  2. Extracts the tarball with tar
  3. Builds a dependency graph from \input/\include references across all .tex files and selects the root file by indegree, since the main document is normally not included by any other file
  4. Resolves the graph into a single flat document (circular-reference-safe, \includeonly-aware)
  5. Converts the flattened document to Markdown via the official pandoc WASM binary
  6. Extracts metadata from the pandoc-generated YAML frontmatter
  7. Extracts unprocessed preamble macros to preamble.tex
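Steps 3 and 4 above can be sketched in TypeScript over an in-memory map from relative path to file contents. This is an illustration under stated assumptions, not pi-arxivist's code: findRoot and flatten are hypothetical names, ties between zero-indegree files are broken by preferring one containing \documentclass, \includeonly is honored only for \include (as in LaTeX, it does not restrict \input), references to missing files are left untouched, and commented-out lines are not stripped first.

```typescript
type Files = Map<string, string>; // relative path -> file contents

// Matches \input{name} and \include{name}; group 1 is the command, group 2 the target.
// A fresh regex per call keeps the stateful lastIndex of /g regexes from leaking.
const includeRe = (): RegExp => /\\(input|include)\{([^}]+)\}/g;

// LaTeX lets the .tex extension be omitted, so normalize names to "name.tex".
function resolveName(name: string): string {
  const n = name.trim();
  return n.endsWith(".tex") ? n : `${n}.tex`;
}

// Step 3: count incoming \input/\include edges and pick a file nothing includes.
function findRoot(files: Files): string {
  const indegree = new Map<string, number>();
  for (const path of files.keys()) indegree.set(path, 0);
  for (const src of files.values()) {
    const re = includeRe();
    let m: RegExpExecArray | null;
    while ((m = re.exec(src)) !== null) {
      const target = resolveName(m[2]);
      if (indegree.has(target)) indegree.set(target, indegree.get(target)! + 1);
    }
  }
  const roots = [...indegree].filter(([, d]) => d === 0).map(([path]) => path);
  // Assumption: prefer a zero-indegree file that declares \documentclass.
  const root = roots.find((p) => files.get(p)!.includes("\\documentclass")) ?? roots[0];
  if (root === undefined) throw new Error("no root: every file is included by another");
  return root;
}

// Step 4: inline references recursively into one flat document.
function flatten(files: Files, root: string): string {
  const only = files.get(root)!.match(/\\includeonly\{([^}]*)\}/);
  const allowed = only ? new Set(only[1].split(",").map(resolveName)) : null;

  // stack holds the files on the current recursion path, used to detect cycles.
  const expand = (path: string, stack: Set<string>): string => {
    stack.add(path);
    const out = files.get(path)!.replace(includeRe(), (match: string, cmd: string, name: string) => {
      const target = resolveName(name);
      if (!files.has(target)) return match; // unknown file: leave the command as-is
      if (cmd === "include" && allowed && !allowed.has(target)) return ""; // excluded by \includeonly
      if (stack.has(target)) return ""; // circular reference: skip instead of looping
      return expand(target, stack);
    });
    stack.delete(path);
    return out;
  };
  return expand(root, new Set());
}
```

Tracking only the current recursion path, rather than every file ever visited, lets the same file be \input in two unrelated places while still skipping a file that would include one of its own ancestors.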

No system pandoc or LaTeX distribution needed.