pi-autocontext-erdos

Pi package for AutoContext-assisted Erdős-style formal math problem loops with Lean verification.

Packages

Package details

extensionskillprompt

Install pi-autocontext-erdos from npm and Pi will load the resources declared by the package manifest.

$ pi install npm:pi-autocontext-erdos
Package
pi-autocontext-erdos
Version
0.1.43
Published
Jun 2, 2026
Downloads
3,361/mo · 3,361/wk
Author
jayscambler
License
Apache-2.0
Types
extension, skill, prompt
Size
1.5 MB
Dependencies
0 dependencies · 3 peers
Pi manifest JSON
{
  "skills": [
    "./skills"
  ],
  "prompts": [
    "./prompts"
  ],
  "extensions": [
    "./extensions"
  ]
}

Security note

Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.

README

pi-autocontext-erdos

Pi package for AutoContext-assisted Erdős-style formal math problem loops with Lean verification.

Release notes live in docs/RELEASE_<version>.md, with machine-readable release evidence under docs/release-evidence/.

This package is an orchestration layer. It does not replace Lean and does not claim to prove open problems. It helps structure a math exploration loop: problem intake, formalization, lemma planning, proof attempts, Lean-backed verification, and reporting.

Relationship to pi-autocontext-lean-verify

Use this package for the problem loop and research workflow:

pi-autocontext-erdos
  = problem intake + formalization + lemma ladder + attempt logs + reports

Use pi-autocontext-lean-verify as the verifier/proof-repair backend:

pi-autocontext-lean-verify
  = Lean oracle + fixed-template proof-body repair + guardrail checks

Install both when you want proof-repair automation:

pi install npm:pi-autocontext-lean-verify@0.1.17
pi install git:github.com/greyhaven-ai/pi-autocontext-erdos

From a local checkout:

pi install ./pi-autocontext-erdos

For local development, from this checkout:

npm install
npm run validate
npm run pack:dry-run
pi install ./pi-autocontext-erdos --local

Provided resources

  • Extension tool: autocontext_erdos
  • Skill: erdos-math-loop
  • Prompt template: erdos-math-loop.md
  • Domain notes: docs/DOMAIN.md
  • Validation notes: docs/VALIDATION.md
  • Lake/Mathlib verifier notes: docs/LAKE_MATHLIB_SUPPORT.md
  • Mathlib benchmark evidence: docs/MATHLIB_BENCHMARK.md
  • Final theorem audit guardrails: docs/THEOREM_AUDIT.md
  • Reusable bounded artifact schemas and validator: docs/SCHEMAS.md
  • Benchmark registry: benchmarks/index.json
  • Release process and evidence cross-checks: docs/RELEASE_PROCESS.md
  • Lemma graph orchestration: docs/LEMMA_GRAPH.md
  • Erdős–Straus parametric family benchmark: docs/ERDOS_STRAUS_PARAMETRIC_FAMILIES.md
  • Erdős–Straus bounded certificate-search benchmark: docs/ERDOS_STRAUS_BOUNDED_SEARCH.md
  • Erdős–Straus bounded-search stress benchmark: docs/ERDOS_STRAUS_BOUNDED_STRESS.md
  • Schur lower-bound bounded certificate benchmark: docs/SCHUR_LOWER_BOUND_BOUNDED.md
  • Van der Waerden lower-bound bounded certificate benchmark: docs/VAN_DER_WAERDEN_LOWER_BOUND_BOUNDED.md
  • Example runs:
    • examples/erdos-ginzburg-ziv-n2
    • examples/erdos-ginzburg-ziv-n3
    • examples/schur-two-color-n5
    • examples/schur-two-color-n5-lemma-graph
    • examples/schur-two-color-lower-bound
    • examples/van-der-waerden-3term-lower-bound
    • examples/mathlib-divisibility-prime
    • examples/erdos-straus-parametric-families
    • examples/erdos-straus-bounded-search

Example validation

The checked examples include the n = 2 and n = 3 finite instances of the Erdős–Ginzburg–Ziv theorem over natural numbers, plus a full-pipeline finite Schur two-color n = 5 benchmark. The n = 2 EGZ example proves:

theorem egz_two_nat (a b c : Nat) :
    (a + b) % 2 = 0 ∨ (a + c) % 2 = 0 ∨ (b + c) % 2 = 0

The harder EGZ n = 3 example proves that among any five natural numbers, one of the ten triples has sum divisible by 3. The AutoContext repair smoke in docs/EGZ_N3_AUTOCONTEXT_REPAIR.md deliberately breaks that proof, repairs it back to a Lean-verified candidate with autocontext_erdos, and the compact repaired proof is now the canonical example.

The Schur n = 5 example proves the finite two-color statement that every coloring of 1..5 contains a monochromatic x + y = z triple. It exercises direct Lean verification, audit_theorems, fixed-template export, and pi-autocontext-lean-verify handoff; see docs/SCHUR_N5_FULL_PIPELINE.md.

The Schur lower-bound bounded example emits explicit two-coloring certificates for 2 ≤ n ≤ 4, including the finite witness S(2) > 4, using the reusable bounded artifact schema with a non-Erdős–Straus certificate payload; see docs/SCHUR_LOWER_BOUND_BOUNDED.md.

The Mathlib divisibility-prime benchmark imports Mathlib in a Lake project and proves a scoped Euclid-lemma API theorem using Nat.Prime.dvd_mul; see docs/MATHLIB_BENCHMARK.md.

The Erdős–Straus parametric-family example verifies only the elementary even-denominator family as a rational identity over ; it is explicitly not a proof of the full conjecture. It exercises Lake/Mathlib verification, lemma graph frontier tracking, audit_theorems, fixed-template export, and graph result recording; see docs/ERDOS_STRAUS_PARAMETRIC_FAMILIES.md.

The Erdős–Straus bounded-search example generates exact certificates for 2 ≤ n ≤ 100, emits them as chunked Lean certificate modules plus aggregate generatedCertificates data, proves reusable Boolean integer-checker soundness, proves checker-pass/validity/local chunk coverage/hierarchical aggregate coverage theorems, then derives the public finite theorem from those checker theorems in Lean/Mathlib. The JavaScript search is only a candidate generator; Lean remains the correctness oracle. The generator supports --min, --max, --lean-chunk-size, --check, reusable bounded artifact schemas, an artifact integrity manifest, and npm run validate:generated; see docs/ERDOS_STRAUS_BOUNDED_SEARCH.md and docs/SCHEMAS.md. Out-of-band stress benchmarking through npm run stress:erdos-straus-bounded supports --resume/partial reports plus first-class chunk-summary validation, Lean/Lake toolchain preflight, and Lake cache-template recovery; see docs/ERDOS_STRAUS_BOUNDED_STRESS.md.

The van der Waerden lower-bound example is a third finite/bounded registry benchmark. It generates explicit two-coloring certificates for 3 ≤ n ≤ 8 with no monochromatic 3-term arithmetic progression and proves the public Lean theorem van_der_waerden_two_color_three_term_lower_bound_eight; see docs/VAN_DER_WAERDEN_LOWER_BOUND_BOUNDED.md.

Validate the package and examples with:

npm run validate
npm run typecheck
npm run validate:extension
npm run validate:schemas
npm run validate:artifacts
npm run validate:evidence
npm run validate:generated
npm run audit:benchmarks -- --dry-run
npm run audit:benchmarks -- --out-root /tmp/pi-autocontext-erdos-benchmark-audits
npm run smoke:installed
npm run validate:example
npm run validate:lake-example

AutoContext-backed tool

The package registers one consolidated Pi tool:

autocontext_erdos

Actions:

  • preflight: check Lean and uvx autoctx improve availability.
  • init_problem: create the run artifact layout.
  • verify_file: run Lean on a formalization and write attempt artifacts.
  • autocontext_improve: invoke autoctx improve with a Lean verifier command and save prompt/rubric/logs/best candidate/verdict.
  • export_fixed_template: freeze a named theorem into a Theorem.template.lean fixture with a {{PROOF}} hole for pi-autocontext-lean-verify proof-body repair.
  • audit_theorems: final-artifact audit that verifies one or more Lean files, optionally runs lake build <target> first, runs #print axioms, checks forbidden constructs, compares per-file imports to optional baselines, and records environment/toolchain hashes.
  • lemma_graph: initialize, propose candidate lemmas, validate, summarize, advance, and record results for dependency graphs under lemmas/lemma_graph.json. The propose mode merges agent-generated candidate nodes as planned, skipping duplicates and recorded dead ends.
  • generate_bounded_certificates: generate or check bounded Erdős–Straus certificate runs with reproducible artifact hashes; Lean verification is still required for proof claims.
  • validate_artifacts: validate schema-stamped bounded certificate, artifact integrity, theorem-audit, and stress artifacts under a run root/audit root.
  • summarize_run: summarize files and attempt verdicts under a run root.

The standalone registry-driven audit runner reads benchmarks/index.json and invokes the same Lean/Lake audit stages for every registered benchmark. Use --list or --dry-run to inspect the plan before running Lean/Lake:

node scripts/audit-benchmarks.mjs --list
node scripts/audit-benchmarks.mjs --dry-run
ELAN_HOME=/tmp/autocontext-elan-home \
LEAN_BIN=/tmp/autocontext-elan-home/bin/lean \
node scripts/audit-benchmarks.mjs --out-root /tmp/pi-autocontext-erdos-benchmark-audits

Release evidence can be cross-checked against npm, git tags, and GitHub Actions runs:

node scripts/validate-evidence.mjs --file docs/release-evidence/<version>.json --cross-check

The AutoContext action uses direct Lean verification by default:

uvx --python 3.12 --from autocontext==0.5.1 autoctx improve ... --verify-cmd "<lean> {file}"

For Mathlib or local Lake-project imports, pass verificationMode="lake" and projectRoot; verification and AutoContext rechecks then use lake env lean <file>. Multi-module generated examples such as the chunked Erdős–Straus certificate benchmark are built with lake build Formalization before root/candidate rechecks. For final audits, pass lakeBuildTarget="Formalization" plus auditFiles to capture the Lake build, per-module Lean verification, per-module import baselines, and per-theorem axiom results in one audit root. init_problem can scaffold a Mathlib-aware Lake project with withMathlib=true. See docs/LAKE_MATHLIB_SUPPORT.md and docs/THEOREM_AUDIT.md.

For whole-file repair, pass preserveTheorems to require exact normalized theorem-statement preservation before a Lean-verified candidate is accepted:

{
  "action": "autocontext_improve",
  "preserveTheorems": ["mod_three_eq_zero_or_one_or_two", "egz_three_nat"]
}

If Lean accepts a candidate but a preserved theorem statement changes, the verdict is theorem_statement_changed with Accepted: false.

For the compact EGZ n = 3 repair loop, current validation recommends timeoutSeconds=240; timeoutSeconds=120 is useful but budget-limited for this task.

When a theorem statement is ready to freeze, use export_fixed_template to hand off proof-body-only repair to pi-autocontext-lean-verify; see docs/LEAN_VERIFY_HANDOFF.md.

Before treating a final theorem as a serious result, run audit_theorems; see docs/THEOREM_AUDIT.md.

MVP workflow

A typical run creates a durable folder such as:

runs/<problem-slug>/
  problem.md
  source_notes.md
  formalization/
    Formalization.lean
    formalization_notes.md
  lemmas/
    lemma_plan.md
  attempts/
    0001/
      prompt.md
      candidate.lean
      lean.stderr.txt
      verdict.md
  reports/
    final_report.md

Every mathematical claim in the final report must be labeled as one of:

  • Lean verified;
  • formalization draft;
  • conjecture;
  • heuristic/proof sketch;
  • failed attempt.

Guardrail posture

Never count a proof unless Lean verifies the exact theorem or unchanged template. Do not use sorry, admit, new axioms, unsafe, theorem weakening, proof-by-importing-answer, or hidden assumptions as success criteria.