pi-autocontext-erdos
Pi package for AutoContext-assisted Erdős-style formal math problem loops with Lean verification.
Package details
Install pi-autocontext-erdos from npm and Pi will load the resources declared by the package manifest.
$ pi install npm:pi-autocontext-erdos- Package
pi-autocontext-erdos- Version
0.1.43- Published
- Jun 2, 2026
- Downloads
- 3,361/mo · 3,361/wk
- Author
- jayscambler
- License
- Apache-2.0
- Types
- extension, skill, prompt
- Size
- 1.5 MB
- Dependencies
- 0 dependencies · 3 peers
Pi manifest JSON
{
"skills": [
"./skills"
],
"prompts": [
"./prompts"
],
"extensions": [
"./extensions"
]
}Security note
Pi packages can execute code and influence agent behavior. Review the source before installing third-party packages.
README
pi-autocontext-erdos
Pi package for AutoContext-assisted Erdős-style formal math problem loops with Lean verification.
Release notes live in docs/RELEASE_<version>.md, with machine-readable release evidence under docs/release-evidence/.
This package is an orchestration layer. It does not replace Lean and does not claim to prove open problems. It helps structure a math exploration loop: problem intake, formalization, lemma planning, proof attempts, Lean-backed verification, and reporting.
Relationship to pi-autocontext-lean-verify
Use this package for the problem loop and research workflow:
pi-autocontext-erdos
= problem intake + formalization + lemma ladder + attempt logs + reports
Use pi-autocontext-lean-verify as the verifier/proof-repair backend:
pi-autocontext-lean-verify
= Lean oracle + fixed-template proof-body repair + guardrail checks
Install both when you want proof-repair automation:
pi install npm:pi-autocontext-lean-verify@0.1.17
pi install git:github.com/greyhaven-ai/pi-autocontext-erdos
From a local checkout:
pi install ./pi-autocontext-erdos
For local development, from this checkout:
npm install
npm run validate
npm run pack:dry-run
pi install ./pi-autocontext-erdos --local
Provided resources
- Extension tool:
autocontext_erdos - Skill:
erdos-math-loop - Prompt template:
erdos-math-loop.md - Domain notes:
docs/DOMAIN.md - Validation notes:
docs/VALIDATION.md - Lake/Mathlib verifier notes:
docs/LAKE_MATHLIB_SUPPORT.md - Mathlib benchmark evidence:
docs/MATHLIB_BENCHMARK.md - Final theorem audit guardrails:
docs/THEOREM_AUDIT.md - Reusable bounded artifact schemas and validator:
docs/SCHEMAS.md - Benchmark registry:
benchmarks/index.json - Release process and evidence cross-checks:
docs/RELEASE_PROCESS.md - Lemma graph orchestration:
docs/LEMMA_GRAPH.md - Erdős–Straus parametric family benchmark:
docs/ERDOS_STRAUS_PARAMETRIC_FAMILIES.md - Erdős–Straus bounded certificate-search benchmark:
docs/ERDOS_STRAUS_BOUNDED_SEARCH.md - Erdős–Straus bounded-search stress benchmark:
docs/ERDOS_STRAUS_BOUNDED_STRESS.md - Schur lower-bound bounded certificate benchmark:
docs/SCHUR_LOWER_BOUND_BOUNDED.md - Van der Waerden lower-bound bounded certificate benchmark:
docs/VAN_DER_WAERDEN_LOWER_BOUND_BOUNDED.md - Example runs:
examples/erdos-ginzburg-ziv-n2examples/erdos-ginzburg-ziv-n3examples/schur-two-color-n5examples/schur-two-color-n5-lemma-graphexamples/schur-two-color-lower-boundexamples/van-der-waerden-3term-lower-boundexamples/mathlib-divisibility-primeexamples/erdos-straus-parametric-familiesexamples/erdos-straus-bounded-search
Example validation
The checked examples include the n = 2 and n = 3 finite instances of the Erdős–Ginzburg–Ziv theorem over natural numbers, plus a full-pipeline finite Schur two-color n = 5 benchmark. The n = 2 EGZ example proves:
theorem egz_two_nat (a b c : Nat) :
(a + b) % 2 = 0 ∨ (a + c) % 2 = 0 ∨ (b + c) % 2 = 0
The harder EGZ n = 3 example proves that among any five natural numbers, one of the ten triples has sum divisible by 3. The AutoContext repair smoke in docs/EGZ_N3_AUTOCONTEXT_REPAIR.md deliberately breaks that proof, repairs it back to a Lean-verified candidate with autocontext_erdos, and the compact repaired proof is now the canonical example.
The Schur n = 5 example proves the finite two-color statement that every coloring of 1..5 contains a monochromatic x + y = z triple. It exercises direct Lean verification, audit_theorems, fixed-template export, and pi-autocontext-lean-verify handoff; see docs/SCHUR_N5_FULL_PIPELINE.md.
The Schur lower-bound bounded example emits explicit two-coloring certificates for 2 ≤ n ≤ 4, including the finite witness S(2) > 4, using the reusable bounded artifact schema with a non-Erdős–Straus certificate payload; see docs/SCHUR_LOWER_BOUND_BOUNDED.md.
The Mathlib divisibility-prime benchmark imports Mathlib in a Lake project and proves a scoped Euclid-lemma API theorem using Nat.Prime.dvd_mul; see docs/MATHLIB_BENCHMARK.md.
The Erdős–Straus parametric-family example verifies only the elementary even-denominator family as a rational identity over ℚ; it is explicitly not a proof of the full conjecture. It exercises Lake/Mathlib verification, lemma graph frontier tracking, audit_theorems, fixed-template export, and graph result recording; see docs/ERDOS_STRAUS_PARAMETRIC_FAMILIES.md.
The Erdős–Straus bounded-search example generates exact certificates for 2 ≤ n ≤ 100, emits them as chunked Lean certificate modules plus aggregate generatedCertificates data, proves reusable Boolean integer-checker soundness, proves checker-pass/validity/local chunk coverage/hierarchical aggregate coverage theorems, then derives the public finite theorem from those checker theorems in Lean/Mathlib. The JavaScript search is only a candidate generator; Lean remains the correctness oracle. The generator supports --min, --max, --lean-chunk-size, --check, reusable bounded artifact schemas, an artifact integrity manifest, and npm run validate:generated; see docs/ERDOS_STRAUS_BOUNDED_SEARCH.md and docs/SCHEMAS.md. Out-of-band stress benchmarking through npm run stress:erdos-straus-bounded supports --resume/partial reports plus first-class chunk-summary validation, Lean/Lake toolchain preflight, and Lake cache-template recovery; see docs/ERDOS_STRAUS_BOUNDED_STRESS.md.
The van der Waerden lower-bound example is a third finite/bounded registry benchmark. It generates explicit two-coloring certificates for 3 ≤ n ≤ 8 with no monochromatic 3-term arithmetic progression and proves the public Lean theorem van_der_waerden_two_color_three_term_lower_bound_eight; see docs/VAN_DER_WAERDEN_LOWER_BOUND_BOUNDED.md.
Validate the package and examples with:
npm run validate
npm run typecheck
npm run validate:extension
npm run validate:schemas
npm run validate:artifacts
npm run validate:evidence
npm run validate:generated
npm run audit:benchmarks -- --dry-run
npm run audit:benchmarks -- --out-root /tmp/pi-autocontext-erdos-benchmark-audits
npm run smoke:installed
npm run validate:example
npm run validate:lake-example
AutoContext-backed tool
The package registers one consolidated Pi tool:
autocontext_erdos
Actions:
preflight: check Lean anduvx autoctx improveavailability.init_problem: create the run artifact layout.verify_file: run Lean on a formalization and write attempt artifacts.autocontext_improve: invokeautoctx improvewith a Lean verifier command and save prompt/rubric/logs/best candidate/verdict.export_fixed_template: freeze a named theorem into aTheorem.template.leanfixture with a{{PROOF}}hole forpi-autocontext-lean-verifyproof-body repair.audit_theorems: final-artifact audit that verifies one or more Lean files, optionally runslake build <target>first, runs#print axioms, checks forbidden constructs, compares per-file imports to optional baselines, and records environment/toolchain hashes.lemma_graph: initialize, propose candidate lemmas, validate, summarize, advance, and record results for dependency graphs underlemmas/lemma_graph.json. Theproposemode merges agent-generated candidate nodes asplanned, skipping duplicates and recorded dead ends.generate_bounded_certificates: generate or check bounded Erdős–Straus certificate runs with reproducible artifact hashes; Lean verification is still required for proof claims.validate_artifacts: validate schema-stamped bounded certificate, artifact integrity, theorem-audit, and stress artifacts under a run root/audit root.summarize_run: summarize files and attempt verdicts under a run root.
The standalone registry-driven audit runner reads benchmarks/index.json and invokes the same Lean/Lake audit stages for every registered benchmark. Use --list or --dry-run to inspect the plan before running Lean/Lake:
node scripts/audit-benchmarks.mjs --list
node scripts/audit-benchmarks.mjs --dry-run
ELAN_HOME=/tmp/autocontext-elan-home \
LEAN_BIN=/tmp/autocontext-elan-home/bin/lean \
node scripts/audit-benchmarks.mjs --out-root /tmp/pi-autocontext-erdos-benchmark-audits
Release evidence can be cross-checked against npm, git tags, and GitHub Actions runs:
node scripts/validate-evidence.mjs --file docs/release-evidence/<version>.json --cross-check
The AutoContext action uses direct Lean verification by default:
uvx --python 3.12 --from autocontext==0.5.1 autoctx improve ... --verify-cmd "<lean> {file}"
For Mathlib or local Lake-project imports, pass verificationMode="lake" and projectRoot; verification and AutoContext rechecks then use lake env lean <file>. Multi-module generated examples such as the chunked Erdős–Straus certificate benchmark are built with lake build Formalization before root/candidate rechecks. For final audits, pass lakeBuildTarget="Formalization" plus auditFiles to capture the Lake build, per-module Lean verification, per-module import baselines, and per-theorem axiom results in one audit root. init_problem can scaffold a Mathlib-aware Lake project with withMathlib=true. See docs/LAKE_MATHLIB_SUPPORT.md and docs/THEOREM_AUDIT.md.
For whole-file repair, pass preserveTheorems to require exact normalized theorem-statement preservation before a Lean-verified candidate is accepted:
{
"action": "autocontext_improve",
"preserveTheorems": ["mod_three_eq_zero_or_one_or_two", "egz_three_nat"]
}
If Lean accepts a candidate but a preserved theorem statement changes, the verdict is theorem_statement_changed with Accepted: false.
For the compact EGZ n = 3 repair loop, current validation recommends timeoutSeconds=240; timeoutSeconds=120 is useful but budget-limited for this task.
When a theorem statement is ready to freeze, use export_fixed_template to hand off proof-body-only repair to pi-autocontext-lean-verify; see docs/LEAN_VERIFY_HANDOFF.md.
Before treating a final theorem as a serious result, run audit_theorems; see docs/THEOREM_AUDIT.md.
MVP workflow
A typical run creates a durable folder such as:
runs/<problem-slug>/
problem.md
source_notes.md
formalization/
Formalization.lean
formalization_notes.md
lemmas/
lemma_plan.md
attempts/
0001/
prompt.md
candidate.lean
lean.stderr.txt
verdict.md
reports/
final_report.md
Every mathematical claim in the final report must be labeled as one of:
- Lean verified;
- formalization draft;
- conjecture;
- heuristic/proof sketch;
- failed attempt.
Guardrail posture
Never count a proof unless Lean verifies the exact theorem or unchanged template. Do not use sorry, admit, new axioms, unsafe, theorem weakening, proof-by-importing-answer, or hidden assumptions as success criteria.