NameetP/pdfmux
Developer Productivity
Self‑healing PDF extractor that audits output, re‑extracts problematic pages, and supports multiple backends for clean, LLM‑ready data
Features
- Per‑page confidence scoring and automatic re‑extraction of failures
- Rule‑based routing to five specialized extractors (PyMuPDF, OpenDataLoader, RapidOCR, Docling, Surya) plus BYOK LLM fallback
- CLI and Python API covering single files, batch directories, NDJSON streaming, folder watching, and a CI‑friendly strict mode
- Zero‑config defaults, with optional features (OCR, tables, schemas, profiling, etc.) installed via pip extras
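The audit-and-re-extract idea behind the first two features can be sketched roughly as follows. This is a minimal illustration and not pdfmux's actual API: the names `PageResult`, `score_confidence`, and `extract_page`, the confidence heuristic, and the 0.8 threshold are all hypothetical, and the "backends" are plain callables standing in for real extractors.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical type: an extractor maps a page number to the text it extracted.
Extractor = Callable[[int], str]


@dataclass
class PageResult:
    page: int
    text: str
    confidence: float
    backend: str


def score_confidence(text: str) -> float:
    """Toy heuristic (an assumption): share of characters that are
    letters, digits, or whitespace. Empty output scores 0.0."""
    if not text.strip():
        return 0.0
    clean = sum(ch.isalnum() or ch.isspace() for ch in text)
    return clean / len(text)


def extract_page(
    page: int,
    backends: Sequence[Tuple[str, Extractor]],
    threshold: float = 0.8,
) -> PageResult:
    """Try backends in order; stop at the first result that meets the
    threshold, otherwise return the highest-scoring attempt."""
    if not backends:
        raise ValueError("at least one backend is required")
    best: Optional[PageResult] = None
    for name, extractor in backends:
        text = extractor(page)
        result = PageResult(page, text, score_confidence(text), name)
        if best is None or result.confidence > best.confidence:
            best = result
        if result.confidence >= threshold:
            break  # good enough, no need to re-extract with a slower backend
    return best


# Stand-in backends: the fast one fails on page 1, the slower one succeeds.
fast = lambda page: "" if page == 1 else f"Clean text on page {page}"
ocr = lambda page: f"OCR text on page {page}"

results = [extract_page(p, [("fast", fast), ("ocr", ocr)]) for p in range(3)]
print([r.backend for r in results])  # ['fast', 'ocr', 'fast']
```

Ordering the backends from cheapest to most expensive means the slower extractor only runs on pages where the cheaper one scores below the threshold.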
Recent releases
13 releases
About
Stars: 66
Forks: 8
Languages: Python, JavaScript, Shell
Downloads/week: 7 (↑550%)
NPM maintainers: 1 (single npm maintainer)
Contributors: 5
Install & Platforms
Install via: pip
Similar tools
Alternative to: LlamaParse