NameetP/pdfmux

Developer Productivity

Self‑healing PDF extractor that audits output, re‑extracts problematic pages and supports multiple backends for clean LLM‑ready data

Python · Latest v1.6.4 · 29d ago

Features

  • Per‑page confidence scoring and automatic re‑extraction of failures
  • Rule‑based routing to five specialized extractors (PyMuPDF, OpenDataLoader, RapidOCR, Docling, Surya), plus a bring‑your‑own‑key (BYOK) LLM fallback
  • CLI and Python API for single files, batch directories, streaming NDJSON output, folder watching, and a CI‑friendly strict mode
  • Zero‑config defaults, with optional features installed as pip extras (OCR, tables, schemas, profiling, etc.)
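
The audit-and-re-extract idea in the feature list can be illustrated with a minimal sketch. This is not pdfmux's API: `PageResult`, `score_confidence`, `extract_with_healing`, `passes_strict`, the toy scoring heuristic and the 0.8 threshold are all hypothetical names and values chosen for illustration. Each page is tried against an ordered list of backends, the best-scoring result is kept, and later backends are only consulted when the confidence stays below the threshold.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type: a backend maps a page index to the text it extracted.
Extractor = Callable[[int], str]


@dataclass
class PageResult:
    page: int
    text: str
    confidence: float
    backend: str


def score_confidence(text: str) -> float:
    """Toy heuristic (an assumption, not pdfmux's scoring): the share of
    characters that are alphanumeric or whitespace. Empty text scores 0."""
    if not text.strip():
        return 0.0
    good = sum(ch.isalnum() or ch.isspace() for ch in text)
    return good / len(text)


def extract_with_healing(
    num_pages: int,
    backends: Dict[str, Extractor],
    order: List[str],
    threshold: float = 0.8,
) -> List[PageResult]:
    """Try backends in order per page; stop as soon as one clears the threshold."""
    if not order:
        raise ValueError("at least one backend is required")
    results = []
    for page in range(num_pages):
        best = None
        for name in order:
            text = backends[name](page)
            candidate = PageResult(page, text, score_confidence(text), name)
            if best is None or candidate.confidence > best.confidence:
                best = candidate
            if best.confidence >= threshold:
                break  # good enough, no re-extraction needed
        results.append(best)
    return results


def passes_strict(results: List[PageResult], threshold: float = 0.8) -> bool:
    """Strict-mode style check: every page must clear the threshold."""
    return all(r.confidence >= threshold for r in results)


if __name__ == "__main__":
    # Fake backends: the fast one returns nothing for page 1 (e.g. a scanned page).
    demo_backends = {
        "fast": lambda page: "Hello world" if page == 0 else "",
        "ocr": lambda page: f"Scanned text {page}",
    }
    for r in extract_with_healing(2, demo_backends, ["fast", "ocr"]):
        print(r.page, r.backend, f"{r.confidence:.2f}")
```

In this sketch the cheap backend handles page 0 on its own, while page 1 falls through to the slower fallback; a CI job could call `passes_strict` and fail the build when any page stays below the threshold.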

Recent releases

View all 13 releases →
  • v1.6.4 · Breaking risk · No immediate action: audit command + upsell line
  • v1.6.3 · Bug fix · Auth · Review required: Confidence fix
  • v1.6.2 · Maintenance · No immediate action: Routine maintenance and dependency updates.
  • v1.6.1 · Breaking risk · No immediate action: Strict threshold + batch_extract
  • v1.6.0 · New feature · No immediate action: Extraction backends + RTL + caching

About

Stars: 66
Forks: 8
Languages: Python, JavaScript, Shell
Downloads/week: 7 (↑550%)
NPM maintainers: 1 (single npm maintainer)
Contributors: 5

Install & Platforms

Install via: pip

Alternative to: LlamaParse