This release keeps dependencies and maintenance posture current for teams operating this tool.
✓ No known CVEs patched in this version
Summary
Minor fixes and improvements.
Full changelog
Regression-guard release
No behavior changes. Adds 11 behavioral-contract tests for the real-world failure modes that prompted the 1.6.1 work — pinning correct behavior so it can't silently regress.
Test categories
- Truncated PDF streams: the four `pypdf: Stream has ended unexpectedly` cases from the original batch run. pdfmux must either recover (PyMuPDF's xref repair) or raise, never silently return empty.
- Non-ASCII filenames: CJK plus full-width punctuation (`Coolsoft test reports(原版).pdf`). Both `extract_text` and `batch_extract` must accept these without shell-quoting issues.
- Arabic-only PDFs: the BiDi post-processor must not crash on RTL text.
- 0-byte files: must raise a named `PdfmuxError`, never silently return empty.
- HTML files renamed to `.pdf`: common when 'view as PDF' saves the page source. Must error cleanly OR return text without HTML markup.
- Missing files: must raise `FileError`, not a bare `FileNotFoundError`.
- Batch isolation: a bad file in `batch_extract` must yield an exception for that file without poisoning the rest of the batch.
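The contracts above can be illustrated with a small, self-contained sketch. It does not import pdfmux or reproduce its tests: `extract_text_sketch`, `batch_extract_sketch`, and `demo` are hypothetical stand-ins, the two exception classes are redefined locally, and the assumption that `FileError` subclasses `PdfmuxError` is ours, not something the release notes state. The `%PDF-` header check is likewise just one plausible way to reject HTML saved with a `.pdf` extension.

```python
import tempfile
from pathlib import Path


class PdfmuxError(Exception):
    """Local stand-in for the library's named error (hypothetical definition)."""


class FileError(PdfmuxError):
    """Local stand-in for the missing-file error; the subclassing is assumed."""


def extract_text_sketch(path):
    """Toy extractor that enforces the contracts; it does not parse PDFs."""
    p = Path(path)
    try:
        data = p.read_bytes()
    except FileNotFoundError as exc:
        # Contract: missing files surface as FileError, not a bare FileNotFoundError.
        raise FileError(f"file not found: {p}") from exc
    if not data:
        # Contract: 0-byte input raises a named error instead of returning "".
        raise PdfmuxError(f"empty file: {p}")
    if not data.lstrip().startswith(b"%PDF-"):
        # Contract: HTML renamed to .pdf errors cleanly. (Returning text without
        # markup is the other allowed outcome; this sketch does not attempt it.)
        raise PdfmuxError(f"not a PDF: {p}")
    return f"text from {p.name}"


def batch_extract_sketch(paths):
    """Yield (path, result) pairs; result is the text or that file's exception."""
    for path in paths:
        try:
            yield path, extract_text_sketch(path)
        except PdfmuxError as exc:
            # Contract: one bad file must not poison the rest of the batch.
            yield path, exc


def demo():
    """Run the sketch over one good file and three bad ones."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        good = root / "Coolsoft test reports(原版).pdf"
        good.write_bytes(b"%PDF-1.7\n")
        empty = root / "empty.pdf"
        empty.write_bytes(b"")
        html = root / "page.pdf"
        html.write_bytes(b"<!DOCTYPE html><html></html>")
        missing = root / "missing.pdf"  # deliberately never created
        batch = [good, empty, html, missing]
        return [(p.name, result) for p, result in batch_extract_sketch(batch)]
```

Passing paths as `Path` objects rather than building shell command strings is what sidesteps the quoting problem for the non-ASCII filename. Yielding the exception alongside the path lets a caller see which file failed while the remaining files are still processed.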
Numbers
- 670 tests passing (was 659 in 1.6.1)
- 0 behavior changes
- 0 source code changes — tests-only release
Install: pip install -U pdfmux
About NameetP/pdfmux
PDF extraction router with built-in MCP server. Classifies each page (digital, scanned, tables) and routes it to the best backend (PyMuPDF, Docling, OCR, or optional LLM fallback).