NameetP/pdfmux

v1.5.0 Feature

This release adds three notable features for teams evaluating rollout: image table OCR, an ML heading classifier, and column-aware reading order.

Published 3mo Developer Productivity

✓ No known CVEs patched

Topics

ai-agent docling document-parsing llm mcp ocr opendataloader pdf pdf-extraction pdf-to-json pdf-to-markdown python self-healing structured-extraction

Summary

AI summary

Overall benchmark score improved from 0.867 to 0.905.

Full changelog

What's New in v1.5.0

Benchmark Results

0.905 overall benchmark score on opendataloader-bench (200 docs)
Up from 0.867 (v1.3.0) — a +4.4% improvement
100% confidence score across all documents
98 docs improved, only 3 regressed
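
The percentages quoted in these notes are relative gains, not absolute score points. A minimal sketch of the arithmetic, using only the before and after scores listed in this changelog (the helper name `relative_gain` is illustrative and not part of pdfmux):

```python
def relative_gain(old: float, new: float) -> float:
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) / old * 100.0


# (before, after) scores as quoted in the release notes.
scores = {
    "overall": (0.867, 0.905),
    "TEDS": (0.887, 0.911),
    "MHS": (0.844, 0.852),
    "NID": (0.910, 0.920),
}

for name, (old, new) in scores.items():
    print(f"{name}: {old:.3f} -> {new:.3f} ({relative_gain(old, new):+.1f}%)")
```

The overall score moved by 0.038 absolute points, which is about 4.4% of the earlier 0.867.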

Key Improvements

Image Table OCR (TEDS: 0.887 → 0.911, +2.7%)

Integrated RapidOCR for tables embedded as images
Smart filtering: 50% fill rate + 30% numeric cell thresholds to avoid false positives on charts
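
The filtering rule above can be sketched roughly as follows. This is not pdfmux's actual code: the function names, the input shape (a grid of OCR'd cell strings), and the choice to measure the numeric share over non-empty cells are all assumptions made for illustration. Only the 50% and 30% thresholds come from the release notes.

```python
import re
from typing import List

# Assumption: a cell counts as numeric if it parses as a plain number once
# common formatting characters are stripped.
_NUMBER = re.compile(r"[-+]?(\d+\.?\d*|\.\d+)")


def is_numeric(cell: str) -> bool:
    cleaned = cell.replace(",", "").replace("%", "").replace("$", "").strip()
    return bool(_NUMBER.fullmatch(cleaned))


def looks_like_table(
    cells: List[List[str]],
    min_fill: float = 0.50,     # threshold quoted in the notes
    min_numeric: float = 0.30,  # threshold quoted in the notes
) -> bool:
    """Hypothetical filter: keep an OCR'd grid only if it is dense and numeric enough."""
    flat = [c.strip() for row in cells for c in row]
    if not flat:
        return False
    filled = [c for c in flat if c]
    if len(filled) / len(flat) < min_fill:
        return False  # sparse grid: more likely chart labels than a table
    numeric_share = sum(is_numeric(c) for c in filled) / len(filled)
    return numeric_share >= min_numeric


table = [["Year", "Revenue", "Margin"], ["2022", "1,200", "14%"], ["2023", "1,450", "16%"]]
chart = [["Q1", "", "", ""], ["", "Q2", "", ""], ["", "", "", "Q3"]]
print(looks_like_table(table), looks_like_table(chart))
```

A chart image tends to yield a few scattered axis labels, so it fails the fill check, while a grid of prose-only cells fails the numeric check.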

ML Heading Classifier (MHS: 0.844 → 0.852, +0.9%)

ML-based fallback for heading detection when heuristics fail
Improved heading cleanup for cleaner document structure

Column-Aware Reading Order (NID: 0.910 → 0.920, +1.1%)

A/B column reordering: detects multi-column pages, compares both orderings, picks the better one
Safe by design — worst case is no-op (original text preserved)
Conservative detection (200pt gap threshold) to avoid false positives
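
The A/B approach described above might look something like the sketch below. The release notes do not say how the two orderings are scored or exactly what the 200pt gap measures, so several parts are assumptions: the `Block` type, reading the gap as the largest jump between block left edges, and the toy `continuity` score standing in for whatever comparison pdfmux actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Block:
    x0: float  # left edge, in points
    y0: float  # top edge, in points
    text: str


def split_columns(blocks: List[Block], min_gap: float = 200.0) -> Optional[Tuple[List[Block], List[Block]]]:
    """Assumed detector: two columns exist if left edges jump by at least min_gap."""
    xs = sorted({b.x0 for b in blocks})
    if len(xs) < 2:
        return None
    gap, cut = max((b - a, (a + b) / 2) for a, b in zip(xs, xs[1:]))
    if gap < min_gap:
        return None
    return [b for b in blocks if b.x0 < cut], [b for b in blocks if b.x0 >= cut]


def continuity(blocks: List[Block]) -> int:
    """Toy score: count boundaries where a sentence appears to continue."""
    score = 0
    for a, b in zip(blocks, blocks[1:]):
        ends_open = a.text.rstrip()[-1:] not in ".!?:"
        starts_lower = b.text.lstrip()[:1].islower()
        score += ends_open and starts_lower
    return score


def reorder(blocks: List[Block], score: Callable[[List[Block]], int] = continuity,
            min_gap: float = 200.0) -> List[Block]:
    """Return the column-wise order only if it scores strictly better; otherwise a no-op."""
    cols = split_columns(blocks, min_gap)
    if cols is None:
        return blocks
    left, right = cols
    candidate = sorted(left, key=lambda b: b.y0) + sorted(right, key=lambda b: b.y0)
    return candidate if score(candidate) > score(blocks) else blocks


page = [
    Block(72, 100, "The parser reads each"),
    Block(330, 100, "Tables are handled"),
    Block(72, 120, "page in order."),
    Block(330, 120, "separately."),
]
print([b.text for b in reorder(page)])
```

Because the reordered candidate is accepted only when it scores strictly higher, a page that is misdetected as multi-column falls back to the original order, which matches the no-op worst case described in the notes.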

Install

pip install pdfmux==1.5.0

Full Changelog: https://github.com/NameetP/pdfmux/compare/v1.3.0...v1.5.0

About NameetP/pdfmux

PDF extraction router with a built-in MCP server. It classifies each page (digital, scanned, tables) and routes it to the best backend (PyMuPDF, Docling, OCR, or an optional LLM fallback).
