This release includes 2 breaking changes for platform teams planning a safe upgrade.
✓ No known CVEs patched in this version
ReleasePort's take
Moderate signal: The default PDF parser now uses pypdfium2 instead of pymupdf4llm, removing the AGPL dependency. The pymupdf packages are removed from core and offered as optional extras.
Why it matters: Affects any code relying on pymupdf4llm or pymupdf for PDF parsing; migration required before next upgrade cycle.
Summary
AI summary: A mixed release covering "What changed", "Breaking change & migration", and a reference to the py-pdf benchmarks (https://github.com/py-pdf/benchmarks).
Changes in this release
| Type | Severity | Summary | CVE |
|---|---|---|---|
| Breaking | High | Default PDF parser switched from pymupdf4llm to pypdfium2; AGPL dependency removed. Source: llm_adapter@2026-05-29. Confidence: high | None |
| Feature | Low | `pypdfium2` added as core dependency; `PyPDFium2Parser` extracts text page-by-page. Source: llm_adapter@2026-05-29. Confidence: high | None |
| Deprecation | Medium | `pymupdf4llm` and `pymupdf` removed from core dependencies; available as opt-in extras. Source: llm_adapter@2026-05-29. Confidence: high | None |
Full changelog
Permissive-by-default PDF parsing — no AGPL in the default install
Langroid is MIT-licensed, but until now a plain `pip install langroid` pulled in
`pymupdf4llm` (and transitively `pymupdf`), which are AGPL-3.0 licensed. This
release removes that AGPL dependency from the default install and switches the
default PDF parser to the permissively licensed `pypdfium2`
(Apache-2.0 / BSD-3-Clause). Resolves #1026.
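To confirm that an upgraded environment no longer carries the AGPL packages, a small audit along these lines may help. This is a minimal sketch using only the standard library; the helper name `installed_versions` and the `get_version` parameter are hypothetical and exist only for illustration. The package list comes from the changelog text above.

```python
from importlib import metadata

# Distributions the changelog identifies as AGPL-3.0 licensed.
AGPL_PACKAGES = ("pymupdf4llm", "pymupdf")


def installed_versions(names=AGPL_PACKAGES, get_version=metadata.version):
    """Return {name: version} for each named distribution that is installed.

    `get_version` is injectable so the check can be exercised without
    touching the real environment (hypothetical convenience, not a Langroid API).
    """
    found = {}
    for name in names:
        try:
            found[name] = get_version(name)
        except metadata.PackageNotFoundError:
            # Not installed in this environment; nothing to report.
            continue
    return found


if __name__ == "__main__":
    hits = installed_versions()
    if hits:
        print("AGPL-licensed packages present:", hits)
    else:
        print("No pymupdf4llm/pymupdf found in this environment.")
```

Note that an environment upgraded in place may still contain the old packages, since removing a dependency from a project's requirements does not by itself uninstall it.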
What changed
- `pypdfium2` is now the default PDF parser, added as a core dependency. A new `PyPDFium2Parser` extracts text page-by-page via PDFium.
- `pymupdf4llm`/`pymupdf` removed from core dependencies. They remain available as opt-in extras: `doc-chat`, `pdf-parsers`, `all`, or `pymupdf4llm`.
- `DocChatAgent` now defaults to `pypdfium2` as well, so document-chat works out of the box on a bare install with no AGPL code.
- Per the py-pdf benchmarks (https://github.com/py-pdf/benchmarks), `pypdfium2` matches or exceeds `pymupdf` on raw text-extraction accuracy.
Breaking change & migration
- A bare `pip install langroid` no longer installs `pymupdf4llm`/`pymupdf`, and the default PDF parser now emits plain text rather than `pymupdf4llm`'s structured Markdown.
- If you want the richer Markdown extraction (headers, tables, multi-column reflow) from `pymupdf4llm`, install an extra and select it explicitly:

      pip install "langroid[doc-chat]"  # or [pdf-parsers], [all], [pymupdf4llm]

      from langroid.parsing.parser import ParsingConfig, PdfParsingConfig
      cfg = ParsingConfig(pdf=PdfParsingConfig(library="pymupdf4llm"))
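For code that must run both with and without the extra installed, one option is to pick the parser library at runtime. The following is a minimal sketch, not Langroid's own mechanism: `pick_pdf_library` and its `is_importable` parameter are hypothetical, and the assumption that `"pypdfium2"` is the accepted `library` value for the new default is inferred from the changelog rather than confirmed.

```python
from importlib.util import find_spec


def pick_pdf_library(preferred="pymupdf4llm", fallback="pypdfium2",
                     is_importable=None):
    """Return `preferred` if its top-level module is importable, else `fallback`.

    `is_importable` is injectable for testing (hypothetical helper,
    not part of Langroid).
    """
    if is_importable is None:
        # find_spec returns None when a top-level module cannot be found.
        def is_importable(module):
            return find_spec(module) is not None
    return preferred if is_importable(preferred) else fallback


# Possible usage with the config classes shown in the changelog
# (assumes "pypdfium2" is a valid `library` value):
#
#   from langroid.parsing.parser import ParsingConfig, PdfParsingConfig
#   cfg = ParsingConfig(pdf=PdfParsingConfig(library=pick_pdf_library()))
```

Falling back silently changes the output format from Markdown to plain text, so it may be worth logging which library was chosen.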
Thanks to @alexagr for reporting the licensing issue (#1026). See #1028 for details.
Breaking Changes
- Removed core dependencies `pymupdf4llm` and `pymupdf`; they are now opt‑in extras (doc-chat, pdf-parsers, all, pymupdf4llm).
- Default PDF parser changed from `pymupdf4llm` to `pypdfium2`, altering output from structured Markdown to plain text.
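Because the default output changes from structured Markdown to plain text, downstream code that splits on headings or parses tables may quietly stop finding them. A rough heuristic like the one below could be used in a test to flag which parser output a pipeline is receiving. It is a sketch under stated assumptions: `has_markdown_structure` is a hypothetical helper, and it only recognizes ATX headings (`#` through `######`) and pipe-table separator rows, not every Markdown construct.

```python
import re

# A line starting with 1-6 '#' characters followed by whitespace and text.
HEADING = re.compile(r"^#{1,6}[ \t]+\S", re.MULTILINE)

# A pipe-table separator row with at least two columns, e.g. |---|:---:|
TABLE_RULE = re.compile(
    r"^[ \t]*\|?[ \t]*:?-{3,}:?[ \t]*(\|[ \t]*:?-{3,}:?[ \t]*)+\|?[ \t]*$",
    re.MULTILINE,
)


def has_markdown_structure(text):
    """Heuristically report whether `text` contains Markdown headings or tables."""
    return bool(HEADING.search(text) or TABLE_RULE.search(text))
```

A check such as `assert has_markdown_structure(extracted)` in an integration test would fail after the upgrade if a pipeline implicitly depended on the old Markdown output.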