docling
RAG & Retrieval
A document-processing library that parses many file formats (PDF, Office, audio, images, etc.) and integrates with generative-AI ecosystems.
Features
- Parse multiple document formats, including PDF, DOCX, PPTX, XLSX, HTML, audio (WAV/MP3), images, LaTeX, plain text and more
- Advanced PDF understanding: layout, tables, code, formulas, image classification, etc.
- Unified DoclingDocument representation for integration with AI frameworks (LangChain, LlamaIndex, CrewAI, Haystack)
- Extensive OCR support for scanned documents and images
- CLI tool and Python API with export options (Markdown, HTML, JSON, WebVTT, etc.)
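A minimal sketch of the Python API, assuming docling is installed (`pip install docling`). It converts one document with OCR and table-structure recognition enabled for the PDF pipeline, then exports the resulting DoclingDocument to Markdown and HTML. The helper `convert_document` and its `enable_ocr` parameter are hypothetical names for illustration, not part of docling; the classes follow docling's documented quickstart but may change between releases.

```python
from __future__ import annotations

from pathlib import Path


def convert_document(source: str | Path, enable_ocr: bool = True) -> dict[str, str]:
    """Hypothetical helper: convert one file or URL and return two text exports."""
    # docling is imported inside the function so this sketch can be loaded
    # (e.g. for inspection) on machines where docling is not installed.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    # These options configure only the PDF pipeline; other formats use defaults.
    pdf_options = PdfPipelineOptions()
    pdf_options.do_ocr = enable_ocr          # OCR for scanned pages and images
    pdf_options.do_table_structure = True    # recover table structure

    converter = DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pdf_options)}
    )

    # convert() accepts a local path or a URL and returns a ConversionResult.
    result = converter.convert(source)
    doc = result.document  # the unified DoclingDocument

    return {
        "markdown": doc.export_to_markdown(),
        "html": doc.export_to_html(),
    }
```

For example, `convert_document("report.pdf")["markdown"]` would return the Markdown export of a local file; `report.pdf` is a placeholder path. Keeping the pipeline options explicit makes it easy to switch OCR off for born-digital PDFs, where it mainly adds processing time.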
About
Stars: 60,894
Forks: 4,243
Languages: Python, Shell, Dockerfile
Install & Platforms
Install via: pip
Platforms: Linux, macOS, Windows, arm64