This release adds 3 notable features for engineering teams evaluating rollout.
✓ No known CVEs patched in this version
Summary
AI summary: Crawling performance improved to 15.7 pages/sec with async I/O and parallel processing.
Full changelog
What's new
3x faster crawling — async I/O + ProcessPoolExecutor bypass the GIL for true parallel HTML extraction.
Performance
- Async httpx engine replaces sequential requests: concurrent fetches with asyncio.gather
- ProcessPoolExecutor offloads CPU-bound BeautifulSoup + markdownify to separate processes
- Streaming pipeline via asyncio.as_completed: pages save as they arrive, no batch-wait
- Benchmark: 15.7 pages/sec at concurrency=5 (up from 3.4 pages/sec in v0.1.1)
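The pattern described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not markcrawl's actual implementation: `fake_fetch`, `extract`, `crawl_all`, and `run_demo` are hypothetical names, the fetch is simulated with a short sleep where a real crawler would await an httpx request, and `extract` stands in for the BeautifulSoup + markdownify step.

```python
import asyncio
from concurrent.futures import Executor, ProcessPoolExecutor


def extract(html: str) -> str:
    """Hypothetical stand-in for the CPU-bound HTML-to-Markdown step."""
    return html.replace("<p>", "").replace("</p>", "\n").strip()


async def fake_fetch(url: str) -> str:
    """Simulated async fetch; a real crawler would await an HTTP client here."""
    await asyncio.sleep(0.01)
    return f"<p>content of {url}</p>"


async def crawl_all(urls, pool: Executor, concurrency: int = 5, fetch=fake_fetch):
    """Fetch pages concurrently, convert them in the pool, collect as they finish."""
    loop = asyncio.get_running_loop()
    limit = asyncio.Semaphore(concurrency)  # cap the number of in-flight fetches

    async def process(url: str):
        async with limit:
            html = await fetch(url)
        # Hand the CPU-bound conversion to the executor so the event loop stays free.
        markdown = await loop.run_in_executor(pool, extract, html)
        return url, markdown

    results = {}
    # as_completed yields each page as soon as it is done, not in submission order.
    for finished in asyncio.as_completed([process(u) for u in urls]):
        url, markdown = await finished
        results[url] = markdown  # a streaming pipeline would write to disk here
    return results


def run_demo():
    """Run the sketch with worker processes, which sidestep the GIL for extract()."""
    urls = [f"https://example.com/page{i}" for i in range(10)]
    with ProcessPoolExecutor() as pool:
        return asyncio.run(crawl_all(urls, pool))
```

Call `run_demo()` from under an `if __name__ == "__main__":` guard, since process pools may re-import the main module on some platforms. The executor is a parameter so that a thread pool can be substituted when the conversion step is cheap or when process start-up cost outweighs the gain.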
How to upgrade
pip install --upgrade markcrawl
The async engine activates automatically when httpx is installed:
pip install "markcrawl[http2]"
Or use it directly:
from markcrawl import crawl
result = crawl("https://example.com", out_dir="output", concurrency=5)
Full changelog
https://github.com/AIMLPM/markcrawl/compare/v0.1.1...v0.2.0
About AIMLPM/markcrawl
Crawl websites into clean Markdown, search pages, and extract structured data with LLMs. Built-in MCP server for web research and RAG pipelines.