This release adds 2 notable features for engineering teams evaluating rollout.
✓ No known CVEs patched in this version
Summary
AI summary: Images now preserve alt text as inline references, and CrawlResult includes a pages list for direct programmatic access.
Full changelog
What's new
Image alt text preservation
Images are no longer silently stripped. Alt text and figcaptions are extracted as [Image: description] inline references, preserving context from diagrams, architecture charts, and annotated screenshots. Figcaptions take priority over alt text when both are present.
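The extraction rule above can be illustrated with a minimal sketch built on Python's standard `html.parser`. This is not markcrawl's implementation. The class and function names (`ImageRefExtractor`, `image_refs`) are hypothetical. The sketch returns the `[Image: ...]` references in document order instead of splicing them into Markdown, and for simplicity it assumes `<figure>` elements are not nested.

```python
from html.parser import HTMLParser


class ImageRefExtractor(HTMLParser):
    """Hypothetical sketch: turn <img> alt text and <figcaption> text into
    [Image: ...] references, preferring the figcaption inside a <figure>."""

    def __init__(self):
        super().__init__()
        self.refs = []            # collected references, in document order
        self._in_figure = False
        self._in_caption = False
        self._fig_alt = None      # last non-empty alt text seen in the current <figure>
        self._fig_caption = []    # text chunks of the current <figcaption>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "figure":
            self._in_figure = True
            self._fig_alt = None
            self._fig_caption = []
        elif tag == "figcaption" and self._in_figure:
            self._in_caption = True
        elif tag == "img":
            # Valueless attributes come through as None, so normalize to "".
            alt = (attrs.get("alt") or "").strip()
            if self._in_figure:
                self._fig_alt = alt or self._fig_alt
            elif alt:
                self.refs.append(f"[Image: {alt}]")

    def handle_endtag(self, tag):
        if tag == "figcaption":
            self._in_caption = False
        elif tag == "figure" and self._in_figure:
            # Collapse whitespace in the caption; fall back to alt text if empty.
            caption = " ".join("".join(self._fig_caption).split())
            description = caption or self._fig_alt
            if description:
                self.refs.append(f"[Image: {description}]")
            self._in_figure = False

    def handle_data(self, data):
        if self._in_caption:
            self._fig_caption.append(data)


def image_refs(html):
    """Return the [Image: ...] references found in an HTML string."""
    parser = ImageRefExtractor()
    parser.feed(html)
    parser.close()
    return parser.refs
```

Images with neither alt text nor a figcaption produce no reference in this sketch, which mirrors the idea that only images carrying a description add context.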
Python API: result.pages
CrawlResult now includes a pages list of PageData objects for direct programmatic access:
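```python
import markcrawl

# Crawl the site; out_dir is the directory for the crawl's output files
result = markcrawl.crawl("https://example.com", out_dir="./output")
# result.pages is a list of PageData objects
for page in result.pages:
    print(page.url, page.title)
    # Split each page's Markdown into chunks (e.g. for a RAG pipeline)
    chunks = markcrawl.chunk_markdown(page.content)
```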
No more parsing JSONL files to use crawl results in code.
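For comparison, the workflow that `result.pages` replaces might have looked like the sketch below. It is hypothetical throughout: `PageData` here is a local stand-in dataclass with only the `url`, `title`, and `content` fields used in the example above, and the JSONL schema (one JSON object per line with `"url"`, `"title"`, and `"content"` keys) is an assumption, not markcrawl's documented output format.

```python
import json
from dataclasses import dataclass


@dataclass
class PageData:
    # Hypothetical stand-in for markcrawl's PageData (only the fields used above).
    url: str
    title: str
    content: str


def load_pages_from_jsonl(lines):
    """Rebuild page objects from JSONL lines (assumed schema: url, title, content).

    `lines` can be any iterable of strings, such as an open file object.
    """
    pages = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        pages.append(PageData(
            url=record["url"],
            title=record.get("title", ""),
            content=record.get("content", ""),
        ))
    return pages
```

With `result.pages`, this parsing step and the assumptions about the file layout disappear, since the crawl returns the page objects directly.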
Benchmark documentation
A new docs/BENCHMARKS.md provides self-contained speed, quality, and cost comparisons across 7 tools. The full methodology is available at llm-crawler-benchmarks.
About AIMLPM/markcrawl
Crawl websites into clean Markdown, search pages, and extract structured data with LLMs. Built-in MCP server for web research and RAG pipelines.