AIMLPM/markcrawl

v0.4.1 Feature

This release adds two notable features, image alt text preservation and a `result.pages` Python API, along with new benchmark documentation.

Published 3mo ago | RAG & Retrieval


✓ No known CVEs patched


Topics

ai-agents anthropic-claude data-extraction gemini ingestion-pipeline llm markdown-extraction openai pgvector python sitemap-crawler structured-data supabase vector-db webcrawler

Summary

AI summary

Images now preserve alt text as inline references and CrawlResult includes a pages list for direct programmatic access.

Full changelog

What's new

Image alt text preservation

Images are no longer silently stripped. Alt text and figcaptions are extracted as [Image: description] inline references, preserving context from diagrams, architecture charts, and annotated screenshots. Figcaptions take priority over alt text when both are present.
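The priority rule above can be illustrated with a minimal sketch. This is not markcrawl's actual implementation: the helper name `image_reference` and the choice to return an empty string when no description exists are assumptions made for illustration only.

```python
from typing import Optional


def image_reference(alt: Optional[str], figcaption: Optional[str]) -> str:
    """Hypothetical helper: build the inline reference for one image."""
    # Figcaption is checked first, so it wins when both are present.
    for candidate in (figcaption, alt):
        # Collapse internal whitespace so the reference stays on one line.
        text = " ".join(candidate.split()) if candidate else ""
        if text:
            return f"[Image: {text}]"
    # Assumption: with no description at all, there is nothing to preserve.
    return ""


print(image_reference("Diagram", "Figure 1: crawl pipeline"))
print(image_reference("Architecture chart", None))
```

The first call prints the figcaption-based reference and the second falls back to the alt text.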

Python API: `result.pages`

CrawlResult now includes a pages list of PageData objects for direct programmatic access:

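import markcrawl

# Crawl the site, writing output to ./output
result = markcrawl.crawl("https://example.com", out_dir="./output")

# Each entry in result.pages is a PageData object
for page in result.pages:
    print(page.url, page.title)
    # Chunk the page's Markdown content directly, without reading JSONL
    chunks = markcrawl.chunk_markdown(page.content)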

No more parsing JSONL files to use crawl results in code.
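As one illustration of what in-memory access allows, the sketch below collects the `[Image: description]` references from each page's Markdown. It runs without markcrawl installed: the `PageData` class here is a stand-in that mirrors only the three attributes used in the example above (`url`, `title`, `content`), and `image_refs_by_url` is a hypothetical helper, not part of the library.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class PageData:
    """Stand-in with only the attributes the example above uses."""
    url: str
    title: str
    content: str


# Matches inline references of the form [Image: description].
IMAGE_REF = re.compile(r"\[Image: ([^\]]+)\]")


def image_refs_by_url(pages: Iterable[PageData]) -> Dict[str, List[str]]:
    """Hypothetical helper: map each URL to the image descriptions it contains."""
    refs: Dict[str, List[str]] = {}
    for page in pages:
        found = IMAGE_REF.findall(page.content)
        if found:  # skip pages with no image references
            refs[page.url] = found
    return refs


pages = [
    PageData("https://example.com/a", "A", "Intro\n\n[Image: Crawl pipeline diagram]\n"),
    PageData("https://example.com/b", "B", "No images here."),
]
print(image_refs_by_url(pages))
```

With a real crawl, the same function could presumably be called on `result.pages`, provided the actual `PageData` objects expose `url` and `content` as the example above suggests.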

Benchmark documentation

New docs/BENCHMARKS.md with self-contained speed, quality, and cost comparisons across 7 tools. Full methodology at llm-crawler-benchmarks.


About AIMLPM/markcrawl

Crawl websites into clean Markdown, search pages, and extract structured data with LLMs. Built-in MCP server for web research and RAG pipelines.
