AIMLPM/markcrawl

v0.3.1 Feature

This release adds 3 notable features for engineering teams evaluating rollout.

Published 1mo ago · RAG & Retrieval
✓ No known CVEs patched

Topics

ai-agents anthropic-claude data-extraction gemini ingestion-pipeline llm markdown-extraction openai pgvector python sitemap-crawler structured-data supabase vector-db webcrawler

Summary

AI summary

Added --include-path flag to restrict crawling to specified URL patterns.

Full changelog

What's new

--include-path: only crawl what you want

The inverse of --exclude-path: only URLs whose path matches at least one include pattern are crawled.

markcrawl --base https://example.com \
  --include-path "/blog/*" --include-path "/pricing" \
  --max-pages 200 --out ./output

• Can be repeated for multiple patterns
• Seed URLs bypass the filter so the entry point is always fetched for link discovery
• Exclude takes priority over include when both are set
• Works in both CLI and Python API: crawl(..., include_paths=["/blog/*"])
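
The filtering rules above can be sketched in a few lines of Python. This is an illustration of the described semantics, not markcrawl's actual implementation: the helper name should_crawl and its parameters are hypothetical, glob matching via the standard library's fnmatch is an assumption about how patterns are interpreted, and letting seed URLs skip the exclude check as well is also an assumption, since the notes only say that seeds bypass "the filter".

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def should_crawl(url, include_paths=(), exclude_paths=(), is_seed=False):
    """Hypothetical helper illustrating the include/exclude rules.

    Not part of markcrawl's API; pattern matching with fnmatch-style
    globs is an assumption made for this sketch.
    """
    # Seed URLs are always fetched so links can be discovered from them.
    # Assumption: seeds skip both checks, not only the include check.
    if is_seed:
        return True

    path = urlparse(url).path or "/"

    # Exclude takes priority over include when both are set.
    if any(fnmatchcase(path, pattern) for pattern in exclude_paths):
        return False

    # With include patterns present, at least one must match.
    if include_paths:
        return any(fnmatchcase(path, pattern) for pattern in include_paths)

    # No include patterns: everything not excluded is allowed.
    return True


if __name__ == "__main__":
    include = ["/blog/*", "/pricing"]
    for candidate in (
        "https://example.com/blog/post-1",
        "https://example.com/pricing",
        "https://example.com/about",
    ):
        print(candidate, should_crawl(candidate, include_paths=include))
```

Checking excludes before includes is what makes a combination such as including "/blog/*" while excluding a single post under it behave as the notes describe.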

Full changelog

https://github.com/AIMLPM/markcrawl/compare/v0.3.0...v0.3.1


About AIMLPM/markcrawl

Crawl websites into clean Markdown, search pages, and extract structured data with LLMs. Built-in MCP server for web research and RAG pipelines.
