AIMLPM/markcrawl

RAG & Retrieval

Turn any webpage or website into clean Markdown for LLM pipelines with a single command.

Python · Latest v0.11.1 · 22d ago

Features

  • Crawl and convert webpages to clean Markdown files
  • Generate a structured JSONL index of crawled pages including citations
  • Optional binary downloads (PDF, DOCX) and local ML embedding stack
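As a rough illustration of how a downstream pipeline might consume a JSONL index like the one described above, the sketch below parses one JSON object per line and builds a simple citation string. The field names (`url`, `title`, `path`), the sample records, and the helper functions are all hypothetical; this page does not document markcrawl's actual index schema, so check the project's own documentation before relying on any of them.

```python
import json
from io import StringIO


def load_index(lines):
    """Parse a JSONL stream: one JSON object per non-empty line."""
    records = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no}: invalid JSON") from exc
    return records


def format_citation(record):
    """Build a 'Title <url>' string from a record.

    'title' and 'url' are assumed field names, not markcrawl's documented schema.
    """
    return f'{record.get("title", "Untitled")} <{record["url"]}>'


# Hypothetical sample data standing in for a real index file.
SAMPLE = StringIO(
    '{"url": "https://example.com/docs", "title": "Docs", "path": "docs.md"}\n'
    "\n"
    '{"url": "https://example.com/faq", "path": "faq.md"}\n'
)

records = load_index(SAMPLE)
citations = [format_citation(r) for r in records]
for citation in citations:
    print(citation)
```

Reading line by line keeps memory use flat for large crawls, and a malformed line is reported with its line number so it can be found in the index file.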

Recent releases

View all 23 releases →
  • v0.11.1 [No immediate action] [Breaking risk] Reject print aggregator URLs
  • v0.11.0 [Review required] [Breaking risk] [Dependencies] Binary downloads + filters
  • v0.10.6 [Config change] [Breaking risk] [Auth] Respect robots flag and audit
  • v0.10.5 [No immediate action] [Breaking risk] Auto‑scope broadening
  • v0.10.4 [No immediate action] [Breaking risk] Idle‑timeout reset behavior

About

Stars: 2
Forks: 0
Languages: Python, HTML, Shell

Install & Platforms

Install via: pip
