This release includes 2 breaking changes for platform teams planning a safe upgrade.
✓ No known CVEs patched in this version
Summary
Reverts six library defaults to False, restoring v0.8.0 behavior after the v0.9.0 defaults caused a regression on public benchmarks.
Full changelog
Hotfix
v0.9.0 flipped six library defaults to True. On the internal 11-site pool this looked good (+0.014 MRR over champion). On the public llm-crawler-benchmarks pool (different sites), it caused a significant regression — both MRR and crawl speed dropped, and at least one site (huggingface-transformers) went to 0.000 MRR.
v0.9.1 reverts all six defaults to False, restoring v0.8.0 behavior.
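For readers unfamiliar with the metric, here is a minimal sketch of mean reciprocal rank (MRR) under its standard definition. The release notes do not say how the benchmark computes it, so the function names and the sample ranks below are illustrative assumptions, not benchmark data. Under this definition, a site-level score of 0.000 means no query retrieved a relevant result at any considered rank.

```python
from statistics import mean


def reciprocal_rank(rank):
    """Return 1/rank for a 1-based rank, or 0.0 when nothing relevant was retrieved."""
    return 0.0 if rank is None else 1.0 / rank


def mean_reciprocal_rank(ranks):
    """Average the reciprocal ranks over all queries for one site."""
    return mean([reciprocal_rank(r) for r in ranks])


# Hypothetical ranks of the first relevant result per query (None = no hit).
healthy_site = [1, 2, 1, 4]
broken_site = [None, None, None, None]

print(mean_reciprocal_rank(healthy_site))  # 0.6875
print(mean_reciprocal_rank(broken_site))   # 0.0
```

Because the pool score averages over sites, a single site collapsing to zero can outweigh a small lift such as +0.014 elsewhere.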
What changed
| Default | v0.9.0 | v0.9.1 (this) |
|---|---|---|
| auto_extract_title | True | False (reverted) |
| prepend_first_paragraph | True | False (reverted) |
| strip_markdown_links | True | False (reverted) |
| i18n_filter | True | False (reverted) |
| title_at_top | True | False (reverted) |
| auto_render_js | True | False (reverted) |
The new modules from v0.9.0 — markcrawl.js_detect (SPA detection) and markcrawl.dom_cleanup (overlay stripping) — remain available as opt-in. Pass the corresponding flags or call the modules directly to use them.
Migration
If you upgraded to v0.9.0 and saw degraded behavior, upgrading to v0.9.1 restores the prior behavior with zero changes needed on your side.
If you intentionally relied on the v0.9.0 defaults, opt back in explicitly:
from markcrawl.core import crawl

result = crawl(
    base_url=...,
    out_dir=...,
    i18n_filter=True,
    title_at_top=True,
    auto_render_js=True,
)
# plus pass auto_extract_title=True etc. to chunker if calling chunker directly
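Since these defaults flipped twice in two releases, one option is to set every flag explicitly so that a future default change cannot alter behavior silently. The sketch below is a hypothetical helper, not part of markcrawl: `pinned_flags` is an invented name, and the split between crawl-level and chunker-level flags is an assumption inferred from the snippet above (three flags are passed to `crawl`, and the rest are assumed to belong to the chunker).

```python
# Flags the release notes show being passed to crawl().
CRAWL_FLAGS = ("i18n_filter", "title_at_top", "auto_render_js")
# Assumption: the remaining three flags are chunker-side options.
CHUNKER_FLAGS = ("auto_extract_title", "prepend_first_paragraph", "strip_markdown_links")


def pinned_flags(enabled):
    """Return (crawl_kwargs, chunker_kwargs) with every flag set to an explicit bool."""
    unknown = set(enabled) - set(CRAWL_FLAGS) - set(CHUNKER_FLAGS)
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    crawl_kwargs = {name: name in enabled for name in CRAWL_FLAGS}
    chunker_kwargs = {name: name in enabled for name in CHUNKER_FLAGS}
    return crawl_kwargs, chunker_kwargs


# Example: opt in to two of the six behaviors and pin the other four to False.
crawl_kwargs, chunker_kwargs = pinned_flags({"title_at_top", "auto_extract_title"})
print(crawl_kwargs)
print(chunker_kwargs)
```

The resulting dictionaries could then be unpacked into the calls, for example `crawl(base_url=..., out_dir=..., **crawl_kwargs)`.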
Root cause
Under investigation. The internal-pool +0.014 lift hid that some defaults misbehave on out-of-distribution sites. Likely culprits:
- auto_render_js=True falsely flagging SSR sites with heavy inline scripts as SPAs, forcing the slower Playwright path
- Overlay stripping potentially removing legitimate content on some sites
- One or more chunker defaults interacting poorly with sites we hadn't tested
Will publish a detailed post-mortem once the diagnosis is in. v0.9.1 is the safe state until then.
Install
```bash
pip install 'markcrawl[js]==0.9.1'
```
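To confirm which version an environment is actually running, a small check along these lines may help. It uses the standard-library `importlib.metadata` and assumes the installed distribution is named `markcrawl`, as in the pip command above. The helper names and messages are illustrative.

```python
from importlib.metadata import PackageNotFoundError, version

# The only release described in these notes as shipping the flipped defaults.
AFFECTED_VERSIONS = {"0.9.0"}


def installed_version(package="markcrawl"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def upgrade_advice(installed):
    """Map an installed version string (or None) to a short recommendation."""
    if installed is None:
        return "markcrawl is not installed"
    if installed in AFFECTED_VERSIONS:
        return f"{installed} ships the flipped defaults; upgrade to 0.9.1"
    return f"{installed} is not the affected release"


print(upgrade_advice(installed_version()))
```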
Breaking Changes
- Default values for `auto_extract_title`, `prepend_first_paragraph`, `strip_markdown_links`, `i18n_filter`, `title_at_top`, and `auto_render_js` reverted from True to False.
- Behavioral change: Previous v0.9.0 defaults caused significant MRR and crawl speed regressions on external benchmark pools.
About AIMLPM/markcrawl
Crawl websites into clean Markdown, search pages, and extract structured data with LLMs. Built-in MCP server for web research and RAG pipelines.