This release adds four notable features: document extraction, HTML output, multi-URL watch, and batch LLM extraction.
✓ No known CVEs patched in this version
Summary
AI summary: webclaw now auto-detects and extracts DOCX, XLSX/XLS, and CSV files into markdown or JSON.
Full changelog
v0.2.0 — Major feature release
Document Extraction
webclaw now auto-detects and extracts content from document files:
- DOCX — Word documents parsed into markdown with headings preserved
- XLSX/XLS — Spreadsheets converted to markdown tables (multi-sheet support)
- CSV — Parsed with quoted field handling, output as markdown table
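The CSV bullet above (quoted field handling, markdown table output) can be illustrated with a small Python sketch. This is not webclaw's implementation; `csv_to_markdown` is a hypothetical helper, and treating the first row as the header is an assumption.

```python
import csv
import io


def csv_to_markdown(text: str) -> str:
    """Render CSV text as a markdown table (hypothetical helper, not webclaw code)."""
    # csv.reader handles quoted fields, including embedded commas and doubled quotes.
    reader = csv.reader(io.StringIO(text, newline=""))
    rows = [row for row in reader if row]  # skip blank lines
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def render(row):
        # Escape pipes and flatten newlines so each record stays on one table row.
        cells = [cell.replace("|", "\\|").replace("\n", " ") for cell in row]
        cells += [""] * (width - len(cells))  # pad short rows
        return "| " + " | ".join(cells) + " |"

    header, *body = rows  # assumption: the first row is the header
    lines = [render(header), "| " + " | ".join(["---"] * width) + " |"]
    lines += [render(row) for row in body]
    return "\n".join(lines)


if __name__ == "__main__":
    print(csv_to_markdown('name,note\n"Doe, Jane","said ""hi"""\n'))
```

The quoted field `"Doe, Jane"` stays in a single cell because the parser, not a plain split on commas, decides where fields end.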
Auto-detected by Content-Type header or URL extension. Works in batch mode too:
webclaw https://example.com/report.docx
webclaw https://example.com/data.xlsx -f json
webclaw --urls-file mixed-urls.txt --output-dir ./results
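As a rough illustration of detection "by Content-Type header or URL extension", the Python sketch below checks the header first and falls back to the URL path. The function name `detect_format`, the lookup tables, and the header-before-extension precedence are assumptions made for this example; the release notes do not say how webclaw orders the two signals.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Standard MIME types for the document formats named in this release.
MIME_TO_FORMAT = {
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "docx",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "xlsx",
    "application/vnd.ms-excel": "xls",
    "text/csv": "csv",
}
EXTENSION_TO_FORMAT = {".docx": "docx", ".xlsx": "xlsx", ".xls": "xls", ".csv": "csv"}


def detect_format(url: str, content_type: Optional[str] = None) -> Optional[str]:
    """Return 'docx', 'xlsx', 'xls', 'csv', or None when neither signal matches."""
    if content_type:
        # Drop parameters such as "; charset=utf-8" before the lookup.
        mime = content_type.split(";", 1)[0].strip().lower()
        if mime in MIME_TO_FORMAT:
            return MIME_TO_FORMAT[mime]
    # Assumed fallback: the extension of the URL path, ignoring the query string.
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return EXTENSION_TO_FORMAT.get(suffix)


if __name__ == "__main__":
    print(detect_format("https://example.com/report.docx"))
    print(detect_format("https://example.com/download?id=7", "text/csv; charset=utf-8"))
```

Falling back to the extension matters in practice because many servers return a generic `application/octet-stream` for file downloads.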
HTML Output Format
webclaw https://example.com -f html
Returns sanitized HTML. Works with crawl, batch, and --output-dir, where files are saved with a .html extension.
Multi-URL Watch
echo "https://site1.com/pricing
https://site2.com/status" > urls.txt
webclaw --urls-file urls.txt --watch --watch-interval 300 --webhook "https://discord.com/..."
Monitors all URLs in parallel. Reports aggregate changes per check.
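One way to picture a single watch check is sketched below in Python: fetch every URL concurrently, hash each body, and return one aggregate list of changed URLs. `check_once`, the injected `fetch` callable, and hash-based comparison are all assumptions for illustration; webclaw's actual change detection is not described in these notes.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def check_once(
    urls: List[str],
    fetch: Callable[[str], str],
    previous: Dict[str, str],
) -> Tuple[List[str], Dict[str, str]]:
    """Fetch all URLs in parallel and return (changed_urls, new_hashes)."""
    workers = max(1, min(8, len(urls)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input URLs.
        bodies = list(pool.map(fetch, urls))
    current = {
        url: hashlib.sha256(body.encode("utf-8")).hexdigest()
        for url, body in zip(urls, bodies)
    }
    # A URL counts as changed only if it was seen before with a different hash.
    changed = [u for u in urls if u in previous and previous[u] != current[u]]
    return changed, current


if __name__ == "__main__":
    pages = {"https://site1.com/pricing": "v1", "https://site2.com/status": "ok"}
    urls = list(pages)
    _, seen = check_once(urls, lambda u: pages[u], {})
    pages["https://site1.com/pricing"] = "v2"
    changed, seen = check_once(urls, lambda u: pages[u], seen)
    print(changed)
```

A watch loop would call such a function once per interval (300 seconds in the command above) and send a single notification covering every entry in `changed`.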
Batch + LLM Extraction
webclaw --urls-file sites.txt --extract-prompt "get email and phone" --output-dir results
Combines batch fetching with LLM extraction. Processes URLs sequentially to respect rate limits.
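The sequential behavior can be sketched in Python as a plain loop that handles one URL at a time. `extract_sequentially`, the `extract` callable, and the fixed pause between calls are hypothetical; the notes only state that URLs are processed sequentially, not that any delay is inserted.

```python
import time
from typing import Callable, Dict, List


def extract_sequentially(
    urls: List[str],
    extract: Callable[[str], str],
    min_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Call extract(url) for one URL at a time, never concurrently."""
    results: Dict[str, str] = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Assumed pacing between calls; not a documented webclaw setting.
            sleep(min_interval)
        results[url] = extract(url)
    return results


if __name__ == "__main__":
    out = extract_sequentially(
        ["https://a.example", "https://b.example"],
        lambda url: f"extracted from {url}",
        min_interval=0.1,
    )
    print(out)
```

Running one request at a time keeps the number of in-flight LLM calls at one, which is the simplest way to stay under a provider's rate limit at the cost of total run time.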
Full changelog: https://github.com/0xMassi/webclaw/blob/main/CHANGELOG.md
Full Changelog: https://github.com/0xMassi/webclaw/compare/v0.1.7...v0.2.0
About 0xMassi/webclaw
Web content extraction for AI agents. 10 tools: scrape, crawl, map, batch, extract, summarize, diff, brand, search, research. TLS fingerprinting bypasses anti-bot without a browser. 67% fewer tokens than raw HTML. `npx create-webclaw` auto-configures Claude, Cursor, Windsurf, Codex, OpenCode.