0xMassi/webclaw

v0.2.0 Feature

This release adds four notable features: document extraction (DOCX, XLSX/XLS, CSV), an HTML output format, multi-URL watch, and batch LLM extraction.

Published 2 months ago · MCP Developer Tools
✓ No known CVEs patched

Topics

ai ai-agents ai-scraping cli crawler data-extraction
firecrawl-alternative html-to-markdown llm markdown mcp mcp-server rust self-hosted tls-fingerprinting web-crawler web-extraction web-scraper web-scraping

Summary

webclaw now auto-detects and extracts DOCX, XLSX/XLS, and CSV files into markdown or JSON.

Full changelog

v0.2.0 — Major feature release

Document Extraction

webclaw now auto-detects and extracts content from document files:

  • DOCX — Word documents parsed into markdown with headings preserved
  • XLSX/XLS — Spreadsheets converted to markdown tables (multi-sheet support)
  • CSV — Parsed with quoted field handling, output as markdown table
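
To illustrate what "quoted field handling" means for the CSV case, here is a minimal Python sketch that turns CSV text into a markdown table. It is not webclaw's implementation (webclaw is written in Rust); the function name `csv_to_markdown` and the choice to escape pipe characters are assumptions made for this example.

```python
import csv
import io


def csv_to_markdown(text: str) -> str:
    """Convert CSV text to a markdown table (hypothetical helper, not webclaw code)."""
    # csv.reader handles quoted fields, embedded commas, and doubled quotes.
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def fmt(row):
        # Escape pipes and flatten newlines so each cell stays in one table column.
        cells = [c.replace("|", "\\|").replace("\n", " ") for c in row]
        cells += [""] * (width - len(cells))  # pad short rows
        return "| " + " | ".join(cells) + " |"

    lines = [fmt(rows[0]), "| " + " | ".join(["---"] * width) + " |"]
    lines += [fmt(row) for row in rows[1:]]
    return "\n".join(lines)


if __name__ == "__main__":
    print(csv_to_markdown('name,note\n"Doe, Jane","said ""hi"""\n'))
```

The quoted field `"Doe, Jane"` stays a single cell even though it contains a comma, which is the behaviour the bullet above describes.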

Auto-detected by Content-Type header or URL extension. Works in batch mode too:

webclaw https://example.com/report.docx
webclaw https://example.com/data.xlsx -f json
webclaw --urls-file mixed-urls.txt --output-dir ./results
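
The auto-detection described above (Content-Type header or URL extension) can be sketched roughly as follows. This is an illustration only: the MIME types are the standard registered ones for these formats, but the function name `detect_document_type` and the rule that the header takes precedence over the extension are assumptions, not confirmed webclaw behaviour.

```python
from urllib.parse import urlparse

# Standard MIME types for the supported formats.
CONTENT_TYPES = {
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "docx",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "xlsx",
    "application/vnd.ms-excel": "xls",
    "text/csv": "csv",
}
EXTENSIONS = {".docx": "docx", ".xlsx": "xlsx", ".xls": "xls", ".csv": "csv"}


def detect_document_type(url, content_type=None):
    """Return 'docx', 'xlsx', 'xls', 'csv', or None (hypothetical helper)."""
    # Assumption: the Content-Type header is checked first.
    if content_type:
        mime = content_type.split(";", 1)[0].strip().lower()
        if mime in CONTENT_TYPES:
            return CONTENT_TYPES[mime]
    # Fall back to the extension of the URL path, ignoring query and fragment.
    path = urlparse(url).path.lower()
    for ext, kind in EXTENSIONS.items():
        if path.endswith(ext):
            return kind
    return None


if __name__ == "__main__":
    print(detect_document_type("https://example.com/report.docx"))
    print(detect_document_type("https://example.com/export", "text/csv; charset=utf-8"))
```

Anything that returns `None` here would presumably go through the regular HTML extraction path.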

HTML Output Format

webclaw https://example.com -f html

Returns sanitized HTML. Works with crawl, batch, and --output-dir (.html extension).

Multi-URL Watch

echo "https://site1.com/pricing
https://site2.com/status" > urls.txt
webclaw --urls-file urls.txt --watch --watch-interval 300 --webhook "https://discord.com/..."

Monitors all URLs in parallel. Reports aggregate changes per check.
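
One way to picture a single watch check is: fetch every URL in parallel, compare each result with the previous check, and report the changed URLs together. The Python sketch below does this by hashing page content. The hashing approach, the `check_once` name, and the injectable `fetch` callable are all assumptions for illustration; the release notes do not say how webclaw detects changes.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def check_once(urls, fetch, previous):
    """Run one check. Returns (changed_urls, new_hashes). Hypothetical helper.

    fetch: callable taking a URL and returning the page content as a string.
    previous: dict of URL -> content hash from the prior check (empty on first run).
    """
    # Fetch all URLs in parallel; pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=max(1, len(urls))) as pool:
        bodies = list(pool.map(fetch, urls))
    current = {
        url: hashlib.sha256(body.encode("utf-8")).hexdigest()
        for url, body in zip(urls, bodies)
    }
    # Aggregate: one list of changed URLs per check, not one report per URL.
    changed = [u for u in urls if u in previous and previous[u] != current[u]]
    return changed, current


if __name__ == "__main__":
    # Fake pages stand in for real HTTP requests.
    pages = {"https://site1.com/pricing": "v1", "https://site2.com/status": "ok"}
    changed, state = check_once(list(pages), pages.get, {})
    pages["https://site2.com/status"] = "degraded"
    changed, state = check_once(list(pages), pages.get, state)
    print(changed)
```

In this sketch the first check only records a baseline, so nothing is reported as changed until the second check.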

Batch + LLM Extraction

webclaw --urls-file sites.txt --extract-prompt "get email and phone" --output-dir results

Combines batch fetching with LLM extraction. Processes URLs sequentially to respect rate limits.
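
Sequential processing here means one extraction call finishes before the next begins. A rough Python sketch of that loop is below. The `extract_batch` name, the optional delay between calls, and the per-URL error capture are assumptions for this example; the release notes only state that URLs are processed sequentially.

```python
import time


def extract_batch(urls, extract, delay_seconds=0.0, sleep=time.sleep):
    """Run extract(url) for each URL, one at a time (hypothetical helper).

    extract: callable standing in for fetch + LLM extraction of one URL.
    delay_seconds: assumed optional pause between calls; not a webclaw option.
    """
    results = {}
    for index, url in enumerate(urls):
        if index > 0 and delay_seconds > 0:
            sleep(delay_seconds)  # space out calls to stay under a rate limit
        try:
            results[url] = extract(url)
        except Exception as exc:
            # Record the failure and continue with the remaining URLs.
            results[url] = {"error": str(exc)}
    return results


if __name__ == "__main__":
    fake = lambda url: {"email": "info@" + url.split("//")[1]}
    print(extract_batch(["https://example.com"], fake))
```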

Full changelog: https://github.com/0xMassi/webclaw/blob/main/CHANGELOG.md

Full Changelog: https://github.com/0xMassi/webclaw/compare/v0.1.7...v0.2.0

About 0xMassi/webclaw

Web content extraction for AI agents. 10 tools: scrape, crawl, map, batch, extract, summarize, diff, brand, search, research. TLS fingerprinting bypasses anti-bot without a browser. 67% fewer tokens than raw HTML. `npx create-webclaw` auto-configures Claude, Cursor, Windsurf, Codex, OpenCode.
