v0.4.8 New feature 2mo

⚠ Upgrade required

Adaptive relocation now defaults to a 40% similarity threshold; lower the threshold if needed and heed the new warning on weak matches.
Run `scrapling install --force` after updating to refresh browsers and fingerprints.

Notable features

Added `LinkExtractor` primitive in `scrapling.spiders.LinkExtractor` for URL extraction with fine‑grained controls.
Introduced `CrawlSpider` and `CrawlRule` templates to simplify "follow links matching a pattern" boilerplate.
Provided `SitemapSpider` template that seeds crawls from sitemaps or `robots.txt`, handling gzip‑compressed sitemaps.

Full changelog

A big spider update that takes the crawling framework to the next level 🕷️


🚀 New Stuff and quality of life changes

Added a LinkExtractor primitive in scrapling.spiders.LinkExtractor to pull URLs out of a Response. It offers many filtering controls (check the docs).
```
from scrapling.spiders import LinkExtractor

extractor = LinkExtractor(allow=r"/posts/", deny_domains=["ads.example.com"])
```
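To make the filtering idea concrete, here is a minimal standard-library sketch of what an allow pattern combined with a denied-domain list does to a set of links. It is not Scrapling's implementation: `filter_links` and its parameters are hypothetical names chosen for illustration, and the real `LinkExtractor` may behave differently in edge cases.

```python
import re
from urllib.parse import urljoin, urlparse


def filter_links(base_url, hrefs, allow=None, deny_domains=()):
    """Hypothetical helper: keep URLs that match `allow` and are not on a denied domain."""
    pattern = re.compile(allow) if allow else None
    kept = []
    for href in hrefs:
        url = urljoin(base_url, href)  # resolve relative links against the page URL
        host = urlparse(url).hostname or ""
        # Drop the denied domain itself and any of its subdomains.
        if any(host == d or host.endswith("." + d) for d in deny_domains):
            continue
        if pattern and not pattern.search(url):
            continue
        if url not in kept:  # de-duplicate while preserving order
            kept.append(url)
    return kept


links = filter_links(
    "https://example.com/",
    ["/posts/1", "/about", "https://ads.example.com/posts/2", "/posts/1"],
    allow=r"/posts/",
    deny_domains=["ads.example.com"],
)
print(links)
```

Only the first link survives here: `/about` fails the allow pattern, the ads link is on a denied domain, and the repeated `/posts/1` is de-duplicated.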

Added CrawlSpider and CrawlRule generic spider templates so you no longer have to hand-write the same "follow links matching this pattern" boilerplate. Override rules() to return a list of CrawlRule objects, each pairing a LinkExtractor with an optional callback. (Check the docs)

```
from scrapling.spiders import CrawlSpider, CrawlRule, LinkExtractor

class QuotesSpider(CrawlSpider):
    name = "blog"
    start_urls = ["https://quotes.toscrape.com/"]

    def rules(self):
        return [
            CrawlRule(LinkExtractor(allow=r"/author/"), callback=self.parse_author),
            CrawlRule(LinkExtractor(allow=r"/page/\d+/")),  # pagination, no callback
        ]

    async def parse_author(self, response):
        yield {
            "name": response.css(".author-title::text").get(),
            "birthday": response.css(".author-born-date::text").get(),
            "url": response.url,
        }
```

Added a SitemapSpider template that seeds a crawl directly from sitemap or robots.txt URLs. It handles gzip-compressed sitemaps and offers many controls and options. URLs are dispatched via crawl rules, as shown above for CrawlSpider. (Check the docs)

```
from scrapling.spiders import SitemapSpider, CrawlRule, LinkExtractor

class NewsSitemap(SitemapSpider):
    name = "news"
    sitemap_urls = ["https://example.com/robots.txt"]

    def rules(self):
        return [
            CrawlRule(LinkExtractor(allow=r"/articles/"), callback=self.parse_article),
        ]

    async def parse_article(self, response):
        yield {"url": response.url, "title": response.css("h1::text").get()}
```
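For readers curious what "handles gzip-compressed sitemaps" involves, the following is a rough standard-library sketch of reading `<loc>` entries from a sitemap that may or may not be gzipped. The function name `sitemap_locs` is hypothetical and this is not how SitemapSpider is necessarily implemented; it only illustrates the general technique.

```python
import gzip
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_locs(raw: bytes) -> list[str]:
    """Return every <loc> URL from a sitemap, decompressing gzip if needed."""
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        raw = gzip.decompress(raw)
    root = ET.fromstring(raw)
    return [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]


xml = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/articles/1</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

urls = sitemap_locs(gzip.compress(xml))
print(urls)
```

In the spider above, a rule such as `LinkExtractor(allow=r"/articles/")` would then decide which of these URLs reach `parse_article`.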

Adaptive relocation now defaults to a 40% similarity threshold instead of 0 across all methods, which should make the adaptive feature more reliable. When nothing crosses the threshold, a warning now reports the top score it did see, so you can lower the percentage deliberately if needed.
Updated all browsers and fingerprints. Run scrapling install --force again after updating to refresh them.
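The threshold behaviour described above can be pictured with a small sketch. It assumes a generic string-similarity score from `difflib` as a stand-in; Scrapling's actual scoring is different, and `relocate` and its parameters are hypothetical names used only for illustration.

```python
import warnings
from difflib import SequenceMatcher


def relocate(target, candidates, threshold=40.0):
    """Return the best candidate scoring at least `threshold` percent, else None."""
    scored = [
        (SequenceMatcher(None, target, c).ratio() * 100, c) for c in candidates
    ]
    if not scored:
        return None
    top_score, best = max(scored)
    if top_score < threshold:
        # Report the best score seen so the caller can lower the threshold on purpose.
        warnings.warn(
            f"No candidate reached {threshold:.0f}%; top score was {top_score:.1f}%"
        )
        return None
    return best


print(relocate("div.product-title", ["div.product-name", "zzz"]))
print(relocate("abc", ["xyz"]))               # warns and returns None
print(relocate("abc", ["xyz"], threshold=0))  # old-style behaviour: always a "best guess"
```

With a threshold of 0 the function always returns something, which is the "misleading best guess" situation the new default is meant to avoid.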

🐛 Bug Fixes

Fixed Fetcher.configure(...) not applying to per-request calls. Same fix applied to AsyncFetcher.
Fixed incorrect request fingerprinting that caused duplicate requests in spiders by @yetval in #255.
Fixed the Adaptive scraping engine staying silent on weak matches. Combined with the threshold change above, you now get a warning instead of a misleading "best guess" element when relocation fails.
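As background on the fingerprinting fix, duplicate filtering in a crawler usually relies on hashing a canonical form of each request. The sketch below shows one common way to do that with the standard library; it is an assumption-laden illustration, not Scrapling's algorithm, and `fingerprint` and `should_schedule` are hypothetical names.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def fingerprint(method: str, url: str, body: bytes = b"") -> str:
    """Hash a canonical form of the request so equivalent requests collide."""
    parts = urlsplit(url)
    # Sort query parameters and drop the fragment, which never reaches the server.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    canonical = urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", query, "")
    )
    digest = hashlib.sha256()
    for chunk in (method.upper().encode(), canonical.encode(), body):
        digest.update(chunk)
        digest.update(b"\x00")  # separator so fields cannot run together
    return digest.hexdigest()


seen = set()


def should_schedule(method, url, body=b""):
    """Return True the first time a request is seen, False for duplicates."""
    fp = fingerprint(method, url, body)
    if fp in seen:
        return False
    seen.add(fp)
    return True


print(should_schedule("GET", "https://example.com/a?x=1&y=2"))
print(should_schedule("get", "https://EXAMPLE.com/a?y=2&x=1#frag"))
```

If the canonical form is too loose, distinct requests get dropped; if it is too strict, the same page is fetched repeatedly, which is the kind of symptom the fix above addresses.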

Docs

Refreshed older code examples across the documentation to match the current version.
Improved the code copy-paste experience on the docs site and trimmed the agent skill so it uses fewer tokens per invocation.

🙏 Special thanks to the community for all the continuous testing and feedback

All releases

Scrapling v0.3.1 release notes

Big shoutout to our biggest Sponsors

Scrapling v0.3.0 Release Notes

🚀 Major New Features

Session-Based Architecture

A lot of new stealth/anti-bot Capabilities

AI Integration & MCP Server

New Interactive Web Scraping Shell