Scrapling

v0.4.8 Feature

This release adds 3 notable features for engineering teams evaluating rollout.

Published 24d AI Agents & Assistants

✓ No known CVEs patched

Topics

ai ai-scraping automation crawler crawling crawling-python data data-extraction mcp mcp-server playwright python scraping selectors stealth web-scraper web-scraping web-scraping-python webscraping xpath

ReleasePort's take

Moderate signal

editorial:auto 13d

Scrapling 0.4.8 adds the LinkExtractor primitive and the CrawlSpider and SitemapSpider spider templates, plus fixes for request fingerprinting and for Fetcher.configure settings not being applied.

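Why it matters: The fingerprinting bug caused duplicate requests in spiders, so upgrade promptly if you are affected. The new templates remove hand-written link-following boilerplate, and adaptive relocation now defaults to a 40% similarity threshold instead of 0. Test the templates in a development environment before rollout.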

Summary

AI summary

Added LinkExtractor, CrawlSpider/CrawlRule, and SitemapSpider templates plus adaptive relocation threshold change.

Changes in this release

Type	Severity	Summary	CVE
Feature	Medium	Added LinkExtractor primitive to scrapling.spiders.LinkExtractor for URL extraction.	—
Feature	Medium	Introduced CrawlSpider and CrawlRule templates for automated link following.	—
Feature	Medium	Added SitemapSpider template to crawl from sitemaps or robots.txt URLs.	—
Performance	Medium	Adaptive relocation now defaults to 40% similarity threshold for better accuracy.	—
Bugfix	Medium	Fixed Fetcher.configure not applying to per-request calls; also fixed in AsyncFetcher.	—
Bugfix	Medium	Resolved incorrect request fingerprinting causing duplicate requests in spiders.	—
Bugfix	Medium	Fixed Adaptive scraping engine staying silent on weak matches; now warns instead.	—

Full changelog

A big spider update that takes the crawling framework to the next level 🕷️

🚀 New Stuff and quality of life changes

Added a LinkExtractor primitive in scrapling.spiders.LinkExtractor to pull URLs out of a Response. It has many controls. (Check the docs)
```
from scrapling.spiders import LinkExtractor

extractor = LinkExtractor(allow=r"/posts/", deny_domains=["ads.example.com"])
```
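
To make the two filters in the snippet above concrete, here is a minimal standard-library sketch of what an allow pattern and a denied-domain list do to the links on a page. It is not Scrapling's implementation: `extract_links` and `_AnchorCollector` are hypothetical names, and the exact matching semantics of the real LinkExtractor are an assumption here.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _AnchorCollector(HTMLParser):
    """Collect raw href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_links(html, base_url, allow=None, deny_domains=()):
    """Hypothetical helper: absolute, de-duplicated links that pass both filters."""
    collector = _AnchorCollector()
    collector.feed(html)
    allow_re = re.compile(allow) if allow else None
    seen, links = set(), []
    for href in collector.hrefs:
        url = urljoin(base_url, href)  # resolve relative hrefs against the page URL
        if urlparse(url).hostname in deny_domains:
            continue  # drop links that point at a denied domain
        if allow_re and not allow_re.search(url):
            continue  # keep only URLs matching the allow pattern
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


PAGE = """
<a href="/posts/1">First</a>
<a href="/about">About</a>
<a href="https://ads.example.com/posts/promo">Ad</a>
<a href="/posts/1">First again</a>
"""

links = extract_links(
    PAGE,
    "https://blog.example.com/",
    allow=r"/posts/",
    deny_domains=["ads.example.com"],
)
print(links)
```

Only the first post link survives: the about page fails the allow pattern, the ad link is on a denied domain, and the repeated link is de-duplicated.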

Added CrawlSpider and CrawlRule generic spider templates so you no longer have to hand-write the same "follow links matching this pattern" boilerplate. Override rules() to return a list of CrawlRule objects, each pairing a LinkExtractor with an optional callback. (Check the docs)

```
from scrapling.spiders import CrawlSpider, CrawlRule, LinkExtractor

class QuotesSpider(CrawlSpider):
    name = "blog"
    start_urls = ["https://quotes.toscrape.com/"]

    def rules(self):
        return [
            CrawlRule(LinkExtractor(allow=r"/author/"), callback=self.parse_author),
            CrawlRule(LinkExtractor(allow=r"/page/\d+/")),  # pagination, no callback
        ]

    async def parse_author(self, response):
        yield {
            "name": response.css(".author-title::text").get(),
            "birthday": response.css(".author-born-date::text").get(),
            "url": response.url,
        }
```
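
The idea of pairing a URL pattern with an optional callback can be sketched without Scrapling at all. The following is a rough standard-library illustration, not the library's dispatch logic: `Rule` and `dispatch` are hypothetical names, and first-match-wins ordering is an assumption made for the sketch.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    """Hypothetical stand-in for a crawl rule: a URL pattern plus an optional callback."""

    pattern: str
    callback: Optional[Callable[[str], dict]] = None


def dispatch(url, rules):
    """Return (matched, item) using the first rule whose pattern matches the URL.

    First-match-wins is an assumption of this sketch.
    """
    for rule in rules:
        if re.search(rule.pattern, url):
            item = rule.callback(url) if rule.callback else None
            return True, item
    return False, None


def parse_author(url):
    # Stand-in for a real parsing callback.
    return {"kind": "author", "url": url}


RULES = [
    Rule(r"/author/", callback=parse_author),
    Rule(r"/page/\d+/"),  # followed for pagination, produces no item
]

author = dispatch("https://quotes.toscrape.com/author/Albert-Einstein/", RULES)
page = dispatch("https://quotes.toscrape.com/page/2/", RULES)
other = dispatch("https://quotes.toscrape.com/login", RULES)
print(author, page, other)
```

A rule without a callback still marks the URL as one to follow, which is how a pagination rule keeps the crawl moving without yielding items.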

Added a SitemapSpider template that seeds a crawl directly from sitemap or robots.txt URLs. It handles gzip-compressed sitemaps and offers many controls and options. URLs are dispatched via crawl rules, as shown above for CrawlSpider. (Check the docs)

```
from scrapling.spiders import SitemapSpider, CrawlRule, LinkExtractor

class NewsSitemap(SitemapSpider):
    name = "news"
    sitemap_urls = ["https://example.com/robots.txt"]

    def rules(self):
        return [
            CrawlRule(LinkExtractor(allow=r"/articles/"), callback=self.parse_article),
        ]

    async def parse_article(self, response):
        yield {"url": response.url, "title": response.css("h1::text").get()}
```
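
For readers unfamiliar with how a crawl can be seeded from robots.txt, the sketch below shows the two underlying steps with the standard library only: reading `Sitemap:` lines from a robots.txt body and pulling `<loc>` entries out of a possibly gzip-compressed sitemap. The helper names and sample data are hypothetical, and this says nothing about how SitemapSpider does it internally.

```python
import gzip
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemaps_from_robots(robots_txt):
    """Return the URLs listed in `Sitemap:` lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")  # split at the first colon only
        if key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


def urls_from_sitemap(payload):
    """Return every <loc> value from sitemap bytes, gunzipping them if needed."""
    if payload[:2] == b"\x1f\x8b":  # gzip magic number
        payload = gzip.decompress(payload)
    root = ET.fromstring(payload)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


ROBOTS = "User-agent: *\nDisallow: /admin/\nSitemap: https://example.com/sitemap.xml.gz\n"
SITEMAP_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/articles/one</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

sitemap_urls = sitemaps_from_robots(ROBOTS)
page_urls = urls_from_sitemap(gzip.compress(SITEMAP_XML))
print(sitemap_urls, page_urls)
```

In the NewsSitemap example above, a rule with allow=r"/articles/" would then keep only the first of these two page URLs.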

Adaptive relocation now defaults to a 40% similarity threshold instead of 0 across all methods, so weak matches are no longer returned as if relocation had succeeded. When nothing crosses the threshold, a warning now reports the top score it did see, so you can lower the percentage deliberately if needed.
Updated all browsers and fingerprints. Run scrapling install --force again after updating to refresh the browsers and fingerprints.
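
The threshold behaviour described above can be illustrated with a small sketch. This is not Scrapling's scoring: `relocate` and its `percentage` parameter are hypothetical, and `difflib.SequenceMatcher` over plain strings is only a stand-in for whatever similarity measure the adaptive engine actually uses.

```python
import warnings
from difflib import SequenceMatcher


def relocate(saved_signature, candidates, percentage=40):
    """Return the best candidate scoring at least `percentage`, else warn and return None.

    SequenceMatcher is a stand-in similarity measure chosen for this sketch.
    """
    scored = [
        (SequenceMatcher(None, saved_signature, cand).ratio() * 100, cand)
        for cand in candidates
    ]
    if not scored:
        return None
    top_score, top_candidate = max(scored)
    if top_score >= percentage:
        return top_candidate
    # Report the best score seen so the caller can lower the threshold on purpose.
    warnings.warn(
        f"No candidate reached {percentage}% similarity; top score was {top_score:.1f}%"
    )
    return None


SAVED = "div.product > span.price"
strong = relocate(SAVED, ["div.product > span.cost", "nav"])
weak = relocate(SAVED, ["nav", "ul"])  # emits a UserWarning and returns None
legacy = relocate(SAVED, ["nav", "ul"], percentage=0)  # a threshold of 0 accepts any best guess
print(strong, weak, legacy)
```

With a threshold of 0 the function always hands back something, however poor the match, which is the "best guess" behaviour the new default is meant to avoid.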

🐛 Bug Fixes

Fixed Fetcher.configure(...) not applying to per-request calls. Same fix applied to AsyncFetcher.
Fixed incorrect request fingerprinting that caused duplicate requests in spiders by @yetval in #255.
Fixed the Adaptive scraping engine staying silent on weak matches. Combined with the threshold change above, you now get a warning instead of a misleading "best guess" element when relocation fails.
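
The release notes do not say what was wrong with the fingerprinting, so the following is only a generic sketch of why fingerprints matter for de-duplication: two requests that are logically the same should hash to the same value, and different ones should not. `fingerprint` and `SeenFilter` are hypothetical names, and the canonicalization choices (sorted query parameters, dropped fragment) are assumptions, not Scrapling's rules.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def fingerprint(method, url, body=b""):
    """Hash the method, a canonical form of the URL, and the request body."""
    parts = urlsplit(url)
    # Sort query parameters so their order does not change the fingerprint.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    canonical = urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", query, "")
    )
    digest = hashlib.sha1()
    for chunk in (method.upper().encode(), canonical.encode(), body):
        digest.update(chunk)
        digest.update(b"\x00")  # separator so adjacent fields cannot run together
    return digest.hexdigest()


class SeenFilter:
    """Remember fingerprints and report whether a request is new."""

    def __init__(self):
        self._seen = set()

    def is_new(self, method, url, body=b""):
        fp = fingerprint(method, url, body)
        if fp in self._seen:
            return False
        self._seen.add(fp)
        return True


seen = SeenFilter()
first = seen.is_new("GET", "https://example.com/list?b=2&a=1")
repeat = seen.is_new("get", "https://EXAMPLE.com/list?a=1&b=2#top")
post = seen.is_new("POST", "https://example.com/list?a=1&b=2", b"page=2")
print(first, repeat, post)
```

If a fingerprint function treats equivalent requests as different, the scheduler fetches the same page more than once, which is the kind of symptom the fix above addresses.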

Docs

Refreshed older code examples across the documentation to match the current version.
Improved the code copy-paste experience on the docs site and trimmed the agent skill so it uses fewer tokens per invocation.

🙏 Special thanks to the community for all the continuous testing and feedback
