Scrapling

v0.4.8 Feature

This release adds 3 notable features for engineering teams evaluating rollout.

✓ No known CVEs patched in this version

Topics

ai ai-scraping automation crawler crawling crawling-python data data-extraction mcp mcp-server playwright python scraping selectors stealth web-scraper web-scraping web-scraping-python webscraping xpath

ReleasePort's take

Moderate signal

Scrapling 0.4.8 adds LinkExtractor, CrawlSpider, and SitemapSpider spider templates, plus critical fixes for request fingerprinting and Fetcher.configure application.

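Why it matters: The fingerprinting bug caused duplicate requests in spiders, so upgrade if you are affected. The new templates remove hand-written link-following boilerplate. Adaptive relocation now defaults to a 40% similarity threshold instead of 0, so a lookup that used to return a weak best guess now produces a warning instead. Test the new templates and your adaptive selectors in a development environment before rolling out.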

Summary

AI summary

Added LinkExtractor, CrawlSpider/CrawlRule, and SitemapSpider templates plus adaptive relocation threshold change.

Changes in this release

Feature Medium

Added a LinkExtractor primitive (scrapling.spiders.LinkExtractor) for extracting URLs from a Response.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

Introduced CrawlSpider and CrawlRule templates for automated link following.

Source: llm_adapter@2026-05-21

Confidence: high

Feature Medium

Added SitemapSpider template to crawl from sitemaps or robots.txt URLs.

Source: llm_adapter@2026-05-21

Confidence: high

Performance Medium

Adaptive relocation now defaults to a 40% similarity threshold instead of 0, across all methods.

Source: llm_adapter@2026-05-21

Confidence: high

Bugfix Medium

Fixed Fetcher.configure not applying to per-request calls; also fixed in AsyncFetcher.

Source: llm_adapter@2026-05-21

Confidence: high

Bugfix Medium

Resolved incorrect request fingerprinting causing duplicate requests in spiders.

Source: llm_adapter@2026-05-21

Confidence: high

Bugfix Medium

Fixed the adaptive scraping engine staying silent on weak matches; it now emits a warning instead.

Source: llm_adapter@2026-05-21

Confidence: high

Full changelog

A big spider update that takes the crawling framework to the next level 🕷️

🚀 New Stuff and quality of life changes

  • Added a LinkExtractor primitive in scrapling.spiders.LinkExtractor to pull URLs out of a Response. It offers many filtering controls. (Check the docs)

    from scrapling.spiders import LinkExtractor
    
    extractor = LinkExtractor(allow=r"/posts/", deny_domains=["ads.example.com"])
    
  • Added CrawlSpider and CrawlRule generic spider templates so you no longer have to hand-write the same "follow links matching this pattern" boilerplate. Override rules() to return a list of CrawlRule objects, each pairing a LinkExtractor with an optional callback. (Check the docs)

    from scrapling.spiders import CrawlSpider, CrawlRule, LinkExtractor
    
    class QuotesSpider(CrawlSpider):
        name = "quotes"
        start_urls = ["https://quotes.toscrape.com/"]
    
        def rules(self):
            return [
                CrawlRule(LinkExtractor(allow=r"/author/"), callback=self.parse_author),
                CrawlRule(LinkExtractor(allow=r"/page/\d+/")),  # pagination, no callback
            ]
    
        async def parse_author(self, response):
            yield {
                "name": response.css(".author-title::text").get(),
                "birthday": response.css(".author-born-date::text").get(),
                "url": response.url,
            }
    
  • Added a SitemapSpider template that seeds a crawl directly from sitemap or robots.txt URLs. It handles gzip-compressed sitemaps and offers many controls and options. URLs are dispatched via the crawl rules, as shown above for CrawlSpider. (Check the docs)

    from scrapling.spiders import SitemapSpider, CrawlRule, LinkExtractor
    
    class NewsSitemap(SitemapSpider):
        name = "news"
        sitemap_urls = ["https://example.com/robots.txt"]
    
        def rules(self):
            return [
                CrawlRule(LinkExtractor(allow=r"/articles/"), callback=self.parse_article),
            ]
    
        async def parse_article(self, response):
            yield {"url": response.url, "title": response.css("h1::text").get()}
    
  • Adaptive relocation now defaults to a 40% similarity threshold instead of 0 across all methods, so weak matches are no longer accepted by default. When nothing crosses the threshold, a warning now reports the top score it did see, so you can lower the percentage deliberately if needed.

  • Updated all browsers and fingerprints. After updating, run scrapling install --force to refresh them.
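
To make the threshold behaviour described above concrete, here is a minimal sketch in plain Python. It does not use Scrapling's internals: the `relocate` helper, the sample strings, and the use of `difflib.SequenceMatcher` as the similarity score are all assumptions made for illustration. It shows only the general pattern: accept the best candidate if it crosses the threshold, otherwise warn with the top score seen.

```python
import warnings
from difflib import SequenceMatcher


def relocate(saved, candidates, threshold=40.0):
    """Return the candidate most similar to `saved`, or None.

    Hypothetical helper: scores are percentages from difflib, not
    Scrapling's own similarity measure.
    """
    best, best_score = None, 0.0
    for candidate in candidates:
        score = SequenceMatcher(None, saved, candidate).ratio() * 100
        if score > best_score:
            best, best_score = candidate, score

    if best is None or best_score < threshold:
        # Nothing crossed the threshold: report the top score instead of
        # silently returning a weak "best guess".
        warnings.warn(
            f"no candidate reached {threshold:.0f}% (top score: {best_score:.1f}%)"
        )
        return None
    return best


saved = '<div class="product-title" id="main">'
# A near-identical element crosses the threshold and is returned.
strong = relocate(saved, ['<div class="product-name" id="main">', "<footer>"])
# Unrelated elements stay below 40%, so the result is None plus a warning.
weak = relocate(saved, ["<footer>", "<nav>"])
```

With a threshold of 0, the second call would have returned one of the unrelated elements. That is the kind of misleading result the new default is meant to avoid.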

🐛 Bug Fixes

  • Fixed Fetcher.configure(...) not applying to per-request calls. Same fix applied to AsyncFetcher.
  • Fixed incorrect request fingerprinting that caused duplicate requests in spiders by @yetval in #255.
  • Fixed the Adaptive scraping engine staying silent on weak matches. Combined with the threshold change above, you now get a warning instead of a misleading "best guess" element when relocation fails.
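
For readers unfamiliar with why fingerprinting affects duplicates, the sketch below shows the general idea in plain Python. It is not Scrapling's implementation and does not reproduce the bug fixed in #255. The `fingerprint` function and its canonicalization rules (lowercased host, sorted query parameters, dropped fragment) are assumptions. A crawler that deduplicates by fingerprint only skips a request when equivalent requests hash to the same value.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def fingerprint(method, url, body=b""):
    """Hash a request so that equivalent requests share one fingerprint."""
    parts = urlsplit(url)
    # Sort query parameters so ?a=1&b=2 and ?b=2&a=1 are treated alike.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    canonical = urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", query, "")
    )
    digest = hashlib.sha256()
    for chunk in (method.upper().encode(), canonical.encode(), body):
        digest.update(chunk)
        digest.update(b"\x00")  # separator so fields cannot run together
    return digest.hexdigest()


seen = set()
scheduled = []
for url in [
    "https://example.com/page?a=1&b=2",
    "https://EXAMPLE.com/page?b=2&a=1#top",  # same request, different spelling
    "https://example.com/page?a=1&b=3",
]:
    fp = fingerprint("GET", url)
    if fp not in seen:
        seen.add(fp)
        scheduled.append(url)
```

Here the second URL is skipped because it canonicalizes to the first. If a fingerprint omits or mishandles one of these fields, equivalent requests get different hashes and are fetched more than once.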

Docs

  • Refreshed older code examples across the documentation to match the current version.
  • Improved the code copy-paste experience on the docs site and trimmed the agent skill so it uses fewer tokens per invocation.

🙏 Special thanks to the community for all the continuous testing and feedback

