
claude-flow

v3.10.20 Feature

This release adds 2 notable features for engineering teams evaluating rollout.

✓ No known CVEs patched

Topics

agentic-ai agentic-framework agentic-rag agentic-workflow agents ai-agents
ai-assistant ai-coding ai-skills autonomous-agents claude-code codex mcp-server multi-agent multi-agent-systems npm skills swarm swarm-intelligence typescript

Summary

AI summary

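Adds an opt-in cross-encoder reranker to the neural_patterns MCP tool. On the project's A/B harness it raises the top-1 hit rate from 80% to 90% and the top-3 hit rate from 80% to 100%, at roughly 25× the per-query latency of the default hybrid path.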

Changes in this release

Feature Medium

Adds an opt-in cross-encoder reranker (Xenova/ms-marco-MiniLM-L-6-v2), improving the top-1 hit rate from 80% to 90% and the top-3 hit rate from 80% to 100%.

Source: llm_adapter@2026-05-30

Confidence: high

Feature Medium

Introduces three new parameters on the neural_patterns MCP tool: rerank (boolean, default false), hybridWeight (number, default 0.5), and ceWeight (number, default 0.5).

Source: llm_adapter@2026-05-30

Confidence: high

Feature Low

Adds five new tests covering the graceful-degradation contract of the cross-encoder reranker.

Source: llm_adapter@2026-05-30

Confidence: high

Refactor Low

Refactors reranker loading to lazy-load the model directly via `AutoTokenizer` + `AutoModelForSequenceClassification`. After the first load failure, later calls fail fast.

Source: llm_adapter@2026-05-30

Confidence: high

Full changelog

What ships

Cross-encoder reranker (opt-in) — Xenova/ms-marco-MiniLM-L-6-v2 (int8,
~30MB) lazy-loaded via @xenova/transformers, gracefully degrading when
unavailable. Pushes top-1 from 80% → 90% and top-3 from 80% → 100% on
the same A/B harness.

Cumulative SOTA push (3.10.17 → 3.10.20)

| Metric | 3.10.17 cosine | 3.10.18 hybrid | 3.10.19 multi-field | 3.10.20 +rerank |
|---|---:|---:|---:|---:|
| Top-1 hit rate | 0% | 50% | 80% | 90% |
| Top-3 hit rate | 0% | 70% | 80% | 100% |
| MRR@3 | 0.000 | 0.583 | 0.800 | 0.933 |
| Top-1 diversity | 100% | 80% | 100% | 100% |
| Avg query latency | 29 ms | 41 ms | 39 ms | 984 ms (opt-in) |
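Each hit-rate and MRR@3 row can be recomputed from per-query ranks. The sketch below is minimal and rests on three assumptions. First, each query reduces to the 1-based rank of its first relevant result, or null when nothing relevant is retrieved. Second, MRR@3 counts only ranks up to 3. Third, the helper name `retrievalMetrics` is hypothetical and is not taken from the benchmark script.

```typescript
// Hypothetical helper for illustration; not the benchmark script's actual code.
// A rank is the 1-based position of the first relevant result, or null if none was retrieved.
type FirstRelevantRank = number | null;

interface RetrievalMetrics {
  top1: number; // fraction of queries with a relevant result at rank 1
  topK: number; // fraction of queries with a relevant result within the top k
  mrrAtK: number; // mean reciprocal rank, counting only ranks <= k
}

function retrievalMetrics(ranks: FirstRelevantRank[], k: number): RetrievalMetrics {
  if (ranks.length === 0) {
    throw new Error("need at least one query");
  }
  let top1 = 0;
  let topK = 0;
  let reciprocalRankSum = 0;
  for (const rank of ranks) {
    if (rank === null || rank > k) continue; // a miss contributes 0 to every metric
    if (rank === 1) top1 += 1;
    topK += 1;
    reciprocalRankSum += 1 / rank;
  }
  const n = ranks.length;
  return { top1: top1 / n, topK: topK / n, mrrAtK: reciprocalRankSum / n };
}

// Under these assumptions, the 3.10.20 column (9/10 top-1, 10/10 top-3, MRR@3 0.933)
// implies nine queries hit at rank 1 and the tenth at rank 3: (9 + 1/3) / 10 = 0.933.
const v31020 = retrievalMetrics([1, 1, 1, 1, 1, 1, 1, 1, 1, 3], 3);
console.log(v31020.top1, v31020.topK, v31020.mrrAtK.toFixed(3)); // 0.9 1 0.933
```

The same arithmetic gives 0.583 for the 3.10.18 column. That column corresponds to five queries at rank 1, one at rank 2, one at rank 3 and three misses.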

The ablation that drove the architecture

| Configuration | Top-1 | Top-3 | MRR@3 |
|---|:---:|:---:|:---:|
| Hybrid only (3.10.19) | 8/10 | 8/10 | 0.800 |
| Cross-encoder alone (over top-30 pool) | 6/10 | 10/10 | 0.733 |
| Combined 0.5·hybrid + 0.5·CE (3.10.20 default) | 9/10 | 10/10 | 0.933 |

Cross-encoder alone finds all relevant docs in top-3 but loses top-1 —
MS MARCO's calibration on short commit subjects is noisy. Hybrid is the
opposite: strong top-1, weaker top-3. Linear combination captures both.
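The linear combination can be sketched as a weighted sum over the candidate pool, followed by a re-sort. The sketch assumes both scores already sit on a comparable 0 to 1 scale; the release notes do not say how, or whether, the hybrid score is normalized. It also assumes the pool is the hybrid top-30. `Candidate` and `fuseScores` are hypothetical names, not the code in src/memory/cross-encoder-rerank.ts.

```typescript
// Hypothetical shapes for illustration only.
interface Candidate {
  id: string;
  hybridScore: number; // assumed already scaled to [0, 1]
  crossEncoderScore: number; // sigmoid/softmax output, in [0, 1]
}

interface Fused extends Candidate {
  finalScore: number;
}

// Rerank a candidate pool by finalScore = hybridWeight * hybrid + ceWeight * CE.
// The defaults mirror the documented 0.5 / 0.5 hybridWeight and ceWeight params.
function fuseScores(pool: Candidate[], hybridWeight = 0.5, ceWeight = 0.5): Fused[] {
  return pool
    .map((c) => ({
      ...c,
      finalScore: hybridWeight * c.hybridScore + ceWeight * c.crossEncoderScore,
    }))
    .sort((a, b) => b.finalScore - a.finalScore); // highest combined score first
}

// Invented scores: each signal alone puts a different document first.
const examplePool: Candidate[] = [
  { id: "X", hybridScore: 0.9, crossEncoderScore: 0.3 }, // hybrid's favourite, judged weak by the CE
  { id: "R", hybridScore: 0.8, crossEncoderScore: 0.95 }, // strong on both signals
  { id: "N", hybridScore: 0.2, crossEncoderScore: 0.97 }, // CE's favourite, weak hybrid match
];
console.log(fuseScores(examplePool).map((c) => c.id).join(" > ")); // R > X > N
```

Each score alone ranks a different document first: hybrid alone puts X first and the cross-encoder alone puts N first. The 0.5 : 0.5 blend promotes R, a document both signals rate reasonably well. This is one way a linear blend can recover top-1 while keeping the cross-encoder's recall.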

A grid search over the hybrid : CE weights confirms a broad plateau:

| hybrid : ce | top-1 | top-3 | MRR@3 |
|---|:---:|:---:|:---:|
| 0.5 : 0.5 (default) | 9/10 | 10/10 | 0.933 |
| 0.4 : 0.6 | 9/10 | 10/10 | 0.933 |
| 0.3 : 0.7 | 9/10 | 10/10 | 0.933 |

Why opt-in

Latency cost is ~25× hybrid (1.0 s vs 39 ms per query at N=385). The default
hybrid path stays for hot paths and batch retrieval. Callers needing SOTA
relevance flip {rerank: true} per call.
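Opting in happens per call through the neural_patterns MCP tool's arguments. The sketch below resolves the three parameters listed under "What changed in code" against their documented defaults. The `RerankArgs` type and the `resolveRerankArgs` function are hypothetical. They illustrate the defaults only and do not reproduce the tool's real schema or validation.

```typescript
// Hypothetical argument shape; only rerank, hybridWeight and ceWeight come from the release notes.
interface RerankArgs {
  rerank?: boolean;
  hybridWeight?: number;
  ceWeight?: number;
}

type ResolvedRerankArgs = Required<RerankArgs>;

// Fill in the documented defaults: rerank off, equal 0.5 / 0.5 weights.
// `??` (not `||`) keeps explicit falsy values such as hybridWeight: 0.
function resolveRerankArgs(args: RerankArgs = {}): ResolvedRerankArgs {
  return {
    rerank: args.rerank ?? false,
    hybridWeight: args.hybridWeight ?? 0.5,
    ceWeight: args.ceWeight ?? 0.5,
  };
}

// Default call: the hybrid path (about 39 ms per query in the 3.10.19 benchmark).
const hotPath = resolveRerankArgs();
// Opt-in call: cross-encoder reranking (about 1 s per query at N=385).
const highRelevance = resolveRerankArgs({ rerank: true });
console.log(hotPath.rerank, highRelevance.rerank, highRelevance.ceWeight); // false true 0.5
```

Using `??` matters here because `||` would silently turn an explicit `hybridWeight: 0` (cross-encoder only) back into 0.5.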

What changed in code

  1. src/memory/cross-encoder-rerank.ts — lazy-loaded singleton via direct
    AutoTokenizer + AutoModelForSequenceClassification. The xenova v2
    pipeline('text-classification') API can't handle {text, text_pair} pairs
    reliably; the lower-level API does. Handles single-logit (sigmoid) AND
    binary-logit (softmax) heads.

  2. One-shot load policy — after a failed load, subsequent calls return
    null immediately. No retry loops in hot paths.

  3. The neural_patterns MCP tool gains three new params:

    • rerank: boolean (default false)
    • hybridWeight: number (default 0.5)
    • ceWeight: number (default 0.5)

    When rerank is on, the response also includes crossEncoderScore.
  4. Five new tests in __tests__/cross-encoder-rerank.test.ts cover the
    graceful-degradation contract. No network is needed: the tests force a load
    failure with a guaranteed-bad model name.
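Items 1 and 2 describe two behaviours: the load policy and the handling of the two head shapes. Both can be sketched without the model itself. In this minimal sketch, a hypothetical `loadModel` function is injected in place of the real `AutoTokenizer` / `AutoModelForSequenceClassification` loading, so the sketch runs offline. `createReranker`, `ScoreFn` and `logitsToRelevance` are illustrative names. Treating index 1 of a two-logit head as the "relevant" class is an assumption.

```typescript
// Raw cross-encoder call: one logits array per (query, doc) pair.
// Hypothetical stand-in for the tokenizer + sequence-classification model.
type ScoreFn = (query: string, docs: string[]) => Promise<number[][]>;
// Hypothetical loader; the real module loads the model lazily on first use.
type LoadModel = () => Promise<ScoreFn>;

// Convert one pair's logits into a relevance probability in [0, 1].
function logitsToRelevance(logits: number[]): number {
  if (logits.length === 1) {
    // Single-logit head: sigmoid.
    return 1 / (1 + Math.exp(-logits[0]));
  }
  if (logits.length === 2) {
    // Binary head: softmax, taking index 1 as the "relevant" class (assumption).
    const max = Math.max(logits[0], logits[1]); // subtract the max for numerical stability
    const e0 = Math.exp(logits[0] - max);
    const e1 = Math.exp(logits[1] - max);
    return e1 / (e0 + e1);
  }
  throw new Error(`unsupported head with ${logits.length} logits`);
}

// Lazy singleton with a one-shot load policy:
// - the model is loaded on the first rerank call, not at import time;
// - concurrent first calls share one in-flight load;
// - after a failed load, every later call returns null without retrying.
function createReranker(loadModel: LoadModel) {
  let loading: Promise<ScoreFn | null> | null = null;

  function getModel(): Promise<ScoreFn | null> {
    if (loading === null) {
      // Start the load once; a rejection is cached as null so it is never retried.
      loading = loadModel().catch(() => null);
    }
    return loading;
  }

  // Returns one relevance score per doc, or null when the model is unavailable.
  return async function rerankScores(query: string, docs: string[]): Promise<number[] | null> {
    const score = await getModel();
    if (score === null) return null; // degrade: the caller keeps the hybrid ranking
    const logits = await score(query, docs);
    return logits.map(logitsToRelevance);
  };
}

// Usage: a loader that always fails, standing in for a guaranteed-bad model name.
let loadAttempts = 0;
const rerankWithBadModel = createReranker(async () => {
  loadAttempts += 1;
  throw new Error("model not found");
});

async function demo(): Promise<void> {
  const first = await rerankWithBadModel("flaky test", ["fix: retry on timeout"]);
  const second = await rerankWithBadModel("flaky test", ["fix: retry on timeout"]);
  console.log(first, second, loadAttempts); // null null 1
}
void demo();
```

The failure is cached as a resolved `null` rather than a rejected promise, so hot-path callers never need a `try`/`catch` around reranking; they only check for `null`.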

Reproduce

```bash
git clone https://github.com/ruvnet/ruflo && cd ruflo
npm install && ( cd v3/@claude-flow/cli && npx tsc )

# Unit tests (no network) — 44 total
( cd v3/@claude-flow/cli && npx vitest run __tests__/cross-encoder-rerank.test.ts __tests__/hybrid-retrieval.test.ts __tests__/pretrain-from-github.test.ts )

# Live A/B (cross-encoder downloads ~30MB on first run)
cd v3/@claude-flow/cli
node scripts/pretrain-from-github.mjs
node scripts/benchmark-pretrained-retrieval.mjs              # 3.10.19 default → 80% top-1
RERANK=1 node scripts/benchmark-pretrained-retrieval.mjs     # 3.10.20 + rerank → 90%/100%
HYBRID=0 node scripts/benchmark-pretrained-retrieval.mjs     # cosine baseline → 0%
```

Honest limits

  • N=385, 10 queries, regex-relevance proxy. Direction (0% → 90% top-1) is
    robust to noise; absolute numbers could shift on a different corpus. A
    labelled held-out evaluation is the right next gauge.
  • The ~30 MB cross-encoder model downloads on first run; subsequent runs
    hit the local cache.
  • The remaining 10% top-1 gap is one query that the regex can't see clearly
    — may be genuinely ambiguous or a regex-proxy artefact.

What's next

  • Labelled held-out corpus for tighter relevance confidence intervals
  • Larger cross-encoder (ms-marco-MiniLM-L-12-v2) if quality matters more
    than latency
  • Learned distiller (#2241 round-D) — still tracked

Install

npx [email protected]    # latest / alpha / v3alpha all aligned

Full ADR: v3/docs/adr/ADR-080-cross-encoder-reranker.md


About claude-flow

Deploy multi-agent swarms with coordinated workflows.
