claude-flow

v3.10.28 Feature

This release adds 3 notable features for engineering teams evaluating rollout.

✓ No known CVEs patched in this version

Topics

agentic-ai agentic-framework agentic-rag agentic-workflow agents ai-agents ai-assistant ai-coding ai-skills autonomous-agents claude-code codex mcp-server multi-agent multi-agent-systems npm skills swarm swarm-intelligence typescript

Summary

AI summary

Broad release: adds a Lucene-style BM25 implementation and cross-encoder rerank to the BEIR runner. The changelog covers What's in the box, Honest limits, and What's next.

Changes in this release

Feature Medium

Adds real Lucene-style BM25 implementation (Porter stemmer, stopwords, length norm).


Source: llm_adapter@2026-05-31

Confidence: high

Feature Medium

Adds cross-encoder rerank integration into BEIR runner via `USE_LUCENE_BM25=1` and `RERANK=1` flags.


Source: llm_adapter@2026-05-31

Confidence: high

Feature Medium

Adds standalone runner for Lucene BM25 + RRF ablation (`scripts/run-beir-lucene-bm25.mjs`).


Source: llm_adapter@2026-05-31

Confidence: high

Performance Medium

Improves nDCG@10 on NFCorpus from 0.328 (BM25 alone) to 0.358 with Lucene BM25 + RRF + CE rerank.


Source: llm_adapter@2026-05-31

Confidence: high

Performance Medium

Improves nDCG@10 on SciFact from 0.681 (BM25 alone) to 0.683 with Lucene BM25 + RRF + CE rerank.


Source: llm_adapter@2026-05-31

Confidence: high

Performance Low

Adds ~4.6 seconds of CPU latency per query when `RERANK=1` is enabled.

Source: llm_adapter@2026-05-31

Confidence: high

Bugfix Medium

Resolves the weakness diagnosed in ADR-087 by implementing a Lucene-style BM25 that matches the published baseline (within ±0.003).

Source: llm_adapter@2026-05-31

Confidence: high

Full changelog

What ships

The pipeline that works. ADR-087 diagnosed that "our multi-field BM25 is too weak for RRF"; this release fixes that by shipping a real Lucene-style BM25 (Porter 1980 stemmer + Lucene stopwords + length norm, 12/12 published Porter tests passing) and wiring the cross-encoder rerank into the BEIR runner.

The acceptance test PASSES

| System | Params | NFCorpus nDCG@10 | SciFact nDCG@10 | Mean | Beats published BM25 on both? |
|---|---:|---:|---:|---:|---|
| BGE-large-v1.5 (published) | 335M | 0.380 | 0.722 | 0.551 | yes |
| SPLADE++ (published) | 110M | 0.347 | 0.704 | 0.526 | yes |
| ruflo Lucene RRF + CE rerank (us) | 110M | 0.358 | 0.683 | 0.521 | YES (+0.033 / +0.004) |
| Lucene BM25 alone (us, matches published) | — | 0.328 | 0.681 | 0.505 | tied |
| BM25 (published Lucene) | — | 0.325 | 0.679 | 0.502 | — |
| ruflo dense alone (BGE-base) | 110M | 0.352 | 0.626 | 0.489 | no |

Rank 3 of 13 entries on the 2-dataset mean, using a 110M-parameter base model versus BGE-large's 335M and GTR-XL's 1.2B.

Per-dataset:

  • NFCorpus 0.358, rank 2/11 (only behind BGE-large 0.380)
  • SciFact 0.683, rank 3/11 (behind SPLADE++ and BGE-large only)
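For readers unfamiliar with the metric behind these ranks, the following is a minimal sketch of nDCG@10 for a single query. It assumes the linear-gain form (gain divided by log2 of rank + 1) and graded relevance labels. The names `Qrels`, `dcgAtK`, and `ndcgAtK` are illustrative and are not taken from the repository; the reported numbers come from the BEIR runner, not from this code.

```typescript
// Graded relevance judgments for one query: docId -> relevance (0 = not relevant).
type Qrels = Record<string, number>;

// Discounted cumulative gain over the first k gains, using gain / log2(rank + 1)
// with 1-based ranks (index 0 is rank 1, so the divisor is log2(index + 2)).
function dcgAtK(gains: number[], k: number): number {
  return gains
    .slice(0, k)
    .reduce((sum, gain, index) => sum + gain / Math.log2(index + 2), 0);
}

// nDCG@k: DCG of the system ranking divided by the DCG of the ideal ranking.
function ndcgAtK(ranking: string[], qrels: Qrels, k = 10): number {
  const gains = ranking.map((docId) => qrels[docId] ?? 0);
  const idealGains = Object.keys(qrels)
    .map((docId) => qrels[docId])
    .sort((a, b) => b - a);
  const idealDcg = dcgAtK(idealGains, k);
  return idealDcg === 0 ? 0 : dcgAtK(gains, k) / idealDcg;
}

// Hypothetical judgments: d1 is highly relevant, d2 partially, d3 not at all.
const qrels: Qrels = { d1: 2, d2: 1, d3: 0 };
const perfectOrder = ndcgAtK(["d1", "d2", "d3"], qrels); // ideal ordering
const swappedOrder = ndcgAtK(["d2", "d1", "d3"], qrels); // top two swapped
```

A dataset-level score such as 0.358 is the mean of this per-query value over all test queries.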

The diagnostic that earned this

ADR-087 (the previous release) measured RRF degrading results on both datasets and diagnosed the cause as asymmetric input strength: our BM25 scored 0.279 on NFCorpus versus 0.325 for published Lucene, so RRF averaged its noise into the top-K. This release confirms the diagnosis: with a real Lucene-style BM25 that matches the published baseline within ±0.003, RRF + cross-encoder rerank produces real wins on both datasets.
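To see why a weak input drags RRF down, it helps to look at how reciprocal rank fusion combines lists. The sketch below assumes the standard formulation, where each list contributes 1 / (k + rank) per document with k = 60. The function name and the sample rankings are hypothetical, not the runner's actual code.

```typescript
// Reciprocal rank fusion: each ranked list contributes 1 / (k + rank) for every
// document it contains, with 1-based ranks. A document missing from a list
// simply gets no contribution from that list.
function reciprocalRankFusion(rankings: string[][], k = 60): [string, number][] {
  const fused = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, index) => {
      fused.set(docId, (fused.get(docId) ?? 0) + 1 / (k + index + 1));
    });
  }
  // Highest fused score first.
  return Array.from(fused.entries()).sort((a, b) => b[1] - a[1]);
}

const bm25Ranking = ["d3", "d1", "d2"];  // hypothetical lexical results
const denseRanking = ["d1", "d4", "d3"]; // hypothetical dense results
const fusedRanking = reciprocalRankFusion([bm25Ranking, denseRanking]);
const fusedOrder = fusedRanking.map(([docId]) => docId); // d1, d3, d4, d2
```

Each list votes with equal weight regardless of its quality, so a noisy ranker can push poor documents into the fused top-K just as easily as a strong one can push good ones.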

The user's reframe — "don't try to invent your way up BEIR; stack proven primitives, measure each lift, then decide where you add unique value" — is exactly what this release executed.

Subtle finding from the full ablation

On NFCorpus, Lucene RRF (k=60) alone scores 0.360, effectively tied with Lucene RRF + CE rerank at 0.358: the cross-encoder adds no value when the underlying RRF is already strong. The CE's value shows on SciFact (RRF 0.639 → RRF+CE 0.683, a +0.044 lift). The pattern is that rerank helps most when the candidate pool has high recall but low top-K precision, which matches the published literature.
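The rerank stage can be sketched as rescoring only the head of the fused list. This is an illustration under stated assumptions: `PairScorer`, `Candidate`, `rerankTopN`, and the toy `overlapScorer` are hypothetical names, and a real cross-encoder would replace the callback with a model call over each (query, document) pair.

```typescript
// Hypothetical scorer signature: higher means more relevant. A real
// cross-encoder would run a model here; this sketch only needs a callback.
type PairScorer = (query: string, docText: string) => number;

interface Candidate {
  docId: string;
  text: string;
}

// Rescore the first `topN` fused candidates and reorder them by the new score.
// Anything past topN keeps its fused order behind the reranked block.
function rerankTopN(
  query: string,
  candidates: Candidate[],
  score: PairScorer,
  topN = 100,
): Candidate[] {
  const head = candidates
    .slice(0, topN)
    .map((candidate) => ({ candidate, score: score(query, candidate.text) }));
  head.sort((a, b) => b.score - a.score);
  return head.map((entry) => entry.candidate).concat(candidates.slice(topN));
}

// Toy stand-in scorer: counts query terms that appear in the document text.
const overlapScorer: PairScorer = (query, docText) => {
  const words = docText.toLowerCase().split(/\W+/);
  return query
    .toLowerCase()
    .split(/\W+/)
    .filter((term) => term.length > 0 && words.indexOf(term) !== -1).length;
};

const fusedCandidates: Candidate[] = [
  { docId: "a", text: "Vitamin D and bone health" },
  { docId: "b", text: "Statins reduce cholesterol in adults" },
  { docId: "c", text: "Cholesterol and statins: a review of trials" },
];
// With topN = 2, only "a" and "b" are rescored; "c" stays where fusion put it.
const reranked = rerankTopN("statins cholesterol", fusedCandidates, overlapScorer, 2);
const rerankedIds = reranked.map((candidate) => candidate.docId); // b, a, c
```

The example also shows the limit of the approach: a reranker can only reorder candidates that made it into the pool, which is why recall of the first stage matters.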

What's in the box

  1. src/memory/lucene-bm25.ts — Porter 1980 + Lucene 8.x English stopwords (~120 tokens) + single-field BM25 (k1=1.2, b=0.75). No external deps. 12/12 published Porter tests passing.
  2. scripts/run-beir-hybrid.mjs gains USE_LUCENE_BM25=1 + RERANK=1 flags.
  3. scripts/run-beir-lucene-bm25.mjs — standalone runner for the Lucene BM25 + RRF ablation.
  4. ADR-088 — full ablation matrix + diagnosis confirmation + honest limits.
  5. BEIR-MATRIX.md — updated 2-dataset mean leaderboard (13 entries, ruflo at rank 3).
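As a rough picture of what a single-field BM25 with k1=1.2 and b=0.75 computes, here is a minimal sketch of the textbook Okapi formula. It is not the code in `src/memory/lucene-bm25.ts`: it assumes tokens are already lowercased, stemmed, and stopword-filtered, it keeps the (k1 + 1) numerator from the textbook form, and `bm25Scores` and the sample corpus are made up for illustration.

```typescript
const K1 = 1.2; // term-frequency saturation, the value listed for lucene-bm25.ts
const B = 0.75; // strength of document-length normalization

// Score every document (an array of already-normalized tokens) against a query.
function bm25Scores(queryTerms: string[], docs: string[][]): number[] {
  const docCount = docs.length;
  const avgLen = docs.reduce((sum, doc) => sum + doc.length, 0) / docCount;
  return docs.map((doc) => {
    let score = 0;
    for (const term of queryTerms) {
      const tf = doc.filter((token) => token === term).length;
      if (tf === 0) continue;
      const df = docs.filter((d) => d.indexOf(term) !== -1).length;
      // Smoothed IDF that stays non-negative even for very common terms.
      const idf = Math.log(1 + (docCount - df + 0.5) / (df + 0.5));
      // Length norm: longer-than-average documents need more matches to score as high.
      const lengthNorm = K1 * (1 - B + B * (doc.length / avgLen));
      score += (idf * (tf * (K1 + 1))) / (tf + lengthNorm);
    }
    return score;
  });
}

// Hypothetical pre-tokenized corpus.
const corpus = [
  ["statin", "lower", "cholesterol"],
  ["statin", "trial", "report", "side", "effect", "adult", "patient"],
  ["vitamin", "d", "bone", "health"],
];
const scores = bm25Scores(["statin", "cholesterol"], corpus);
```

The first document matches both query terms and is shorter than average, so it scores highest; the second matches one term in a longer document; the third matches nothing and scores zero.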

Reproduce

git clone https://github.com/ruvnet/ruflo && cd ruflo
npm install && ( cd v3/@claude-flow/cli && npx tsc )

# Re-use existing caches from ADR-085 (or re-ingest with run-beir-bge.mjs)
cd /tmp/beir-nfcorpus
USE_LUCENE_BM25=1 RERANK=1 node /path/to/v3/@claude-flow/cli/scripts/run-beir-hybrid.mjs
# → nDCG@10 0.358, rank 2/11

cd /tmp/beir-scifact
USE_LUCENE_BM25=1 RERANK=1 BEIR_DATA_DIR=/tmp/beir-scifact/scifact node /path/to/v3/@claude-flow/cli/scripts/run-beir-hybrid.mjs
# → nDCG@10 0.683, rank 3/11

Honest limits

  • Two BEIR datasets measured. The 0.521 mean is suggestive, not BEIR-average.
  • Zero-shot — no fine-tuning. NFCorpus train split (110K pairs) could lift another ~0.02-0.05.
  • Lucene BM25 is a re-implementation (matches published within ±0.003, not bit-identical).
  • Rerank adds ~4.6s/query CPU latency at top-100; production callers should budget per latency tolerance.
  • Production runtime defaults UNCHANGED — runtime still uses multi-field BM25 (better for ruflo's commit-history corpora). Lucene BM25 is BEIR-benchmark-scoped.
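The latency figure above can be turned into a rough budgeting rule. The sketch below assumes rerank cost is linear in the number of candidates scored, which the release notes do not state, so treat the output as a starting estimate to verify by measurement. `maxRerankDepth` is a hypothetical helper.

```typescript
// Figures from the release notes: ~4.6 s per query when reranking the top 100.
const MEASURED_MS_PER_QUERY = 4600;
const MEASURED_DEPTH = 100;

// Assumption: cost scales linearly with the number of (query, document) pairs.
const msPerCandidate = MEASURED_MS_PER_QUERY / MEASURED_DEPTH; // 46 ms

// Largest rerank depth that fits inside a per-query CPU latency budget.
function maxRerankDepth(budgetMs: number): number {
  return Math.max(0, Math.floor(budgetMs / msPerCandidate));
}

const depthForOneSecond = maxRerankDepth(1000); // 21 candidates
const depthForFullRun = maxRerankDepth(4600);   // 100 candidates
```

Under this assumption a 1-second budget allows reranking roughly the top 21 candidates; whether that depth still captures the quality gain would need its own ablation.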

What's next (already tracked)

  • BGE-large swap — drop-in BGE_MODEL=Xenova/bge-large-en-v1.5. Likely lifts further. ~3× embed latency.
  • 3-5 more BEIR datasets via Tailscale GPU: TREC-COVID, FiQA, ArguAna, HotpotQA, NQ. Would establish a real BEIR-mini-average.
  • Fine-tune BGE-base on NFCorpus train (GPU job, +0.02-0.05 expected).
  • ruvector BGE bundling (ruvnet/ruvector#524) — kills the silent-fallback bug at source.

Install

npx [email protected]    # latest / alpha / v3alpha all aligned

Full ADR: v3/docs/adr/ADR-088-lucene-bm25-and-rerank.md


About claude-flow

Deploy multi-agent swarms with coordinated workflows.
