claude-flow

v3.10.23 Feature

This release adds 3 notable features for engineering teams evaluating rollout.

Published 1mo AI Agents & Assistants

✓ No known CVEs patched

Topics

agentic-ai agentic-framework agentic-workflow agents ai-agents ai-assistant

ai-coding ai-skills autonomous-agents claude-code codex harness mcp-server multi-agent multi-agent-systems npm skills swarm swarm-intelligence typescript

Summary

AI summary

Retunes retrieval defaults after a joint rerank grid search. The changes land in code and in v3/docs/adr/ADR-083-joint-rerank-grid.md, with results measured on the labelled benchmark.

Changes in this release

| Type | Severity | Summary | CVE |
|---|---|---|---|
| Feature | Medium | `subjectWeight` default now conditional on `useRerank` flag (3.0 when reranking, 2.0 otherwise). | — |
| Feature | Medium | Updated default hybrid and cross-encoder weights to hw=0.7, cw=0.3. | — |
| Feature | Low | Extended `scripts/grid-search-retrieval.mjs` with joint rerank sweep (28 configs). | — |
| Refactor | Low | Updated schema descriptions to reflect conditional defaults. | — |

Source for all entries: llm_adapter@2026-05-30. Confidence: high.

Full changelog

What ships

Joint rerank re-grid: the rerank path's hybrid sub-params (α and sw, the subjectWeight) had been
tuned against the old α=0.6/sw=3.0 baseline. Because ADR-082 changed α and sw underneath it,
a joint re-grid was the next step to raise the ceiling. It paid off: rerank nDCG@3 rose from 0.900 to 0.963.

The key finding

The rerank path wants different hybrid sub-params than the non-rerank path:

| Path | Best α | Best sw | Best hw/cw | nDCG@3 |
|---|---:|---:|---|---:|
| Non-rerank (hybrid only) | 0.5 | 2.0 | — | 0.963 |
| Rerank | 0.5 | 3.0 | hw=0.7 cw=0.3 | 0.963 |

When the cross-encoder is doing semantic understanding downstream, the hybrid
stage can be more keyword-focused (higher subjectWeight). When hybrid is
the final stage, lower subjectWeight gives body tokens room to contribute.

Implementation: the subjectWeight default is now conditional on the rerank flag
(3.0 when reranking, 2.0 otherwise). An explicitly passed subjectWeight overrides either default.
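A minimal sketch of how such a conditional default could be resolved. The `RetrievalOptions` shape and the `resolveSubjectWeight` helper are hypothetical names for illustration; only `useRerank`, `subjectWeight`, and the 3.0/2.0 values come from these notes, and the actual code in src/mcp-tools/neural-tools.ts may be structured differently.

```typescript
// Hypothetical option shape: only `useRerank` and `subjectWeight` are named in the notes.
interface RetrievalOptions {
  useRerank?: boolean;
  subjectWeight?: number;
}

// Grid winners reported in this release.
const SUBJECT_WEIGHT_RERANK = 3.0;
const SUBJECT_WEIGHT_HYBRID = 2.0;

// Resolve the effective subjectWeight: an explicit value always wins,
// otherwise the default depends on whether reranking is enabled.
function resolveSubjectWeight(opts: RetrievalOptions): number {
  if (opts.subjectWeight !== undefined) {
    return opts.subjectWeight;
  }
  return opts.useRerank ? SUBJECT_WEIGHT_RERANK : SUBJECT_WEIGHT_HYBRID;
}

console.log(resolveSubjectWeight({ useRerank: true }));                     // 3
console.log(resolveSubjectWeight({}));                                      // 2
console.log(resolveSubjectWeight({ useRerank: true, subjectWeight: 1.5 })); // 1.5
```

Checking for `undefined` rather than truthiness keeps an explicit `subjectWeight: 0` from being silently replaced by the default.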

The win

| Metric (rerank path, labelled) | 3.10.22 | 3.10.23 | Δ |
|---|---:|---:|---:|
| Label top-1 | 90% | 90% | tied |
| Label top-3 | 90% | 100% | +10pp |
| Label MRR@3 | 0.925 | 0.950 | +0.025 |
| Label precision@3 | 0.700 | 0.700 | tied |
| Label nDCG@3 | 0.900 | 0.963 | +0.063 (+7%) |
| Label nDCG@5 | 0.904 | 0.944 | +0.040 |
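For readers less familiar with these metrics, here is a sketch of how top-k hit rate, MRR@k, and nDCG@k can be computed when each query has one labelled target. It assumes binary gain and the standard log2 discount; the notes do not state which formulas the benchmark script uses, and the ten-query rank list is hypothetical. That hypothetical run (nine hits at rank 1, one at rank 2) happens to reproduce the 3.10.23 top-1, top-3, MRR@3, and nDCG@3 figures, which is consistent with, but does not confirm, this being the scoring used.

```typescript
// 1-based rank of the labelled result for a query, or null when it is absent.
type Rank = number | null;

// Reciprocal rank truncated at k: 1/rank if the hit is within the top k, else 0.
function reciprocalRankAtK(rank: Rank, k: number): number {
  return rank !== null && rank <= k ? 1 / rank : 0;
}

// nDCG@k for a single relevant item with binary gain:
// DCG = 1 / log2(rank + 1), and the ideal DCG is 1 (hit at rank 1).
function ndcgAtK(rank: Rank, k: number): number {
  return rank !== null && rank <= k ? 1 / Math.log2(rank + 1) : 0;
}

// Fraction-style indicator: 1 when the hit is within the top k, else 0.
function hitAtK(rank: Rank, k: number): number {
  return rank !== null && rank <= k ? 1 : 0;
}

function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Hypothetical run: ten queries, nine hits at rank 1 and one at rank 2.
const ranks: Rank[] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 2];

const top1 = mean(ranks.map((r) => hitAtK(r, 1)));
const top3 = mean(ranks.map((r) => hitAtK(r, 3)));
const mrr3 = mean(ranks.map((r) => reciprocalRankAtK(r, 3)));
const ndcg3 = mean(ranks.map((r) => ndcgAtK(r, 3)));

console.log(top1.toFixed(2), top3.toFixed(2), mrr3.toFixed(3), ndcg3.toFixed(3));
// prints: 0.90 1.00 0.950 0.963
```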

Both paths now at corpus ceiling (nDCG@3 = 0.963)

The choice between them is now purely cost vs richness:

| Path | Latency | Top-3 precision | Use when |
|---|---:|---:|---|
| Hybrid | 39 ms | 0.533 | hot paths, throughput-bound |
| Rerank | 1000 ms | 0.700 | richness-first, latency-tolerant |
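One way a caller might encode this trade-off is to pick the path from a latency budget. This is a sketch only: `choosePath` and `RetrievalPath` are hypothetical names, and the latencies are the figures from the table above, which will vary by machine and corpus.

```typescript
type RetrievalPath = "hybrid" | "rerank";

// Latencies reported in the table above, in milliseconds.
const PATH_LATENCY_MS: Record<RetrievalPath, number> = {
  hybrid: 39,
  rerank: 1000,
};

// Hypothetical helper: use rerank only when the budget covers its reported latency,
// otherwise fall back to the cheaper hybrid path.
function choosePath(latencyBudgetMs: number): RetrievalPath {
  return latencyBudgetMs >= PATH_LATENCY_MS.rerank ? "rerank" : "hybrid";
}

console.log(choosePath(100));  // hybrid
console.log(choosePath(2000)); // rerank
```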

Cumulative SOTA push since cosine baseline (3.10.17 → 3.10.23)

| Metric (labelled) | 3.10.17 | 3.10.19 | 3.10.20 | 3.10.22 | 3.10.23 |
|---|---:|---:|---:|---:|---:|
| Hybrid nDCG@3 | 0.000 | 0.900 | 0.900 | 0.963 | 0.963 |
| Rerank nDCG@3 | — | — | 0.913 | 0.900 | 0.963 |
| Hybrid top-3 | 0% | 90% | 90% | 100% | 100% |
| Rerank top-3 | — | — | 100% | 90% | 100% |
| Rerank precision@3 | — | — | 0.667 | 0.700 | 0.700 |

What changed in code

subjectWeight default is now conditional on useRerank in src/mcp-tools/neural-tools.ts (3.0 if reranking, 2.0 otherwise).
hybridWeight / ceWeight defaults updated to grid winners: 0.5/0.5 → 0.7/0.3.
scripts/grid-search-retrieval.mjs extended with joint rerank sweep (28 configs across hw/cw × α × sw).
Schema descriptions updated to reflect the conditional defaults.
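To illustrate what the hybridWeight / ceWeight change can mean for ordering, the sketch below assumes the final rerank score is a linear blend of a hybrid score and a cross-encoder score, both normalised to [0, 1]. That blend formula, the `ScoredCandidate` shape, and `blendAndRank` are assumptions for illustration; the notes give only the weight values, not how they are combined.

```typescript
interface ScoredCandidate {
  id: string;
  hybridScore: number; // assumed normalised to [0, 1]
  ceScore: number;     // assumed normalised to [0, 1]
}

// Rank candidates by an assumed linear blend of the two scores.
// Defaults are the new grid winners (0.7 / 0.3).
function blendAndRank(
  candidates: ScoredCandidate[],
  hybridWeight: number = 0.7,
  ceWeight: number = 0.3
): ScoredCandidate[] {
  const blended = (c: ScoredCandidate): number =>
    hybridWeight * c.hybridScore + ceWeight * c.ceScore;
  // Copy before sorting so the caller's array is left untouched.
  return candidates.slice().sort((a, b) => blended(b) - blended(a));
}

// Hypothetical scores: "a" is a strong keyword match, "b" a strong semantic match.
const candidates: ScoredCandidate[] = [
  { id: "a", hybridScore: 0.9, ceScore: 0.2 },
  { id: "b", hybridScore: 0.5, ceScore: 0.9 },
];

console.log(blendAndRank(candidates).map((c) => c.id));           // new defaults: a first
console.log(blendAndRank(candidates, 0.5, 0.5).map((c) => c.id)); // old defaults: b first
```

Under this assumed blend, moving from 0.5/0.5 to 0.7/0.3 lets the hybrid stage dominate unless the cross-encoder disagrees strongly.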

Pending for next iteration

Cross-repo generalisation test — all numbers in ADRs 077-083 are on the
ruflo corpus. The real SOTA test is "does this hold up on a different repo's
history?" Pretrain on agentdb / agentic-flow, run a similar labelled bench,
see if nDCG@3 stays near 0.96. Tracked for 3.10.24 (or its own ADR-084).

Reproduce

git clone https://github.com/ruvnet/ruflo && cd ruflo
npm install && ( cd v3/@claude-flow/cli && npx tsc )
node v3/@claude-flow/cli/scripts/pretrain-from-github.mjs

# Joint grid (~25 min)
cd v3/@claude-flow/cli && node scripts/grid-search-retrieval.mjs

# Verify both paths at corpus ceiling
BENCH_NO_WRITE=1 node scripts/benchmark-pretrained-retrieval.mjs            # hybrid → nDCG@3 0.963
RERANK=1 BENCH_NO_WRITE=1 node scripts/benchmark-pretrained-retrieval.mjs   # rerank → nDCG@3 0.963 (was 0.900)
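For orientation, a joint sweep of this kind can be sketched as a Cartesian product over the three axes (hw/cw pairs × α × sw) followed by an argmax. The axis values below are placeholders, not the values in scripts/grid-search-retrieval.mjs, so the config count here will not match the script's 28. The objective is a toy stand-in that peaks at the reported winner so that the example is deterministic; a real run would score each config with the labelled benchmark.

```typescript
interface GridConfig {
  hybridWeight: number;
  ceWeight: number;
  alpha: number;
  subjectWeight: number;
}

// Enumerate every combination of (hw, cw) pair, alpha, and subjectWeight.
function enumerateGrid(
  weightPairs: Array<[number, number]>,
  alphas: number[],
  subjectWeights: number[]
): GridConfig[] {
  const configs: GridConfig[] = [];
  for (const [hybridWeight, ceWeight] of weightPairs) {
    for (const alpha of alphas) {
      for (const subjectWeight of subjectWeights) {
        configs.push({ hybridWeight, ceWeight, alpha, subjectWeight });
      }
    }
  }
  return configs;
}

// Return the config with the highest score under a caller-supplied objective.
function pickBest(configs: GridConfig[], evaluate: (c: GridConfig) => number): GridConfig {
  if (configs.length === 0) {
    throw new Error("grid is empty");
  }
  let best = configs[0];
  let bestScore = evaluate(best);
  for (const config of configs.slice(1)) {
    const score = evaluate(config);
    if (score > bestScore) {
      best = config;
      bestScore = score;
    }
  }
  return best;
}

// Placeholder axes (assumptions, not the script's actual grid).
const grid = enumerateGrid(
  [[0.5, 0.5], [0.7, 0.3], [0.9, 0.1]],
  [0.4, 0.5, 0.6],
  [2.0, 3.0]
);

// Toy objective: closer to the reported rerank winner scores higher.
const toyObjective = (c: GridConfig): number =>
  -Math.abs(c.hybridWeight - 0.7) - Math.abs(c.alpha - 0.5) - Math.abs(c.subjectWeight - 3.0);

console.log(grid.length);                  // 18
console.log(pickBest(grid, toyObjective)); // hybridWeight 0.7, ceWeight 0.3, alpha 0.5, subjectWeight 3
```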

Install

npx claude-flow@3.10.23    # latest / alpha / v3alpha all aligned

Full ADR: v3/docs/adr/ADR-083-joint-rerank-grid.md

About claude-flow

Deploy multi-agent swarms with coordinated workflows.
