claude-flow

v3.10.23 Feature

This release adds 3 notable features for engineering teams evaluating rollout.

✓ No known CVEs patched

Topics

agentic-ai agentic-framework agentic-rag agentic-workflow agents ai-agents
ai-assistant ai-coding ai-skills autonomous-agents claude-code codex mcp-server multi-agent multi-agent-systems npm skills swarm swarm-intelligence typescript

Summary

AI summary

Updates retrieval defaults in code, the labelled benchmark results, and v3/docs/adr/ADR-083-joint-rerank-grid.md in a mixed release.

Changes in this release

Feature Medium

`subjectWeight` default now conditional on `useRerank` flag (3.0 when reranking, 2.0 otherwise).

Source: llm_adapter@2026-05-30

Confidence: high

Feature Medium

Updated default hybrid and cross‑encoder weights to hw=0.7, cw=0.3.

Source: llm_adapter@2026-05-30

Confidence: high

Feature Low

Extended `scripts/grid-search-retrieval.mjs` with joint rerank sweep (28 configs).

Source: llm_adapter@2026-05-30

Confidence: high

Refactor Low

Updated schema descriptions to reflect conditional defaults.

Source: llm_adapter@2026-05-30

Confidence: high

Full changelog

What ships

Joint rerank re-grid: the rerank path's hybrid sub-params (α and sw, i.e. subjectWeight) had been
tuned against the old α=0.6/sw=3.0 baseline. With ADR-082 changing α/sw underneath it,
a joint re-grid was the next ceiling-raiser. It paid off: rerank nDCG@3 rose from 0.900 to 0.963.

The key finding

The rerank path wants different hybrid sub-params than the non-rerank path:

| Path | Best α | Best sw | Best hw/cw | nDCG@3 |
|---|---:|---:|---|---:|
| Non-rerank (hybrid only) | 0.5 | 2.0 | — | 0.963 |
| Rerank | 0.5 | 3.0 | hw=0.7 cw=0.3 | 0.963 |

When the cross-encoder is doing semantic understanding downstream, the hybrid
stage can be more keyword-focused (higher subjectWeight). When hybrid is
the final stage, lower subjectWeight gives body tokens room to contribute.

Implementation: the subjectWeight default is now conditional on the rerank flag
(3.0 when reranking, 2.0 otherwise). An explicitly passed subjectWeight still overrides the default.
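
The conditional default can be sketched roughly as follows. This is a minimal illustration, not the code in src/mcp-tools/neural-tools.ts: the `RetrievalOptions` shape and the `resolveRetrievalOptions` helper are hypothetical, and `useRerank` defaulting to false is an assumption. Only the parameter names and the default values come from these notes.

```typescript
// Hypothetical option shape. Only the names useRerank, subjectWeight,
// hybridWeight and ceWeight appear in the release notes.
interface RetrievalOptions {
  useRerank?: boolean;
  subjectWeight?: number;
  hybridWeight?: number;
  ceWeight?: number;
}

interface ResolvedOptions {
  useRerank: boolean;
  subjectWeight: number;
  hybridWeight: number;
  ceWeight: number;
}

// Hypothetical helper: fills in defaults for anything the caller left out.
function resolveRetrievalOptions(opts: RetrievalOptions = {}): ResolvedOptions {
  // Assumption: reranking is off unless requested.
  const useRerank = opts.useRerank ?? false;
  return {
    useRerank,
    // An explicit value wins; otherwise the default depends on the rerank flag.
    subjectWeight: opts.subjectWeight ?? (useRerank ? 3.0 : 2.0),
    // Grid winners from this release (previously 0.5 / 0.5).
    hybridWeight: opts.hybridWeight ?? 0.7,
    ceWeight: opts.ceWeight ?? 0.3,
  };
}
```

Using `??` rather than `||` keeps an explicit `subjectWeight: 0` from being silently replaced by the default.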

The win

| Metric (rerank path, labelled) | 3.10.22 | 3.10.23 | Δ |
|---|---:|---:|---:|
| Label top-1 | 90% | 90% | tied |
| Label top-3 | 90% | 100% | +10pp |
| Label MRR@3 | 0.925 | 0.950 | +0.025 |
| Label precision@3 | 0.700 | 0.700 | tied |
| Label nDCG@3 | 0.900 | 0.963 | +0.063 (+7%) |
| Label nDCG@5 | 0.904 | 0.944 | +0.040 |
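
For readers unfamiliar with the metrics in this table, here is a minimal sketch of how nDCG@k, MRR@k and precision@k are commonly computed for a single query. It assumes linear gain and a log2 rank discount. The benchmark script may use a different gain function or labelling scheme, so treat the function names and formulas as illustrative rather than as the project's implementation.

```typescript
// rels[i] is the graded relevance of the result at rank i + 1 (0 = irrelevant).
function dcgAtK(rels: number[], k: number): number {
  // Linear gain with a log2(rank + 1) discount (assumed variant).
  return rels.slice(0, k).reduce((sum, rel, i) => sum + rel / Math.log2(i + 2), 0);
}

// allRels lists the relevance of every labelled document for the query,
// so the ideal ranking can include relevant items that were not retrieved.
function ndcgAtK(rels: number[], k: number, allRels: number[] = rels): number {
  const ideal = [...allRels].sort((a, b) => b - a);
  const idealDcg = dcgAtK(ideal, k);
  return idealDcg === 0 ? 0 : dcgAtK(rels, k) / idealDcg;
}

// Reciprocal rank of the first relevant result within the top k (0 if none).
function reciprocalRankAtK(rels: number[], k: number): number {
  const idx = rels.slice(0, k).findIndex((rel) => rel > 0);
  return idx === -1 ? 0 : 1 / (idx + 1);
}

// Fraction of the top k results that are relevant.
function precisionAtK(rels: number[], k: number): number {
  return rels.slice(0, k).filter((rel) => rel > 0).length / k;
}
```

Averaging each per-query value over the labelled queries gives corpus-level figures of the kind reported above (MRR is the mean of the reciprocal ranks).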

Both paths now at corpus ceiling (nDCG@3 = 0.963)

The choice between them is now purely cost vs richness:

| Path | Latency | Top-3 precision | Use when |
|---|---:|---:|---|
| Hybrid | 39 ms | 0.533 | hot paths, throughput-bound |
| Rerank | 1000 ms | 0.700 | richness-first, latency-tolerant |
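
One way a caller might act on this trade-off is to pick the path from a latency budget. The sketch below is hypothetical: `choosePath` is not part of claude-flow, and the latency figures are simply the ones from the table above, which will differ on other hardware and corpora.

```typescript
type RetrievalPath = "hybrid" | "rerank";

// Latencies taken from the table above; rough figures, not guarantees.
const PATH_LATENCY_MS: Record<RetrievalPath, number> = {
  hybrid: 39,
  rerank: 1000,
};

// Hypothetical helper: use rerank only when the budget can absorb its latency.
function choosePath(latencyBudgetMs: number): RetrievalPath {
  return latencyBudgetMs >= PATH_LATENCY_MS.rerank ? "rerank" : "hybrid";
}
```

For example, `choosePath(50)` selects the hybrid path and `choosePath(2000)` selects rerank.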

Cumulative SOTA push since cosine baseline (3.10.17 → 3.10.23)

| Metric (labelled) | 3.10.17 | 3.10.19 | 3.10.20 | 3.10.22 | 3.10.23 |
|---|---:|---:|---:|---:|---:|
| Hybrid nDCG@3 | 0.000 | 0.900 | 0.900 | 0.963 | 0.963 |
| Rerank nDCG@3 | — | — | 0.913 | 0.900 | 0.963 |
| Hybrid top-3 | 0% | 90% | 90% | 100% | 100% |
| Rerank top-3 | — | — | 100% | 90% | 100% |
| Rerank precision@3 | — | — | 0.667 | 0.700 | 0.700 |

What changed in code

  1. subjectWeight default is now conditional on useRerank in src/mcp-tools/neural-tools.ts (3.0 if reranking, 2.0 otherwise).
  2. hybridWeight / ceWeight defaults updated to grid winners: 0.5/0.5 → 0.7/0.3.
  3. scripts/grid-search-retrieval.mjs extended with joint rerank sweep (28 configs across hw/cw × α × sw).
  4. Schema descriptions updated to reflect the conditional defaults.
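
To illustrate what a joint sweep over hw/cw × α × sw looks like, here is a minimal sketch. The candidate values in the usage note are hypothetical, picked only so that the product comes to 28, and `ceWeight = 1 - hybridWeight` is an assumption suggested by the 0.5/0.5 and 0.7/0.3 pairs. The actual grid lives in scripts/grid-search-retrieval.mjs and may differ.

```typescript
interface RerankConfig {
  hybridWeight: number;
  ceWeight: number;
  alpha: number;
  subjectWeight: number;
}

// Cartesian product of the three axes.
function buildJointGrid(
  hybridWeights: number[],
  alphas: number[],
  subjectWeights: number[],
): RerankConfig[] {
  const grid: RerankConfig[] = [];
  for (const hybridWeight of hybridWeights) {
    // Assumption: the two weights sum to 1. Round to avoid float noise.
    const ceWeight = Math.round((1 - hybridWeight) * 100) / 100;
    for (const alpha of alphas) {
      for (const subjectWeight of subjectWeights) {
        grid.push({ hybridWeight, ceWeight, alpha, subjectWeight });
      }
    }
  }
  return grid;
}

// Returns the config with the highest score; the first one wins ties.
function pickBest(
  grid: RerankConfig[],
  score: (config: RerankConfig) => number,
): RerankConfig {
  let best: RerankConfig | undefined;
  let bestScore = -Infinity;
  for (const config of grid) {
    const s = score(config);
    if (best === undefined || s > bestScore) {
      best = config;
      bestScore = s;
    }
  }
  if (best === undefined) throw new Error("grid is empty");
  return best;
}
```

With the hypothetical axes `[0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]`, `[0.5, 0.6]` and `[2.0, 3.0]`, `buildJointGrid` yields 7 × 2 × 2 = 28 configs. In a real sweep, `score` would run the labelled benchmark for each config and return a metric such as nDCG@3.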

Pending for next iteration

Cross-repo generalisation test — all numbers in ADRs 077-083 are on the
ruflo corpus. The real SOTA test is "does this hold up on a different repo's
history?" Pretrain on agentdb / agentic-flow, run a similar labelled bench,
see if nDCG@3 stays near 0.96. Tracked for 3.10.24 (or its own ADR-084).

Reproduce

git clone https://github.com/ruvnet/ruflo && cd ruflo
npm install && ( cd v3/@claude-flow/cli && npx tsc )
node v3/@claude-flow/cli/scripts/pretrain-from-github.mjs

# Joint grid (~25 min)
cd v3/@claude-flow/cli && node scripts/grid-search-retrieval.mjs

# Verify both paths at corpus ceiling
BENCH_NO_WRITE=1 node scripts/benchmark-pretrained-retrieval.mjs            # hybrid → nDCG@3 0.963
RERANK=1 BENCH_NO_WRITE=1 node scripts/benchmark-pretrained-retrieval.mjs   # rerank → nDCG@3 0.963 (was 0.900)

Install

npx [email protected]    # latest / alpha / v3alpha all aligned

Full ADR: v3/docs/adr/ADR-083-joint-rerank-grid.md


About claude-flow

Deploy multi-agent swarms with coordinated workflows.
