claude-flow

v3.10.22 Feature

This release adds 2 notable features for engineering teams evaluating rollout.

Published 1mo AI Agents & Assistants

View tool

✓ No known CVEs patched

Read the diff → Tool health → What is this tool? →

✓ No known CVEs patched in this version

Topics

agentic-ai agentic-framework agentic-workflow agents ai-agents ai-assistant

+14 more

ai-coding ai-skills autonomous-agents claude-code codex harness mcp-server multi-agent multi-agent-systems npm skills swarm swarm-intelligence typescript

Summary

AI summary

Updates What changed in code, hybrid, and labelled across a mixed release.

Changes in this release

Type	Severity	Summary	CVE
Feature	Medium	Adds grid-search script `scripts/grid-search-retrieval.mjs` for hyperparameter tuning. Adds grid-search script `scripts/grid-search-retrieval.mjs` for hyperparameter tuning. Source: llm_adapter@2026-05-30 Confidence: high	—
Performance
Performance	Low	Improves labelled nDCG@3 from 0.900 to 0.963 (+7%). Improves labelled nDCG@3 from 0.900 to 0.963 (+7%). Source: granite4.1:30b@2026-05-30-audit Confidence: low	—
Performance	Low	Increases label top‑3 accuracy from 90% to 100%. Increases label top‑3 accuracy from 90% to 100%. Source: granite4.1:30b@2026-05-30-audit Confidence: low	—
Performance	Low	Raises label precision@3 from 0.400 to 0.533. Raises label precision@3 from 0.400 to 0.533. Source: granite4.1:30b@2026-05-30-audit Confidence: low	—
Bugfix	Medium	Corrects misleading default parameters that were tuned against an inaccurate proxy corpus. Corrects misleading default parameters that were tuned against an inaccurate proxy corpus. Source: granite4.1:30b@2026-05-30-audit Confidence: low	—
Refactor	Low	Updates default retrieval parameters: alpha 0.6 → 0.5, subjectWeight 3.0 → 2.0, mmrLambda 0.5 → 0.7. Updates default retrieval parameters: alpha 0.6 → 0.5, subjectWeight 3.0 → 2.0, mmrLambda 0.5 → 0.7. Source: llm_adapter@2026-05-30 Confidence: high	—

Full changelog

What ships

Grid-search-tuned retrieval defaults against the ADR-081 labelled corpus.
The previous defaults (α=0.6, subjectWeight=3.0, mmrLambda=0.5) were tuned
against the regex proxy that ADR-081 then revealed was misleading — so we
re-tuned properly.

The win

| Metric (hybrid path, labelled) | 3.10.21 | 3.10.22 | Δ |
|---|---:|---:|---:|
| Label top-1 | 90% | 90% | tied |
| Label top-3 | 90% | 100% | +10pp |
| Label MRR@3 | 0.900 | 0.950 | +0.05 |
| Label precision@3 | 0.400 | 0.533 | +0.13 |
| Label nDCG@3 | 0.900 | 0.963 | +0.06 (+7%) |
| Label nDCG@5 | 0.875 | 0.938 | +0.06 |
| Avg latency | 42 ms | 55 ms | +13 ms |

The findings

Grid swept 32 configs (27 hybrid + 5 rerank) using labelled nDCG@3 as the
canonical metric:

α=0.5 beats α=0.6, α=0.7 is broken. At α=0.7 (more cosine, less BM25)
top-1 collapsed to 40-50% across every other parameter combination. BM25
carries more discriminating power than the bi-encoder on this corpus
than the original 0.6 default credited it with.
subjectWeight=2 beats sw=3 and sw=5. Less subject weight lets body
tokens contribute relevance that gets crowded out at sw=3.
mmrLambda=0.7 beats 0.5 and 0.3. Less diversity bias / more pure
relevance ranking pulls more relevant docs into top-3.

What's still pending

A joint α/sw × hybridWeight/ceWeight re-grid for the rerank path —
the rerank winner (hw=0.7 cw=0.3) was tested against OLD α=0.6 sw=3.0
baselines; with new α=0.5 sw=2.0 the joint optimum shifted. Kept rerank
weights at 0.5/0.5 conservatively. Next iteration.

Cumulative SOTA push since cosine baseline (3.10.17 → 3.10.22)

| Metric (labelled) | 3.10.17 | 3.10.19 | 3.10.20 | 3.10.22 |
|---|---:|---:|---:|---:|
| Label top-1 (hybrid) | 0% | 90% | 90% | 90% |
| Label top-3 (hybrid) | 0% | 90% | 90% | 100% |
| Label nDCG@3 (hybrid) | 0.000 | 0.900 | 0.900 | 0.963 |
| Label precision@3 (hybrid) | 0.000 | 0.400 | 0.400 | 0.533 |

What changed in code

Defaults updated in src/mcp-tools/neural-tools.ts:
- alpha: 0.6 → 0.5
- subjectWeight: 3.0 → 2.0
- mmrLambda: 0.5 → 0.7
New script scripts/grid-search-retrieval.mjs — re-runnable harness,
sweeps hyperparameter space, picks winners by nDCG/top-1/precision@3.
--quick mode for fast iteration.
Run JSONs at docs/benchmarks/runs/grid-search-retrieval-{ts,latest}.json
with full per-config metrics.

Reproduce

git clone https://github.com/ruvnet/ruflo && cd ruflo
npm install && ( cd v3/@claude-flow/cli && npx tsc )

# Pretrain (415 patterns)
node v3/@claude-flow/cli/scripts/pretrain-from-github.mjs

# Grid-search (~1 min)
cd v3/@claude-flow/cli && node scripts/grid-search-retrieval.mjs

# Verify new defaults
BENCH_NO_WRITE=1 node scripts/benchmark-pretrained-retrieval.mjs
# → Label nDCG@3 0.963, top-1 90%, top-3 100%, precision@3 0.533

Install

npx [email protected]    # latest / alpha / v3alpha all aligned

Full ADR: v3/docs/adr/ADR-082-grid-search-retrieval-defaults.md

View diff on GitHub

Weekly OSS security release digest.

The CVE patches and breaking changes that affected production tools this week. One email, every Sunday.

No spam, unsubscribe anytime.

Share this release

Share on X Share on Bluesky

Track claude-flow

Get notified when new releases ship.

About claude-flow

Deploy multi-agent swarms with coordinated workflows.

All releases →