Skip to content

claude-flow

v3.10.22 Feature

This release adds 2 notable features for engineering teams evaluating rollout.

✓ No known CVEs patched
Read the diff → Tool health → What is this tool? →

✓ No known CVEs patched in this version

Topics

agentic-ai agentic-framework agentic-rag agentic-workflow agents ai-agents
+14 more
ai-assistant ai-coding ai-skills autonomous-agents claude-code codex mcp-server multi-agent multi-agent-systems npm skills swarm swarm-intelligence typescript

Summary

AI summary

Updates What changed in code, hybrid, and labelled across a mixed release.

Changes in this release

Feature Medium

Adds grid-search script `scripts/grid-search-retrieval.mjs` for hyperparameter tuning.

Adds grid-search script `scripts/grid-search-retrieval.mjs` for hyperparameter tuning.

Source: llm_adapter@2026-05-30

Confidence: high

Performance Low

Improves labelled nDCG@3 from 0.900 to 0.963 (+7%).

Improves labelled nDCG@3 from 0.900 to 0.963 (+7%).

Source: granite4.1:30b@2026-05-30-audit

Confidence: low

Performance Low

Increases label top‑3 accuracy from 90% to 100%.

Increases label top‑3 accuracy from 90% to 100%.

Source: granite4.1:30b@2026-05-30-audit

Confidence: low

Performance Low

Raises label precision@3 from 0.400 to 0.533.

Raises label precision@3 from 0.400 to 0.533.

Source: granite4.1:30b@2026-05-30-audit

Confidence: low

Bugfix Medium

Corrects misleading default parameters that were tuned against an inaccurate proxy corpus.

Corrects misleading default parameters that were tuned against an inaccurate proxy corpus.

Source: granite4.1:30b@2026-05-30-audit

Confidence: low

Refactor Low

Updates default retrieval parameters: alpha 0.6 → 0.5, subjectWeight 3.0 → 2.0, mmrLambda 0.5 → 0.7.

Updates default retrieval parameters: alpha 0.6 → 0.5, subjectWeight 3.0 → 2.0, mmrLambda 0.5 → 0.7.

Source: llm_adapter@2026-05-30

Confidence: high

Full changelog

What ships

Grid-search-tuned retrieval defaults against the ADR-081 labelled corpus.
The previous defaults (α=0.6, subjectWeight=3.0, mmrLambda=0.5) were tuned
against the regex proxy that ADR-081 then revealed was misleading — so we
re-tuned properly.

The win

| Metric (hybrid path, labelled) | 3.10.21 | 3.10.22 | Δ |
|---|---:|---:|---:|
| Label top-1 | 90% | 90% | tied |
| Label top-3 | 90% | 100% | +10pp |
| Label MRR@3 | 0.900 | 0.950 | +0.05 |
| Label precision@3 | 0.400 | 0.533 | +0.13 |
| Label nDCG@3 | 0.900 | 0.963 | +0.06 (+7%) |
| Label nDCG@5 | 0.875 | 0.938 | +0.06 |
| Avg latency | 42 ms | 55 ms | +13 ms |

The findings

Grid swept 32 configs (27 hybrid + 5 rerank) using labelled nDCG@3 as the
canonical metric:

  1. α=0.5 beats α=0.6, α=0.7 is broken. At α=0.7 (more cosine, less BM25)
    top-1 collapsed to 40-50% across every other parameter combination. BM25
    carries more discriminating power than the bi-encoder on this corpus
    than the original 0.6 default credited it with.

  2. subjectWeight=2 beats sw=3 and sw=5. Less subject weight lets body
    tokens contribute relevance that gets crowded out at sw=3.

  3. mmrLambda=0.7 beats 0.5 and 0.3. Less diversity bias / more pure
    relevance ranking pulls more relevant docs into top-3.

What's still pending

A joint α/sw × hybridWeight/ceWeight re-grid for the rerank path
the rerank winner (hw=0.7 cw=0.3) was tested against OLD α=0.6 sw=3.0
baselines; with new α=0.5 sw=2.0 the joint optimum shifted. Kept rerank
weights at 0.5/0.5 conservatively. Next iteration.

Cumulative SOTA push since cosine baseline (3.10.17 → 3.10.22)

| Metric (labelled) | 3.10.17 | 3.10.19 | 3.10.20 | 3.10.22 |
|---|---:|---:|---:|---:|
| Label top-1 (hybrid) | 0% | 90% | 90% | 90% |
| Label top-3 (hybrid) | 0% | 90% | 90% | 100% |
| Label nDCG@3 (hybrid) | 0.000 | 0.900 | 0.900 | 0.963 |
| Label precision@3 (hybrid) | 0.000 | 0.400 | 0.400 | 0.533 |

What changed in code

  1. Defaults updated in src/mcp-tools/neural-tools.ts:
    • alpha: 0.6 → 0.5
    • subjectWeight: 3.0 → 2.0
    • mmrLambda: 0.5 → 0.7
  2. New script scripts/grid-search-retrieval.mjs — re-runnable harness,
    sweeps hyperparameter space, picks winners by nDCG/top-1/precision@3.
    --quick mode for fast iteration.
  3. Run JSONs at docs/benchmarks/runs/grid-search-retrieval-{ts,latest}.json
    with full per-config metrics.

Reproduce

git clone https://github.com/ruvnet/ruflo && cd ruflo
npm install && ( cd v3/@claude-flow/cli && npx tsc )

# Pretrain (415 patterns)
node v3/@claude-flow/cli/scripts/pretrain-from-github.mjs

# Grid-search (~1 min)
cd v3/@claude-flow/cli && node scripts/grid-search-retrieval.mjs

# Verify new defaults
BENCH_NO_WRITE=1 node scripts/benchmark-pretrained-retrieval.mjs
# → Label nDCG@3 0.963, top-1 90%, top-3 100%, precision@3 0.533

Install

npx [email protected]    # latest / alpha / v3alpha all aligned

Full ADR: v3/docs/adr/ADR-082-grid-search-retrieval-defaults.md

Weekly OSS security release digest.

The CVE patches and breaking changes that affected production tools this week. One email, every Sunday.

No spam, unsubscribe anytime.

Share this release

Track claude-flow

Get notified when new releases ship.

Sign up free

About claude-flow

Deploy multi-agent swarms with coordinated workflows.

All releases →

Beta — feedback welcome: [email protected]