This release adds 2 notable features for engineering teams evaluating rollout.
✓ No known CVEs patched in this version
Topics
+14 more
Summary
AI summaryUpdates What changed in code, hybrid, and labelled across a mixed release.
Changes in this release
| Type | Severity | Summary | CVE |
|---|---|---|---|
| Feature | Medium |
Adds grid-search script `scripts/grid-search-retrieval.mjs` for hyperparameter tuning. Adds grid-search script `scripts/grid-search-retrieval.mjs` for hyperparameter tuning. Source: llm_adapter@2026-05-30 Confidence: high |
— |
| Performance | Low |
Improves labelled nDCG@3 from 0.900 to 0.963 (+7%). Improves labelled nDCG@3 from 0.900 to 0.963 (+7%). Source: granite4.1:30b@2026-05-30-audit Confidence: low |
— |
| Performance | Low |
Increases label top‑3 accuracy from 90% to 100%. Increases label top‑3 accuracy from 90% to 100%. Source: granite4.1:30b@2026-05-30-audit Confidence: low |
— |
| Performance | Low |
Raises label precision@3 from 0.400 to 0.533. Raises label precision@3 from 0.400 to 0.533. Source: granite4.1:30b@2026-05-30-audit Confidence: low |
— |
| Bugfix | Medium |
Corrects misleading default parameters that were tuned against an inaccurate proxy corpus. Corrects misleading default parameters that were tuned against an inaccurate proxy corpus. Source: granite4.1:30b@2026-05-30-audit Confidence: low |
— |
| Refactor | Low |
Updates default retrieval parameters: alpha 0.6 → 0.5, subjectWeight 3.0 → 2.0, mmrLambda 0.5 → 0.7. Updates default retrieval parameters: alpha 0.6 → 0.5, subjectWeight 3.0 → 2.0, mmrLambda 0.5 → 0.7. Source: llm_adapter@2026-05-30 Confidence: high |
— |
Full changelog
What ships
Grid-search-tuned retrieval defaults against the ADR-081 labelled corpus.
The previous defaults (α=0.6, subjectWeight=3.0, mmrLambda=0.5) were tuned
against the regex proxy that ADR-081 then revealed was misleading — so we
re-tuned properly.
The win
| Metric (hybrid path, labelled) | 3.10.21 | 3.10.22 | Δ |
|---|---:|---:|---:|
| Label top-1 | 90% | 90% | tied |
| Label top-3 | 90% | 100% | +10pp |
| Label MRR@3 | 0.900 | 0.950 | +0.05 |
| Label precision@3 | 0.400 | 0.533 | +0.13 |
| Label nDCG@3 | 0.900 | 0.963 | +0.06 (+7%) |
| Label nDCG@5 | 0.875 | 0.938 | +0.06 |
| Avg latency | 42 ms | 55 ms | +13 ms |
The findings
Grid swept 32 configs (27 hybrid + 5 rerank) using labelled nDCG@3 as the
canonical metric:
-
α=0.5 beats α=0.6, α=0.7 is broken. At α=0.7 (more cosine, less BM25)
top-1 collapsed to 40-50% across every other parameter combination. BM25
carries more discriminating power than the bi-encoder on this corpus
than the original 0.6 default credited it with. -
subjectWeight=2 beats sw=3 and sw=5. Less subject weight lets body
tokens contribute relevance that gets crowded out at sw=3. -
mmrLambda=0.7 beats 0.5 and 0.3. Less diversity bias / more pure
relevance ranking pulls more relevant docs into top-3.
What's still pending
A joint α/sw × hybridWeight/ceWeight re-grid for the rerank path —
the rerank winner (hw=0.7 cw=0.3) was tested against OLD α=0.6 sw=3.0
baselines; with new α=0.5 sw=2.0 the joint optimum shifted. Kept rerank
weights at 0.5/0.5 conservatively. Next iteration.
Cumulative SOTA push since cosine baseline (3.10.17 → 3.10.22)
| Metric (labelled) | 3.10.17 | 3.10.19 | 3.10.20 | 3.10.22 |
|---|---:|---:|---:|---:|
| Label top-1 (hybrid) | 0% | 90% | 90% | 90% |
| Label top-3 (hybrid) | 0% | 90% | 90% | 100% |
| Label nDCG@3 (hybrid) | 0.000 | 0.900 | 0.900 | 0.963 |
| Label precision@3 (hybrid) | 0.000 | 0.400 | 0.400 | 0.533 |
What changed in code
- Defaults updated in
src/mcp-tools/neural-tools.ts:alpha: 0.6 → 0.5subjectWeight: 3.0 → 2.0mmrLambda: 0.5 → 0.7
- New script
scripts/grid-search-retrieval.mjs— re-runnable harness,
sweeps hyperparameter space, picks winners by nDCG/top-1/precision@3.
--quickmode for fast iteration. - Run JSONs at
docs/benchmarks/runs/grid-search-retrieval-{ts,latest}.json
with full per-config metrics.
Reproduce
git clone https://github.com/ruvnet/ruflo && cd ruflo
npm install && ( cd v3/@claude-flow/cli && npx tsc )
# Pretrain (415 patterns)
node v3/@claude-flow/cli/scripts/pretrain-from-github.mjs
# Grid-search (~1 min)
cd v3/@claude-flow/cli && node scripts/grid-search-retrieval.mjs
# Verify new defaults
BENCH_NO_WRITE=1 node scripts/benchmark-pretrained-retrieval.mjs
# → Label nDCG@3 0.963, top-1 90%, top-3 100%, precision@3 0.533
Install
npx [email protected] # latest / alpha / v3alpha all aligned
Full ADR: v3/docs/adr/ADR-082-grid-search-retrieval-defaults.md
Weekly OSS security release digest.
The CVE patches and breaking changes that affected production tools this week. One email, every Sunday.
No spam, unsubscribe anytime.
Share this release
Related context
Related tools
Beta — feedback welcome: [email protected]