This release adds 3 notable features for engineering teams evaluating rollout.
✓ No known CVEs patched in this version
Summary
AI summary: a mixed release covering what changed in code, the benchmark's honest limits, and a comparison against listed baselines including BM25 (Lucene).
Changes in this release
| Type | Severity | Summary | CVE |
|---|---|---|---|
| Feature | Low | Added `src/memory/bge-embedder.ts` supporting lazy-loaded BGE models (small, base, large). Source: llm_adapter@2026-05-30. Confidence: high. | — |
| Feature | Low | Added `scripts/run-beir-bge.mjs`, a direct-dense BEIR benchmark runner with an on-disk embedding cache. Source: llm_adapter@2026-05-30. Confidence: high. | — |
| Feature | Low | Added `docs/benchmarks/BEIR-MATRIX.md`, a public benchmark tracking page. Source: llm_adapter@2026-05-30. Confidence: high. | — |
| Performance | Medium | Achieved nDCG@10 = 0.352 on BEIR NFCorpus using BGE-base, ranking top-2 among listed baselines. Source: llm_adapter@2026-05-30. Confidence: high. | — |
| Bugfix | Medium | Fixed silent degradation of the embedding path on darwin-arm64 caused by a `sharp`/`libvips` issue. Source: llm_adapter@2026-05-30. Confidence: high. | — |
Full changelog
ruflo 3.10.25 — reproducible BEIR NFCorpus benchmark, nDCG@10 0.352, top-2 against listed public baselines
We now have a reproducible BEIR benchmark harness, run JSONs, per-query metrics
(in 3.10.26), and a clean direct BGE dense path.
First public result: BEIR NFCorpus
nDCG@10 = 0.352 using BGE-base-en-v1.5 (110M params) via the direct
dense path (no fine-tuning, no hybrid BM25+dense fusion, no cross-encoder
reranker). The internal hybrid pipeline is isolated from this comparison so the
dense-vs-dense numbers stay honest.
| Rank | Method | Params | nDCG@10 |
|---:|---|---:|---:|
| 1 | BGE-large-v1.5 (listed) | 335M | 0.380 |
| 2 | ruflo + BGE-base-en-v1.5 ← us | 110M | 0.352 |
| 3 | SPLADE++ | 110M | 0.347 |
| 4 | GTR-XL | 1.2B | 0.343 |
| 5 | DocT5query / Contriever | — | 0.328 |
| 7 | BM25 (Lucene) | — | 0.325 |
| 8 | TAS-B / GenQ | — | 0.319 |
| 10 | ColBERT | 110M | 0.305 |
| 11 | SBERT msmarco | 110M | 0.272 |
This is top-2 on BEIR NFCorpus, NOT "top-2 on BEIR." BEIR is an 18-dataset
suite; NFCorpus is one dataset. The broader BEIR average requires TREC-COVID,
FiQA, ArguAna, HotpotQA, NQ, etc. SciFact (2nd dataset) is queued.
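For readers unfamiliar with the metric in the table above, here is a minimal sketch of nDCG@10 for a single query and its mean over a dataset. It assumes linear gain with a log2 discount, which is one common formulation; the `Qrels` shape and all function names are hypothetical and are not taken from the ruflo harness.

```typescript
// Hypothetical shapes for illustration; not the ruflo harness's own types.
type Qrels = { [docId: string]: number }; // graded relevance, 0 = not relevant

// DCG with linear gain and log2 discount over the top-k positions.
function dcgAtK(gains: number[], k: number): number {
  let dcg = 0;
  for (let i = 0; i < Math.min(k, gains.length); i++) {
    dcg += gains[i] / (Math.log(i + 2) / Math.LN2);
  }
  return dcg;
}

// nDCG@k for one query: DCG of the system ranking over DCG of the ideal ranking.
function ndcgAtK(ranked: string[], qrels: Qrels, k: number = 10): number {
  const gains = ranked.map((docId) => qrels[docId] || 0);
  const ideal = Object.keys(qrels)
    .map((docId) => qrels[docId])
    .sort((a, b) => b - a);
  const idcg = dcgAtK(ideal, k);
  return idcg === 0 ? 0 : dcgAtK(gains, k) / idcg;
}

// Dataset score: mean of per-query nDCG@k over all judged queries.
function meanNdcgAtK(
  runs: { [qid: string]: string[] },
  qrels: { [qid: string]: Qrels },
  k: number = 10
): number {
  const qids = Object.keys(qrels);
  if (qids.length === 0) return 0;
  let total = 0;
  for (const qid of qids) {
    total += ndcgAtK(runs[qid] || [], qrels[qid], k);
  }
  return total / qids.length;
}
```

A query with no retrieved documents contributes 0 to the mean rather than being skipped, so a missing run cannot inflate the dataset score.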
The more important part — the audit trail
We found and fixed a real environment bug where the embedding path could
silently degrade into hash fallback because of a sharp/libvips issue on
darwin-arm64. The neural store reported _realEmbedding: true because the
import succeeded — but per-call embeds threw and got swallowed by an inner
catch. The pure-BM25 path (with broken random cosine) was carrying the entire
"hybrid" signal undetected.
The new path bypasses that dependency by loading BGE directly through
@xenova/transformers's AutoTokenizer + AutoModel. Text bi-encoders
don't need image preprocessing; sharp is a transitive dep that's never
needed for retrieval.
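The silent-fallback failure described above can be sketched as follows. This is an illustrative guard, not the code in the repository: `createGuardedEmbedder`, `hashFallback`, and the `strict` option are hypothetical names, and the embedder is synchronous here for brevity. The point is that the health flag is derived from per-call outcomes, not from whether the import succeeded.

```typescript
type EmbedFn = (text: string) => number[];

// Deterministic, non-semantic vector standing in for a hash fallback.
function hashFallback(text: string, dim: number): number[] {
  const v: number[] = [];
  for (let i = 0; i < dim; i++) v.push(0);
  for (let i = 0; i < text.length; i++) {
    v[(text.charCodeAt(i) * 31 + i) % dim] += 1;
  }
  return v;
}

// Wraps a real embedder so that degraded calls are counted, and optionally fatal.
function createGuardedEmbedder(real: EmbedFn, dim: number, strict: boolean) {
  let fallbackCount = 0;
  return {
    embed(text: string): number[] {
      try {
        return real(text);
      } catch (err) {
        fallbackCount += 1; // record every degraded call instead of hiding it
        if (strict) {
          throw new Error("embedding failed: " + String(err));
        }
        return hashFallback(text, dim);
      }
    },
    // True only while no call has fallen back to the hash vector.
    isRealEmbedding(): boolean {
      return fallbackCount === 0;
    },
    fallbacks(): number {
      return fallbackCount;
    },
  };
}
```

In a benchmark run, strict mode is probably the safer default: a crash on the first failed embed is easier to notice than a quietly lower score.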
What changed in code
- `src/memory/bge-embedder.ts`: lazy-loaded singleton; supports bge-small (33M, 384-dim), bge-base (110M, 768-dim, default), and bge-large (335M, 1024-dim). CLS-token pooling + L2 normalisation per BAAI spec.
- `scripts/run-beir-nfcorpus.mjs`: hybrid-pipeline harness; with the embedder broken this collapses to pure-BM25 (measured 0.289 vs published BM25 0.325).
- `scripts/run-beir-bge.mjs`: direct-dense BEIR runner, on-disk embedding cache, dataset auto-detect.
- `docs/benchmarks/BEIR-MATRIX.md`: public benchmark tracking page (added in 3.10.26).
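The CLS-token pooling and L2 normalisation step mentioned for the embedder can be sketched in isolation. This is a minimal illustration on plain arrays, assuming the model's last hidden state for one sequence is available as a `[tokens][dim]` matrix; the function names and the `BGE_DIMS` table are hypothetical, with the dimensions copied from the list above.

```typescript
// Dimensions as listed in the release notes; the key names are illustrative.
const BGE_DIMS: { [model: string]: number } = {
  "bge-small": 384,
  "bge-base": 768,
  "bge-large": 1024,
};

// CLS pooling: use the hidden state of the first token as the sentence vector.
function clsPool(lastHiddenState: number[][]): number[] {
  if (lastHiddenState.length === 0) throw new Error("empty sequence");
  return lastHiddenState[0].slice();
}

// L2 normalisation: scale the vector to unit length (zero vectors are returned unchanged).
function l2Normalize(v: number[]): number[] {
  let sumSq = 0;
  for (const x of v) sumSq += x * x;
  const norm = Math.sqrt(sumSq);
  if (norm === 0) return v.slice();
  return v.map((x) => x / norm);
}

// For unit-length vectors the dot product equals cosine similarity.
function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}
```

Normalising once at embedding time means retrieval can rank by a plain dot product, which is why the two steps are usually paired.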
Reproduce
git clone https://github.com/ruvnet/ruflo && cd ruflo
npm install && ( cd v3/@claude-flow/cli && npx tsc )
mkdir -p /tmp/beir-nfcorpus && cd /tmp/beir-nfcorpus
curl -sL -o nfcorpus.zip 'https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/nfcorpus.zip'
unzip -q nfcorpus.zip
# BGE-base direct dense (one-time ~25min ingest + ~2min full eval)
node /path/to/v3/@claude-flow/cli/scripts/run-beir-bge.mjs
# → nDCG@10 0.352, rank 2/11 against listed baselines
# Cached subsequent runs (~2 min)
SKIP_INGEST=1 node /path/to/scripts/run-beir-bge.mjs
Honest limits
- One BEIR dataset measured. SciFact in progress; broader BEIR average tracked.
- Zero-shot, no fine-tuning. NFCorpus has a 110K-pair train split that could fine-tune for an additional ~0.02-0.05 nDCG.
- The 0.005 gap to SPLADE++ is small. Paired bootstrap CI shipping in 3.10.26 will determine if it's statistically significant.
- The `_realEmbedding: true` lie in `neural-tools.ts` is bypassed, not fixed. BGE direct-API path is the workaround; the underlying flag bug is tracked.
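To make the paired bootstrap mentioned above concrete, here is a minimal sketch of a percentile confidence interval on the mean per-query nDCG difference between two systems. It is not the implementation planned for 3.10.26: the function names, the default of 2000 resamples, and the seeded generator are all assumptions chosen for a reproducible illustration.

```typescript
// Small seeded LCG so the sketch is reproducible; not a high-quality generator.
function seededRandom(seed: number): () => number {
  let state = seed % 4294967296;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Percentile bootstrap CI on the mean of paired per-query differences (a[i] - b[i]).
function pairedBootstrapCI(
  a: number[],
  b: number[],
  resamples: number = 2000,
  alpha: number = 0.05,
  seed: number = 42
): { meanDiff: number; lo: number; hi: number } {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("need two non-empty score lists of equal length");
  }
  const n = a.length;
  const diffs = a.map((x, i) => x - b[i]);
  const meanDiff = diffs.reduce((s, d) => s + d, 0) / n;
  const rand = seededRandom(seed);
  const means: number[] = [];
  for (let r = 0; r < resamples; r++) {
    let sum = 0;
    // Resample queries with replacement, keeping each pair together.
    for (let i = 0; i < n; i++) sum += diffs[Math.floor(rand() * n)];
    means.push(sum / n);
  }
  means.sort((x, y) => x - y);
  const lo = means[Math.floor((alpha / 2) * resamples)];
  const hi = means[Math.min(resamples - 1, Math.floor((1 - alpha / 2) * resamples))];
  return { meanDiff, lo, hi };
}
```

If the resulting interval excludes 0, the gap is unlikely to be resampling noise at the chosen level; if it straddles 0, the two systems are not distinguishable on this dataset.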
Install
npx ruflo@3.10.25 # latest / alpha / v3alpha all aligned