lyonzin/knowledge-rag

v3.8.1 Feature

This release adds 2 notable features for engineering teams evaluating rollout.

Published 2mo MCP Developer Tools

✓ No known CVEs patched

Topics

antigravity claude claude-code claude-code-cli codex cursor-ai

document-search hybrid-search inteligencia-artificial knowledge-base local-ai mcp mcp-server llm rag-chatbot rag-pipeline reranking retrieval-augmented-generation semantic-search vector-db

Summary

AI summary

Embeddings now raise EmbeddingError instead of silently returning zero vectors, fixing corruption bugs.

Full changelog

Highlights

Critical fix: no more silent zero-vector corruption (#36)

Previously, when the ONNX model failed to load, FastEmbedEmbeddings.__call__ swallowed the exception and returned [[0.0]*dim, ...]. It no longer does. The bug already existed in master but was masked: ChromaDB stored the zero embeddings without complaint, count() reported normal numbers, smart-reindex skipped those entries as "already indexed", and queries returned garbage similarity scores with no visible error. The lazy loading introduced in v3.8.0 widened the impact by moving failures from startup to query time.

__call__ now raises EmbeddingModelLoadError or EmbeddingError loudly. A sticky _load_failed flag prevents retry storms against HuggingFace rate limits, and sanity checks in __call__ catch mismatches in embedding count and dimension.
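The behavior described above can be illustrated with a minimal sketch. This is not the project's code: LoudEmbeddings, its loader argument, and the choice to make EmbeddingModelLoadError a subclass of EmbeddingError are assumptions made for illustration. Only the exception names and the _load_failed flag come from the changelog.

```python
from typing import Callable, List, Optional, Sequence


class EmbeddingError(RuntimeError):
    """Raised when embedding fails or returns malformed output."""


class EmbeddingModelLoadError(EmbeddingError):
    """Raised when the model cannot be loaded (subclassing is an assumption)."""


class LoudEmbeddings:
    """Hypothetical stand-in for FastEmbedEmbeddings, for illustration only."""

    def __init__(self, loader: Callable[[], Callable], dim: int) -> None:
        self._loader = loader  # hypothetical: returns a callable model
        self._dim = dim
        self._model: Optional[Callable] = None
        self._load_failed: Optional[Exception] = None  # sticky failure marker

    def _ensure_model(self) -> Callable:
        if self._load_failed is not None:
            # Sticky: never retry the load (and the download) after a failure.
            raise EmbeddingModelLoadError("model load failed earlier") from self._load_failed
        if self._model is None:
            try:
                self._model = self._loader()
            except Exception as exc:
                self._load_failed = exc
                raise EmbeddingModelLoadError(str(exc)) from exc
        return self._model

    def __call__(self, texts: Sequence[str]) -> List[List[float]]:
        model = self._ensure_model()
        try:
            vectors = [list(v) for v in model(texts)]
        except Exception as exc:
            # Fail loudly instead of substituting zero vectors.
            raise EmbeddingError(f"embedding failed: {exc}") from exc
        if len(vectors) != len(texts):
            raise EmbeddingError(f"expected {len(texts)} vectors, got {len(vectors)}")
        for vec in vectors:
            if len(vec) != self._dim:
                raise EmbeddingError(f"expected dim {self._dim}, got {len(vec)}")
        return vectors
```

In this sketch a failed load is attempted exactly once. Every later call raises immediately, which is one way to avoid hammering a rate-limited download endpoint.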

Windows CI flake eliminated (#38)

Five stacked root causes finally untangled:

Test flake: test_concurrent_first_call_loads_once spawned threads with join(timeout=5). If the Windows scheduler delayed t2 past the timeout, the with patch(TextEmbedding=slow_init) block exited while t2 was still scheduled; t2 then called the real TextEmbedding and triggered an HF download. Replaced with the deterministic test_double_checked_lock_prevents_double_load.
HF download leakage: any real TextEmbedding(...) call spawned concurrent.futures.ThreadPoolExecutor worker threads that outlived pytest and crashed on closed-stdout warnings. Set HF_HUB_OFFLINE=1 + TRANSFORMERS_OFFLINE=1 + HF_HUB_DISABLE_PROGRESS_BARS=1 in CI to fail fast on any leak.
pytest atexit OSError: cleanup_numbered_dir glob raised on Windows during interpreter shutdown. conftest.py wraps it as defense-in-depth.
PowerShell shell: GHA windows-latest default pwsh propagated stderr writes as exit code 1 even when pytest returned 0. Forced shell: bash on the test step.
Version drift: __init__.py, pyproject.toml, and npm/package.json had been drifting apart since v3.5.x. They are now updated together atomically.
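The changelog does not show how test_double_checked_lock_prevents_double_load is written. As a rough sketch of the general idea, a double-checked lock can be tested without threads or timeouts by swapping in a fake lock that simulates another thread winning the race. LazyModel, RacingLock, and check_no_double_load are hypothetical names invented for this example.

```python
import threading
from typing import Any, Callable, Tuple


class LazyModel:
    """Hypothetical lazy loader using double-checked locking."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._model: Any = None
        self._lock: Any = threading.Lock()

    def get(self) -> Any:
        if self._model is None:          # first check, without the lock
            with self._lock:
                if self._model is None:  # second check, under the lock
                    self._model = self._factory()
        return self._model


class RacingLock:
    """Fake lock: on entry, pretends another thread already finished loading."""

    def __init__(self, owner: LazyModel, winner: Any) -> None:
        self._owner = owner
        self._winner = winner

    def __enter__(self) -> "RacingLock":
        self._owner._model = self._winner
        return self

    def __exit__(self, *exc_info: Any) -> bool:
        return False


def check_no_double_load() -> Tuple[Any, int]:
    calls = []

    def factory() -> str:
        calls.append(1)
        return "loaded-here"

    lazy = LazyModel(factory)
    lazy._lock = RacingLock(lazy, "loaded-by-other-thread")
    # The first check passes, the fake lock simulates the race, and the
    # second check must stop this caller from loading again.
    return lazy.get(), len(calls)
```

Because no real threads are involved, the outcome does not depend on the scheduler. The factory is never called if the second check works.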

The Windows runner is now genuinely green — no admin merges needed for the v3.8.1 release.

Changes

FIX (critical) Loud-fail embeddings — no more silent zero-vector corruption (#36)
FIX Sticky _load_failed to avoid HF retry storm
NEW EmbeddingError + EmbeddingModelLoadError exception classes
NEW Embed count + dim mismatch sanity checks
TEST 7 new regression cases including test_does_not_return_zero_vectors_silently
CI HF_HUB_OFFLINE=1 + TRANSFORMERS_OFFLINE=1 + HF_HUB_DISABLE_PROGRESS_BARS=1 (#38)
CI shell: bash on Windows test step (#38)
TEST Deterministic replacement for flaky concurrent test (#38)
TEST conftest.py wraps _pytest.pathlib.cleanup_numbered_dir (defense-in-depth, #38)
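For readers wondering what the conftest.py wrapper could look like, here is a hedged sketch. The project's actual wrapper is not shown in the changelog. swallow_oserror and patch_pytest_cleanup are hypothetical helpers, and the sketch assumes pytest resolves _pytest.pathlib.cleanup_numbered_dir by module attribute when it registers its exit-time cleanup, which may vary between pytest versions.

```python
import functools
from typing import Any, Callable


def swallow_oserror(func: Callable[..., Any]) -> Callable[..., Any]:
    """Return a wrapper that ignores OSError raised by func."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return func(*args, **kwargs)
        except OSError:
            # Best-effort cleanup: a failure during shutdown is not fatal.
            return None

    return wrapper


def patch_pytest_cleanup() -> bool:
    """Wrap pytest's private cleanup helper; return True if it was patched."""
    try:
        import _pytest.pathlib as pathlib_mod  # private module, may change
    except ImportError:
        return False
    original = getattr(pathlib_mod, "cleanup_numbered_dir", None)
    if original is None:
        return False
    pathlib_mod.cleanup_numbered_dir = swallow_oserror(original)
    return True
```

Under these assumptions, calling patch_pytest_cleanup() near the top of conftest.py would apply the wrapper before pytest creates its temporary directories. Other exception types still propagate, so unrelated failures stay visible.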

Install

pip install knowledge-rag==3.8.1
npx -y [email protected]
docker pull ghcr.io/lyonzin/knowledge-rag:3.8.1

Backwards compatibility

Public API unchanged
__call__ raises instead of returning silent zeros — callers that already handled exceptions (index_all line 771, search_knowledge line 1010) work without changes
New exception classes are additive

Who should upgrade

All v3.8.0 users, especially anyone who saw "0 results" queries or has models_cache/ with 0-byte files.
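To check whether an installation shows the symptoms mentioned above, something like the following sketch might help. It uses only the standard library. find_empty_files and is_zero_vector are hypothetical helpers, not part of knowledge-rag, and the location of models_cache/ depends on your setup.

```python
from pathlib import Path
from typing import List, Sequence, Union


def find_empty_files(cache_dir: Union[str, Path]) -> List[Path]:
    """Return all 0-byte regular files under cache_dir, sorted by path."""
    root = Path(cache_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.stat().st_size == 0
    )


def is_zero_vector(vec: Sequence[float]) -> bool:
    """True if the vector is non-empty and every component is exactly zero."""
    return len(vec) > 0 and all(x == 0.0 for x in vec)
```

For example, find_empty_files("models_cache") lists files that suggest an interrupted download, and is_zero_vector can be applied to embeddings read back from the vector store to spot entries written before the fix.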

Full Changelog: https://github.com/lyonzin/knowledge-rag/compare/v3.8.0...v3.8.1

About lyonzin/knowledge-rag

Local RAG system for Claude Code with hybrid search (BM25 + semantic), cross-encoder reranking, markdown-aware chunking, query expansion, and 12 MCP tools. Runs entirely offline with zero external servers.
