This release adds 2 notable features for engineering teams evaluating rollout.
✓ No known CVEs patched in this version
Summary
AI summary: FastEmbed model now lazily loads on first query to reduce idle process memory.
Highlights
Lazy-loaded embeddings (#32)
The FastEmbed ONNX model (~200MB resident) now loads on the first query instead of at startup, so an idle knowledge-rag process no longer pays the model's memory cost. This matters because MCP stdio clients spawn parallel server processes by protocol design: multiple Claude Code windows, Claude Desktop running alongside an IDE, and review/approval flows that open extra connections each start their own process. The public API is unchanged.
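The lazy-load pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `LazyEmbedder` and `stub_loader` are hypothetical names, and the stub stands in for whatever constructs the real FastEmbed model.

```python
import threading
from typing import Callable, List, Optional, Sequence

# Type of a loaded model: takes texts, returns one vector per text.
EmbedFn = Callable[[Sequence[str]], List[List[float]]]


class LazyEmbedder:
    """Defers an expensive model load until the first embed() call."""

    def __init__(self, loader: Callable[[], EmbedFn]) -> None:
        self._loader = loader            # hypothetical factory that builds the model
        self._model: Optional[EmbedFn] = None
        self._lock = threading.Lock()    # serializes concurrent first queries

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def embed(self, texts: Sequence[str]) -> List[List[float]]:
        if self._model is None:
            with self._lock:
                # Re-check inside the lock so only one thread runs the loader.
                if self._model is None:
                    self._model = self._loader()
        return self._model(list(texts))


def stub_loader() -> EmbedFn:
    """Stand-in for the real model construction (assumption, for illustration)."""
    def model(texts: Sequence[str]) -> List[List[float]]:
        return [[float(len(t))] for t in texts]
    return model


if __name__ == "__main__":
    embedder = LazyEmbedder(stub_loader)
    print(embedder.loaded)            # nothing is loaded at startup
    embedder.embed(["first query"])   # the load happens here
    print(embedder.loaded)
```

The lock only matters on the first call; once the model is set, later queries skip it entirely, which keeps the steady-state path as cheap as an eager load.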
Opt-in single-instance guard (#33)
For users who measured their setup and want a hard cap of one server per data_dir:
export KNOWLEDGE_RAG_SINGLE_INSTANCE=1
A second instance exits immediately with code 75 (EX_TEMPFAIL). The guard is OFF by default, so multi-client MCP usage continues to work unchanged. Stale-PID recovery and SIGINT/SIGTERM cleanup are both handled. Full guide: docs/single-instance.md.
Original concept and reproduction by @Hohlas in #31, reworked here as opt-in to preserve legitimate multi-client MCP usage.
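For readers who want a feel for how a guard of this kind can work, here is a minimal PID-file sketch. It is an assumption-laden illustration rather than the shipped implementation: the function names and the `server.pid` filename are hypothetical, the liveness check relies on POSIX `os.kill(pid, 0)` semantics, and a production version would likely add file locking to close the small window between creating and writing the PID file. See docs/single-instance.md for the real behavior.

```python
import atexit
import os
import signal
import sys
from pathlib import Path
from typing import Optional

EX_TEMPFAIL = 75  # sysexits.h value, the exit code named in the release notes


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID appears to exist (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks existence without delivering a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def acquire_guard(data_dir: Path, lock_name: str = "server.pid") -> Optional[Path]:
    """Claim data_dir. Returns the PID-file path, or None if a live owner exists."""
    pid_file = data_dir / lock_name  # hypothetical filename
    for _ in range(2):  # the second pass runs only after removing a stale file
        try:
            fd = os.open(pid_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            try:
                owner = int(pid_file.read_text().strip())
            except (ValueError, FileNotFoundError):
                owner = None  # unreadable or vanished: treat as stale
            if owner is not None and pid_alive(owner):
                return None
            pid_file.unlink(missing_ok=True)  # stale: owner died without cleanup
            continue
        with os.fdopen(fd, "w") as fh:
            fh.write(str(os.getpid()))
        return pid_file
    return None


def install_guard(data_dir: Path) -> None:
    """Enforce one server per data_dir, but only when the env var opts in."""
    if os.environ.get("KNOWLEDGE_RAG_SINGLE_INSTANCE") != "1":
        return  # guard is off by default
    pid_file = acquire_guard(data_dir)
    if pid_file is None:
        sys.exit(EX_TEMPFAIL)
    atexit.register(lambda: pid_file.unlink(missing_ok=True))
    for sig in (signal.SIGINT, signal.SIGTERM):
        # Convert the signal into a normal exit so the atexit cleanup runs.
        signal.signal(sig, lambda signum, frame: sys.exit(128 + signum))
```

Exiting with EX_TEMPFAIL rather than a generic error lets a calling client distinguish "another server already owns this data_dir" from a genuine crash.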
v4.0 roadmap (#34)
The long-term fix for multi-process resource duplication is now tracked: a shared-service architecture in which one daemon holds the model and index, and many thin MCP clients connect to it via socket.
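To make the daemon-plus-thin-clients idea concrete, here is a speculative sketch. Nothing below reflects the planned v4.0 design: the class names, the newline-delimited JSON protocol, and the use of a loopback TCP socket (rather than, say, a Unix domain socket) are all assumptions chosen to keep the example short and portable.

```python
import json
import socket
import socketserver
import threading
from typing import List, Tuple


class SharedIndex:
    """Stand-in for the single model + index that the daemon would hold."""

    def __init__(self) -> None:
        self.docs = ["lazy loading notes", "single instance guard"]

    def search(self, query: str) -> List[str]:
        return [d for d in self.docs if query.lower() in d]


class QueryHandler(socketserver.StreamRequestHandler):
    """Answers one JSON request per line until the client disconnects."""

    def handle(self) -> None:
        for line in self.rfile:
            request = json.loads(line)
            hits = self.server.index.search(request["query"])
            self.wfile.write((json.dumps({"hits": hits}) + "\n").encode())


class Daemon(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True  # handler threads never block shutdown

    def __init__(self, address: Tuple[str, int], index: SharedIndex) -> None:
        super().__init__(address, QueryHandler)
        self.index = index  # loaded once, shared by every connection


def thin_client_query(address: Tuple[str, int], query: str) -> dict:
    """What a thin client would do: send a query, read one JSON reply."""
    with socket.create_connection(address) as conn:
        conn.sendall((json.dumps({"query": query}) + "\n").encode())
        with conn.makefile("r") as reader:
            return json.loads(reader.readline())


if __name__ == "__main__":
    server = Daemon(("127.0.0.1", 0), SharedIndex())  # port 0: OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(thin_client_query(server.server_address, "guard"))
    server.shutdown()
    server.server_close()
```

The point of the shape is that memory cost scales with the number of daemons (one) rather than the number of clients, since each client holds only a socket.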
Changes
- NEW Lazy-load FastEmbed embedding model on first query (#32)
- NEW Opt-in single-instance guard via KNOWLEDGE_RAG_SINGLE_INSTANCE env var (#33)
- NEW docs/single-instance.md + examples/mcp-config-single-instance.json
- DOCS README troubleshooting + What's New refreshed
- CHORE Sync version across pyproject.toml, mcp_server/__init__.py, npm/package.json (was drifting since v3.5.x)
- CHORE pytest tmp_path_retention_count=1 to avoid Windows CI flake
Install
pip install knowledge-rag==3.8.0
npx -y knowledge-rag@3.8.0
docker pull ghcr.io/lyonzin/knowledge-rag:3.8.0
Backwards compatibility
- Lazy embeddings: API unchanged, GPU/CPU fallback identical
- Single-instance guard: default OFF — pre-v3.8.0 behavior preserved for everyone who does not set the env var
- Version sync: cosmetic (no runtime impact); fixes a multi-year drift
Credits
- @Hohlas — original single-instance guard concept and reproduction in #31
- knowledge-rag maintainers — lazy-load implementation, opt-in rework, signal handlers, tests, docs
Full Changelog: https://github.com/lyonzin/knowledge-rag/compare/v3.7.0...v3.8.0
About lyonzin/knowledge-rag
Local RAG system for Claude Code with hybrid search (BM25 + semantic), cross-encoder reranking, markdown-aware chunking, query expansion, and 12 MCP tools. Runs entirely offline with zero external servers.