lyonzin/knowledge-rag

v3.8.0 Feature

This release adds 2 notable features for engineering teams evaluating rollout.

Published 2 months ago · MCP Developer Tools

✓ No known CVEs patched

Topics

antigravity claude claude-code claude-code-cli codex cursor-ai

document-search hybrid-search inteligencia-artificial knowledge-base local-ai mcp mcp-server llm rag-chatbot rag-pipeline reranking retrieval-augmented-generation semantic-search vector-db

Summary

AI summary

FastEmbed model now lazily loads on first query to reduce idle process memory.

Full changelog

Highlights

Lazy-loaded embeddings (#32)

The FastEmbed ONNX model (~200MB resident) now loads on the first query, not at startup, so an idle knowledge-rag process no longer pays that memory cost. This matters because MCP stdio clients spawn their own server processes by protocol design: multiple Claude Code windows, Claude Desktop running alongside an IDE, or review/approval flows that open extra connections each start a separate process. Public API unchanged.
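
The lazy-loading idea described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: `LazyResource` and `fake_model_loader` are hypothetical names, and the loader is a cheap stand-in for the real FastEmbed ONNX model load.

```python
import threading
from typing import Callable, Generic, List, Optional, TypeVar

T = TypeVar("T")


class LazyResource(Generic[T]):
    """Defers an expensive load until the first call to get()."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._value: Optional[T] = None
        self._loaded = False
        self._lock = threading.Lock()

    @property
    def loaded(self) -> bool:
        return self._loaded

    def get(self) -> T:
        # Double-checked locking: once loaded, callers skip the lock;
        # concurrent first queries trigger only a single load.
        if not self._loaded:
            with self._lock:
                if not self._loaded:
                    self._value = self._loader()
                    self._loaded = True
        return self._value  # type: ignore[return-value]


load_calls: List[int] = []


def fake_model_loader() -> Callable[[str], List[float]]:
    # Hypothetical stand-in for the expensive model load.
    load_calls.append(1)
    return lambda text: [float(len(text))]


# Constructing the wrapper at startup costs almost nothing.
embedder = LazyResource(fake_model_loader)
```

With this shape, a process that never receives a query never runs the loader; the first `embedder.get()` call pays the load cost once, and later calls reuse the same object.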

Opt-in single-instance guard (#33)

For users who measured their setup and want a hard cap of one server per data_dir:

export KNOWLEDGE_RAG_SINGLE_INSTANCE=1

A second instance exits immediately with code 75 (EX_TEMPFAIL). The guard is OFF by default, so multi-client MCP usage continues to work unchanged. Stale-PID recovery and SIGINT/SIGTERM cleanup are both handled. Full guide: docs/single-instance.md.
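
For readers who want a feel for how such a guard can work, here is a rough sketch of a PID-file lock with stale-PID recovery and signal cleanup. It is an assumption-laden illustration rather than the project's code: the lock file name `server.pid`, the helper names, and the rule that only the value `1` enables the guard are all hypothetical, and the liveness check relies on POSIX `kill(pid, 0)` semantics (it should not be used on Windows).

```python
import atexit
import os
import signal
import sys
from pathlib import Path

EX_TEMPFAIL = 75  # sysexits.h value; os.EX_TEMPFAIL is missing on Windows
LOCK_NAME = "server.pid"  # hypothetical lock file name


def pid_alive(pid: int) -> bool:
    """Best-effort liveness check via signal 0 (POSIX only)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def try_acquire(data_dir: Path) -> bool:
    """Return True if this process now holds the lock for data_dir."""
    lock = data_dir / LOCK_NAME
    for _ in range(2):  # second pass runs after removing a stale lock
        try:
            # O_EXCL makes creation atomic: only one creator succeeds.
            fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            try:
                owner = int(lock.read_text().strip())
            except (ValueError, FileNotFoundError):
                owner = None
            if owner is not None and pid_alive(owner):
                return False  # a live instance already owns data_dir
            lock.unlink(missing_ok=True)  # stale-PID recovery
            continue
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))
        return True
    return False


def release(data_dir: Path) -> None:
    """Remove the lock, but only if this process owns it."""
    lock = data_dir / LOCK_NAME
    try:
        if int(lock.read_text().strip()) == os.getpid():
            lock.unlink()
    except (ValueError, FileNotFoundError):
        pass


def guard(data_dir: Path) -> None:
    """Opt-in entry point: a no-op unless the env var is set to 1."""
    if os.environ.get("KNOWLEDGE_RAG_SINGLE_INSTANCE") != "1":
        return
    if not try_acquire(data_dir):
        sys.exit(EX_TEMPFAIL)
    atexit.register(release, data_dir)
    for sig in (signal.SIGINT, signal.SIGTERM):
        # sys.exit raises SystemExit, so the atexit cleanup still runs.
        signal.signal(sig, lambda signum, frame: sys.exit(128 + signum))
```

A PID file like this still has small race windows (for example, a reader seeing the file before the PID is written), so a production guard would likely add an OS-level file lock on top. A wrapper script can treat exit code 75 as "another instance is already serving this data_dir" rather than as a crash.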

Original concept and reproduction by @Hohlas in #31, reworked here as opt-in to preserve legitimate multi-client MCP usage.

v4.0 roadmap (#34)

The long-term fix for multi-process resource duplication is now tracked: a shared-service architecture in which one daemon holds the model and index, and many thin MCP clients connect to it via socket.
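
To make the roadmap idea concrete, the sketch below shows one possible shape for "one daemon, many thin clients". Everything here is hypothetical and not taken from the project: the `SharedModel` stand-in, the line-delimited JSON protocol, and the use of a localhost TCP socket (chosen only so the example runs on any platform) are illustrative assumptions.

```python
import json
import socket
import socketserver
import threading
from typing import List


class SharedModel:
    """Stand-in for the single model + index held by the daemon."""

    def search(self, query: str) -> List[str]:
        return [f"result for {query}"]


MODEL = SharedModel()  # loaded once, shared by every connection


class Handler(socketserver.StreamRequestHandler):
    def handle(self) -> None:
        # One JSON request per line; reply with one JSON line each.
        for line in self.rfile:
            request = json.loads(line)
            reply = {"results": MODEL.search(request["query"])}
            self.wfile.write((json.dumps(reply) + "\n").encode())


def start_daemon() -> socketserver.ThreadingTCPServer:
    # Port 0 asks the OS for any free port.
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), Handler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def thin_client_query(port: int, query: str) -> List[str]:
    """What a thin client does: forward the query, return the reply."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as conn:
        conn.sendall((json.dumps({"query": query}) + "\n").encode())
        with conn.makefile("r") as reader:
            return json.loads(reader.readline())["results"]
```

In this arrangement each client process holds only a socket, so memory cost stays roughly constant no matter how many MCP clients are open, at the price of running and supervising a separate daemon.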

Changes

NEW Lazy-load FastEmbed embedding model on first query (#32)
NEW Opt-in single-instance guard via KNOWLEDGE_RAG_SINGLE_INSTANCE env var (#33)
NEW docs/single-instance.md + examples/mcp-config-single-instance.json
DOCS README troubleshooting + What's New refreshed
CHORE Sync version across pyproject.toml, mcp_server/__init__.py, npm/package.json (was drifting since v3.5.x)
CHORE pytest tmp_path_retention_count=1 to avoid Windows CI flake

Install

pip install knowledge-rag==3.8.0
npx -y [email protected]
docker pull ghcr.io/lyonzin/knowledge-rag:3.8.0

Backwards compatibility

Lazy embeddings: API unchanged, GPU/CPU fallback identical
Single-instance guard: default OFF — pre-v3.8.0 behavior preserved for everyone who does not set the env var
Version sync: cosmetic (no runtime impact); fixes version drift present since v3.5.x

Credits

@Hohlas — original single-instance guard concept and reproduction in #31
knowledge-rag maintainers — lazy-load implementation, opt-in rework, signal handlers, tests, docs

Full Changelog: https://github.com/lyonzin/knowledge-rag/compare/v3.7.0...v3.8.0

About lyonzin/knowledge-rag

Local RAG system for Claude Code with hybrid search (BM25 + semantic), cross-encoder reranking, markdown-aware chunking, query expansion, and 12 MCP tools. Runs entirely offline with zero external servers.
