lyonzin/knowledge-rag

v3.8.0 Feature

This release adds 2 notable features for engineering teams evaluating rollout.

Published 24d MCP Developer Tools
✓ No known CVEs patched

Topics

antigravity claude claude-code claude-code-cli codex cursor-ai
document-search hybrid-search inteligencia-artificial knowledge-base local-ai mcp mcp-server llm rag-chatbot rag-pipeline reranking retrieval-augmented-generation semantic-search vector-db

Summary

AI summary

FastEmbed model now lazily loads on first query to reduce idle process memory.

Full changelog

Highlights

Lazy-loaded embeddings (#32)

The FastEmbed ONNX model (~200MB resident) now loads on the first query rather than at startup, so an idle knowledge-rag process no longer holds the model in memory. This matters because MCP stdio clients spawn a separate server process per connection by protocol design: multiple Claude Code windows, Claude Desktop running alongside an IDE, and review/approval flows that open extra connections each start their own process. The public API is unchanged.
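The lazy-loading pattern can be sketched as follows. This is a minimal illustration, not the project's code: LazyEmbedder and the factory argument are hypothetical names, and the factory stands in for whatever actually constructs the FastEmbed model.

```python
import threading
from typing import Callable, List, Optional, Sequence

# Type of the loaded model in this sketch: a callable mapping texts to vectors.
EmbedFn = Callable[[Sequence[str]], List[List[float]]]


class LazyEmbedder:
    """Defers an expensive model load until the first embed() call.

    Hypothetical sketch; `factory` stands in for the real model constructor.
    """

    def __init__(self, factory: Callable[[], EmbedFn]) -> None:
        self._factory = factory
        self._model: Optional[EmbedFn] = None
        self._lock = threading.Lock()  # protects against concurrent first queries

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def embed(self, texts: Sequence[str]) -> List[List[float]]:
        if self._model is None:
            with self._lock:
                # Re-check inside the lock so only one thread runs the factory.
                if self._model is None:
                    self._model = self._factory()
        return self._model(texts)
```

Constructing a LazyEmbedder costs almost nothing; the factory runs once, on the first embed() call, and every later call reuses the loaded model.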

Opt-in single-instance guard (#33)

For users who measured their setup and want a hard cap of one server per data_dir:

export KNOWLEDGE_RAG_SINGLE_INSTANCE=1

A second instance exits immediately with code 75 (EX_TEMPFAIL). The guard is OFF by default, so multi-client MCP usage continues to work unchanged. Stale-PID recovery and SIGINT/SIGTERM cleanup are included. Full guide: docs/single-instance.md.

Original concept and reproduction by @Hohlas in #31, reworked here as opt-in to preserve legitimate multi-client MCP usage.

v4.0 roadmap (#34)

The long-term fix for multi-process resource duplication is now tracked in #34: a shared-service architecture in which one daemon holds the model and index, and many thin MCP clients connect to it via socket.
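To make the shape of that roadmap item concrete, here is a toy sketch of one process owning the model while thin clients send requests over a socket. Everything in it is an assumption for illustration: the loopback TCP transport, the JSON-lines wire format, and the names EmbeddingDaemon and query are not taken from the project or from #34.

```python
import json
import socket
import socketserver
from typing import Callable, List, Sequence, Tuple

EmbedFn = Callable[[Sequence[str]], List[List[float]]]


class _Handler(socketserver.StreamRequestHandler):
    def handle(self) -> None:
        # One JSON request per line; reply with one JSON line each.
        for line in self.rfile:
            request = json.loads(line)
            vectors = self.server.embed(request["texts"])
            self.wfile.write((json.dumps({"vectors": vectors}) + "\n").encode())


class EmbeddingDaemon(socketserver.ThreadingTCPServer):
    """Single process that owns the model; hypothetical sketch."""

    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, address: Tuple[str, int], embed: EmbedFn) -> None:
        super().__init__(address, _Handler)
        self.embed = embed  # the one shared model instance


def query(address: Tuple[str, int], texts: Sequence[str]) -> List[List[float]]:
    """Thin client: holds no model, just forwards the request."""
    with socket.create_connection(address) as conn, conn.makefile("rwb") as f:
        f.write((json.dumps({"texts": list(texts)}) + "\n").encode())
        f.flush()
        return json.loads(f.readline())["vectors"]
```

In this arrangement the memory cost of the model is paid once by the daemon, while each additional client costs only a socket connection.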

Changes

  • NEW Lazy-load FastEmbed embedding model on first query (#32)
  • NEW Opt-in single-instance guard via KNOWLEDGE_RAG_SINGLE_INSTANCE env var (#33)
  • NEW docs/single-instance.md + examples/mcp-config-single-instance.json
  • DOCS README troubleshooting + What's New refreshed
  • CHORE Sync version across pyproject.toml, mcp_server/__init__.py, npm/package.json (was drifting since v3.5.x)
  • CHORE pytest tmp_path_retention_count=1 to avoid Windows CI flake

Install

pip install knowledge-rag==3.8.0
npx -y knowledge-rag@3.8.0
docker pull ghcr.io/lyonzin/knowledge-rag:3.8.0

Backwards compatibility

  • Lazy embeddings: API unchanged, GPU/CPU fallback identical
  • Single-instance guard: default OFF — pre-v3.8.0 behavior preserved for everyone who does not set the env var
  • Version sync: cosmetic (no runtime impact); fixes version drift dating back to v3.5.x

Credits

  • @Hohlas — original single-instance guard concept and reproduction in #31
  • knowledge-rag maintainers — lazy-load implementation, opt-in rework, signal handlers, tests, docs

Full Changelog: https://github.com/lyonzin/knowledge-rag/compare/v3.7.0...v3.8.0

About lyonzin/knowledge-rag

Local RAG system for Claude Code with hybrid search (BM25 + semantic), cross-encoder reranking, markdown-aware chunking, query expansion, and 12 MCP tools. Runs entirely offline with zero external servers.
