
nonatofabio/local-faiss-mcp

v0.2.0 Breaking

This release includes breaking changes for platform teams planning a safe upgrade.

Published 5mo MCP Data & Storage
✓ No known CVEs patched

Topics

agentic-ai ai-agents ai-tools faiss llm-tools local-rag mcp-server model-context-protocol llm semantic-search vector-db

Summary

AI summary

New CLI for document indexing/search, broad format support, re-ranking with CrossEncoder models, and custom embedding model selection.

Full changelog

Release Notes: v0.2.0

Overview

Version 0.2.0 is a major feature release that transforms local-faiss-mcp from a simple MCP server into a comprehensive local RAG solution with CLI tools, advanced search capabilities, and broad document format support.

🎯 Major Features

1. Command-Line Interface

New local-faiss CLI for standalone document indexing and search:

# Index documents
local-faiss index document.pdf
local-faiss index -r documents/  # Recursive
local-faiss index "docs/**/*.pdf"  # Glob patterns

# Search
local-faiss search "What is FAISS?"
local-faiss search -k 5 "your query"

Key capabilities:

  • Automatic MCP config integration (.mcp.json, ~/.claude/.mcp.json)
  • Incremental indexing (adds to existing index, never overwrites)
  • Progress output showing indexing status
  • Creates default config if none exists
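
The three ways of naming input shown above (single file, recursive directory, glob pattern) can be sketched in Python as follows. This is only an illustration of the idea, not the project's code: `collect_files` is a hypothetical helper, and the real CLI may resolve targets differently.

```python
import glob
from pathlib import Path


def collect_files(target: str, recursive: bool = False) -> list[Path]:
    """Resolve a CLI-style target into a sorted list of files (hypothetical helper)."""
    path = Path(target)
    if path.is_file():
        return [path]
    if path.is_dir():
        # "**/*" descends into subdirectories; "*" stays at the top level.
        pattern = "**/*" if recursive else "*"
        return sorted(p for p in path.glob(pattern) if p.is_file())
    # Anything else is treated as a glob pattern such as "docs/**/*.pdf".
    matches = (Path(m) for m in glob.glob(target, recursive=True))
    return sorted(p for p in matches if p.is_file())
```

Quoting the pattern on the command line, as in the example above, keeps the shell from expanding it before the tool sees it.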

2. Document Format Support

Enhanced document ingestion with broad format support:

Native formats (always available):

  • PDF (via pypdf)
  • TXT, MD, RST, LOG

Extended formats (via pandoc):

  • DOCX, ODT (Office documents)
  • HTML, HTM (Web pages)
  • RTF (rich text), EPUB (e-books)
  • 40+ additional formats
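
One way to picture the split between native and pandoc-backed formats is a small dispatch on the file extension. The sketch below is an assumption about how such routing could look; `choose_backend` and the two extension sets are hypothetical names, and the sets cover only the formats named in the lists above.

```python
from pathlib import Path

# Extensions taken from the lists above; the grouping into sets is illustrative.
NATIVE_TEXT = {".txt", ".md", ".rst", ".log"}
PANDOC_FORMATS = {".docx", ".odt", ".html", ".htm", ".rtf", ".epub"}


def choose_backend(path: str, pandoc_available: bool) -> str:
    """Return which parser a file would be routed to (hypothetical helper)."""
    ext = Path(path).suffix.lower()
    if ext == ".pdf":
        return "pypdf"
    if ext in NATIVE_TEXT:
        return "text"
    if ext in PANDOC_FORMATS:
        if not pandoc_available:
            raise RuntimeError(f"{ext} files need pandoc, which was not found")
        return "pandoc"
    raise ValueError(f"unsupported format: {ext or '(no extension)'}")
```

A caller could pass `shutil.which("pandoc") is not None` as `pandoc_available` to check for the system package at runtime.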

MCP tool enhancement:

  • Auto-detects file paths in ingest_document tool
  • Agent can now reference local files directly: ingest_document(document="./file.pdf")
  • Automatic filename extraction for source attribution

3. Re-Ranking for Improved Relevance

Two-stage "retrieve and rerank" search pipeline:

# Enable with default model
local-faiss-mcp --rerank

# Use specific model
local-faiss-mcp --rerank cross-encoder/ms-marco-MiniLM-L-6-v2

How it works:

  1. FAISS retrieves a candidate pool 10x larger than the requested k
  2. CrossEncoder re-ranks by query relevance
  3. Returns top-k most relevant results
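
The three steps above can be sketched without any model dependencies. In this minimal Python sketch, `retrieve` stands in for the FAISS lookup and `score` for the CrossEncoder; both are injected callables, and `retrieve_and_rerank` is a hypothetical name rather than the server's actual function.

```python
from typing import Callable, Sequence


def retrieve_and_rerank(
    query: str,
    k: int,
    retrieve: Callable[[str, int], Sequence[str]],
    score: Callable[[str, str], float],
    oversample: int = 10,
) -> list[str]:
    """Two-stage search: oversample candidates, re-score them, keep the top k."""
    # Stage 1: fetch k * oversample candidates from the (cheap) vector index.
    candidates = retrieve(query, k * oversample)
    # Stage 2: score every (query, candidate) pair with the (expensive) reranker.
    scored = [(score(query, text), text) for text in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy stand-ins so the sketch runs on its own.
CORPUS = [
    "faiss is a vector search library",
    "bananas are yellow",
    "faiss indexes dense vectors",
    "search engines rank pages",
]


def toy_retrieve(query: str, n: int) -> list[str]:
    return CORPUS[:n]


def toy_score(query: str, text: str) -> float:
    # Word overlap as a crude relevance score.
    return float(len(set(query.lower().split()) & set(text.lower().split())))


top = retrieve_and_rerank("faiss vector search", 2, toy_retrieve, toy_score)
```

The design trade-off is that the first stage is fast but approximate, while the second stage is slower but only sees a small candidate pool.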

Recommended models:

  • BAAI/bge-reranker-base (default) - Good balance
  • cross-encoder/ms-marco-MiniLM-L-6-v2 - Fast
  • cross-encoder/ms-marco-TinyBERT-L-2-v2 - Very fast

4. Custom Embedding Models

Flexible embedding model selection:

# Use different embedding model
local-faiss-mcp --embed all-mpnet-base-v2

# Multilingual support
local-faiss-mcp --embed paraphrase-multilingual-MiniLM-L12-v2

Features:

  • Dynamic dimension detection
  • Dimension validation on index load
  • Model persistence in metadata
  • Clear error messages for mismatches
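
Dimension validation matters because an index built with one embedding model cannot be searched with vectors of a different size. The check could look roughly like the sketch below; the metadata keys `dimension` and `embedding_model` and the function name are assumptions for illustration, not the project's actual schema.

```python
def check_index_compatibility(metadata: dict, model_name: str, model_dim: int) -> None:
    """Raise a descriptive error if a stored index does not match the active model.

    The metadata keys used here are hypothetical.
    """
    stored_dim = metadata.get("dimension")
    stored_model = metadata.get("embedding_model", "unknown")
    if stored_dim is not None and stored_dim != model_dim:
        raise ValueError(
            f"Index was built with '{stored_model}' ({stored_dim} dimensions), "
            f"but '{model_name}' produces {model_dim} dimensions. "
            "Re-index the documents or switch back to the original model."
        )
```

For example, all-MiniLM-L6-v2 produces 384-dimensional vectors and all-mpnet-base-v2 produces 768-dimensional ones, so an index built with the first would fail this check under the second.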

5. MCP Prompts

Built-in prompts for better RAG workflows:

extract-answer:

  • Extracts relevant answers from retrieved chunks
  • Provides citations and source attribution
  • Explains relevance of results

summarize-documents:

  • Creates focused summaries from multiple chunks
  • Configurable max length
  • Topic-based synthesis

📦 New Components

  • local_faiss_mcp/document_parser.py: Document parsing with format detection
  • local_faiss_mcp/cli.py: CLI implementation with MCP config integration
  • Console script: local-faiss command for easy access

🔧 API Changes

Enhanced ingest_document Tool

Before:

{
  "document": "text content only",
  "source": "optional"
}

After:

{
  "document": "text content OR file path",
  "source": "optional"
}

Auto-detection: if document looks like a file path (it contains / or \ and has a file extension), the server parses it as a file; otherwise it is ingested as text.
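
A heuristic matching that description could be written as below. This is a sketch of the stated rule only; `looks_like_file_path` is a hypothetical name and the server's real check may be stricter.

```python
import os


def looks_like_file_path(document: str) -> bool:
    """Apply the stated rule: a path separator plus a file extension."""
    text = document.strip()
    has_separator = "/" in text or "\\" in text
    _, extension = os.path.splitext(text)
    return has_separator and extension != ""
```

A rule this loose can misfire on ordinary text that happens to contain a slash and a dot, so callers who pass raw text containing URLs or paths may want to check how their input is classified.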

Enhanced query_rag_store Results

Results now include rerank_score when re-ranking is enabled:

{
  "text": "...",
  "source": "...",
  "distance": 0.5,
  "rerank_score": 0.85  // Only present if --rerank enabled
}
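
Because rerank_score is optional, client code that post-processes results should handle both shapes. The sketch below assumes that a higher rerank_score means more relevant and a lower distance means closer; neither ordering is spelled out in these notes, and `order_results` is a hypothetical helper.

```python
def order_results(results: list[dict]) -> list[dict]:
    """Sort by rerank_score when every result has one, else by distance.

    Assumes higher rerank_score is better and lower distance is better.
    """
    if results and all("rerank_score" in r for r in results):
        return sorted(results, key=lambda r: r["rerank_score"], reverse=True)
    return sorted(results, key=lambda r: r["distance"])


reranked = order_results([
    {"text": "a", "source": "x", "distance": 0.2, "rerank_score": 0.10},
    {"text": "b", "source": "y", "distance": 0.5, "rerank_score": 0.85},
])
plain = order_results([
    {"text": "a", "source": "x", "distance": 0.5},
    {"text": "b", "source": "y", "distance": 0.2},
])
```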

📚 Documentation

  • Complete CLI documentation in README
  • MCP registry submission guide (MCP_REGISTRY.md)
  • Document format support matrix
  • Re-ranking model recommendations
  • Configuration examples for all features

🧪 Testing

Comprehensive test suite with 24 tests:

  • Document parser tests (5)
  • Embedding model tests (8)
  • MCP prompts tests (5)
  • Re-ranking tests (5)
  • Integration test (1)

All tests pass on macOS and Linux. Windows support is maintained, with a known pandoc limitation.

📋 Dependencies

New:

  • pypdf>=4.0.0 - PDF parsing

Optional:

  • pandoc (system package) - Extended format support

Development:

  • pytest-asyncio>=0.21.0 - Async test support

🚀 Migration Guide

From v0.1.0

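No breaking changes! Your existing indexes and configurations will continue to work. One behavior change to be aware of: ingest_document now treats path-like input as a file to parse (see API Changes above).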

To use new features:

  1. Update package:

    pip install --upgrade local-faiss-mcp
    
  2. Enable re-ranking (optional):
    Update .mcp.json:

    {
      "mcpServers": {
        "local-faiss-mcp": {
          "command": "local-faiss-mcp",
          "args": ["--index-dir", "./.vector_store", "--rerank"]
        }
      }
    }
    
  3. Start using CLI:

    local-faiss index document.pdf
    local-faiss search "your query"
    

Configuration Notes

The CLI automatically uses your MCP server configuration, ensuring consistency between CLI and MCP server usage.
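
To illustrate what sharing configuration could involve, the sketch below reads the server's arguments out of an .mcp.json structure like the one in the migration guide. The helper name, the default directory, and the lookup logic are assumptions; the actual CLI may read its settings differently.

```python
import json


def index_dir_from_config(
    config: dict,
    server: str = "local-faiss-mcp",
    default: str = "./.vector_store",  # assumed fallback, taken from the example above
) -> str:
    """Pull the --index-dir value out of an MCP server entry (hypothetical helper)."""
    args = config.get("mcpServers", {}).get(server, {}).get("args", [])
    if "--index-dir" in args:
        position = args.index("--index-dir")
        if position + 1 < len(args):
            return args[position + 1]
    return default


example = json.loads("""
{
  "mcpServers": {
    "local-faiss-mcp": {
      "command": "local-faiss-mcp",
      "args": ["--index-dir", "./.vector_store", "--rerank"]
    }
  }
}
""")
index_dir = index_dir_from_config(example)
```

Reading the same file in both places means the CLI and the server point at the same index without a second configuration step.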

🎁 What's Next

Future features under consideration:

  • Additional re-ranking models
  • Hybrid search (BM25 + vector)
  • Index management commands (local-faiss list, local-faiss delete)
  • Batch ingestion improvements
  • More document formats

📝 Full Changelog

Added

  • Command-line interface (local-faiss index, local-faiss search)
  • Document format support (PDF, DOCX, HTML, etc.)
  • Re-ranking with CrossEncoder models
  • Custom embedding model support
  • MCP prompts for RAG workflows (extract-answer, summarize-documents)
  • Auto-detection of file paths in ingest_document
  • MCP config integration for CLI
  • Comprehensive test suite
  • MCP registry submission preparation

Changed

  • ingest_document now accepts file paths in addition to text
  • Query results include rerank_score when enabled
  • Improved error messages for unsupported formats
  • Enhanced documentation with CLI examples

Fixed

  • Windows encoding issues in tests
  • Dimension mismatch detection
  • UTF-8 handling in document parsing

🙏 Acknowledgments

Thanks to the community for feature requests and feedback that shaped this release!

  • Document indexing: User request for file-based ingestion
  • Re-ranking: Production use case feedback
  • CLI tool: Developer workflow improvements

📞 Support

  • GitHub Issues: https://github.com/nonatofabio/local_faiss_mcp/issues
  • Documentation: https://github.com/nonatofabio/local_faiss_mcp
  • MCP Registry: https://modelcontextprotocol.io/servers

Released: [Date]
Package: https://pypi.org/project/local-faiss-mcp/
Version: 0.2.0


About nonatofabio/local-faiss-mcp

Local FAISS vector database for RAG with document ingestion (PDF/TXT/MD/DOCX), semantic search, re-ranking, and CLI tools for indexing and querying
