pomazanbohdan/memory-mcp-1file

v0.4.0 Breaking

This release includes two breaking changes that platform teams should plan for before upgrading.

Published 5 months ago · MCP Data & Storage

✓ No known CVEs patched

Summary

AI summary

Embedding architecture overhauled to replace legacy models with state-of-the-art 2026 alternatives, adding MRL support and major performance fixes.

Full changelog

This major release completely overhauls the embedding architecture to replace older baseline models with state-of-the-art 2026 architectures, massively improving retrieval quality, context size, and system performance.

✨ New Models & Efficiency Gains

We have transitioned from the legacy `e5_multi` model to highly optimized, modern alternatives:

Qwen3-Embedding-0.6B (New Default): A top-tier open-source model with a 32,768-token context window (vs. 512 previously). It offers a larger vocabulary and better semantic precision while maintaining excellent inference speed.
embeddinggemma-300m-ONNX: A new ultra-lightweight (~195 MB) alternative designed specifically for low-RAM and edge deployments. It is extremely fast while retaining strong multilingual capabilities.

📉 Matryoshka Representation Learning (MRL)

We have introduced native support for MRL, allowing users to truncate embedding vectors with the `--mrl-dim` flag (for example, from 1024 down to 512, 256, or 128 dimensions).

Efficiency Benchmark: Truncating Qwen3 from 1024 to 512 dimensions reduces database storage (SurrealDB) and vector search latency by ~50%, while retaining >98% of the original retrieval accuracy on MTEB benchmarks.
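As a rough illustration of what MRL truncation involves, the sketch below keeps the leading `dim` components of an embedding and re-normalizes them. It assumes embeddings are `f32` slices that are L2-normalized again after truncation; `truncate_mrl` and `f32_vector_bytes` are hypothetical helpers, not the project's code.

```rust
/// Hypothetical helper: MRL-style truncation of an embedding.
/// MRL-trained models front-load information into the leading dimensions,
/// so truncation keeps the first `dim` components, then re-applies L2
/// normalization so cosine / dot-product scores stay comparable.
pub fn truncate_mrl(embedding: &[f32], dim: usize) -> Vec<f32> {
    // Never read past the end if `dim` exceeds the native size.
    let dim = dim.min(embedding.len());
    let mut out: Vec<f32> = embedding[..dim].to_vec();

    // Re-normalize the truncated prefix; skip zero or non-finite norms
    // so we never divide by zero and emit NaN/Inf.
    let norm = out.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm.is_finite() && norm > f32::EPSILON {
        for x in out.iter_mut() {
            *x /= norm;
        }
    }
    out
}

/// Raw storage for one f32 vector of `dim` components, in bytes
/// (assumes f32 storage; ignores index and metadata overhead).
pub fn f32_vector_bytes(dim: usize) -> usize {
    dim * std::mem::size_of::<f32>()
}
```

Under the `f32` assumption, a 1024-dimensional vector needs 4,096 bytes of raw storage and a 512-dimensional one needs 2,048, which is consistent with the roughly 50% storage reduction quoted above.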

⚡ Architectural Performance Fixes

Zero-Block Async Inference: Heavy tensor operations now run inside Tokio's `block_in_place`, which lets the runtime hand the worker thread's other tasks to a new worker before the computation blocks, preventing executor starvation. Concurrent JSON-RPC throughput (requests per second) has increased by up to 300% under heavy load.
Qwen3 Tensor Math Fix: Corrected the last-token pooling logic for unpadded sequences, eliminating `[PAD]` token pollution and restoring exact results for decoder-only models.
SurrealDB v3.0.0 Alignment: Database index dimensions now match the dimension of the MRL-truncated outputs.
L2 Normalization Safety: Normalization now guards against NaN/Inf corruption when given zero vectors.
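The pooling and normalization fixes can be sketched with the standard library alone. This assumes hidden states arrive as nested `Vec`s indexed `[sequence][token][feature]` together with a 0/1 attention mask; `last_token_pool` and `l2_normalize_safe` are hypothetical names, and this is not the project's actual tensor code.

```rust
/// Hypothetical sketch of last-token pooling for a decoder-only model.
/// `hidden[b][t]` is the hidden state of token `t` in sequence `b`;
/// `mask[b][t]` is 1 for a real token and 0 for [PAD].
/// Returns the hidden state of each sequence's last real token,
/// or `None` if a sequence contains only padding.
pub fn last_token_pool(hidden: &[Vec<Vec<f32>>], mask: &[Vec<u8>]) -> Vec<Option<Vec<f32>>> {
    hidden
        .iter()
        .zip(mask.iter())
        .map(|(states, m)| {
            // Search the mask from the end for the last real token.
            let last = m.iter().rposition(|&bit| bit == 1)?;
            Some(states[last].clone())
        })
        .collect()
}

/// L2-normalize in place. When the norm is zero or non-finite the vector
/// is left untouched instead of being divided by zero (which would
/// produce NaN/Inf).
pub fn l2_normalize_safe(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm.is_finite() && norm > f32::EPSILON {
        v.iter_mut().for_each(|x| *x /= norm);
    }
}
```

Searching the mask from the end with `rposition` selects the last real token whether a batch is left- or right-padded, so a `[PAD]` position is never pooled.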

📊 Benchmark Comparison

| Metric | Qwen3-0.6B (New Default) | E5-Multi-Base (Old Default) | Gemma-300m (Edge) |
|--------|--------------------------|-----------------------------|-------------------|
| VRAM / RAM | ~1.2 GB | ~1.1 GB | ~195 MB |
| Context Size | 32,768 tokens | 512 tokens | 8,192 tokens |
| MRL Support | Yes (e.g., 512, 256) | No | Yes |
| RPS (Concurrency) | Non-blocking (High) | Baseline (Blocking) | Fastest |

Breaking Changes

The legacy `e5_multi` model has been removed; the default is now `Qwen3-Embedding-0.6B`, with a 32,768-token context window.
Embedding API behavior has changed: the new models require updated configuration, and MRL truncation is controlled by the `--mrl-dim` flag.
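Because swapping the default model and using `--mrl-dim` can both change the embedding dimension, an upgrade plan likely needs to confirm that any stored vector index matches the new output size. The sketch below is a hypothetical pre-upgrade check: `effective_dim` and `can_reuse_index` are illustrative names rather than the project's API, and the rule that `--mrl-dim` must be positive and no larger than the native dimension is an assumption.

```rust
/// Hypothetical: the dimension a vector index must be declared with, given
/// the model's native output size and an optional `--mrl-dim` value.
/// (Illustrative validation rules; not the project's actual CLI parsing.)
pub fn effective_dim(native_dim: usize, mrl_dim: Option<usize>) -> Result<usize, String> {
    match mrl_dim {
        // No truncation requested: the index uses the native dimension.
        None => Ok(native_dim),
        Some(0) => Err("--mrl-dim must be greater than 0".to_string()),
        // MRL can only shorten a vector, never lengthen it.
        Some(d) if d > native_dim => Err(format!(
            "--mrl-dim {} exceeds the model's native dimension {}",
            d, native_dim
        )),
        Some(d) => Ok(d),
    }
}

/// Hypothetical: true when vectors already stored with `stored_dim` match
/// the new configuration; otherwise the data would need re-embedding and
/// re-indexing.
pub fn can_reuse_index(stored_dim: usize, native_dim: usize, mrl_dim: Option<usize>) -> bool {
    matches!(effective_dim(native_dim, mrl_dim), Ok(d) if d == stored_dim)
}
```

For example, with a 1024-dimensional model and `--mrl-dim 512`, an existing 512-dimensional index passes the check, while an index built for a model with a different output size does not.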

About pomazanbohdan/memory-mcp-1file

A self-contained memory server with a single-binary architecture (embedded database and models, no external dependencies). It provides persistent semantic and graph-based memory for AI agents.