
This release includes 2 breaking changes for platform teams planning a safe upgrade.

Published 3mo MCP Data & Storage
✓ No known CVEs patched

Summary

AI summary

Embedding architecture overhauled to replace legacy models with state-of-the-art 2026 alternatives, adding MRL support and major performance fixes.

Full changelog

This major release completely overhauls the embedding architecture to replace older baseline models with state-of-the-art 2026 architectures, massively improving retrieval quality, context size, and system performance.

✨ New Models & Efficiency Gains

We have transitioned from the legacy e5_multi to highly optimized, modern alternatives:

  • Qwen3-Embedding-0.6B (New Default): A top-tier open-source model featuring a massive 32,768 token context window (vs 512 previously). Despite its larger vocabulary and better semantic precision, it maintains excellent inference speeds.
  • embeddinggemma-300m-ONNX: A new ultra-lightweight (~195MB) alternative designed specifically for low-RAM and edge deployments. Extremely fast while retaining strong multilingual capabilities.

📉 Matryoshka Representation Learning (MRL)

We have introduced native support for MRL, allowing users to dynamically truncate embedding vectors via the --mrl-dim flag (e.g., from 1024 down to 512, 256, or 128).

  • Efficiency Benchmark: Truncating Qwen3 from 1024 to 512 dimensions reduces database storage (SurrealDB) and vector search latency by ~50%, while retaining >98% of the original retrieval accuracy on MTEB benchmarks.
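A minimal sketch of what MRL truncation amounts to, assuming the common practice of keeping the leading components and re-normalizing to unit length. The helper names `truncate_mrl` and `raw_vector_bytes` are hypothetical and are not taken from the project; the storage figure assumes float32 components, which the release notes do not state.

```python
import math

def truncate_mrl(vec, dim):
    """Keep the first `dim` components and L2-renormalize (hypothetical helper)."""
    if not 0 < dim <= len(vec):
        raise ValueError(f"dim must be in 1..{len(vec)}, got {dim}")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0 or not math.isfinite(norm):
        # Degenerate input: return zeros rather than dividing by zero.
        return [0.0] * dim
    return [x / norm for x in head]

def raw_vector_bytes(dim, bytes_per_component=4):
    """Raw size of one vector, assuming float32 components (an assumption)."""
    return dim * bytes_per_component

full = [1.0] * 1024                  # stand-in for a 1024-dimensional embedding
short = truncate_mrl(full, 512)      # what --mrl-dim 512 would conceptually produce
print(len(short))                                        # 512
print(raw_vector_bytes(512) / raw_vector_bytes(1024))    # 0.5
```

Halving the dimension halves the raw bytes per vector, which is consistent with the ~50% storage figure above. Index overhead and the accuracy retained depend on the database and model and are not modeled here.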

⚡ Architectural Performance Fixes

  • Zero-Block Async Inference: Heavy tensor operations now run inside Tokio's `block_in_place`, which marks the current worker thread as blocking so the runtime can hand its other tasks to other workers, preventing executor starvation. Concurrent JSON-RPC throughput (requests per second, RPS) has increased by up to 300% under heavy load.
  • Qwen3 Tensor Math Fix: Corrected last-token pooling logic for unpadded sequences, eliminating [PAD] token pollution and restoring exact mathematical accuracy for decoder-only models.
  • SurrealDB v3.0.0 Alignment: Database index dimensions now perfectly align with post-MRL truncated outputs.
  • L2 Normalization Safety: Added robust protection against NaN/Inf corruption on zero-vectors.
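The pooling and normalization fixes above can be illustrated with a small pure-Python sketch. It is not the project's code: `last_token_pool` and `l2_normalize_safe` are hypothetical names, the attention-mask convention (1 for real tokens, 0 for padding) is an assumption, and returning a zero vector for degenerate input is one possible policy that the release notes do not specify.

```python
import math

def last_token_pool(hidden_states, attention_mask):
    """Return the hidden state of the last real token (mask == 1).

    Scanning the mask, rather than taking the final position, keeps
    padding positions out of the pooled embedding for either padding side.
    """
    last = -1
    for i, m in enumerate(attention_mask):
        if m == 1:
            last = i
    if last < 0:
        raise ValueError("attention mask has no real tokens")
    return hidden_states[last]

def l2_normalize_safe(vec, eps=1e-12):
    """L2-normalize, returning zeros instead of NaN/Inf for degenerate input."""
    norm = math.sqrt(sum(x * x for x in vec))
    if not math.isfinite(norm) or norm < eps:
        return [0.0] * len(vec)
    return [x / norm for x in vec]

# Three positions, the last one is padding.
hidden = [[3.0, 4.0], [6.0, 8.0], [0.0, 0.0]]
mask = [1, 1, 0]
pooled = last_token_pool(hidden, mask)   # [6.0, 8.0], not the padding row
print(l2_normalize_safe(pooled))         # [0.6, 0.8]
print(l2_normalize_safe([0.0, 0.0]))     # [0.0, 0.0]
```

Taking the final position unconditionally would have pooled the padding row here, which is the kind of pollution the fix describes.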

📊 Benchmark Comparison

| Metric | Qwen3-0.6B (New Default) | E5-Multi-Base (Old Default) | Gemma-300m (Edge) |
|--------|--------------------------|-----------------------------|-------------------|
| VRAM / RAM | ~1.2 GB | ~1.1 GB | ~195 MB |
| Context Size | 32,768 tokens | 512 tokens | 8,192 tokens |
| MRL Support | Yes (e.g., 512, 256) | No | Yes |
| RPS (Concurrency) | Non-blocking (High) | Baseline (Blocking) | Fastest |

Breaking Changes

  • The legacy `e5_multi` model has been removed; the default is now `Qwen3-Embedding-0.6B`, with a 32,768-token context window.
  • Embedding API behavior has changed: the new models require updated configuration, and MRL truncation is controlled through the `--mrl-dim` flag.

About pomazanbohdan/memory-mcp-1file

A self-contained Memory server with single-binary architecture (embedded DB & models, no dependencies). Provides persistent semantic and graph-based memory for AI agents.
