Skip to content

BetterDB-inc/monitor

vsemantic-cache-py-v0.4.1 scope: semantic-cache-py Feature

This release adds 3 notable features for engineering teams evaluating rollout.

Published 3d Monitoring & Metrics
✓ No known CVEs patched
Read the diff → Tool health → What is this tool? →

✓ No known CVEs patched in this version

Topics

prometheus monitoring obsevability redis valkey

Summary

AI summary

Introduces a Python port of @betterdb/semantic-cache with Observability instrumentation, expanded What's included features (SemanticCache API, provider adapters, embedding helpers, default cost table), and new filter‑based invalidation.

Full changelog

betterdb-semantic-cache v0.1.0

Python port of @betterdb/semantic-cache. Embeddings-based semantic cache for AI
workloads backed by Valkey vector search — similarity matching, cost tracking,
multi-modal prompts, embedding cache, and threshold tuning, with built-in
OpenTelemetry and Prometheus instrumentation.

Requires Valkey 8+ with the valkey-search module (vector index support).
Works with ElastiCache for Valkey, Memorystore for Valkey, and MemoryDB.


Installation

pip install betterdb-semantic-cache

Optional extras install the provider SDKs alongside the library:

pip install "betterdb-semantic-cache[openai]"
pip install "betterdb-semantic-cache[anthropic]"
pip install "betterdb-semantic-cache[langchain]"
pip install "betterdb-semantic-cache[langgraph]"
pip install "betterdb-semantic-cache[llamaindex]"
pip install "betterdb-semantic-cache[httpx]"   # voyage / cohere / ollama embed helpers
pip install "betterdb-semantic-cache[bedrock]"  # AWS Bedrock embed helper

What's included

SemanticCache

| Method | Description |
|---|---|
| initialize() | Create or attach to the vector index |
| check(prompt) | Similarity lookup — returns hit/miss with confidence and optional cost saved |
| store(prompt, response) | Store a response with optional cost metadata |
| store_multipart(prompt, blocks) | Store structured content blocks |
| check_batch(prompts) | Pipelined batch lookup |
| invalidate(filter) | Delete entries matching a FT.SEARCH filter |
| invalidate_by_model(model) | Delete all entries for a model |
| invalidate_by_category(category) | Delete all entries in a category |
| stats() | Hit/miss counts and cumulative cost saved |
| index_info() | Index name, doc count, vector dimension |
| threshold_effectiveness() | Rolling window analysis and threshold recommendations |
| threshold_effectiveness_all() | Per-category analysis |
| flush() | Drop index and delete all cached entries |

Provider adapters

| Import | Provider |
|---|---|
| betterdb_semantic_cache.adapters.openai | OpenAI Chat Completions |
| betterdb_semantic_cache.adapters.openai_responses | OpenAI Responses API |
| betterdb_semantic_cache.adapters.anthropic | Anthropic Messages |
| betterdb_semantic_cache.adapters.llamaindex | LlamaIndex ChatMessage[] |
| betterdb_semantic_cache.adapters.langchain | LangChain BaseCache (async-only) |
| betterdb_semantic_cache.adapters.langgraph | LangGraph BetterDBSemanticStore |

Embedding helpers

| Import | Provider |
|---|---|
| embed.openai | OpenAI Embeddings API |
| embed.voyage | Voyage AI (httpx, no SDK required) |
| embed.cohere | Cohere Embed v3 (httpx, no SDK required) |
| embed.ollama | Ollama local models (httpx, no SDK required) |
| embed.bedrock | AWS Bedrock Titan / Cohere (boto3) |

Bundled default cost table

A default cost table sourced from LiteLLM's model_prices_and_context_window.json
is bundled and refreshed on every release. Cost savings tracking works out of the
box for 1,900+ models — no cost_table configuration required.

Observability

  • OpenTelemetry spans on every cache operation
  • Prometheus metrics: requests_total, similarity_score, operation_duration_seconds,
    embedding_duration_seconds, cost_saved_total, embedding_cache_total,
    stale_model_evictions_total

Cluster support

Pass a ValkeyCluster client and all SCAN-based operations (flush,
invalidate_by_model, invalidate_by_category) automatically iterate all master nodes.


Quick start

import asyncio
import valkey.asyncio as valkey
from betterdb_semantic_cache import SemanticCache, SemanticCacheOptions
from betterdb_semantic_cache.types import CacheStoreOptions
from betterdb_semantic_cache.embed.openai import create_openai_embed

client = valkey.Valkey(host="localhost", port=6379)
cache = SemanticCache(SemanticCacheOptions(
    client=client,
    embed_fn=create_openai_embed(),
    default_threshold=0.12,
))

async def main():
    await cache.initialize()

    result = await cache.check("What is the capital of France?")
    if result.hit:
        print("Cache hit:", result.response)
    else:
        answer = "Paris"  # ... call your LLM ...
        await cache.store(
            "What is the capital of France?", answer,
            CacheStoreOptions(model="gpt-4o", input_tokens=20, output_tokens=5),
        )

asyncio.run(main())

Full changelog

See CHANGELOG.md for detailed history.

Weekly OSS security release digest.

The CVE patches and breaking changes that affected production tools this week. One email, every Sunday.

No spam, unsubscribe anytime.

Share this release

Track BetterDB-inc/monitor

Get notified when new releases ship.

Sign up free

About BetterDB-inc/monitor

Valkey-first observability with Redis compatibility. Query real-time metrics, analyze slow commands, detect hot keys, and investigate performance issues directly from AI coding assistants.

All releases →

Related context

Earlier breaking changes

  • vagent-v1.5.0 TLS is now mandatory; agent exits with error if `--valkey-tls true` is not set for IAM auth mode.

Beta — feedback welcome: [email protected]