This release adds 3 notable features for engineering teams evaluating rollout.
✓ No known CVEs patched in this version
Summary
New end-to-end benchmark harness and LongMemEval integration, reaching 81.6% R@5 on the held-out split.
Full changelog
What's New
- Benchmark harness: an end-to-end `WaggleAdapter` connecting the graph engine to the ConvoMem / MemBench runners, with automated exact-match scoring and latency logging.
- LongMemEval integration: CLI-driven ingestion and retrieval evaluation against the official LongMemEval split. 81.6% R@5 on the held-out split is the headline generalization number.
- Logging utilities: structured log helpers (`logging_utils`) for consistent, level-aware output across all subsystems.
- Evidence tracking: `evidence.py` records source provenance on stored nodes so reasoning chains are fully traceable.
- Observability stack: Grafana dashboard, Prometheus config, and Docker Compose overlay in `deploy/observability/`.
- Kubernetes manifests: production-grade `deployment.yaml`, network policy, external-secret, and certificate templates under `deploy/kubernetes/`.
- Operational runbooks: incident response, secret management, API-key rotation, and onboarding guides in `docs/runbooks/`.
- README: honest benchmark presentation (the held-out number leads), an audience guide (individual dev vs. team), and a visible edges warning in Quick Start.
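The notes don't show how the harness scores answers. As a rough illustration of what "automated exact-match scoring and latency logging" usually involves, here is a minimal sketch. `normalize`, `exact_match`, `run_benchmark`, and `RunResult` are hypothetical names, not the `WaggleAdapter` API, and the normalization rules (lowercase, strip ASCII punctuation, collapse whitespace) are an assumption.

```python
import string
import time
from dataclasses import dataclass, field
from typing import Callable, Sequence


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace (assumed rules)."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> bool:
    """True when prediction and gold are identical after normalization."""
    return normalize(prediction) == normalize(gold)


@dataclass
class RunResult:
    correct: int = 0
    total: int = 0
    latencies_ms: list[float] = field(default_factory=list)

    @property
    def accuracy(self) -> float:
        return self.correct / self.total if self.total else 0.0


def run_benchmark(answer: Callable[[str], str], cases: Sequence[tuple[str, str]]) -> RunResult:
    """Call `answer` on each (question, gold) pair, scoring and timing every call."""
    result = RunResult()
    for question, gold in cases:
        start = time.perf_counter()
        prediction = answer(question)
        result.latencies_ms.append((time.perf_counter() - start) * 1000.0)
        result.total += 1
        result.correct += int(exact_match(prediction, gold))
    return result


# Toy usage: a canned "system" that gets one of two answers right.
cases = [("Where is the offsite?", "Lisbon"), ("Which database did we pick?", "Postgres")]
canned = {"Where is the offsite?": "lisbon.", "Which database did we pick?": "MySQL"}
res = run_benchmark(lambda q: canned[q], cases)
print(f"accuracy={res.accuracy:.2f}, max latency={max(res.latencies_ms):.3f} ms")
```

Normalizing before comparison keeps trivial differences such as casing or a trailing period from counting as misses. Stricter or looser rules will shift the reported accuracy.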
Install
```shell
pip install waggle-mcp==0.1.7
```
Honest benchmark note
81.6% R@5 is the LongMemEval number on the held-out split, which was not used during development. The 97.4% figure on the full split is a retrieval ceiling for the saved benchmark setup. Both numbers are real; the held-out one is the honest measure of generalization.
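The notes don't define R@5. A common reading, assumed here, is: for each question, check whether the gold evidence appears among the top 5 retrieved items, then average over questions. Some evaluations count a hit if *any* gold item is retrieved, others only if *all* are, so the sketch supports both. Which variant the 81.6% figure uses is not stated, and `recall_at_k` / `mean_recall_at_k` are hypothetical helpers.

```python
from typing import Iterable, Sequence


def recall_at_k(retrieved: Sequence[str], gold: set[str], k: int = 5,
                require_all: bool = False) -> float:
    """Return 1.0 if the top-k retrieved IDs cover the gold evidence, else 0.0.

    require_all=False: hit if ANY gold ID is in the top k.
    require_all=True:  hit only if EVERY gold ID is in the top k.
    Assumes `gold` is non-empty.
    """
    hits = gold & set(retrieved[:k])
    ok = hits == gold if require_all else bool(hits)
    return 1.0 if ok else 0.0


def mean_recall_at_k(runs: Iterable[tuple[Sequence[str], set[str]]], k: int = 5,
                     require_all: bool = False) -> float:
    """Average per-question recall@k over (retrieved, gold) pairs."""
    scores = [recall_at_k(r, g, k, require_all) for r, g in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Toy runs: (ranked retrieved session IDs, gold evidence IDs).
runs = [
    (["s3", "s9", "s1", "s4", "s7", "s2"], {"s1"}),        # gold at rank 3: hit
    (["s5", "s6", "s8", "s0", "s4", "s2"], {"s2"}),        # gold at rank 6: miss at k=5
    (["s2", "s1", "s0", "s3", "s4"], {"s1", "s7"}),        # only one of two gold IDs found
    (["s7", "s1"], {"s7", "s1"}),                          # both gold IDs found
]
print(mean_recall_at_k(runs), mean_recall_at_k(runs, require_all=True))
```

On the same rankings the "any" variant scores 0.75 and the "all" variant 0.5, which is why the definition matters when comparing published numbers.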
Deduplication recall is 77.3%, while still maintaining zero false-positive merges. Improving recall is the primary focus for 0.1.8.
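The notes report dedup recall and a false-positive merge count without defining either. One plausible pairwise reading, assumed here: expand every cluster of nodes that refer to the same entity into unordered node pairs, take recall as the fraction of gold duplicate pairs the system actually merged, and count any merged pair absent from the gold set as a false-positive merge. `pairs` and `dedup_metrics` are hypothetical helpers, not Waggle code.

```python
from itertools import combinations
from typing import Iterable


def pairs(clusters: Iterable[set[str]]) -> set[tuple[str, str]]:
    """Expand clusters of node IDs into a set of unordered same-entity pairs."""
    out: set[tuple[str, str]] = set()
    for cluster in clusters:
        # Sorting gives each pair a canonical order, so (a, b) and (b, a) collapse.
        out.update(combinations(sorted(cluster), 2))
    return out


def dedup_metrics(predicted: Iterable[set[str]], gold: Iterable[set[str]]) -> tuple[float, int]:
    """Return (pairwise recall, number of false-positive merged pairs)."""
    pred_pairs, gold_pairs = pairs(predicted), pairs(gold)
    recall = len(pred_pairs & gold_pairs) / len(gold_pairs) if gold_pairs else 1.0
    return recall, len(pred_pairs - gold_pairs)


# Gold: n1/n2/n3 are one entity and n4/n5 another (4 duplicate pairs in total).
gold = [{"n1", "n2", "n3"}, {"n4", "n5"}]
# A conservative merger that only merged pairs it was sure about.
predicted = [{"n1", "n2"}, {"n4", "n5"}]
print(dedup_metrics(predicted, gold))
```

The toy run illustrates the trade-off the notes describe: merging conservatively keeps false positives at zero but leaves recall at 0.5, because n3 was never merged.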
About Abhigyan-Shekhar/Waggle-mcp
Persistent graph memory for AI agents. Drop a conversation turn in via `observe_conversation()` and facts are auto-extracted and stored as typed graph nodes with local semantic embeddings (no API key). Supports temporal queries ("what did we decide last week?").
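The description doesn't say how temporal queries are resolved. Purely as an illustration, assume each extracted fact is stored with a type and an observation timestamp. Under that assumption, a question like "what did we decide last week?" reduces to a type filter plus a time-window filter. `FactNode`, `facts_between`, and `last_week` are hypothetical and are not Waggle's `observe_conversation()` API. The sketch also omits the semantic-embedding search the description mentions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable


@dataclass(frozen=True)
class FactNode:
    kind: str             # hypothetical node type, e.g. "decision" or "preference"
    text: str
    observed_at: datetime


def facts_between(nodes: Iterable[FactNode], kind: str,
                  start: datetime, end: datetime) -> list[FactNode]:
    """Return facts of one type observed in [start, end), oldest first."""
    return sorted(
        (n for n in nodes if n.kind == kind and start <= n.observed_at < end),
        key=lambda n: n.observed_at,
    )


def last_week(now: datetime) -> tuple[datetime, datetime]:
    """Interpret "last week" as the previous Monday-to-Monday window (an assumption)."""
    this_monday = (now - timedelta(days=now.weekday())).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return this_monday - timedelta(days=7), this_monday


nodes = [
    FactNode("decision", "Use Postgres for the event store", datetime(2025, 6, 4, 10)),
    FactNode("preference", "Prefers tabs over spaces", datetime(2025, 6, 5, 9)),
    FactNode("decision", "Adopt weekly releases", datetime(2025, 6, 10, 16)),
]
start, end = last_week(datetime(2025, 6, 12, 15))  # a Thursday
print([n.text for n in facts_between(nodes, "decision", start, end)])
```

A real system would also have to map the natural-language phrase to a window. The fixed Monday-to-Monday rule here is only one possible convention.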