
Acacian/aegis

v0.9.4 Feature

This release adds 2 notable features for engineering teams evaluating rollout.

Published 1mo MCP Security & Auth
✓ No known CVEs patched

Topics

agent-security ai-agent-security ai-agents ai-governance ai-safety ai-security audit-trail compliance guardrails langchain llm-security mcp mcp-security model-context-protocol pii-detection policy-as-code policy-engine policy-testing prompt-injection selection-governance

Summary

AI summary

The new CLI command `aegis check drift` adds an offline, entropy-based drift detector for saved agent traces. It reads only the `tool_name` field, so prompts and arguments are never touched.

Full changelog

What's New

`aegis check drift` CLI

Offline entropy-based drift detector for saved agent traces. It computes the same signal that `auto_instrument()` exposes at runtime, and can now be run on any JSONL trace from LangSmith, OTel, or custom loggers.

```bash
aegis check drift --trace path/to/trace.jsonl
aegis check drift --trace trace.jsonl --baseline gpt-4o-retail.json
aegis check drift --trace trace.jsonl --json --strict
```

Privacy invariant: reads only the `tool_name` field — never args, CoT, or prompts — so enterprise users can score prod traces without exfiltrating PII. Stdlib-only (Counter + math.log, no numpy).
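To make the idea concrete, here is a minimal sketch of an entropy-based drift score built on `Counter` and `math.log`. It is not the aegis implementation: the function names, the first-half versus second-half windowing, and the sign convention for the drop are all assumptions made for illustration. The only details taken from the release notes are that the scorer reads the `tool_name` field alone and measures entropy in nats.

```python
import json
import math
from collections import Counter


def tool_names(jsonl_text):
    """Return only the tool_name value of each JSONL record.

    Every other field (args, prompts, reasoning) is parsed but never
    stored or returned.
    """
    names = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if isinstance(record, dict) and record.get("tool_name") is not None:
            names.append(record["tool_name"])
    return names


def shannon_entropy(names):
    """Shannon entropy, in nats, of the empirical tool distribution."""
    counts = Counter(names)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def entropy_drop(names):
    """Entropy of the first half minus entropy of the second half.

    Assumed windowing: a positive value means the agent used a narrower
    set of tools late in the trace than early on.
    """
    names = list(names)
    mid = len(names) // 2
    return shannon_entropy(names[:mid]) - shannon_entropy(names[mid:])
```

With this sketch, a trace that cycles through four tools and then repeats a single tool yields a drop of ln 4 (about 1.39 nats), while a trace whose tool mix stays constant yields a drop near zero.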

Research: 1,960 Tau-Bench Agent Trajectories

Measured tool distribution drift on sierra-research/tau-bench public trajectories. 39.8% of 812 scored trajectories show measurable collapse (Δ entropy ≥ 0.3 nats). Cross-model gap on the same retail task family: Sonnet 3.5 New 48.2% vs GPT-4o 28.1% (1.7× ratio, n=599). Distribution is bimodal — agents either stay open or fall off a cliff.

  • Post: https://acacian.github.io/aegis/research/tau-bench-tool-distribution-drift/
  • Reproduces in ~30 seconds on a laptop (stdlib only)
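The headline numbers above rest on two simple computations: a threshold of 0.3 nats that marks a trajectory as collapsed, and a ratio between two per-model collapse rates. The sketch below shows both. The helper name `collapse_rate` and the sample deltas are hypothetical; only the threshold and the two percentages come from the release notes.

```python
COLLAPSE_THRESHOLD_NATS = 0.3  # threshold quoted in the release notes


def collapse_rate(deltas, threshold=COLLAPSE_THRESHOLD_NATS):
    """Fraction of trajectories whose entropy drop meets the threshold."""
    if not deltas:
        return 0.0
    return sum(1 for d in deltas if d >= threshold) / len(deltas)


# Hypothetical per-trajectory entropy drops, for illustration only.
sample_deltas = [0.0, 0.3, 0.5, 0.1]
sample_rate = collapse_rate(sample_deltas)

# Cross-model ratio from the reported rates: 48.2% vs 28.1%.
reported_ratio = 48.2 / 28.1
```

Dividing the two reported rates gives roughly 1.715, which rounds to the 1.7× figure quoted above.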

4 pillars of differentiation

Unlike LLM-as-judge approaches (Patronus, Braintrust) and fine-tuned classifiers (Galileo, Maxim), the `check drift` metric is simultaneously:

  1. Deterministic — no second LLM judges the first, two runs give bit-identical results
  2. Privacy-preserving — tool names only, no prompt content ever read
  3. Cross-model comparable — normalized Δ on the same scale across GPT-4o and Sonnet
  4. 30-second reproducible — 120 lines of stdlib Python, no numpy or GPU
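Pillars 1 and 3 can be illustrated together. The release notes do not say how the metric is normalized, so the sketch below assumes one common choice: dividing entropy by the log of the number of distinct tools, which maps the score onto a 0 to 1 scale regardless of how many tools a given model's agent has used. The function name `normalized_entropy` is hypothetical.

```python
import math
from collections import Counter


def normalized_entropy(names):
    """Entropy divided by log(k), where k is the number of distinct tools.

    Assumed normalization: 1.0 means perfectly uniform tool use,
    0.0 means a single tool (or an empty trace).
    """
    counts = Counter(names)
    k = len(counts)
    if k <= 1:
        return 0.0
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(k)


# Determinism: the score is a pure function of the tool-name sequence,
# so repeated runs on the same input give the same float.
calls = ["search", "search", "search", "refund"]
first_run = normalized_entropy(calls)
second_run = normalized_entropy(calls)
```

Because nothing in the computation is sampled or model-generated, `first_run == second_run` holds exactly, which is the property the first pillar describes.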

Other

  • 15 new tests in `tests/cli/test_check.py` including a hard privacy-invariant assertion (PII planted in fixture traces must never appear in any output)
  • `ScholarlyArticle` JSON-LD schema for `/research/*` pages, sitemap tier 0.8, `llms.txt` canonical facts section for LLM crawlers
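The privacy-invariant assertion mentioned in the first bullet could look roughly like the following. This is a sketch, not the contents of `tests/cli/test_check.py`: the fixture, the planted strings, and the `score_trace` stand-in are all hypothetical. The pattern is what matters: plant recognizable PII in every field except `tool_name`, run the scorer, and assert that none of it appears in the output.

```python
import json
from collections import Counter

# Hypothetical planted secrets; none of these are valid tool names.
PLANTED_PII = ["alice@example.com", "4111-1111-1111-1111"]

FIXTURE_TRACE = "\n".join(
    json.dumps(record)
    for record in [
        {"tool_name": "lookup_order", "args": {"email": PLANTED_PII[0]}},
        {"tool_name": "refund", "args": {"card": PLANTED_PII[1]},
         "prompt": "Refund the card " + PLANTED_PII[1]},
    ]
)


def score_trace(jsonl_text):
    """Hypothetical stand-in for the scorer: tallies tool_name only."""
    counts = Counter()
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        name = json.loads(line).get("tool_name")
        if name is not None:
            counts[name] += 1
    return json.dumps({"tool_counts": dict(counts)}, sort_keys=True)


def test_privacy_invariant():
    """Planted PII must never appear anywhere in the scorer's output."""
    output = score_trace(FIXTURE_TRACE)
    for secret in PLANTED_PII:
        assert secret not in output
    # Sanity check that the scorer did read the trace.
    assert "lookup_order" in output and "refund" in output
```

A test of this shape fails loudly if a later change starts echoing arguments or prompts into reports or error messages.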

Full Changelog: https://github.com/Acacian/aegis/compare/v0.9.3...v0.9.4


About Acacian/aegis

Policy-based governance for AI agent tool calls. YAML policies, approval gates, risk assessment, and audit logging. Cross-platform: LangChain, OpenAI, Anthropic, MCP.
