Acacian/aegis

v0.9.4 Feature

This release adds two notable features for engineering teams evaluating a rollout.

Published 3 months ago · MCP Security & Auth


✓ No known CVEs patched in this version

Topics

agent-security ai-agent-security ai-agents ai-governance ai-safety ai-security audit-trail compliance guardrails langchain llm-security mcp mcp-security model-context-protocol pii-detection policy-as-code policy-engine policy-testing prompt-injection selection-governance

Summary

AI summary

The new `aegis check drift` CLI command adds an offline, entropy-based drift detector that reads only tool names from agent traces.

Full changelog

What's New

`aegis check drift` CLI

Offline entropy-based drift detector for saved agent traces. Same signal that `auto_instrument()` exposes at runtime, now runnable on any JSONL trace from LangSmith, OTel, or custom loggers.

```bash
aegis check drift --trace path/to/trace.jsonl
aegis check drift --trace trace.jsonl --baseline gpt-4o-retail.json
aegis check drift --trace trace.jsonl --json --strict
```

Privacy invariant: reads only the `tool_name` field — never args, CoT, or prompts — so enterprise users can score prod traces without exfiltrating PII. Stdlib-only (Counter + math.log, no numpy).
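The score itself needs nothing beyond `collections.Counter` and `math.log`. The following is a minimal sketch, not the aegis implementation. It assumes each JSONL record carries a top-level `tool_name` key, and it takes drift to be the entropy of the first half of a trace minus the entropy of the second half. Both are assumptions: the real command also accepts a `--baseline` file, whose format is not documented here. `tool_names`, `entropy_nats`, and `entropy_drop` are hypothetical names, not aegis APIs.

```python
import json
import math
from collections import Counter

COLLAPSE_THRESHOLD = 0.3  # nats; the cut-off quoted in the tau-bench research note


def tool_names(jsonl_lines):
    """Yield the `tool_name` of each record. Other fields are parsed by json.loads but never used."""
    for line in jsonl_lines:
        if not line.strip():
            continue
        name = json.loads(line).get("tool_name")
        if name is not None:
            yield name


def entropy_nats(names):
    """Shannon entropy H = -sum(p * ln p) of the tool-name distribution, in nats."""
    counts = Counter(names)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log(c / total) for c in counts.values())


def entropy_drop(names):
    """Hypothetical drift score: entropy of the first half minus entropy of the second half.

    A large positive value means tool usage narrowed (collapsed) as the trace went on.
    """
    names = list(names)
    mid = len(names) // 2
    return entropy_nats(names[:mid]) - entropy_nats(names[mid:])


if __name__ == "__main__":
    trace = [
        '{"tool_name": "search", "args": {"email": "alice@example.com"}}',
        '{"tool_name": "lookup_order"}',
        '{"tool_name": "refund"}',
        '{"tool_name": "cancel_order"}',
        '{"tool_name": "refund"}',
        '{"tool_name": "refund"}',
        '{"tool_name": "refund"}',
        '{"tool_name": "refund"}',
    ]
    delta = entropy_drop(tool_names(trace))
    # prints: delta H = 1.386 nats, collapsed = True
    print(f"delta H = {delta:.3f} nats, collapsed = {delta >= COLLAPSE_THRESHOLD}")
```

With the 0.3-nat threshold, the demo trace counts as collapsed. Its first half spreads evenly over four tools (H = ln 4 ≈ 1.386 nats), and its second half uses only `refund` (H = 0). The email in the first record's args never reaches the score.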

Research: 1,960 Tau-Bench Agent Trajectories

Measured tool-distribution drift on the public sierra-research/tau-bench trajectories. Of 812 scored trajectories, 39.8% show measurable collapse (Δ entropy ≥ 0.3 nats). On the same retail task family, the cross-model gap is 48.2% for Sonnet 3.5 New vs 28.1% for GPT-4o (a 1.7× ratio, n=599). The distribution is bimodal: agents either keep a broad tool distribution or collapse sharply.
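As a quick sanity check on the headline numbers, 39.8% of 812 works out to roughly 323 collapsed trajectories. Likewise, 48.2% / 28.1% ≈ 1.72, which is consistent with the quoted 1.7× ratio. The arithmetic below uses only the figures stated above.

```python
# Figures quoted in the tau-bench research summary.
scored = 812
collapse_rate = 0.398
sonnet_rate, gpt4o_rate = 0.482, 0.281

collapsed = round(scored * collapse_rate)  # about 323 trajectories
ratio = sonnet_rate / gpt4o_rate           # about 1.72, reported as 1.7x

print(collapsed, round(ratio, 2))  # prints: 323 1.72
```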

Post: https://acacian.github.io/aegis/research/tau-bench-tool-distribution-drift/
Reproduces in ~30 seconds on a laptop (stdlib only)

4 pillars of differentiation

Unlike LLM-as-judge approaches (Patronus, Braintrust) and fine-tuned classifiers (Galileo, Maxim), the `check drift` metric is simultaneously:

Deterministic — no second LLM judges the first, two runs give bit-identical results
Privacy-preserving — tool names only, no prompt content ever read
Cross-model comparable — normalized Δ on the same scale across GPT-4o and Sonnet
30-second reproducible — 120 lines of stdlib Python, no numpy or GPU
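The notes say Δ is normalized so that GPT-4o and Sonnet land on the same scale, but they do not say how. One standard option is Pielou's evenness, which divides entropy by ln k, where k is the number of available tools. This maps any tool distribution into [0, 1] regardless of catalogue size. The following is a sketch under that assumption. `normalized_entropy`, `normalized_drop`, and `catalogue_size` are hypothetical names, not aegis APIs.

```python
import math
from collections import Counter


def normalized_entropy(names, catalogue_size):
    """Entropy in nats divided by ln(catalogue_size): 0 = a single tool, 1 = uniform use."""
    if catalogue_size < 2 or not names:
        return 0.0
    counts = Counter(names)
    total = len(names)
    h = sum(-(c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(catalogue_size)


def normalized_drop(before, after, catalogue_size):
    """Hypothetical normalized drift: fall in normalized entropy between two windows."""
    return normalized_entropy(before, catalogue_size) - normalized_entropy(after, catalogue_size)


# Two agents with different catalogues, each using every tool once.
small = ["a", "b", "c", "d"]                      # 4-tool catalogue: raw H = ln 4
large = ["a", "b", "c", "d", "e", "f", "g", "h"]  # 8-tool catalogue: raw H = ln 8

print(round(normalized_entropy(small, 4), 3), round(normalized_entropy(large, 8), 3))  # prints: 1.0 1.0
print(round(normalized_drop(small, ["a"] * 4, 4), 3))  # prints: 1.0
```

The raw entropies differ (about 1.386 vs 2.079 nats), but both agents score 1.0 after normalization. Nothing in the computation is random or model-dependent, so repeated runs on the same trace return the same value.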

Other

15 new tests in `tests/cli/test_check.py` including a hard privacy-invariant assertion (PII planted in fixture traces must never appear in any output)
`ScholarlyArticle` JSON-LD schema for `/research/*` pages, sitemap tier 0.8, `llms.txt` canonical facts section for LLM crawlers
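The following is a minimal pytest-style sketch of a privacy-invariant test. It is not the code in `tests/cli/test_check.py`. `score_trace` is a hypothetical stand-in for the CLI's scorer that builds its JSON report from tool names alone. The test plants PII in args and prompts and fails if any planted string reaches the output.

```python
import json
import math
from collections import Counter

PLANTED_PII = ["alice@example.com", "4111 1111 1111 1111", "SSN 078-05-1120"]


def score_trace(lines):
    """Hypothetical scorer: returns a JSON report built only from each record's tool_name."""
    names = [json.loads(line)["tool_name"] for line in lines if line.strip()]
    counts = Counter(names)
    total = len(names)
    entropy = sum(-(c / total) * math.log(c / total) for c in counts.values()) if total else 0.0
    return json.dumps({"n_calls": total, "tool_counts": dict(counts), "entropy_nats": round(entropy, 4)})


def test_planted_pii_never_appears_in_output():
    fixture = [
        json.dumps({"tool_name": "lookup_user", "args": {"email": PLANTED_PII[0]}}),
        json.dumps({"tool_name": "charge_card", "args": {"card": PLANTED_PII[1]}}),
        json.dumps({"tool_name": "lookup_user", "prompt": f"verify {PLANTED_PII[2]}"}),
    ]
    output = score_trace(fixture)
    for secret in PLANTED_PII:
        assert secret not in output, f"PII leaked into output: {secret!r}"


if __name__ == "__main__":
    test_planted_pii_never_appears_in_output()
    print("privacy invariant holds")
```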

Full Changelog: https://github.com/Acacian/aegis/compare/v0.9.3...v0.9.4

About Acacian/aegis

Policy-based governance for AI agent tool calls. YAML policies, approval gates, risk assessment, and audit logging. Cross-platform: LangChain, OpenAI, Anthropic, MCP.