chernistry/bernstein

v1.10.5 Security

This release includes 1 security fix relevant to security teams reviewing exposed deployments.

Published 2 months ago · AI Agents & Assistants

✓ No known CVEs patched

This release patches 1 security issue; no CVE identifier is listed for it.

Topics

agent-orchestrator agentic-ai ai-agents aider air-gap audit-trail claude-code cli-tool codex-cli coding-agent deterministic-replay deterministic-scheduler hmac-audit mcp-server model-context-protocol multi-agent parallel-worktrees provenance python reproducibility

Affected surfaces

auth rbac deps

Summary

AI summary

Adds RFC 3161 timestamp chain validation to the audit log, plus A2A v1.0 signed agent cards with a persistent keystore.

Full changelog

v1.10.5

The compliance and A2A v1.0 release. RFC 3161 timestamp chain validation on the audit log, Sigstore release attestation on published artefacts, A2A v1.0 signed agent cards with persistent keystore + JWKS, plus a DeepSeek V4 family adapter with an EU-residency self-hosted guard. 44 adapters total. Six Hypothesis property-test suites land alongside the new code so the invariants are documented as much as exercised.

Honest framing up front. The compliance and A2A surface ships with tests, operator runbooks, and a standalone DSSE verifier, but it has not been bashed against an external regulatory audit yet. Treat it as code an evaluator can read and stand up themselves, not as production telemetry.

Compliance evidence stack

Three pieces wired together so the audit log can stand on its own as regulatory evidence.

HMAC-SHA256 chained audit log with multi-tenant export. Every emitted event is canonicalised via JCS (RFC 8785), HMAC-tagged, and chained to the previous head. The new export path slices the chain by tenant key without breaking either side. bernstein audit slice extracts a deterministic subset for an evaluator.
RFC 3161 timestamp tokens with chain validation. The audit log head is timestamped against an external TSA (FreeTSA in the test fixture, swappable), and the verifier walks the TSA chain. An Ed25519 signature over the timestamped head closes the loop.
DSSE + in-toto v1 envelope for the export bundle. The standalone verifier at tools/verify_audit_dsse.py depends only on the Python standard library and cryptography. Its test asserts that import bernstein raises ModuleNotFoundError from inside the verifier's venv, which is the property an external auditor wants from a verifier they can run themselves.
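The chaining and envelope steps above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not bernstein's code:

- `canonicalise` approximates JCS with sorted keys and compact separators. It does not implement RFC 8785's number rules.
- The all-zero genesis head is a hypothetical choice.
- The helper names `append`, `verify` and `envelope` are hypothetical.
- The signer is passed in as a callable because these notes do not say which key signs the DSSE envelope.

The PAE framing follows the DSSE v1 spec.

```python
import base64
import hashlib
import hmac
import json
from typing import Callable

# Hypothetical all-zero head used before the first event.
GENESIS = bytes(32)


def canonicalise(event: dict) -> bytes:
    # Approximates JCS (RFC 8785): sorted keys, no insignificant whitespace,
    # UTF-8 output. Real JCS also pins number formatting; this does not.
    return json.dumps(event, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def _tag(key: bytes, prev: bytes, event: dict) -> bytes:
    # Each tag covers the previous head plus this event's canonical bytes.
    return hmac.new(key, prev + canonicalise(event), hashlib.sha256).digest()


def append(chain: list, key: bytes, event: dict) -> None:
    prev = bytes.fromhex(chain[-1]["tag"]) if chain else GENESIS
    chain.append({"event": event, "tag": _tag(key, prev, event).hex()})


def verify(chain: list, key: bytes) -> bool:
    prev = GENESIS
    for entry in chain:
        expected = _tag(key, prev, entry["event"])
        if not hmac.compare_digest(expected.hex(), entry["tag"]):
            return False  # tampered event or broken link
        prev = expected
    return True


def pae(payload_type: str, body: bytes) -> bytes:
    # DSSE v1 pre-authentication encoding: the signature covers the payload
    # type and the body, each prefixed with its length in bytes.
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(t), t, len(body), body)


def envelope(payload_type: str, body: bytes, keyid: str,
             sign: Callable[[bytes], bytes]) -> dict:
    # `sign` is pluggable; an Ed25519 signer would slot in here.
    sig = sign(pae(payload_type, body))
    return {
        "payloadType": payload_type,
        "payload": base64.b64encode(body).decode("ascii"),
        "signatures": [{"keyid": keyid, "sig": base64.b64encode(sig).decode("ascii")}],
    }
```

An in-toto v1 statement would be wrapped with the payload type `application/vnd.in-toto+json`. The write path and the verify path share one `canonicalise`. That is the property the Article 12 bundle fix restores: if the two paths use different JSON separators, byte equality fails on otherwise-correct lines.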

EU AI Act Article 12 evidence pack and SOC 2 packs are now wired to the real run-log integration rather than fixture data. The FINOS AIGF control mapping covers 16 of 16 controls after the Sigstore release attestation landed. The mapping document is the spec; assertion against an external audit is future work.

A2A v1.0 signed agent cards

Each agent now publishes a signed agent card at /.well-known/agent.json and its public verification keys at /.well-known/jwks.json. The keystore is persistent. It is created with O_EXCL plus 0o600 semantics and has a 24-hour rotation grace window. An A2A peer that fetched JWKS five minutes ago can therefore still verify against the previous key after rotation, without race conditions.
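The keystore creation and grace window described above can be sketched as follows. This assumes one key file per key and a retirement timestamp per key, neither of which these notes specify. `create_key_file`, `KeyRecord` and `published_kids` are hypothetical names.

```python
import os
import time
from dataclasses import dataclass
from typing import List, Optional

GRACE_SECONDS = 24 * 60 * 60  # the 24-hour rotation grace window


def create_key_file(path: str, key_bytes: bytes) -> None:
    # O_EXCL: raise FileExistsError if the file already exists, so two
    # processes racing on first start cannot both write a key.
    # 0o600: owner read/write only (further narrowed by the process umask).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as fh:
        fh.write(key_bytes)


@dataclass
class KeyRecord:
    kid: str
    retired_at: Optional[float] = None  # None marks the current signing key


def published_kids(keys: List[KeyRecord], now: Optional[float] = None) -> List[str]:
    # A retired key stays in the JWKS document until its grace window lapses,
    # so signatures made just before rotation still verify.
    now = time.time() if now is None else now
    return [k.kid for k in keys
            if k.retired_at is None or now - k.retired_at < GRACE_SECONDS]
```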

The signing path is a JWS detached signature (RFC 7515) over JCS bytes with Ed25519 (RFC 8037), and audience binding uses RFC 8707 resource indicators. The cold-start RLock fix shipped in this release closes a self-deadlock in which the first JWKS fetch tried to re-acquire a lock it already held.
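A detached JWS over a canonicalised card can be sketched as follows. The real path signs with Ed25519 (JWS `alg` value `EdDSA` per RFC 8037). Here the signer and verifier are passed in as callables so the sketch runs on the standard library alone. `jcs_bytes` only approximates JCS, and all function names are hypothetical.

```python
import base64
import json
from typing import Callable


def b64url(data: bytes) -> str:
    # Base64url without padding, as JWS requires.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jcs_bytes(obj: dict) -> bytes:
    # Approximation of JCS (RFC 8785); number formatting is not handled.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def _signing_input(header_b64: str, card: dict) -> bytes:
    return f"{header_b64}.{b64url(jcs_bytes(card))}".encode("ascii")


def sign_detached(card: dict, kid: str, sign: Callable[[bytes], bytes]) -> str:
    header_b64 = b64url(jcs_bytes({"alg": "EdDSA", "kid": kid}))
    signature = sign(_signing_input(header_b64, card))
    # Detached content (RFC 7515 Appendix F): the payload segment is empty and
    # the verifier re-derives it from the card it fetched.
    return f"{header_b64}..{b64url(signature)}"


def verify_detached(token: str, card: dict,
                    verify: Callable[[bytes, bytes], bool]) -> bool:
    parts = token.split(".")
    if len(parts) != 3 or parts[1]:
        return False  # not in detached compact form
    header_b64, _, sig_b64 = parts
    sig = base64.urlsafe_b64decode(sig_b64 + "=" * (-len(sig_b64) % 4))
    return verify(_signing_input(header_b64, card), sig)
```

Detaching the payload means the card served at /.well-known/agent.json is the only copy. Any edit to it changes the re-derived signing input, so verification fails.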

Adapter additions

DeepSeek V4-Flash and V4-Pro. Self-hosted via an Ollama-compatible endpoint. The adapter ships an EU-residency guard that pins the endpoint host and rejects DNS rebinding; a loopback test exercises the rebinding case. The Hypothesis bug-hunt suite (see CI hardening below) caught a 10.example.com rebinding bypass during development.
Adapter inventory. 44 adapters total. The Junie and Q Developer adapters that landed in 1.10.1 are now logged in CHANGELOG.md against their actual ship dates.
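A host-pinning guard of the kind described above can be sketched as follows, assuming the guard's job is to keep traffic on operator-approved hosts and networks. These notes do not describe how the 10.example.com bypass worked. The prefix check named in the comment is an assumed example of how such a name can slip through. `check_endpoint` and the injectable `resolve` callable are hypothetical; the resolver is injected so the check is testable without DNS.

```python
import ipaddress
from typing import Callable, Iterable, List, Set
from urllib.parse import urlsplit

Resolver = Callable[[str], Iterable[str]]


def check_endpoint(url: str, pinned_hosts: Set[str], allowed_nets: List[str],
                   resolve: Resolver) -> None:
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError("endpoint URL has no host")
    # Exact match on the whole host name. A prefix test such as
    # host.startswith("10.") would wrongly accept "10.example.com".
    if host not in pinned_hosts:
        raise ValueError(f"host {host!r} is not pinned")
    try:
        addresses = [ipaddress.ip_address(host)]  # IP literal: nothing to resolve
    except ValueError:
        addresses = [ipaddress.ip_address(a) for a in resolve(host)]
    if not addresses:
        raise ValueError(f"host {host!r} did not resolve")
    networks = [ipaddress.ip_network(n) for n in allowed_nets]
    # Every resolved address must fall inside an allowed network, so a pinned
    # name that now resolves elsewhere (DNS rebinding) is rejected.
    for addr in addresses:
        if not any(addr in net for net in networks):
            raise ValueError(f"{host!r} resolved to {addr}, outside allowed networks")
```

A check like this still leaves a window between validation and connection. A stricter guard would connect to the exact address it validated rather than resolving the name a second time.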

Security primitives

OWASP ASI01-10 detector pack (off by default). Static-rule matchers for the OWASP Agentic Security Initiative (ASI) top-10 categories. Reads more like a linter than a runtime guard at this stage.
MCP server Ed25519 signing + supply-chain scanner. Signs every MCP server tool manifest with a per-installation Ed25519 keypair and walks the dependency graph for known-bad transitive deps.
Default-on credential scoping. Adapters that previously inherited the host environment now run under a scoped credential view, with the unscoped path behind an opt-in flag for the rare adapter that needs it.
Identity v1.0 fingerprint. HMAC-SHA256 over operator_seed || install_nonce || version_major produces an 80-bit base32 fingerprint per install. Off by default; kill switch is BERNSTEIN_DISABLE_IDENTITY=1. Wired into YAML output, traces, and role prompts.
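The identity fingerprint derivation is short enough to sketch with the standard library. These notes do not say which key the HMAC uses or how version_major is encoded. This sketch therefore takes a hypothetical `key` argument and encodes the major version as ASCII decimal; `identity_fingerprint` is also a hypothetical name. Truncating the 32-byte HMAC to 10 bytes yields the 80 bits. Base32 of 10 bytes is exactly 16 characters with no padding.

```python
import base64
import hashlib
import hmac
import os
from typing import Optional


def identity_fingerprint(key: bytes, operator_seed: bytes, install_nonce: bytes,
                         version_major: int) -> Optional[str]:
    # Kill switch from the release notes: no fingerprint when set to 1.
    if os.environ.get("BERNSTEIN_DISABLE_IDENTITY") == "1":
        return None
    # operator_seed || install_nonce || version_major as plain concatenation.
    msg = operator_seed + install_nonce + str(version_major).encode("ascii")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    # Keep the first 80 bits (10 bytes) and base32-encode them.
    return base64.b32encode(digest[:10]).decode("ascii")
```

Plain concatenation is only unambiguous when the seed and nonce have fixed lengths. With variable-length fields, a length prefix per field would avoid collisions.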

Research-grade slices

Each item ships as a smallest-viable slice rather than a finished feature, so the spec, the test, and the runtime artefact all exist but the operational surface is intentionally thin:

Wiki build. Generates a per-repo wiki from the agents.md canonical IR.
Append-only JSONL memory log. One file per run; consumers read by tail.
Deterministic sandbox backend selector. Picks Docker / E2B / Modal based on tags rather than env races.
audit slice deterministic subset extractor. Pairs with the multi-tenant audit chain export.
--max-cost-usd hard cap. Aborts a run when cumulative routed model spend crosses the threshold.
Team-hub convention paths + manifest loader. Common manifest paths under .team-hub/ so multi-repo projects share config without symlinks.
bernstein scaffold <prompt> first slice. Prompt-to-repo scaffolder.
A/B runner primitive. Eval harness for comparing two adapter configurations on the same task set.
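The `--max-cost-usd` hard cap amounts to a running sum compared against a threshold. The sketch below assumes "crosses" means strictly exceeds. It uses `Decimal` so that cent amounts add exactly. `CostCap` and `BudgetExceeded` are hypothetical names, not bernstein's API.

```python
from decimal import Decimal


class BudgetExceeded(RuntimeError):
    """Raised when cumulative spend goes past the configured cap."""


class CostCap:
    def __init__(self, max_cost_usd: str) -> None:
        self.limit = Decimal(max_cost_usd)
        self.spent = Decimal("0")

    def record(self, cost_usd: str) -> Decimal:
        # Add the cost of one routed model call, then abort if over the cap.
        self.spent += Decimal(cost_usd)
        if self.spent > self.limit:
            raise BudgetExceeded(f"spent ${self.spent}, cap is ${self.limit}")
        return self.spent
```

Because this checks after each call is billed, the final call can overshoot the cap by up to its own cost. A stricter variant would check an estimated cost before dispatching the call.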

Observability and orchestrator

Three hardening primitives (concurrency limits, deadline enforcement, budget guard) wired into the orchestrator runtime so they engage on every run rather than living as off-by-default research code.
Opt-in LLM watcher (Haiku). A side-channel observer that reads the deterministic loop's events and annotates them with a natural-language summary. Off by default; useful for explaining a failed run to a human reviewer.
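Two of those hardening primitives can be sketched with asyncio: a concurrency limit and a per-task deadline. This assumes the deadline applies per task rather than per run, which these notes do not specify. `run_tasks` is a hypothetical name. The peak counter exists only to make the limit observable.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, Tuple


async def run_tasks(tasks: Sequence[Callable[[], Awaitable[object]]],
                    max_concurrency: int,
                    deadline_s: float) -> Tuple[List[object], int]:
    sem = asyncio.Semaphore(max_concurrency)  # caps tasks running at once
    active = 0
    peak = 0

    async def guarded(task: Callable[[], Awaitable[object]]) -> object:
        nonlocal active, peak
        async with sem:
            active += 1
            peak = max(peak, active)
            try:
                # The deadline clock starts once the task holds a slot.
                return await asyncio.wait_for(task(), timeout=deadline_s)
            except asyncio.TimeoutError:
                return "deadline"  # wait_for has already cancelled the task
            finally:
                active -= 1

    results = await asyncio.gather(*(guarded(t) for t in tasks))
    return list(results), peak
```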

Lineage and EU residency

KMS adapters added for AWS, GCP, and Azure key vaults; lineage signatures are now keyed off operator-controlled KMS rather than per-process Ed25519.
Customer countersign step on lineage verification so the artefact carries both bernstein's signature and the operator's.
EU-residency loopback test that exercises the DNS rebinding edge case mentioned under the DeepSeek adapter.
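The countersign step can be sketched as follows. The operator signs the artefact together with bernstein's signature, so a single check confirms that both parties signed this exact artefact. That construction is an assumption; these notes only say the artefact carries both signatures. The signers and verifiers are passed in as callables, standing in for the KMS-backed keys. `countersign` and `verify_countersigned` are hypothetical names.

```python
from typing import Callable

Signer = Callable[[bytes], bytes]
Verifier = Callable[[bytes, bytes], bool]


def countersign(artefact: bytes, bernstein_sig: bytes, operator_sign: Signer) -> bytes:
    # The operator's signature covers the artefact and bernstein's signature.
    return operator_sign(artefact + bernstein_sig)


def verify_countersigned(artefact: bytes, bernstein_sig: bytes, operator_sig: bytes,
                         verify_bernstein: Verifier, verify_operator: Verifier) -> bool:
    # Both signatures must hold; either one failing rejects the artefact.
    if not verify_bernstein(artefact, bernstein_sig):
        return False
    return verify_operator(artefact + bernstein_sig, operator_sig)
```

Concatenating the artefact and the signature is unambiguous only when signatures have a fixed length, as Ed25519's 64-byte signatures do.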

CI hardening

A property-test stack landed alongside the new code:

Hypothesis on the audit chain, agent card signing, capability matrix, adapter spawn contract, lineage + EU residency, and WAL + CAS recovery. Each suite documents its invariants as docstrings on the failing-but-expected xfail(strict=True) cases so the regression budget is explicit.
Static analysis: Semgrep with custom rules, Bandit baseline, pip-audit on every PR, Schemathesis against the OpenAPI surface.
Type discipline: Beartype runtime checks on selected hot paths, pyright strict zone for the audit and identity packages.
Snapshot regressions: syrupy on CLI output golden files; mutmut diff-mode in the nightly workflow.
Nightly deep workflow that runs the slow property suites and Schemathesis fuzz long-form.

Bug fixes

JWKS cold-start self-deadlock. The first JWKS fetch acquired _KEY_LOCK then called into _get_keystore, which re-acquired the same lock. Replaced with RLock and a comment explaining why.
Audit log binary-append on Windows. On Windows, Python text-mode writes translate \n to \r\n, which broke byte-level chain verification. The writer now opens the log with open(path, "a", encoding="utf-8", newline=""), which disables newline translation.
Article 12 bundle canonical bytes alignment. _build_event_log used compact JSON separators while AuditLog.verify re-canonicalised with default separators, so the byte-equality check failed on otherwise-correct lines. Both paths now emit the same canonical form.
Test fixture cleanup. Dropped the mix_stderr argument, which has been removed upstream, and added the no_watchdog_threads fixture that several tests had been silently relying on.
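The JWKS deadlock fix can be illustrated as follows. Only `_KEY_LOCK` and `_get_keystore` come from the description above. `jwks`, `reentry_possible` and the dict-based keystore are hypothetical stand-ins. A reentrant lock lets the thread that already holds it acquire it again. A plain `threading.Lock` blocks forever on that nested acquire.

```python
import threading

# Reentrant lock: the owning thread may acquire it again without blocking.
_KEY_LOCK = threading.RLock()
_keystore = None


def _get_keystore() -> dict:
    global _keystore
    with _KEY_LOCK:  # nested acquisition when called from jwks()
        if _keystore is None:
            _keystore = {"keys": []}  # stand-in for loading or creating the keystore
        return _keystore


def jwks() -> dict:
    with _KEY_LOCK:
        # _get_keystore() takes _KEY_LOCK again while it is held here; with a
        # plain Lock this nested acquire would never return.
        return {"keys": list(_get_keystore()["keys"])}


def reentry_possible(lock) -> bool:
    # Probe without risking a hang: try a non-blocking nested acquire.
    with lock:
        acquired = lock.acquire(blocking=False)
        if acquired:
            lock.release()
        return acquired
```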

Dependency updates

The bulk are routine dependabot bumps. Three worth flagging:

Click 8.3.3. Required moving semgrep out of the [dev] extras and into a uv tool install to break a transitive pin (semgrep<1.137 needed opentelemetry-sdk<1.26; semgrep>=1.137 needed click<8.2).
OpenTelemetry SDK 1.41.1. Lockfile regenerated on top of the Click 8.3.3 path.
Schemathesis 4.18.1. New major version; the in-repo Schemathesis suites are unchanged in semantics.

GitHub Actions: setup-uv@7, checkout@6, [email protected], upload-artifact@7, sigstore/[email protected], [email protected], [email protected].

Documentation

Lethal-trifecta operator-facing security model. Names the three classes of attack the audit chain is designed to defeat (insider, supply chain, post-hoc tamper) and the threat-model gaps it explicitly does not cover.
HMAC-chained audit log operator guide and regulatory lineage export operator guide. Both written for the on-call operator who needs to produce evidence in two hours.
agents-md cross-CLI sync. New docs page for the canonical IR that fans out to five vendor formats (Claude Code, Cursor, Codex, Junie, Q Developer), plus a nav entry.
Enterprise modernization-fit gap analysis and citation-surface RFC anchors with dated stats. Long-form docs added under docs/research/.

Voice

Anti-AI-tell pass on CLI + role templates. The CLI help text and the per-role prompt templates were running on a uniform mid-formal register. Each was rewritten to match its actual audience: terse for --help, concrete for the role prompts.

Full changelog: https://github.com/sipyourdrink-ltd/bernstein/compare/v1.10.4...v1.10.5

Security Fixes

Fixed JWKS cold-start self-deadlock in the A2A agent card signing path

About chernistry/bernstein

Deterministic multi-agent orchestrator for 18 CLI coding agents (Claude Code, Codex, Cursor, Aider, Gemini CLI, OpenAI Agents SDK, and more). MCP server mode (stdio + HTTP/SSE) exposes the orchestrator to any MCP client. Git worktree isolation per agent, HMAC-chained audit trail, cost-aware model routing via contextual bandit. ~11K monthly PyPI downloads, Apache 2.0.

Earlier breaking changes

v3.7.1 `bernstein approve` and `bernstein reject` now enforce identifier regex `[A-Za-z0-9._-]{1,64}`.
v3.7.1 Tampered mission ledger reports as unverified rather than not-found.
v3.7.1 `mission define` now refuses phases without gate tasks.
v3.5.0 MCP client, transport, and gateway become stateless; calls carry content-derived trace IDs in _meta.