This release includes 1 security fix for security teams reviewing exposed deployments.
Summary
AI summary: Adds RFC 3161 timestamp chain validation to the audit log and A2A v1.0 signed agent cards with a persistent keystore.
Full changelog
v1.10.5
The compliance and A2A v1.0 release. RFC 3161 timestamp chain validation on the audit log, Sigstore release attestation on published artefacts, A2A v1.0 signed agent cards with persistent keystore + JWKS, plus a DeepSeek V4 family adapter with an EU-residency self-hosted guard. 44 adapters total. Six Hypothesis property-test suites land alongside the new code so the invariants are documented as much as exercised.
Honest framing up front. The compliance and A2A surface ships with tests, operator runbooks, and a standalone DSSE verifier, but it has not yet been tested against an external regulatory audit. Treat it as code an evaluator can read and stand up themselves, not as something proven by production telemetry.
Compliance evidence stack
Three pieces wired together so the audit log can stand on its own as regulatory evidence.
- HMAC-SHA256 chained audit log with multi-tenant export. Every emitted event is canonicalised via JCS (RFC 8785), HMAC-tagged, and chained to the previous head. The new export path slices the chain by tenant key without breaking either side. `bernstein audit slice` extracts a deterministic subset for an evaluator.
- RFC 3161 timestamp tokens with chain validation. The audit log head is timestamped against an external TSA (FreeTSA in the test fixture, swappable), and the verifier walks the TSA chain. An Ed25519 signature over the timestamped head closes the loop.
- DSSE + in-toto v1 envelope for the export bundle. The standalone verifier at `tools/verify_audit_dsse.py` depends only on the Python standard library and `cryptography`. Its test asserts that `import bernstein` raises `ModuleNotFoundError` from inside the verifier's venv, which is the property an external auditor wants from a verifier they can run themselves.
EU AI Act Article 12 evidence pack and SOC 2 packs are now wired to the real run-log integration rather than fixture data. The FINOS AIGF control mapping covers 16 of 16 controls after the Sigstore release attestation landed. The mapping document is the spec; assertion against an external audit is future work.
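The chaining idea behind the audit log can be sketched in a few lines. This is a minimal illustration, not bernstein's implementation: `canonical_bytes`, `append_event`, `verify_chain`, and the all-zero `GENESIS` head are hypothetical names, and sorted-key compact JSON stands in for full JCS (RFC 8785) canonicalisation.

```python
import hashlib
import hmac
import json

GENESIS = b"\x00" * 32  # assumed head value for an empty chain


def canonical_bytes(event: dict) -> bytes:
    # Approximation of JCS: sorted keys, compact separators, UTF-8 output.
    text = json.dumps(event, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def append_event(chain: list, key: bytes, event: dict) -> None:
    # Each tag covers the previous tag plus the canonical event bytes.
    prev = bytes.fromhex(chain[-1]["tag"]) if chain else GENESIS
    tag = hmac.new(key, prev + canonical_bytes(event), hashlib.sha256).hexdigest()
    chain.append({"event": event, "tag": tag})


def verify_chain(chain: list, key: bytes) -> bool:
    # Recompute every tag from the start; any edited event breaks the chain.
    prev = GENESIS
    for entry in chain:
        expected = hmac.new(key, prev + canonical_bytes(entry["event"]), hashlib.sha256).digest()
        if not hmac.compare_digest(expected, bytes.fromhex(entry["tag"])):
            return False
        prev = expected
    return True
```

Because each tag depends on the one before it, altering or removing an earlier event invalidates every tag that follows, which is what lets a verifier detect post-hoc tampering.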
A2A v1.0 signed agent cards
Each agent now publishes a signed agent card at /.well-known/agent.json and the public verification keys at /.well-known/jwks.json. The keystore is persistent, with O_EXCL plus 0o600 semantics on creation and a 24-hour rotation grace window, so an A2A peer that fetched JWKS five minutes ago can still verify the previous key after rotation without race conditions.
The signing path is JWS detached signature (RFC 7515) over JCS bytes with Ed25519 (RFC 8037), and audience binding uses RFC 8707 resource indicators. The cold-start RLock fix shipped in this release closes a self-deadlock where the first JWKS fetch could re-acquire the lock under itself.
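The keystore properties described above (exclusive creation, owner-only permissions, and a rotation grace window) can be sketched with the standard library alone. The function names, the `retired_at` timestamp, and the idea of storing raw key bytes in a single file are assumptions for illustration; the release notes do not describe the on-disk layout.

```python
import os

GRACE_SECONDS = 24 * 60 * 60  # the 24-hour rotation grace window


def create_key_file(path: str, key_bytes: bytes) -> None:
    # O_EXCL makes creation fail if the file already exists, so two
    # processes cannot both believe they created the key. Mode 0o600
    # restricts the file to its owner on POSIX systems.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as fh:
        fh.write(key_bytes)


def usable_for_verification(retired_at, now: float) -> bool:
    # retired_at is None for the current key; a rotated key stays
    # acceptable for verification until the grace window has elapsed.
    return retired_at is None or (now - retired_at) <= GRACE_SECONDS
```

A peer holding a cached JWKS would then still find the previous key published, and accepted, for up to 24 hours after rotation.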
Adapter additions
- DeepSeek V4-Flash and V4-Pro. Self-hosted via an Ollama-compatible endpoint. The adapter ships an EU-residency guard that pins the endpoint host and rejects DNS rebinding via the loopback test. The Hypothesis bug-hunt suite (see CI hardening below) caught a `10.example.com` rebinding bypass during development.
- Adapter inventory. 44 adapters total. The Junie and Q Developer adapters that landed in 1.10.1 are now logged in `CHANGELOG.md` against their actual ship dates.
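The notes do not say how the `10.example.com` bypass worked. One plausible class of bug, sketched here purely as an assumption, is a guard that classifies hosts by string prefix, so a DNS name starting with `10.` is mistaken for a private address. Both function names are hypothetical.

```python
import ipaddress


def naive_is_private(host: str) -> bool:
    # Deliberately flawed: treats any host that merely starts with a
    # private-looking prefix as a private IP address.
    return host.startswith(("10.", "127.", "192.168."))


def is_private_ip_literal(host: str) -> bool:
    # Parse the host as an IP literal first. A DNS name raises ValueError
    # and is reported as "not a private literal", so the caller has to
    # resolve and pin it before making any residency decision.
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False
    return addr.is_private or addr.is_loopback
```

With the prefix check, `10.example.com` passes as private even though it is an attacker-controllable DNS name; the parsing version only accepts real IP literals.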
Security primitives
- OWASP ASI01-10 detector pack (off by default). Static-rule matchers for the OWASP Agentic Security Initiative top-10 categories. Reads more like a linter than a runtime guard at this stage.
- MCP server Ed25519 signing + supply-chain scanner. Signs every MCP server tool manifest with a per-installation Ed25519 keypair and walks the dependency graph for known-bad transitive deps.
- Default-on credential scoping. Adapters that previously inherited the host environment now run under a scoped credential view, with the unscoped path behind an opt-in flag for the rare adapter that needs it.
- Identity v1.0 fingerprint. HMAC-SHA256 over `operator_seed || install_nonce || version_major` produces an 80-bit base32 fingerprint per install. Off by default; kill switch is `BERNSTEIN_DISABLE_IDENTITY=1`. Wired into YAML output, traces, and role prompts.
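As a rough sketch of the fingerprint derivation: 80 bits is 10 bytes, which base32-encodes to exactly 16 characters with no padding. The notes do not name the HMAC key or the byte encoding of `version_major`, so the separate `key` parameter, the ASCII-decimal encoding, and the function name below are all assumptions.

```python
import base64
import hashlib
import hmac


def install_fingerprint(key: bytes, operator_seed: bytes,
                        install_nonce: bytes, version_major: int) -> str:
    # Message is the plain concatenation operator_seed || install_nonce || version_major.
    # Encoding version_major as ASCII decimal is an assumption.
    message = operator_seed + install_nonce + str(version_major).encode("ascii")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    # Truncate to 80 bits (10 bytes); base32 turns that into 16 characters.
    return base64.b32encode(digest[:10]).decode("ascii")
```

The same inputs always yield the same fingerprint, while a different install nonce produces an unrelated one.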
Research-grade slices
Each item ships as a smallest-viable slice rather than a finished feature, so the spec, the test, and the runtime artefact all exist but the operational surface is intentionally thin:
- Wiki build. Generates a per-repo wiki from the agents.md canonical IR.
- Append-only JSONL memory log. One file per run; consumers read by tail.
- Deterministic sandbox backend selector. Picks Docker / E2B / Modal based on tags rather than env races.
- `audit slice` deterministic subset extractor. Pairs with the multi-tenant audit chain export.
- `--max-cost-usd` hard cap. Aborts a run when cumulative routed model spend crosses the threshold.
- Team-hub convention paths + manifest loader. Common manifest paths under `.team-hub/` so multi-repo projects share config without symlinks.
- `bernstein scaffold <prompt>` first slice. Prompt-to-repo scaffolder.
- A/B runner primitive. Eval harness for comparing two adapter configurations on the same task set.
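The hard cost cap in the list above amounts to a running total checked after every routed call. A minimal sketch, with the hypothetical names `CostCap` and `BudgetExceeded` and the assumption that spend is reported per call in USD:

```python
class BudgetExceeded(RuntimeError):
    """Raised when cumulative spend crosses the configured cap."""


class CostCap:
    def __init__(self, max_cost_usd: float) -> None:
        self.max_cost_usd = max_cost_usd
        self.spent_usd = 0.0

    def record(self, cost_usd: float) -> None:
        # Add the cost of one routed model call, then abort the run
        # if the cumulative total is now above the cap.
        self.spent_usd += cost_usd
        if self.spent_usd > self.max_cost_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.2f} of ${self.max_cost_usd:.2f} cap"
            )
```

An orchestrator loop would call `record` after each model response and let `BudgetExceeded` propagate to stop the run.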
Observability and orchestrator
- Three hardening primitives (concurrency limits, deadline enforcement, budget guard) wired into the orchestrator runtime so they engage on every run rather than living as off-by-default research code.
- Opt-in LLM watcher (Haiku). A side-channel observer that reads the deterministic loop's events and annotates them with a natural-language summary. Off by default; useful for explaining a failed run to a human reviewer.
Lineage and EU residency
- KMS adapters added for AWS, GCP, and Azure key vaults; lineage signatures are now keyed off operator-controlled KMS rather than per-process Ed25519.
- Customer countersign step on lineage verification so the artefact carries both bernstein's signature and the operator's.
- EU-residency loopback test that exercises the DNS rebinding edge case mentioned under the DeepSeek adapter.
CI hardening
A property-test stack landed alongside the new code:
- Hypothesis on the audit chain, agent card signing, capability matrix, adapter spawn contract, lineage + EU residency, and WAL + CAS recovery. Each suite documents its invariants as docstrings on the failing-but-expected `xfail(strict=True)` cases so the regression budget is explicit.
- Static analysis: Semgrep with custom rules, Bandit baseline, pip-audit on every PR, Schemathesis against the OpenAPI surface.
- Type discipline: Beartype runtime checks on selected hot paths, pyright strict zone for the audit and identity packages.
- Snapshot regressions: syrupy on CLI output golden files; mutmut diff-mode in the nightly workflow.
- Nightly deep workflow that runs the slow property suites and Schemathesis fuzz long-form.
Bug fixes
- JWKS cold-start self-deadlock. The first JWKS fetch acquired `_KEY_LOCK` then called into `_get_keystore`, which re-acquired the same lock. Replaced with `RLock` and a comment explaining why.
- Audit log binary-append on Windows. Python text-mode writes translate `\n` to `\r\n`, which broke byte-level chain verification. Writer switched to `open("a", encoding="utf-8", newline="")`.
- Article 12 bundle canonical bytes alignment. `_build_event_log` used compact JSON separators while `AuditLog.verify` re-canonicalised with default separators, so the byte-equality check failed on otherwise-correct lines. Both paths now emit the same canonical form.
- Test fixture cleanup. Dropped the removed `mix_stderr` argument and added the `no_watchdog_threads` fixture that several tests had been silently relying on.
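Two of the fixes above are easy to reproduce in isolation. The sketch below uses only the standard library and is not taken from the bernstein source; `reacquire` is a hypothetical helper that uses a non-blocking acquire so the demonstration cannot hang.

```python
import json
import threading


def reacquire(lock) -> bool:
    """Return True if the thread holding the lock can take it a second time."""
    with lock:
        got_it = lock.acquire(blocking=False)
        if got_it:
            lock.release()
        return got_it


# A plain Lock refuses re-entry; a blocking acquire here would deadlock.
plain = reacquire(threading.Lock())
# An RLock lets the owning thread acquire it again.
reentrant = reacquire(threading.RLock())

# Same event, two serialisations: the bytes differ even though the data is equal.
event = {"b": 1, "a": [1, 2]}
compact = json.dumps(event, sort_keys=True, separators=(",", ":"))
default = json.dumps(event, sort_keys=True)
```

Here `plain` is `False` and `reentrant` is `True`, and `compact` and `default` differ only in whitespace, which is enough to fail a byte-equality check over a chained log.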
Dependency updates
The bulk are routine dependabot bumps. Three worth flagging:
- Click 8.3.3. Required moving `semgrep` out of the `[dev]` extras and into a `uv tool install` to break a transitive pin (`semgrep<1.137` needed `opentelemetry-sdk<1.26`; `semgrep>=1.137` needed `click<8.2`).
- OpenTelemetry SDK 1.41.1. Lockfile regenerated on top of the Click 8.3.3 path.
- Schemathesis 4.18.1. New major version; the in-repo Schemathesis suites are unchanged in semantics.
GitHub Actions: setup-uv@7, checkout@6, [email protected], upload-artifact@7, sigstore/[email protected], [email protected], [email protected].
Documentation
- Lethal-trifecta operator-facing security model. Names the three classes of attack the audit chain is designed to defeat (insider, supply chain, post-hoc tamper) and the threat-model gaps it explicitly does not cover.
- HMAC-chained audit log operator guide and regulatory lineage export operator guide. Both written for the on-call operator who needs to produce evidence in two hours.
- agents-md cross-CLI sync. New docs page for the canonical IR that fans out to five vendor formats (Claude Code, Cursor, Codex, Junie, Q Developer), plus a nav entry.
- Enterprise modernization-fit gap analysis and citation-surface RFC anchors with dated stats. Long-form docs added under `docs/research/`.
Voice
- Anti-AI-tell pass on CLI + role templates. The CLI help text and the per-role prompt templates were running on a uniform mid-formal register. Each was rewritten to match its actual audience: terse for `--help`, concrete for the role prompts.
Full changelog: https://github.com/sipyourdrink-ltd/bernstein/compare/v1.10.4...v1.10.5
Security Fixes
- Fixed JWKS cold‑start self‑deadlock in A2A agent card signing path
About chernistry/bernstein
Deterministic multi-agent orchestrator for 18 CLI coding agents (Claude Code, Codex, Cursor, Aider, Gemini CLI, OpenAI Agents SDK, and more). MCP server mode (stdio + HTTP/SSE) exposes the orchestrator to any MCP client. Git worktree isolation per agent, HMAC-chained audit trail, cost-aware model routing via contextual bandit. ~11K monthly PyPI downloads, Apache 2.0.