chernistry/bernstein

v2.5.0 Breaking

This release includes 2 breaking changes that platform teams should plan for before upgrading.

Published 2mo ago · AI Agents & Assistants


✓ No known CVEs patched


Topics

agent-orchestrator agentic-ai ai-agents aider air-gap audit-trail claude-code cli-tool codex-cli coding-agent deterministic-replay deterministic-scheduler hmac-audit mcp-server model-context-protocol multi-agent parallel-worktrees provenance python reproducibility

Affected surfaces

auth

Summary

AI summary

Removed baked‑in private infrastructure defaults and added A2A capability cards.

Changes in this release

Type	Severity	Summary	CVE	Confidence
Feature	Medium	A2A capability cards enable signed manifest verification between processes.	—	high
Feature	Medium	MCP client now validates upstream capability cards, retries on stream drops, and meters costs per server.	—	high
Feature	Medium	MCP server adds a prompt catalogue and OAuth-2 PKCE discovery metadata for auto-discovery.	—	high
Feature	Medium	Deterministic session IDs ensure reproducible replay of runs without collisions.	—	high
Feature	Medium	`bernstein desktop-register --host <name>` writes the orchestrator's entry into the named host's config; seven hosts are supported.	—	low
Feature	Medium	`bernstein doctor --substrate` reports which hosts have the orchestrator registered, which do not, and which registrations are stale.	—	low
Bugfix	Medium	Validated `TaskCreate.scope` and `complexity` fields at the request boundary, returning 422 for invalid values.	—	high
Bugfix	Medium	Removed hardcoded private infrastructure defaults (GlitchTip DSN, telemetry endpoint) from the shipped package.	—	low

Summaries and confidence ratings generated by granite4.1:8b-q6_K on 2026-05-20.

Full changelog

A note on the voice

v2.4.0 was about observability surfaces: it ran the four backends through one umbrella so that a code-scanning regression or a coverage drop surfaces in the same table as a GlitchTip spike. v2.5.0 asks the next question: now that the orchestrator can see itself, can the hosts an operator already runs see it too? And can it stop carrying defaults that phone home to my private infrastructure?

Interop, finally

The piece that kept blocking me on multi-host runs was the lack of a real handshake. Claude Desktop is one process and Claude Code is another; both can spawn agents, and neither knew what the other had already decided. I shipped A2A capability cards (#1698): one process mints a signed manifest of what it can do, and the other consumes it, verifies the signature against a trusted-issuer set, and refuses to delegate when the advertised policies do not meet the operator's required policies. The lineage chain rides in the same envelope, so the audit trail does not break at the organisation boundary.
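To make the mint-and-verify handshake concrete, here is a minimal sketch. It assumes HMAC-SHA256 with a per-issuer shared key as a stand-in for whatever signature scheme #1698 actually uses (a real card would more plausibly use an asymmetric scheme such as Ed25519, which the standard library lacks). `CapabilityCard`, `mint`, and `can_delegate` are hypothetical names, not Bernstein's API.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass, field


@dataclass
class CapabilityCard:
    """Hypothetical card shape: issuer, capabilities, honoured policies, lineage."""
    issuer: str
    capabilities: list
    policies: set
    lineage: list = field(default_factory=list)  # audit chain carried in the same envelope
    signature: str = ""

    def payload(self) -> bytes:
        # Canonical JSON so signer and verifier hash byte-identical input.
        body = {
            "issuer": self.issuer,
            "capabilities": sorted(self.capabilities),
            "policies": sorted(self.policies),
            "lineage": self.lineage,
        }
        return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def mint(card: CapabilityCard, key: bytes) -> CapabilityCard:
    """Sign the card with the issuer's key (HMAC stand-in for a real signature)."""
    card.signature = hmac.new(key, card.payload(), hashlib.sha256).hexdigest()
    return card


def can_delegate(card: CapabilityCard, trusted: dict, required: set) -> bool:
    """Delegate only if the issuer is trusted, the signature verifies,
    and the advertised policies cover every policy the operator requires."""
    key = trusted.get(card.issuer)
    if key is None:
        return False  # issuer is not in the trusted set
    expected = hmac.new(key, card.payload(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, card.signature):
        return False  # card altered after signing, or signed with another key
    return set(required) <= set(card.policies)
```

Because the lineage list is part of the signed payload, appending a hop without re-signing invalidates the card, which is one way to keep the audit chain unbroken across the boundary.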

The MCP client got the matching upgrade (#1692). Upstream servers will return malformed responses, hang mid-stream, demand re-auth, and lie about their capability manifest. The client now treats every upstream as untrusted: capability-card validation before a tool call, retry-with-continuation on dropped streams, in-flight cancellation that preserves partial output, per-server cost metering, and schema-violation containment that marks a misbehaving server degraded for the rest of the task. None of this is exotic; it is the posture the brittle real world demands, and the larger MCP ecosystem will end up needing it.
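A rough sketch of three of those behaviours (retry-with-continuation, per-server cost metering, and degraded marking) follows. It assumes a hypothetical upstream that can resume a stream from a chunk offset and a flat cost per chunk; `UntrustedUpstreamClient`, `StreamDropped`, and `SchemaViolation` are illustrative names, not the client's real classes.

```python
from collections import defaultdict
from typing import Callable, Iterator


class StreamDropped(Exception):
    """Raised by an upstream stream that dies mid-response."""


class SchemaViolation(Exception):
    """Raised when an upstream chunk does not match the expected shape."""


class UntrustedUpstreamClient:
    """Hypothetical client: resumes dropped streams, meters cost per server,
    and quarantines servers that break the schema."""

    def __init__(self, max_retries: int = 3, cost_per_chunk: float = 0.001):
        self.max_retries = max_retries
        self.cost_per_chunk = cost_per_chunk
        self.costs = defaultdict(float)  # server name -> accumulated cost
        self.degraded = set()  # servers quarantined for the rest of the task

    def call(self, server: str, open_stream: Callable[[int], Iterator[object]]) -> str:
        if server in self.degraded:
            raise RuntimeError(f"{server} is marked degraded for this task")
        chunks = []  # partial output survives across retries
        for _ in range(self.max_retries + 1):
            try:
                # Continuation: ask the upstream to resume after the chunks we already hold.
                for chunk in open_stream(len(chunks)):
                    if not isinstance(chunk, str):
                        raise SchemaViolation(f"unexpected chunk {chunk!r}")
                    chunks.append(chunk)
                    self.costs[server] += self.cost_per_chunk
                return "".join(chunks)
            except StreamDropped:
                continue  # retry without discarding what we already have
            except SchemaViolation:
                self.degraded.add(server)
                raise
        raise TimeoutError(f"{server}: gave up after {self.max_retries + 1} attempts")
```

Resuming from `len(chunks)` rather than restarting means a drop late in a long response costs only the missing tail, and the meter never double-counts chunks already received.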

The MCP server side gained a prompt catalogue plus OAuth-2 PKCE discovery metadata (#1696, #1709), so auto-discovering hosts that expect a real RFC 8414 / RFC 9728 surface no longer skip us.
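For readers unfamiliar with those RFCs, the sketch below shows the kind of documents a discovery surface serves: RFC 8414 authorization-server metadata and RFC 9728 protected-resource metadata at their well-known paths, plus the RFC 7636 S256 transform a PKCE client performs. The endpoint paths under `base_url` are hypothetical, and since token issuance is deferred in this release, the advertised `token_endpoint` is illustrative only.

```python
import base64
import hashlib


def discovery_documents(base_url: str) -> dict:
    """Map well-known paths to JSON bodies. Endpoint paths are hypothetical."""
    base = base_url.rstrip("/")
    return {
        # RFC 8414: OAuth 2.0 Authorization Server Metadata
        "/.well-known/oauth-authorization-server": {
            "issuer": base,
            "authorization_endpoint": f"{base}/oauth/authorize",
            "token_endpoint": f"{base}/oauth/token",
            "response_types_supported": ["code"],
            "grant_types_supported": ["authorization_code"],
            "code_challenge_methods_supported": ["S256"],  # advertises PKCE (RFC 7636)
        },
        # RFC 9728: OAuth 2.0 Protected Resource Metadata
        "/.well-known/oauth-protected-resource": {
            "resource": f"{base}/mcp",
            "authorization_servers": [base],
            "bearer_methods_supported": ["header"],
        },
    }


def pkce_s256_challenge(code_verifier: str) -> str:
    """RFC 7636 S256: BASE64URL(SHA256(ASCII(code_verifier))) with padding stripped."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

A host that auto-discovers servers fetches the protected-resource document first, follows `authorization_servers` to the RFC 8414 document, and checks `code_challenge_methods_supported` before starting a PKCE flow.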

bernstein desktop-register

bernstein desktop-register --host <name> (#1697, then #1708 added five more hosts) writes the host-specific config entry for Claude Desktop, Claude Code, Cursor, Continue, Cline, Zed, and Aider. One command. The orchestrator is a guest in the host's settings file; we ship the plugin, the host renders it. bernstein doctor --substrate reports which hosts have us registered, which do not, and which have a stale registration.

The honest disclaimer: if a host changes its plugin spec, the per-host adapter breaks. Each adapter is small enough that a host-spec change is a one-day fix, not a re-architecture.
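As an illustration of what one per-host adapter might do, here is a sketch for a host that stores MCP servers under a top-level `mcpServers` object keyed by server name with `command`/`args` entries (the shape Claude Desktop's config uses). The launch entry `{"command": "bernstein", "args": ["mcp"]}` is a hypothetical placeholder, and the `missing`/`registered`/`stale` classification is an assumed reading of what `doctor --substrate` reports.

```python
import json
from pathlib import Path
from typing import Optional

# Hypothetical launch entry; the real command line Bernstein writes is not shown in these notes.
DEFAULT_ENTRY = {"command": "bernstein", "args": ["mcp"]}


def register(config_path: Path, name: str = "bernstein", entry: Optional[dict] = None) -> None:
    """Add or refresh our entry under "mcpServers", leaving other servers untouched."""
    entry = DEFAULT_ENTRY if entry is None else entry
    config = json.loads(config_path.read_text(encoding="utf-8")) if config_path.exists() else {}
    config.setdefault("mcpServers", {})[name] = entry
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")


def registration_status(config_path: Path, name: str = "bernstein", entry: Optional[dict] = None) -> str:
    """Classify a host as 'registered', 'missing', or 'stale' (present but outdated)."""
    entry = DEFAULT_ENTRY if entry is None else entry
    if not config_path.exists():
        return "missing"
    servers = json.loads(config_path.read_text(encoding="utf-8")).get("mcpServers", {})
    if name not in servers:
        return "missing"
    return "registered" if servers[name] == entry else "stale"
```

Writing only our own key keeps the orchestrator a well-behaved guest in the host's settings file; hosts with a different config shape would each need their own small adapter, which is why a host-spec change stays a contained fix.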

I removed my private infrastructure from the shipped package

This one was a real silent bug, not a feature. The shipped wheel had errors.bernstein.run baked in as the GlitchTip DSN default, and telemetry.bernstein.run baked in as the telemetry endpoint default. Both backends soft-fail when their env vars are unset, so the package never actually reached out without consent. But the hostnames were sitting there as defaults, which is the kind of thing that turns into a real leak the day someone wires a config they did not read.

#1694 strips those defaults. tests/unit/observability/test_no_hardcoded_infra.py asserts that src/ contains no operator-private host, IP, or DSN, and it will fail the build if a future change reintroduces one. The telemetry side-channel is now portable across hosts behind a single Sentry-compatible BERNSTEIN_TELEMETRY_DSN (#1691), so each operator runs against their own backend, not mine.
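A guard test of that kind can be sketched as a regex scan over the source tree. The deny-list below is hypothetical: it covers the two hostnames named above and a Sentry-style DSN with an embedded 32-hex-character key, and omits the IP patterns the real test also checks. `find_hardcoded_infra` is an illustrative helper, not the contents of the actual test file.

```python
import re
from pathlib import Path

# Hypothetical deny-list. Escaped dots mean these pattern strings do not match themselves.
FORBIDDEN = [
    re.compile(r"errors\.bernstein\.run"),
    re.compile(r"telemetry\.bernstein\.run"),
    re.compile(r"https?://[0-9a-f]{32}@"),  # Sentry-style DSN with an embedded public key
]


def find_hardcoded_infra(src_root: Path) -> list:
    """Return (path, line number, pattern) for every forbidden match under src_root."""
    hits = []
    for path in sorted(src_root.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for pattern in FORBIDDEN:
                if pattern.search(line):
                    hits.append((str(path), lineno, pattern.pattern))
    return hits


def test_no_hardcoded_infra():
    # Fails the build if a future change reintroduces a private default.
    assert find_hardcoded_infra(Path("src")) == []
```

Reporting file and line for every hit, rather than a bare boolean, makes a failing build point straight at the offending default.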

Deterministic replay

Three small things compounded. Session IDs are bound deterministically (#1684), so a replayed run reproduces its own event stream without colliding with a sibling. The supervisor enforces a bounded respawn budget and parks an agent when the budget is exhausted (#1683) instead of respawning it indefinitely. On-disk state has a versioned migrations module (#1689), so an older .sdd/ directory upgrades predictably. Plus a cosmetic-but-real win: runs surface a memorable deterministic name (#1682) in user-facing output, so the operator can refer to "the brisk-sparrow run" instead of memorising a UUID.
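Here is one plausible way to get deterministic IDs, memorable names, and a respawn budget, sketched under stated assumptions: session IDs come from `uuid.uuid5` over a hypothetical namespace and a `run_id/agent_index/spawn_seq` key, names are picked from hypothetical word lists by hashing the run ID, and the budget parks an agent once it has used `max_respawns`. None of these names or inputs are taken from #1682, #1683, or #1684.

```python
import hashlib
import uuid

# Hypothetical namespace; the real derivation is not documented in these notes.
RUN_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/bernstein/runs")


def session_id(run_id: str, agent_index: int, spawn_seq: int) -> uuid.UUID:
    # Same inputs give the same UUID on replay; siblings differ by index or sequence.
    return uuid.uuid5(RUN_NAMESPACE, f"{run_id}/{agent_index}/{spawn_seq}")


# Hypothetical word lists for human-friendly run names.
ADJECTIVES = ["brisk", "calm", "quiet", "bold", "amber", "swift", "lucid", "mellow"]
ANIMALS = ["sparrow", "otter", "heron", "lynx", "badger", "finch", "marten", "wren"]


def run_name(run_id: str) -> str:
    """Derive a stable adjective-animal name from the run ID."""
    digest = hashlib.sha256(run_id.encode("utf-8")).digest()
    return f"{ADJECTIVES[digest[0] % len(ADJECTIVES)]}-{ANIMALS[digest[1] % len(ANIMALS)]}"


class RespawnBudget:
    """Allow at most max_respawns per agent, then park it instead of looping."""

    def __init__(self, max_respawns: int):
        self.max_respawns = max_respawns
        self.used = {}  # agent -> respawns consumed
        self.parked = set()

    def try_respawn(self, agent: str) -> bool:
        if agent in self.parked:
            return False
        if self.used.get(agent, 0) >= self.max_respawns:
            self.parked.add(agent)  # budget exhausted: park for operator attention
            return False
        self.used[agent] = self.used.get(agent, 0) + 1
        return True
```

Deriving everything from the run ID, rather than from wall-clock time or random state, is what lets a replay land on the same identifiers.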

The API stops returning 500 on a fuzzer-found bug

The TaskCreate.scope and complexity fields were typed as plain str with only a length cap. An empty or out-of-range value passed pydantic and then raised ValueError deep in the task store when the enum was constructed, surfacing as an unhandled 500 on POST /tasks and POST /tasks/batch. Schemathesis kept flagging it intermittently and everyone kept rerunning it as a flake. It was not a flake. #1700 validates at the request boundary and returns 422.
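The shape of the fix is to run the enum conversion at the boundary instead of deep in the store. The sketch below uses only the standard library; the `Scope` and `Complexity` member values and the `200` success code are hypothetical, and `validate_task_create` / `handle_post_tasks` are illustrative names rather than the code in #1700.

```python
from enum import Enum


class Scope(str, Enum):  # hypothetical members
    SMALL = "small"
    MEDIUM = "medium"
    LARGE = "large"


class Complexity(str, Enum):  # hypothetical members
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def validate_task_create(payload: dict) -> list:
    """Check enum-backed fields at the boundary; return 422-style error details."""
    errors = []
    for field_name, enum_cls in (("scope", Scope), ("complexity", Complexity)):
        value = payload.get(field_name)
        try:
            enum_cls(value)  # the conversion the task store would otherwise do later
        except (ValueError, TypeError):
            errors.append({
                "loc": ["body", field_name],
                "msg": f"must be one of {[m.value for m in enum_cls]}",
                "input": value,
            })
    return errors


def handle_post_tasks(payload: dict) -> int:
    """Map validation failures to 422 before anything reaches the store."""
    if validate_task_create(payload):
        return 422
    return 200  # illustrative success code
```

With pydantic, typing the fields as the Enum directly achieves the same effect declaratively; FastAPI, for one, turns request-body validation errors into 422 responses.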

What I am not claiming

The two new transports are functional but not load-tested at adversarial scale. The OAuth-2 PKCE discovery surface ships metadata only; full token issuance and OIDC federation are deferred to a follow-up. The substrate adapters cover seven hosts; Codex and Gemini CLI are stubbed by design until their respective plugin specs stabilise. The A2A integration follows the protocol as specified when work began and will need maintenance as the spec evolves.

Try it

pipx install --upgrade bernstein
bernstein interop a2a card --output card.json
bernstein desktop-register --host cursor

Full per-PR notes in docs/release-notes/v2.5.0.md. Source: https://github.com/sipyourdrink-ltd/bernstein (Apache-2.0). 22 commits since v2.4.0.

Breaking Changes

Removed baked-in defaults `errors.bernstein.run` (GlitchTip DSN) and `telemetry.bernstein.run` (telemetry endpoint).
Telemetry now requires an operator-provided, Sentry-compatible `BERNSTEIN_TELEMETRY_DSN` environment variable.
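For operators wiring their own backend, a sketch of reading the variable with no fallback host: unset means telemetry stays off, and a malformed value soft-fails with a warning. It assumes the usual Sentry DSN layout `{scheme}://{public_key}@{host}/{project_id}` and takes the last path segment as the project ID; `TelemetryDsn` and `load_telemetry_dsn` are hypothetical names, not Bernstein's loader.

```python
import os
import warnings
from typing import Mapping, NamedTuple, Optional
from urllib.parse import urlsplit


class TelemetryDsn(NamedTuple):
    scheme: str
    public_key: str
    host: str  # may include a port
    project_id: str


def load_telemetry_dsn(env: Optional[Mapping[str, str]] = None) -> Optional[TelemetryDsn]:
    """Read BERNSTEIN_TELEMETRY_DSN; return None (telemetry off) when unset or malformed."""
    env = os.environ if env is None else env
    raw = env.get("BERNSTEIN_TELEMETRY_DSN", "").strip()
    if not raw:
        return None  # no baked-in fallback host
    parts = urlsplit(raw)
    project_id = parts.path.rstrip("/").rsplit("/", 1)[-1]
    looks_valid = (
        parts.scheme in ("http", "https")
        and bool(parts.username)
        and bool(parts.hostname)
        and bool(project_id)
    )
    if not looks_valid:
        warnings.warn("BERNSTEIN_TELEMETRY_DSN is not a Sentry-style DSN; telemetry disabled")
        return None
    host = parts.netloc.rsplit("@", 1)[1]
    return TelemetryDsn(parts.scheme, parts.username, host, project_id)
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the loader easy to test without mutating the process environment.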


About chernistry/bernstein

Deterministic multi-agent orchestrator for 18 CLI coding agents (Claude Code, Codex, Cursor, Aider, Gemini CLI, OpenAI Agents SDK, and more). MCP server mode (stdio + HTTP/SSE) exposes the orchestrator to any MCP client. Git worktree isolation per agent, HMAC-chained audit trail, cost-aware model routing via contextual bandit. ~11K monthly PyPI downloads, Apache 2.0.


Breaking changes in later releases

v3.7.1 `bernstein approve` and `bernstein reject` now enforce identifier regex `[A-Za-z0-9._-]{1,64}`.
v3.7.1 Tampered mission ledger reports as unverified rather than not-found.
v3.7.1 `mission define` now refuses phases without gate tasks.
v3.5.0 MCP client, transport, and gateway become stateless; calls carry content‑derived trace IDs in _meta.