This release includes 3 security fixes for security teams reviewing exposed deployments.
ReleasePort's take
Light signal. Release v2.6.0 adds new bidirectional drivers for Slack and Discord, introduces image‑attachment provenance with SHA‑256 logging, enables per‑step session replay with hash‑chained journals, supports recurring goals via an internal cron scheduler, and implements signed supervisor escalation receipts.
Why it matters: These feature additions expand chat integration options, enhance auditability of media attachments, improve observability through deterministic replay, enable automated goal scheduling, and strengthen stall detection for supervisors—critical for developers, SREs, and security engineers managing complex workflows.
Summary
AI summary: Updates cover reliability and correctness fixes, evaluation and observability, and CI and quality infrastructure across a mixed release.
Changes in this release
| Type | Severity | Summary | CVE |
|---|---|---|---|
| Feature | Medium | Adds Slack bidirectional driver with signed approvals. (Source: llm_adapter@2026-05-23; confidence: high) | — |
| Feature | Medium | Adds Discord bidirectional driver with scheduling fence and signed approvals. (Source: llm_adapter@2026-05-23; confidence: high) | — |
| Feature | Medium | Adds image‑attachment provenance with SHA‑256 recording in audit chain. (Source: llm_adapter@2026-05-23; confidence: high) | — |
| Feature | Medium | Adds per‑step session replay with hash‑chained journal and deterministic fork. (Source: llm_adapter@2026-05-23; confidence: high) | — |
| Feature | Medium | Adds operator‑registered recurring goals with internal cron scheduler. (Source: llm_adapter@2026-05-23; confidence: high) | — |
| Feature | Medium | Adds signed supervisor escalation receipts for stall detection. (Source: llm_adapter@2026-05-23; confidence: high) | — |
| Feature | Medium | Adds skill catalog with signed manifest installs and lockfile consistency. (Source: llm_adapter@2026-05-23; confidence: high) | — |
| Feature | Low | Introduces per-step worktree GC reaps anchored to the audit chain with fail-closed behavior. (Source: granite4.1:30b@2026-05-23-audit; confidence: low) | — |
| Feature | Low | Makes deterministic replay hermetic, reporting cache misses and strict violations. (Source: granite4.1:30b@2026-05-23-audit; confidence: low) | — |
| Feature | Low | Adds GlitchTip event ingester converting self-hosted error‑tracker issues to regression eval cases. (Source: granite4.1:30b@2026-05-23-audit; confidence: low) | — |
| Feature | Low | Adds CI-failure post-mortem ingestion synthesizing regression eval cases from fix‑up commits. (Source: granite4.1:30b@2026-05-23-audit; confidence: low) | — |
| Feature | Low | Implements nightly real-run canary executing end‑to‑end flows against a deterministic stub adapter. (Source: granite4.1:30b@2026-05-23-audit; confidence: low) | — |
| Feature | Low | Adds multi-adapter pentest fan‑out aggregating consensus on vulnerability types and paths. (Source: granite4.1:30b@2026-05-23-audit; confidence: low) | — |
| Feature | Low | Integrates consolidated SonarQube findings tracker auto‑rendered from live Sonar API. (Source: granite4.1:30b@2026-05-23-audit; confidence: low) | — |
| Feature | Low | Routes terminal orchestration failures to the configured error sink. (Source: granite4.1:30b@2026-05-23-audit; confidence: low) | — |
| Feature | Low | Adds per-row source‑adapter provenance column to SQLiteMemoryStore with migration support. (Source: granite4.1:30b@2026-05-23-audit; confidence: low) | — |
| Feature | Low | Introduces merge‑gate stack with autosync, main‑red guard, nightly drift sweep, and native merge queue testing. (Source: granite4.1:30b@2026-05-23-audit; confidence: low) | — |
| Feature | Low | Shards unit‑test job across parallel runners for scalable isolated suite execution. (Source: granite4.1:30b@2026-05-23-audit; confidence: low) | — |
| Feature | Low | Adds coverage ratchet gate preventing regression of total coverage and nudging diff‑coverage floor. (Source: granite4.1:30b@2026-05-23-audit; confidence: low) | — |
| Feature | Low | Expands static‑analysis sweeper to MAJOR Sonar findings, creating backlog tickets keyed by issue ID. (Source: granite4.1:30b@2026-05-23-audit; confidence: low) | — |
| Feature | Low | Makes unit tests hermetic, blocking real outbound network connections in test suite. (Source: granite4.1:30b@2026-05-23-audit; confidence: low) | — |
| Feature | Low | Provides opt‑in telemetry foundation with consent CLI, schema guard, and off‑by‑default proof. (Source: granite4.1:30b@2026-05-23-audit; confidence: low) | — |
| Bugfix | Medium | Fixes deterministic replay to abort on cache miss instead of calling live model. (Source: llm_adapter@2026-05-23; confidence: high) | — |
| Bugfix | Medium | Fixes HSM lineage to fail fast when no real adapter is present; stub opt‑in via env var. (Source: llm_adapter@2026-05-23; confidence: high) | — |
| Bugfix | Medium | Fixes MCP OAuth discovery metadata and improves Tier‑3 cordon handling of deletions/renames. (Source: llm_adapter@2026-05-23; confidence: high) | — |
| Bugfix | Medium | Ensures sensitive paths are no longer logged in clear text; forensic record keeps hashed digests. (Source: granite4.1:30b@2026-05-23-audit; confidence: low) | — |
| Bugfix | Low | Resolves multiple SonarQube findings (S8413, S125, S3516, S5754) and performs refurb sweep across new subsystems. (Source: granite4.1:30b@2026-05-23-audit; confidence: low) | — |
| Bugfix | Low | Declares per‑job permissions explicitly in post‑CI dispatcher and syncs secret expectations with GlitchTip forward. (Source: granite4.1:30b@2026-05-23-audit; confidence: low) | — |
| Bugfix | Low | Corrupts leading signature byte in tampered-signature catalog test to ensure deterministic verification failure. (Source: granite4.1:30b@2026-05-23-audit; confidence: low) | — |
| Bugfix | Low | Rejects `interactive: true` at config‑load time, preventing mid‑run crashes. (Source: granite4.1:30b@2026-05-23-audit; confidence: low) | — |
Full changelog
v2.6.0
Released 2026-05-22.
A large release. Highlights: bidirectional chat drivers with verifiable approvals, per-step replay with a hash-chained journal, operator-registered recurring goals, a signed supervisor surface, a skill catalog with signed manifests, image-attachment provenance, and a sharded CI test suite behind a native merge queue.
Chat and operator surfaces
- Slack bidirectional driver: drive a session, approve or reject a tool call, and watch streamed output from Slack. Every approval is recorded as a signed entry in the audit chain (covering approver, message timestamp, decision, and tool-call hash), approval scope is pinned to the worker's git worktree, and outbound messages carry an Ed25519 signature so a recipient can verify the workspace identity. Optional `bernstein[slack]` extra. (#1794)
- Discord bidirectional driver: the same attested-approval model as Slack, plus a per-channel scheduling fence so tasks emitted from one channel cannot land on workers bound to another. Optional `bernstein[discord]` extra. (#1795)
- Image attachment passthrough: `bernstein run "<prompt>" --attach ./shot.png` carries an image to a vision-capable adapter (Claude, Gemini). The image's SHA-256 is recorded in the audit chain at decision time and anchored as a lineage parent of any artefact produced that turn; spawning with `--attach` on a non-multimodal adapter fails before any process launches. (#1797)
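The attachment-provenance behaviour described above can be illustrated with a minimal sketch. It is not bernstein's implementation: `AttachmentNotSupported`, `attachment_digest`, `prepare_turn`, and the entry field names are hypothetical, and the only assumptions taken from the changelog are that the digest is SHA-256, that it becomes a lineage parent, and that a non-multimodal adapter is rejected before anything launches.

```python
import hashlib
from typing import Dict, List


class AttachmentNotSupported(ValueError):
    """Hypothetical error: raised before any worker process would be spawned."""


def attachment_digest(image_bytes: bytes) -> str:
    """Hex SHA-256 of the raw attachment bytes."""
    return hashlib.sha256(image_bytes).hexdigest()


def prepare_turn(image_bytes: bytes, adapter_is_multimodal: bool) -> Dict[str, object]:
    """Validate the adapter first, then build an illustrative audit entry.

    The capability check runs before any hashing or spawning so that an
    unsupported adapter fails without side effects.
    """
    if not adapter_is_multimodal:
        raise AttachmentNotSupported("adapter cannot accept image attachments")
    digest = attachment_digest(image_bytes)
    lineage_parents: List[str] = [digest]  # artefacts of this turn point back at the image
    return {
        "event": "attachment.recorded",  # hypothetical event name
        "sha256": digest,
        "lineage_parents": lineage_parents,
    }
```

Because the digest is computed over the bytes rather than the file name, renaming `./shot.png` would leave the recorded lineage unchanged, while any edit to the image would produce a different parent hash.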
Orchestration
- Per-step session replay: each agent step is recorded in a hash-chained journal where `step_hash = H(prev_hash, input_hash, model, prompt, tool_call, tool_result)`. `bernstein replay <agent-id>` walks the chain, `bernstein session fork <id> --from-step N` branches a sibling worktree from a chosen step, and replay divergence surfaces as a precise hash mismatch rather than a flaky result. Exported receipts verify offline against the install public key. (#1799)
- Operator-registered recurring goals: `bernstein schedule add --cron "<expr>" --goal "<text>"` registers a recurring goal that fires inside a single installation, no host cron or external scheduler required. Each fire is a deterministic projection of `(schedule_id, fire_time, last_state)` onto a canonical task graph and is recorded in the audit chain, so `bernstein schedule audit` can prove a nightly sequence ran exactly as expected. (#1798)
- Operator supervisor surface: `bernstein supervisor status` aggregates the existing stall, watchdog, and respawn-budget detectors into one view. A detected stall produces a signed escalation receipt (last audit entries, identity tokens, structured reason, and a deterministic recommended action) that any verifier can check offline. (#1800)
- Worktree GC reaps are now anchored to the audit chain: each reap appends a `worktree.reap` event capturing the pre-deletion git HEAD and a clean/dirty flag, and the reap is fail-closed (a worktree is not deleted if the reap cannot be recorded). (#1833)
- Deterministic replay is now hermetic: a cache miss in replay mode raises a typed error and aborts instead of silently calling the live model, the replay key folds in provider, temperature, and max-tokens, and a coverage line reports hits, misses, and strict violations. A non-strict fall-through stays available behind an explicit, logged opt-in. (#1832)
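The journal rule `step_hash = H(prev_hash, input_hash, model, prompt, tool_call, tool_result)` can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the changelog does not say which hash `H` is or how fields are serialized, so SHA-256 over a JSON array, the all-zero genesis value, and the names `Step`, `build_chain`, and `first_divergence` are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List, Optional, Sequence

GENESIS = "0" * 64  # assumed starting value for the first step's prev_hash


@dataclass(frozen=True)
class Step:
    input_hash: str
    model: str
    prompt: str
    tool_call: str
    tool_result: str


def step_hash(prev_hash: str, step: Step) -> str:
    """H(prev_hash, input_hash, model, prompt, tool_call, tool_result)."""
    # A JSON array keeps field boundaries unambiguous, unlike plain concatenation.
    payload = json.dumps(
        [prev_hash, step.input_hash, step.model, step.prompt,
         step.tool_call, step.tool_result],
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(steps: Sequence[Step]) -> List[str]:
    """Return the running hash after each step."""
    hashes: List[str] = []
    prev = GENESIS
    for step in steps:
        prev = step_hash(prev, step)
        hashes.append(prev)
    return hashes


def first_divergence(recorded: Sequence[str], replayed: Sequence[Step]) -> Optional[int]:
    """Index of the first step whose recomputed hash differs, or None if all match."""
    prev = GENESIS
    for index, (expected, step) in enumerate(zip(recorded, replayed)):
        prev = step_hash(prev, step)
        if prev != expected:
            return index
    return None
```

Since every hash folds in its predecessor, a change at step N alters every later hash, which is why a divergence can be reported as one precise step index rather than as a vague downstream difference.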
Skills
- Skill catalog with signed manifest installs: `bernstein skills catalog browse|search|install|upgrade|info|status`. Each install appends a signed audit-chain entry referencing the manifest URL and content digest, unverified manifests are refused by default, the lockfile stays consistent across parallel worktrees, and a CI lineage gate rejects a lockfile referencing an unknown manifest digest. (#1796)
- Skill lifecycle CLI foundation: install, sync, lock, lint, watch, and a local activation log with an env-var opt-out. (#1734)
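The CI lineage gate mentioned above (rejecting a lockfile that references an unknown manifest digest) might look roughly like the sketch below. The lockfile shape, the `sha256:` prefix, and the function names are assumptions made for illustration; the changelog does not describe the real file format.

```python
import hashlib
from typing import Dict, List, Set


def manifest_digest(manifest_bytes: bytes) -> str:
    """Content digest of a manifest, in an assumed 'sha256:<hex>' form."""
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()


def unknown_lock_entries(lock_entries: Dict[str, str], known_digests: Set[str]) -> List[str]:
    """Return the skill names whose locked digest was never recorded as installed.

    lock_entries maps a skill name to the digest pinned in the lockfile;
    known_digests stands in for the digests found in signed audit-chain entries.
    """
    return sorted(
        name for name, digest in lock_entries.items() if digest not in known_digests
    )


def lineage_gate(lock_entries: Dict[str, str], known_digests: Set[str]) -> bool:
    """True when the gate passes, i.e. every locked digest is known."""
    return not unknown_lock_entries(lock_entries, known_digests)
```

A gate of this shape fails on the specific offending skill names, which makes a rejected CI run actionable without inspecting the lockfile by hand.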
Evaluation and observability
- GlitchTip event ingester: a scraper turns open self-hosted error-tracker issues into one regression eval case each, deduped on the issue id, with administrative wiring-probe issues filtered out. The nightly real-run canary feeds this loop. (#1820)
- CI-failure post-mortem ingestion: a scraper walks merged PRs that needed fix-up commits and synthesizes regression eval cases, so the eval suite tracks the failure modes that surface first in CI. (#1793)
- Nightly real-run canary: a scheduled job runs real end-to-end flows (worker spawn, git worktree lifecycle, audit-chain append plus verify, signed lineage receipt) against a deterministic stub adapter, with no API key or network, and routes any failure to the telemetry sink. (#1822)
- Multi-adapter pentest fan-out: `bernstein eval pentest --adapters a,b,c` runs one scenario across adapters and aggregates consensus on `(canonical_vuln_type, normalized_path)`. Single-adapter behaviour stays byte-identical. (#1754)
- Consolidated SonarQube findings tracker auto-rendered from the live Sonar API. (#1781)
- Terminal orchestration failures route to the configured error sink. (#1762)
- Per-row source-adapter provenance for the memory subsystem: `SQLiteMemoryStore` carries an optional `source_adapter` column with `add_many` and `query(read_only_from_adapters=[...])`, via an additive NULL-backfill migration. (#1759)
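To make the pentest consensus step concrete, here is a minimal sketch of aggregating findings keyed on `(canonical_vuln_type, normalized_path)`. The function name, the `min_adapters` threshold, and the one-vote-per-adapter rule are assumptions; the changelog only states that consensus is aggregated on that key.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Finding = Tuple[str, str]  # (canonical_vuln_type, normalized_path)


def aggregate_consensus(
    findings_by_adapter: Dict[str, Iterable[Finding]],
    min_adapters: int,
) -> List[Tuple[Finding, int]]:
    """Count how many adapters reported each finding key and keep the agreed ones."""
    votes: Counter = Counter()
    for findings in findings_by_adapter.values():
        # set() ensures an adapter that repeats a finding still votes only once.
        votes.update(set(findings))
    agreed = [(key, count) for key, count in votes.items() if count >= min_adapters]
    # Most-agreed findings first; ties broken by key for a stable order.
    return sorted(agreed, key=lambda item: (-item[1], item[0]))


reports = {
    "a": [("sqli", "/api/users"), ("xss", "/search")],
    "b": [("sqli", "/api/users"), ("sqli", "/api/users")],
    "c": [("xss", "/search"), ("sqli", "/api/users")],
}
consensus = aggregate_consensus(reports, min_adapters=2)
```

Keying on a normalized pair rather than on free-text descriptions is what lets reports from different adapters be compared at all; the normalization itself is outside this sketch.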
CI and quality infrastructure
- Merge-gate stack: pre-merge autosync regenerates mirror docs and formatting on PR branches, a main-red guard blocks merges while main is red, a nightly drift sweep opens a PR on accumulated drift, and the suite now responds to `merge_group:` so a native merge queue tests each PR against the combined branch. Operator runbook at `docs/operations/merge-queue.md`. (#1756)
- The unit-test job is sharded across parallel runners (`scripts/run_tests.py --shard i/N`), so the per-file isolated suite scales as the test count grows. (#1845)
- Coverage ratchet: a monotonic gate keeps total coverage from regressing and nudges the per-PR diff-coverage floor up over time. (#1829)
- Static-analysis sweeper for Sonar findings, widened to MAJOR severity, turns open findings into backlog tickets keyed on the Sonar issue id. (#1763 / #1819)
- Unit tests are now hermetic: an autouse guard blocks real outbound network connections in `tests/unit/` (loopback allowed), so a network-dependent unit test fails deterministically instead of flaking in CI. (#1856)
- Opt-in telemetry foundation: consent CLI, a schema guard, and an off-by-default proof. (#1736)
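One common way to implement a `--shard i/N` split is to sort the test files and deal them out round-robin, as sketched below. This is not taken from `scripts/run_tests.py`: the helper names are hypothetical, and treating `i` as 1-based is an assumption the changelog does not confirm.

```python
from typing import List, Sequence, Tuple


def parse_shard(spec: str) -> Tuple[int, int]:
    """Parse 'i/N' into (index, total), assuming a 1-based index."""
    index_text, total_text = spec.split("/")
    index, total = int(index_text), int(total_text)
    if total < 1 or not 1 <= index <= total:
        raise ValueError(f"invalid shard spec: {spec!r}")
    return index, total


def select_shard(test_files: Sequence[str], index: int, total: int) -> List[str]:
    """Pick this shard's files from a sorted list, round-robin.

    Sorting first makes the assignment independent of discovery order, so every
    runner computes the same partition without coordinating.
    """
    ordered = sorted(test_files)
    return [path for pos, path in enumerate(ordered) if pos % total == index - 1]


files = [f"tests/unit/test_{name}.py" for name in "edcba"]
shards = [select_shard(files, i, 3) for i in (1, 2, 3)]
```

Each file lands in exactly one shard, so the union of all shards equals the full suite. Round-robin balances file counts but not runtimes; a duration-aware split would need timing data this sketch does not model.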
Reliability and correctness fixes
- Deterministic replay no longer calls the live model on a cache miss (see Orchestration above). (#1832)
- HSM lineage kind fails fast at config-load when no real adapter is on the classpath; opt-in to the stub via `BERNSTEIN_ALLOW_HSM_STUB=1`. (#1753)
- MCP OAuth discovery metadata corrected; the Tier-3 cordon now catches deletions and renames. (#1755)
- Sensitive paths are no longer logged in clear text on the always-allow tamper path; the forensic record keeps the full value with pre-hashed digests. (#1814)
- Resolved SonarQube findings: S8413 router double-mount, S125 commented-code, S3516 invariant-return, S5754 broad-except, plus a refurb sweep across the new subsystems. (#1786 / #1787 / #1788 / #1807 / #1813 / #1814 / #1815 / #1817 / #1818)
- Post-CI dispatcher declares per-job permissions explicitly; its child-secret expectations were synced with the GlitchTip forward. (#1746 / #1801)
- The tampered-signature catalog test corrupts the leading signature byte so verification fails deterministically (a trailing-byte flip could be a no-op and let the test reach the network). (#1843)
- Workflow loader rejects `interactive: true` at config-load instead of crashing mid-run (closes #1110). (#1760)
- Adapter dual-binary discovery handles the antigravity / gemini migration. (#1748 and follow-ups)
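The hermetic-replay fix listed first in this section can be sketched as a cache that raises a typed error on a miss instead of falling through to a live call. Everything named here (`ReplayCacheMiss`, `replay_key`, `ReplayCache`) is hypothetical; the sketch only mirrors what the changelog states: the key folds in provider, temperature, and max-tokens, a miss aborts in strict mode, and hits and misses are counted.

```python
import hashlib
import json
from typing import Callable, Dict, Optional


class ReplayCacheMiss(LookupError):
    """Hypothetical typed error: replay had no recorded response for this key."""


def replay_key(provider: str, model: str, prompt: str,
               temperature: float, max_tokens: int) -> str:
    """Hash every parameter that can change the model's output."""
    payload = json.dumps(
        {"provider": provider, "model": model, "prompt": prompt,
         "temperature": temperature, "max_tokens": max_tokens},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ReplayCache:
    def __init__(self, entries: Dict[str, str], strict: bool = True) -> None:
        self._entries = dict(entries)
        self.strict = strict
        self.hits = 0
        self.misses = 0

    def lookup(self, key: str, live_call: Optional[Callable[[], str]] = None) -> str:
        """Return the recorded response, or abort on a miss when strict."""
        if key in self._entries:
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        if self.strict or live_call is None:
            raise ReplayCacheMiss(key)
        # Non-strict fall-through: only reachable via an explicit opt-in.
        return live_call()
```

Folding temperature and max-tokens into the key means a replay run with different sampling settings is reported as a miss rather than silently served a response recorded under other settings.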
Internals and docs hygiene
- Dropped the unused `bernstein.benchmark.head_to_head` module from the wheel. (#1767)
- Reorganized the docs tree; internal working notes consolidated under `docs/_internal/`. (#1768)
- Front-page content, page metadata, and stale README translations refreshed. (#1769)
- Repository-wide formatting pass; AGENTS.md mirror set regenerated. (#1771 / #1776 / #1777)
Operator follow-ups
- Set `BERNSTEIN_AUTOSYNC_TOKEN` (fine-grained PAT or GitHub App token with `contents:write`) so autosync amends trigger downstream CI without manual empty commits.
- Set `vars.PR_TEXT_HYGIENE_DENYLIST` to activate the PR text-hygiene gate.
- Configure the chat drivers (`BERNSTEIN_SLACK_TOKEN` / `BERNSTEIN_DISCORD_TOKEN`) and the telemetry DSN to enable the chat surfaces and the GlitchTip ingester loop.
- Graduate the Sonar sweeper severity per `docs/operations/sonar-sweeper.md` and review the merge-queue runbook at `docs/operations/merge-queue.md`.
Security Fixes
- Image attachment provenance records SHA‑256 in audit chain, anchoring artifact lineage
- HSM lineage now fails fast when no real adapter is present; stub opt‑in via BERNSTEIN_ALLOW_HSM_STUB=1
- Sensitive paths redacted from logs with pre‑hashed digests to prevent clear‑text exposure
About chernistry/bernstein
Deterministic multi-agent orchestrator for 18 CLI coding agents (Claude Code, Codex, Cursor, Aider, Gemini CLI, OpenAI Agents SDK, and more). MCP server mode (stdio + HTTP/SSE) exposes the orchestrator to any MCP client. Git worktree isolation per agent, HMAC-chained audit trail, cost-aware model routing via contextual bandit. ~11K monthly PyPI downloads, Apache 2.0.