chernistry/bernstein

v2.3.0 Security

This release includes 4 security fixes, of interest to security teams reviewing exposed deployments.

Published 2mo AI Agents & Assistants

✓ No known CVEs patched

This release ships 4 security fixes, none with an assigned CVE.

Topics

agent-orchestrator agentic-ai ai-agents aider air-gap audit-trail

claude-code cli-tool codex-cli coding-agent deterministic-replay deterministic-scheduler hmac-audit mcp-server model-context-protocol multi-agent parallel-worktrees provenance python reproducibility

Affected surfaces

auth

Summary

AI summary

Broad release spanning tracker adapters, orchestration, security fixes, and internal quality work.

Changes in this release

Type	Severity	Summary	CVE	Confidence
Feature	Medium	10 backlog-tracker adapters now ship under single TrackerContract.	—	high
Feature	Medium	Webhook ingestion and plugin hookspec for third-party tracker plugins added.	—	high
Feature	Medium	Issue-to-PR pipeline introduced in orchestration loop.	—	high
Feature	Medium	Tracker comments used as multi-agent handoff message bus.	—	high
Feature	Medium	Review-bot acknowledgement gate blocks merge until must-address findings addressed.	—	high
Feature	Medium	Signed lineage audit log captures signed tracker state moves.	—	high
Feature	Medium	Playwright self-testing sandbox for UI/web agent runs added.	—	high
Feature	Medium	Secrets broker provides short-lived per-task tokens.	—	high
Feature	Medium	Progress-watch liveness probe via session-log growth implemented.	—	high
Feature	Medium	Scheduled upstream-signal sweep with operator rollup added.	—	high
Feature	Medium	Directory-based instance registry for multi-instance hosts introduced.	—	high
Feature	Medium	YAML eval harness implemented.	—	high
Feature	Medium	Telemetry-grounded autofix MVP added.	—	high
Feature	Medium	Long-running session memory feature included.	—	high
Feature	Medium	Run-failure classification with structured tracker writeback added.	—	high
Feature	Medium	Stacked branches and per-snapshot undo for git operations introduced.	—	high
Feature	Medium	Adapter contract check distinguishes upstream --help from real drift; treats runtime_failure as warning.	—	high
Feature	Medium	Bulk refurb auto-fix wave 1 across src/ applied.	—	low
Feature	Medium	CI dependency updates: actions/checkout v4 -> v6, actions/upload-artifact v4 -> v7, Python pin to <=3.13.	—	low
Bugfix	Medium	Split scorecard job so SARIF upload completes separately.	—	high
Bugfix	Medium	Close urllib / SHA1 / Trivy alerts.	—	high
Bugfix	Medium	Tracker_pipeline review follow-ups fixed.	—	high
Bugfix	Medium	Commit-completion module review-bot follow-ups resolved.	—	high
Bugfix	Medium	Lock aider adapter-integration job to Python 3.13.	—	high
Bugfix	Medium	Honor SARIF suppressions before Code Scanning upload.	—	high
Bugfix	Medium	Dispatch audit events outside broker lock; index tokens by value.	—	low
Bugfix	Medium	Mask credentials in logger calls.	—	low
Bugfix	Medium	Replace subprocess shell=True with list-form args.	—	low
Bugfix	Medium	Playwright runner review follow-ups, including asyncio.CancelledError propagation and unsafe-task_id rejection fixed.	—	low
Bugfix	Medium	Restore startup banner regression and add coverage in TUI.	—	low

Summary source for all rows: granite4.1:8b-q6_K@2026-05-19.

Full changelog

v2.3.0

127 commits since v2.2.0. The headline is the tracker-adapter family landing: 10 backlog-tracker adapters now ship under a single TrackerContract, plus webhook ingestion and a plugin hookspec for third-party tracker plugins. The orchestration loop also gained an issue-to-PR pipeline, a retry-with-continuation path for success-without-commit runs, and a multi-agent handoff message bus that piggybacks on tracker comments. The supporting workstreams (review-bot acknowledgement gate, signed lineage audit log, secrets broker, telemetry-grounded autofix, Playwright self-testing sandbox) close several long-standing reliability and security gaps.

Highlights

Tracker-adapter family. 10 adapters land, all conforming to the single TrackerContract (Jira Cloud + DC, GitLab Issues, Linear, Plane, Asana, ServiceNow, ClickUp, GitHub Projects v2, plus webhook ingestion). Closes the gap operators have hit when integrating non-GitHub backlogs.
Tracker plugin hookspec + registry + CLI. Third-party tracker integrations now plug in via the same pluggy spec the orchestrator uses internally (#1599).
Issue -> plan-comment -> PR pipeline. New orchestration mode that walks a tracker issue through plan synthesis, plan-comment posting for human review, and PR creation in one path (#1600).
Tracker comments as a multi-agent handoff bus. Worker agents now coordinate over tracker comments so a session can resume across CLI restarts and across operator machines (#1606).
Review-bot acknowledgement gate. CodeRabbit and Sourcery findings classified as must-address now block merge until they are addressed in a fixup commit or acknowledged in the PR body with a structured marker. Nightly sweeper + reusable shepherd workflow template ship in the same PR (#1583).
Lineage v2 - signed audit log of tracker state moves. Each tracker-side state transition is captured as a signed lineage entry, so operators can audit the full chain when a ticket loses or gains the wrong label (#1602).
Playwright-based sandbox for UI/web agent runs. A new self-testing layer drives a Playwright context against the dev server, captures screenshots / console / network errors, and hands the structured result back to an LLM judge for verdict (#1603).
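
The signed-lineage idea above can be illustrated with a small hash chain. This is a minimal sketch, not bernstein's implementation: the notes do not say how lineage v2 signs entries, so the HMAC-SHA256 chaining, the `GENESIS` placeholder, and the function names `sign_entry`, `append`, and `verify` are all assumptions made for illustration.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed placeholder standing in for "no previous entry"


def sign_entry(key: bytes, prev_sig: str, payload: dict) -> dict:
    """Build an entry whose signature covers the payload and the previous signature."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    sig = hmac.new(key, f"{prev_sig}|{body}".encode(), hashlib.sha256).hexdigest()
    return {"prev": prev_sig, "payload": payload, "sig": sig}


def append(log: list, key: bytes, payload: dict) -> None:
    """Append a state move, chaining it to the last entry in the log."""
    prev = log[-1]["sig"] if log else GENESIS
    log.append(sign_entry(key, prev, payload))


def verify(log: list, key: bytes) -> bool:
    """Recompute every signature in order; any edit or reordering breaks the chain."""
    prev = GENESIS
    for entry in log:
        expected = sign_entry(key, prev, entry["payload"])["sig"]
        if entry["prev"] != prev or not hmac.compare_digest(expected, entry["sig"]):
            return False
        prev = entry["sig"]
    return True
```

Because each signature includes its predecessor, changing one recorded label move invalidates that entry and every entry after it, which is what makes the chain auditable end to end.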

New features

| Area | Change |
|---|---|
| trackers | 10 adapters land under TrackerContract (Asana, ClickUp, GitHub Projects v2, GitLab Issues, Jira Cloud, Jira DC, Linear, Plane, ServiceNow, plus webhook ingestion) (#1560, #1570-#1577, #1601) |
| plugins | Tracker plugin hookspec + registry + bernstein trackers CLI (#1599) |
| orchestration | Issue -> plan-comment -> PR pipeline (#1600), tracker comments as handoff bus (#1606), multi-tracker federation layer (#1561), retry-with-continuation on success-without-commit (#1596) |
| security | Secrets broker for short-lived per-task tokens (#1605) |
| reliability | Progress-watch liveness probe via session-log growth (#1597) |
| sandbox | Playwright self-testing for UI/web agent runs (#1603) |
| lineage | Signed audit log of tracker state moves (#1602), content-addressed trace store + viewer (#1564), per-ticket transcript bundle (#1562) |
| devops | Scheduled upstream-signal sweep with operator rollup (#1594) |
| fleet | Directory-based instance registry for multi-instance hosts (#1592) |
| eval | YAML eval harness (#1565) |
| autofix | Telemetry-grounded autofix MVP (#1566) |
| memory | Long-running session memory (#1559) |
| observability | Run-failure classification with structured tracker writeback (#1569) |
| git | Stacked branches + per-snapshot undo (#1563) |
| quality | Review-bot acknowledgement gate + nightly sweeper + reusable shepherd template (#1583) |
| cost | Hard per-ticket cost cap with clean termination and tracker writeback (#1578) |
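
The progress-watch row describes liveness inferred from session-log growth. A minimal sketch of that idea follows, assuming the probe simply compares file sizes between polls; the class name `ProgressWatch`, the `stall_after_s` parameter, and the injectable `clock` are hypothetical and not taken from the bernstein codebase.

```python
import os
import time


class ProgressWatch:
    """Treat a session as alive while its log file keeps growing."""

    def __init__(self, log_path: str, stall_after_s: float, clock=time.monotonic):
        self.log_path = log_path
        self.stall_after_s = stall_after_s
        self._clock = clock
        self._last_size = self._size()
        self._last_growth = clock()

    def _size(self) -> int:
        # A missing log counts as size 0 rather than an error.
        try:
            return os.path.getsize(self.log_path)
        except OSError:
            return 0

    def is_alive(self) -> bool:
        size = self._size()
        now = self._clock()
        if size > self._last_size:
            # The log grew since the last poll: record the progress.
            self._last_size = size
            self._last_growth = now
        return (now - self._last_growth) < self.stall_after_s
```

Polling file size avoids parsing agent output, at the cost of treating a session that is working silently as stalled once `stall_after_s` elapses.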

Fixes

fix(adapters): refresh aider contract for the upstream --yes -> --yes-always rename; contract checker now distinguishes a broken upstream --help from real drift; CI workflow treats the new runtime-failure exit code as a warning rather than a hard fail (#1595).
fix(security): dispatch audit events outside the broker lock; index tokens by value (#1607). Split scorecard job so SARIF upload completes (#1613). Mask credentials in logger calls (#1519). Replace subprocess shell=True with list-form args (#1513). Close urllib / SHA1 / Trivy alerts (#1518).
fix(orchestration): tracker_pipeline review follow-ups (#1609); commit-completion module review-bot follow-ups (#1608).
fix(sandbox): Playwright runner review follow-ups, including asyncio.CancelledError propagation through broad except handlers and unsafe-task_id rejection (#1610).
fix(tui): restore startup banner regression + add coverage (#1568).
fix(ci): lock aider adapter-integration job to Python 3.13 (#1586); honour SARIF suppressions before Code Scanning upload (#1520); emit CI gate for paths-ignored-only PRs (#1521); restore minimum-required write permissions broken by security hardening (#1481).
fix(review): apply deferred review-bot findings batch (#1584).
fix(quality): bulk refurb auto-fix wave 1 across src/ (#1558).
fix(test): repair main-red after refurb auto-fix removed str() in _run_git (#1591).
fix(docs): sync agents-md module map for the devops sub-package (#1612).
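
The sandbox fix mentions asyncio.CancelledError propagating through broad except handlers. The sketch below shows the general pattern, not the actual Playwright runner code; `run_step` and its result shape are hypothetical. Since Python 3.8, `asyncio.CancelledError` derives from `BaseException`, so the risk sits with bare `except:` and `except BaseException` handlers.

```python
import asyncio


async def run_step(step):
    """Run one step, reporting ordinary failures but letting cancellation through."""
    try:
        return {"ok": True, "value": await step()}
    except asyncio.CancelledError:
        # Re-raise so the caller sees the task as cancelled, not as a failed step.
        raise
    except BaseException as exc:  # deliberately broad, as in a top-level runner
        return {"ok": False, "error": repr(exc)}
```

Without the first handler, the broad clause would turn a cancellation into a normal result and the task would appear to finish instead of being cancelled.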

Internal / quality

Bulk refurb auto-fix wave 2. FURB113 (repeated append -> list.extend, 259 sites), FURB107 (try/except: pass -> contextlib.suppress, 267 sites), FURB173 (dict spread -> | merge, 178 sites), FURB108 (chained == -> in {...}) - landed via libcst rewriter + ruff autofix (#1582).
Bulk refurb auto-fix wave 1. Initial refurb sweep across src/ (#1558).
CI dependency churn. actions/checkout v4 -> v6 (#1598), actions/upload-artifact v4 -> v7 (#1611), python pin to <=3.13 until adapter 3.14 compat is confirmed (#1590), aider adapter-integration job locked to Python 3.13 (#1586).
Adapter contract check. Truncated upstream --help output is no longer reported as N missing flags; it now surfaces on a dedicated runtime_failure field that the workflow treats as a warning rather than drift (part of #1595).
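
For readers unfamiliar with the refurb rules named in wave 2, here is an illustrative before/after pair. The functions `before` and `after` are made up for this example and do not come from the bernstein source; the rule-to-rewrite mapping follows the descriptions given above.

```python
import contextlib
import os


def before(items, defaults, overrides, mode, path):
    items.append("a")
    items.append("b")
    try:
        os.remove(path)
    except FileNotFoundError:
        pass
    merged = {**defaults, **overrides}
    is_write = mode == "w" or mode == "a"
    return items, merged, is_write


def after(items, defaults, overrides, mode, path):
    items.extend(("a", "b"))                      # FURB113: repeated append -> extend
    with contextlib.suppress(FileNotFoundError):  # FURB107: try/except pass -> suppress
        os.remove(path)
    merged = defaults | overrides                 # FURB173: dict spread -> | (Python 3.9+)
    is_write = mode in {"w", "a"}                 # FURB108: chained == -> in {...}
    return items, merged, is_write
```

Both functions return the same values for the same inputs; the rewrites change form only, which is why they suit a bulk automated pass.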

Upgrade notes

No manual operator action required. pip install --upgrade bernstein (or uv pip install --upgrade bernstein) brings v2.3.0 in.
Operators integrating with non-GitHub backlogs can now register their tracker via the new plugin hookspec (bernstein trackers --help for the CLI surface).
The new review-bot acknowledgement gate runs on every PR. Must-address findings need either a fixup commit (bot-ack: <id> in the commit message) or a PR-body marker ().
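
As a rough illustration of the fixup-commit path, the sketch below extracts `bot-ack: <id>` lines from commit messages and reports which must-address findings remain open. Only the `bot-ack: <id>` marker comes from the notes above; the id character set, the one-marker-per-line layout, and the helper names are assumptions, and the gate's real matching rules may differ.

```python
import re

# Assumption: an id is a short token of letters, digits, dot, underscore, or hyphen,
# and each marker sits on its own line of the commit message.
BOT_ACK = re.compile(r"^bot-ack:[ \t]*(?P<id>[A-Za-z0-9._-]+)[ \t]*$", re.MULTILINE)


def acknowledged_ids(commit_message: str) -> set[str]:
    """Return every finding id acknowledged in one commit message."""
    return {m.group("id") for m in BOT_ACK.finditer(commit_message)}


def unresolved(must_address: set[str], commit_messages: list[str]) -> set[str]:
    """Return must-address finding ids that no commit message acknowledges."""
    acked: set[str] = set()
    for msg in commit_messages:
        acked |= acknowledged_ids(msg)
    return must_address - acked
```

A gate built this way would block the merge while `unresolved(...)` is non-empty.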

Security Fixes

Dispatch audit events outside broker lock; index tokens by value (#1607)
Mask credentials in logger calls (#1519)
Replace subprocess shell=True with list-form args to avoid injection (#1513)
Close urllib / SHA1 / Trivy alerts (#1518)
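
Two of these fixes follow well-known Python patterns, sketched here for illustration only. The helper `run_listform`, the `MaskCredentials` filter, and the key names it matches are hypothetical; the actual changes in #1513 and #1519 are not reproduced here.

```python
import logging
import re
import subprocess
import sys


def run_listform(user_value: str) -> str:
    """Pass untrusted input as a single argv element so no shell ever parses it."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_value],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


# Assumption: secrets show up as key=value pairs with one of these key names.
_SECRET = re.compile(r"\b(token|password|api[_-]?key)=(\S+)", re.IGNORECASE)


class MaskCredentials(logging.Filter):
    """Redact credential values after message formatting, before handlers run."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = _SECRET.sub(r"\1=***", record.getMessage())
        record.args = ()  # the message is already formatted
        return True
```

With list-form arguments, a value such as `x; echo injected` reaches the child process as literal text. The filter can be attached with `logger.addFilter(MaskCredentials())`, so individual call sites need no changes.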

About chernistry/bernstein

Deterministic multi-agent orchestrator for 18 CLI coding agents (Claude Code, Codex, Cursor, Aider, Gemini CLI, OpenAI Agents SDK, and more). MCP server mode (stdio + HTTP/SSE) exposes the orchestrator to any MCP client. Git worktree isolation per agent, HMAC-chained audit trail, cost-aware model routing via contextual bandit. ~11K monthly PyPI downloads, Apache 2.0.

Related context

Earlier breaking changes

v3.7.1 `bernstein approve` and `bernstein reject` now enforce identifier regex `[A-Za-z0-9._-]{1,64}`.
v3.7.1 Tampered mission ledger reports as unverified rather than not-found.
v3.7.1 `mission define` now refuses phases without gate tasks.
v3.5.0 MCP client, transport, and gateway become stateless; calls carry content‑derived trace IDs in _meta.