This release includes 2 breaking changes that platform teams should review before upgrading.
✓ No known CVEs patched in this version
Summary
AI summary: audit.validation_score is now machine-verified and overwritten by the server, and the schema adds machine_verified and self_assessed_score fields.
Changes in this release
| Type | Severity | Summary | Source | Confidence | CVE |
|---|---|---|---|---|---|
| Feature | Medium | audit.validation_score now machine-verified, not self-graded | granite4.1:8b-q6_K@2026-05-20 | high | — |
| Feature | Medium | system prompt includes explicit output-format block | granite4.1:8b-q6_K@2026-05-20 | high | — |
| Feature | Medium | README quickstart mentions [llm] extra for Anthropic API integration | granite4.1:8b-q6_K@2026-05-20 | high | — |
| Feature | Medium | schema adds optional audit.machine_verified and audit.self_assessed_score fields | granite4.1:8b-q6_K@2026-05-20 | low | — |
| Feature | Low | audit.self_assessed_score field added to preserve LLM's original grade | granite4.1:30b@2026-05-20-audit | low | — |
| Feature | Low | audit.machine_verified boolean flag added to indicate server verification | granite4.1:30b@2026-05-20-audit | low | — |
| Feature | Low | OUTPUT FORMAT section forbids markdown fences and surrounding prose | granite4.1:30b@2026-05-20-audit | low | — |
| Feature | Low | OUTPUT FORMAT lists required top-level keys and provides JSON skeleton | granite4.1:30b@2026-05-20-audit | low | — |
| Bugfix | Medium | analyze overwrites self-graded audit score with machine-verified value | granite4.1:8b-q6_K@2026-05-20 | high | — |
| Refactor | Medium | system_prompt ends with dedicated OUTPUT FORMAT — STRICT section | granite4.1:8b-q6_K@2026-05-20 | low | — |
| Other | Medium | Added test_analyze_overrides_self_graded_audit_score to verify audit score rewriting | granite4.1:8b-q6_K@2026-05-20 | low | — |
Full changelog
v0.8.1 — Honest audit + tighter system prompt
v0.8.1 closes three rough edges in the product shell shipped in v0.8.0, found during an end-to-end agent run.
Fixed — audit.validation_score is now machine-verified, not self-graded
Before v0.8.1, analyze returned the memo with the LLM's own audit.validation_score and audit.validation_details. A model could write validation_score: 0.99 and the schema would accept it. From v0.8.1, the server overwrites both fields with values computed from six observable structural checks (schema valid, fact/assessment separation, unknowns acknowledged, modules match routing, watch_next present, evidence_mode within contract). The LLM's self-grade is preserved in a clearly-labeled audit.self_assessed_score field for transparency, and audit.machine_verified: true makes the rewrite explicit. audit.provenance is substantive content (per-claim basis labels) and is preserved as the model wrote it.
The score remains structural only — it is not a claim about whether the analysis is factually correct.
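The overwrite described above can be pictured with a minimal sketch. It is not the project's implementation: the function name `overwrite_audit`, the check identifiers in `CHECKS`, the shape of `validation_details`, and the equal weighting of the six checks are all assumptions made for illustration. Only the field names under `audit` come from the release notes.

```python
from copy import deepcopy

# Hypothetical identifiers paraphrasing the six structural checks in the notes.
CHECKS = (
    "schema_valid",
    "fact_assessment_separation",
    "unknowns_acknowledged",
    "modules_match_routing",
    "watch_next_present",
    "evidence_mode_within_contract",
)


def overwrite_audit(memo: dict, check_results: dict) -> dict:
    """Return a copy of memo whose audit score is recomputed from check results."""
    out = deepcopy(memo)
    audit = out.setdefault("audit", {})
    # Preserve the model's own grade before it is replaced.
    if "validation_score" in audit:
        audit["self_assessed_score"] = audit["validation_score"]
    # A check that was not reported counts as failed.
    details = {name: bool(check_results.get(name, False)) for name in CHECKS}
    # Assumption: equal weighting; the real scoring formula is not documented here.
    audit["validation_score"] = round(sum(details.values()) / len(CHECKS), 2)
    audit["validation_details"] = details
    audit["machine_verified"] = True
    # audit["provenance"] is intentionally left exactly as the model wrote it.
    return out


# Usage: a memo that grades itself 0.99 but fails one structural check.
memo = {
    "audit": {
        "validation_score": 0.99,
        "validation_details": {},
        "provenance": [{"claim": "c1", "basis": "reported"}],
    }
}
results = {name: True for name in CHECKS}
results["unknowns_acknowledged"] = False
verified = overwrite_audit(memo, results)
```

Under these assumptions five of six checks pass, so the rewritten score is lower than the self-grade, while the original 0.99 stays available in `self_assessed_score`.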
Improved — system prompt has an explicit output-format block
The assembled system_prompt now ends with a dedicated ===== OUTPUT FORMAT — STRICT ===== section that:
- explicitly forbids markdown fences and surrounding prose,
- lists the required top-level keys,
- gives a compact valid skeleton the model can pattern-match against,
- tells the model that audit.validation_score and validation_details are advisory and will be overwritten by the server.
This raises the chance that weaker host models return parseable JSON on the first attempt.
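As a rough illustration of why such a block helps, the sketch below assembles a trailing section of this kind and shows that fenced or prose-wrapped output fails a plain `json.loads`. Only the section header is taken from the release notes; the function names, the wording of the rules, and the example key list are hypothetical, since the real required keys are not listed here.

```python
import json

HEADER = "===== OUTPUT FORMAT — STRICT ====="  # header text as quoted in the notes


def build_output_format_block(required_keys: list[str]) -> str:
    """Assemble a trailing prompt section; the rule wording is illustrative."""
    skeleton = {key: None for key in required_keys}
    lines = [
        HEADER,
        "Return one JSON object only. No markdown fences, no prose before or after.",
        "Required top-level keys: " + ", ".join(required_keys),
        "Skeleton: " + json.dumps(skeleton),
        "audit.validation_score and validation_details are advisory; "
        "the server overwrites them.",
    ]
    return "\n".join(lines)


def parses_on_first_attempt(raw: str) -> bool:
    """True if the raw model output is a bare JSON object."""
    try:
        return isinstance(json.loads(raw), dict)
    except json.JSONDecodeError:
        return False


# Hypothetical key list, used only to exercise the helper.
block = build_output_format_block(["summary", "unknowns", "watch_next", "audit"])
bare_ok = parses_on_first_attempt('{"summary": "s"}')
fenced_ok = parses_on_first_attempt('```json\n{"summary": "s"}\n```')
prose_ok = parses_on_first_attempt('Here is the memo: {"summary": "s"}')
```

The fenced and prose-wrapped variants are rejected by the parser, which is the failure mode the strict section is meant to reduce.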
Added — schema fields for machine-verified audit
agenda-memo.schema.json gains two optional audit properties: machine_verified (bool) and self_assessed_score (number, 0–1). Documentation on validation_score clarifies that it is structural only and, when machine_verified is true, was computed by the server.
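The two new properties might look roughly like the fragment below, written as a Python dict together with a hand-rolled check so that the sketch needs no third-party validator. The fragment is an assumption about how the constraints could be expressed (boolean, and number between 0 and 1 inclusive); the actual contents of agenda-memo.schema.json are not reproduced here, and `audit_additions_valid` is a hypothetical helper.

```python
# Assumed shape of the two optional audit properties (illustrative only).
AUDIT_ADDITIONS = {
    "machine_verified": {"type": "boolean"},
    "self_assessed_score": {"type": "number", "minimum": 0, "maximum": 1},
}


def audit_additions_valid(audit: dict) -> bool:
    """Check only the two new optional fields; absent fields pass."""
    if "machine_verified" in audit and not isinstance(audit["machine_verified"], bool):
        return False
    if "self_assessed_score" in audit:
        score = audit["self_assessed_score"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(score, bool) or not isinstance(score, (int, float)):
            return False
        if not AUDIT_ADDITIONS["self_assessed_score"]["minimum"] <= score <= 1:
            return False
    return True


# Usage: both fields are optional, so a pre-v0.8.1 audit block still passes.
legacy_ok = audit_additions_valid({"validation_score": 0.5})
new_ok = audit_additions_valid({"machine_verified": True, "self_assessed_score": 0.99})
out_of_range = audit_additions_valid({"self_assessed_score": 1.2})
```

Because both properties are optional, memos produced before v0.8.1 should remain valid against a schema extended this way.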
Added — README quickstart mentions the [llm] extra
The Quickstart now explains how to install with pip install "agenda-intelligence-md[llm]" and set ANTHROPIC_API_KEY to let analyze call the Anthropic API directly. Without the extra, the tool still returns a usable system_prompt for the host model to complete.
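The two modes described above could be distinguished with a check along these lines. This is a guess at the general pattern for an optional extra, not the tool's actual logic: `can_call_anthropic_directly` is a hypothetical helper, and the only facts assumed from the notes are the `anthropic` package name and the `ANTHROPIC_API_KEY` variable.

```python
import importlib.util
import os


def can_call_anthropic_directly(env=None) -> bool:
    """True if the optional SDK is importable and an API key is set."""
    env = os.environ if env is None else env
    # find_spec returns None when the package is not installed.
    sdk_installed = importlib.util.find_spec("anthropic") is not None
    return sdk_installed and bool(env.get("ANTHROPIC_API_KEY"))


# Usage: without a key, the fallback path (return system_prompt only) applies
# regardless of whether the [llm] extra is installed.
direct_without_key = can_call_anthropic_directly({})
```

Passing the environment as a parameter keeps the check easy to exercise in tests without mutating `os.environ`.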
Tests
tests/test_product_shell.py adds test_analyze_overrides_self_graded_audit_score, which feeds the analyze pipeline a mocked LLM response with validation_score: 0.99 and missing unknowns, and asserts that the server rewrites the score downward, marks machine_verified: true, preserves self_assessed_score: 0.99, flags unknowns_acknowledged as failed, and keeps the provenance entries intact.
Unchanged
- 16 MCP tools, request/memo schemas, geography routing, signal vendoring — all behave as in v0.8.0.
- Live source retrieval is still not implemented.
- No new hard dependencies; the anthropic SDK is still gated behind the [llm] extra.
Breaking Changes
- `audit.validation_score` and `audit.validation_details` are now overwritten by server‑computed values; original LLM self‑grades moved to `audit.self_assessed_score`
- Schema updated: added optional `audit.machine_verified` (bool) and `audit.self_assessed_score` (number 0–1)
Earlier breaking changes
- v0.8.0 MCP tool count increased from 11 to 16, adding five new tools.