Agenda Intel MD

v0.8.1 Breaking

This release includes two breaking changes that platform teams should review before upgrading.

Published 2mo MCP Developer Tools

✓ No known CVEs patched

Topics

ai-agents claim-grounding cli deterministic evidence-packet evidence-validation human-review json-schema llm-evaluation mcp-server python source-grounding

Affected surfaces

auth

Summary

AI summary

audit.validation_score is now computed and overwritten by the server, and the schema adds the optional machine_verified and self_assessed_score fields.

Changes in this release

Type	Severity	Summary	Source	Confidence	CVE
Feature	Medium	audit.validation_score now machine-verified, not self-graded	granite4.1:8b-q6_K@2026-05-20	high	—
Feature	Medium	system prompt includes explicit output-format block	granite4.1:8b-q6_K@2026-05-20	high	—
Feature	Medium	README quickstart mentions [llm] extra for Anthropic API integration	granite4.1:8b-q6_K@2026-05-20	high	—
Feature	Medium	schema adds optional audit.machine_verified and audit.self_assessed_score fields	granite4.1:8b-q6_K@2026-05-20	low	—
Feature	Low	audit.self_assessed_score field added to preserve LLM's original grade	granite4.1:30b@2026-05-20-audit	low	—
Feature	Low	audit.machine_verified boolean flag added to indicate server verification	granite4.1:30b@2026-05-20-audit	low	—
Feature	Low	OUTPUT FORMAT section forbids markdown fences and surrounding prose	granite4.1:30b@2026-05-20-audit	low	—
Feature	Low	OUTPUT FORMAT lists required top‑level keys and provides JSON skeleton	granite4.1:30b@2026-05-20-audit	low	—
Bugfix	Medium	analyze overwrites self-graded audit score with machine-verified value	granite4.1:8b-q6_K@2026-05-20	high	—
Refactor	Medium	system_prompt ends with dedicated OUTPUT FORMAT — STRICT section	granite4.1:8b-q6_K@2026-05-20	low	—
Other	Medium	Added test_analyze_overrides_self_graded_audit_score to verify audit score rewriting	granite4.1:8b-q6_K@2026-05-20	low	—

Full changelog

v0.8.1 — Honest audit + tighter system prompt

v0.8.1 closes three rough edges in the product shell shipped in v0.8.0, found during an end-to-end agent run.

Fixed — `audit.validation_score` is now machine-verified, not self-graded

Before v0.8.1, analyze returned the memo with the LLM's own audit.validation_score and audit.validation_details, so a model could write validation_score: 0.99 and the schema would accept it. From v0.8.1, the server overwrites both fields with values computed from six observable structural checks: schema valid, fact/assessment separation, unknowns acknowledged, modules match routing, watch_next present, and evidence_mode within contract. The LLM's self-grade is preserved in a clearly labeled audit.self_assessed_score field for transparency, and audit.machine_verified: true makes the rewrite explicit. audit.provenance is substantive content (per-claim basis labels) and is preserved as the model wrote it.

The score remains structural only — it is not a claim about whether the analysis is factually correct.
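
The rewrite described above can be sketched roughly as follows. This is an illustration, not the project's implementation: the memo key names (facts, assessments, unknowns, modules, routed_modules), the ALLOWED_EVIDENCE_MODES values, the simplified check predicates, and the "fraction of checks passed" scoring rule are all assumptions. Only the six check topics and the audit field names come from the changelog.

```python
from typing import Callable

# Hypothetical contract values; the real evidence modes are not listed here.
ALLOWED_EVIDENCE_MODES = {"packet", "host_supplied"}

# Simplified stand-ins for the six structural checks named in the changelog.
# The memo keys they inspect are assumptions made for this sketch.
CHECKS: dict[str, Callable[[dict], bool]] = {
    "schema_valid": lambda m: isinstance(m.get("audit"), dict),
    "fact_assessment_separated": lambda m: bool(m.get("facts")) and bool(m.get("assessments")),
    "unknowns_acknowledged": lambda m: bool(m.get("unknowns")),
    "modules_match_routing": lambda m: "modules" in m and m["modules"] == m.get("routed_modules"),
    "watch_next_present": lambda m: bool(m.get("watch_next")),
    "evidence_mode_within_contract": lambda m: m.get("evidence_mode") in ALLOWED_EVIDENCE_MODES,
}


def machine_verify(memo: dict) -> dict:
    """Return a copy of the memo whose audit block is recomputed by the server."""
    raw_audit = memo.get("audit")
    audit = dict(raw_audit) if isinstance(raw_audit, dict) else {}

    # Run every structural check against the memo as the model wrote it.
    details = {name: check(memo) for name, check in CHECKS.items()}

    # Keep the model's own grade under a separate, clearly labeled key.
    if "validation_score" in audit:
        audit["self_assessed_score"] = audit["validation_score"]

    # Assumed scoring rule: fraction of checks that passed.
    audit["validation_score"] = round(sum(details.values()) / len(details), 2)
    audit["validation_details"] = details
    audit["machine_verified"] = True
    # audit["provenance"], if present, is carried over untouched by the copy above.
    return {**memo, "audit": audit}
```

With a memo that self-grades at 0.99 but omits unknowns, this sketch lowers the score, records the failed check, and keeps both the original grade and the provenance entries.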

Improved — system prompt has an explicit output-format block

The assembled system_prompt now ends with a dedicated ===== OUTPUT FORMAT — STRICT ===== section that:

explicitly forbids markdown fences and surrounding prose,
lists the required top-level keys,
gives a compact valid skeleton the model can pattern-match against,
tells the model that audit.validation_score and validation_details are advisory and will be overwritten by the server.

This raises the chance that weaker host models return parseable JSON on the first attempt.
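
One way to picture the appended block is the sketch below. Only the header text and the four points in the list come from the changelog. The function names, the wording of each instruction line, the skeleton contents, and the example required keys are hypothetical.

```python
import json

# Header text as quoted in the changelog.
OUTPUT_FORMAT_HEADER = "===== OUTPUT FORMAT — STRICT ====="


def build_output_format_block(required_keys: list[str]) -> str:
    """Build a strict output-format section (wording is illustrative only)."""
    # Compact skeleton the model can pattern-match against.
    skeleton = {key: None for key in required_keys}
    skeleton["audit"] = {"validation_score": 0.0, "validation_details": {}}
    lines = [
        OUTPUT_FORMAT_HEADER,
        "Return a single JSON object. Do not use markdown fences or add prose around it.",
        "Required top-level keys: " + ", ".join(required_keys),
        "Skeleton: " + json.dumps(skeleton, separators=(",", ":")),
        "audit.validation_score and audit.validation_details are advisory; "
        "the server overwrites them.",
    ]
    return "\n".join(lines)


def assemble_system_prompt(base_prompt: str, required_keys: list[str]) -> str:
    """Append the output-format block so it is the last thing the model reads."""
    return base_prompt.rstrip() + "\n\n" + build_output_format_block(required_keys)
```

Placing the block last keeps the formatting rules closest to the point where the model starts generating, which is presumably why the changelog stresses that the prompt ends with it.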

Added — schema fields for machine-verified audit

agenda-memo.schema.json gains two optional audit properties: machine_verified (bool) and self_assessed_score (number, 0–1). Documentation on validation_score clarifies that it is structural only and, when machine_verified is true, was computed by the server.
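
As a rough picture of the two new properties, the fragment below expresses them as a Python dict in JSON Schema style, with a small hand-written checker. The exact keywords used in agenda-memo.schema.json are not shown in this changelog, so the fragment and the check_audit_additions helper are assumptions based only on the stated types and range.

```python
# Assumed shape of the two optional audit properties (types and range per the changelog).
AUDIT_SCHEMA_ADDITIONS = {
    "machine_verified": {"type": "boolean"},
    "self_assessed_score": {"type": "number", "minimum": 0, "maximum": 1},
}


def check_audit_additions(audit: dict) -> list[str]:
    """Return a list of problems with the two optional fields; empty means OK."""
    errors = []
    if "machine_verified" in audit and not isinstance(audit["machine_verified"], bool):
        errors.append("machine_verified must be a boolean")
    if "self_assessed_score" in audit:
        score = audit["self_assessed_score"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(score, bool) or not isinstance(score, (int, float)):
            errors.append("self_assessed_score must be a number")
        elif not 0 <= score <= 1:
            errors.append("self_assessed_score must be between 0 and 1")
    return errors
```

Because both fields are optional, an audit block written before v0.8.1 that has neither of them still passes this check.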

Added — README quickstart mentions the `[llm]` extra

The Quickstart now explains how to install with pip install "agenda-intelligence-md[llm]" and set ANTHROPIC_API_KEY to let analyze call the Anthropic API directly. Without the extra, the tool still returns a usable system_prompt for the host model to complete.
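
A host application could check up front which of the two modes it is in. The helper below is a hypothetical sketch, not part of the tool: it only tests whether the anthropic package is importable and whether ANTHROPIC_API_KEY is set. How analyze itself makes this decision is not described in the changelog.

```python
import importlib.util
import os
from typing import Mapping, Optional


def direct_llm_call_available(env: Optional[Mapping[str, str]] = None) -> bool:
    """True if the anthropic SDK is installed and an API key is configured."""
    env = os.environ if env is None else env
    # The [llm] extra is what pulls in the anthropic SDK.
    has_sdk = importlib.util.find_spec("anthropic") is not None
    has_key = bool(env.get("ANTHROPIC_API_KEY"))
    return has_sdk and has_key


if __name__ == "__main__":
    if direct_llm_call_available():
        print("analyze can call the Anthropic API directly")
    else:
        print("analyze will return a system_prompt for the host model to complete")
```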

Tests

tests/test_product_shell.py adds test_analyze_overrides_self_graded_audit_score, which feeds the analyze pipeline a mocked LLM response with validation_score: 0.99 and missing unknowns, and asserts that the server rewrites the score downward, marks machine_verified: true, preserves self_assessed_score: 0.99, flags unknowns_acknowledged as failed, and keeps the provenance entries intact.

Unchanged

16 MCP tools, request/memo schemas, geography routing, signal vendoring — all behave as in v0.8.0.
Live source retrieval is still not implemented.
No new hard dependencies; the anthropic SDK is still gated behind the [llm] extra.

Breaking Changes

`audit.validation_score` and `audit.validation_details` are now overwritten by server‑computed values; original LLM self‑grades moved to `audit.self_assessed_score`
Schema updated: added optional `audit.machine_verified` (bool) and `audit.self_assessed_score` (number 0–1)
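
For downstream consumers, the practical effect is that validation_score changes meaning depending on machine_verified. A minimal sketch of a reader that handles both old and new memos is shown below. The read_audit function and its output keys are hypothetical names chosen for illustration.

```python
def read_audit(audit: dict) -> dict:
    """Normalize an audit block from either a pre-0.8.1 or a 0.8.1+ memo."""
    verified = audit.get("machine_verified") is True
    if verified:
        # v0.8.1+: validation_score was computed by the server,
        # and the model's own grade lives in self_assessed_score.
        structural = audit.get("validation_score")
        self_reported = audit.get("self_assessed_score")
    else:
        # Older memos: validation_score is the model's self-grade.
        structural = None
        self_reported = audit.get("validation_score")
    return {
        "machine_verified": verified,
        "structural_score": structural,
        "self_reported_score": self_reported,
    }
```

Code that previously thresholded on audit.validation_score may see lower values after upgrading, since the server-computed score replaces what was often an optimistic self-grade.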

Earlier breaking changes

v0.8.0 MCP tool count increased from 11 to 16, adding five new tools.