agentic-context-engine releases

v0.12.0 Breaking risk 2mo

⚠ Upgrade required

Update any code referencing `record_observation` to use `think`
Migrate Skillbook v1 usage to the new v2 schema; legacy aliases are no longer available
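
One way to find call sites that still use the old name before upgrading is a small source scan. This is a minimal sketch using only the Python standard library; the helper name `find_old_references` is hypothetical and is not part of agentic-context-engine.

```python
import re
from pathlib import Path

# Match the old name as a whole identifier, not as part of a longer name.
OLD_NAME = re.compile(r"\brecord_observation\b")


def find_old_references(root: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, stripped line) for each leftover reference."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if OLD_NAME.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Calling `find_old_references(".")` from a project root lists each line that still needs renaming to `think`. The scan only covers `.py` files, so prompts or configs that mention the old name would need a separate check.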

Breaking changes

Skillbook v1 legacy aliases removed — only Skillbook v2 schema remains
`record_observation` renamed to `think`

Notable features

RecursiveAgent core abstraction extracted for generic recursive PydanticAI agent with sandbox and microcompaction
RR collapsed into a single RRStep, implementing true recursive loop
Agentic SkillManager initial tool‑calling loop with atomic mutation tools (add_skill, update_skill, remove_skill, tag_skill) and read‑only tools
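
To make the idea of atomic mutation tools concrete, here is a minimal in-memory sketch. Only the tool names come from these notes (`search_skills` and `read_skill` are the read-only tools named in the full changelog). The `ToySkillbook` class, the `Skill` fields, every signature, and the tag-counting behavior are assumptions for illustration, not the library's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    skill_id: str
    insight: str
    tags: dict[str, int] = field(default_factory=dict)  # tag name -> count


class ToySkillbook:
    """Hypothetical stand-in: each mutation tool performs exactly one change."""

    def __init__(self) -> None:
        self._skills: dict[str, Skill] = {}
        self._next_id = 1

    # Mutation tools: one small, independent change per call.
    def add_skill(self, insight: str) -> str:
        skill_id = f"skill-{self._next_id}"
        self._next_id += 1
        self._skills[skill_id] = Skill(skill_id, insight)
        return skill_id

    def update_skill(self, skill_id: str, insight: str) -> None:
        self._skills[skill_id].insight = insight  # KeyError if the id is unknown

    def remove_skill(self, skill_id: str) -> None:
        del self._skills[skill_id]

    def tag_skill(self, skill_id: str, tag: str) -> None:
        tags = self._skills[skill_id].tags
        tags[tag] = tags.get(tag, 0) + 1

    # Read-only tools: never modify state.
    def read_skill(self, skill_id: str) -> Skill:
        return self._skills[skill_id]

    def search_skills(self, query: str) -> list[str]:
        # Plain substring match, purely for illustration.
        needle = query.lower()
        return [s.skill_id for s in self._skills.values() if needle in s.insight.lower()]
```

Keeping each tool to a single mutation means a tool-calling loop can be audited or replayed one call at a time, which is presumably part of the appeal of the design.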

Full changelog

This release merges two release lines that had not yet shipped to PyPI: the 0.11.0 architectural rewrite and the 0.12.0 SkillManager hardening. No separate v0.11.0 tag was cut, because v0.12.0 is a superset of it.

0.11.0 — Architectural rewrite

RecursiveAgent core abstraction extracted from RR (ace/core/recursive_agent.py). Generic recursive PydanticAI agent with sandbox, microcompaction, default tool set, depth-aware sub-agent registration.
RR collapsed into a single RRStep. Orchestrator/worker split, batch machinery, and AttachInsightSourcesStep removed. RR is now a true recursive loop.
Skillbook v2 — full schema rewrite, section-grouped storage (context / harness), richer InsightSource provenance, BM25-backed retrieval (rank-bm25 runtime dep). Skillbook.as_prompt() now returns markdown; python-toon dropped.
Agentic SkillManager (first cut) — tool-calling loop (ace/implementations/sm_tools.py) with atomic mutation tools (add_skill, update_skill, remove_skill, tag_skill) and read-only tools (search_skills, read_skill).
Reflector skillbook tools — Reflector can introspect / propose updates from inside the recursive loop.
Anthropic prompt caching enabled by default for RR; cache_read_tokens / cache_write_tokens forwarded in run metadata.
Logfire spans around recursive agent sessions.
Online / offline mode in the ACE runner.
record_observation renamed to think.
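
For readers unfamiliar with the BM25 ranking that Skillbook v2 retrieval is built on, the sketch below scores a few skill texts against a query. It is a self-contained Okapi BM25 written with the standard library so that it runs without `rank-bm25`. The function names, the whitespace tokenizer, the `k1`/`b` defaults, and the non-negative IDF variant are all assumptions, and the library's own scoring may differ in detail.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Return one Okapi BM25 score per document for the given query."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Number of documents containing each term.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Non-negative IDF variant; rarer terms weigh more.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Term frequency saturates via k1; b normalizes by document length.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

A document that never mentions a query term scores zero, and repeated mentions raise the score with diminishing returns, which is the behavior that makes BM25 a reasonable default for short skill descriptions.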

0.12.0 — SM hardening

Cross-trace generalization gate (four-criterion: ≥3 instances across ≥2 domains, named slot, no API-specific params in action, verifiable runtime trigger). Backed by skill_generalization.md (14 cited sources).
Action-equivalence rule — splits on action, not trigger surface.
Atomicity rule for insight — one trigger + one action; explicit good/bad shape examples.
ICL-grounded insight format drawn from icl_skill_formatting.md: 15-50 word cap, imperative voice, positive framing default.
Evidence-only tagging — SM no longer iterates injected_skill_ids; tags only skills the reflection actually implicates.
Broaden-via-comparison for UPDATE — same root cause in different niches → broaden issue, don't duplicate.
Prompt caching for SM via CachePoint(ttl="5m"), mirroring RR.
Hard removal cap removed — harmful_count >= 3 no longer auto-REMOVES skills.
update_skills signature: source is optional; SkillbookView dropped from parameters.
Skillbook v1 legacy aliases removed — v2 is the only schema.
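
The four-criterion generalization gate and the 15-50 word cap listed above can be sketched as plain checks. In this sketch the `Candidate` and `Instance` types, their field names, and the idea that the three qualitative criteria arrive as precomputed booleans are all assumptions. The notes do not say how the SkillManager actually evaluates them.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    trace_id: str
    domain: str


@dataclass
class Candidate:
    insight: str
    instances: list[Instance]
    has_named_slot: bool                    # assumed to be judged upstream
    action_has_api_specific_params: bool    # assumed to be judged upstream
    trigger_is_verifiable_at_runtime: bool  # assumed to be judged upstream


def gate_failures(c: Candidate) -> list[str]:
    """Return the criteria a candidate fails; an empty list means it passes."""
    failures = []
    domains = {i.domain for i in c.instances}
    if len(c.instances) < 3 or len(domains) < 2:
        failures.append("needs >=3 instances across >=2 domains")
    if not c.has_named_slot:
        failures.append("needs a named slot")
    if c.action_has_api_specific_params:
        failures.append("action must not contain API-specific params")
    if not c.trigger_is_verifiable_at_runtime:
        failures.append("trigger must be verifiable at runtime")
    return failures


def within_word_cap(insight: str, low: int = 15, high: int = 50) -> bool:
    # Word count by whitespace split, bounds inclusive (an assumption).
    return low <= len(insight.split()) <= high
```

Returning the list of failed criteria rather than a bare boolean makes it easier to see why a candidate skill was rejected.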

End-to-end retail result (Haiku 4.5)

| Metric | Value |
|---|---|
| Baseline pass@1 | 45.0% |
| With learned skillbook | 67.5% |
| Δ pass@1 | +22.5 pp (12 improved, 3 regressed) |
| Skillbook size | 35 skills |
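
As a quick consistency check on the table, the arithmetic can be worked through in a few lines. The implied task count is an inference, not a figure given in these notes. It assumes one trial per task and that tasks which neither improved nor regressed kept their outcome.

```python
baseline_pct = 45.0
with_skillbook_pct = 67.5
improved, regressed = 12, 3

# Difference in percentage points between the two pass@1 values.
delta_pp = with_skillbook_pct - baseline_pct

# Net number of tasks that flipped from fail to pass.
net_tasks = improved - regressed

# If net_tasks accounts for the whole delta, the task count follows directly.
implied_tasks = net_tasks * 100 / delta_pp

# Pass counts implied by that task count.
baseline_passes = baseline_pct * implied_tasks / 100
skillbook_passes = with_skillbook_pct * implied_tasks / 100

print(delta_pp, net_tasks, implied_tasks, baseline_passes, skillbook_passes)
```

Under those assumptions the numbers line up with a 40-task set: 18 passes at baseline and 27 with the learned skillbook, a net gain of 9 tasks.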

Tau-bench fix

Both the `run_task` and `run_tasks` call sites in ace-eval/src/ace_eval/e2e/benchmarks/tau_bench.py now pass `evaluation_type=ALL_WITH_NL_ASSERTIONS`. Retail and any future benchmark with `NL_ASSERTION` in `reward_basis` now produce real reward numbers instead of crashing in reward computation.

See CHANGELOG.md for full details.
