Skip to content

Release history

agentic-context-engine releases

All releases

23 shown

v0.12.0 Breaking risk
⚠ Upgrade required
  • Update any code referencing `record_observation` to use `think`
  • Migrate Skillbook v1 usage to the new v2 schema; legacy aliases are no longer available
Breaking changes
  • Skillbook v1 legacy aliases removed — only Skillbook v2 schema remains
  • `record_observation` renamed to `think`
Notable features
  • RecursiveAgent core abstraction extracted for generic recursive PydanticAI agent with sandbox and microcompaction
  • RR collapsed into a single RRStep, implementing true recursive loop
  • Agentic SkillManager initial tool‑calling loop with atomic mutation tools (add_skill, update_skill, remove_skill, tag_skill) and read‑only tools
Full changelog

This is the merger of two release lines that had not yet shipped to PyPI: the 0.11.0 architectural rewrite and the 0.12.0 SkillManager hardening. Skipping a separate v0.11.0 tag — v0.12.0 supersets it.

0.11.0 — Architectural rewrite

  • RecursiveAgent core abstraction extracted from RR (ace/core/recursive_agent.py). Generic recursive PydanticAI agent with sandbox, microcompaction, default tool set, depth-aware sub-agent registration.
  • RR collapsed into a single RRStep. Orchestrator/worker split, batch machinery, and AttachInsightSourcesStep removed. RR is now a true recursive loop.
  • Skillbook v2 — full schema rewrite, section-grouped storage (context / harness), richer InsightSource provenance, BM25-backed retrieval (rank-bm25 runtime dep). Skillbook.as_prompt() now returns markdown; python-toon dropped.
  • Agentic SkillManager (first cut) — tool-calling loop (ace/implementations/sm_tools.py) with atomic mutation tools (add_skill, update_skill, remove_skill, tag_skill) and read-only tools (search_skills, read_skill).
  • Reflector skillbook tools — Reflector can introspect / propose updates from inside the recursive loop.
  • Anthropic prompt caching enabled by default for RR; cache_read_tokens / cache_write_tokens forwarded in run metadata.
  • Logfire spans around recursive agent sessions.
  • Online / offline mode in the ACE runner.
  • record_observation renamed to think.

0.12.0 — SM hardening

  • Cross-trace generalization gate (four-criterion: ≥3 instances across ≥2 domains, named slot, no API-specific params in action, verifiable runtime trigger). Backed by skill_generalization.md (14 cited sources).
  • Action-equivalence rule — splits on action, not trigger surface.
  • Atomicity rule for insight — one trigger + one action; explicit good/bad shape examples.
  • ICL-grounded insight format drawn from icl_skill_formatting.md: 15-50 word cap, imperative voice, positive framing default.
  • Evidence-only tagging — SM no longer iterates injected_skill_ids; tags only skills the reflection actually implicates.
  • Broaden-via-comparison for UPDATE — same root cause in different niches → broaden issue, don't duplicate.
  • Prompt caching for SM via CachePoint(ttl="5m"), mirroring RR.
  • Hard removal cap removedharmful_count >= 3 no longer auto-REMOVES skills.
  • update_skills signature: source is optional; SkillbookView dropped from parameters.
  • Skillbook v1 legacy aliases removed — v2 is the only schema.

End-to-end retail result (Haiku 4.5)

| Metric | Value |
|---|---|
| Baseline pass@1 | 45.0% |
| With learned skillbook | 67.5% |
| Δ pass@1 | +22.5 pp (12 improved, 3 regressed) |
| Skillbook size | 35 skills |

Tau-bench fix

evaluation_type=ALL_WITH_NL_ASSERTIONS on both run_task and run_tasks call sites in ace-eval/src/ace_eval/e2e/benchmarks/tau_bench.py. Retail and any future benchmark with NL_ASSERTION in reward_basis now produces real reward numbers instead of crashing in reward computation.

See CHANGELOG.md for full details.

openclaw-tracing-v0.1.1 New feature
Notable features
  • Kayba-tracing plugin registers as 'kayba-tracing' and emits structured trace per agent turn including user message, full LLM input/output, tool calls, sessionId, userId, request/response previews, and folder tagging.
  • Capture knobs: captureSystemPrompt (default true), captureHistory (delta|full|none, default delta), maxAttributeBytes (default 64 KiB).
  • Recursively unwraps OpenClaw pre-stringified content fields for clean JSON in trace dashboard.
Full changelog

Initial release of the OpenClaw tracing plugin.

Registers as an OpenClaw plugin (`kayba-tracing`) and ships one structured Kayba trace per agent turn — user message, full LLM input/output with thinking, tool calls, final reply — with sessionId, userId, request/response previews, and folder tagging. No client SDK changes required.

Capture knobs (`plugins.entries.kayba-tracing.config`):

  • `captureSystemPrompt`: capture system prompt once per session (default: true)
  • `captureHistory`: `delta` (default, per-session cursor — eliminates N² bloat) | `full` | `none`
  • `maxAttributeBytes`: per-attribute truncation (default: 64 KiB)

Recursively unwraps OpenClaw's pre-stringified `content` fields so the trace dashboard renders clean JSON instead of multi-level escaped strings.

See `sdk/openclaw/README.md` for setup.

kayba-tracing-ts-v0.10.0 New feature
⚠ Upgrade required
  • Internal API rename: `injectFolderTag` → `injectKaybaContext`; folder, session, and user tags are now emitted via a single `updateCurrentTrace` call
  • `package-lock.json` corrected to remove stale `@kayba/[email protected]` lockfile name and stray `openai`/`dotenv` dependencies
Notable features
  • `kayba.setSession(id) / getSession()` auto‑injects `mlflow.trace.session` metadata on all traces
  • `kayba.setUser(id) / getUser()` auto‑injects `mlflow.trace.user` metadata for per‑user attribution
  • `kayba.updateTrace({ tags, metadata })` provides an escape hatch to attach per‑call values to the active trace
Full changelog

TypeScript SDK release. (No changes to ace-framework or kayba-tracing Python in this release.)

Added

  • kayba.setSession(id) / getSession() — auto-injects mlflow.trace.session metadata on every subsequent trace, so multiple traces produced by the same agent run can be filtered together in the dashboard.
  • kayba.setUser(id) / getUser() — auto-injects mlflow.trace.user metadata for per-user attribution.
  • kayba.updateTrace({ tags, metadata }) — escape hatch for attaching per-call values (e.g. tool call ids) to the active trace.

Changed

  • Internal injectFolderTag renamed to injectKaybaContext; folder, session, and user are now emitted in a single updateCurrentTrace call.
  • package-lock.json corrected (was stuck on a stale @kayba/[email protected] lockfile name with stray openai/dotenv deps).

Back-compat

Additive only — existing callers using configure, trace, startSpan, setFolder are unchanged.

v0.10.0 New feature
Notable features
  • RecursiveConfig.usage_callback fires once per pydantic-ai model request via ace.rr.MeteredModel
  • RRStep, create_rr_agent, create_sub_agent, and RecursiveConfig.subagent_model now accept a pre-built pydantic_ai.models.Model instance in addition to model-id strings
  • create_sub_agent threads an explicit ModelSettings parameter into its PydanticAgent constructor
Full changelog

Added

  • Usage metering hookRecursiveConfig.usage_callback: (RequestUsage, model_id) -> None fires once per pydantic-ai model request (orchestrator turns, sub-agent runs, tool-call follow-ups). Implemented via ace.rr.MeteredModel, a pydantic_ai.models.wrapper.WrapperModel subclass, so metering lives at the framework's own model boundary — one firing site, no per-call-site plumbing. Callback exceptions are caught and logged so metering never crashes the pipeline.
  • Pre-built model instance supportRRStep, create_rr_agent, create_sub_agent, and RecursiveConfig.subagent_model now accept either a model-id string or a pre-built pydantic_ai.models.Model instance. Enables callers that need a custom provider (e.g. a Bedrock model carrying STS-assumed credentials) to inject a fully-configured model rather than resolving from a string.
  • Sub-agent model_settingscreate_sub_agent now threads an explicit ModelSettings parameter into its PydanticAgent constructor.

Back-compat

Existing RRStep(model="...") callers are unchanged. The widened type signatures are additive.

v0.9.7 Maintenance

Routine maintenance release for agentic-context-engine.

Changelog

TypeScript SDK

v0.9.5 New feature
Notable features
  • TypeScript tracing SDK @kayba_ai/tracing for Node.js agents
  • Standalone Python tracing package kayba-tracing installable independently
Full changelog

What's new

  • TypeScript tracing SDK (@kayba_ai/tracing) — instrument Node.js agents and send traces to Kayba, mirroring the Python SDK API
  • Standalone Python tracing package (kayba-tracing) — can be installed independently without the full ace-framework
  • ace.tracing continues to work as before (re-exports from kayba-tracing)
  • CI publishes all three packages (ace-framework, kayba-tracing, @kayba_ai/tracing) on release
v0.9.4 Feature
Notable features
  • Added `ace.tracing` module that wraps MLflow tracing with Kayba-native configuration, folder organization, and input sanitization (install via `pip install ace-framework[tracing]`)
Full changelog
  • Kayba tracing SDKace.tracing module wraps MLflow tracing with Kayba-native configuration, folder organization, and input sanitization (pip install ace-framework[tracing])

Full Changelog: https://github.com/kayba-ai/agentic-context-engine/compare/v0.9.3...v0.9.4

v0.9.3 Breaking risk
Breaking changes
  • Removed tag counters (helpful/harmful/neutral) from the Skill model.
  • Removed TagStep from the Skill processing pipeline.
Full changelog
  • Structured design docs — split ACE_DESIGN.md into architecture, reference, and decisions docs under docs/design/
  • Simplified Skill model — removed unused tag counters (helpful/harmful/neutral) and TagStep from the pipeline
  • Cleaner InsightSource provenance — restored error_identification and learning_text fields

Full Changelog: https://github.com/kayba-ai/agentic-context-engine/compare/v0.9.2...v0.9.3

v0.9.2 New feature
Notable features
  • Insight source provenance model (`InsightSource`) and automatic enrichment step `AttachInsightSourcesStep`
  • Claude SDK integration via `ClaudeSDKStep` for running Claude Code sub‑agents in ACE pipelines
  • Recursive Reflector can delegate to code‑execution sub‑agents at runtime
Full changelog

What's changed

Added

  • Insight source provenanceInsightSource typed model captures the origin of each skillbook update (trace ID, sample question, epoch/step, reflection summary, integration metadata); AttachInsightSourcesStep automatically enriches UpdateBatch operations with provenance and is wired into the default learning tail
  • Claude SDK stepClaudeSDKStep integration for running Claude Code sub-agents from within ACE pipelines
  • RR sub-agent code execution — Recursive Reflector can now delegate to code-execution sub-agents at runtime
  • RR raw trace batch helpersbuild_raw_trace_batches and related runtime utilities for feeding raw traces directly into the RR pipeline

Fixed

  • Logfire scrubbing — added scrubbing callback to stop Logfire over-redacting trace content (reasoning, answers, messages now visible in Logfire UI)
  • RR combined-batch normalization — fixed ordering/deduplication of combined task batches in multi-sample runs

Docs

  • Logfire query API guide clarifications
  • MCP client setup guide and compatibility tests
  • Design docs updated to reflect insight source provenance model

Full changelog: https://github.com/kayba-ai/agentic-context-engine/blob/main/CHANGELOG.md

v0.9.1 Bug fix

Fixed CLI packaging to include .md data files enabling `kayba setup` and skill install on pip/uv‑installed packages.

Full changelog

Fixed

  • CLI packaging — include .md data files in wheel so kayba setup and skill install work on pip/uv-installed packages

Full Changelog: https://github.com/kayba-ai/agentic-context-engine/compare/v0.9.0...v0.9.1

v0.9.0 Breaking risk
Breaking changes
  • Legacy ACE roles (Agent, Reflector, SkillManager) removed and rebuilt on PydanticAI agents with structured output
Notable features
  • Recursive Reflector: PydanticAI-powered trace analysis agent with sandboxed code execution, sub-agent delegation, and working memory (`save_notes` tool)
  • Kayba CLI: full hosted API client for trace upload/management, interactive run, insights, prompts, batch processing, materialization, and integration commands
Full changelog

Added

  • PydanticAI migration — ACE roles (Agent, Reflector, SkillManager) rebuilt on PydanticAI agents with structured output, replacing the legacy role system
  • Recursive Reflector — PydanticAI-powered trace analysis agent with sandboxed code execution, sub-agent delegation, and working memory (save_notes tool)
  • Kayba CLI — full hosted API client with trace upload/management, interactive run, insights, prompts, batch processing, materialization, and integration commands (kayba entry point)

Full Changelog: https://github.com/kayba-ai/agentic-context-engine/compare/v0.8.9...v0.9.0

v0.8.9 Feature
Notable features
  • Thread CancellationToken propagated through TraceAnalyser.run() enabling pipeline cancellation
Changelog

Thread CancellationToken through TraceAnalyser.run() for pipeline cancellation support.

v0.8.8 New feature
Notable features
  • Kayba CLI enabling automated agent self‑improvement from traces
  • PipelineHook protocol and CancellationToken for observing/controlling pipeline execution
  • 7‑stage dynamic evaluation pipeline skills for Claude Code with domain‑aware benchmark generation
Full changelog

Introducing the Kayba CLI: automated agent self-improvement from your terminal

We built a CLI that plugs into Claude Code, Codex, or any coding agent and turns your agent's execution traces into improvements.

Upload traces → Kayba surfaces failure patterns → your coding agent proposes edits to your codebase. Pick what makes sense, implement, and repeat.

First test on tau2-bench: 34.3% improvement after a single cycle auto-accepting all changes.

🚀 Try it free

7-day free trial (no credit card required) at kayba.ai:

  • Automated agent self-improvement
  • CLI for Claude Code, Codex & more
  • Hosted dashboard & analytics
  • Team collaboration

The core engine (ACE) stays open source and MIT licensed. Run kayba setup to get started.


Added

  • Pipeline hooks & cancellationPipelineHook protocol and CancellationToken for observing and controlling pipeline execution
  • Kayba pipeline skills for Claude Code — 7-stage dynamic evaluation pipeline that generates custom benchmarks tailored to your agent's domain. Instead of static test suites, the skills analyze your API, build domain-aware metrics and rubrics, create action plans, and run human-in-the-loop validation — all as composable Claude Code skills
  • kayba setup command — one command to install the full evaluation skill pipeline into your .claude/skills/ directory, ready to use inside Claude Code out of the box

Full Changelog: https://github.com/kayba-ai/agentic-context-engine/compare/v0.8.7...v0.8.8

v0.8.7 New feature
Notable features
  • Traces display the first 80 characters of the question text instead of generic names
  • OpikStep and RROpikStep accept an optional thread_id parameter for grouping related traces
Full changelog
  • Improved Opik trace naming — traces now display the question text (first 80 chars) instead of generic names like "ace_pipeline" or "rr_reflect"
  • Thread ID support for OpikOpikStep and RROpikStep accept an optional thread_id parameter for grouping related traces

Full Changelog: https://github.com/kayba-ai/agentic-context-engine/compare/v0.8.6...v0.8.7

v0.8.6 New feature
Notable features
  • Kayba CLI with commands: upload, insights generate/list/triage, prompts generate/list/pull, status, materialize, batch, setup
  • KaybaClient HTTP client supporting Bearer auth for the hosted API
  • kayba setup now prints/appends coding agent instructions to CLAUDE.md, AGENTS.md, .cursorrules
Full changelog

What's New

  • Kayba CLI — New kayba CLI for the hosted API with commands: upload, insights generate/list/triage, prompts generate/list/pull, status, materialize, batch, setup
  • HTTP clientKaybaClient with Bearer auth for the Kayba hosted API
  • Agent integrationkayba setup prints/appends coding agent instructions (CLAUDE.md, AGENTS.md, .cursorrules)

Full Changelog

https://github.com/kayba-ai/agentic-context-engine/compare/v0.8.5...v0.8.6

v0.8.5 New feature
⚠ Upgrade required
  • Opik is now opt‑in via the `observability` extra (previously a hard dependency)
Notable features
  • Self-contained RR module (`ace_next/rr/`) with sandbox, subagent, trace_context and extracted config
  • `build_steps()` classmethod added to all runners for pipeline customization
  • Shared `CallBudget` instance across RR pipeline steps
Full changelog
  • Self-contained RR module (ace_next/rr/) — sandbox, subagent, trace_context, config extracted from ace/reflector/
  • v5.6 prompt promoted as default — prompt evolution (v4 → v5.1–v5.6) for the RR pipeline
  • build_steps() API — all runners gain a build_steps() classmethod for pipeline customization
  • Shared CallBudget — single budget instance shared across RR pipeline steps
  • ACE MCP server (optional) — stdio MCP server with tools: ace.ask, ace.learn.sample, ace.learn.feedback, ace.skillbook.get/save/load
  • MCP packaging + CLI — optional mcp extra and ace-mcp entrypoint
  • Composing pipelines guide — new docs/guides/composing-pipelines.md
  • RR examplesrr_demo.py, rr_opik_demo.py, compose_custom_pipeline.py
  • Opik made opt-in — moved from hard dependency to observability extra

Full Changelog: https://github.com/kayba-ai/agentic-context-engine/compare/v0.8.4...v0.8.5

v0.8.4 New feature
Notable features
  • OpenClawToTraceStep and LoadTracesStep pipeline steps for learning from OpenClaw session transcripts (JSONL)
  • ExportSkillbookMarkdownStep to export skillbook to markdown
  • Example script and integration documentation for OpenClaw
Full changelog

What's New

  • OpenClaw integration — learn from OpenClaw session transcripts (JSONL) via new OpenClawToTraceStep and LoadTracesStep pipeline steps (#86)
  • ExportSkillbookMarkdownStep — export skillbook to markdown file
  • OpenClaw example script and integration docs
v0.8.3 New feature
Notable features
  • Generic pipeline framework with branching, async boundaries, and parallel execution
  • _build_traces() helper for raw trace data passthrough to RecursiveReflector sandbox
Full changelog
  • Pipeline engine — generic pipeline framework with branching, async boundaries, and parallel execution (#78)
  • Trace passthrough_build_traces() helper and raw trace data passed to RecursiveReflector sandbox

Full Changelog: https://github.com/kayba-ai/agentic-context-engine/compare/v0.8.2...v0.8.3

v0.8.2 New feature
Notable features
  • `LiteLLMClient.complete_messages()` provides native multi‑turn completion preserving structured message lists
  • RecursiveReflector now includes a None-response guard that retries on empty/None LLM outputs
Full changelog
  • RecursiveReflector None-response guard — gracefully handles empty/None LLM responses (e.g. from Gemini) with retry prompt instead of crashing
  • LiteLLMClient.complete_messages() — native multi-turn completion that preserves structured message lists

Full Changelog: https://github.com/kayba-ai/agentic-context-engine/compare/v0.8.1...v0.8.2

v0.8.1 New feature
Notable features
  • InsightSource dataclass tracks skill provenance (epoch, sample ID, trace refs, error info, learning text)
  • Sample.id promoted to first‑class UUID field with auto‑generation
  • Skillbook query API: source_map(), source_summary(), source_filter() for lineage
Full changelog

Insight Source Tracing

Track where every skill in your skillbook came from.

Added

  • Insight source tracingInsightSource dataclass tracks skill provenance (epoch, sample, trace refs, error identification, learning text)
  • Sample.id promoted to first-class field with UUID auto-generation
  • Skillbook query APIsource_map(), source_summary(), source_filter() for skill lineage
  • Insight sources wired through OfflineACE, OnlineACE, and async learning pipelines
  • UpdateOperation.learning_index for linking operations to reflector learnings
  • Bedrock e2e example (examples/litellm/bedrock_insight_source_test.py)
  • docs/INSIGHT_SOURCES.md guide

Full Changelog: https://github.com/kayba-ai/agentic-context-engine/compare/v0.8.0...v0.8.1

v0.8.0 New feature
Notable features
  • TAU-bench integration: full benchmark framework for evaluating agents on TAU-bench tasks
  • Recursive Reflector module with sandbox execution, trace context, and sub-agent support
  • Skillbook utility scripts to clean, consolidate, and merge skillbooks
Full changelog

What's New

  • TAU-bench integration: Full benchmark framework for evaluating agents on TAU-bench tasks
  • Recursive Reflector: New reflector module with sandbox execution, trace context, and sub-agent support
  • Skillbook tools: Clean, consolidate, and merge skillbooks via new utility scripts

Full Changelog: https://github.com/kayba-ai/agentic-context-engine/compare/v0.7.0...v0.8.0

v0.7.3 Maintenance

Routine maintenance release for agentic-context-engine.

Changelog

Release v0.7.3

v0.7.2 New feature
Notable features
  • Agentic system prompting workflow: analyzes past traces/conversations to generate prompt suggestions with justification and evidence
Full changelog

What's New

Agentic System Prompting

New workflow to automatically optimize your agent's system prompts using your own data. Feed in past traces or conversations, and ACE analyzes what worked and what failed to generate actionable prompt suggestions.

Traces / ConversationsACEPrompt Suggestions

Each suggestion includes the recommended prompt text, justification for why it helps, and evidence from your actual traces. You review and decide what to implement.

See examples/agentic-system-prompting/ for the full workflow.

Other Changes

  • Fix: Align test matrix with Python 3.12 requirement
  • Fix: Use setup-uv action for Windows CI compatibility

Beta — feedback welcome: [email protected]