Release history

Agent-fox releases

No immediate action
v3.7.1 Breaking risk

--debug removal

No immediate action
v3.7.0 Breaking risk

Nightshift mode change + lint‑specs flag removal

No immediate action
v3.6.5 Breaking risk

Dry-run test fix

No immediate action
v3.6.4 Breaking risk

Non‑retryable push failure fix

No immediate action
v3.6.3 New feature

--dry-run flag

No immediate action
v3.6.2 Breaking risk

--dry-run flag + fallback_model removal

v3.6.1 New feature
Notable features
  • _push_with_retry mechanism with error classification, audit events, and lock reentrancy for develop sync
  • Atomic push integrated into harvest and session lifecycle
Full changelog

What's Changed

Features

  • Atomic push with retry — new _push_with_retry mechanism with error classification, audit events, and lock reentrancy for develop sync (#121)
  • Wired atomic push into harvest and session lifecycle (#121)
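
The changelog does not show how _push_with_retry is implemented. As a rough sketch of the general pattern it names (classify the failure, retry only the transient kind, record an event per attempt), the following assumes hypothetical helpers classify_push_error and push_with_retry, a caller-supplied push callable, and made-up stderr markers. None of these are agent-fox APIs.

```python
import time
from typing import Callable

# Hypothetical stderr markers; the real classification rules are not in the changelog.
NON_RETRYABLE_MARKERS = ("permission denied", "authentication failed", "protected branch")


def classify_push_error(stderr: str) -> str:
    """Return 'non_retryable' for errors a retry cannot fix, else 'retryable'."""
    text = stderr.lower()
    if any(marker in text for marker in NON_RETRYABLE_MARKERS):
        return "non_retryable"
    return "retryable"


def push_with_retry(
    push: Callable[[], tuple[bool, str]],
    max_attempts: int = 3,
    base_delay: float = 0.0,
) -> tuple[bool, list[str]]:
    """Call push() until it succeeds, fails permanently, or attempts run out.

    push() returns (ok, stderr). The second return value is a list of
    audit-style event strings describing each attempt.
    """
    events: list[str] = []
    for attempt in range(1, max_attempts + 1):
        ok, stderr = push()
        if ok:
            events.append(f"push.success attempt={attempt}")
            return True, events
        kind = classify_push_error(stderr)
        events.append(f"push.failure attempt={attempt} kind={kind}")
        if kind == "non_retryable":
            break  # retrying cannot help, stop immediately
        if attempt < max_attempts:
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    return False, events
```

Stopping on the first non-retryable failure keeps a rejected push from burning the whole retry budget.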

Bug Fixes

  • Findings reporting: replaced legacy archetype labels with current names (#591)
  • Insights command: added --dismiss flag for manual finding invalidation (#592)
  • Config: fixed retries_before_escalation config path and added deprecation warning (#589)
  • Fixed double push in fix_pipeline harvest flow (#121)

Tests

  • Added failing spec tests for atomic push with retry (#121)
v3.6.0 Breaking risk
Breaking changes
  • Removed `fix` command from CLI
  • Removed `--output` flag from `standup` command
  • Deprecated `[models]` section; moved `fallback_model` to `[routing]` and removed obsolete config options
Security fixes
  • Added transport‑level DNS re‑validation in nightshift to close SSRF TOCTOU (#580)
  • Sanitized exception content in fix session failure comment (#583)
  • Added path containment check before file deletion (#579)
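
A path containment check of the kind listed above can be sketched as follows. This is an illustration only: is_contained and safe_delete are hypothetical names, and the actual checks in nightshift may differ.

```python
from pathlib import Path


def is_contained(root: Path, candidate: Path) -> bool:
    """True if candidate, after resolving symlinks and '..', lies strictly inside root."""
    root_resolved = root.resolve()
    return root_resolved in candidate.resolve().parents


def safe_delete(root: Path, candidate: Path) -> bool:
    """Delete candidate only if it is a regular, non-symlink file inside root."""
    if candidate.is_symlink():
        return False  # refuse to follow links out of the workspace
    if not is_contained(root, candidate):
        return False  # '..' segments or absolute paths escaped the root
    if candidate.is_file():
        candidate.unlink()
        return True
    return False
```

Resolving both paths before comparing is what defeats inputs such as workspace/../other.txt, which look contained when compared as strings.
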
Full changelog

What's Changed

Refactoring

  • engine: Consolidate 6 parallel tracking dicts in SessionResultHandler into a single _NodeRetryState dataclass
  • engine: Inline assessment.py (single-consumer module) into engine.py
  • Remove dead code: _estimate_tokens, _table_exists, _column_exists
  • Add run_git_sync() to workspace/git.py and migrate sync callers
  • Extract _resolve_github_remote() helper in nightshift/platform_factory.py
  • Deduplicate comment formatting in nightshift/fix_pipeline.py

Bug Fixes

  • nightshift: Add transport-level DNS re-validation to close SSRF TOCTOU (#580)
  • nightshift: Sanitize exception in fix session failure comment (#583)
  • nightshift: Add path containment check before file deletion (#579)
  • nightshift: Reject path traversal in archetype/mode/name parameters (#585)
  • nightshift: Add symlink checks to profiles.py and analyzer.py (#586)
  • nightshift: Replace shutil.rmtree with _safe_rmtree to avoid symlink traversal (#587)
  • nightshift: Reject reserved, multicast, and unspecified addresses in SSRF check (#581)
  • nightshift: Deprecate [models] section, move fallback_model to [routing] (#577)
  • nightshift: Remove fix command from CLI (#575)
  • nightshift: Remove --output flag from standup command (#573)
  • nightshift: Remove obsolete config options and dead code (#574)
  • nightshift: Apply ruff format to barrier.py and test_barrier.py (#588)
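
To illustrate the SSRF-related entries (#580, #581), here is a minimal sketch using only the Python standard library. The function names is_public_address and resolve_and_validate are assumptions for this example, not the nightshift implementation, and the sketch does not show the transport-level hook itself.

```python
import ipaddress
import socket


def is_public_address(ip: str) -> bool:
    """Reject private, loopback, link-local, reserved, multicast, and unspecified addresses."""
    addr = ipaddress.ip_address(ip)
    return not (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_reserved
        or addr.is_multicast
        or addr.is_unspecified
    )


def resolve_and_validate(host: str, port: int = 443) -> list[str]:
    """Resolve host and return only the addresses that pass the check.

    To avoid a time-of-check/time-of-use gap, the caller should connect to
    one of the returned IPs directly instead of resolving the name again.
    """
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    addresses = {info[4][0] for info in infos}
    return sorted(a for a in addresses if is_public_address(a))
```

The TOCTOU risk arises when a hostname is validated once and resolved a second time at connect time; pinning the connection to the validated IP removes the second lookup.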

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.5.6...v3.6.0

v3.5.6 Bug fix

Fixed nightshift audit‑review blockage by filtering future‑group findings.

Full changelog

What's Changed

Bug Fixes

  • nightshift: Filter deferred-to-future-group findings from audit-review blocking (#572)
  • nightshift: Collapse ternary expression to satisfy ruff format (#571)

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.5.5...v3.5.6

v3.5.5 Bug fix

Audit-reviewer now grades test design quality instead of execution results, fixing misclassification.

Full changelog

What's Changed

Bug Fixes

  • Wire audit_max_retries into audit-review retry logic (#567, #569)

    • ReviewerConfig.audit_max_retries was defined but never read by the retry logic. Added a dedicated per-coder-node counter (_audit_retry_counts) so audit-review retries are tracked independently of the generic EscalationLadder, preventing infinite retry loops.
  • Audit-reviewer grades test design quality, not execution results (#568, #570)

    • The audit-review profile was conflating test pass/fail status with test design quality, marking well-designed tests as WEAK when they failed due to unimplemented upstream specs. Updated the template to grade design quality only — a correctly designed test that cannot pass yet is PASS, not WEAK. Added anti-pattern examples and multi-spec dependency guidance.
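
The per-node counter described for #567 can be pictured roughly like this. Only the names audit_max_retries and _audit_retry_counts come from the changelog; the AuditRetryTracker class and its should_retry method are hypothetical scaffolding for the example.

```python
class AuditRetryTracker:
    """Counts audit-review retries per coder node, separately from other retry logic."""

    def __init__(self, audit_max_retries: int) -> None:
        self.audit_max_retries = audit_max_retries
        self._audit_retry_counts: dict[str, int] = {}

    def should_retry(self, node_id: str) -> bool:
        """Record one more retry for node_id; return False once the cap is reached."""
        used = self._audit_retry_counts.get(node_id, 0)
        if used >= self.audit_max_retries:
            return False  # cap reached, stop retrying this node
        self._audit_retry_counts[node_id] = used + 1
        return True
```

Because each node has its own count, one node exhausting its retries does not affect any other node.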

Documentation

  • Updated CLI and config reference to match current codebase

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.5.4...v3.5.5

v3.5.4 Bug fix

Fixed four bugs that prevented stored knowledge from being retrieved in downstream sessions.

Full changelog

Knowledge Retrieval Fixes (Spec 120)

Fixes four bugs in the knowledge system's read side that caused stored knowledge to never reach downstream sessions.

Bug Fixes

  • Wire run_id to FoxKnowledgeProvider: _run_id was initialized to None and never set by the engine, causing _query_same_spec_summaries() and _query_cross_spec_summaries() to always return empty. Session summaries were stored but never retrieved. Every fox_provider log line showed 0 context + 0 cross-spec items.

  • Elevate pre-review findings to tracked context — Group 0 (skeptic pre-review) findings were served as untracked [CROSS-GROUP] items instead of tracked [REVIEW] items. They are now included in primary review results, tracked in finding_injections, and properly superseded on session completion.

  • All-archetype summary storage — Only coder sessions produced summaries. Reviewer and verifier sessions now generate structured summaries (finding counts, pass/fail ratios) that are stored and served to downstream sessions.

  • Cross-run finding carry-forward — Active findings orphaned by stalled runs are now surfaced as [PRIOR-RUN] context items at the start of a new run, capped at 5 per spec.

Stats

  • 4658 tests passing
  • 26 files changed, +3944 / -53 lines
v3.5.3 New feature
Notable features
  • Git stack hardening: workspace health checks, force-clean option, non-retryable error classification, pre-session guards, run lifecycle management, idempotent cascade blocking, and develop sync audit trail
Full changelog

What's Changed

Features

  • Session summary storage (spec 119): Session summaries are now stored, retrieved, and integrated into the knowledge provider and lifecycle audit events
  • Git stack hardening (spec 118): Workspace health checks, force-clean option, non-retryable error classification, pre-session workspace guards, run lifecycle management, idempotent cascade blocking, and develop sync audit trail

Fixes

  • Night shift: Raised stale_timeout default to 3600s and added heartbeat (#561)

Refactoring

  • Eliminated single-file routing/ and security/ packages — moved to core/
  • Merged knowledge/provider.py protocol into knowledge/fox_provider.py

Stats

  • 4621 tests passing
  • 97 files changed across the release
v3.5.2 New feature
Notable features
  • Cross‑group knowledge retrieval with [CROSS-GROUP] prefix, limited by max_cross_group_items (default 3), ranked by relevance and excluded from injection tracking
Full changelog

What's Changed

Features

  • #559: Add cross-group knowledge retrieval — sessions now see findings and FAIL verdicts from other task groups in the same spec via [CROSS-GROUP] prefix, capped at max_cross_group_items (default 3), ranked by keyword relevance, and excluded from injection tracking
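
As a hedged illustration of "ranked by keyword relevance, capped at max_cross_group_items", the sketch below scores each finding by the number of words it shares with the task description. The scoring function and the helper names are assumptions; the changelog does not specify the actual relevance measure.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercase word set used for overlap scoring."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def keyword_overlap(task_description: str, text: str) -> int:
    """Number of distinct words shared by the two strings."""
    return len(_tokens(task_description) & _tokens(text))


def select_cross_group_items(
    task_description: str,
    findings: list[str],
    max_cross_group_items: int = 3,
) -> list[str]:
    """Rank findings by keyword overlap and keep the top few, with the display prefix."""
    ranked = sorted(
        findings,
        key=lambda finding: keyword_overlap(task_description, finding),
        reverse=True,
    )
    return [f"[CROSS-GROUP] {finding}" for finding in ranked[:max_cross_group_items]]
```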

Fixes

  • __init__.py: Sync __version__ with pyproject.toml (was stuck at 3.5.0)

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.5.1...v3.5.2

v3.5.1 Bug fix

Fixed knowledge retrieval to surface FAIL verdicts from verification results.

Full changelog

What's Changed

Bug Fixes

  • #553: Collapse list comprehension to single line for ruff format
  • #554: Gate audit-review on active findings to trigger coder retry
  • #555: Surface FAIL verdicts from verification_results in knowledge retrieval
  • #556: Filter knowledge findings by task_group to avoid redundant injection
  • #557: Rank findings and verdicts by task_description keyword overlap
  • #558: Track injected findings per session to prevent re-injection after successful completion

Docs

  • Rewrite architecture.md to reflect current codebase

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.5.0...v3.5.1

v3.5.0 Maintenance

Routine maintenance release for Agent-fox.

Changelog

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.4.1...v3.5.0

v3.4.1 Bug fix
Notable features
  • ADR ingestion pipeline (MADR parser, validator, DuckDB migration v22 for adr_entries table)
Full changelog

What's Changed

Bug Fixes

  • fix(#547): Add errata markdown-to-DuckDB indexing path — errata files in docs/errata/ are now indexed into the DuckDB errata table, closing the write-only gap where errata were created but never retrievable
  • fix(#548): Fix audit-review task_group partitioning causing supersession silos — audit findings no longer use a hardcoded empty task_group, enabling proper supersession across review modes
  • fix(#549): Move steering.md from .agent-fox/specs/ to .agent-fox/
  • fix(#546): Fix ruff format violations in harvest warning strings
  • fix(#545): Serialize AuditJsonlSink writes with threading.Lock to prevent interleaved concurrent appends
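
The locking fix in #545 follows a common pattern, sketched here with a stand-in JsonlSink class. The class and its write method are illustrative assumptions, not the actual AuditJsonlSink code.

```python
import json
import threading
from pathlib import Path


class JsonlSink:
    """Append-only JSONL writer; a lock keeps each event on its own line."""

    def __init__(self, path: Path) -> None:
        self._path = path
        self._lock = threading.Lock()

    def write(self, event: dict) -> None:
        line = json.dumps(event, sort_keys=True)  # serialize outside the lock
        with self._lock:  # only one thread appends at a time
            with self._path.open("a", encoding="utf-8") as fh:
                fh.write(line + "\n")
```

Serializing the event before taking the lock keeps the critical section down to the file append itself.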

Features

  • feat: ADR ingestion pipeline (spec 117) — MADR parser, validator, DuckDB migration v22 for adr_entries table, and integration into FoxKnowledgeProvider for retrieval during coder sessions

Documentation

  • docs: ADR 07 — Define audit JSONL event format (envelope schema + complete event type catalog)
  • docs: Code quality audit (specs 7–9)
  • docs: Parking service 3.4.0 audit

Chores

  • Bump version to 3.4.1
v3.4.0 Breaking risk
Breaking changes
  • Renamed CLI command `findings` → `insights`
Full changelog

What's Changed

Fixes

  • #543: Drop dead knowledge system columns retrieval_summary and coverage_data
  • #542: Fix ruff format violation in warning string
  • #541: Fix ruff format violation in list comprehension
  • #539: Add quick-triage bail-out to coder prompt
  • #537: Rename CLI command findings to insights
  • #536: Add AC-4 test and fix ruff format violation
  • #534: Add AC-3 integration test for verifier dispatch without phantom task group

Chores

  • Upgrade dependency version pins (pydantic, rich, duckdb, sentence-transformers, scikit-learn, pathspec, tree-sitter, pytest, ruff, mypy, and more)
  • Update auto-generated errata dates
v3.3.1 New feature
Notable features
  • Test coverage regression gate blocks tasks on decreased per-file line coverage
  • Multi-language coverage tool detection: pytest-cov (Python), cargo-tarpaulin (Rust), go test -cover (Go)
  • Coverage data stored in session outcomes for trend tracking (migration v20)
Full changelog

What's New

Features

  • Test coverage regression gate — measures per-file line coverage before and after coder sessions; blocks the task if coverage decreases on modified files (#520)
    • Multi-language coverage tool detection: pytest-cov (Python), cargo-tarpaulin (Rust), go test -cover (Go)
    • Coverage data stored in session outcomes for trend tracking (migration v20)
    • Blocking findings emitted via review_findings table on regression
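
The gate's core comparison might look like the sketch below. The function name coverage_regressions and the choice to ignore files with no "before" measurement are assumptions made for illustration; the changelog does not say how new files are handled.

```python
def coverage_regressions(
    before: dict[str, float],
    after: dict[str, float],
    modified_files: set[str],
) -> dict[str, tuple[float, float]]:
    """Return {file: (before_pct, after_pct)} for modified files whose coverage dropped."""
    regressions: dict[str, tuple[float, float]] = {}
    for path in sorted(modified_files):
        # Assumption: files without a measurement on both sides are skipped.
        if path in before and path in after and after[path] < before[path]:
            regressions[path] = (before[path], after[path])
    return regressions
```

A non-empty result would correspond to the blocking case; an empty dict lets the task proceed.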

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.3.0...v3.3.1

v3.3.0 New feature
Notable features
  • Structured verification checklist for spec compliance (#521)
  • State transition validation in GraphSync to catch illegal graph moves (#523)
  • Eager pre-review with retry on predecessor failure restored (#519)
Full changelog

What's New

Features

  • Verification checklist & task completion enforcement — structured verification checklist for spec compliance (#521)
  • State transition validation in GraphSync — validates engine state transitions to catch illegal graph moves (#523)
  • Eager pre-review with retry-predecessor — restores eager pre-review behavior with retry on predecessor failure (#519)
  • Lightweight errata generation from blocking — reinstates errata generation when issues are blocked (#522)
  • Knowledge system pruning — migration v18 removes causal links and dead knowledge modules (spec 116)
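
State transition validation (#523) is typically a lookup in a table of allowed moves. The sketch below uses invented state names and an invented transition table purely as an example; the real GraphSync states and rules are not given in the changelog.

```python
# Hypothetical node states and allowed moves, for illustration only.
ALLOWED_TRANSITIONS: dict[str, set[str]] = {
    "pending": {"in_progress", "blocked"},
    "in_progress": {"completed", "failed", "blocked"},
    "failed": {"pending", "blocked"},
    "blocked": {"pending"},
    "completed": set(),
}


class IllegalTransitionError(ValueError):
    """Raised when a node is asked to make a move the table does not allow."""


def validate_transition(current: str, new: str) -> None:
    """Raise IllegalTransitionError unless current -> new is in the table."""
    allowed = ALLOWED_TRANSITIONS.get(current)
    if allowed is None:
        raise IllegalTransitionError(f"unknown state: {current!r}")
    if new not in allowed:
        raise IllegalTransitionError(f"illegal transition: {current} -> {new}")
```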

Bug Fixes

  • Fix max_items in property test to avoid retrieval cap masking failures
  • Use Path-typed specs_path variable in plan_cmd (#516)
  • Fix ruff format violation in RuntimeError f-string (#515)
  • Add proper type annotations for embedder and backend variables (#514)

Refactoring

  • Extract strategy classes from engine, fix_pipeline, and result_handler (#518)
  • Inline single-consumer modules and deduplicate review parser
  • Remove dead code and consolidate single-consumer modules (2 passes)
  • Remove dead code and consolidate near-identical abstractions
  • Delete dead knowledge modules (blocking_history, errata_store, gotcha_extraction, gotcha_store) and simplify provider

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.2.0...v3.3.0

v3.2.0 Breaking risk
Breaking changes
  • Deleted 40+ legacy knowledge pipeline modules (lang analyzers, retrieval, consolidation, embeddings, etc.) as part of the `KnowledgeProvider` protocol decoupling
  • Removed obsolete knowledge pipeline configuration options
  • Removed onboard CLI command and legacy nightshift streams
Notable features
  • Decouple knowledge subsystem via `KnowledgeProvider` protocol (spec 114)
  • Pluggable knowledge provider with gotcha extraction, errata store, and content hashing (spec 115)
  • Wire `FoxKnowledgeProvider` into engine startup
Full changelog

What's Changed

Features

  • knowledge: Decouple knowledge subsystem via KnowledgeProvider protocol (spec 114)
  • knowledge: Pluggable knowledge provider with gotcha extraction, errata store, and content hashing (spec 115)
  • engine: Wire FoxKnowledgeProvider into engine startup

Refactors

  • knowledge: Delete 40+ legacy knowledge pipeline modules (lang analyzers, retrieval, consolidation, embeddings, etc.)
  • config: Remove obsolete knowledge pipeline configuration options
  • cli: Remove onboard command and legacy nightshift streams

Chores

  • Supersede specs 112 (sleep time compute) and 113 (knowledge effectiveness)
  • Fix Unicode edge case in content hash determinism property test
  • Clean up leftover __pycache__ directories in deleted knowledge subdirectories
v3.1.4 Bug fix
Notable features
  • Pre-flight check to skip coder sessions when work is already done
Full changelog

What's Changed

Bug Fixes

  • engine: Close AsyncAnthropic clients to prevent event loop shutdown crash (fixes #506)
  • engine: Skip redundant cleanup ingestion when barrier already ran (fixes #505)
  • knowledge: Always write agent trace JSONL for transcript reconstruction (fixes #507)
  • Guard trace reconstruction behind debug flag to suppress spurious warning

Features

  • engine: Add pre-flight check to skip coder sessions when work is done (fixes #511)

Other

  • New specs 114 (knowledge decoupling), 115 (pluggable knowledge)
  • Coding-session architecture documentation
  • General cleanup

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.1.3...v3.1.4

v3.1.3 Bug fix
Notable features
  • Knowledge system effectiveness improvements (spec 113) covering transcript reconstruction, compaction, entity signals, cold-start handling, git extraction, audit consumption, retrieval validation, and prompt injection
Full changelog

What's Changed

Bug Fixes

  • Budget exhaustion detection: Sessions that hit the SDK max-budget-usd limit are now detected and blocked immediately instead of being wastefully retried. The SDK returns is_error=True with no message on budget exhaustion — previously mapped to "Unknown error" and retried through the escalation ladder.
  • AssessmentManager config: Pass full_config (not OrchestratorConfig) to AssessmentManager, fixing missing attribute errors.
  • Escalation ladder starting tier: The escalation ladder now respects config.models.coding for the starting tier instead of always defaulting to STANDARD.
  • Timed-out session metrics: Emit descriptive error messages and metrics for sessions that time out.
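
The budget exhaustion rule above (is_error=True with no message) can be expressed as a small classifier. The function names and the returned labels are hypothetical; only the is_error/no-message signal comes from the changelog.

```python
from typing import Optional


def classify_session_result(is_error: bool, message: Optional[str]) -> str:
    """Map an SDK result to 'ok', 'budget_exhausted', or 'retryable_error'."""
    if not is_error:
        return "ok"
    if not message:
        # An error with no message is treated as budget exhaustion.
        return "budget_exhausted"
    return "retryable_error"


def should_retry(is_error: bool, message: Optional[str]) -> bool:
    """Only errors that carry a message go back through the escalation ladder."""
    return classify_session_result(is_error, message) == "retryable_error"
```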

Features

  • Knowledge system effectiveness (spec 113): Transcript reconstruction, compaction improvements, entity signal activation, cold-start handling, git extraction, audit consumption, retrieval quality validation, and audit prompt injection.

Other

  • Parking service audit report
  • Session budget increased for lengthy tasks
v3.1.2 Bug fix
⚠ Upgrade required
  • If a run is stuck with audit-review tasks blocked by "Retry limit exceeded", clear the stale state using `agent-fox reset --spec <affected_spec_name>`.
Full changelog

Bug Fixes

  • engine: Move review concurrency cap before _prepare_launch to prevent phantom retry exhaustion (fixes #503)

    The review concurrency cap in _fill_parallel_pool was checked after _prepare_launch(), which increments the attempt tracker on "allowed" verdicts. When the single review slot was occupied, audit-review tasks were skipped but their attempt counter was already incremented. After max_retries + 1 (default 3) such pool-refill cycles, the circuit breaker permanently blocked the task with "Retry limit exceeded" — without ever starting a session. This cascade-blocked all downstream coding and verifier tasks, exceeding the block budget and halting the entire run.
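
    The ordering fix can be shown with a toy model. ReviewPool and try_launch_review are hypothetical names, and this is a simplified sketch rather than the engine's scheduler: the capacity check runs before the attempt counter is touched, so a task that is merely waiting for a slot never uses up its attempts.

```python
class ReviewPool:
    """Toy scheduler with one review slot and a per-task attempt counter."""

    def __init__(self, max_retries: int = 2, review_slots: int = 1) -> None:
        self.max_retries = max_retries
        self.review_slots = review_slots
        self.running_reviews = 0
        self.attempts: dict[str, int] = {}

    def _prepare_launch(self, task_id: str) -> bool:
        """Count an attempt; refuse once max_retries + 1 attempts were used."""
        used = self.attempts.get(task_id, 0)
        if used >= self.max_retries + 1:
            return False
        self.attempts[task_id] = used + 1
        return True

    def try_launch_review(self, task_id: str) -> str:
        # Check capacity first: skipping a task for lack of a slot must not
        # consume one of its attempts.
        if self.running_reviews >= self.review_slots:
            return "deferred"
        if not self._prepare_launch(task_id):
            return "blocked"
        self.running_reviews += 1
        return "launched"
```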

Recovery for affected runs

If you have a stuck run with audit-review tasks blocked by "Retry limit exceeded", clear the stale state:

agent-fox reset --spec <affected_spec_name>

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.1.1...v3.1.2

v3.1.1 Bug fix

Fixed death-loop caused by stale session‑scoped DB rows after reset.

Full changelog

Bug Fixes

  • reset: clear session-scoped tables on reset to prevent block_limit death-loop (#501)

    After a block_limit run, reset --hard (and soft reset) left stale data in six session-scoped DB tables (runs, session_outcomes, review_findings, verification_results, drift_findings, blocking_history). The stale runs.status='block_limit' caused load_state_from_db() to load a terminal status, making the engine loop exit immediately on every subsequent agent-fox code invocation — a self-perpetuating death-loop with no CLI recovery path.

    All four reset paths (reset_all, reset_task, reset_spec, hard_reset_all/hard_reset_task) now clear session-scoped tables so that plan and code start from a clean state.

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.1.0...v3.1.1

v3.1.0 New feature
Notable features
  • Core protocol, orchestrator, and configuration schema for sleep-time tasks
  • ContextRewriter task that rewrites and enriches knowledge context during idle periods
  • BundleBuilder task that consolidates knowledge into bundles
Full changelog

What's New

Sleep-Time Compute (Spec 112)

A new knowledge-processing pipeline that runs background computation during idle periods:

  • Core protocol & orchestrator — schema, configuration, and orchestration layer for sleep-time tasks
  • ContextRewriter — sleep task that rewrites and enriches knowledge context
  • BundleBuilder — sleep task that builds consolidated knowledge bundles
  • Retriever & integration wiring — retrieval layer with full integration into the existing knowledge system
  • Wiring verification — end-to-end verification of the sleep-time compute pipeline

Full Changelog

  • feat(112): implement core protocol, orchestrator, config, and schema
  • feat(112): implement ContextRewriter sleep task
  • feat(112): implement retriever and integration wiring
  • test(112): failing spec tests, checkpoint, and wiring verification
v3.0.5 Bug fix
Notable features
  • --specs-dir flag added to plan and night-shift commands
  • Progress spinner added to onboard command
Full changelog

What's Changed

Bug Fixes

  • nightshift: exclude .agent-fox/ from onboard file scanning (#499)
  • nightshift: add --specs-dir flag to plan and night-shift commands (#498)
  • nightshift: add progress spinner to onboard command (#497)

Other

  • Updated config

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.0.4...v3.0.5

v3.0.4 Bug fix

Fixed triage task prompt causing parse failures.

Full changelog

What's Changed

  • fix(nightshift): Triage agent now receives a triage-specific task prompt instead of the coder's "Fix the issue" prompt. This was the root cause of all triage parse failures — the agent would implement the fix instead of producing a JSON triage report.
  • fix(tests): Knowledge wiring tests no longer leak .specs/ directories into the working tree.
v3.0.3 Breaking risk
Breaking changes
  • Removed deprecated `extract_spec_name` wrapper
  • Deleted backward-compatibility re-export shims: `session/archetypes`, `nightshift/config`, `knowledge/query`
Notable features
  • Three-tier priority scheduling places coders before reviews for better throughput
  • Deferred review injection lazily promotes review nodes when slots are idle
  • Review concurrency cap limits parallel pool size
Full changelog

What's Changed

Features

  • Three-tier priority scheduling — coders scheduled before reviews for better throughput (#490)
  • Deferred review injection — lazy promotion of review nodes when slots are idle (#491)
  • Review concurrency cap in parallel pool (#489)

Performance

  • Pre-review scheduling optimization for critical-path specs (#476)
  • Skip LLM extraction for reviewer archetypes and short transcripts (#475)

Bug Fixes

  • Cascade blocking through in_progress nodes to prevent downstream dispatch (#481)
  • Use datetime.now(UTC) for run timestamps (#480)
  • Remove duplicate harvest.complete emission (#482)
  • Populate commit_sha in git.merge audit events (#484)
  • Classify review findings into multiple categories (#485)
  • Generate embeddings for consolidated and pattern facts
  • Correct return type annotation in _sort_key
  • Drop dead tables and remove orphaned code (#460)
  • Annotate test-file errors and add false-positive guidance to hunt prompts (#493)
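
For the timestamp fix (#480), the difference is between a naive local time and a timezone-aware UTC time. A minimal sketch, with run_timestamp as a hypothetical helper name:

```python
from datetime import datetime, timezone


def run_timestamp() -> datetime:
    """Timezone-aware current time in UTC."""
    # datetime.UTC is an alias for timezone.utc on Python 3.11+.
    return datetime.now(timezone.utc)
```

An aware timestamp serializes with an explicit +00:00 offset, so stored run times stay unambiguous regardless of the host's local timezone.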

Refactoring

  • Consolidate file-only language analyzers (HTML, JSON, regex) into SimpleAnalyzer base class
  • Replace 17 try/except blocks in language registry with data-driven loop
  • Merge WorkStream protocol into streams.py
  • Remove deprecated extract_spec_name wrapper
  • Delete backward-compatibility re-export shims (session/archetypes, nightshift/config, knowledge/query)
  • Update ~75 import sites to canonical module paths

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.0.2...v3.0.3

v3.0.2 Breaking risk

Fixed duplicate session_outcomes rows and added UUID validation to causal link parsing, which previously dropped all causal links for a session when the LLM returned a malformed ID.

Full changelog

Bug Fixes

  • fix(engine): Remove duplicate session_outcomes rows — every session was writing two DB entries, one with incomplete data (cost=0, model=NULL). The redundant sink-based insertion path has been removed; session outcomes are now written exclusively by SessionResultHandler.process(). (#473)

  • fix(knowledge): Validate UUIDs in causal link parsing — the LLM sometimes returned truncated UUIDs or git SHAs instead of valid fact UUIDs, causing ConversionException in DuckDB and silently dropping all causal links for the session. parse_causal_links() now validates UUID format before returning. (#474)
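
  A format check of this kind can be written with the standard uuid module. The helpers below (is_valid_uuid, filter_causal_links) are illustrative assumptions and are not the body of parse_causal_links().

```python
import uuid


def is_valid_uuid(value: str) -> bool:
    """True only for a canonical, hyphenated UUID string."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


def filter_causal_links(links: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep only (cause, effect) pairs where both IDs are well-formed UUIDs."""
    return [
        (cause, effect)
        for cause, effect in links
        if is_valid_uuid(cause) and is_valid_uuid(effect)
    ]
```

  Filtering bad pairs individually means one truncated ID no longer discards every link in the session.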

Other

  • Updated README
v3.0.1 Bug fix
Notable features
  • Updated default config.toml template for v3 in init
Full changelog

What's Changed

Bug Fixes

  • nightshift: Eliminate contradictory 'skipping'/'Applied' log messages for migrations v5 and v10
  • nightshift: Leave issue open when coder produces no commits (#466)
  • nightshift: Prevent re-processing of closed issues in drain loop (#465)
  • nightshift: Increment scan counter in _run_issue_check (#469)
  • nightshift: Propagate fix_run_id from engine into process_issue (#468)
  • nightshift: Populate runs and session_outcomes from fix pipeline (#467)
  • nightshift: Add run_id to empty-body rejection comment and GitHub issue comments (#464)
  • nightshift: Remove obsolete memory.jsonl creation from init (#461)
  • nightshift: Move rev-list checks outside merge lock in _sync_develop_with_remote (#458)
  • nightshift: Pass project root (not .agent-fox dir) as repo_root to barrier (#454)
  • nightshift: Clean up stale running runs on orchestrator startup (#456)
  • nightshift: Stop per-file row explosion in session_outcomes (#457)
  • nightshift: Add AC-5 test for record_tool_error sink failure resilience (#459)
  • harvest: Pass embedder to extract_and_store_knowledge (#453)
  • harvest: Use feature branch commit message for squash merges

Improvements

  • init: Update default config.toml template for v3
  • refactor: Rename agent_base.md profile to agent.md

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.0.0rc4...v3.0.1

v3.0.0 Breaking risk
⚠ Upgrade required
  • Legacy configuration keys for Skeptic, Oracle, and Auditor archetypes are automatically migrated but emit deprecation warnings.
  • Projects using the default `.specs/` directory will see a deprecation warning; update `[paths] spec_root` in `config.toml` to the new location.
Breaking changes
  • Removed Skeptic, Oracle, and Auditor archetypes; replaced with unified Reviewer archetype having modes `pre-review`, `drift-review`, `audit-review`, and `fix-review`.
  • Plan state storage moved from `plan.json` to DuckDB tables (`plan_meta`, `plan_nodes`).
  • Spec root directory is now configurable via `[paths] spec_root` in `config.toml`; projects using `.specs/` receive deprecation warnings.
Notable features
  • Unified retriever with weighted Reciprocal Rank Fusion (RRF) combining keyword, vector, entity graph, and causal chain signals.
  • First‑class customizable markdown agent profiles located in `.agent-fox/profiles/` with mode‑specific resolution; `agent-fox init --profiles` installs defaults.
Full changelog

agent-fox v3.0.0

The first stable release of agent-fox v3. This release completes the transition
from the v2 architecture to a consolidated, mode-based archetype system with
DuckDB-backed state management, adaptive knowledge retrieval, and comprehensive
documentation.

Highlights

Archetype Consolidation

The former Skeptic, Oracle, and Auditor archetypes are now unified into a
single Reviewer archetype with four modes: pre-review, drift-review,
audit-review, and fix-review. Legacy configuration keys are automatically
migrated with deprecation warnings. The archetype registry now contains four
entries: Coder, Reviewer, Verifier, and Maintainer.

Adaptive Knowledge Retrieval

A new unified retriever fuses four signals — keyword, vector, entity graph,
and causal chain — via weighted Reciprocal Rank Fusion (RRF). Intent profiles
adjust signal weights per archetype and task status. Salience-based token
budgeting ensures the most relevant facts get full detail while staying within
context limits.
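
As an illustration of weighted Reciprocal Rank Fusion, each signal contributes weight / (k + rank) for every item it ranks, and the contributions are summed. The sketch below is a generic version: the weighted_rrf name, the signal keys, and the weights are assumptions, and k = 60 is the constant conventionally used for RRF rather than a value taken from agent-fox.

```python
from collections import defaultdict


def weighted_rrf(
    rankings: dict[str, list[str]],
    weights: dict[str, float],
    k: int = 60,
) -> list[tuple[str, float]]:
    """Fuse per-signal rankings into one list of (item_id, score), best first.

    rankings maps a signal name to item IDs ordered from most to least relevant.
    Signals missing from weights default to a weight of 1.0.
    """
    scores: defaultdict[str, float] = defaultdict(float)
    for signal, ranked_ids in rankings.items():
        weight = weights.get(signal, 1.0)
        for rank, item_id in enumerate(ranked_ids, start=1):
            scores[item_id] += weight / (k + rank)
    # Sort by descending score, breaking ties by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Changing the weights per archetype or task status, as the intent profiles do, shifts which signal dominates the fused order without changing the fusion formula.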

Agent Profiles

Profiles are now first-class, customizable markdown files that define agent
behavioral guidance. Projects can override any profile via
.agent-fox/profiles/ with mode-specific resolution. Run
agent-fox init --profiles to install defaults for customization.

DuckDB Plan Persistence

Plan state is now stored in DuckDB tables (plan_meta, plan_nodes) instead
of plan.json, consolidating all persistent state in a single store.

Configurable Spec Root

The spec root directory is now configurable via [paths] spec_root in
config.toml (default: .agent-fox/specs). Projects using .specs/ are
auto-detected with a deprecation warning.

What's Changed (since v3.0.0-rc6)

Features

  • Configurable spec root directory (#371)
  • Bash, HTML, JSON, CSS, regex, and Swift language analyzers (#426)
  • Agent base profile replaces CLAUDE.md in Layer 1 (#430)
  • Mode-specific reviewer profiles to prevent schema cross-contamination
  • Templates field on ModeConfig and ArchetypeEntry
  • Schema, data models, and entity store for spec 95

Fixes

  • Embedding dimension assertion in allowlist before SQL interpolation (#346)
  • Hot-load queries plan_nodes DB table instead of plan.json (#444)
  • Blocking history and learned thresholds migration (#449)
  • Wire config.models.coding into resolve_model_tier for coder archetype
  • Squash merge in harvest fallback to eliminate double-commit pattern
  • Night-shift: replace af:fix removal with af:fixed label on issue closure (#429)
  • Hollow generate_status test and production bug (#428)
  • Include archived specs in dependency validation

Documentation

  • Complete documentation audit and update for v3
  • New profiles guide (docs/profiles.md)
  • Expanded prompt generation section in architecture docs
  • All legacy Skeptic/Oracle/Auditor references updated to mode-based terminology
  • CLI reference: added --profiles, findings, and onboard commands
  • Config reference: removed stale hooks section, added missing pricing entries
  • Architecture docs verified against source code

Dependencies

  • Upgraded anthropic to 0.96 and claude-agent-sdk to 0.1.60

Installation

uv tool install agent-fox

Full Changelog

https://github.com/agent-fox-dev/agent-fox/compare/v3.0.0-rc6...v3.0.0

v2.9.1 Bug fix
Notable features
  • Nightshift cost tracking (spec 91) with SinkDispatcher plumbing, auxiliary and quality‑gate paths
  • Transient audit reports moved to .agent-fox/audit/ with PASS deletion and spec completion cleanup
Full changelog

What's Changed

Features

  • Nightshift cost tracking (spec 91): Wire SinkDispatcher plumbing, auxiliary cost tracking, and quality gate cost tracking path
  • Transient audit reports (spec 92): Move audit reports to .agent-fox/audit/, add PASS deletion and spec completion cleanup

Bug Fixes

  • #330: Guard fetchone() result against None before indexing in nightshift
  • #329: Correct return type annotation of _default_config from object to AgentFoxConfig

Housekeeping

  • Moved implemented specs to .specs/archive/

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v2.9.0...v2.9.1

v2.9.0 Breaking risk
Notable features
  • Scope guard subsystem (parser, validator, detector, checker, builder, classifier, telemetry)
  • fix_coder archetype with dedicated `fix_coding.md` template integrated into fix pipeline
  • Fact lifecycle management (deduplication, decay, cleanup, LLM contradiction detection) in harvest pipeline
Full changelog

What's New

Features

  • Scope guard subsystem (spec 87) — source parser, stub validator, overlap detector, preflight checker, prompt builder, session classifier, and telemetry persistence
  • fix_coder archetype (spec 88) — dedicated archetype with fix_coding.md template, wired into the fix pipeline
  • Fact lifecycle management (spec 90) — dedup, decay, cleanup, and LLM-based contradiction detection wired into the harvest pipeline and sync barrier
  • Simplified model routing (spec 89) — removed prediction pipeline, feature enrichment, duration estimation, and calibration modules; routing now uses ladder-based assessment only

Fixes

  • Wire SDK Notification hook for activity progress events (#320)
  • Guard fetchone() results against None in run_cleanup
  • Add routing_assessments and routing_pipeline params to SessionResultHandler (#325)
  • Wire fix_coder archetype into fix pipeline
  • Fix type errors and stale test assertions across nightshift tests

Maintenance

  • Updated dependencies: claude-agent-sdk 0.1.52 → 0.1.58, anthropic 0.84.0 → 0.93.0, ruff 0.15.4 → 0.15.10, and all transitive deps
  • Removed ~2,500 lines of dead prediction/routing code (assessor, calibration, duration, features modules)
  • Added scope guard and SDK improvement documentation
v2.7.6 Bug fix
Notable features
  • Rewrite of fix pipeline using triage and fix_reviewer archetypes (spec 82)
  • Added triage and fix_reviewer archetype registration, prompt templates, data types, and parse functions
Full changelog

What's Changed

Bug Fixes

  • fix: map SDK TextBlock to AssistantMessage in ClaudeBackend: _map_message() silently dropped TextBlock content blocks, so the agent's actual text response (including JSON findings/verdicts from review archetypes) was never captured. Skeptic, verifier, and oracle always fell back to parsing markdown metadata, producing 100% parse failures.
  • fix: capture review archetype response text for parsing — Added response field to SessionOutcome and wired it through _extract_knowledge_and_findings() so review parsers receive the agent's actual output instead of a fallback transcript.

Features

  • feat: rewrite fix pipeline with triage/reviewer archetypes (spec 82) — Replaced skeptic/verifier in the fix pipeline with purpose-built triage and fix_reviewer archetypes. Triage produces structured acceptance criteria from GitHub issues; fix_reviewer verifies coder changes against those criteria with per-criterion PASS/FAIL verdicts. Includes retry loop with escalation ladder.
  • feat: add triage and fix_reviewer archetype registration and prompt templates
  • feat: add triage and fix-review data types and parse functions

Tests

  • Added unit, property, and integration smoke tests for the new fix pipeline (spec 82)

Other

  • Multiple type annotation and test fixture fixes
  • New specs: 82 (fix pipeline triage/reviewer), 83 (lint-spec coverage gaps)
v2.7.5 New feature
Notable features
  • Night‑shift issue‑first gate drains `af:fix` labeled issues before/after hunt scans with fail‑open semantics
  • Added `activity_callback`, `task_callback`, and `status_callback` to NightShiftEngine and FixPipeline with per‑archetype TaskEvent emission
  • Integrated ProgressDisplay for phase/idle status rendering in the Night‑shift CLI
Full changelog

What's Changed

Features

  • Night-shift issue-first gate: Issues with af:fix label are now drained before and after hunt scans, with fail-open semantics for platform API failures
  • Callback plumbing: Added activity_callback, task_callback, and status_callback to NightShiftEngine and FixPipeline, with per-archetype TaskEvent emission
  • Night-shift CLI display: Integrated ProgressDisplay with phase/idle status rendering

Improvements

  • Consolidated duplicated utilities and decoupled cross-module imports

Documentation

  • Added architecture documentation suite (spec authoring, planning, execution, night-shift)
  • Added coding harness analysis comparing agent-fox to Raschka's framework

Tests

  • Added integration smoke tests for night-shift wiring verification
  • Added unit tests for issue-first gate, callbacks, and display integration
v2.7.4 Bug fix

Fixed broken AI analysis import, isolated DuckDB tests, closed SDK response stream leaks.

Full changelog

Quality gate and type safety fixes

This release fixes issues surfaced by the night-shift daemon's quality gate scan.

Bug fixes

  • Broken import in quality gate AI analysis: quality_gate.py imported nonexistent get_client — fixed to use create_async_anthropic_client. AI-powered finding analysis now works instead of falling back to mechanical findings.
  • Test isolation for DuckDB: test_cost_limit_terminates hit the real knowledge database during parallel test execution, causing lock contention failures. Now properly mocks oracle context to avoid shared state.
  • SDK response stream leak: Close SDK response stream before client teardown to prevent ProcessError during async generator cleanup (#215).

Type safety improvements

  • Replaced object parameter types with PlatformProtocol in dedup.py and finding.py, removing stale type: ignore comments.
  • Added proper type casts in config_schema.py for nested model extraction.
  • Fixed classmethod decorator typing in config validator factory.
  • Added assert match is not None guards in prompt safety tests.
  • Typed _tgd test helpers and task tuple lists across 10 test files.

Lint fixes

  • Resolved all 8 ruff errors: sorted imports in resolver.py, added noqa: E402 for intentional late imports in tests/conftest.py, and auto-fixed import ordering in steering and graph test files.
v2.7.3 Breaking risk
Notable features
  • Spec 79: Hunt scan cross-iteration deduplication
  • Spec 80: Worktree cleanup hardening
Full changelog

Night-shift daemon fixes

This release fixes critical wiring gaps in the night-shift autonomous maintenance daemon that prevented it from operating correctly.

Bug fixes

  • Fix branch creation: _create_fix_branch() was defined but never called — archetype sessions ran on whatever branch was checked out instead of a dedicated fix branch. Issue closure now gated on successful harvest.
  • Scheduled re-polling: Issue checks and hunt scans only ran once at startup, then the daemon spin-looped idle. Now repeats at configured intervals (issue_check_interval, hunt_scan_interval).
  • Cost/session tracking: state.total_cost, state.total_sessions, and state.issues_created were never updated, making cost limits ineffective. Sessions now report token usage back to the engine for cost calculation.
  • Session limit enforcement: orchestrator.max_sessions was never checked in the night-shift engine (61-REQ-9.3 compliance).
  • Develop checkout restoration: After fix sessions, the repo stayed on the fix branch. Now restores develop after harvest so the next issue starts clean.

Other fixes

  • Resolved 8 pre-existing test failures across CLI override handling, review-only graph construction, and worktree hardening property tests.
  • Fixed flaky parallel test execution caused by deprecated asyncio.get_event_loop() usage.
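
The release note does not say what replaced asyncio.get_event_loop(). One common replacement, shown here as an assumption and not as the project's actual change, is to let asyncio.run() own the loop and to call asyncio.get_running_loop() only from inside a coroutine. The function name is hypothetical.

```python
import asyncio


async def fetch_status() -> str:
    # get_running_loop() is only valid inside a running coroutine, so it
    # cannot pick up or create a stray loop shared between tests.
    loop = asyncio.get_running_loop()
    future = loop.create_future()
    loop.call_soon(future.set_result, "ok")
    return await future


# asyncio.run() creates a fresh event loop for this call and closes it after.
status = asyncio.run(fetch_status())
```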

Specs

  • Spec 79: Hunt scan cross-iteration deduplication
  • Spec 80: Worktree cleanup hardening
v2.7.2 Bug fix

Fixed flaky test failures by disabling hypothesis deadline.

Full changelog

What's Changed

  • fix: disable hypothesis deadline globally to eliminate flaky test failures
  • fix: disable hypothesis deadline on flaky property tests
  • chore: bump version to 2.7.2

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v2.7.1...v2.7.2

v2.7.1 Bug fix

Increased default session limits to prevent premature failures.

Full changelog

What's Changed

  • fix: correct staleness fallback when AI evaluation fails
  • fix: resolve four night-shift integration gaps (#226 #227 #228 #229)
  • fix: harvest branch and close issue after night-shift fix pipeline (#225)
  • fix: increase default session limits to prevent premature failures (#205)
  • fix: include enriched feature vector fields in StatisticalAssessor (#206)
  • docs+chore: integration gap analysis and strengthened af-spec template (#230)
  • style: auto-format code with ruff

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v2.7.0...v2.7.1

v2.7.0 New feature
Notable features
  • Implement watch loop core with `--watch` and `--watch-interval` flags, watch gate, stall detection (spec 70)
  • Add CachePolicy config and `cached_messages_create()` helper; migrate auxiliary modules to cached API (spec 77)
  • Make feature branches local‑only, push only develop (spec 78)
Full changelog

What's Changed

Features

  • feat(watch): implement watch loop core — --watch and --watch-interval CLI flags, watch gate, stall detection (spec 70)
  • feat(caching): add CachePolicy config and cached_messages_create() helper; migrate all auxiliary modules to cached API (spec 77)
  • feat(harvest): make feature branches local-only, push only develop (spec 78)
  • feat(fix): add FixProgressEvent/CheckEvent types, wire ProgressDisplay and callbacks (spec 76)
  • feat(engine): timeout-aware escalation with per-node retry logic (spec 75)
  • feat(engine): tolerant review parser with fuzzy wrapper key matching and field normalization (spec 74)
  • feat(engine): auto-reset blocked tasks on engine resume; clear attempt tracker on reset
  • feat(nightshift): AI critic for finding consolidation, batch triage, post-fix staleness check (spec 73)
  • feat(nightshift): reference parsing, dependency graph, and edge merging
  • feat(reporting): active tasks in status command (spec 72)
  • feat(platform): sort/direction params for list_issues_by_label
  • feat(ui): show agent archetype in spinner line
  • feat(config): timeout retry configuration fields in RoutingConfig

Fixes

  • fix(cli): enforce CLI separation by delegating to backing modules (fixes #210)
  • fix(barrier): run knowledge compaction during sync barriers (fixes #211)
  • fix(reporting): display agent archetype in status and standup output (fixes #216)
  • fix(retry): retry on network-level transport errors (fixes #208)
  • fix: handle SIGTERM gracefully and prune stale worktrees before branch deletion
  • fix(tests): reset agent_fox logger between tests to fix xdist flakiness
  • fix(tests): mock _setup_infrastructure in run_code tests to prevent MagicMock directory leak

Documentation

  • docs(watch): add --watch and --watch-interval to CLI reference
  • docs(caching): add [caching] section to configuration reference
  • docs(spec-78): update AGENTS.md for local-only feature branch workflow

Chores

  • Archived completed specs (59–76)
  • New specs: 77 (prompt caching), 78 (local-only feature branches)

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v2.6.2...v2.7.0

v2.6.2 Maintenance

Minor fixes and improvements.

Full changelog

What's Changed

  • refactor: simplify engine, platform factory, and package re-exports
  • specs: add spec 74 (review parse resilience), spec 75 (timeout-aware escalation), spec 76 (fix progress display)

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v2.6.1...v2.6.2

v2.6.1 New feature
Notable features
  • Simplified config template generation with visible sections and quality defaults (spec 68)
  • Round‑robin spec‑fair scheduling across specs (spec 69)
  • Watch mode added via watch_interval config field and WATCH_POLL audit event type (spec 70)
Full changelog

What's New in 2.6.1

  • Config simplification (spec 68): simplified config template generation with visible sections and quality defaults
  • Spec-fair scheduling (spec 69): round-robin scheduling across specs
  • Watch mode (spec 70): watch_interval config field and WATCH_POLL audit event type
  • Fix ordering (spec 71): spec and task ordering improvements
  • Status command: show active agents in status output
  • Develop sync fix: use update-ref instead of branch -f to avoid failures when develop is checked out in a worktree
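
The develop sync fix can be sketched as follows. Git refuses to force-move a branch with branch -f while that branch is checked out in a worktree, whereas update-ref writes the ref directly. The helper names and the default branch argument are hypothetical.

```python
import subprocess
from typing import Optional


def develop_sync_command(target_sha: str, branch: str = "develop") -> list[str]:
    """Build the git command that points the local branch at target_sha."""
    # update-ref operates on the full ref name, not the short branch name.
    return ["git", "update-ref", f"refs/heads/{branch}", target_sha]


def sync_develop(target_sha: str, repo_dir: Optional[str] = None) -> None:
    """Run the ref update; raises CalledProcessError if git fails."""
    subprocess.run(develop_sync_command(target_sha), cwd=repo_dir, check=True)
```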
v2.6.0 Breaking
Breaking changes
  • Removal of the Coordinator (spec 62) breaks existing workflows.
Notable features
  • Night-shift engine, CLI command, and audit events (spec 61)
  • Plan always-rebuild behavior (spec 63)
  • CLI separation and logging improvements (spec 59)
Full changelog

Release 2.6.0

Highlights:

  • Night-shift engine, CLI command, and audit events (spec 61)
  • Coordinator removal (spec 62)
  • Plan always-rebuild (spec 63)
  • CLI separation and logging improvements (spec 59)
  • End-of-run discovery (spec 60)
  • Steering document spec (spec 64)
  • Various archived specs (52–58) moved to archive
v2.5.2 Bug fix

Fixed session failures caused by identical main and fallback models and corrected CLI flag naming for extra_args.

Full changelog

What's Changed

Bug Fixes

  • fix(engine): skip fallback model when it equals the main model — The default fallback model (claude-sonnet-4-6) is the same as the STANDARD tier model used for coder sessions. The Claude CLI rejects --fallback-model when it matches the primary model, causing sessions to fail with "Fallback model cannot be the same as the main model." Now the fallback is omitted when it equals the session's model ID.

  • fix(session): use hyphenated CLI flag names for Claude SDK extra_args: --max_budget_usd and --fallback_model were passed with underscores instead of hyphens, causing "unknown option" errors when budget or fallback model were configured.
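
Both fixes can be illustrated with one small helper. This is a minimal sketch, not the project's code: the function name, its parameters, and the shape of the returned mapping are assumptions.

```python
from typing import Optional


def build_extra_args(
    model_id: str,
    fallback_model: Optional[str] = None,
    max_budget_usd: Optional[float] = None,
) -> dict[str, str]:
    """Build extra CLI arguments keyed by hyphenated flag names."""
    options = {"max_budget_usd": max_budget_usd, "fallback_model": fallback_model}
    extra_args: dict[str, str] = {}
    for name, value in options.items():
        if value is None:
            continue
        # Skip the fallback when it equals the session's own model, because
        # the CLI rejects a fallback that matches the primary model.
        if name == "fallback_model" and value == model_id:
            continue
        # Config field names use underscores; CLI flags use hyphens.
        extra_args[name.replace("_", "-")] = str(value)
    return extra_args
```

With model_id and fallback_model both set to "claude-sonnet-4-6", only the budget flag remains in the result.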

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v2.5.1...v2.5.2

v2.5.1 Bug fix

Fixed session CLI flag naming to use hyphens instead of underscores, correcting unknown option errors.

Full changelog

What's Changed

Bug Fixes

  • fix(session): use hyphenated CLI flag names for Claude SDK extra_args: --max_budget_usd and --fallback_model were passed with underscores instead of hyphens, causing "unknown option" errors when budget or fallback model were configured.
  • fix(engine): reset sub-task checkboxes during hard reset (fixes #163)
  • fix(workspace,platform): sanitize error messages to prevent path and API detail leakage (fixes #192)
  • fix(knowledge): escape markdown special characters in rendered output (fixes #193)
  • fix(cli): restrict config file and directory permissions to owner-only (fixes #191)
  • fix(core): validate LLM JSON responses with field-level constraints (fixes #186)
  • fix(knowledge): add SQL table allowlist for query safety (fixes #188)
  • fix(workspace): validate git ref names to prevent command injection (fixes #189)
  • fix(session): validate and cap review parser output sizes (fixes #187)
  • fix(core): add prompt content sanitization for injection defense (fixes #190)
  • fix(cli): add Claude CLI version and settings validation (fixes #185)

Refactoring

  • refactor: consolidate duplicated utilities and review parser logic — extracted shared audit helper, decomposed engine.py god-class, extracted SDK parameter resolution, inlined single-consumer blocking.py, split knowledge/query.py into focused modules. Net reduction of ~736 lines.

New Specs

  • Spec 59: CLI Separation and Logging Improvements
  • Spec 60: End-of-Run Discovery

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v2.5.0...v2.5.1

v2.5.0 Breaking risk
Breaking changes
  • Removed multi-backend abstraction; system now Claude‑only (spec 55).
  • Deleted entire `tools/` package including fox tools and MCP server.
Notable features
  • Knowledge feedback loop with automated extraction, causal context, fallback inputs, and embedding generation (spec 52)
  • Review persistence: structured parsing and DB storage of review findings across skeptic, verifier, oracle archetypes; retry context injection for coder retries; review‑only CLI mode (spec 53)
  • Quality gate complexity assessment with feature vector enrichment and heuristic evaluation (spec 54)
Full changelog

What's New in v2.5.0

New Features

  • Knowledge feedback loop (spec 52) — automated knowledge extraction with causal context, fallback inputs, and embedding generation
  • Review persistence (spec 53) — structured parsing and DB storage of review findings from skeptic, verifier, and oracle archetypes; retry context injection for coder retries; review-only CLI mode
  • Quality gate complexity assessment (spec 54) — quality gate execution with feature vector enrichment and heuristic assessment
  • SDK feature adoption (spec 56) — max_turns, max_budget_usd, fallback_model, and thinking configuration with hierarchical defaults and archetype overrides
  • Archetype model tiers (spec 57) — per-archetype default model tiers (ADVANCED for review archetypes, STANDARD for coder) with config overrides and ADVANCED escalation ceiling
  • Predecessor escalation (spec 58) — escalation ladder awareness in retry logic; predecessors only block after exhausting all ladder levels

Architecture

  • Claude-only commitment (spec 55) — removed multi-backend abstraction; simplified to Claude-exclusive backend with ADR documentation
  • Removed fox tools and MCP server — deleted the entire tools/ package (server, registry, edit, read, search, outline) in favor of Claude Code's native tooling
  • Code simplification — eliminated backward-compat shims, consolidated duplicated review parsing logic, extracted AssessmentManager and sync barrier sequence from Orchestrator, split long methods into focused units

Bug Fixes

  • Fixed missing severity normalization in engine review parser (accepted invalid severity values)
  • Fixed redundant archetype resolution in launch preparation
v2.4.6 Security relevant
Security fixes
  • Block shell metacharacters in command allowlist (fixes #178)
  • Harden spec name, improve error redaction and migration handling (fixes #179)
Full changelog

What's Changed

  • fix(security): block shell metacharacters in command allowlist (fixes #178) by @mickume in https://github.com/agent-fox-dev/agent-fox/pull/180
  • fix(security): harden spec name, merge lock, error redaction, migration dim (fixes #179) by @mickume in https://github.com/agent-fox-dev/agent-fox/pull/181

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v2.4.5...v2.4.6

v2.4.5 Breaking risk
Breaking changes
  • Removed functions: resolve_tier_ceiling, ensure_blocking_tables, create_circuit_breaker_event and unused tools/_file_io.py re-export shim.
  • Moved path constants to core/paths to break the circular dependency between cli and knowledge.
Full changelog

What's Changed

Refactoring & Simplification

  • Remove dead code: resolve_tier_ceiling, ensure_blocking_tables, create_circuit_breaker_event, unused tools/_file_io.py re-export shim
  • Consolidate duplicated insert/query logic in review_store.py via shared helpers (_insert_with_supersession, _query_active)
  • Consolidate three identical missing-section fixers via _append_missing_section
  • Consolidate hard_reset_all/hard_reset_task shared logic via _perform_hard_reset
  • Replace archetype if/elif dispatch with dict lookup in session_lifecycle.py
  • Unify coverage matrix and traceability table validators via _check_section_with_table
  • Extract _count_node_status helper in orchestrator engine
  • Break cli ↔ knowledge circular dependency by moving path constants to core/paths

Sync Barrier Hardening (spec 51)

  • Worktree verification and orphan detection
  • Bidirectional develop branch sync with merge lock
  • Hot-load gate pipeline (tracking → completeness → linting)
  • Parallel drain before barrier entry
  • Comprehensive test coverage (unit, property, integration)

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v2.4.4...v2.4.5

v2.4.4 Bug fix

Improved blocked task error messages with clearer root‑cause details.

Full changelog

What's Changed

Code Simplification & Refactoring

  • Centralized node ID parsing: New core/node_id.py module with parse_node_id() and spec_name_of(), replacing 8 scattered node_id.split(":") patterns across the codebase
  • Consolidated path constants: Expanded cli/paths.py with PLAN_PATH, STATE_PATH, MEMORY_PATH, AUDIT_DIR — replacing inline path construction
  • Merged tool utilities: Combined tools/hashing.py + tools/_file_io.py into tools/_utils.py
  • Merged routing data+persistence: Combined routing/types.py + routing/storage.py into routing/core.py
  • Extracted blocking logic: New engine/blocking.py (145 lines) — skeptic/oracle blocking evaluation is now a pure function with BlockDecision return type, reducing engine.py from 1622 to 1522 lines
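
A centralized node ID parser might look like the sketch below. The release notes name parse_node_id() and spec_name_of() but do not give their signatures or the node ID format. The "spec_name:suffix" layout, the return type, and the error handling here are all assumptions.

```python
from typing import NamedTuple, Optional


class ParsedNodeId(NamedTuple):
    spec_name: str
    suffix: Optional[str]  # part after the first ":", or None if absent


def parse_node_id(node_id: str) -> ParsedNodeId:
    """Split a node ID at the first ':' (assumed format: 'spec:suffix')."""
    spec_name, sep, suffix = node_id.partition(":")
    if not spec_name:
        raise ValueError(f"node ID has no spec name: {node_id!r}")
    return ParsedNodeId(spec_name, suffix if sep else None)


def spec_name_of(node_id: str) -> str:
    """Return only the spec-name part of a node ID."""
    return parse_node_id(node_id).spec_name
```

Keeping the split in one place means a change to the ID format touches one function instead of every call site.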

Bug Fixes (from v2.4.3)

  • Fixed log/spinner interference with LiveAwareHandler
  • Suppressed third-party warnings during orchestrator runs
  • Improved blocked task error messages with root cause
v2.4.3 Bug fix

Fixed blocked task errors to include the root cause of the last failed attempt.

Full changelog

What's Changed

  • Fix log/spinner interference: Log messages now route through Rich's Live console when the progress spinner is active, preventing corrupted display output
  • Suppress third-party warnings: HF Hub and sentence-transformers warnings no longer leak into the spinner display
  • Better blocked task messages: Blocked task errors now include the root cause from the last failed attempt
  • Status report accuracy: Fixed status filtering for archetype nodes with state overlay
v2.4.2 New feature
Notable features
  • `reset --spec <spec>`: resets all tasks for a single spec to pending, cleans worktrees/branches, and synchronizes `tasks.md` and `plan.json` without rolling back git or compacting knowledge
Full changelog

New Features

  • reset --spec <spec_name> — Spec-scoped reset command that resets all tasks (coder + archetype nodes) belonging to a single spec to pending, cleans worktrees/branches, and synchronizes tasks.md and plan.json. No git rollback or knowledge compaction — safe for re-executing one spec without affecting others. Mutually exclusive with --hard and positional <task_id>.

Bug Fixes (from v2.4.1)

  • plan/status: exclude archetype nodes from task counts — Injected archetype nodes were inflating totals; now only coder nodes are counted with review nodes shown separately.
  • status: honour tasks.md checkbox state — Status now seeds from graph statuses (reflecting [x] checkboxes) before overlaying orchestrator state.
  • plan: propagate completion to archetype nodes — Archetype nodes for completed specs no longer appear in the execution order.
  • git: prevent infinite hang on expired PAT: run_git() now sets GIT_TERMINAL_PROMPT=0 and enforces timeouts (60s/120s) to prevent credential prompt hangs.
v2.4.1 Bug fix

Fixed task count inflation by excluding archetype nodes and honored manual checkbox completions in status.

Full changelog

Bug Fixes

  • plan/status: exclude archetype nodes from task counts — Injected archetype nodes (skeptic, oracle, verifier, auditor) were counted alongside real task groups, inflating totals and making progress appear lower than actual. Plan and status now report only coder nodes in task counts, with review nodes shown separately.

  • status: honour tasks.md checkbox state — When state.jsonl existed, the status command ignored tasks.md [x] checkboxes for manually completed work. Now seeds from graph statuses first (reflecting checkboxes), then overlays orchestrator state.

  • plan: propagate completion to archetype nodes — Archetype nodes were always shown as pending in the execution order, even when all coder tasks in their spec were completed. Now marks them as completed when all coder nodes in their spec are done.

  • git: prevent infinite hang on expired PAT: run_git() had no timeout and did not suppress interactive credential prompts. When a PAT expired, git commands would hang indefinitely. Now sets GIT_TERMINAL_PROMPT=0 and enforces timeouts (60s default, 120s for remote operations).
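
The hang prevention described in the last item can be sketched as below. The environment variable and the 60s/120s values come from the release note. The helper names, the set of subcommands treated as remote, and the subprocess options are assumptions.

```python
import os
import subprocess
from typing import Mapping, Optional, Sequence

LOCAL_TIMEOUT_S = 60
REMOTE_TIMEOUT_S = 120
# Assumed classification of remote operations; the project's list may differ.
REMOTE_SUBCOMMANDS = {"fetch", "pull", "push", "clone", "ls-remote"}


def git_env(base: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Copy the environment and disable interactive credential prompts."""
    env = dict(os.environ if base is None else base)
    env["GIT_TERMINAL_PROMPT"] = "0"
    return env


def git_timeout(args: Sequence[str]) -> int:
    """Pick the longer timeout for subcommands that talk to a remote."""
    if args and args[0] in REMOTE_SUBCOMMANDS:
        return REMOTE_TIMEOUT_S
    return LOCAL_TIMEOUT_S


def run_git(args: Sequence[str], cwd: Optional[str] = None) -> subprocess.CompletedProcess:
    """Run git; raises subprocess.TimeoutExpired instead of hanging."""
    return subprocess.run(
        ["git", *args],
        cwd=cwd,
        env=git_env(),
        timeout=git_timeout(args),
        capture_output=True,
        text=True,
        check=False,
    )
```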

v2.4.0 Bug fix

Fixed single-class calibration crash and suppressed HuggingFace Hub auth warning.

Full changelog

What's Changed

Fixes

  • Suppress HuggingFace Hub auth warning and fix single-class calibration crash

Documentation

  • Slim down root README to a concise project hook with install and quick start
  • Move detailed documentation (archetypes, model routing, fox tools, spec-driven development) into docs/
  • Add docs/README.md as the central documentation index

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v2.3.3...v2.4.0

v2.3.3 New feature
⚠ Upgrade required
  • Add `orchestrator.max_blocked_fraction` to TOML configuration to enable early stop on high block rates.
  • Review and adjust `archetypes.skeptic_settings.block_threshold` and `archetypes.oracle_settings.block_threshold` as needed for desired blocking behavior.
Notable features
  • Skeptic & Oracle archetypes enforce blocking when critical findings exceed configurable `block_threshold` (default Skeptic = 3, Oracle advisory-only if omitted).
  • New orchestrator config `max_blocked_fraction` stops runs early if blocked nodes reach the set fraction (e.g., 0.4 for 40%).
Full changelog

What's New

Skeptic & Oracle Blocking Enforcement

Skeptic and oracle archetypes now enforce blocking decisions in the engine. When a skeptic or oracle session completes, the engine queries its persisted review findings, counts critical findings against the configured block_threshold, and cascade-blocks the downstream coder task (and all its dependents) when the threshold is exceeded.

  • Skeptic: blocks when critical findings exceed archetypes.skeptic_settings.block_threshold (default: 3)
  • Oracle: blocks when critical drift findings exceed archetypes.oracle_settings.block_threshold; remains advisory-only when threshold is omitted (the default)
  • Blocking decisions are recorded to the blocking_history table for threshold learning
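
The threshold rule above can be written as a pure function. This is a sketch under assumptions: the function name and the plain-string severity labels are hypothetical, and "exceed" is read as strictly greater than the threshold.

```python
from typing import Iterable, Optional


def should_block(severities: Iterable[str], block_threshold: Optional[int]) -> bool:
    """Return True when critical findings exceed the configured threshold."""
    # An omitted threshold means advisory-only: never block.
    if block_threshold is None:
        return False
    critical_count = sum(1 for severity in severities if severity == "critical")
    return critical_count > block_threshold
```

Under this reading, the default skeptic threshold of 3 blocks at four or more critical findings.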

Block Budget

New orchestrator.max_blocked_fraction config option (default: disabled). When set, the engine stops the run early if the fraction of blocked nodes reaches the configured threshold, preventing wasted cost on doomed sessions when a systemic failure blocks a significant portion of the task graph.

[orchestrator]
max_blocked_fraction = 0.4  # stop if 40%+ of tasks are blocked
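
The early-stop check implied by this option might look like the following sketch. The function name and arguments are hypothetical, and "reaches" is read as greater than or equal to the configured fraction.

```python
from typing import Optional


def block_budget_exceeded(
    blocked_nodes: int, total_nodes: int, max_blocked_fraction: Optional[float]
) -> bool:
    """Return True when the run should stop because too many nodes are blocked."""
    # The option is disabled by default; an empty graph can never trip it.
    if max_blocked_fraction is None or total_nodes == 0:
        return False
    return blocked_nodes / total_nodes >= max_blocked_fraction
```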

Other Changes

  • refactor: decompose engine.py and workspace.py god-modules into focused submodules (circuit.py, serial.py, injection.py, develop.py, git.py, worktree.py)
  • refactor: replace dict-based plan data with typed TaskGraph in engine
  • feat: add retry with backoff to all Anthropic API calls
  • fix: remove broken audit and serve-tools CLI commands (#174)
  • fix: update dump_knowledge.py to discover all DuckDB tables dynamically
  • refactor: consolidate duplicated API usage tracking and findings rendering
  • refactor: remove redundant DuckDB writes and double-sort
v2.3.2 Bug fix
Notable features
  • Introduced read_all_facts() with a 3-tier fallback strategy for resilient fact reading
Full changelog

What's Changed

Bug Fixes

  • Resilient fact reading with automatic fallback — Introduced read_all_facts() with a 3-tier fallback strategy (provided connection → read-only DuckDB → JSONL file), so reading facts always works regardless of DB availability.
  • Fixed empty docs/memory.md: render_summary() was called without a DuckDB connection in the engine, producing an empty file every time. Now uses the fallback pipeline and receives the active connection from the Orchestrator.
  • Simplified status command — Replaced manual DuckDB/JSONL fallback in agent-fox status with the unified read_all_facts() function.
v2.3.1 Bug fix

Fixed progress spinner to reflect operation type and removed untracked files blocking harvest merges.

Full changelog

Fixes

  • Progress spinner: Extract tool-use blocks from SDK AssistantMessage content so the spinner shows "Reading…", "Editing…", etc. instead of always "Thinking…"
  • Harvest merge: Remove untracked files that would block fast-forward merge during harvest
v2.3.0 New feature
Notable features
  • New `auditor` archetype (spec 46) validates test code against `test_spec.md` contracts, checking coverage, assertion strength, edge‑case rigor and independence; opt‑in via `[archetypes] auditor = true`.
  • `agent-fox init` scaffolds Claude Code skill files alongside project configuration (spec 47).
Full changelog

What's New

Test Auditor Archetype (spec 46)

  • New auditor archetype that validates test code against test_spec.md contracts before implementation begins
  • Checks coverage, assertion strength, precondition fidelity, edge case rigor, and test independence
  • auto_mid injection mode (after test-writing group, before implementation)
  • Conservative convergence (union semantics — worst verdict wins)
  • Retry-predecessor with configurable circuit breaker
  • Disabled by default, opt-in via [archetypes] auditor = true

Init Skills (spec 47)

  • agent-fox init now scaffolds Claude Code skill files alongside project config

Token Counting & Cache Pricing Fix

  • Fixed cache token tracking: cache_read_input_tokens and cache_creation_input_tokens now flow through the full pipeline (SDK → ResultMessage → SessionOutcome → audit events → status report)
  • Fixed audit event payload storing combined tokens instead of separate input_tokens/output_tokens
  • Fixed build_status_report_from_audit always reporting output_tokens = 0
  • Added cache pricing to ModelPricing (cache read at 10%, cache creation at 125% of input price)
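
The cache pricing rule (cache read at 10% and cache creation at 125% of the input price) works out as in this sketch. The field names on ModelPricing, the per-million-token units, and the example prices are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPricing:
    input_per_mtok: float   # USD per million input tokens (assumed unit)
    output_per_mtok: float  # USD per million output tokens (assumed unit)
    cache_read_multiplier: float = 0.10
    cache_creation_multiplier: float = 1.25


def session_cost(
    pricing: ModelPricing,
    input_tokens: int,
    output_tokens: int,
    cache_read_input_tokens: int = 0,
    cache_creation_input_tokens: int = 0,
) -> float:
    """Price each token class separately, then sum."""
    input_price = pricing.input_per_mtok / 1_000_000
    output_price = pricing.output_per_mtok / 1_000_000
    return (
        input_tokens * input_price
        + output_tokens * output_price
        + cache_read_input_tokens * input_price * pricing.cache_read_multiplier
        + cache_creation_input_tokens * input_price * pricing.cache_creation_multiplier
    )


# Made-up prices: 3.00 + 0.30 (cache read) + 3.75 (cache creation), about 7.05.
example_pricing = ModelPricing(input_per_mtok=3.0, output_per_mtok=15.0)
example_cost = session_cost(
    example_pricing,
    input_tokens=1_000_000,
    output_tokens=0,
    cache_read_input_tokens=1_000_000,
    cache_creation_input_tokens=1_000_000,
)
```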

Other Changes

  • Merge lock and agent fallback for harvest/workspace operations
  • Merge agent for AI-based conflict resolution
  • Various test and infrastructure improvements
v2.2.2 Bug fix

Fixed harvest checkout failure when untracked runtime files existed.

Full changelog

Bug Fix

  • Fix harvest checkout failure with untracked files: When agent-fox runtime files (.agent-fox/config.toml, .agent-fox/state.jsonl, .claude/settings.local.json, docs/memory.md) existed as untracked files in the working directory but were also tracked on the develop branch, git checkout develop during harvest would fail, blocking all subsequent tasks in the same spec. Fixed by using force checkout in the harvest step, which is safe because all coding work happens in an isolated worktree.
v2.2.1 Bug fix
Notable features
  • Structured finding persistence for skeptic/verifier/oracle sessions
Full changelog

What's Changed

Bug Fixes

  • fix: align __version__ with 2.2.0 — runtime version was still reporting 2.1.2 after the 2.2.0 release
  • fix: use numeric confidence in DuckDB ingestion — two INSERT statements used string 'high' for the confidence column (migrated to DOUBLE in v5), causing ConversionException during background knowledge ingestion
  • fix: resolve model tier to model ID for pricing lookups: NodeSessionRunner._resolved_model_id stored tier names (e.g. "ADVANCED") instead of model IDs (e.g. "claude-opus-4-6"), causing pricing config misses and zero-cost estimates

Features

  • feat: wire structured finding persistence for skeptic/verifier/oracle — the review parsers and DB insert functions existed but were never called from the session lifecycle; skeptic, verifier, and oracle sessions now persist their structured JSON output (findings, verdicts, drift reports) to DuckDB, enabling downstream context rendering for coders and blocking/convergence logic

Internal

  • Version bump to 2.2.1
v2.2.0 Breaking risk
Breaking changes
  • DuckDB is now a hard requirement; `open_knowledge_store()` raises RuntimeError instead of returning None.
  • Removed all Optional connection parameters from session lifecycle, knowledge harvest, memory store, context assembly, and routing.
Full changelog

What's New in v2.2.0

Predictive Planning & Knowledge (Spec 39)

  • Duration-based task ordering — ready tasks sorted by predicted duration (longest first) to minimize wall-clock time, with regression model, historical median, and configurable presets as fallback chain
  • Causal graph + review findings — review/drift/verification findings integrated into causal traversal for richer downstream context
  • Confidence-aware fact selection — facts below a configurable confidence threshold are excluded from session context
  • Pre-computed ranked facts — fact rankings cached at plan time for faster context assembly
  • Cross-group finding propagation — critical findings from earlier task groups visible to downstream groups under "Prior Group Findings"
  • Project model — aggregate spec outcomes, module stability scores, and archetype effectiveness via agent-fox status --model
  • Critical path forecasting — identifies the longest-duration path through the task graph with tied-path detection
  • File conflict detection — predicts file overlaps between parallel tasks and serializes conflicting pairs (opt-in)
  • Learned blocking thresholds — adapts skeptic/oracle block thresholds from historical precision (opt-in)

Confidence Normalization (Spec 37)

  • Unified confidence representation as float [0.0, 1.0] across memory, knowledge, and routing
  • parse_confidence() function handles string enum → float conversion with canonical mapping
  • DuckDB migration v5: TEXT → DOUBLE for confidence columns
  • JSONL backward compatibility preserved
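
A string-to-float confidence parser could look like the sketch below. The release notes do not list the canonical mapping, so the label values here are placeholders and not the project's numbers. Clamping and the handling of numeric strings are also assumptions.

```python
from typing import Union

# Placeholder values for illustration; the project's canonical mapping may differ.
ASSUMED_LEVELS = {"low": 0.3, "medium": 0.6, "high": 0.9}


def _clamp(value: float) -> float:
    """Force a value into the closed range [0.0, 1.0]."""
    return min(1.0, max(0.0, value))


def parse_confidence(value: Union[str, int, float]) -> float:
    """Normalize an enum label or a number to a float in [0.0, 1.0]."""
    if isinstance(value, bool):
        raise TypeError("confidence must be a number or a string")
    if isinstance(value, (int, float)):
        return _clamp(float(value))
    key = value.strip().lower()
    if key in ASSUMED_LEVELS:
        return ASSUMED_LEVELS[key]
    # Numeric strings are accepted; unknown labels raise ValueError here.
    return _clamp(float(key))
```

Accepting both legacy string labels and floats is one way old JSONL records could keep loading after the column type change.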

DuckDB Hardening (Spec 38)

  • DuckDB is now a hard requirement: open_knowledge_store() raises RuntimeError instead of returning None
  • Removed all Optional connection parameters from session lifecycle, knowledge harvest, memory store, context assembly, and routing
  • DuckDB errors propagate instead of being silently swallowed
  • Added knowledge_conn / knowledge_db test fixtures for isolated in-memory DuckDB

Other Changes

  • Hard reset (Spec 35) — agent-fox reset --hard with commit SHA tracking
  • Config generation (Spec 33) — agent-fox init generates config.toml from schema
  • Token tracking (Spec 34) — per-archetype and per-spec cost breakdowns in status
  • Oracle archetype (Spec 32) — drift detection agent with blocking logic
  • Prompt rewrites — oracle, librarian, cartographer, coordinator prompts rewritten to gold standard pattern
  • AGENTS.md rewrite — project-specific conventions documented
  • Harvest reconciliation (Spec 36) — post-harvest develop branch reconciliation
