No immediate action

v4.2.6 Breaking risk 21h

Removes redundant findings and audit

Review required

v4.2.5 New feature 6d

Auth RBAC

Cross‑spec checks + generate improvements

No immediate action

v4.2.4 Bug fix 8d

Nightshift harvest cleanup

No immediate action

v4.2.3 Bug fix 11d

Fixes ordering bug

No immediate action

v4.2.2 Bug fix 11d

Harvest deadlock fix

No immediate action

v4.2.1 Breaking risk 11d

--no-parallel + scoped context

No immediate action

v4.2.0 Maintenance 16d

Routine maintenance and dependency updates.

No immediate action

v4.1.19 Breaking risk 16d

Pre-flight fix + error structuring + deps mandate

No immediate action

v4.1.18 Bug fix 17d

Fix ProcessLookupError on kill

No immediate action

v4.1.17 Bug fix 17d

Fix streaming timeout

No immediate action

v4.1.16 New feature 17d

Skills installation and symlinks

No immediate action

v4.1.15 Bug fix 18d

Nightshift fixes

No immediate action

v4.1.14 Maintenance 18d

Routine maintenance and dependency updates.

No immediate action

v4.1.13 New feature 18d

af-issue skill

No immediate action

v4.1.12 New feature 19d

GoogleADKBackend + deepagents

Config change

v4.1.11 Breaking risk 19d

Breaking upgrade

Pre-flight merge + Curator removal

No immediate action

v4.1.10 Maintenance 19d

Routine maintenance and dependency updates.

No immediate action

v4.1.9 Mixed 19d

Nightshift summary + tier docs + bug fixes

Config change

v4.1.8 Breaking risk 19d

Breaking upgrade

Convergence detection, gate archetype, effort config

Upgrade now

v4.1.7 Bug fix 20d

Dispatch blockage fix

No immediate action

v4.1.6 Breaking risk 20d

--json refactoring

No immediate action

v4.1.5 Breaking risk 20d

File removals + fix

No immediate action

v4.1.4 New feature 20d

Cross-spec interface consistency

No immediate action

v4.1.3 Maintenance 22d

Routine maintenance and dependency updates.

Review required

v4.1.2 Mixed 22d

Auth RBAC

Post‑merge validation + drift‑review blocking

No immediate action

v4.1.1 Bug fix 23d

Fix DriftFinding sort error

No immediate action

v4.1.0 New feature 23d

Configurable session cap + af-audit skill

Config change

v4.0.3 Breaking risk 24d

Breaking upgrade

Local-only config loading

No immediate action

v4.0.2 Mixed 24d

Engine refactor + af spec fixes + bash installer

No immediate action

v4.0.1 Breaking risk 25d

Refactoring + Bug Fixes + Chores

No immediate action

v4.0.0 Breaking risk 1mo

nightshift CLI + spec v1.2 + knowledge overhaul

No immediate action

v3.7.2 Bug fix 1mo

Nightshift bug fixes

No immediate action

v3.7.1 Breaking risk 1mo

--debug removal

No immediate action

v3.7.0 Breaking risk 1mo

Nightshift mode change + lint‑specs flag removal

No immediate action

v3.6.5 Breaking risk 2mo

Dry-run test fix

No immediate action

v3.6.4 Breaking risk 2mo

Non‑retryable push failure fix

No immediate action

v3.6.3 New feature 2mo

--dry-run flag

No immediate action

v3.6.2 Breaking risk 2mo

--dry-run flag + fallback_model removal

v3.6.1 New feature 2mo

Notable features

_push_with_retry mechanism with error classification, audit events, and lock reentrancy for develop sync
Atomic push integrated into harvest and session lifecycle

Full changelog

What's Changed

Features

Atomic push with retry — new _push_with_retry mechanism with error classification, audit events, and lock reentrancy for develop sync (#121)
Wired atomic push into harvest and session lifecycle (#121)
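
The retry-with-classification idea can be illustrated with a minimal sketch. This is not the project's `_push_with_retry`; the marker strings, event names, and the `push` callable are assumptions made up for the example, since the notes do not spell out the actual classification rules.

```python
import time
from typing import Callable

# Hypothetical markers for errors that retrying cannot fix (assumption).
NON_RETRYABLE_MARKERS = ("permission denied", "protected branch")


def classify_push_error(stderr: str) -> str:
    """Classify a failed push as 'non_retryable' or 'retryable'."""
    text = stderr.lower()
    if any(marker in text for marker in NON_RETRYABLE_MARKERS):
        return "non_retryable"
    return "retryable"


def push_with_retry(
    push: Callable[[], tuple[bool, str]],
    max_attempts: int = 3,
    delay: float = 0.0,
) -> list[dict]:
    """Call push() until it succeeds, hits a non-retryable error,
    or runs out of attempts. Returns the audit events it recorded."""
    events: list[dict] = []
    for attempt in range(1, max_attempts + 1):
        ok, stderr = push()
        if ok:
            events.append({"event": "push.success", "attempt": attempt})
            return events
        kind = classify_push_error(stderr)
        events.append({"event": "push.failure", "attempt": attempt, "kind": kind})
        if kind == "non_retryable":
            break  # retrying would only repeat the same failure
        if attempt < max_attempts:
            time.sleep(delay)
    return events
```

Returning the event list keeps the sketch easy to inspect; a real implementation would more likely emit events to an audit sink.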

Bug Fixes

Findings reporting: replaced legacy archetype labels with current names (#591)
Insights command: added --dismiss flag for manual finding invalidation (#592)
Config: fixed retries_before_escalation config path and added deprecation warning (#589)
Fixed double push in fix_pipeline harvest flow (#121)

Tests

Added failing spec tests for atomic push with retry (#121)

v3.6.0 Breaking risk 2mo

Breaking changes

Removed `fix` command from CLI
Removed `--output` flag from `standup` command
Deprecated `[models]` section; moved `fallback_model` to `[routing]` and removed obsolete config options

Security fixes

Added transport‑level DNS re‑validation in nightshift to close SSRF TOCTOU (#580)
Sanitized exception content in fix session failure comment (#583)
Added path containment check before file deletion (#579)
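
A path containment check of the kind described in #579 might look like the sketch below. The function names and the choice to resolve symlinks before comparing are assumptions for illustration, not the project's code.

```python
from pathlib import Path


def is_contained(target: Path, root: Path) -> bool:
    """True if target resolves to a location strictly inside root."""
    resolved_root = root.resolve()
    resolved_target = target.resolve()  # follows symlinks and collapses ".."
    return resolved_root in resolved_target.parents


def delete_if_contained(target: Path, root: Path) -> bool:
    """Delete target only when it lives inside root; report whether it did."""
    if not is_contained(target, root):
        return False
    target.unlink(missing_ok=True)
    return True
```

Resolving both paths first means `root/../other.txt` and symlinks that point outside the root are rejected rather than deleted.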

Full changelog

What's Changed

Refactoring

engine: Consolidate 6 parallel tracking dicts in SessionResultHandler into a single _NodeRetryState dataclass
engine: Inline assessment.py (single-consumer module) into engine.py
Remove dead code: _estimate_tokens, _table_exists, _column_exists
Add run_git_sync() to workspace/git.py and migrate sync callers
Extract _resolve_github_remote() helper in nightshift/platform_factory.py
Deduplicate comment formatting in nightshift/fix_pipeline.py

Bug Fixes

nightshift: Add transport-level DNS re-validation to close SSRF TOCTOU (#580)
nightshift: Sanitize exception in fix session failure comment (#583)
nightshift: Add path containment check before file deletion (#579)
nightshift: Reject path traversal in archetype/mode/name parameters (#585)
nightshift: Add symlink checks to profiles.py and analyzer.py (#586)
nightshift: Replace shutil.rmtree with _safe_rmtree to avoid symlink traversal (#587)
nightshift: Reject reserved, multicast, and unspecified addresses in SSRF check (#581)
nightshift: Deprecate [models] section, move fallback_model to [routing] (#577)
nightshift: Remove fix command from CLI (#575)
nightshift: Remove --output flag from standup command (#573)
nightshift: Remove obsolete config options and dead code (#574)
nightshift: Apply ruff format to barrier.py and test_barrier.py (#588)
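
For the SSRF address check (#581), a rough sketch using the standard `ipaddress` module is shown below. The function name is hypothetical, and rejecting private, loopback, and link-local ranges in addition to the reserved, multicast, and unspecified ranges named above is an assumption. The transport-level DNS re-validation from #580 is not covered here.

```python
import ipaddress


def is_forbidden_address(host_ip: str) -> bool:
    """Return True for addresses an outbound fetch should refuse."""
    addr = ipaddress.ip_address(host_ip)
    return (
        addr.is_reserved
        or addr.is_multicast
        or addr.is_unspecified
        # The ranges below are an assumption; the notes only name the three above.
        or addr.is_private
        or addr.is_loopback
        or addr.is_link_local
    )
```

To avoid a time-of-check/time-of-use gap, a check like this would need to run against the address the connection actually uses, not only against an earlier DNS lookup.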

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.5.6...v3.6.0

v3.5.6 Bug fix 2mo

Fixed nightshift audit‑review blockage by filtering future‑group findings.

Full changelog

What's Changed

Bug Fixes

nightshift: Filter deferred-to-future-group findings from audit-review blocking (#572)
nightshift: Collapse ternary expression to satisfy ruff format (#571)

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.5.5...v3.5.6

v3.5.5 Bug fix 2mo

Audit-reviewer now grades test design quality instead of execution results, fixing misclassification.

Full changelog

What's Changed

Bug Fixes

Wire audit_max_retries into audit-review retry logic (#567, #569)
- ReviewerConfig.audit_max_retries was defined but never read by the retry logic. Added a dedicated per-coder-node counter (_audit_retry_counts) so audit-review retries are tracked independently of the generic EscalationLadder, preventing infinite retry loops.
Audit-reviewer grades test design quality, not execution results (#568, #570)
- The audit-review profile was conflating test pass/fail status with test design quality, marking well-designed tests as WEAK when they failed due to unimplemented upstream specs. Updated the template to grade design quality only — a correctly designed test that cannot pass yet is PASS, not WEAK. Added anti-pattern examples and multi-spec dependency guidance.
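
The per-coder-node retry counter from #567/#569 can be pictured as follows. Only `audit_max_retries` and `_audit_retry_counts` come from the notes; the class and method names are hypothetical.

```python
from collections import defaultdict


class AuditRetryTracker:
    """Tracks audit-review retries per coder node, separately from
    any generic escalation logic."""

    def __init__(self, audit_max_retries: int) -> None:
        self.audit_max_retries = audit_max_retries
        self._audit_retry_counts: defaultdict[str, int] = defaultdict(int)

    def should_retry(self, coder_node_id: str) -> bool:
        """Consume one retry for the node, or refuse once the cap is reached."""
        if self._audit_retry_counts[coder_node_id] >= self.audit_max_retries:
            return False
        self._audit_retry_counts[coder_node_id] += 1
        return True
```

Because each node has its own counter and a hard cap, one node exhausting its retries cannot loop forever and does not affect the others.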

Documentation

Updated CLI and config reference to match current codebase

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.5.4...v3.5.5

v3.5.4 Bug fix 2mo

Fixed four bugs that prevented stored knowledge from being retrieved in downstream sessions.

Full changelog

Knowledge Retrieval Fixes (Spec 120)

Fixes four bugs in the knowledge system's read side that caused stored knowledge to never reach downstream sessions.

Bug Fixes

Wire run_id to FoxKnowledgeProvider — _run_id was initialized to None and never set by the engine, causing _query_same_spec_summaries() and _query_cross_spec_summaries() to always return empty. Session summaries were stored but never retrieved. Every fox_provider log line showed 0 context + 0 cross-spec items.
Elevate pre-review findings to tracked context — Group 0 (skeptic pre-review) findings were served as untracked [CROSS-GROUP] items instead of tracked [REVIEW] items. They are now included in primary review results, tracked in finding_injections, and properly superseded on session completion.
All-archetype summary storage — Only coder sessions produced summaries. Reviewer and verifier sessions now generate structured summaries (finding counts, pass/fail ratios) that are stored and served to downstream sessions.
Cross-run finding carry-forward — Active findings orphaned by stalled runs are now surfaced as [PRIOR-RUN] context items at the start of a new run, capped at 5 per spec.

Stats

4658 tests passing
26 files changed, +3944 / -53 lines

v3.5.3 New feature 2mo

Notable features

Git stack hardening: workspace health checks, force-clean option, non-retryable error classification, pre-session guards, run lifecycle management, idempotent cascade blocking, and develop sync audit trail

Full changelog

What's Changed

Features

Session summary storage (spec 119): Session summaries are now stored, retrieved, and integrated into the knowledge provider and lifecycle audit events
Git stack hardening (spec 118): Workspace health checks, force-clean option, non-retryable error classification, pre-session workspace guards, run lifecycle management, idempotent cascade blocking, and develop sync audit trail

Fixes

Night shift: Raised stale_timeout default to 3600s and added heartbeat (#561)

Refactoring

Eliminated single-file routing/ and security/ packages — moved to core/
Merged knowledge/provider.py protocol into knowledge/fox_provider.py

Stats

4621 tests passing
97 files changed across the release

v3.5.2 New feature 2mo

Notable features

Cross‑group knowledge retrieval with [CROSS-GROUP] prefix, limited by max_cross_group_items (default 3), ranked by relevance and excluded from injection tracking

Full changelog

What's Changed

Features

#559: Add cross-group knowledge retrieval — sessions now see findings and FAIL verdicts from other task groups in the same spec via [CROSS-GROUP] prefix, capped at max_cross_group_items (default 3), ranked by keyword relevance, and excluded from injection tracking
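
As a rough illustration of the capping and ranking described in #559, the sketch below scores findings by word overlap with the task description. The scoring rule and the function name are assumptions; the notes say only that items are ranked by keyword relevance.

```python
def select_cross_group(
    findings: list[str],
    task_description: str,
    max_cross_group_items: int = 3,
) -> list[str]:
    """Pick the most relevant findings from other task groups and tag them."""
    keywords = set(task_description.lower().split())

    def overlap(finding: str) -> int:
        # Relevance = number of task keywords that appear in the finding.
        return len(keywords & set(finding.lower().split()))

    ranked = sorted(findings, key=overlap, reverse=True)  # stable for ties
    return [f"[CROSS-GROUP] {f}" for f in ranked[:max_cross_group_items]]
```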

Fixes

__init__.py: Sync __version__ with pyproject.toml (was stuck at 3.5.0)

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.5.1...v3.5.2

v3.5.1 Bug fix 2mo

Fixed knowledge retrieval to surface FAIL verdicts from verification results.

Full changelog

What's Changed

Bug Fixes

#553: Collapse list comprehension to single line for ruff format
#554: Gate audit-review on active findings to trigger coder retry
#555: Surface FAIL verdicts from verification_results in knowledge retrieval
#556: Filter knowledge findings by task_group to avoid redundant injection
#557: Rank findings and verdicts by task_description keyword overlap
#558: Track injected findings per session to prevent re-injection after successful completion

Docs

Rewrite architecture.md to reflect current codebase

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.5.0...v3.5.1

v3.5.0 Maintenance 3mo

Routine maintenance release for agent-fox.

Changelog

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.4.1...v3.5.0

v3.4.1 Bug fix 3mo

Notable features

ADR ingestion pipeline (MADR parser, validator, DuckDB migration v22 for adr_entries table)

Full changelog

What's Changed

Bug Fixes

fix(#547): Add errata markdown-to-DuckDB indexing path — errata files in docs/errata/ are now indexed into the DuckDB errata table, closing the write-only gap where errata were created but never retrievable
fix(#548): Fix audit-review task_group partitioning causing supersession silos — audit findings no longer use a hardcoded empty task_group, enabling proper supersession across review modes
fix(#549): Move steering.md from .agent-fox/specs/ to .agent-fox/
fix(#546): Fix ruff format violations in harvest warning strings
fix(#545): Serialize AuditJsonlSink writes with threading.Lock to prevent interleaved concurrent appends
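
The fix in #545 relies on a standard pattern: hold a `threading.Lock` around each append so that lines from concurrent writers cannot interleave. The sketch below shows that pattern with a hypothetical class, not the actual `AuditJsonlSink`.

```python
import json
import threading
from pathlib import Path


class LockedJsonlSink:
    """Appends one JSON object per line, serialized by a lock."""

    def __init__(self, path: str) -> None:
        self._path = Path(path)
        self._lock = threading.Lock()

    def write(self, event: dict) -> None:
        line = json.dumps(event) + "\n"  # serialize outside the lock
        with self._lock:  # only one thread appends at a time
            with self._path.open("a", encoding="utf-8") as fh:
                fh.write(line)
```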

Features

feat: ADR ingestion pipeline (spec 117) — MADR parser, validator, DuckDB migration v22 for adr_entries table, and integration into FoxKnowledgeProvider for retrieval during coder sessions

Documentation

docs: ADR 07 — Define audit JSONL event format (envelope schema + complete event type catalog)
docs: Code quality audit (specs 7–9)
docs: Parking service 3.4.0 audit

Chores

Bump version to 3.4.1

v3.4.0 Breaking risk 3mo

Breaking changes

Renamed CLI command `findings` → `insights`

Full changelog

What's Changed

Fixes

#543: Drop dead knowledge system columns retrieval_summary and coverage_data
#542: Fix ruff format violation in warning string
#541: Fix ruff format violation in list comprehension
#539: Add quick-triage bail-out to coder prompt
#537: Rename CLI command findings to insights
#536: Add AC-4 test and fix ruff format violation
#534: Add AC-3 integration test for verifier dispatch without phantom task group

Chores

Upgrade dependency version pins (pydantic, rich, duckdb, sentence-transformers, scikit-learn, pathspec, tree-sitter, pytest, ruff, mypy, and more)
Update auto-generated errata dates

v3.3.1 New feature 3mo

Notable features

Test coverage regression gate blocks tasks on decreased per-file line coverage
Multi-language coverage tool detection: pytest-cov (Python), cargo-tarpaulin (Rust), go test -cover (Go)
Coverage data stored in session outcomes for trend tracking (migration v20)

Full changelog

What's New

Features

Test coverage regression gate — measures per-file line coverage before and after coder sessions; blocks the task if coverage decreases on modified files (#520)
- Multi-language coverage tool detection: pytest-cov (Python), cargo-tarpaulin (Rust), go test -cover (Go)
- Coverage data stored in session outcomes for trend tracking (migration v20)
- Blocking findings emitted via review_findings table on regression
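
The gate's core comparison can be sketched as below, assuming coverage is available as a mapping from file path to line-coverage percentage. The function name, the data shape, and the choice to skip files with no prior measurement are illustrative assumptions.

```python
def coverage_regressions(
    before: dict[str, float],
    after: dict[str, float],
    modified: set[str],
) -> dict[str, tuple[float, float]]:
    """Return {path: (before, after)} for modified files whose coverage fell."""
    regressions: dict[str, tuple[float, float]] = {}
    for path in sorted(modified):
        if path not in before or path not in after:
            continue  # no baseline or no new measurement: nothing to compare
        if after[path] < before[path]:
            regressions[path] = (before[path], after[path])
    return regressions
```

A non-empty result corresponds to the "block the task" case. Unmodified files are ignored even if their coverage moved.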

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.3.0...v3.3.1

v3.3.0 New feature 3mo

Notable features

Structured verification checklist for spec compliance (#521)
State transition validation in GraphSync to catch illegal graph moves (#523)
Eager pre-review with retry on predecessor failure restored (#519)

Full changelog

What's New

Features

Verification checklist & task completion enforcement — structured verification checklist for spec compliance (#521)
State transition validation in GraphSync — validates engine state transitions to catch illegal graph moves (#523)
Eager pre-review with retry-predecessor — restores eager pre-review behavior with retry on predecessor failure (#519)
Lightweight errata generation from blocking — reinstates errata generation when issues are blocked (#522)
Knowledge system pruning — migration v18 removes causal links and dead knowledge modules (spec 116)

Bug Fixes

Fix max_items in property test to avoid retrieval cap masking failures
Use Path-typed specs_path variable in plan_cmd (#516)
Fix ruff format violation in RuntimeError f-string (#515)
Add proper type annotations for embedder and backend variables (#514)

Refactoring

Extract strategy classes from engine, fix_pipeline, and result_handler (#518)
Inline single-consumer modules and deduplicate review parser
Remove dead code and consolidate single-consumer modules (2 passes)
Remove dead code and consolidate near-identical abstractions
Delete dead knowledge modules (blocking_history, errata_store, gotcha_extraction, gotcha_store) and simplify provider

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.2.0...v3.3.0

v3.2.0 Breaking risk 3mo

Breaking changes

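Removed 40+ legacy knowledge pipeline modules (language analyzers, retrieval, consolidation, embeddings, etc.) as part of decoupling the knowledge subsystem behind the `KnowledgeProvider` protocol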
Removed obsolete knowledge pipeline configuration options
Removed onboard CLI command and legacy nightshift streams

Notable features

Decouple knowledge subsystem via `KnowledgeProvider` protocol (spec 114)
Pluggable knowledge provider with gotcha extraction, errata store, and content hashing (spec 115)
Wire `FoxKnowledgeProvider` into engine startup

Full changelog

What's Changed

Features

knowledge: Decouple knowledge subsystem via KnowledgeProvider protocol (spec 114)
knowledge: Pluggable knowledge provider with gotcha extraction, errata store, and content hashing (spec 115)
engine: Wire FoxKnowledgeProvider into engine startup

Refactors

knowledge: Delete 40+ legacy knowledge pipeline modules (lang analyzers, retrieval, consolidation, embeddings, etc.)
config: Remove obsolete knowledge pipeline configuration options
cli: Remove onboard command and legacy nightshift streams

Chores

Supersede specs 112 (sleep time compute) and 113 (knowledge effectiveness)
Fix Unicode edge case in content hash determinism property test
Clean up leftover __pycache__ directories in deleted knowledge subdirectories

v3.1.4 Bug fix 3mo

Notable features

Pre-flight check to skip coder sessions when work is already done

Full changelog

What's Changed

Bug Fixes

engine: Close AsyncAnthropic clients to prevent event loop shutdown crash (fixes #506)
engine: Skip redundant cleanup ingestion when barrier already ran (fixes #505)
knowledge: Always write agent trace JSONL for transcript reconstruction (fixes #507)
Guard trace reconstruction behind debug flag to suppress spurious warning

Features

engine: Add pre-flight check to skip coder sessions when work is done (fixes #511)

Other

New specs 114 (knowledge decoupling), 115 (pluggable knowledge)
Coding-session architecture documentation
General cleanup

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.1.3...v3.1.4

v3.1.3 Bug fix 3mo

Notable features

Knowledge system effectiveness improvements (spec 113) covering transcript reconstruction, compaction, entity signals, cold-start handling, git extraction, audit consumption, retrieval validation, and prompt injection

Full changelog

What's Changed

Bug Fixes

Budget exhaustion detection: Sessions that hit the SDK max-budget-usd limit are now detected and blocked immediately instead of being wastefully retried. The SDK returns is_error=True with no message on budget exhaustion — previously mapped to "Unknown error" and retried through the escalation ladder.
AssessmentManager config: Pass full_config (not OrchestratorConfig) to AssessmentManager, fixing missing attribute errors.
Escalation ladder starting tier: The escalation ladder now respects config.models.coding for the starting tier instead of always defaulting to STANDARD.
Timed-out session metrics: Emit descriptive error messages and metrics for sessions that time out.

Features

Knowledge system effectiveness (spec 113): Transcript reconstruction, compaction improvements, entity signal activation, cold-start handling, git extraction, audit consumption, retrieval quality validation, and audit prompt injection.

Other

Parking service audit report
Session budget increased for lengthy tasks

v3.1.2 Bug fix 3mo

⚠ Upgrade required

If a run is stuck with audit-review tasks blocked by "Retry limit exceeded", clear the stale state with `agent-fox reset --spec <affected_spec_name>`.

Full changelog

Bug Fixes

engine: Move review concurrency cap before _prepare_launch to prevent phantom retry exhaustion (fixes #503)

The review concurrency cap in _fill_parallel_pool was checked after _prepare_launch(), which increments the attempt tracker on "allowed" verdicts. When the single review slot was occupied, audit-review tasks were skipped but their attempt counter was already incremented. After max_retries + 1 (default 3) such pool-refill cycles, the circuit breaker permanently blocked the task with "Retry limit exceeded" — without ever starting a session. This cascade-blocked all downstream coding and verifier tasks, exceeding the block budget and halting the entire run.
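
The ordering fix can be shown with a simplified model of the pool-refill loop. Everything here (the function name, the task dictionaries, the `attempts` map) is a hypothetical stand-in for `_fill_parallel_pool` and `_prepare_launch`. The point is only that the cap is checked before the attempt counter is touched.

```python
def fill_pool(
    ready: list[dict],
    running_reviews: int,
    review_cap: int,
    attempts: dict[str, int],
) -> list[str]:
    """Launch ready tasks, counting an attempt only for tasks that start."""
    launched: list[str] = []
    for task in ready:
        # Check the cap FIRST: a skipped review must not consume an attempt.
        if task["is_review"] and running_reviews >= review_cap:
            continue
        attempts[task["id"]] = attempts.get(task["id"], 0) + 1
        if task["is_review"]:
            running_reviews += 1
        launched.append(task["id"])
    return launched
```

With the check placed after the increment, as in the bug, a review task skipped on every refill cycle would exhaust its retries without ever running.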

Recovery for affected runs

If you have a stuck run with audit-review tasks blocked by "Retry limit exceeded", clear the stale state:

agent-fox reset --spec <affected_spec_name>

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.1.1...v3.1.2

v3.1.1 Bug fix 3mo

Fixed death-loop caused by stale session‑scoped DB rows after reset.

Full changelog

Bug Fixes

reset: clear session-scoped tables on reset to prevent block_limit death-loop (#501)

After a block_limit run, reset --hard (and soft reset) left stale data in six session-scoped DB tables (runs, session_outcomes, review_findings, verification_results, drift_findings, blocking_history). The stale runs.status='block_limit' caused load_state_from_db() to load a terminal status, making the engine loop exit immediately on every subsequent agent-fox code invocation — a self-perpetuating death-loop with no CLI recovery path.

All four reset paths (reset_all, reset_task, reset_spec, hard_reset_all/hard_reset_task) now clear session-scoped tables so that plan and code start from a clean state.
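
A minimal sketch of the clearing step follows. It uses the standard-library `sqlite3` module as a stand-in so the example runs without extra dependencies; agent-fox itself stores this state in DuckDB, and the function name is hypothetical. The table names are the six listed above.

```python
import sqlite3

# The six session-scoped tables named in the release notes.
SESSION_SCOPED_TABLES = (
    "runs",
    "session_outcomes",
    "review_findings",
    "verification_results",
    "drift_findings",
    "blocking_history",
)


def clear_session_scoped_tables(conn: sqlite3.Connection) -> None:
    """Delete all rows from every session-scoped table."""
    for table in SESSION_SCOPED_TABLES:
        # Table names come from the fixed tuple above, never from user input.
        conn.execute(f"DELETE FROM {table}")
    conn.commit()
```

Once the `runs` table is empty, no stale `block_limit` status remains for the next invocation to load.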

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.1.0...v3.1.1

v3.1.0 New feature 3mo

Notable features

Core protocol, orchestrator, and configuration schema for sleep-time tasks
ContextRewriter task that rewrites and enriches knowledge context during idle periods
BundleBuilder task that consolidates knowledge into bundles

Full changelog

What's New

Sleep-Time Compute (Spec 112)

A new knowledge-processing pipeline that runs background computation during idle periods:

Core protocol & orchestrator — schema, configuration, and orchestration layer for sleep-time tasks
ContextRewriter — sleep task that rewrites and enriches knowledge context
BundleBuilder — sleep task that builds consolidated knowledge bundles
Retriever & integration wiring — retrieval layer with full integration into the existing knowledge system
Wiring verification — end-to-end verification of the sleep-time compute pipeline

Full Changelog

feat(112): implement core protocol, orchestrator, config, and schema
feat(112): implement ContextRewriter sleep task
feat(112): implement retriever and integration wiring
test(112): failing spec tests, checkpoint, and wiring verification

v3.0.5 Bug fix 3mo

Notable features

--specs-dir flag added to plan and night-shift commands
Progress spinner added to onboard command

Full changelog

What's Changed

Bug Fixes

nightshift: exclude .agent-fox/ from onboard file scanning (#499)
nightshift: add --specs-dir flag to plan and night-shift commands (#498)
nightshift: add progress spinner to onboard command (#497)

Other

Updated config

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.0.4...v3.0.5

v3.0.4 Bug fix 3mo

Fixed triage task prompt causing parse failures.

Full changelog

What's Changed

fix(nightshift): Triage agent now receives a triage-specific task prompt instead of the coder's "Fix the issue" prompt. This was the root cause of all triage parse failures — the agent would implement the fix instead of producing a JSON triage report.
fix(tests): Knowledge wiring tests no longer leak .specs/ directories into the working tree.

v3.0.3 Breaking risk 3mo

Breaking changes

Removed deprecated `extract_spec_name` wrapper
Deleted backward-compatibility re-export shims: `session/archetypes`, `nightshift/config`, `knowledge/query`

Notable features

Three-tier priority scheduling places coders before reviews for better throughput
Deferred review injection lazily promotes review nodes when slots are idle
Review concurrency cap limits parallel pool size

Full changelog

What's Changed

Features

Three-tier priority scheduling — coders scheduled before reviews for better throughput (#490)
Deferred review injection — lazy promotion of review nodes when slots are idle (#491)
Review concurrency cap in parallel pool (#489)

Performance

Pre-review scheduling optimization for critical-path specs (#476)
Skip LLM extraction for reviewer archetypes and short transcripts (#475)

Bug Fixes

Cascade blocking through in_progress nodes to prevent downstream dispatch (#481)
Use datetime.now(UTC) for run timestamps (#480)
Remove duplicate harvest.complete emission (#482)
Populate commit_sha in git.merge audit events (#484)
Classify review findings into multiple categories (#485)
Generate embeddings for consolidated and pattern facts
Correct return type annotation in _sort_key
Drop dead tables and remove orphaned code (#460)
Annotate test-file errors and add false-positive guidance to hunt prompts (#493)

Refactoring

Consolidate file-only language analyzers (HTML, JSON, regex) into SimpleAnalyzer base class
Replace 17 try/except blocks in language registry with data-driven loop
Merge WorkStream protocol into streams.py
Remove deprecated extract_spec_name wrapper
Delete backward-compatibility re-export shims (session/archetypes, nightshift/config, knowledge/query)
Update ~75 import sites to canonical module paths

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.0.2...v3.0.3

v3.0.2 Breaking risk 3mo

Fixed duplicate `session_outcomes` rows and added UUID validation to causal link parsing, so that one malformed ID no longer drops every causal link for a session.

Full changelog

Bug Fixes

fix(engine): Remove duplicate session_outcomes rows — every session was writing two DB entries, one with incomplete data (cost=0, model=NULL). The redundant sink-based insertion path has been removed; session outcomes are now written exclusively by SessionResultHandler.process(). (#473)
fix(knowledge): Validate UUIDs in causal link parsing — the LLM sometimes returned truncated UUIDs or git SHAs instead of valid fact UUIDs, causing ConversionException in DuckDB and silently dropping all causal links for the session. parse_causal_links() now validates UUID format before returning. (#474)
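
UUID validation of the kind described in #474 can be done with the standard `uuid` module. The sketch below is an assumption about how such a filter might look, not the body of `parse_causal_links()`; the helper names and the pair-of-IDs shape are made up for illustration.

```python
import uuid


def is_valid_uuid(value: str) -> bool:
    """True only for canonical hyphenated UUID strings."""
    try:
        # Comparing against the canonical form rejects bare 32-hex strings,
        # braces, and urn: prefixes that uuid.UUID would otherwise accept.
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


def filter_causal_links(links: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep only links whose two endpoints are both valid UUIDs."""
    return [
        (cause, effect)
        for cause, effect in links
        if is_valid_uuid(cause) and is_valid_uuid(effect)
    ]
```

Truncated UUIDs and 40-character git SHAs both fail the check, so they are dropped before they reach the database.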

Other

Updated README

v3.0.1 Bug fix 3mo

Notable features

Updated default config.toml template for v3 in init

Full changelog

What's Changed

Bug Fixes

nightshift: Eliminate contradictory 'skipping'/'Applied' log messages for migrations v5 and v10
nightshift: Leave issue open when coder produces no commits (#466)
nightshift: Prevent re-processing of closed issues in drain loop (#465)
nightshift: Increment scan counter in _run_issue_check (#469)
nightshift: Propagate fix_run_id from engine into process_issue (#468)
nightshift: Populate runs and session_outcomes from fix pipeline (#467)
nightshift: Add run_id to empty-body rejection comment and GitHub issue comments (#464)
nightshift: Remove obsolete memory.jsonl creation from init (#461)
nightshift: Move rev-list checks outside merge lock in _sync_develop_with_remote (#458)
nightshift: Pass project root (not .agent-fox dir) as repo_root to barrier (#454)
nightshift: Clean up stale running runs on orchestrator startup (#456)
nightshift: Stop per-file row explosion in session_outcomes (#457)
nightshift: Add AC-5 test for record_tool_error sink failure resilience (#459)
harvest: Pass embedder to extract_and_store_knowledge (#453)
harvest: Use feature branch commit message for squash merges

Improvements

init: Update default config.toml template for v3
refactor: Rename agent_base.md profile to agent.md

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.0.0rc4...v3.0.1

v3.0.0 Breaking risk 3mo

⚠ Upgrade required

Legacy configuration keys for Skeptic, Oracle, and Auditor archetypes are automatically migrated but emit deprecation warnings.
Projects using the default `.specs/` directory will see a deprecation warning; update `[paths] spec_root` in `config.toml` to the new location.

Breaking changes

Removed Skeptic, Oracle, and Auditor archetypes; replaced with unified Reviewer archetype having modes `pre-review`, `drift-review`, `audit-review`, and `fix-review`.
Plan state storage moved from `plan.json` to DuckDB tables (`plan_meta`, `plan_nodes`).
Spec root directory is now configurable via `[paths] spec_root` in `config.toml`; projects using `.specs/` receive deprecation warnings.

Notable features

Unified retriever with weighted Reciprocal Rank Fusion (RRF) combining keyword, vector, entity graph, and causal chain signals.
First‑class customizable markdown agent profiles located in `.agent-fox/profiles/` with mode‑specific resolution; `agent-fox init --profiles` installs defaults.

Full changelog

agent-fox v3.0.0

The first stable release of agent-fox v3. This release completes the transition
from the v2 architecture to a consolidated, mode-based archetype system with
DuckDB-backed state management, adaptive knowledge retrieval, and comprehensive
documentation.

Highlights

Archetype Consolidation

The former Skeptic, Oracle, and Auditor archetypes are now unified into a
single Reviewer archetype with four modes: pre-review, drift-review,
audit-review, and fix-review. Legacy configuration keys are automatically
migrated with deprecation warnings. The archetype registry now contains four
entries: Coder, Reviewer, Verifier, and Maintainer.

Adaptive Knowledge Retrieval

A new unified retriever fuses four signals — keyword, vector, entity graph,
and causal chain — via weighted Reciprocal Rank Fusion (RRF). Intent profiles
adjust signal weights per archetype and task status. Salience-based token
budgeting ensures the most relevant facts get full detail while staying within
context limits.
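
Weighted Reciprocal Rank Fusion itself is a small computation: each signal contributes `weight / (k + rank)` for every item it ranks, and items are ordered by the summed score. The sketch below illustrates the technique only. The function name, the signal weights, and `k = 60` (a commonly used default for RRF) are assumptions, not values taken from agent-fox.

```python
def weighted_rrf(
    rankings: dict[str, list[str]],
    weights: dict[str, float],
    k: int = 60,
) -> list[str]:
    """Fuse per-signal rankings into one list, best item first."""
    scores: dict[str, float] = {}
    for signal, ranked_ids in rankings.items():
        weight = weights.get(signal, 1.0)
        for rank, item in enumerate(ranked_ids, start=1):
            scores[item] = scores.get(item, 0.0) + weight / (k + rank)
    # Sort by descending score; break ties by id for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))
```

An intent profile, in these terms, would amount to a different `weights` mapping per archetype and task status.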

Agent Profiles

Profiles are now first-class, customizable markdown files that define agent
behavioral guidance. Projects can override any profile via
.agent-fox/profiles/ with mode-specific resolution. Run
agent-fox init --profiles to install defaults for customization.

DuckDB Plan Persistence

Plan state is now stored in DuckDB tables (plan_meta, plan_nodes) instead
of plan.json, consolidating all persistent state in a single store.

Configurable Spec Root

The spec root directory is now configurable via [paths] spec_root in
config.toml (default: .agent-fox/specs). Projects using .specs/ are
auto-detected with a deprecation warning.

What's Changed (since v3.0.0-rc6)

Features

Configurable spec root directory (#371)
Bash, HTML, JSON, CSS, regex, and Swift language analyzers (#426)
Agent base profile replaces CLAUDE.md in Layer 1 (#430)
Mode-specific reviewer profiles to prevent schema cross-contamination
Templates field on ModeConfig and ArchetypeEntry
Schema, data models, and entity store for spec 95

Fixes

Embedding dimension assertion in allowlist before SQL interpolation (#346)
Hot-load queries plan_nodes DB table instead of plan.json (#444)
Blocking history and learned thresholds migration (#449)
Wire config.models.coding into resolve_model_tier for coder archetype
Squash merge in harvest fallback to eliminate double-commit pattern
Night-shift: replace af:fix removal with af:fixed label on issue closure (#429)
Hollow generate_status test and production bug (#428)
Include archived specs in dependency validation

Documentation

Complete documentation audit and update for v3
New profiles guide (docs/profiles.md)
Expanded prompt generation section in architecture docs
All legacy Skeptic/Oracle/Auditor references updated to mode-based terminology
CLI reference: added --profiles, findings, and onboard commands
Config reference: removed stale hooks section, added missing pricing entries
Architecture docs verified against source code

Dependencies

Upgraded anthropic to 0.96 and claude-agent-sdk to 0.1.60

Installation

uv tool install agent-fox

Full Changelog

https://github.com/agent-fox-dev/agent-fox/compare/v3.0.0-rc6...v3.0.0

View release on GitHub

v2.9.1 Bug fix 3mo

Notable features

Nightshift cost tracking (spec 91): SinkDispatcher plumbing plus cost tracking for the auxiliary and quality-gate paths
Transient audit reports moved to .agent-fox/audit/ with PASS deletion and spec completion cleanup

Full changelog

What's Changed

Features

Nightshift cost tracking (spec 91): Wire SinkDispatcher plumbing, auxiliary cost tracking, and quality gate cost tracking path
Transient audit reports (spec 92): Move audit reports to .agent-fox/audit/, add PASS deletion and spec completion cleanup

Bug Fixes

#330: Guard fetchone() result against None before indexing in nightshift
#329: Correct return type annotation of _default_config from object to AgentFoxConfig

Housekeeping

Moved implemented specs to .specs/archive/

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v2.9.0...v2.9.1

View release on GitHub

v2.9.0 Breaking risk 3mo

Notable features

Scope guard subsystem (parser, validator, detector, checker, builder, classifier, telemetry)
fix_coder archetype with dedicated `fix_coding.md` template integrated into fix pipeline
Fact lifecycle management (deduplication, decay, cleanup, LLM contradiction detection) in harvest pipeline

Full changelog

What's New

Features

Scope guard subsystem (spec 87) — source parser, stub validator, overlap detector, preflight checker, prompt builder, session classifier, and telemetry persistence
fix_coder archetype (spec 88) — dedicated archetype with fix_coding.md template, wired into the fix pipeline
Fact lifecycle management (spec 90) — dedup, decay, cleanup, and LLM-based contradiction detection wired into the harvest pipeline and sync barrier
Simplified model routing (spec 89) — removed prediction pipeline, feature enrichment, duration estimation, and calibration modules; routing now uses ladder-based assessment only

Fixes

Wire SDK Notification hook for activity progress events (#320)
Guard fetchone() results against None in run_cleanup
Add routing_assessments and routing_pipeline params to SessionResultHandler (#325)
Wire fix_coder archetype into fix pipeline
Fix type errors and stale test assertions across nightshift tests

Maintenance

Updated dependencies: claude-agent-sdk 0.1.52 → 0.1.58, anthropic 0.84.0 → 0.93.0, ruff 0.15.4 → 0.15.10, and all transitive deps
Removed ~2,500 lines of dead prediction/routing code (assessor, calibration, duration, features modules)
Added scope guard and SDK improvement documentation

View release on GitHub

v2.7.6 Bug fix 3mo

Notable features

Rewrite of fix pipeline using triage and fix_reviewer archetypes (spec 82)
Added triage and fix_reviewer archetype registration, prompt templates, data types, and parse functions

Full changelog

What's Changed

Bug Fixes

fix: map SDK TextBlock to AssistantMessage in ClaudeBackend — _map_message() silently dropped TextBlock content blocks, so the agent's actual text response (including JSON findings/verdicts from review archetypes) was never captured. Skeptic, verifier, and oracle always fell back to parsing markdown metadata, producing 100% parse failures.
fix: capture review archetype response text for parsing — Added response field to SessionOutcome and wired it through _extract_knowledge_and_findings() so review parsers receive the agent's actual output instead of a fallback transcript.

Features

feat: rewrite fix pipeline with triage/reviewer archetypes (spec 82) — Replaced skeptic/verifier in the fix pipeline with purpose-built triage and fix_reviewer archetypes. Triage produces structured acceptance criteria from GitHub issues; fix_reviewer verifies coder changes against those criteria with per-criterion PASS/FAIL verdicts. Includes retry loop with escalation ladder.
feat: add triage and fix_reviewer archetype registration and prompt templates
feat: add triage and fix-review data types and parse functions

Tests

Added unit, property, and integration smoke tests for the new fix pipeline (spec 82)

Other

Multiple type annotation and test fixture fixes
New specs: 82 (fix pipeline triage/reviewer), 83 (lint-spec coverage gaps)

View release on GitHub

v2.7.5 New feature 3mo

Notable features

Night‑shift issue‑first gate drains `af:fix` labeled issues before/after hunt scans with fail‑open semantics
Added `activity_callback`, `task_callback`, and `status_callback` to NightShiftEngine and FixPipeline with per‑archetype TaskEvent emission
Integrated ProgressDisplay for phase/idle status rendering in the Night‑shift CLI

Full changelog

What's Changed

Features

Night-shift issue-first gate: Issues with af:fix label are now drained before and after hunt scans, with fail-open semantics for platform API failures
Callback plumbing: Added activity_callback, task_callback, and status_callback to NightShiftEngine and FixPipeline, with per-archetype TaskEvent emission
Night-shift CLI display: Integrated ProgressDisplay with phase/idle status rendering

Improvements

Consolidated duplicated utilities and decoupled cross-module imports

Documentation

Added architecture documentation suite (spec authoring, planning, execution, night-shift)
Added coding harness analysis comparing agent-fox to Raschka's framework

Tests

Added integration smoke tests for night-shift wiring verification
Added unit tests for issue-first gate, callbacks, and display integration

View release on GitHub

v2.7.4 Bug fix 3mo

Fixed broken AI analysis import, isolated DuckDB tests, closed SDK response stream leaks.

Full changelog

Quality gate and type safety fixes

This release fixes issues surfaced by the night-shift daemon's quality gate scan.

Bug fixes

Broken import in quality gate AI analysis: quality_gate.py imported nonexistent get_client — fixed to use create_async_anthropic_client. AI-powered finding analysis now works instead of falling back to mechanical findings.
Test isolation for DuckDB: test_cost_limit_terminates hit the real knowledge database during parallel test execution, causing lock contention failures. Now properly mocks oracle context to avoid shared state.
SDK response stream leak: Close SDK response stream before client teardown to prevent ProcessError during async generator cleanup (#215).

Type safety improvements

Replaced object parameter types with PlatformProtocol in dedup.py and finding.py, removing stale type: ignore comments.
Added proper type casts in config_schema.py for nested model extraction.
Fixed classmethod decorator typing in config validator factory.
Added assert match is not None guards in prompt safety tests.
Typed _tgd test helpers and task tuple lists across 10 test files.

Lint fixes

Resolved all 8 ruff errors: sorted imports in resolver.py, added noqa: E402 for intentional late imports in tests/conftest.py, and auto-fixed import ordering in steering and graph test files.

View release on GitHub

v2.7.3 Breaking risk 3mo

Notable features

Spec 79: Hunt scan cross-iteration deduplication
Spec 80: Worktree cleanup hardening

Full changelog

Night-shift daemon fixes

This release fixes critical wiring gaps in the night-shift autonomous maintenance daemon that prevented it from operating correctly.

Bug fixes

Fix branch creation: _create_fix_branch() was defined but never called — archetype sessions ran on whatever branch was checked out instead of a dedicated fix branch. Issue closure now gated on successful harvest.
Scheduled re-polling: Issue checks and hunt scans only ran once at startup, then the daemon spin-looped idle. Now repeats at configured intervals (issue_check_interval, hunt_scan_interval).
Cost/session tracking: state.total_cost, state.total_sessions, and state.issues_created were never updated, making cost limits ineffective. Sessions now report token usage back to the engine for cost calculation.
Session limit enforcement: orchestrator.max_sessions was never checked in the night-shift engine (61-REQ-9.3 compliance).
Develop checkout restoration: After fix sessions, the repo stayed on the fix branch. Now restores develop after harvest so the next issue starts clean.

Other fixes

Resolved 8 pre-existing test failures across CLI override handling, review-only graph construction, and worktree hardening property tests.
Fixed flaky parallel test execution caused by deprecated asyncio.get_event_loop() usage.

Specs

Spec 79: Hunt scan cross-iteration deduplication
Spec 80: Worktree cleanup hardening

View release on GitHub

v2.7.2 Bug fix 3mo

Fixed flaky test failures by disabling hypothesis deadline.

Full changelog

What's Changed

fix: disable hypothesis deadline globally to eliminate flaky test failures
fix: disable hypothesis deadline on flaky property tests
chore: bump version to 2.7.2

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v2.7.1...v2.7.2

View release on GitHub

v2.7.1 Bug fix 3mo

Increased default session limits to prevent premature failures.

Full changelog

What's Changed

fix: correct staleness fallback when AI evaluation fails
fix: resolve four night-shift integration gaps (#226 #227 #228 #229)
fix: harvest branch and close issue after night-shift fix pipeline (#225)
fix: increase default session limits to prevent premature failures (#205)
fix: include enriched feature vector fields in StatisticalAssessor (#206)
docs+chore: integration gap analysis and strengthened af-spec template (#230)
style: auto-format code with ruff

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v2.7.0...v2.7.1

View release on GitHub

v2.7.0 New feature 3mo

Notable features

Implement watch loop core with `--watch` and `--watch-interval` flags, watch gate, stall detection (spec 70)
Add CachePolicy config and `cached_messages_create()` helper; migrate auxiliary modules to cached API (spec 77)
Make feature branches local‑only, push only develop (spec 78)

Full changelog

What's Changed

Features

feat(watch): implement watch loop core — --watch and --watch-interval CLI flags, watch gate, stall detection (spec 70)
feat(caching): add CachePolicy config and cached_messages_create() helper; migrate all auxiliary modules to cached API (spec 77)
feat(harvest): make feature branches local-only, push only develop (spec 78)
feat(fix): add FixProgressEvent/CheckEvent types, wire ProgressDisplay and callbacks (spec 76)
feat(engine): timeout-aware escalation with per-node retry logic (spec 75)
feat(engine): tolerant review parser with fuzzy wrapper key matching and field normalization (spec 74)
feat(engine): auto-reset blocked tasks on engine resume; clear attempt tracker on reset
feat(nightshift): AI critic for finding consolidation, batch triage, post-fix staleness check (spec 73)
feat(nightshift): reference parsing, dependency graph, and edge merging
feat(reporting): active tasks in status command (spec 72)
feat(platform): sort/direction params for list_issues_by_label
feat(ui): show agent archetype in spinner line
feat(config): timeout retry configuration fields in RoutingConfig

Fixes

fix(cli): enforce CLI separation by delegating to backing modules (fixes #210)
fix(barrier): run knowledge compaction during sync barriers (fixes #211)
fix(reporting): display agent archetype in status and standup output (fixes #216)
fix(retry): retry on network-level transport errors (fixes #208)
fix: handle SIGTERM gracefully and prune stale worktrees before branch deletion
fix(tests): reset agent_fox logger between tests to fix xdist flakiness
fix(tests): mock _setup_infrastructure in run_code tests to prevent MagicMock directory leak

Documentation

docs(watch): add --watch and --watch-interval to CLI reference
docs(caching): add [caching] section to configuration reference
docs(spec-78): update AGENTS.md for local-only feature branch workflow

Chores

Archived completed specs (59–76)
New specs: 77 (prompt caching), 78 (local-only feature branches)

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v2.6.2...v2.7.0

View release on GitHub

v2.6.2 Maintenance 3mo

Minor fixes and improvements.

Full changelog

What's Changed

refactor: simplify engine, platform factory, and package re-exports
specs: add spec 74 (review parse resilience), spec 75 (timeout-aware escalation), spec 76 (fix progress display)

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v2.6.1...v2.6.2

View release on GitHub

v2.6.1 New feature 3mo

Notable features

Simplified config template generation with visible sections and quality defaults (spec 68)
Round‑robin spec‑fair scheduling across specs (spec 69)
Watch mode added via watch_interval config field and WATCH_POLL audit event type (spec 70)

Full changelog

What's New in 2.6.1

Config simplification (spec 68): simplified config template generation with visible sections and quality defaults
Spec-fair scheduling (spec 69): round-robin scheduling across specs
Watch mode (spec 70): watch_interval config field and WATCH_POLL audit event type
Fix ordering (spec 71): spec and task ordering improvements
Status command: show active agents in status output
Develop sync fix: use update-ref instead of branch -f to avoid failures when develop is checked out in a worktree
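
As an illustration of the spec-fair scheduling item above, the sketch below interleaves ready tasks round-robin across specs so that no single spec monopolizes the queue. The `spec:group` node ID format and the function name `spec_fair_order` are assumptions for the example only.

```python
from collections import deque


def spec_fair_order(ready_node_ids: list[str]) -> list[str]:
    """Hypothetical sketch: take one ready task per spec in turn."""
    queues: dict[str, deque] = {}
    for node_id in ready_node_ids:
        spec = node_id.split(":", 1)[0]  # assumed "spec:group" ID format
        queues.setdefault(spec, deque()).append(node_id)

    ordered: list[str] = []
    while queues:
        # Iterate over a snapshot because exhausted specs are removed mid-round.
        for spec in list(queues):
            ordered.append(queues[spec].popleft())
            if not queues[spec]:
                del queues[spec]
    return ordered
```

With three specs of unequal size, each spec gets one slot per round until it runs out of ready tasks.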

View release on GitHub

v2.6.0 Breaking 3mo

Breaking changes

Removal of the Coordinator (spec 62) breaks existing workflows.

Notable features

Night-shift engine, CLI command, and audit events (spec 61)
Plan always-rebuild behavior (spec 63)
CLI separation and logging improvements (spec 59)

Full changelog

Release 2.6.0

Highlights:

Night-shift engine, CLI command, and audit events (spec 61)
Coordinator removal (spec 62)
Plan always-rebuild (spec 63)
CLI separation and logging improvements (spec 59)
End-of-run discovery (spec 60)
Steering document spec (spec 64)
Specs 52–58 moved to archive

View release on GitHub

v2.5.2 Bug fix 3mo

Fixed session failures caused by identical main and fallback models and corrected CLI flag naming for extra_args.

Full changelog

What's Changed

Bug Fixes

fix(engine): skip fallback model when it equals the main model — The default fallback model (claude-sonnet-4-6) is the same as the STANDARD tier model used for coder sessions. The Claude CLI rejects --fallback-model when it matches the primary model, causing sessions to fail with "Fallback model cannot be the same as the main model." Now the fallback is omitted when it equals the session's model ID.
fix(session): use hyphenated CLI flag names for Claude SDK extra_args — --max_budget_usd and --fallback_model were passed with underscores instead of hyphens, causing "unknown option" errors when budget or fallback model were configured.
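
Both fixes can be pictured with a small sketch. It assumes the flags end up as a plain argv-style list; the helper name `build_cli_flags` and its parameters are hypothetical and do not reflect the SDK's real `extra_args` structure.

```python
from typing import Optional


def build_cli_flags(
    model_id: str,
    fallback_model: Optional[str] = None,
    max_budget_usd: Optional[float] = None,
) -> list[str]:
    """Hypothetical helper: build hyphenated CLI flags for a session."""
    flags: list[str] = []
    if max_budget_usd is not None:
        # Hyphens, not underscores: "--max_budget_usd" was rejected as unknown.
        flags += ["--max-budget-usd", str(max_budget_usd)]
    if fallback_model and fallback_model != model_id:
        # The CLI rejects a fallback identical to the main model, so omit it.
        flags += ["--fallback-model", fallback_model]
    return flags
```

For example, `build_cli_flags("claude-sonnet-4-6", fallback_model="claude-sonnet-4-6")` yields an empty list, which avoids the "Fallback model cannot be the same as the main model" failure.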

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v2.5.1...v2.5.2

View release on GitHub

v2.5.1 Bug fix 3mo

Fixed session CLI flag naming to use hyphens instead of underscores, resolving "unknown option" errors.

Full changelog

What's Changed

Bug Fixes

fix(session): use hyphenated CLI flag names for Claude SDK extra_args — --max_budget_usd and --fallback_model were passed with underscores instead of hyphens, causing "unknown option" errors when budget or fallback model were configured.
fix(engine): reset sub-task checkboxes during hard reset (fixes #163)
fix(workspace,platform): sanitize error messages to prevent path and API detail leakage (fixes #192)
fix(knowledge): escape markdown special characters in rendered output (fixes #193)
fix(cli): restrict config file and directory permissions to owner-only (fixes #191)
fix(core): validate LLM JSON responses with field-level constraints (fixes #186)
fix(knowledge): add SQL table allowlist for query safety (fixes #188)
fix(workspace): validate git ref names to prevent command injection (fixes #189)
fix(session): validate and cap review parser output sizes (fixes #187)
fix(core): add prompt content sanitization for injection defense (fixes #190)
fix(cli): add Claude CLI version and settings validation (fixes #185)

Refactoring

refactor: consolidate duplicated utilities and review parser logic — extracted shared audit helper, decomposed engine.py god-class, extracted SDK parameter resolution, inlined single-consumer blocking.py, split knowledge/query.py into focused modules. Net reduction of ~736 lines.

New Specs

Spec 59: CLI Separation and Logging Improvements
Spec 60: End-of-Run Discovery

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v2.5.0...v2.5.1

View release on GitHub

v2.5.0 Breaking risk 3mo

Breaking changes

Removed multi-backend abstraction; system now Claude‑only (spec 55).
Deleted entire `tools/` package including fox tools and MCP server.

Notable features

Knowledge feedback loop with automated extraction, causal context, fallback inputs, and embedding generation (spec 52)
Review persistence: structured parsing and DB storage of review findings across skeptic, verifier, oracle archetypes; retry context injection for coder retries; review‑only CLI mode (spec 53)
Quality gate complexity assessment with feature vector enrichment and heuristic evaluation (spec 54)

Full changelog

What's New in v2.5.0

New Features

Knowledge feedback loop (spec 52) — automated knowledge extraction with causal context, fallback inputs, and embedding generation
Review persistence (spec 53) — structured parsing and DB storage of review findings from skeptic, verifier, and oracle archetypes; retry context injection for coder retries; review-only CLI mode
Quality gate complexity assessment (spec 54) — quality gate execution with feature vector enrichment and heuristic assessment
SDK feature adoption (spec 56) — max_turns, max_budget_usd, fallback_model, and thinking configuration with hierarchical defaults and archetype overrides
Archetype model tiers (spec 57) — per-archetype default model tiers (ADVANCED for review archetypes, STANDARD for coder) with config overrides and ADVANCED escalation ceiling
Predecessor escalation (spec 58) — escalation ladder awareness in retry logic; predecessors only block after exhausting all ladder levels

Architecture

Claude-only commitment (spec 55) — removed multi-backend abstraction; simplified to Claude-exclusive backend with ADR documentation
Removed fox tools and MCP server — deleted the entire tools/ package (server, registry, edit, read, search, outline) in favor of Claude Code's native tooling
Code simplification — eliminated backward-compat shims, consolidated duplicated review parsing logic, extracted AssessmentManager and sync barrier sequence from Orchestrator, split long methods into focused units

Bug Fixes

Fixed missing severity normalization in the engine review parser, which previously accepted invalid severity values
Fixed redundant archetype resolution in launch preparation

View release on GitHub

v2.4.6 Security relevant 3mo

Security fixes

Block shell metacharacters in command allowlist (fixes #178)
Harden spec name handling, merge lock, error redaction, and migration handling (fixes #179)

Full changelog

What's Changed

fix(security): block shell metacharacters in command allowlist (fixes #178) by @mickume in https://github.com/agent-fox-dev/agent-fox/pull/180
fix(security): harden spec name, merge lock, error redaction, migration dim (fixes #179) by @mickume in https://github.com/agent-fox-dev/agent-fox/pull/181

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v2.4.5...v2.4.6

View release on GitHub

v2.4.5 Breaking risk 4mo

Breaking changes

Removed the functions resolve_tier_ceiling, ensure_blocking_tables, and create_circuit_breaker_event, along with the unused tools/_file_io.py re-export shim.
Path constants moved to core/paths, removing the circular dependency between cli and knowledge.

Full changelog

What's Changed

Refactoring & Simplification

Remove dead code: resolve_tier_ceiling, ensure_blocking_tables, create_circuit_breaker_event, unused tools/_file_io.py re-export shim
Consolidate duplicated insert/query logic in review_store.py via shared helpers (_insert_with_supersession, _query_active)
Consolidate three identical missing-section fixers via _append_missing_section
Consolidate hard_reset_all/hard_reset_task shared logic via _perform_hard_reset
Replace archetype if/elif dispatch with dict lookup in session_lifecycle.py
Unify coverage matrix and traceability table validators via _check_section_with_table
Extract _count_node_status helper in orchestrator engine
Break cli ↔ knowledge circular dependency by moving path constants to core/paths

Sync Barrier Hardening (spec 51)

Worktree verification and orphan detection
Bidirectional develop branch sync with merge lock
Hot-load gate pipeline (tracking → completeness → linting)
Parallel drain before barrier entry
Comprehensive test coverage (unit, property, integration)

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v2.4.4...v2.4.5

View release on GitHub

v2.4.4 Bug fix 4mo

Improved blocked task error messages with clearer root‑cause details.

Full changelog

What's Changed

Code Simplification & Refactoring

Centralized node ID parsing: New core/node_id.py module with parse_node_id() and spec_name_of(), replacing 8 scattered node_id.split(":") patterns across the codebase
Consolidated path constants: Expanded cli/paths.py with PLAN_PATH, STATE_PATH, MEMORY_PATH, AUDIT_DIR — replacing inline path construction
Merged tool utilities: Combined tools/hashing.py + tools/_file_io.py into tools/_utils.py
Merged routing data+persistence: Combined routing/types.py + routing/storage.py into routing/core.py
Extracted blocking logic: New engine/blocking.py (145 lines) — skeptic/oracle blocking evaluation is now a pure function with BlockDecision return type, reducing engine.py from 1622 to 1522 lines

Bug Fixes (from v2.4.3)

Fixed log/spinner interference with LiveAwareHandler
Suppressed third-party warnings during orchestrator runs
Improved blocked task error messages with root cause

View release on GitHub

v2.4.3 Bug fix 4mo

Fixed blocked task errors to include the root cause of the last failed attempt.

Full changelog

What's Changed

Fix log/spinner interference: Log messages now route through Rich's Live console when the progress spinner is active, preventing corrupted display output
Suppress third-party warnings: HF Hub and sentence-transformers warnings no longer leak into the spinner display
Better blocked task messages: Blocked task errors now include the root cause from the last failed attempt
Status report accuracy: Fixed status filtering for archetype nodes with state overlay

View release on GitHub

v2.4.2 New feature 4mo

Notable features

`reset --spec <spec>`: resets all tasks for a single spec to pending, cleans worktrees/branches, and synchronizes `tasks.md` and `plan.json` without rolling back git or compacting knowledge

Full changelog

New Features

reset --spec <spec_name> — Spec-scoped reset command that resets all tasks (coder + archetype nodes) belonging to a single spec to pending, cleans worktrees/branches, and synchronizes tasks.md and plan.json. No git rollback or knowledge compaction — safe for re-executing one spec without affecting others. Mutually exclusive with --hard and positional <task_id>.

Bug Fixes (from v2.4.1)

plan/status: exclude archetype nodes from task counts — Injected archetype nodes were inflating totals; now only coder nodes are counted with review nodes shown separately.
status: honour tasks.md checkbox state — Status now seeds from graph statuses (reflecting [x] checkboxes) before overlaying orchestrator state.
plan: propagate completion to archetype nodes — Archetype nodes for completed specs no longer appear in the execution order.
git: prevent infinite hang on expired PAT — run_git() now sets GIT_TERMINAL_PROMPT=0 and enforces timeouts (60s/120s) to prevent credential prompt hangs.

View release on GitHub

v2.4.1 Bug fix 4mo

Fixed inflated task counts by excluding archetype nodes, and made status honor manual checkbox completions.

Full changelog

Bug Fixes

plan/status: exclude archetype nodes from task counts — Injected archetype nodes (skeptic, oracle, verifier, auditor) were counted alongside real task groups, inflating totals and making progress appear lower than actual. Plan and status now report only coder nodes in task counts, with review nodes shown separately.
status: honour tasks.md checkbox state — When state.jsonl existed, the status command ignored tasks.md [x] checkboxes for manually completed work. Now seeds from graph statuses first (reflecting checkboxes), then overlays orchestrator state.
plan: propagate completion to archetype nodes — Archetype nodes were always shown as pending in the execution order, even when all coder tasks in their spec were completed. Now marks them as completed when all coder nodes in their spec are done.
git: prevent infinite hang on expired PAT — run_git() had no timeout and did not suppress interactive credential prompts. When a PAT expired, git commands would hang indefinitely. Now sets GIT_TERMINAL_PROMPT=0 and enforces timeouts (60s default, 120s for remote operations).
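
The git hang fix above combines two standard mechanisms: the `GIT_TERMINAL_PROMPT=0` environment variable and a subprocess timeout. A minimal sketch follows, assuming the 60s/120s values from the notes; the function names and the set of subcommands treated as remote are hypothetical.

```python
import os
import subprocess

DEFAULT_TIMEOUT_S = 60   # default timeout named in the release notes
REMOTE_TIMEOUT_S = 120   # timeout for remote operations
REMOTE_SUBCOMMANDS = {"fetch", "pull", "push", "clone", "ls-remote"}  # assumption


def git_env(base=None) -> dict:
    """Copy the environment and disable interactive credential prompts."""
    env = dict(os.environ if base is None else base)
    env["GIT_TERMINAL_PROMPT"] = "0"  # git fails fast instead of prompting
    return env


def pick_timeout(args: list[str]) -> int:
    """Use the longer timeout when the subcommand talks to a remote."""
    return REMOTE_TIMEOUT_S if args and args[0] in REMOTE_SUBCOMMANDS else DEFAULT_TIMEOUT_S


def run_git_sketch(args: list[str], cwd: str = ".") -> subprocess.CompletedProcess:
    """Run git; raises subprocess.TimeoutExpired if the command exceeds its timeout."""
    return subprocess.run(
        ["git", *args],
        cwd=cwd,
        env=git_env(),
        timeout=pick_timeout(args),
        capture_output=True,
        text=True,
        check=False,
    )
```

With prompts disabled, an expired token produces an authentication error immediately, and the timeout covers any remaining cases where the command stalls.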

View release on GitHub

v2.4.0 Bug fix 4mo

Fixed single-class calibration crash and suppressed HuggingFace Hub auth warning.

Full changelog

What's Changed

Fixes

Suppress HuggingFace Hub auth warning and fix single-class calibration crash

Documentation

Slim down root README to a concise project hook with install and quick start
Move detailed documentation (archetypes, model routing, fox tools, spec-driven development) into docs/
Add docs/README.md as the central documentation index

Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v2.3.3...v2.4.0

View release on GitHub

v2.3.3 New feature 4mo

⚠ Upgrade required

Add `orchestrator.max_blocked_fraction` to TOML configuration to enable early stop on high block rates.
Review and adjust `archetypes.skeptic_settings.block_threshold` and `archetypes.oracle_settings.block_threshold` as needed for desired blocking behavior.

Notable features

Skeptic and Oracle archetypes enforce blocking when critical findings exceed a configurable `block_threshold` (Skeptic defaults to 3; Oracle stays advisory-only when the threshold is omitted).
New orchestrator config `max_blocked_fraction` stops runs early if blocked nodes reach the set fraction (e.g., 0.4 for 40%).

Full changelog

What's New

Skeptic & Oracle Blocking Enforcement

Skeptic and oracle archetypes now enforce blocking decisions in the engine. When a skeptic or oracle session completes, the engine queries its persisted review findings, counts critical findings against the configured block_threshold, and cascade-blocks the downstream coder task (and all its dependents) when the threshold is exceeded.

Skeptic: blocks when critical findings exceed archetypes.skeptic_settings.block_threshold (default: 3)
Oracle: blocks when critical drift findings exceed archetypes.oracle_settings.block_threshold; remains advisory-only when threshold is omitted (the default)
Blocking decisions are recorded to the blocking_history table for threshold learning
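
The threshold rule above reduces to a small predicate. The sketch below is illustrative only: the function name `should_block` is hypothetical, and "exceed" is read as strictly greater than the threshold.

```python
from typing import Optional


def should_block(critical_findings: int, block_threshold: Optional[int]) -> bool:
    """Hypothetical sketch of the blocking decision for a review session."""
    if block_threshold is None:
        # No threshold configured: the archetype stays advisory-only.
        return False
    # "Exceed" is interpreted as strictly greater than the threshold.
    return critical_findings > block_threshold
```

Under this reading, a skeptic with the default threshold of 3 blocks on the fourth critical finding, while an oracle with no threshold never blocks.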

Block Budget

New orchestrator.max_blocked_fraction config option (default: disabled). When set, the engine stops the run early if the fraction of blocked nodes reaches the configured threshold, preventing wasted cost on doomed sessions when a systemic failure blocks a significant portion of the task graph.

[orchestrator]
max_blocked_fraction = 0.4  # stop if 40%+ of tasks are blocked
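
The early-stop check can be sketched as follows. The function name `should_stop_run` and the handling of an empty graph are assumptions; the comparison uses "reaches" (greater than or equal) as described above.

```python
from typing import Optional


def should_stop_run(blocked: int, total: int, max_blocked_fraction: Optional[float]) -> bool:
    """Hypothetical sketch of the block-budget check."""
    if max_blocked_fraction is None or total == 0:
        # Disabled by default; an empty graph has nothing to stop.
        return False
    return blocked / total >= max_blocked_fraction
```

With `max_blocked_fraction = 0.4`, a run of ten nodes stops once four of them are blocked.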

Other Changes

refactor: decompose engine.py and workspace.py god-modules into focused submodules (circuit.py, serial.py, injection.py, develop.py, git.py, worktree.py)
refactor: replace dict-based plan data with typed TaskGraph in engine
feat: add retry with backoff to all Anthropic API calls
fix: remove broken audit and serve-tools CLI commands (#174)
fix: update dump_knowledge.py to discover all DuckDB tables dynamically
refactor: consolidate duplicated API usage tracking and findings rendering
refactor: remove redundant DuckDB writes and double-sort

View release on GitHub

v2.3.2 Bug fix 4mo

Notable features

Introduced read_all_facts() with a 3-tier fallback strategy for resilient fact reading

Full changelog

What's Changed

Bug Fixes

Resilient fact reading with automatic fallback — Introduced read_all_facts() with a 3-tier fallback strategy (provided connection → read-only DuckDB → JSONL file), so reading facts always works regardless of DB availability.
Fixed empty docs/memory.md — render_summary() was called without a DuckDB connection in the engine, producing an empty file every time. Now uses the fallback pipeline and receives the active connection from the Orchestrator.
Simplified status command — Replaced manual DuckDB/JSONL fallback in agent-fox status with the unified read_all_facts() function.
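
The three-tier fallback can be sketched generically. This is not the real `read_all_facts()` signature: the reader callables stand in for the provided connection and the read-only DuckDB connection, and the function and parameter names are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def read_all_facts_sketch(
    conn_reader: Optional[Callable[[], list]],
    readonly_reader: Optional[Callable[[], list]],
    jsonl_path: Path,
) -> list:
    """Hypothetical sketch: provided connection, then read-only DB, then JSONL."""
    for reader in (conn_reader, readonly_reader):
        if reader is None:
            continue
        try:
            return reader()
        except Exception:
            # Any failure (locked DB, missing file) falls through to the next tier.
            continue

    if not jsonl_path.exists():
        return []
    with jsonl_path.open(encoding="utf-8") as fh:
        # One JSON object per non-empty line.
        return [json.loads(line) for line in fh if line.strip()]
```

Ordering the tiers from freshest to most durable means callers get live data when the database is reachable and still get a usable answer when it is not.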

View release on GitHub

v2.3.1 Bug fix 4mo

Fixed progress spinner to reflect operation type and removed untracked files blocking harvest merges.

Full changelog

Fixes

Progress spinner: Extract tool-use blocks from SDK AssistantMessage content so the spinner shows "Reading…", "Editing…", etc. instead of always "Thinking…"
Harvest merge: Remove untracked files that would block fast-forward merge during harvest

View release on GitHub

v2.3.0 New feature 4mo

Notable features

New `auditor` archetype (spec 46) validates test code against `test_spec.md` contracts, checking coverage, assertion strength, edge‑case rigor and independence; opt‑in via `[archetypes] auditor = true`.
`agent-fox init` scaffolds Claude Code skill files alongside project configuration (spec 47).

Full changelog

What's New

Test Auditor Archetype (spec 46)

New auditor archetype that validates test code against test_spec.md contracts before implementation begins
Checks coverage, assertion strength, precondition fidelity, edge case rigor, and test independence
auto_mid injection mode (after test-writing group, before implementation)
Conservative convergence (union semantics — worst verdict wins)
Retry-predecessor with configurable circuit breaker
Disabled by default, opt-in via [archetypes] auditor = true
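
The "worst verdict wins" convergence rule can be illustrated with a tiny sketch. The two-level PASS/FAIL scale and the function name `converge_verdicts` are assumptions; the auditor's real verdict vocabulary may differ.

```python
VERDICT_SEVERITY = {"PASS": 0, "FAIL": 1}  # hypothetical scale, higher is worse


def converge_verdicts(verdicts: list[str]) -> str:
    """Hypothetical sketch: the most severe verdict across instances wins."""
    if not verdicts:
        raise ValueError("at least one verdict is required")
    return max(verdicts, key=lambda v: VERDICT_SEVERITY[v])
```

A single FAIL among several PASS verdicts therefore yields FAIL, which is the conservative outcome.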

Init Skills (spec 47)

agent-fox init now scaffolds Claude Code skill files alongside project config

Token Counting & Cache Pricing Fix

Fixed cache token tracking: cache_read_input_tokens and cache_creation_input_tokens now flow through the full pipeline (SDK → ResultMessage → SessionOutcome → audit events → status report)
Fixed audit event payload storing combined tokens instead of separate input_tokens/output_tokens
Fixed build_status_report_from_audit always reporting output_tokens = 0
Added cache pricing to ModelPricing (cache read at 10%, cache creation at 125% of input price)
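
The cache pricing ratios above translate into a simple cost formula. The sketch below is an assumption-laden illustration: the function name, the per-million-token price parameters, and the treatment of `input_tokens` as excluding cached tokens are not taken from the source.

```python
CACHE_READ_MULTIPLIER = 0.10      # cache reads billed at 10% of the input price
CACHE_CREATION_MULTIPLIER = 1.25  # cache creation billed at 125% of the input price


def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    cache_read_tokens: int,
    cache_creation_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Hypothetical sketch: cost in USD, with prices given per million tokens."""
    per_input = input_price_per_mtok / 1_000_000
    per_output = output_price_per_mtok / 1_000_000
    return (
        input_tokens * per_input
        + output_tokens * per_output
        + cache_read_tokens * per_input * CACHE_READ_MULTIPLIER
        + cache_creation_tokens * per_input * CACHE_CREATION_MULTIPLIER
    )
```

Tracking the two cache counters separately matters because they are priced in opposite directions relative to plain input tokens.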

Other Changes

Merge lock and agent fallback for harvest/workspace operations
Merge agent for AI-based conflict resolution
Various test and infrastructure improvements

View release on GitHub

v2.2.2 Bug fix 4mo

Fixed harvest checkout failure when untracked runtime files existed.

Full changelog

Bug Fix

Fix harvest checkout failure with untracked files: When agent-fox runtime files (.agent-fox/config.toml, .agent-fox/state.jsonl, .claude/settings.local.json, docs/memory.md) existed as untracked files in the working directory but were also tracked on the develop branch, git checkout develop during harvest would fail, blocking all subsequent tasks in the same spec. Fixed by using force checkout in the harvest step, which is safe because all coding work happens in an isolated worktree.

View release on GitHub

v2.2.1 Bug fix 4mo

Notable features

Structured finding persistence for skeptic/verifier/oracle sessions

Full changelog

What's Changed

Bug Fixes

fix: align __version__ with 2.2.0 — runtime version was still reporting 2.1.2 after the 2.2.0 release
fix: use numeric confidence in DuckDB ingestion — two INSERT statements used string 'high' for the confidence column (migrated to DOUBLE in v5), causing ConversionException during background knowledge ingestion
fix: resolve model tier to model ID for pricing lookups — NodeSessionRunner._resolved_model_id stored tier names (e.g. "ADVANCED") instead of model IDs (e.g. "claude-opus-4-6"), causing pricing config misses and zero-cost estimates

Features

feat: wire structured finding persistence for skeptic/verifier/oracle — the review parsers and DB insert functions existed but were never called from the session lifecycle; skeptic, verifier, and oracle sessions now persist their structured JSON output (findings, verdicts, drift reports) to DuckDB, enabling downstream context rendering for coders and blocking/convergence logic

Internal

Version bump to 2.2.1

v2.2.0 Breaking risk 4mo

Breaking changes

DuckDB is now a hard requirement; `open_knowledge_store()` raises RuntimeError instead of returning None.
Removed all Optional connection parameters from session lifecycle, knowledge harvest, memory store, context assembly, and routing.

Full changelog

What's New in v2.2.0

Predictive Planning & Knowledge (Spec 39)

Duration-based task ordering — ready tasks sorted by predicted duration (longest first) to minimize wall-clock time, with regression model, historical median, and configurable presets as fallback chain
Causal graph + review findings — review/drift/verification findings integrated into causal traversal for richer downstream context
Confidence-aware fact selection — facts below a configurable confidence threshold are excluded from session context
Pre-computed ranked facts — fact rankings cached at plan time for faster context assembly
Cross-group finding propagation — critical findings from earlier task groups visible to downstream groups under "Prior Group Findings"
Project model — aggregate spec outcomes, module stability scores, and archetype effectiveness via agent-fox status --model
Critical path forecasting — identifies the longest-duration path through the task graph with tied-path detection
File conflict detection — predicts file overlaps between parallel tasks and serializes conflicting pairs (opt-in)
Learned blocking thresholds — adapts skeptic/oracle block thresholds from historical precision (opt-in)
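
To make the duration-based ordering concrete, here is a small sketch of a longest-first sort with the fallback chain named above (regression model, then historical median, then preset). All names, the callable signature, and the default preset value are hypothetical.

```python
from statistics import median
from typing import Callable, Mapping, Optional, Sequence


def predict_duration(
    task_id: str,
    regression: Optional[Callable[[str], Optional[float]]],
    history: Mapping[str, Sequence[float]],
    presets: Mapping[str, float],
    default: float = 600.0,  # assumed fallback when no preset exists
) -> float:
    """Predict a task's duration in seconds using the fallback chain."""
    if regression is not None:
        predicted = regression(task_id)
        if predicted is not None:
            return predicted                # 1. regression model
    samples = history.get(task_id, ())
    if samples:
        return float(median(samples))       # 2. historical median
    return presets.get(task_id, default)    # 3. configured preset


def order_ready_tasks(
    ready: Sequence[str],
    regression: Optional[Callable[[str], Optional[float]]],
    history: Mapping[str, Sequence[float]],
    presets: Mapping[str, float],
) -> list[str]:
    """Sort ready tasks longest-first so long tasks start as early as possible."""
    return sorted(
        ready,
        key=lambda t: predict_duration(t, regression, history, presets),
        reverse=True,
    )
```

Starting the longest tasks first is a common scheduling heuristic for reducing total wall-clock time when several tasks run in parallel.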

Confidence Normalization (Spec 37)

Unified confidence representation as float [0.0, 1.0] across memory, knowledge, and routing
parse_confidence() function handles string enum → float conversion with canonical mapping
DuckDB migration v5: TEXT → DOUBLE for confidence columns
JSONL backward compatibility preserved
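
A sketch of how a string enum could be normalized to a float in [0.0, 1.0]. The mapping values, the default, and the handling of numeric strings are assumptions made for this example; the release notes do not state the canonical mapping used by the project's `parse_confidence()`.

```python
from typing import Union

# Assumed canonical mapping; the real values are not given in the notes.
CANONICAL_CONFIDENCE = {"high": 0.9, "medium": 0.6, "low": 0.3}


def _clamp(value: float) -> float:
    """Clamp a number into the closed interval [0.0, 1.0]."""
    return max(0.0, min(1.0, value))


def parse_confidence(value: Union[str, int, float, None], default: float = 0.6) -> float:
    """Convert legacy string confidences or raw numbers to a float in [0, 1]."""
    if isinstance(value, bool):          # bool is an int subclass; treat as invalid
        return default
    if isinstance(value, (int, float)):
        return _clamp(float(value))
    if isinstance(value, str):
        key = value.strip().lower()
        if key in CANONICAL_CONFIDENCE:
            return CANONICAL_CONFIDENCE[key]
        try:
            return _clamp(float(key))    # accept numeric strings such as "0.25"
        except ValueError:
            return default
    return default
```

Accepting both the old string values and plain numbers is one way older JSONL records could keep loading after the column type changes to DOUBLE.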

DuckDB Hardening (Spec 38)

DuckDB is now a hard requirement — open_knowledge_store() raises RuntimeError instead of returning None
Removed all Optional connection parameters from session lifecycle, knowledge harvest, memory store, context assembly, and routing
DuckDB errors propagate instead of being silently swallowed
Added knowledge_conn / knowledge_db test fixtures for isolated in-memory DuckDB
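
The behavioral change can be sketched as follows: the opener either returns a live connection or raises, and never returns None. This is an illustrative stand-in, not the project's implementation; the `path` parameter and the error messages are assumptions, while `duckdb.connect` and `duckdb.Error` are part of the DuckDB Python package.

```python
def open_knowledge_store(path: str = ":memory:"):
    """Open the DuckDB knowledge store, or raise RuntimeError on any failure."""
    try:
        import duckdb  # hard requirement: a missing package is an error, not a fallback
    except ImportError as exc:
        raise RuntimeError("DuckDB is required but is not installed") from exc

    try:
        return duckdb.connect(path)
    except duckdb.Error as exc:
        # Propagate the failure instead of silently returning None.
        raise RuntimeError(f"could not open knowledge store at {path!r}") from exc
```

Callers can then take a plain connection parameter instead of an Optional one, since a successful return always carries a usable connection. Passing ":memory:" gives an isolated in-memory database, which is the kind of setup the test fixtures mentioned above rely on.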

Other Changes

Hard reset (Spec 35) — agent-fox reset --hard with commit SHA tracking
Config generation (Spec 33) — agent-fox init generates config.toml from schema
Token tracking (Spec 34) — per-archetype and per-spec cost breakdowns in status
Oracle archetype (Spec 32) — drift detection agent with blocking logic
Prompt rewrites — oracle, librarian, cartographer, coordinator prompts rewritten to gold standard pattern
AGENTS.md rewrite — project-specific conventions documented
Harvest reconciliation (Spec 36) — post-harvest develop branch reconciliation
