Release history
Agent-fox releases
- _push_with_retry mechanism with error classification, audit events, and lock reentrancy for develop sync
- Atomic push integrated into harvest and session lifecycle
Full changelog
What's Changed
Features
- Atomic push with retry: new `_push_with_retry` mechanism with error classification, audit events, and lock reentrancy for develop sync (#121)
- Wired atomic push into harvest and session lifecycle (#121)
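The idea of retrying a push only when the failure looks transient can be sketched as follows. This is an illustration, not agent-fox's `_push_with_retry`: the `classify_push_error` helper, the stderr markers, and the backoff values are all assumptions, and audit events and locking are left out.

```python
import subprocess
import time

# Hypothetical stderr fragments; the real classification rules are not given in the release notes.
RETRYABLE_MARKERS = ("non-fast-forward", "fetch first", "could not resolve host", "timed out")


def classify_push_error(stderr: str) -> str:
    """Return "retryable" for errors that may clear up on a later attempt, else "fatal"."""
    text = stderr.lower()
    return "retryable" if any(marker in text for marker in RETRYABLE_MARKERS) else "fatal"


def push_with_retry(run_push, max_attempts: int = 3, base_delay: float = 1.0, sleep=time.sleep) -> bool:
    """Call run_push() until it succeeds, hits a fatal error, or runs out of attempts.

    run_push must return a (returncode, stderr) tuple.
    """
    for attempt in range(1, max_attempts + 1):
        returncode, stderr = run_push()
        if returncode == 0:
            return True
        if classify_push_error(stderr) == "fatal":
            return False  # retrying cannot help, e.g. an authentication failure
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff between attempts
    return False


def git_push(remote: str = "origin", branch: str = "develop"):
    """Run `git push` and return (returncode, stderr)."""
    proc = subprocess.run(["git", "push", remote, branch], capture_output=True, text=True)
    return proc.returncode, proc.stderr
```

A caller would pass `git_push` (or a `functools.partial` of it) as `run_push`. Injecting `sleep` keeps the function testable without real delays.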
Bug Fixes
- Findings reporting: replaced legacy archetype labels with current names (#591)
- Insights command: added `--dismiss` flag for manual finding invalidation (#592)
- Config: fixed `retries_before_escalation` config path and added deprecation warning (#589)
- Fixed double push in fix_pipeline harvest flow (#121)
Tests
- Added failing spec tests for atomic push with retry (#121)
- Removed `fix` command from CLI
- Removed `--output` flag from `standup` command
- Deprecated `[models]` section; moved `fallback_model` to `[routing]` and removed obsolete config options
- Added transport‑level DNS re‑validation in nightshift to close SSRF TOCTOU (#580)
- Sanitized exception content in fix session failure comment (#583)
- Added path containment check before file deletion (#579)
Full changelog
What's Changed
Refactoring
- engine: Consolidate 6 parallel tracking dicts in `SessionResultHandler` into a single `_NodeRetryState` dataclass
- engine: Inline `assessment.py` (single-consumer module) into `engine.py`
- Remove dead code: `_estimate_tokens`, `_table_exists`, `_column_exists`
- Add `run_git_sync()` to `workspace/git.py` and migrate sync callers
- Extract `_resolve_github_remote()` helper in `nightshift/platform_factory.py`
- Deduplicate comment formatting in `nightshift/fix_pipeline.py`
Bug Fixes
- nightshift: Add transport-level DNS re-validation to close SSRF TOCTOU (#580)
- nightshift: Sanitize exception in fix session failure comment (#583)
- nightshift: Add path containment check before file deletion (#579)
- nightshift: Reject path traversal in archetype/mode/name parameters (#585)
- nightshift: Add symlink checks to `profiles.py` and `analyzer.py` (#586)
- nightshift: Replace `shutil.rmtree` with `_safe_rmtree` to avoid symlink traversal (#587)
- nightshift: Reject reserved, multicast, and unspecified addresses in SSRF check (#581)
- nightshift: Deprecate `[models]` section, move `fallback_model` to `[routing]` (#577)
- nightshift: Remove fix command from CLI (#575)
- nightshift: Remove `--output` flag from standup command (#573)
- nightshift: Remove obsolete config options and dead code (#574)
- nightshift: Apply ruff format to `barrier.py` and `test_barrier.py` (#588)
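The address classes named in the SSRF fixes above map directly onto properties in Python's standard `ipaddress` module. The sketch below shows one way such a check could look; `is_forbidden_address` is a hypothetical name, and the exact set of rejected classes in nightshift is an assumption beyond the three the notes mention (reserved, multicast, unspecified).

```python
import ipaddress


def is_forbidden_address(addr: str) -> bool:
    """Return True if addr should not be contacted by an outbound request."""
    ip = ipaddress.ip_address(addr)  # raises ValueError for non-IP input
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )
```

To close a time-of-check/time-of-use gap, a check like this has to run against the address the transport actually connects to, not only against the result of an earlier DNS lookup.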
Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.5.6...v3.6.0
Fixed nightshift audit‑review blockage by filtering future‑group findings.
Full changelog
What's Changed
Bug Fixes
- nightshift: Filter deferred-to-future-group findings from audit-review blocking (#572)
- nightshift: Collapse ternary expression to satisfy ruff format (#571)
Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.5.5...v3.5.6
Audit-reviewer now grades test design quality instead of execution results, fixing misclassification.
Full changelog
What's Changed
Bug Fixes
- Wire `audit_max_retries` into audit-review retry logic (#567, #569)
  `ReviewerConfig.audit_max_retries` was defined but never read by the retry logic. Added a dedicated per-coder-node counter (`_audit_retry_counts`) so audit-review retries are tracked independently of the generic `EscalationLadder`, preventing infinite retry loops.
- Audit-reviewer grades test design quality, not execution results (#568, #570)
  The audit-review profile was conflating test pass/fail status with test design quality, marking well-designed tests as `WEAK` when they failed due to unimplemented upstream specs. Updated the template to grade design quality only: a correctly designed test that cannot pass yet is `PASS`, not `WEAK`. Added anti-pattern examples and multi-spec dependency guidance.
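A per-node retry counter of the kind described for `audit_max_retries` might look roughly like this. The class and method names are hypothetical; only `audit_max_retries` and `_audit_retry_counts` come from the release notes.

```python
from collections import defaultdict


class AuditRetryTracker:
    """Counts audit-review retries per coder node, separately from any other retry logic."""

    def __init__(self, audit_max_retries: int) -> None:
        self.audit_max_retries = audit_max_retries
        self._audit_retry_counts: defaultdict[str, int] = defaultdict(int)

    def should_retry(self, coder_node_id: str) -> bool:
        """Record one audit failure and report whether the coder node may retry."""
        if self._audit_retry_counts[coder_node_id] >= self.audit_max_retries:
            return False  # budget spent: stop instead of looping forever
        self._audit_retry_counts[coder_node_id] += 1
        return True
```

Because the counter is keyed by node and has a hard ceiling, one node exhausting its audit retries does not affect any other node.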
Documentation
- Updated CLI and config reference to match current codebase
Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.5.4...v3.5.5
Fixed four bugs that prevented stored knowledge from being retrieved in downstream sessions.
Full changelog
Knowledge Retrieval Fixes (Spec 120)
Fixes four bugs in the knowledge system's read side that caused stored knowledge to never reach downstream sessions.
Bug Fixes
- Wire `run_id` to FoxKnowledgeProvider: `_run_id` was initialized to `None` and never set by the engine, causing `_query_same_spec_summaries()` and `_query_cross_spec_summaries()` to always return empty. Session summaries were stored but never retrieved. Every `fox_provider` log line showed `0 context + 0 cross-spec items`.
- Elevate pre-review findings to tracked context: Group 0 (skeptic pre-review) findings were served as untracked `[CROSS-GROUP]` items instead of tracked `[REVIEW]` items. They are now included in primary review results, tracked in `finding_injections`, and properly superseded on session completion.
- All-archetype summary storage: Only coder sessions produced summaries. Reviewer and verifier sessions now generate structured summaries (finding counts, pass/fail ratios) that are stored and served to downstream sessions.
- Cross-run finding carry-forward: Active findings orphaned by stalled runs are now surfaced as `[PRIOR-RUN]` context items at the start of a new run, capped at 5 per spec.
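The per-spec cap on carried-forward findings can be illustrated with a small sketch. The `Finding` dataclass and the item format are assumptions made for the example; only the `[PRIOR-RUN]` prefix and the cap of 5 per spec come from the notes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    spec: str
    description: str


def prior_run_context(findings: list[Finding], cap_per_spec: int = 5) -> list[str]:
    """Format carried-over findings as [PRIOR-RUN] items, at most cap_per_spec per spec."""
    seen: defaultdict[str, int] = defaultdict(int)
    items = []
    for finding in findings:
        if seen[finding.spec] < cap_per_spec:
            seen[finding.spec] += 1
            items.append(f"[PRIOR-RUN] {finding.spec}: {finding.description}")
    return items
```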
Stats
- 4658 tests passing
- 26 files changed, +3944 / -53 lines
- Git stack hardening: workspace health checks, force-clean option, non-retryable error classification, pre-session guards, run lifecycle management, idempotent cascade blocking, and develop sync audit trail
Full changelog
What's Changed
Features
- Session summary storage (spec 119): Session summaries are now stored, retrieved, and integrated into the knowledge provider and lifecycle audit events
- Git stack hardening (spec 118): Workspace health checks, force-clean option, non-retryable error classification, pre-session workspace guards, run lifecycle management, idempotent cascade blocking, and develop sync audit trail
Fixes
- Night shift: Raised `stale_timeout` default to 3600s and added heartbeat (#561)
Refactoring
- Eliminated single-file `routing/` and `security/` packages; moved to `core/`
- Merged `knowledge/provider.py` protocol into `knowledge/fox_provider.py`
Stats
- 4621 tests passing
- 97 files changed across the release
- Cross‑group knowledge retrieval with [CROSS-GROUP] prefix, limited by max_cross_group_items (default 3), ranked by relevance and excluded from injection tracking
Full changelog
What's Changed
Features
- #559: Add cross-group knowledge retrieval; sessions now see findings and FAIL verdicts from other task groups in the same spec via `[CROSS-GROUP]` prefix, capped at `max_cross_group_items` (default 3), ranked by keyword relevance, and excluded from injection tracking
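Ranking by keyword relevance and capping the result can be sketched in a few lines. The tokenizer and the overlap score below are assumptions; the notes only state that items are ranked by keyword relevance and limited by `max_cross_group_items`.

```python
import re


def keywords(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of at least three characters."""
    return {tok for tok in re.findall(r"[a-z0-9_]+", text.lower()) if len(tok) >= 3}


def select_cross_group(task_description: str, findings: list[str], max_cross_group_items: int = 3) -> list[str]:
    """Return the findings with the largest keyword overlap, tagged [CROSS-GROUP]."""
    task_words = keywords(task_description)
    # sorted() is stable, so findings with equal overlap keep their input order
    ranked = sorted(findings, key=lambda f: len(task_words & keywords(f)), reverse=True)
    return [f"[CROSS-GROUP] {f}" for f in ranked[:max_cross_group_items]]
```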
Fixes
- `__init__.py`: Sync `__version__` with `pyproject.toml` (was stuck at 3.5.0)
Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.5.1...v3.5.2
Fixed knowledge retrieval to surface FAIL verdicts from verification results.
Full changelog
What's Changed
Bug Fixes
- #553: Collapse list comprehension to single line for ruff format
- #554: Gate audit-review on active findings to trigger coder retry
- #555: Surface FAIL verdicts from verification_results in knowledge retrieval
- #556: Filter knowledge findings by task_group to avoid redundant injection
- #557: Rank findings and verdicts by task_description keyword overlap
- #558: Track injected findings per session to prevent re-injection after successful completion
Docs
- Rewrite architecture.md to reflect current codebase
Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.5.0...v3.5.1
Routine maintenance release for Agent-fox.
Changelog
Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.4.1...v3.5.0
- ADR ingestion pipeline (MADR parser, validator, DuckDB migration v22 for adr_entries table)
Full changelog
What's Changed
Bug Fixes
- fix(#547): Add errata markdown-to-DuckDB indexing path. Errata files in `docs/errata/` are now indexed into the DuckDB errata table, closing the write-only gap where errata were created but never retrievable
- fix(#548): Fix audit-review task_group partitioning causing supersession silos. Audit findings no longer use a hardcoded empty `task_group`, enabling proper supersession across review modes
- fix(#549): Move `steering.md` from `.agent-fox/specs/` to `.agent-fox/`
- fix(#546): Fix ruff format violations in harvest warning strings
- fix(#545): Serialize `AuditJsonlSink` writes with `threading.Lock` to prevent interleaved concurrent appends
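The lock-serialized append in #545 follows a common pattern, sketched here with a hypothetical `LockedJsonlSink` class. It is not the actual `AuditJsonlSink`; the event shape is arbitrary.

```python
import json
import threading
from pathlib import Path


class LockedJsonlSink:
    """Append-only JSONL writer whose writes are serialized by a lock."""

    def __init__(self, path: Path) -> None:
        self._path = path
        self._lock = threading.Lock()

    def emit(self, event: dict) -> None:
        line = json.dumps(event) + "\n"  # serialize outside the lock to keep it short
        with self._lock:  # only one thread appends at a time
            with self._path.open("a", encoding="utf-8") as fh:
                fh.write(line)
```

The lock guarantees that each line is written whole, so every line of the file parses as one JSON object even when several threads emit concurrently.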
Features
- feat: ADR ingestion pipeline (spec 117): MADR parser, validator, DuckDB migration v22 for `adr_entries` table, and integration into `FoxKnowledgeProvider` for retrieval during coder sessions
Documentation
- docs: ADR 07 — Define audit JSONL event format (envelope schema + complete event type catalog)
- docs: Code quality audit (specs 7–9)
- docs: Parking service 3.4.0 audit
Chores
- Bump version to 3.4.1
- Renamed CLI command `findings` → `insights`
Full changelog
What's Changed
Fixes
- #543: Drop dead knowledge system columns `retrieval_summary` and `coverage_data`
- #542: Fix ruff format violation in warning string
- #541: Fix ruff format violation in list comprehension
- #539: Add quick-triage bail-out to coder prompt
- #537: Rename CLI command `findings` to `insights`
- #536: Add AC-4 test and fix ruff format violation
- #534: Add AC-3 integration test for verifier dispatch without phantom task group
Chores
- Upgrade dependency version pins (pydantic, rich, duckdb, sentence-transformers, scikit-learn, pathspec, tree-sitter, pytest, ruff, mypy, and more)
- Update auto-generated errata dates
- Test coverage regression gate blocks tasks on decreased per-file line coverage
- Multi-language coverage tool detection: pytest-cov (Python), cargo-tarpaulin (Rust), go test -cover (Go)
- Coverage data stored in session outcomes for trend tracking (migration v20)
Full changelog
What's New
Features
- Test coverage regression gate — measures per-file line coverage before and after coder sessions; blocks the task if coverage decreases on modified files (#520)
- Multi-language coverage tool detection: pytest-cov (Python), cargo-tarpaulin (Rust), go test -cover (Go)
- Coverage data stored in session outcomes for trend tracking (migration v20)
- Blocking findings emitted via review_findings table on regression
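The core comparison behind a per-file coverage gate can be sketched as below. How agent-fox collects the numbers and how it treats files with no earlier measurement are not stated in the notes; here such files are simply skipped, which is an assumption.

```python
def coverage_regressions(
    before: dict[str, float],
    after: dict[str, float],
    modified: set[str],
) -> dict[str, tuple[float, float]]:
    """Return {path: (before, after)} for modified files whose line coverage dropped."""
    regressions = {}
    for path in sorted(modified):
        # Only files measured both before and after the session can regress.
        if path in before and path in after and after[path] < before[path]:
            regressions[path] = (before[path], after[path])
    return regressions
```

A gate built on this would block the task whenever the returned dictionary is non-empty.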
Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.3.0...v3.3.1
- Structured verification checklist for spec compliance (#521)
- State transition validation in GraphSync to catch illegal graph moves (#523)
- Eager pre-review with retry on predecessor failure restored (#519)
Full changelog
What's New
Features
- Verification checklist & task completion enforcement — structured verification checklist for spec compliance (#521)
- State transition validation in GraphSync — validates engine state transitions to catch illegal graph moves (#523)
- Eager pre-review with retry-predecessor — restores eager pre-review behavior with retry on predecessor failure (#519)
- Lightweight errata generation from blocking — reinstates errata generation when issues are blocked (#522)
- Knowledge system pruning — migration v18 removes causal links and dead knowledge modules (spec 116)
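State transition validation is typically a lookup in a table of allowed moves. The states and the table below are hypothetical, since the release notes do not list the rules GraphSync enforces; the sketch only shows the shape of such a check.

```python
from enum import Enum


class NodeState(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"
    BLOCKED = "blocked"


# Hypothetical transition table, for illustration only.
ALLOWED = {
    NodeState.PENDING: {NodeState.IN_PROGRESS, NodeState.BLOCKED},
    NodeState.IN_PROGRESS: {NodeState.COMPLETED, NodeState.FAILED, NodeState.BLOCKED},
    NodeState.FAILED: {NodeState.PENDING, NodeState.BLOCKED},
    NodeState.BLOCKED: {NodeState.PENDING},
    NodeState.COMPLETED: set(),
}


class IllegalTransition(ValueError):
    """Raised when a node is asked to move to a state it cannot reach."""


def validate_transition(current: NodeState, new: NodeState) -> None:
    """Raise IllegalTransition unless current -> new appears in the table."""
    if new not in ALLOWED[current]:
        raise IllegalTransition(f"{current.value} -> {new.value} is not allowed")
```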
Bug Fixes
- Fix `max_items` in property test to avoid retrieval cap masking failures
- Use Path-typed `specs_path` variable in plan_cmd (#516)
- Fix ruff format violation in RuntimeError f-string (#515)
- Add proper type annotations for embedder and backend variables (#514)
Refactoring
- Extract strategy classes from engine, fix_pipeline, and result_handler (#518)
- Inline single-consumer modules and deduplicate review parser
- Remove dead code and consolidate single-consumer modules (2 passes)
- Remove dead code and consolidate near-identical abstractions
- Delete dead knowledge modules (blocking_history, errata_store, gotcha_extraction, gotcha_store) and simplify provider
Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.2.0...v3.3.0
- Removed 40+ legacy knowledge pipeline modules (language analyzers, retrieval, consolidation, embeddings, etc.)
- Removed obsolete knowledge pipeline configuration options
- Removed onboard CLI command and legacy nightshift streams
- Decouple knowledge subsystem via `KnowledgeProvider` protocol (spec 114)
- Pluggable knowledge provider with gotcha extraction, errata store, and content hashing (spec 115)
- Wire `FoxKnowledgeProvider` into engine startup
Full changelog
What's Changed
Features
- knowledge: Decouple knowledge subsystem via `KnowledgeProvider` protocol (spec 114)
- knowledge: Pluggable knowledge provider with gotcha extraction, errata store, and content hashing (spec 115)
- engine: Wire `FoxKnowledgeProvider` into engine startup
Refactors
- knowledge: Delete 40+ legacy knowledge pipeline modules (lang analyzers, retrieval, consolidation, embeddings, etc.)
- config: Remove obsolete knowledge pipeline configuration options
- cli: Remove onboard command and legacy nightshift streams
Chores
- Supersede specs 112 (sleep time compute) and 113 (knowledge effectiveness)
- Fix Unicode edge case in content hash determinism property test
- Clean up leftover `__pycache__` directories in deleted knowledge subdirectories
- Pre-flight check to skip coder sessions when work is already done
Full changelog
What's Changed
Bug Fixes
- engine: Close AsyncAnthropic clients to prevent event loop shutdown crash (fixes #506)
- engine: Skip redundant cleanup ingestion when barrier already ran (fixes #505)
- knowledge: Always write agent trace JSONL for transcript reconstruction (fixes #507)
- Guard trace reconstruction behind debug flag to suppress spurious warning
Features
- engine: Add pre-flight check to skip coder sessions when work is done (fixes #511)
Other
- New specs 114 (knowledge decoupling), 115 (pluggable knowledge)
- Coding-session architecture documentation
- General cleanup
Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.1.3...v3.1.4
- Knowledge system effectiveness improvements (spec 113) covering transcript reconstruction, compaction, entity signals, cold-start handling, git extraction, audit consumption, retrieval validation, and prompt injection
Full changelog
What's Changed
Bug Fixes
- Budget exhaustion detection: Sessions that hit the SDK `max-budget-usd` limit are now detected and blocked immediately instead of being wastefully retried. The SDK returns `is_error=True` with no message on budget exhaustion; this was previously mapped to "Unknown error" and retried through the escalation ladder.
- AssessmentManager config: Pass `full_config` (not `OrchestratorConfig`) to `AssessmentManager`, fixing missing attribute errors.
- Escalation ladder starting tier: The escalation ladder now respects `config.models.coding` for the starting tier instead of always defaulting to STANDARD.
- Timed-out session metrics: Emit descriptive error messages and metrics for sessions that time out.
Features
- Knowledge system effectiveness (spec 113): Transcript reconstruction, compaction improvements, entity signal activation, cold-start handling, git extraction, audit consumption, retrieval quality validation, and audit prompt injection.
Other
- Parking service audit report
- Session budget increased for lengthy tasks
- If a run is stuck with audit-review tasks blocked by "Retry limit exceeded", clear the stale state using `agent-fox reset --spec`.
Full changelog
Bug Fixes
- engine: Move review concurrency cap before `_prepare_launch` to prevent phantom retry exhaustion (fixes #503)
  The review concurrency cap in `_fill_parallel_pool` was checked after `_prepare_launch()`, which increments the attempt tracker on "allowed" verdicts. When the single review slot was occupied, audit-review tasks were skipped but their attempt counter was already incremented. After `max_retries + 1` (default 3) such pool-refill cycles, the circuit breaker permanently blocked the task with "Retry limit exceeded" without ever starting a session. This cascade-blocked all downstream coding and verifier tasks, exceeding the block budget and halting the entire run.
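The ordering problem described above can be shown with a toy pool filler. `fill_pool` and its arguments are hypothetical stand-ins for `_fill_parallel_pool` and `_prepare_launch()`; the only point is that the concurrency cap is checked before the attempt counter is touched.

```python
from collections import Counter


def fill_pool(ready: list[tuple[str, bool]], attempts: Counter, review_slots_free: int) -> list[str]:
    """Launch ready tasks; each entry is (task_id, is_review).

    The review cap is checked first, so a task that is skipped for lack of
    a free review slot does not lose an attempt.
    """
    launched = []
    for task_id, is_review in ready:
        if is_review:
            if review_slots_free == 0:
                continue  # skipped: no attempt is recorded
            review_slots_free -= 1
        attempts[task_id] += 1  # stand-in for the bookkeeping done at launch
        launched.append(task_id)
    return launched
```

With the check in this order, any number of refill cycles with the review slot occupied leaves the skipped task's attempt count at zero.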
Recovery for affected runs
If you have a stuck run with audit-review tasks blocked by "Retry limit exceeded", clear the stale state:
agent-fox reset --spec <affected_spec_name>
Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.1.1...v3.1.2
Fixed death-loop caused by stale session‑scoped DB rows after reset.
Full changelog
Bug Fixes
- reset: clear session-scoped tables on reset to prevent `block_limit` death-loop (#501)
  After a `block_limit` run, `reset --hard` (and soft `reset`) left stale data in six session-scoped DB tables (`runs`, `session_outcomes`, `review_findings`, `verification_results`, `drift_findings`, `blocking_history`). The stale `runs.status='block_limit'` caused `load_state_from_db()` to load a terminal status, making the engine loop exit immediately on every subsequent `agent-fox code` invocation: a self-perpetuating death-loop with no CLI recovery path.
  All four reset paths (`reset_all`, `reset_task`, `reset_spec`, `hard_reset_all`/`hard_reset_task`) now clear session-scoped tables so that `plan` and `code` start from a clean state.
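Clearing the six session-scoped tables amounts to one transaction of deletes. The sketch below uses `sqlite3` from the standard library as a stand-in for DuckDB, and `clear_session_tables` is a hypothetical name; only the table names come from the note above.

```python
import sqlite3

# Table names as listed in the release note.
SESSION_SCOPED_TABLES = (
    "runs",
    "session_outcomes",
    "review_findings",
    "verification_results",
    "drift_findings",
    "blocking_history",
)


def clear_session_tables(conn: sqlite3.Connection) -> None:
    """Delete all rows from the session-scoped tables in a single transaction."""
    with conn:  # commits on success, rolls back if any statement fails
        for table in SESSION_SCOPED_TABLES:
            # Table names come from the constant above, never from user input.
            conn.execute(f"DELETE FROM {table}")
```

Doing all deletes in one transaction avoids a half-cleared state in which, for example, `runs` still holds a terminal status while the other tables are already empty.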
Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.1.0...v3.1.1
- Core protocol, orchestrator, and configuration schema for sleep-time tasks
- ContextRewriter task that rewrites and enriches knowledge context during idle periods
- BundleBuilder task that consolidates knowledge into bundles
Full changelog
What's New
Sleep-Time Compute (Spec 112)
A new knowledge-processing pipeline that runs background computation during idle periods:
- Core protocol & orchestrator — schema, configuration, and orchestration layer for sleep-time tasks
- ContextRewriter — sleep task that rewrites and enriches knowledge context
- BundleBuilder — sleep task that builds consolidated knowledge bundles
- Retriever & integration wiring — retrieval layer with full integration into the existing knowledge system
- Wiring verification — end-to-end verification of the sleep-time compute pipeline
Full Changelog
- feat(112): implement core protocol, orchestrator, config, and schema
- feat(112): implement ContextRewriter sleep task
- feat(112): implement retriever and integration wiring
- test(112): failing spec tests, checkpoint, and wiring verification
- --specs-dir flag added to plan and night-shift commands
- Progress spinner added to onboard command
Full changelog
What's Changed
Bug Fixes
- nightshift: exclude `.agent-fox/` from onboard file scanning (#499)
- nightshift: add `--specs-dir` flag to `plan` and `night-shift` commands (#498)
- nightshift: add progress spinner to `onboard` command (#497)
Other
- Updated config
Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.0.4...v3.0.5
Fixed triage task prompt causing parse failures.
Full changelog
What's Changed
- fix(nightshift): Triage agent now receives a triage-specific task prompt instead of the coder's "Fix the issue" prompt. This was the root cause of all triage parse failures — the agent would implement the fix instead of producing a JSON triage report.
- fix(tests): Knowledge wiring tests no longer leak `.specs/` directories into the working tree.
- Removed deprecated `extract_spec_name` wrapper
- Deleted backward-compatibility re-export shims: `session/archetypes`, `nightshift/config`, `knowledge/query`
- Three-tier priority scheduling places coders before reviews for better throughput
- Deferred review injection lazily promotes review nodes when slots are idle
- Review concurrency cap limits parallel pool size
Full changelog
What's Changed
Features
- Three-tier priority scheduling — coders scheduled before reviews for better throughput (#490)
- Deferred review injection — lazy promotion of review nodes when slots are idle (#491)
- Review concurrency cap in parallel pool (#489)
Performance
- Pre-review scheduling optimization for critical-path specs (#476)
- Skip LLM extraction for reviewer archetypes and short transcripts (#475)
Bug Fixes
- Cascade blocking through in_progress nodes to prevent downstream dispatch (#481)
- Use `datetime.now(UTC)` for run timestamps (#480)
- Remove duplicate `harvest.complete` emission (#482)
- Populate `commit_sha` in `git.merge` audit events (#484)
- Classify review findings into multiple categories (#485)
- Generate embeddings for consolidated and pattern facts
- Correct return type annotation in `_sort_key`
- Drop dead tables and remove orphaned code (#460)
- Annotate test-file errors and add false-positive guidance to hunt prompts (#493)
Refactoring
- Consolidate file-only language analyzers (HTML, JSON, regex) into `SimpleAnalyzer` base class
- Replace 17 try/except blocks in language registry with data-driven loop
- Merge `WorkStream` protocol into `streams.py`
- Remove deprecated `extract_spec_name` wrapper
- Delete backward-compatibility re-export shims (`session/archetypes`, `nightshift/config`, `knowledge/query`)
- Update ~75 import sites to canonical module paths
Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.0.2...v3.0.3
Fixed duplicate session outcomes rows and validated UUIDs in causal link parsing to prevent data loss.
Full changelog
Bug Fixes
- fix(engine): Remove duplicate `session_outcomes` rows. Every session was writing two DB entries, one with incomplete data (cost=0, model=NULL). The redundant sink-based insertion path has been removed; session outcomes are now written exclusively by `SessionResultHandler.process()`. (#473)
- fix(knowledge): Validate UUIDs in causal link parsing. The LLM sometimes returned truncated UUIDs or git SHAs instead of valid fact UUIDs, causing `ConversionException` in DuckDB and silently dropping all causal links for the session. `parse_causal_links()` now validates UUID format before returning. (#474)
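Validating UUID format before handing values to the database can be done with the standard `uuid` module. The helper names below are hypothetical and this is not the body of `parse_causal_links()`; it only illustrates filtering out truncated UUIDs and git SHAs.

```python
import uuid


def is_valid_uuid(value: str) -> bool:
    """True only for the canonical 36-character hyphenated UUID form."""
    try:
        # Round-tripping rejects inputs uuid.UUID would otherwise accept,
        # such as 32 hex digits without hyphens.
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


def filter_causal_links(links: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep only (cause_id, effect_id) pairs in which both ids are valid UUIDs."""
    return [(cause, effect) for cause, effect in links if is_valid_uuid(cause) and is_valid_uuid(effect)]
```

Dropping only the malformed pairs keeps the valid links for the session instead of losing all of them to a single conversion error.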
Other
- Updated README
- Updated default config.toml template for v3 in init
Full changelog
What's Changed
Bug Fixes
- nightshift: Eliminate contradictory 'skipping'/'Applied' log messages for migrations v5 and v10
- nightshift: Leave issue open when coder produces no commits (#466)
- nightshift: Prevent re-processing of closed issues in drain loop (#465)
- nightshift: Increment scan counter in `_run_issue_check` (#469)
- nightshift: Propagate `fix_run_id` from engine into `process_issue` (#468)
- nightshift: Populate runs and `session_outcomes` from fix pipeline (#467)
- nightshift: Add `run_id` to empty-body rejection comment and GitHub issue comments (#464)
- nightshift: Remove obsolete `memory.jsonl` creation from init (#461)
- nightshift: Move rev-list checks outside merge lock in `_sync_develop_with_remote` (#458)
- nightshift: Pass project root (not `.agent-fox` dir) as `repo_root` to barrier (#454)
- nightshift: Clean up stale running runs on orchestrator startup (#456)
- nightshift: Stop per-file row explosion in `session_outcomes` (#457)
- nightshift: Add AC-5 test for `record_tool_error` sink failure resilience (#459)
- harvest: Pass `embedder` to `extract_and_store_knowledge` (#453)
- harvest: Use feature branch commit message for squash merges
Improvements
- init: Update default `config.toml` template for v3
- refactor: Rename `agent_base.md` profile to `agent.md`
Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v3.0.0rc4...v3.0.1
- Legacy configuration keys for Skeptic, Oracle, and Auditor archetypes are automatically migrated but emit deprecation warnings.
- Projects using the default `.specs/` directory will see a deprecation warning; update `[paths] spec_root` in `config.toml` to the new location.
- Removed Skeptic, Oracle, and Auditor archetypes; replaced with unified Reviewer archetype having modes `pre-review`, `drift-review`, `audit-review`, and `fix-review`.
- Plan state storage moved from `plan.json` to DuckDB tables (`plan_meta`, `plan_nodes`).
- Spec root directory is now configurable via `[paths] spec_root` in `config.toml`; projects using `.specs/` receive deprecation warnings.
- Unified retriever with weighted Reciprocal Rank Fusion (RRF) combining keyword, vector, entity graph, and causal chain signals.
- First‑class customizable markdown agent profiles located in `.agent-fox/profiles/` with mode‑specific resolution; `agent-fox init --profiles` installs defaults.
Full changelog
agent-fox v3.0.0
The first stable release of agent-fox v3. This release completes the transition
from the v2 architecture to a consolidated, mode-based archetype system with
DuckDB-backed state management, adaptive knowledge retrieval, and comprehensive
documentation.
Highlights
Archetype Consolidation
The former Skeptic, Oracle, and Auditor archetypes are now unified into a
single Reviewer archetype with four modes: pre-review, drift-review,
audit-review, and fix-review. Legacy configuration keys are automatically
migrated with deprecation warnings. The archetype registry now contains four
entries: Coder, Reviewer, Verifier, and Maintainer.
Adaptive Knowledge Retrieval
A new unified retriever fuses four signals — keyword, vector, entity graph,
and causal chain — via weighted Reciprocal Rank Fusion (RRF). Intent profiles
adjust signal weights per archetype and task status. Salience-based token
budgeting ensures the most relevant facts get full detail while staying within
context limits.
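Weighted Reciprocal Rank Fusion scores each item by summing, over all signals, the signal's weight divided by a constant plus the item's rank in that signal. A minimal sketch follows; the constant `k = 60` is the value commonly used for RRF, and both it and the example weights are assumptions rather than agent-fox's settings.

```python
from collections import defaultdict


def weighted_rrf(
    rankings: dict[str, list[str]],
    weights: dict[str, float],
    k: int = 60,
) -> list[tuple[str, float]]:
    """Fuse ranked lists: score(d) = sum over signals of weight / (k + rank), ranks starting at 1."""
    scores: defaultdict[str, float] = defaultdict(float)
    for signal, ranked_ids in rankings.items():
        weight = weights.get(signal, 1.0)  # unlisted signals default to weight 1.0
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += weight / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Four signals as named in the text; the fact ids are made up.
fused = weighted_rrf(
    {"keyword": ["a", "b", "c"], "vector": ["b", "c"], "entity": ["c"], "causal": []},
    weights={"keyword": 1.0, "vector": 1.0, "entity": 1.0, "causal": 1.0},
)
```

Because only ranks enter the formula, signals with incomparable raw scores (keyword match counts, cosine similarity, graph distance) can be combined without normalization. Adjusting the weights per archetype is then a matter of passing a different `weights` mapping.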
Agent Profiles
Profiles are now first-class, customizable markdown files that define agent
behavioral guidance. Projects can override any profile via
.agent-fox/profiles/ with mode-specific resolution. Run
agent-fox init --profiles to install defaults for customization.
DuckDB Plan Persistence
Plan state is now stored in DuckDB tables (plan_meta, plan_nodes) instead
of plan.json, consolidating all persistent state in a single store.
Configurable Spec Root
The spec root directory is now configurable via [paths] spec_root in
config.toml (default: .agent-fox/specs). Projects using .specs/ are
auto-detected with a deprecation warning.
What's Changed (since v3.0.0-rc6)
Features
- Configurable spec root directory (#371)
- Bash, HTML, JSON, CSS, regex, and Swift language analyzers (#426)
- Agent base profile replaces CLAUDE.md in Layer 1 (#430)
- Mode-specific reviewer profiles to prevent schema cross-contamination
- Templates field on ModeConfig and ArchetypeEntry
- Schema, data models, and entity store for spec 95
Fixes
- Embedding dimension assertion in allowlist before SQL interpolation (#346)
- Hot-load queries plan_nodes DB table instead of plan.json (#444)
- Blocking history and learned thresholds migration (#449)
- Wire config.models.coding into resolve_model_tier for coder archetype
- Squash merge in harvest fallback to eliminate double-commit pattern
- Night-shift: replace af:fix removal with af:fixed label on issue closure (#429)
- Hollow generate_status test and production bug (#428)
- Include archived specs in dependency validation
Documentation
- Complete documentation audit and update for v3
- New profiles guide (`docs/profiles.md`)
- Expanded prompt generation section in architecture docs
- All legacy Skeptic/Oracle/Auditor references updated to mode-based terminology
- CLI reference: added `--profiles`, `findings`, and `onboard` commands
- Config reference: removed stale hooks section, added missing pricing entries
- Architecture docs verified against source code
Dependencies
- Upgraded anthropic to 0.96 and claude-agent-sdk to 0.1.60
Installation
uv tool install agent-fox
Full Changelog
https://github.com/agent-fox-dev/agent-fox/compare/v3.0.0-rc6...v3.0.0
- Nightshift cost tracking (spec 91) with SinkDispatcher plumbing, auxiliary and quality‑gate paths
- Transient audit reports moved to .agent-fox/audit/ with PASS deletion and spec completion cleanup
Full changelog
What's Changed
Features
- Nightshift cost tracking (spec 91): Wire SinkDispatcher plumbing, auxiliary cost tracking, and quality gate cost tracking path
- Transient audit reports (spec 92): Move audit reports to `.agent-fox/audit/`, add PASS deletion and spec completion cleanup
Bug Fixes
- #330: Guard `fetchone()` result against `None` before indexing in nightshift
- #329: Correct return type annotation of `_default_config` from `object` to `AgentFoxConfig`
Housekeeping
- Moved implemented specs to `.specs/archive/`
Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v2.9.0...v2.9.1
- Scope guard subsystem (parser, validator, detector, checker, builder, classifier, telemetry)
- fix_coder archetype with dedicated `fix_coding.md` template integrated into fix pipeline
- Fact lifecycle management (deduplication, decay, cleanup, LLM contradiction detection) in harvest pipeline
Full changelog
What's New
Features
- Scope guard subsystem (spec 87) — source parser, stub validator, overlap detector, preflight checker, prompt builder, session classifier, and telemetry persistence
- fix_coder archetype (spec 88): dedicated archetype with `fix_coding.md` template, wired into the fix pipeline
- Fact lifecycle management (spec 90): dedup, decay, cleanup, and LLM-based contradiction detection wired into the harvest pipeline and sync barrier
- Simplified model routing (spec 89) — removed prediction pipeline, feature enrichment, duration estimation, and calibration modules; routing now uses ladder-based assessment only
Fixes
- Wire SDK Notification hook for activity progress events (#320)
- Guard `fetchone()` results against `None` in `run_cleanup`
- Add `routing_assessments` and `routing_pipeline` params to `SessionResultHandler` (#325)
- Wire `fix_coder` archetype into fix pipeline
- Fix type errors and stale test assertions across nightshift tests
Maintenance
- Updated dependencies: `claude-agent-sdk` 0.1.52 → 0.1.58, `anthropic` 0.84.0 → 0.93.0, `ruff` 0.15.4 → 0.15.10, and all transitive deps
- Removed ~2,500 lines of dead prediction/routing code (assessor, calibration, duration, features modules)
- Added scope guard and SDK improvement documentation
- Rewrite of fix pipeline using triage and fix_reviewer archetypes (spec 82)
- Added triage and fix_reviewer archetype registration, prompt templates, data types, and parse functions
Full changelog
What's Changed
Bug Fixes
- fix: map SDK TextBlock to AssistantMessage in ClaudeBackend. `_map_message()` silently dropped `TextBlock` content blocks, so the agent's actual text response (including JSON findings/verdicts from review archetypes) was never captured. Skeptic, verifier, and oracle always fell back to parsing markdown metadata, producing 100% parse failures.
- fix: capture review archetype response text for parsing. Added `response` field to `SessionOutcome` and wired it through `_extract_knowledge_and_findings()` so review parsers receive the agent's actual output instead of a fallback transcript.
Features
- feat: rewrite fix pipeline with triage/reviewer archetypes (spec 82). Replaced skeptic/verifier in the fix pipeline with purpose-built `triage` and `fix_reviewer` archetypes. Triage produces structured acceptance criteria from GitHub issues; fix_reviewer verifies coder changes against those criteria with per-criterion PASS/FAIL verdicts. Includes retry loop with escalation ladder.
- feat: add triage and fix_reviewer archetype registration and prompt templates
- feat: add triage and fix-review data types and parse functions
Tests
- Added unit, property, and integration smoke tests for the new fix pipeline (spec 82)
Other
- Multiple type annotation and test fixture fixes
- New specs: 82 (fix pipeline triage/reviewer), 83 (lint-spec coverage gaps)
- Night‑shift issue‑first gate drains `af:fix` labeled issues before/after hunt scans with fail‑open semantics
- Added `activity_callback`, `task_callback`, and `status_callback` to NightShiftEngine and FixPipeline with per‑archetype TaskEvent emission
- Integrated ProgressDisplay for phase/idle status rendering in the Night‑shift CLI
Full changelog
What's Changed
Features
- Night-shift issue-first gate: Issues with `af:fix` label are now drained before and after hunt scans, with fail-open semantics for platform API failures
- Callback plumbing: Added `activity_callback`, `task_callback`, and `status_callback` to NightShiftEngine and FixPipeline, with per-archetype TaskEvent emission
- Night-shift CLI display: Integrated ProgressDisplay with phase/idle status rendering
Improvements
- Consolidated duplicated utilities and decoupled cross-module imports
Documentation
- Added architecture documentation suite (spec authoring, planning, execution, night-shift)
- Added coding harness analysis comparing agent-fox to Raschka's framework
Tests
- Added integration smoke tests for night-shift wiring verification
- Added unit tests for issue-first gate, callbacks, and display integration
Fixed broken AI analysis import, isolated DuckDB tests, closed SDK response stream leaks.
Full changelog
Quality gate and type safety fixes
This release fixes issues surfaced by the night-shift daemon's quality gate scan.
Bug fixes
- Broken import in quality gate AI analysis: `quality_gate.py` imported the nonexistent `get_client`; fixed to use `create_async_anthropic_client`. AI-powered finding analysis now works instead of falling back to mechanical findings.
- Test isolation for DuckDB: `test_cost_limit_terminates` hit the real knowledge database during parallel test execution, causing lock contention failures. It now properly mocks oracle context to avoid shared state.
- SDK response stream leak: Close SDK response stream before client teardown to prevent `ProcessError` during async generator cleanup (#215).
Type safety improvements
- Replaced `object` parameter types with `PlatformProtocol` in `dedup.py` and `finding.py`, removing stale `type: ignore` comments.
- Added proper type casts in `config_schema.py` for nested model extraction.
- Fixed `classmethod` decorator typing in config validator factory.
- Added `assert match is not None` guards in prompt safety tests.
- Typed `_tgd` test helpers and task tuple lists across 10 test files.
Lint fixes
- Resolved all 8 ruff errors: sorted imports in `resolver.py`, added `noqa: E402` for intentional late imports in `tests/conftest.py`, and auto-fixed import ordering in steering and graph test files.
- Spec 79: Hunt scan cross-iteration deduplication
- Spec 80: Worktree cleanup hardening
Full changelog
Night-shift daemon fixes
This release fixes critical wiring gaps in the night-shift autonomous maintenance daemon that prevented it from operating correctly.
Bug fixes
- Fix branch creation: `_create_fix_branch()` was defined but never called, so archetype sessions ran on whatever branch was checked out instead of a dedicated fix branch. Issue closure is now gated on successful harvest.
- Scheduled re-polling: Issue checks and hunt scans only ran once at startup, then the daemon spin-looped idle. They now repeat at configured intervals (`issue_check_interval`, `hunt_scan_interval`).
- Cost/session tracking: `state.total_cost`, `state.total_sessions`, and `state.issues_created` were never updated, making cost limits ineffective. Sessions now report token usage back to the engine for cost calculation.
- Session limit enforcement: `orchestrator.max_sessions` was never checked in the night-shift engine (61-REQ-9.3 compliance).
- Develop checkout restoration: After fix sessions, the repo stayed on the fix branch. The engine now restores `develop` after harvest so the next issue starts clean.
Other fixes
- Resolved 8 pre-existing test failures across CLI override handling, review-only graph construction, and worktree hardening property tests.
- Fixed flaky parallel test execution caused by deprecated `asyncio.get_event_loop()` usage.
Specs
- Spec 79: Hunt scan cross-iteration deduplication
- Spec 80: Worktree cleanup hardening
Fixed flaky test failures by disabling hypothesis deadline.
Full changelog
What's Changed
- fix: disable hypothesis deadline globally to eliminate flaky test failures
- fix: disable hypothesis deadline on flaky property tests
- chore: bump version to 2.7.2
Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v2.7.1...v2.7.2
Increased default session limits to prevent premature failures.
Full changelog
What's Changed
- fix: correct staleness fallback when AI evaluation fails
- fix: resolve four night-shift integration gaps (#226 #227 #228 #229)
- fix: harvest branch and close issue after night-shift fix pipeline (#225)
- fix: increase default session limits to prevent premature failures (#205)
- fix: include enriched feature vector fields in StatisticalAssessor (#206)
- docs+chore: integration gap analysis and strengthened af-spec template (#230)
- style: auto-format code with ruff
Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v2.7.0...v2.7.1
- Implement watch loop core with `--watch` and `--watch-interval` flags, watch gate, stall detection (spec 70)
- Add CachePolicy config and `cached_messages_create()` helper; migrate auxiliary modules to cached API (spec 77)
- Make feature branches local‑only, push only develop (spec 78)
Full changelog
What's Changed
Features
- feat(watch): implement watch loop core: `--watch` and `--watch-interval` CLI flags, watch gate, stall detection (spec 70)
- feat(caching): add CachePolicy config and `cached_messages_create()` helper; migrate all auxiliary modules to cached API (spec 77)
- feat(harvest): make feature branches local-only, push only develop (spec 78)
- feat(fix): add FixProgressEvent/CheckEvent types, wire ProgressDisplay and callbacks (spec 76)
- feat(engine): timeout-aware escalation with per-node retry logic (spec 75)
- feat(engine): tolerant review parser with fuzzy wrapper key matching and field normalization (spec 74)
- feat(engine): auto-reset blocked tasks on engine resume; clear attempt tracker on reset
- feat(nightshift): AI critic for finding consolidation, batch triage, post-fix staleness check (spec 73)
- feat(nightshift): reference parsing, dependency graph, and edge merging
- feat(reporting): active tasks in status command (spec 72)
- feat(platform): sort/direction params for `list_issues_by_label`
- feat(ui): show agent archetype in spinner line
- feat(config): timeout retry configuration fields in RoutingConfig
Fixes
- fix(cli): enforce CLI separation by delegating to backing modules (fixes #210)
- fix(barrier): run knowledge compaction during sync barriers (fixes #211)
- fix(reporting): display agent archetype in status and standup output (fixes #216)
- fix(retry): retry on network-level transport errors (fixes #208)
- fix: handle SIGTERM gracefully and prune stale worktrees before branch deletion
- fix(tests): reset agent_fox logger between tests to fix xdist flakiness
- fix(tests): mock `_setup_infrastructure` in run_code tests to prevent MagicMock directory leak
Documentation
- docs(watch): add `--watch` and `--watch-interval` to CLI reference
- docs(caching): add `[caching]` section to configuration reference
- docs(spec-78): update AGENTS.md for local-only feature branch workflow
Chores
- Archived completed specs (59–76)
- New specs: 77 (prompt caching), 78 (local-only feature branches)
Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v2.6.2...v2.7.0
Minor fixes and improvements.
Full changelog
What's Changed
- refactor: simplify engine, platform factory, and package re-exports
- specs: add spec 74 (review parse resilience), spec 75 (timeout-aware escalation), spec 76 (fix progress display)
Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v2.6.1...v2.6.2
- Simplified config template generation with visible sections and quality defaults (spec 68)
- Round‑robin spec‑fair scheduling across specs (spec 69)
- Watch mode added via watch_interval config field and WATCH_POLL audit event type (spec 70)
Full changelog
What's New in 2.6.1
- Config simplification (spec 68): simplified config template generation with visible sections and quality defaults
- Spec-fair scheduling (spec 69): round-robin scheduling across specs
- Watch mode (spec 70): watch_interval config field and WATCH_POLL audit event type
- Fix ordering (spec 71): spec and task ordering improvements
- Status command: show active agents in status output
- Develop sync fix: use `update-ref` instead of `branch -f` to avoid failures when develop is checked out in a worktree
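The develop sync fix can be illustrated with a small sketch. Git refuses to force-move a branch that is checked out in a worktree, whereas `git update-ref` rewrites the ref directly. The helper name `build_develop_sync_cmd` and its parameters are hypothetical and are not taken from the agent-fox codebase.

```python
from typing import List


def build_develop_sync_cmd(target_sha: str, branch: str = "develop") -> List[str]:
    """Build a git command that points `branch` at `target_sha`.

    Hypothetical helper: it uses `git update-ref` on the fully qualified
    ref name rather than `git branch -f`, which git rejects when the
    branch is checked out in a worktree.
    """
    return ["git", "update-ref", f"refs/heads/{branch}", target_sha]


# Example: the argument list a caller could hand to subprocess.run().
cmd = build_develop_sync_cmd("0123abc")
```

Because `update-ref` bypasses the checked-out-branch safety check, a caller would want to be sure the worktree holding `develop` is not mid-edit before moving the ref.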
- Removal of the Coordinator (spec 62) breaks existing workflows.
- Night-shift engine, CLI command, and audit events (spec 61)
- Plan always-rebuild behavior (spec 63)
- CLI separation and logging improvements (spec 59)
Full changelog
Release 2.6.0
Highlights:
- Night-shift engine, CLI command, and audit events (spec 61)
- Coordinator removal (spec 62)
- Plan always-rebuild (spec 63)
- CLI separation and logging improvements (spec 59)
- End-of-run discovery (spec 60)
- Steering document spec (spec 64)
- Various archived specs (52–58) moved to archive
Fixed session failures caused by identical main and fallback models and corrected CLI flag naming for extra_args.
Full changelog
What's Changed
Bug Fixes
- fix(engine): skip fallback model when it equals the main model. The default fallback model (`claude-sonnet-4-6`) is the same as the STANDARD tier model used for coder sessions. The Claude CLI rejects `--fallback-model` when it matches the primary model, causing sessions to fail with "Fallback model cannot be the same as the main model." Now the fallback is omitted when it equals the session's model ID.
- fix(session): use hyphenated CLI flag names for Claude SDK extra_args. `--max_budget_usd` and `--fallback_model` were passed with underscores instead of hyphens, causing "unknown option" errors when budget or fallback model were configured.
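Both fixes can be summarized in one sketch: flags are spelled with hyphens, and the fallback flag is dropped when it would duplicate the primary model. The function `build_cli_flags` and its signature are assumptions made for illustration; the real code passes these values through the Claude SDK's `extra_args`, whose exact shape is not shown in these notes.

```python
from typing import List, Optional


def build_cli_flags(
    model_id: str,
    fallback_model: Optional[str] = None,
    max_budget_usd: Optional[float] = None,
) -> List[str]:
    """Return hyphenated CLI flags for a session (hypothetical helper)."""
    flags: List[str] = []
    if max_budget_usd is not None:
        # Hyphenated spelling; the underscore form was reported as an unknown option.
        flags += ["--max-budget-usd", str(max_budget_usd)]
    if fallback_model and fallback_model != model_id:
        # The CLI rejects a fallback identical to the main model, so omit it.
        flags += ["--fallback-model", fallback_model]
    return flags
```

With the default configuration described above, `build_cli_flags("claude-sonnet-4-6", fallback_model="claude-sonnet-4-6")` yields an empty list, so no fallback flag reaches the CLI.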
Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v2.5.1...v2.5.2
Fixed session CLI flag naming to use hyphens instead of underscores, correcting unknown option errors.
Full changelog
What's Changed
Bug Fixes
- fix(session): use hyphenated CLI flag names for Claude SDK extra_args. `--max_budget_usd` and `--fallback_model` were passed with underscores instead of hyphens, causing "unknown option" errors when budget or fallback model were configured.
- fix(engine): reset sub-task checkboxes during hard reset (fixes #163)
- fix(workspace,platform): sanitize error messages to prevent path and API detail leakage (fixes #192)
- fix(knowledge): escape markdown special characters in rendered output (fixes #193)
- fix(cli): restrict config file and directory permissions to owner-only (fixes #191)
- fix(core): validate LLM JSON responses with field-level constraints (fixes #186)
- fix(knowledge): add SQL table allowlist for query safety (fixes #188)
- fix(workspace): validate git ref names to prevent command injection (fixes #189)
- fix(session): validate and cap review parser output sizes (fixes #187)
- fix(core): add prompt content sanitization for injection defense (fixes #190)
- fix(cli): add Claude CLI version and settings validation (fixes #185)
Refactoring
- refactor: consolidate duplicated utilities and review parser logic — extracted shared audit helper, decomposed engine.py god-class, extracted SDK parameter resolution, inlined single-consumer blocking.py, split knowledge/query.py into focused modules. Net reduction of ~736 lines.
New Specs
- Spec 59: CLI Separation and Logging Improvements
- Spec 60: End-of-Run Discovery
Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v2.5.0...v2.5.1
- Removed multi-backend abstraction; system now Claude‑only (spec 55).
- Deleted entire `tools/` package including fox tools and MCP server.
- Knowledge feedback loop with automated extraction, causal context, fallback inputs, and embedding generation (spec 52)
- Review persistence: structured parsing and DB storage of review findings across skeptic, verifier, oracle archetypes; retry context injection for coder retries; review‑only CLI mode (spec 53)
- Quality gate complexity assessment with feature vector enrichment and heuristic evaluation (spec 54)
Full changelog
What's New in v2.5.0
New Features
- Knowledge feedback loop (spec 52) — automated knowledge extraction with causal context, fallback inputs, and embedding generation
- Review persistence (spec 53) — structured parsing and DB storage of review findings from skeptic, verifier, and oracle archetypes; retry context injection for coder retries; review-only CLI mode
- Quality gate complexity assessment (spec 54) — quality gate execution with feature vector enrichment and heuristic assessment
- SDK feature adoption (spec 56) — max_turns, max_budget_usd, fallback_model, and thinking configuration with hierarchical defaults and archetype overrides
- Archetype model tiers (spec 57) — per-archetype default model tiers (ADVANCED for review archetypes, STANDARD for coder) with config overrides and ADVANCED escalation ceiling
- Predecessor escalation (spec 58) — escalation ladder awareness in retry logic; predecessors only block after exhausting all ladder levels
Architecture
- Claude-only commitment (spec 55): removed multi-backend abstraction; simplified to Claude-exclusive backend with ADR documentation
- Removed fox tools and MCP server: deleted the entire `tools/` package (server, registry, edit, read, search, outline) in favor of Claude Code's native tooling
- Code simplification: eliminated backward-compat shims, consolidated duplicated review parsing logic, extracted `AssessmentManager` and sync barrier sequence from Orchestrator, split long methods into focused units
Bug Fixes
- Fixed missing severity normalization in engine review parser (accepted invalid severity values)
- Fixed redundant archetype resolution in launch preparation
- Block shell metacharacters in command allowlist (fixes #178)
- Harden spec name, improve error redaction and migration handling (fixes #179)
Full changelog
What's Changed
- fix(security): block shell metacharacters in command allowlist (fixes #178) by @mickume in https://github.com/agent-fox-dev/agent-fox/pull/180
- fix(security): harden spec name, merge lock, error redaction, migration dim (fixes #179) by @mickume in https://github.com/agent-fox-dev/agent-fox/pull/181
Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v2.4.5...v2.4.6
- Removed functions: resolve_tier_ceiling, ensure_blocking_tables, create_circuit_breaker_event and unused tools/_file_io.py re-export shim.
- Circular dependency between cli and knowledge broken by moving path constants to core/paths.
Full changelog
What's Changed
Refactoring & Simplification
- Remove dead code: `resolve_tier_ceiling`, `ensure_blocking_tables`, `create_circuit_breaker_event`, unused `tools/_file_io.py` re-export shim
- Consolidate duplicated insert/query logic in `review_store.py` via shared helpers (`_insert_with_supersession`, `_query_active`)
- Consolidate three identical missing-section fixers via `_append_missing_section`
- Consolidate `hard_reset_all`/`hard_reset_task` shared logic via `_perform_hard_reset`
- Replace archetype if/elif dispatch with dict lookup in `session_lifecycle.py`
- Unify coverage matrix and traceability table validators via `_check_section_with_table`
- Extract `_count_node_status` helper in orchestrator engine
- Break `cli ↔ knowledge` circular dependency by moving path constants to `core/paths`
Sync Barrier Hardening (spec 51)
- Worktree verification and orphan detection
- Bidirectional develop branch sync with merge lock
- Hot-load gate pipeline (tracking → completeness → linting)
- Parallel drain before barrier entry
- Comprehensive test coverage (unit, property, integration)
Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v2.4.4...v2.4.5
Improved blocked task error messages with clearer root‑cause details.
Full changelog
What's Changed
Code Simplification & Refactoring
- Centralized node ID parsing: New `core/node_id.py` module with `parse_node_id()` and `spec_name_of()`, replacing 8 scattered `node_id.split(":")` patterns across the codebase
- Consolidated path constants: Expanded `cli/paths.py` with `PLAN_PATH`, `STATE_PATH`, `MEMORY_PATH`, `AUDIT_DIR`, replacing inline path construction
- Merged tool utilities: Combined `tools/hashing.py` + `tools/_file_io.py` into `tools/_utils.py`
- Merged routing data+persistence: Combined `routing/types.py` + `routing/storage.py` into `routing/core.py`
- Extracted blocking logic: New `engine/blocking.py` (145 lines). Skeptic/oracle blocking evaluation is now a pure function with a `BlockDecision` return type, reducing `engine.py` from 1622 to 1522 lines
Bug Fixes (from v2.4.3)
- Fixed log/spinner interference with `LiveAwareHandler`
- Suppressed third-party warnings during orchestrator runs
- Improved blocked task error messages with root cause
Fixed blocked task errors to include the root cause of the last failed attempt.
Full changelog
What's Changed
- Fix log/spinner interference: Log messages now route through Rich's Live console when the progress spinner is active, preventing corrupted display output
- Suppress third-party warnings: HF Hub and sentence-transformers warnings no longer leak into the spinner display
- Better blocked task messages: Blocked task errors now include the root cause from the last failed attempt
- Status report accuracy: Fixed status filtering for archetype nodes with state overlay
- `reset --spec <spec>`: resets all tasks for a single spec to pending, cleans worktrees/branches, and synchronizes `tasks.md` and `plan.json` without rolling back git or compacting knowledge
Full changelog
New Features
- `reset --spec <spec_name>`: Spec-scoped reset command that resets all tasks (coder + archetype nodes) belonging to a single spec to `pending`, cleans worktrees/branches, and synchronizes `tasks.md` and `plan.json`. It performs no git rollback or knowledge compaction, so it is safe for re-executing one spec without affecting others. Mutually exclusive with `--hard` and positional `<task_id>`.
Bug Fixes (from v2.4.1)
- plan/status: exclude archetype nodes from task counts. Injected archetype nodes were inflating totals; now only coder nodes are counted, with review nodes shown separately.
- status: honour tasks.md checkbox state. Status now seeds from graph statuses (reflecting `[x]` checkboxes) before overlaying orchestrator state.
- plan: propagate completion to archetype nodes. Archetype nodes for completed specs no longer appear in the execution order.
- git: prevent infinite hang on expired PAT. `run_git()` now sets `GIT_TERMINAL_PROMPT=0` and enforces timeouts (60s/120s) to prevent credential prompt hangs.
Fixed task count inflation by excluding archetype nodes and honored manual checkbox completions in status.
Full changelog
Bug Fixes
- plan/status: exclude archetype nodes from task counts. Injected archetype nodes (skeptic, oracle, verifier, auditor) were counted alongside real task groups, inflating totals and making progress appear lower than actual. Plan and status now report only coder nodes in task counts, with review nodes shown separately.
- status: honour tasks.md checkbox state. When state.jsonl existed, the status command ignored tasks.md `[x]` checkboxes for manually completed work. It now seeds from graph statuses first (reflecting checkboxes), then overlays orchestrator state.
- plan: propagate completion to archetype nodes. Archetype nodes were always shown as pending in the execution order, even when all coder tasks in their spec were completed. They are now marked as completed when all coder nodes in their spec are done.
- git: prevent infinite hang on expired PAT. `run_git()` had no timeout and did not suppress interactive credential prompts. When a PAT expired, git commands would hang indefinitely. It now sets `GIT_TERMINAL_PROMPT=0` and enforces timeouts (60s default, 120s for remote operations).
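The expired-PAT fix combines two standard mechanisms: git's `GIT_TERMINAL_PROMPT=0` environment variable, which makes git fail instead of prompting for credentials, and the `timeout` argument of `subprocess.run`. The sketch below is not the project's `run_git()`; the names `run_git_sketch`, `non_interactive_env`, and `timeout_for` are hypothetical, and only the 60s/120s values come from the notes above.

```python
import os
import subprocess
from typing import Dict, Optional, Sequence

LOCAL_TIMEOUT_S = 60.0    # default timeout from the release notes
REMOTE_TIMEOUT_S = 120.0  # timeout for remote operations


def non_interactive_env() -> Dict[str, str]:
    """Copy the current environment and disable git credential prompts."""
    env = dict(os.environ)
    env["GIT_TERMINAL_PROMPT"] = "0"
    return env


def timeout_for(remote: bool) -> float:
    """Pick the longer timeout for operations that talk to a remote."""
    return REMOTE_TIMEOUT_S if remote else LOCAL_TIMEOUT_S


def run_git_sketch(
    args: Sequence[str], *, remote: bool = False, cwd: Optional[str] = None
) -> subprocess.CompletedProcess:
    """Run git without prompts; raises subprocess.TimeoutExpired on a hang."""
    return subprocess.run(
        ["git", *args],
        cwd=cwd,
        env=non_interactive_env(),
        capture_output=True,
        text=True,
        timeout=timeout_for(remote),
    )
```

A caller would typically catch `subprocess.TimeoutExpired` and report the failure rather than letting the whole run stall.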
Fixed single-class calibration crash and suppressed HuggingFace Hub auth warning.
Full changelog
What's Changed
Fixes
- Suppress HuggingFace Hub auth warning and fix single-class calibration crash
Documentation
- Slim down root README to a concise project hook with install and quick start
- Move detailed documentation (archetypes, model routing, fox tools, spec-driven development) into `docs/`
- Add `docs/README.md` as the central documentation index
Full Changelog: https://github.com/agent-fox-dev/agent-fox/compare/v2.3.3...v2.4.0
- Add `orchestrator.max_blocked_fraction` to TOML configuration to enable early stop on high block rates.
- Review and adjust `archetypes.skeptic_settings.block_threshold` and `archetypes.oracle_settings.block_threshold` as needed for desired blocking behavior.
- Skeptic & Oracle archetypes enforce blocking when critical findings exceed configurable `block_threshold` (default Skeptic = 3, Oracle advisory-only if omitted).
- New orchestrator config `max_blocked_fraction` stops runs early if blocked nodes reach the set fraction (e.g., 0.4 for 40%).
Full changelog
What's New
Skeptic & Oracle Blocking Enforcement
Skeptic and oracle archetypes now enforce blocking decisions in the engine. When a skeptic or oracle session completes, the engine queries its persisted review findings, counts critical findings against the configured block_threshold, and cascade-blocks the downstream coder task (and all its dependents) when the threshold is exceeded.
- Skeptic: blocks when critical findings exceed `archetypes.skeptic_settings.block_threshold` (default: 3)
- Oracle: blocks when critical drift findings exceed `archetypes.oracle_settings.block_threshold`; remains advisory-only when threshold is omitted (the default)
- Blocking decisions are recorded to the `blocking_history` table for threshold learning
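The threshold rule reads as a strict comparison, which a minimal sketch can make concrete. The function name `should_block` is hypothetical; the only behavior taken from the notes is that blocking happens when critical findings exceed the threshold and that a missing threshold means advisory-only.

```python
from typing import Optional


def should_block(critical_count: int, block_threshold: Optional[int]) -> bool:
    """Decide whether a review session blocks its downstream coder task.

    Assumes "exceed" means strictly greater than the threshold, and that
    `None` stands for an omitted threshold (advisory-only).
    """
    if block_threshold is None:
        return False
    return critical_count > block_threshold


# Skeptic with the default threshold of 3.
skeptic_blocks = should_block(critical_count=4, block_threshold=3)
# Oracle with no threshold configured never blocks.
oracle_blocks = should_block(critical_count=10, block_threshold=None)
```

Under this reading, exactly three critical findings against a threshold of 3 would not block.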
Block Budget
New orchestrator.max_blocked_fraction config option (default: disabled). When set, the engine stops the run early if the fraction of blocked nodes reaches the configured threshold, preventing wasted cost on doomed sessions when a systemic failure blocks a significant portion of the task graph.
[orchestrator]
max_blocked_fraction = 0.4 # stop if 40%+ of tasks are blocked
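As a rough illustration of the block budget, the check below stops a run once the blocked share of nodes reaches the configured fraction. The function name and the handling of an empty graph are assumptions; the notes only state that the option is disabled by default and that the run stops when the fraction is reached.

```python
from typing import Optional


def block_budget_exhausted(
    blocked: int, total: int, max_blocked_fraction: Optional[float]
) -> bool:
    """Return True when the run should stop early (hypothetical helper)."""
    if max_blocked_fraction is None or total == 0:
        # Disabled by default; an empty graph has nothing to block (assumption).
        return False
    return blocked / total >= max_blocked_fraction


# With max_blocked_fraction = 0.4, 4 blocked nodes out of 10 stops the run.
stop = block_budget_exhausted(blocked=4, total=10, max_blocked_fraction=0.4)
```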
Other Changes
- refactor: decompose `engine.py` and `workspace.py` god-modules into focused submodules (`circuit.py`, `serial.py`, `injection.py`, `develop.py`, `git.py`, `worktree.py`)
- refactor: replace dict-based plan data with typed `TaskGraph` in engine
- feat: add retry with backoff to all Anthropic API calls
- fix: remove broken `audit` and `serve-tools` CLI commands (#174)
- fix: update `dump_knowledge.py` to discover all DuckDB tables dynamically
- refactor: consolidate duplicated API usage tracking and findings rendering
- refactor: remove redundant DuckDB writes and double-sort
- Introduced read_all_facts() with a 3-tier fallback strategy for resilient fact reading
Full changelog
What's Changed
Bug Fixes
- Resilient fact reading with automatic fallback: Introduced `read_all_facts()` with a 3-tier fallback strategy (provided connection → read-only DuckDB → JSONL file), so reading facts always works regardless of DB availability.
- Fixed empty `docs/memory.md`: `render_summary()` was called without a DuckDB connection in the engine, producing an empty file every time. It now uses the fallback pipeline and receives the active connection from the Orchestrator.
- Simplified status command: Replaced manual DuckDB/JSONL fallback in `agent-fox status` with the unified `read_all_facts()` function.
Fixed progress spinner to reflect operation type and removed untracked files blocking harvest merges.
Full changelog
Fixes
- Progress spinner: Extract tool-use blocks from SDK `AssistantMessage` content so the spinner shows "Reading…", "Editing…", etc. instead of always "Thinking…"
- Harvest merge: Remove untracked files that would block fast-forward merge during harvest
- New `auditor` archetype (spec 46) validates test code against `test_spec.md` contracts, checking coverage, assertion strength, edge‑case rigor and independence; opt‑in via `[archetypes] auditor = true`.
- `agent-fox init` scaffolds Claude Code skill files alongside project configuration (spec 47).
Full changelog
What's New
Test Auditor Archetype (spec 46)
- New `auditor` archetype that validates test code against `test_spec.md` contracts before implementation begins
- Checks coverage, assertion strength, precondition fidelity, edge case rigor, and test independence
- `auto_mid` injection mode (after test-writing group, before implementation)
- Conservative convergence (union semantics: worst verdict wins)
- Retry-predecessor with configurable circuit breaker
- Disabled by default, opt-in via `[archetypes] auditor = true`
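"Worst verdict wins" can be pictured as taking the maximum over a severity ranking. The verdict labels and their ordering below are hypothetical, since the notes do not list the auditor's verdict values; only the convergence rule itself comes from the text.

```python
from typing import Dict, Sequence

# Hypothetical ranking: a higher number means a worse verdict.
VERDICT_RANK: Dict[str, int] = {"PASS": 0, "FAIL": 1}


def converge(verdicts: Sequence[str]) -> str:
    """Combine verdicts from several auditor runs; the worst one wins."""
    if not verdicts:
        raise ValueError("at least one verdict is required")
    return max(verdicts, key=lambda v: VERDICT_RANK[v])


combined = converge(["PASS", "FAIL", "PASS"])
```

A single failing run is enough to fail the combined result, which is what makes this convergence conservative.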
Init Skills (spec 47)
- `agent-fox init` now scaffolds Claude Code skill files alongside project config
Token Counting & Cache Pricing Fix
- Fixed cache token tracking: `cache_read_input_tokens` and `cache_creation_input_tokens` now flow through the full pipeline (SDK → ResultMessage → SessionOutcome → audit events → status report)
- Fixed audit event payload storing combined `tokens` instead of separate `input_tokens`/`output_tokens`
- Fixed `build_status_report_from_audit` always reporting `output_tokens = 0`
- Added cache pricing to `ModelPricing` (cache read at 10%, cache creation at 125% of input price)
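The cache pricing rule translates into simple arithmetic. The sketch below is not the project's `ModelPricing`; the function name, the per-million-token price units, and the example prices are assumptions. Only the 10% and 125% multipliers come from the notes.

```python
CACHE_READ_MULTIPLIER = 0.10      # cache reads cost 10% of the input price
CACHE_CREATION_MULTIPLIER = 1.25  # cache writes cost 125% of the input price


def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    cache_read_tokens: int,
    cache_creation_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Estimate session cost from separate token counters (hypothetical helper)."""
    weighted = (
        input_tokens * input_price_per_mtok
        + output_tokens * output_price_per_mtok
        + cache_read_tokens * input_price_per_mtok * CACHE_READ_MULTIPLIER
        + cache_creation_tokens * input_price_per_mtok * CACHE_CREATION_MULTIPLIER
    )
    return weighted / 1_000_000


# Illustrative prices only: 3.0 per million input tokens, 15.0 per million output.
cost = estimate_cost_usd(1_000_000, 0, 1_000_000, 1_000_000, 3.0, 15.0)
```

In this example the three components are 3.00, 0.30, and 3.75, so the estimate is about 7.05.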
Other Changes
- Merge lock and agent fallback for harvest/workspace operations
- Merge agent for AI-based conflict resolution
- Various test and infrastructure improvements
Fixed harvest checkout failure when untracked runtime files existed.
Full changelog
Bug Fix
- Fix harvest checkout failure with untracked files: When agent-fox runtime files (`.agent-fox/config.toml`, `.agent-fox/state.jsonl`, `.claude/settings.local.json`, `docs/memory.md`) existed as untracked files in the working directory but were also tracked on the `develop` branch, `git checkout develop` during harvest would fail, blocking all subsequent tasks in the same spec. Fixed by using force checkout in the harvest step, which is safe because all coding work happens in an isolated worktree.
- Structured finding persistence for skeptic/verifier/oracle sessions
Full changelog
What's Changed
Bug Fixes
- fix: align `__version__` with 2.2.0. The runtime version was still reporting 2.1.2 after the 2.2.0 release
- fix: use numeric confidence in DuckDB ingestion. Two INSERT statements used string `'high'` for the `confidence` column (migrated to DOUBLE in v5), causing `ConversionException` during background knowledge ingestion
- fix: resolve model tier to model ID for pricing lookups. `NodeSessionRunner._resolved_model_id` stored tier names (e.g. `"ADVANCED"`) instead of model IDs (e.g. `"claude-opus-4-6"`), causing pricing config misses and zero-cost estimates
Features
- feat: wire structured finding persistence for skeptic/verifier/oracle — the review parsers and DB insert functions existed but were never called from the session lifecycle; skeptic, verifier, and oracle sessions now persist their structured JSON output (findings, verdicts, drift reports) to DuckDB, enabling downstream context rendering for coders and blocking/convergence logic
Internal
- Version bump to 2.2.1
- DuckDB is now a hard requirement; `open_knowledge_store()` raises RuntimeError instead of returning None.
- Removed all Optional connection parameters from session lifecycle, knowledge harvest, memory store, context assembly, and routing.
Full changelog
What's New in v2.2.0
Predictive Planning & Knowledge (Spec 39)
- Duration-based task ordering: ready tasks sorted by predicted duration (longest first) to minimize wall-clock time, with regression model, historical median, and configurable presets as fallback chain
- Causal graph + review findings: review/drift/verification findings integrated into causal traversal for richer downstream context
- Confidence-aware fact selection: facts below a configurable confidence threshold are excluded from session context
- Pre-computed ranked facts: fact rankings cached at plan time for faster context assembly
- Cross-group finding propagation: critical findings from earlier task groups visible to downstream groups under "Prior Group Findings"
- Project model: aggregate spec outcomes, module stability scores, and archetype effectiveness via `agent-fox status --model`
- Critical path forecasting: identifies the longest-duration path through the task graph with tied-path detection
- File conflict detection: predicts file overlaps between parallel tasks and serializes conflicting pairs (opt-in)
- Learned blocking thresholds: adapts skeptic/oracle block thresholds from historical precision (opt-in)
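The duration-based ordering with its fallback chain could look roughly like the following. All names and data shapes here are hypothetical: the regression model is stood in for by a plain dictionary of predictions, and the preset is a single number, whereas the real implementation may differ substantially.

```python
from statistics import median
from typing import Dict, List, Sequence


def predict_duration(
    task_id: str,
    regression: Dict[str, float],
    history: Dict[str, List[float]],
    preset: float,
) -> float:
    """Fallback chain: regression prediction, then historical median, then preset."""
    if task_id in regression:
        return regression[task_id]
    if history.get(task_id):
        return median(history[task_id])
    return preset


def order_ready_tasks(
    ready: Sequence[str],
    regression: Dict[str, float],
    history: Dict[str, List[float]],
    preset: float = 300.0,
) -> List[str]:
    """Sort ready tasks longest-first so long tasks start as early as possible."""
    return sorted(
        ready,
        key=lambda t: predict_duration(t, regression, history, preset),
        reverse=True,
    )


ordered = order_ready_tasks(
    ["a", "b", "c"],
    regression={"a": 100.0},
    history={"b": [200.0, 400.0, 600.0]},
)
```

Here `a` is predicted at 100 seconds, `b` falls back to its median of 400, and `c` takes the preset of 300, so the order is `b`, `c`, `a`.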
Confidence Normalization (Spec 37)
- Unified confidence representation as `float [0.0, 1.0]` across memory, knowledge, and routing
- `parse_confidence()` function handles string enum → float conversion with canonical mapping
- DuckDB migration v5: `TEXT → DOUBLE` for confidence columns
- JSONL backward compatibility preserved
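A string-to-float normalizer in the spirit of `parse_confidence()` might look like the sketch below. The mapping values, the default, and the clamping behavior are assumptions chosen for illustration; the notes do not state the canonical mapping, so the function is named `parse_confidence_sketch` to keep it distinct from the real one.

```python
from typing import Union

# Hypothetical mapping; the project's canonical values are not given in these notes.
_ENUM_TO_FLOAT = {"low": 0.3, "medium": 0.6, "high": 0.9}


def parse_confidence_sketch(value: Union[str, int, float], default: float = 0.6) -> float:
    """Normalize legacy string labels or numbers to a float in [0.0, 1.0]."""
    if isinstance(value, str):
        label = value.strip().lower()
        if label in _ENUM_TO_FLOAT:
            return _ENUM_TO_FLOAT[label]
        try:
            value = float(label)  # numeric strings such as "0.75"
        except ValueError:
            return default  # unknown label (assumed fallback)
    return min(1.0, max(0.0, float(value)))
```

Accepting both the old labels and plain numbers is one way to keep older JSONL records readable after the column type changes.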
DuckDB Hardening (Spec 38)
- DuckDB is now a hard requirement: `open_knowledge_store()` raises `RuntimeError` instead of returning `None`
- Removed all `Optional` connection parameters from session lifecycle, knowledge harvest, memory store, context assembly, and routing
- DuckDB errors propagate instead of being silently swallowed
- Added `knowledge_conn`/`knowledge_db` test fixtures for isolated in-memory DuckDB
Other Changes
- Hard reset (Spec 35): `agent-fox reset --hard` with commit SHA tracking
- Config generation (Spec 33): `agent-fox init` generates `config.toml` from schema
- Token tracking (Spec 34): per-archetype and per-spec cost breakdowns in status
- Oracle archetype (Spec 32): drift detection agent with blocking logic
- Prompt rewrites: oracle, librarian, cartographer, coordinator prompts rewritten to gold standard pattern
- AGENTS.md rewrite: project-specific conventions documented
- Harvest reconciliation (Spec 36): post-harvest develop branch reconciliation