AutoResearchClaw releases

No immediate action

v0.5.0 Breaking risk 2mo

Multi-domain expansion + ARC-Bench

v0.4.0 New feature 3mo

Notable features

6+ intervention modes
Idea Workshop
Paper Co-Writer

Full changelog

v0.4.0 — Human-in-the-Loop Co-Pilot System

AutoResearchClaw is no longer purely autonomous. The new HITL Co-Pilot system transforms the pipeline into a human-AI collaborative research engine.

Highlights

6+ Intervention Modes: full-auto, gate-only, checkpoint, step-by-step, co-pilot, custom, express
Idea Workshop: Brainstorm and refine hypotheses collaboratively (Stages 7-8)
Baseline Navigator: Review and customize experiment designs (Stage 9)
Paper Co-Writer: Section-by-section collaborative drafting (Stages 16-19)
SmartPause: Confidence-driven dynamic intervention
ALHF Intervention Learning: Learns from your review patterns
Claim Verification: Inline fact-checking against collected literature
Cost Guardrails: Budget monitoring with threshold alerts
Pipeline Branching: Fork to explore multiple research directions
CLI Commands: attach, status, approve, reject, guide
3 Adapters: CLI, WebSocket, MCP
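As an illustration of the Cost Guardrails item above, budget monitoring with threshold alerts could look roughly like the sketch below. The class name `BudgetMonitor`, the alert fractions (50%, 80%, 100%), and the message format are assumptions made for this example, not the project's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class BudgetMonitor:
    """Hypothetical budget tracker (illustrative only, not the project's class)."""

    budget_usd: float
    thresholds: tuple = (0.5, 0.8, 1.0)  # assumed alert fractions of the budget
    spent_usd: float = 0.0
    _fired: set = field(default_factory=set, repr=False)

    def record(self, cost_usd: float) -> list:
        """Add a cost and return alerts for thresholds crossed for the first time."""
        self.spent_usd += cost_usd
        alerts = []
        for t in self.thresholds:
            if self.spent_usd >= t * self.budget_usd and t not in self._fired:
                self._fired.add(t)  # each threshold alerts only once
                alerts.append(f"{round(t * 100)}% of budget used")
        return alerts


monitor = BudgetMonitor(budget_usd=10.0)
monitor.record(6.0)  # crosses the assumed 50% threshold
```

Each threshold fires once, so a long run does not repeat the same alert on every LLM call.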

New Files

researchclaw/hitl/ — 34 modules (7,500+ lines)
tests/test_hitl_*.py — 9 test files (242 tests)
docs/HITL_GUIDE.md — 620-line guide
3 new builtin skills

Testing

2,753 tests passed, 0 failures

Full Changelog: https://github.com/aiming-lab/AutoResearchClaw/compare/v0.3.2...v0.4.0

v0.3.2 Mixed 4mo

Notable features

VerifiedRegistry ground-truth whitelist
Experiment diagnosis & repair
ACP-compatible agent backends

Full changelog

What's New

Cross-Platform Support

ACP-compatible agent backends: Claude Code, Codex CLI, Copilot CLI, Gemini CLI, Kimi CLI
OpenClaw bridge: messaging platform integration (Discord, Telegram, Lark, WeChat)
CLI-agent code generation backend: delegates Stages 10 & 13 to external CLI agents with budget control and timeout management

Anti-Fabrication System

VerifiedRegistry: ground-truth whitelist from experiment results with tolerance matching
Experiment diagnosis & repair loop: 13 deficiency categories, auto-repair with best-result selection
Always-on sanitization: unverified numbers replaced in paper tables
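The idea behind a ground-truth whitelist with tolerance matching can be sketched as follows. This is a minimal illustration, not the actual VerifiedRegistry implementation: the class name `VerifiedRegistrySketch`, the 1% relative tolerance, and the `---` placeholder are all assumptions.

```python
import math


class VerifiedRegistrySketch:
    """Hypothetical whitelist of numbers produced by real experiment runs."""

    def __init__(self, rel_tol: float = 0.01):
        self.rel_tol = rel_tol  # assumed tolerance; the real value is not documented here
        self._values = []

    def register(self, value: float) -> None:
        """Record a number that came directly from experiment results."""
        self._values.append(value)

    def is_verified(self, candidate: float) -> bool:
        """True if the candidate matches any registered value within tolerance."""
        return any(
            math.isclose(candidate, v, rel_tol=self.rel_tol, abs_tol=1e-9)
            for v in self._values
        )


def sanitize(numbers, registry, placeholder="---"):
    """Replace numbers that are not backed by the registry with a placeholder."""
    return [n if registry.is_verified(n) else placeholder for n in numbers]


registry = VerifiedRegistrySketch()
registry.register(0.9123)
row = sanitize([0.91, 0.95], registry)  # 0.91 is within 1% of 0.9123; 0.95 is not
```

Tolerance matching lets a rounded figure in a table (0.91) still count as verified against the raw result (0.9123), while a number with no experimental source is replaced.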

Stability & Quality

100+ bug fixes across 8 deep audit rounds
Modular executor refactoring (10K-line executor → 400-line facade)
--resume auto-detection for interrupted runs
LLM retry hardening with exponential backoff
Community-reported fixes (macOS M3, math/theoretical topics)
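For readers unfamiliar with the retry pattern named above, exponential backoff can be sketched as below. The function name `retry_with_backoff`, its parameters, and the 10% jitter are assumptions for illustration; the release notes do not describe the project's actual retry settings.

```python
import random
import time


def retry_with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       sleep=time.sleep):
    """Call `call()` until it succeeds, doubling the wait after each failure.

    `sleep` is injectable so the function can be exercised without real waiting.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
            sleep(delay + random.uniform(0, delay * 0.1))  # small jitter (assumed 10%)
```

Capping the delay and adding jitter keeps many concurrent retries from hitting a rate-limited LLM endpoint at the same instant.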

New Subsystems

Assessor (paper quality scoring + venue recommendation)
Calendar (conference deadline tracking)
Collaboration (multi-user research coordination)
Copilot (interactive steering modes)
Dashboard (real-time metrics broadcasting)
Knowledge Graph (entity extraction + visualization)
Memory (cross-run experiment/ideation/writing memory)
MCP (Model Context Protocol server)
Overleaf (live sync with conflict resolution)
Project Manager (multi-project scheduling)
Remote Servers (SSH/SLURM/cloud execution)
Skills Library (12 built-in domain/tooling skills)
Trends (daily arXiv digest + opportunity finder)
Voice (speech-to-text commands)
Wizard (guided project setup)

Testing

1,935 tests passing

Full Changelog: https://github.com/aiming-lab/AutoResearchClaw/compare/v0.3.1...v0.3.2

v0.3.1 New feature 4mo

Notable features

Beast Mode for complex code generation with 6-signal complexity scoring and CodeAgent fallback
Cross-domain support for ML, physics, chemistry, economics, math, biology, and security
Web integration with Google Scholar, PDF extraction, and crawling capabilities

v0.3.0 New feature 4mo

Notable features

CodeAgent v2 with sequential file generation and hard validation gates
MetaClaw cross-run learning integration with skill injection and an 18.3% robustness improvement
50+ pipeline bug fixes covering metrics, citations, LaTeX escaping, and Docker sandbox

v0.2.0 New feature 4mo

Notable features

CodeAgent 4-phase architecture
BenchmarkAgent with dataset selection
FigureAgent with chart generation

Full changelog

Highlights

This release introduces three multi-agent subsystems, a hardened Docker sandbox, and 4 rounds of paper quality auditing — significantly improving the end-to-end quality of generated research papers.

New Multi-Agent Subsystems

CodeAgent (4-phase architecture)

LLM generates multi-file experiment code (main.py + setup.py + requirements.txt)
Static analysis & deep validation (AST-based class/method checks)
LLM-guided code review with structured JSON feedback
Iterative repair loop (up to 3 rounds) with automatic UnboundLocalError fix
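An AST-based class/method check of the kind mentioned above might look like the following sketch. It uses only Python's standard `ast` module; the helper name `missing_methods` and the shape of the `required` mapping are assumptions for this example rather than CodeAgent's real interface.

```python
import ast


def missing_methods(source: str, required: dict) -> dict:
    """Return {class_name: [missing method names]} for generated code.

    `required` maps each expected class name to the methods it must define.
    """
    tree = ast.parse(source)  # raises SyntaxError on invalid code
    found = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            found[node.name] = {
                item.name
                for item in node.body
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
            }
    problems = {}
    for cls, methods in required.items():
        missing = [m for m in methods if m not in found.get(cls, set())]
        if missing:
            problems[cls] = missing
    return problems


generated = "class Trainer:\n    def fit(self):\n        pass\n"
report = missing_methods(generated, {"Trainer": ["fit", "evaluate"]})
```

Because the check parses the code instead of running it, it can flag a missing `evaluate` method before any sandbox time is spent.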

BenchmarkAgent (4 sub-agents: Surveyor → Selector → Acquirer → Validator)

Domain-aware dataset and baseline selection from 13-domain knowledge base
Automatic benchmark acquisition with Docker compatibility validation
Integrated at Stage 9 (experiment_design), output injected into Stage 10

FigureAgent (5 sub-agents: Planner → CodeGen → Renderer → Critic → Integrator)

Academic-quality chart generation with SciencePlots, 300 DPI, colorblind-safe palette
6 built-in chart templates + LLM fallback for custom visualizations
Tri-modal critic review (data accuracy, aesthetics, academic convention)

Docker Sandbox Enhancements

Network-policy-aware code generation: none | setup_only | pip_only | full
Dynamic dependency installation via requirements.txt
Pre-cached datasets: CIFAR-10/100, MNIST, FashionMNIST, STL-10, SVHN
Extended ML stack: torch, torchvision, timm, einops, transformers, etc.

Paper Quality Hardening (4-round audit)

Post-compilation quality checks, weasel/duplicate word lint
7-dimension AI-Scientist-style review scoring
AI-slop detection (50+ phrases), statistical rigor validator
Cross-discipline support for 7 research domains (ML/physics/chem/econ/math/eng/bio)
NeurIPS checklist integration
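A weasel/duplicate word lint like the one listed above can be approximated with two regular-expression passes. The word list and the `lint` function below are illustrative assumptions; the project's actual phrase lists are not given in these notes.

```python
import re

# Assumed sample list; a real lint would use a longer, curated set.
WEASEL_WORDS = {"very", "fairly", "quite", "arguably", "clearly"}


def lint(text: str) -> list:
    """Return (kind, word) tuples for duplicated and weasel words in `text`."""
    issues = []
    # A word followed by whitespace and the same word again, e.g. "the the".
    for m in re.finditer(r"\b(\w+)\s+\1\b", text, flags=re.IGNORECASE):
        issues.append(("duplicate", m.group(1).lower()))
    for m in re.finditer(r"\b\w+\b", text):
        if m.group(0).lower() in WEASEL_WORDS:
            issues.append(("weasel", m.group(0).lower()))
    return issues


findings = lint("The the model is very accurate.")
```

The duplicate pass also catches repeats split across a line break, since `\s+` matches newlines.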

Bug Fixes (15+)

Fix baselines dict-to-list crash in BenchmarkAgent
Fix Gymnasium environment versions (v4 → v5)
Fix experiment condition drift in iterative refinement (anchor to exp_plan.yaml)
Fix compute budget constraint for experiment design
Fix metric direction mismatch and citation verification batching
Fix LaTeX output sanitization and figure plan format handling
Add RL stability guidance (gradient clipping, NaN guard)
And more — see full commit message for details

Compatibility

All changes are backward-compatible with v0.1.0 configuration files.

Full Changelog: https://github.com/aiming-lab/AutoResearchClaw/compare/v0.1.0...v0.2.0

v0.1.0 New feature 4mo

Notable features

23-stage pipeline
Multi-agent debate system
Self-healing code executor

Full changelog

AutoResearchClaw v0.1.0

Fully autonomous research pipeline: one message in, full conference paper out. 🦞

Highlights

23-stage pipeline: Research Scoping → Literature Discovery → Knowledge Synthesis → Hypothesis Generation → Experiment Design → Self-Healing Execution → Analysis & Decision → Paper Writing → Citation Verification
Multi-agent debate: 3 agents (Innovator, Pragmatist, Contrarian) argue over hypotheses; adversarial analysis panel reviews results
Self-healing executor: autonomous crash diagnosis, code repair, and Pivot/Refine decisions
Cross-run evolution: time-decayed lesson store that improves future runs
Citation verification: 4-layer pipeline (arXiv, DOI, Semantic Scholar, LLM relevance check)
OpenClaw integration: trigger full runs from a chat message
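One plausible reading of the time-decayed lesson store above is exponential decay on lesson age, sketched below. The `Lesson` type, the 30-day half-life, and the ranking helper are assumptions for illustration; the release notes do not specify the decay function the project uses.

```python
from dataclasses import dataclass


@dataclass
class Lesson:
    """Hypothetical record of something learned in a previous run."""

    text: str
    age_days: float


def decayed_weight(age_days: float, half_life_days: float = 30.0) -> float:
    """Weight halves every `half_life_days` (assumed exponential decay)."""
    return 0.5 ** (age_days / half_life_days)


def top_lessons(lessons, k=3, half_life_days=30.0):
    """Return the text of the k lessons with the highest decayed weight."""
    ranked = sorted(
        lessons,
        key=lambda lesson: decayed_weight(lesson.age_days, half_life_days),
        reverse=True,
    )
    return [lesson.text for lesson in ranked[:k]]


store = [
    Lesson("pin dependency versions in the sandbox", age_days=90),
    Lesson("guard against NaN losses", age_days=2),
]
most_relevant = top_lessons(store, k=1)
```

With decay applied, recent lessons dominate what is passed to the next run, while old ones fade without being deleted.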

Results (6 end-to-end runs)

100% pipeline completion (124/124 steps)
94.3% citation integrity
Mean quality 6.2/10 on conference review scale

Requirements

Python 3.9+
OpenAI-compatible LLM API
