Release history
AutoResearchClaw releases
Fully autonomous & self-evolving research from idea to paper. Chat an Idea. Get a Paper.
- 6+ intervention modes
- Idea Workshop
- Paper Co-Writer
v0.4.0 — Human-in-the-Loop Co-Pilot System
AutoResearchClaw is no longer purely autonomous. The new HITL Co-Pilot system transforms the pipeline into a human-AI collaborative research engine.
Highlights
- 6+ Intervention Modes: full-auto, gate-only, checkpoint, step-by-step, co-pilot, custom, express
- Idea Workshop: Brainstorm and refine hypotheses collaboratively (Stages 7-8)
- Baseline Navigator: Review and customize experiment designs (Stage 9)
- Paper Co-Writer: Section-by-section collaborative drafting (Stages 16-19)
- SmartPause: Confidence-driven dynamic intervention
- ALHF Intervention Learning: Learns from your review patterns
- Claim Verification: Inline fact-checking against collected literature
- Cost Guardrails: Budget monitoring with threshold alerts
- Pipeline Branching: Fork to explore multiple research directions
- CLI Commands: attach, status, approve, reject, guide
- 3 Adapters: CLI, WebSocket, MCP
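As an illustration of the Cost Guardrails highlight above, here is a minimal sketch of budget monitoring with threshold alerts. The `BudgetGuard` class, its field names, and the 50%/80%/100% alert points are assumptions made for this example; they are not taken from the AutoResearchClaw codebase.

```python
from dataclasses import dataclass, field


@dataclass
class BudgetGuard:
    """Hypothetical budget monitor; not AutoResearchClaw's actual API."""

    limit_usd: float
    # Assumed alert points, expressed as fractions of the limit.
    thresholds: tuple = (0.5, 0.8, 1.0)
    spent_usd: float = 0.0
    fired: set = field(default_factory=set)

    def record(self, cost_usd: float) -> list:
        """Add a cost and return alerts for thresholds crossed for the first time."""
        self.spent_usd += cost_usd
        alerts = []
        for t in self.thresholds:
            if t not in self.fired and self.spent_usd >= t * self.limit_usd:
                self.fired.add(t)  # each threshold alerts only once
                alerts.append(f"{int(t * 100)}% of ${self.limit_usd:.2f} budget used")
        return alerts


guard = BudgetGuard(limit_usd=10.0)
first = guard.record(4.0)   # still below the 50% threshold
second = guard.record(2.0)  # crosses 50%
third = guard.record(5.0)   # crosses 80% and 100% in one step
```

Tracking which thresholds have already fired keeps a long run from repeating the same alert on every call.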
New Files
- researchclaw/hitl/: 34 modules (7,500+ lines)
- tests/test_hitl_*.py: 9 test files (242 tests)
- docs/HITL_GUIDE.md: 620-line guide
- 3 new builtin skills
Testing
- 2,753 tests passed, 0 failures
Full Changelog: https://github.com/aiming-lab/AutoResearchClaw/compare/v0.3.2...v0.4.0
- VerifiedRegistry ground-truth whitelist
- Experiment diagnosis & repair
- ACP-compatible agent backends
What's New
Cross-Platform Support
- ACP-compatible agent backends: Claude Code, Codex CLI, Copilot CLI, Gemini CLI, Kimi CLI
- OpenClaw bridge: messaging platform integration (Discord, Telegram, Lark, WeChat)
- CLI-agent code generation backend: delegates Stages 10 & 13 to external CLI agents with budget control and timeout management
Anti-Fabrication System
- VerifiedRegistry: ground-truth whitelist from experiment results with tolerance matching
- Experiment diagnosis & repair loop: 13 deficiency categories, auto-repair with best-result selection
- Always-on sanitization: unverified numbers replaced in paper tables
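The idea behind a ground-truth whitelist with tolerance matching can be sketched as follows. This is an illustration only: the class below reuses the name `VerifiedRegistry` from the notes, but its methods, the 1% relative tolerance, and the `N/A` placeholder are assumptions, not the project's actual implementation.

```python
import math


class VerifiedRegistry:
    """Hypothetical whitelist of numbers that came from real experiment results."""

    def __init__(self, rel_tol: float = 0.01, abs_tol: float = 1e-9):
        self.rel_tol = rel_tol  # assumed tolerance; the real default is not stated
        self.abs_tol = abs_tol
        self._values = []

    def register(self, value: float) -> None:
        """Record a number produced by an actual experiment run."""
        self._values.append(float(value))

    def is_verified(self, claimed: float) -> bool:
        """True if the claimed number is within tolerance of any registered value."""
        return any(
            math.isclose(claimed, v, rel_tol=self.rel_tol, abs_tol=self.abs_tol)
            for v in self._values
        )


def sanitize(cells, registry, placeholder="N/A"):
    """Replace numeric table cells that the registry cannot verify."""
    out = []
    for cell in cells:
        try:
            number = float(cell)
        except ValueError:
            out.append(cell)  # non-numeric cells pass through unchanged
            continue
        out.append(cell if registry.is_verified(number) else placeholder)
    return out


registry = VerifiedRegistry()
registry.register(0.9132)
row = sanitize(["ResNet-18", "0.913", "0.95"], registry)
```

Here `0.913` survives because it is a rounded form of a registered result, while `0.95` has no matching ground truth and is replaced.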
Stability & Quality
- 100+ bug fixes across 8 deep audit rounds
- Modular executor refactoring (10K → 400-line facade)
- --resume auto-detection for interrupted runs
- LLM retry hardening with exponential backoff
- Community-reported fixes (macOS M3, math/theoretical topics)
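For readers unfamiliar with the retry pattern named above, a generic exponential backoff wrapper looks roughly like this. The function name, parameters, and jitter amount are illustrative assumptions; the project's actual retry logic may differ.

```python
import random
import time


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call fn(), retrying on failure with exponentially growing delays.

    Hypothetical helper: the delay doubles each attempt (1s, 2s, 4s, ...),
    is capped at max_delay, and gets up to 10% random jitter added.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))


# Usage with a fake sleep so the example runs instantly.
calls = {"n": 0}


def flaky_llm_call():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated transient failure")
    return "ok"


delays = []
result = call_with_backoff(flaky_llm_call, sleep=delays.append)
```

Injecting `sleep` as a parameter keeps the wrapper testable without real waiting.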
New Subsystems
- Assessor (paper quality scoring + venue recommendation)
- Calendar (conference deadline tracking)
- Collaboration (multi-user research coordination)
- Copilot (interactive steering modes)
- Dashboard (real-time metrics broadcasting)
- Knowledge Graph (entity extraction + visualization)
- Memory (cross-run experiment/ideation/writing memory)
- MCP (Model Context Protocol server)
- Overleaf (live sync with conflict resolution)
- Project Manager (multi-project scheduling)
- Remote Servers (SSH/SLURM/cloud execution)
- Skills Library (12 built-in domain/tooling skills)
- Trends (daily arXiv digest + opportunity finder)
- Voice (speech-to-text commands)
- Wizard (guided project setup)
Testing
- 1,935 tests passing
Full Changelog: https://github.com/aiming-lab/AutoResearchClaw/compare/v0.3.1...v0.3.2
- Beast Mode for complex code generation with 6-signal complexity scoring and CodeAgent fallback
- Cross-domain support for ML, physics, chemistry, economics, math, biology, and security
- Web integration with Google Scholar, PDF extraction, and crawling capabilities
- CodeAgent v2 with sequential file generation and hard validation gates
- MetaClaw cross-run learning integration with skill injection and +18.3% robustness improvement
- 50+ pipeline bug fixes covering metrics, citations, LaTeX escaping, and Docker sandbox
- CodeAgent 4-phase architecture
- BenchmarkAgent with dataset selection
- FigureAgent with chart generation
Highlights
This release introduces three multi-agent subsystems, a hardened Docker sandbox, and 4 rounds of paper quality auditing — significantly improving the end-to-end quality of generated research papers.
New Multi-Agent Subsystems
CodeAgent (4-phase architecture)
- LLM generates multi-file experiment code (main.py + setup.py + requirements.txt)
- Static analysis & deep validation (AST-based class/method checks)
- LLM-guided code review with structured JSON feedback
- Iterative repair loop (up to 3 rounds) with automatic UnboundLocalError fix
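An AST-based class/method check of the kind mentioned above can be sketched with Python's standard `ast` module. The `missing_definitions` function and the shape of its `required` argument are assumptions for illustration, not CodeAgent's real interface.

```python
import ast


def missing_definitions(source: str, required: dict) -> list:
    """Return the required classes and methods that are absent from source.

    required maps a class name to a list of method names (hypothetical format).
    """
    tree = ast.parse(source)  # raises SyntaxError if the code does not parse
    found = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            # Collect methods defined directly in the class body.
            found[node.name] = {
                item.name
                for item in node.body
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
            }
    problems = []
    for cls, methods in required.items():
        if cls not in found:
            problems.append(f"missing class {cls}")
            continue
        for method in methods:
            if method not in found[cls]:
                problems.append(f"missing method {cls}.{method}")
    return problems


generated_code = '''
class Trainer:
    def fit(self):
        pass
'''
issues = missing_definitions(
    generated_code, {"Trainer": ["fit", "evaluate"], "Model": []}
)
```

A list of problems like this could then be handed to a repair step, since the check never executes the generated code.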
BenchmarkAgent (4 sub-agents: Surveyor → Selector → Acquirer → Validator)
- Domain-aware dataset and baseline selection from 13-domain knowledge base
- Automatic benchmark acquisition with Docker compatibility validation
- Integrated at Stage 9 (experiment_design), output injected into Stage 10
FigureAgent (5 sub-agents: Planner → CodeGen → Renderer → Critic → Integrator)
- Academic-quality chart generation with SciencePlots, 300 DPI, colorblind-safe palette
- 6 built-in chart templates + LLM fallback for custom visualizations
- Tri-modal critic review (data accuracy, aesthetics, academic convention)
Docker Sandbox Enhancements
- Network-policy-aware code generation: none | setup_only | pip_only | full
- Dynamic dependency installation via requirements.txt
- Pre-cached datasets: CIFAR-10/100, MNIST, FashionMNIST, STL-10, SVHN
- Extended ML stack: torch, torchvision, timm, einops, transformers, etc.
Paper Quality Hardening (4-round audit)
- Post-compilation quality checks, weasel/duplicate word lint
- 7-dimension AI-Scientist-style review scoring
- AI-slop detection (50+ phrases), statistical rigor validator
- Cross-discipline support for 7 research domains (ML/physics/chem/econ/math/eng/bio)
- NeurIPS checklist integration
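A weasel/duplicate word lint can be approximated with two regular-expression passes. The sketch below is a generic illustration: the `WEASEL_WORDS` sample and the `lint` function are assumptions, not the word list or checker shipped in this release.

```python
import re

# Matches an immediately repeated word such as "that that" (case-insensitive).
DUPLICATE = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

# Assumed sample list for illustration; the project's list is not given here.
WEASEL_WORDS = ("very", "quite", "clearly", "significantly")


def lint(text: str) -> list:
    """Return (kind, word) findings for duplicated and weasel words."""
    findings = []
    for match in DUPLICATE.finditer(text):
        findings.append(("duplicate", match.group(1).lower()))
    for word in WEASEL_WORDS:
        pattern = rf"\b{re.escape(word)}\b"
        for _ in re.finditer(pattern, text, re.IGNORECASE):
            findings.append(("weasel", word))
    return findings


findings = lint("We show that that the method is very robust.")
```

On the sample sentence this reports one duplicated word ("that") and one weasel word ("very").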
Bug Fixes (15+)
- Fix baselines dict-to-list crash in BenchmarkAgent
- Fix Gymnasium environment versions (v4 → v5)
- Fix experiment condition drift in iterative refinement (anchor to exp_plan.yaml)
- Fix compute budget constraint for experiment design
- Fix metric direction mismatch, citation verification batching
- Fix LaTeX output sanitization, figure plan format handling
- Add RL stability guidance (gradient clipping, NaN guard)
- And more — see full commit message for details
Compatibility
All changes are backward-compatible with v0.1.0 configuration files.
Full Changelog: https://github.com/aiming-lab/AutoResearchClaw/compare/v0.1.0...v0.2.0
- 23-stage pipeline
- Multi-agent debate system
- Self-healing code executor
AutoResearchClaw v0.1.0
Fully autonomous research pipeline: one message in, full conference paper out. 🦞
Highlights
- 23-stage pipeline: Research Scoping → Literature Discovery → Knowledge Synthesis → Hypothesis Generation → Experiment Design → Self-Healing Execution → Analysis & Decision → Paper Writing → Citation Verification
- Multi-agent debate: 3 agents (Innovator, Pragmatist, Contrarian) argue over hypotheses; adversarial analysis panel reviews results
- Self-healing executor: autonomous crash diagnosis, code repair, and Pivot/Refine decisions
- Cross-run evolution: time-decayed lesson store that improves future runs
- Citation verification: 4-layer pipeline (arXiv, DOI, Semantic Scholar, LLM relevance check)
- OpenClaw integration: trigger full runs from a chat message
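One common way to implement a time-decayed lesson store is exponential decay with a half-life. The sketch below shows that idea only: the `Lesson` fields, the 30-day half-life, and the ranking rule are assumptions, since the notes do not state the decay function the project uses.

```python
from dataclasses import dataclass


@dataclass
class Lesson:
    """Hypothetical lesson record; field names are illustrative."""

    text: str
    created_day: float  # day the lesson was recorded
    importance: float   # base score before decay


def decayed_weight(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: the weight halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def top_lessons(lessons, now_day: float, k: int = 2) -> list:
    """Return the k lesson texts with the highest decayed score."""
    ranked = sorted(
        lessons,
        key=lambda l: l.importance * decayed_weight(now_day - l.created_day),
        reverse=True,
    )
    return [l.text for l in ranked[:k]]


store = [
    Lesson("pin gymnasium version", created_day=0, importance=1.0),
    Lesson("clip gradients", created_day=60, importance=0.5),
    Lesson("check metric direction", created_day=90, importance=0.4),
]
best = top_lessons(store, now_day=90)
```

With this weighting, an old lesson with high importance can rank below a recent, less important one, so stale advice gradually drops out of future runs.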
Results (6 end-to-end runs)
- 100% pipeline completion (124/124 steps)
- 94.3% citation integrity
- Mean quality 6.2/10 on conference review scale
Requirements
- Python 3.9+
- OpenAI-compatible LLM API