This release fixes two skillify worker-spawn bugs relevant to SREs watching stability and regressions.
✓ No known CVEs patched in this version
Summary
AI summary: Fixed redundant worker spawns per session and added stale-lock recovery to prevent permanent mining halts.
Changes in this release
| Type | Severity | Summary | Confidence | CVE |
|---|---|---|---|---|
| Feature | Low | Writes the current timestamp to the lock file on acquire for staleness detection. | high | None |
| Dependency | Low | No external dependencies added; uses the existing lock file mechanism with timestamping. | low | None |
| Performance | Medium | Reduces unnecessary worker spawns, saving ~50 ms of Node cold start and ~200 ms of DB I/O per avoided spawn. | high | None |
| Bugfix | High | Detects and removes stale lock files older than 10 minutes, preventing a permanent mining halt. | high | None |
| Bugfix | High | Prevents redundant worker spawns in long sessions via a runtime deduplication set. | low | None |
| Refactor | Low | Introduces a module-level Set to track spawned sessions within a gateway runtime. | high | None |

Source for all entries: granite4.1:8b-q6_K@2026-05-19.
Full changelog
Fixes #100 and #110.
Why
Two spawn-lifecycle bugs in `openclaw/src/index.ts`:
#100 (wasted re-spawns): `agent_end` fires on every turn. The on-disk lock at `~/.deeplake/state/skillify/<projectKey>.worker.lock` prevents overlapping workers, but as soon as a worker exits and releases its lock, the next `agent_end` re-acquires it and spawns a fresh worker. That worker makes one watermark-check SQL round trip, finds nothing new to mine, and exits, yet each spawn costs ~50 ms of Node cold start plus ~200 ms of DB I/O. A 50-turn session ends up doing 2-5 spawns instead of 1.
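As a rough illustration of that cost, the sketch below multiplies the PR's own per-spawn estimates (~50 ms cold start, ~200 ms DB I/O) by the number of spawns beyond the first. The helper `wastedOverheadMs` is hypothetical and ignores any other per-spawn work; it is arithmetic on the stated figures, not a measurement.

```typescript
// Per-spawn estimates taken from the PR description (approximate).
const COLD_START_MS = 50;  // Node cold start
const DB_IO_MS = 200;      // watermark-check SQL round trip and related I/O

/** Hypothetical helper: overhead of every spawn beyond the one a session needs. */
function wastedOverheadMs(spawns: number): number {
  const perSpawnMs = COLD_START_MS + DB_IO_MS; // ~250 ms per spawn
  return Math.max(0, spawns - 1) * perSpawnMs;
}

// A 50-turn session doing 2-5 spawns instead of 1:
for (const spawns of [2, 5]) {
  console.log(`${spawns} spawns -> ~${wastedOverheadMs(spawns)} ms wasted`);
}
// prints "2 spawns -> ~250 ms wasted" then "5 spawns -> ~1000 ms wasted"
```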
#110 (stale locks halt mining permanently): `tryAcquireOpenclawSkillifyLock` opens the lock file with `O_CREAT | O_EXCL | O_WRONLY` and treats any pre-existing lock as "live worker, skip." There is no staleness check. If a worker dies abnormally (host kill, OOM, segfault) before its `finally` block releases the lock, the lock persists forever and every subsequent `agent_end` silently skips mining for that `project_key`. This was hit live during the 2026-05-07 PR #98 E2E run, where a manual `rm <lockfile>` was needed to recover.
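A minimal sketch of the pre-fix acquire pattern described above, assuming Node's `fs` module. The function name `tryAcquireLockNoStaleness` and the demo paths are hypothetical; this is not the actual `tryAcquireOpenclawSkillifyLock` source. It shows why a lock left behind by a killed worker blocks every later attempt: the file is empty, so nothing distinguishes a dead holder from a live one.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical sketch of the pre-fix behavior: create the lock exclusively;
// any existing file is read as "live worker, skip".
function tryAcquireLockNoStaleness(lockPath: string): boolean {
  const { O_CREAT, O_EXCL, O_WRONLY } = fs.constants;
  try {
    const fd = fs.openSync(lockPath, O_CREAT | O_EXCL | O_WRONLY);
    fs.closeSync(fd); // empty lock file: nothing records when it was taken
    return true;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "EEXIST") return false;
    throw err; // unexpected I/O error
  }
}

// Simulate a worker that takes the lock and is then killed (kill -9, OOM)
// before its `finally` can release it: the file simply stays on disk.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "skillify-"));
const lock = path.join(dir, "demo.worker.lock");
console.log(tryAcquireLockNoStaleness(lock)); // true: first worker gets it
// ...worker dies without unlinking the lock...
console.log(tryAcquireLockNoStaleness(lock)); // false
console.log(tryAcquireLockNoStaleness(lock)); // false, and so on forever
fs.rmSync(dir, { recursive: true, force: true });
```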
What changed
Per-runtime dedup (#100)
- New module-level `const skillifySpawnedFor = new Set<string>()` tracks which session IDs have already triggered a spawn in this gateway runtime. The `agent_end` handler now wraps the `spawnOpenclawSkillifyWorker(...)` call in `if (!skillifySpawnedFor.has(sid)) { skillifySpawnedFor.add(sid); … }`.
- The on-disk lock stays authoritative across processes (e.g. multiple gateway restarts). The new in-memory Set only suppresses redundancy within a single runtime.
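The per-runtime dedup can be sketched as follows. Only the `skillifySpawnedFor` Set and the `has`/`add` guard come from the PR; `onAgentEnd` and the injected `spawnWorker` callback are hypothetical stand-ins for the real `agent_end` handler and `spawnOpenclawSkillifyWorker(...)`, whose signatures are not shown here.

```typescript
// Module-level, so it lives exactly as long as the gateway runtime (process).
const skillifySpawnedFor = new Set<string>();

// Hypothetical stand-in for the agent_end handler. `spawnWorker` represents
// spawnOpenclawSkillifyWorker(...); its real signature is not assumed here.
function onAgentEnd(sid: string, spawnWorker: (sid: string) => void): boolean {
  if (skillifySpawnedFor.has(sid)) return false; // already spawned this runtime
  skillifySpawnedFor.add(sid);
  spawnWorker(sid); // the on-disk lock still guards against other processes
  return true;
}

// 50 turns of one session now trigger a single spawn attempt.
let spawns = 0;
for (let turn = 0; turn < 50; turn++) {
  onAgentEnd("session-a", () => {
    spawns++;
  });
}
console.log(spawns); // 1
```

Because the session ID is added before the spawn call, a spawn that throws is not retried within the same runtime in this sketch; whether the real handler behaves the same way on startup failures is not stated in the PR.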
Stale-lock recovery (#110)
- The lock file now contains `String(Date.now())` on acquire (previously it was an empty file).
- On `O_EXCL` failure, the existing lock body is read and parsed as a millisecond timestamp. If `Date.now() - ts > 10 minutes` or the body is unparseable (`NaN`), the lock is treated as stale: it is unlinked and the acquire is retried.
- This mirrors the staleness logic in `src/skillify/state.ts:tryAcquireWorkerLock` used by the non-openclaw agents.
- Migration: empty lock files left by earlier code parse as `NaN` and are treated as stale on the first patched run, so no manual cleanup is needed.
- The 10-minute maximum age is generous compared with typical worker runtime (<30 s plus buffer). A pathological hang longer than that releases the spawn slot to the next `agent_end`, instead of leaking the lock and halting mining for the rest of the gateway's lifetime.
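A sketch of timestamped acquire with stale-lock recovery, following the steps listed above: exclusive create, write `String(Date.now())`, and on `EEXIST` parse the body, treat `NaN` or an age over 10 minutes as stale, unlink, and retry. The name `tryAcquireLock`, the injectable clock, and the single-retry policy are assumptions for illustration; this is not the code in `openclaw/src/index.ts` or `src/skillify/state.ts`.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

const LOCK_MAX_AGE_MS = 10 * 60 * 1000; // 10 minutes, per the PR

// Hypothetical sketch: exclusive create + timestamp body + stale recovery.
function tryAcquireLock(lockPath: string, now: () => number = Date.now): boolean {
  const { O_CREAT, O_EXCL, O_WRONLY } = fs.constants;
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      const fd = fs.openSync(lockPath, O_CREAT | O_EXCL | O_WRONLY);
      try {
        fs.writeSync(fd, String(now())); // body = acquire time in ms
      } finally {
        fs.closeSync(fd);
      }
      return true;
    } catch (err) {
      if ((err as NodeJS.ErrnoException).code !== "EEXIST") throw err;
    }
    // Lock exists: decide whether its holder is plausibly still alive.
    let body: string;
    try {
      body = fs.readFileSync(lockPath, "utf8");
    } catch (err) {
      if ((err as NodeJS.ErrnoException).code === "ENOENT") continue; // vanished: retry
      throw err;
    }
    const ts = Number.parseInt(body, 10); // empty legacy lock -> NaN
    // NaN compares false with everything, so it must be checked explicitly.
    const stale = Number.isNaN(ts) || now() - ts > LOCK_MAX_AGE_MS;
    if (!stale) return false; // live worker: skip this spawn
    try {
      fs.unlinkSync(lockPath); // reclaim the stale lock, then retry the create
    } catch (err) {
      if ((err as NodeJS.ErrnoException).code !== "ENOENT") throw err;
    }
  }
  return false; // lost the retry to another process
}

// Demo: legacy empty lock, then a live lock, then the same lock 11 minutes later.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "skillify-"));
const lock = path.join(dir, "demo.worker.lock");
fs.writeFileSync(lock, ""); // empty lock left by earlier code
console.log(tryAcquireLock(lock)); // true: NaN body treated as stale
console.log(tryAcquireLock(lock)); // false: fresh timestamp, holder "live"
const elevenMinutesLater = () => Date.now() + 11 * 60 * 1000;
console.log(tryAcquireLock(lock, elevenMinutesLater)); // true: stale, recovered
fs.rmSync(dir, { recursive: true, force: true });
```

`Number.parseInt` is used so an empty body yields `NaN`, matching the migration note above. Like any read-then-unlink scheme, this leaves a narrow race in which two processes both judge the same lock stale and the second unlinks the first's fresh lock; the sketch does not try to close that window.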
Tests
- `npm run typecheck`: clean
- `npm test`: 2380/2380 passing (one bundle-scan regex distance was bumped from 500 to 1500 to accommodate the new dedup comment block between `Auto-captured` and the spawn site; the assertion's intent is unchanged)
Test plan after merge
- [ ] Long-running openclaw session (50+ turns). `grep -c "Auto-captured" /tmp/openclaw/openclaw-*.log` should report many captures; `ls ~/.deeplake/state/skillify/*.worker.lock` should show at most one mtime bump per session (one spawn, not 2-5).
- [ ] Kill a worker mid-mine (`kill -9 $WORKER_PID`). Wait 11 minutes. The next `agent_end` should successfully re-acquire the lock (stale-recovery path).
Summary by CodeRabbit
- Bug Fixes
  - Improved reliability of background worker spawning in extended agent sessions by preventing redundant spawn attempts
  - Enhanced detection and cleanup of stale worker states
  - Added error handling to gracefully manage worker startup failures
- Tests
  - Updated test validations for worker spawning behavior
About Hivemind: turns agent traces into skills and shares them with your team.
Earlier breaking changes
- v0.7.52 Removes `hivemind tasks` CLI and related code surfaces.
- v0.7.51 Removes `hivemind tasks` CLI and related code surfaces.
- v0.7.19 Module name `skilify` replaced with `skillify`; affects all imports
- v0.7.19 CLI command `skilify` removed; renamed to `skillify` without a deprecation alias
- v0.7.18 CLI subcommand renamed from `skilify` to `skillify`; no deprecation alias.