This release includes 1 breaking change for platform teams planning a safe upgrade.
✓ No known CVEs patched in this version
Summary
AI summary: `--hardware` is now required on the `profile`, `benchmark`, and `run-all` CLI commands.
Changes in this release
| Type | Severity | Summary | CVE | Source | Confidence |
|---|---|---|---|---|---|
| Breaking | High | `--hardware` is now required on `profile`, `benchmark`, and `run-all` commands. | None | granite4.1:8b-q6_K@2026-05-19 | high |
| Feature | Low | Added `profine telemetry doctor` to probe telemetry endpoint status and latency. | None | granite4.1:8b-q6_K@2026-05-19 | high |
| Feature | Low | Update-check nudge prints version lag warning on CLI startup, silenced via `PROFINE_NO_UPDATE_CHECK`. | None | granite4.1:8b-q6_K@2026-05-19 | high |
| Feature | Low | Added low-sample warning when fewer than 10 step samples survive warmup stripping. | None | granite4.1:30b@2026-05-19-audit | low |
| Feature | Low | Introduced `PROFINE_TELEMETRY_RETRY_BACKOFF` env var to control telemetry retry backoff (default 2.0s). | None | granite4.1:30b@2026-05-19-audit | low |
| Feature | Low | Reader now feeds sibling modules to the analyzer LLM for accurate default detection. | None | granite4.1:30b@2026-05-19-audit | low |
| Performance | Medium | Telemetry HTTP timeout increased from 5s to 15s with one retry and 2s backoff. | None | granite4.1:30b@2026-05-19-audit | low |
| Performance | Medium | Added exponential-backoff retry for LLM backends with env-tunable attempts (max 3). | None | granite4.1:30b@2026-05-19-audit | low |
| Bugfix | Medium | Fixed divide-by-zero in `_projected_savings` when speedup approached 100%. | None | granite4.1:30b@2026-05-19-audit | low |
| Bugfix | Medium | Corrected step-time estimate poisoning by torch.compile cold start. | None | granite4.1:30b@2026-05-19-audit | low |
| Bugfix | Medium | Prevented `_strip_warmup` from stripping more samples than exist, preserving at least 3 samples. | None | granite4.1:30b@2026-05-19-audit | low |
| Bugfix | Medium | Fixed `--edit-dir` outside `--output` resolution to correctly apply BF16 tolerance widening. | None | granite4.1:30b@2026-05-19-audit | low |
| Bugfix | Medium | Ensured `_resolve_hardware` prefers explicit hardware argument over stored profile record. | None | granite4.1:30b@2026-05-19-audit | low |
| Bugfix | Low | Filtered benign Inductor autotune log spam in Modal executor. | None | granite4.1:30b@2026-05-19-audit | low |
| Bugfix | Low | Wrapped stacked edits in try/except to surface individual LLM candidate failures without losing prior edits. | None | granite4.1:30b@2026-05-19-audit | low |
| Bugfix | Low | File-not-found errors now hint to run `prepare.py` when missing tokenized dataset paths are detected. | None | granite4.1:30b@2026-05-19-audit | low |
| Refactor | Low | Removed `auto_select_hardware()` helper and param-bucket preset table. | None | granite4.1:30b@2026-05-19-audit | low |
| Refactor | Low | Deleted six empty package directories (heuristics, modifiers, output, preflight, search, resources). | None | granite4.1:30b@2026-05-19-audit | low |
Full changelog
Multi-rep mingpt benchmark surfaced four product bugs + a breaking CLI change + a telemetry-resilience overhaul. All bugs fixed, 9 regression tests added, telemetry no longer silently drops rows when the backend is cold.
pip install -U profine
⚠️ Breaking change
`--hardware` is now required on `profile`, `benchmark`, and `run-all`. The previous `auto` default silently chose a "smallest preset that fits" using a heuristic that mis-sized GPUs for unknown architectures; making it explicit prevents that footgun. Pick one of: `1x_t4`, `1x_l4`, `1x_a10g`, `1x_a100`, `1x_h100`. The `auto_select_hardware()` helper and the param-bucket preset table have been removed.
If you were running `profine run-all train.py`, change it to `profine run-all train.py --hardware 1x_a100` (or your preferred preset).
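To illustrate what "required, with no `auto` fallback" means for a CLI, here is a minimal sketch using Python's `argparse`. The preset names come from the note above; the parser layout, the `build_parser` name, and the positional `script` argument are assumptions for illustration, not Profine's actual implementation.

```python
import argparse

# Preset names are taken from the release notes; everything else is assumed.
HARDWARE_PRESETS = ["1x_t4", "1x_l4", "1x_a10g", "1x_a100", "1x_h100"]


def build_parser() -> argparse.ArgumentParser:
    """Build a toy CLI whose three subcommands all demand --hardware."""
    parser = argparse.ArgumentParser(prog="profine")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("profile", "benchmark", "run-all"):
        cmd = sub.add_parser(name)
        cmd.add_argument("script")
        # required=True and no default: nothing is guessed on the user's behalf.
        cmd.add_argument("--hardware", required=True, choices=HARDWARE_PRESETS)
    return parser


args = build_parser().parse_args(["run-all", "train.py", "--hardware", "1x_a100"])
print(args.command, args.hardware)
```

With a parser shaped like this, omitting `--hardware` or passing an unknown preset makes `argparse` print a usage error and exit with status 2, so a script that still relies on the old default fails immediately instead of running on a guessed GPU.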
Added
- `profine telemetry doctor`. Synchronous probe of the telemetry endpoint that reports consent state, endpoint URL, HTTP status code, and per-attempt latency. Use this to verify the round-trip works (or to warm a sleeping Render dyno before a real run).
- Update-check nudge on CLI startup. Profine now checks PyPI for the latest release once every 24 hours (cached in `~/.profine/`) and prints a one-line nudge if your installed version is behind. Silenced via `PROFINE_NO_UPDATE_CHECK=1`.
- Low-sample warning. Benchmark reports surface a warning when fewer than 10 step samples survive warmup stripping, so users notice when the median is built on thin data.
- `PROFINE_TELEMETRY_RETRY_BACKOFF` env var. Test-and-CI knob for the telemetry retry backoff. Defaults to 2.0s in production.
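The once-per-24-hours update check described above can be pictured as a timestamp file plus an environment-variable opt-out. The sketch below is an assumption about how such a cache could work: the helper names and the idea of a single timestamp file are hypothetical, and only the 24-hour interval and `PROFINE_NO_UPDATE_CHECK=1` come from the notes. The PyPI lookup itself is left out.

```python
import os
import time
from pathlib import Path
from typing import Optional

CHECK_INTERVAL_S = 24 * 60 * 60  # "once every 24 hours", per the notes


def should_check_for_update(stamp: Path, now: Optional[float] = None) -> bool:
    """Return True when a new lookup is due (hypothetical helper)."""
    if os.environ.get("PROFINE_NO_UPDATE_CHECK") == "1":
        return False  # the user silenced the nudge
    now = time.time() if now is None else now
    try:
        last = float(stamp.read_text())
    except (OSError, ValueError):
        return True  # no usable cache entry yet
    return now - last >= CHECK_INTERVAL_S


def record_check(stamp: Path, now: Optional[float] = None) -> None:
    """Store the time of the latest lookup in the cache file."""
    stamp.parent.mkdir(parents=True, exist_ok=True)
    stamp.write_text(str(time.time() if now is None else now))
```

A caller would run `should_check_for_update` at startup, perform the lookup only when it returns `True`, and then call `record_check`, which keeps the network request off the hot path for the rest of the day.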
Changed
- Telemetry HTTP transport: timeout 5s → 15s, one retry with 2s backoff. The anon endpoint is hosted on Render's free/starter tier, where the first request after idle takes ~9s to wake the dyno. Under the old 5s timeout that first POST was always silently dropped. Final-attempt failures now log at WARNING (was DEBUG) so silent data loss is no longer invisible.
- Verdict string for fast-but-wrong runs now reads `FAIL (correctness; speedup measured but loss diverged)` instead of leading with `PASS`. A run that ships incorrect numerics is not a pass, regardless of its step time.
- README results section replaced with a median-of-3 multi-GPU table (A10G + A100). Honest framing of variance + range rather than a single fast-run headline.
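The telemetry transport change (15s timeout, one retry, 2s backoff, WARNING on the final failure) follows a common pattern that can be sketched as below. This is not Profine's code: `post_with_retry` and the injected `send` callable are hypothetical, and reading `PROFINE_TELEMETRY_RETRY_BACKOFF` as a float number of seconds is an assumption about how that knob is parsed.

```python
import logging
import os
import time
from typing import Callable

log = logging.getLogger("telemetry_sketch")

TIMEOUT_S = 15.0  # raised from 5s so a ~9s cold start can finish


def post_with_retry(
    send: Callable[[float], int],
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call send(timeout) up to twice: the first attempt plus one retry."""
    backoff = float(os.environ.get("PROFINE_TELEMETRY_RETRY_BACKOFF", "2.0"))
    error = ""
    for attempt in (1, 2):
        try:
            status = send(TIMEOUT_S)
            if 200 <= status < 300:
                return True
            error = f"HTTP {status}"
        except OSError as exc:  # covers timeouts and connection errors
            error = repr(exc)
        if attempt == 1:
            log.debug("telemetry attempt 1 failed (%s); retrying", error)
            sleep(backoff)
    # The final failure is logged loudly so dropped rows are visible.
    log.warning("telemetry row dropped after retry: %s", error)
    return False
```

Passing `send` and `sleep` in as callables keeps the sketch free of any particular HTTP library and lets tests run without real waiting, which is roughly the role the notes give the backoff environment variable.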
Fixed
- `_projected_savings` divide-by-zero when speedup approached 100% (zero-sample candidate). Clamped `fraction_saved` to 0.99.
- `_maybe_adapt` step-time estimate poisoned by torch.compile cold-start. The adaptive step controller previously used `elapsed / steps_completed`, which is dominated by a 2.8s first-step compile when the steady state is ~17ms. Now uses median of recorded step times when available.
- `_strip_warmup` could strip more samples than existed, producing a zero-sample comparison with a bogus "100% faster / ∞× speedup" result. Capped to keep at least 3 samples on both `benchmarker.benchmarker` and `profiler.orchestrator`.
- `--edit-dir` outside `--output` now correctly resolves the suggest report via `edit_dir.parent / "suggest"`. Without this, the BF16-aware tolerance widening never fired on standalone `benchmark` invocations, and every BF16-stack benchmark spuriously failed correctness.
- `_resolve_hardware` in `telemetry/emit.py` now prefers the explicit `hardware_name` argument over `profile_record.hardware_name`. Batch / replay callers re-emitting from on-disk artifacts for a different GPU than the one that produced the profile record were having their rows mis-tagged.
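Three of the fixes above share one idea: never let a degenerate sample set reach the arithmetic. The sketch below shows one way the warmup cap, the 0.99 clamp, and the low-sample threshold could fit together. The function names and formulas are assumptions for illustration; only the constants (3, 10, 0.99) and the 2.8s / ~17ms figures come from the notes.

```python
from statistics import median
from typing import List

MIN_KEPT = 3               # at least 3 samples are kept, per the notes
LOW_SAMPLE_THRESHOLD = 10  # a warning fires below 10 surviving samples
MAX_FRACTION_SAVED = 0.99  # clamp stated in the notes


def strip_warmup(samples: List[float], warmup: int) -> List[float]:
    """Drop leading warmup steps, but never leave fewer than MIN_KEPT."""
    max_strip = max(0, len(samples) - MIN_KEPT)
    return samples[min(max(warmup, 0), max_strip):]


def is_low_sample(samples: List[float]) -> bool:
    """True when a median would rest on thin data."""
    return len(samples) < LOW_SAMPLE_THRESHOLD


def fraction_saved(baseline: List[float], candidate: List[float]) -> float:
    """Share of median step time saved, clamped so 1 - fraction stays > 0."""
    saved = 1.0 - median(candidate) / median(baseline)
    return min(saved, MAX_FRACTION_SAVED)


steps = [2.8, 0.017, 0.017, 0.017, 0.017]  # one compile step, then steady state
print(sum(steps) / len(steps), median(steps))
```

For the five steps in the example, the mean is roughly 0.57s while the median is 0.017s, which is why a median-based estimate is less sensitive to a single slow compile step. The sketch assumes both sample lists are non-empty with a nonzero baseline median; `statistics.median` raises on an empty list.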
Internal
- 9 new regression tests pinning each surface bug above; 584 tests total.
- Six empty package directories deleted (`heuristics/`, `modifiers/`, `output/`, `preflight/`, `search/`, `resources/`), vestigial scaffolding from a past refactor.
- LLM backends (`profine/llm/backend.py`) gained exponential-backoff retry for transient API errors (timeouts, 5xx, rate limits), bounded at 3 attempts and env-tunable.
- Modal executor (`profine/modal/executor.py`) filters benign Inductor autotune log spam (`No valid triton configs`, `OutOfMemoryError: out of resource: triton_mm`) so successful autotune sweeps don't read as crashes; also wires `PROFINE_WALL_CLOCK_LIMIT` so the script's `StepController` stays below Modal's container timeout.
- Stacked edits in `profine/editor/editor.py` are wrapped in try/except so one bad LLM candidate surfaces as a non-applied `EditResult` instead of blowing away previously-successful edits.
- Reader feeds sibling modules to the analyzer LLM, so defaults defined in imported files (e.g. `mingpt/model.py`) no longer come back as "guessed" zeros.
- File-not-found errors now hint that a sibling `prepare.py` needs to run when the missing path looks like a tokenized dataset (nanoGPT/minGPT layout).
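The stacked-edits change is an instance of per-item error isolation: each candidate is applied inside its own try/except so that one failure is recorded rather than propagated. A minimal sketch follows, assuming a hypothetical `EditResult` shape and a hypothetical `apply_stacked` helper; the real classes in `profine/editor/editor.py` may look quite different.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class EditResult:  # hypothetical shape; the real class may differ
    name: str
    applied: bool
    error: str = ""


def apply_stacked(
    source: str, edits: List[Tuple[str, Callable[[str], str]]]
) -> Tuple[str, List[EditResult]]:
    """Apply edits in order; a failing edit is skipped, earlier ones are kept."""
    results: List[EditResult] = []
    for name, edit in edits:
        try:
            source = edit(source)
        except Exception as exc:  # one bad candidate must not abort the stack
            results.append(EditResult(name, applied=False, error=str(exc)))
        else:
            results.append(EditResult(name, applied=True))
    return source, results
```

Because `source` is only reassigned when an edit returns normally, a candidate that raises leaves the accumulated text exactly as the previous successful edit left it, and the caller can still report which candidates were not applied.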
Breaking Changes
- The `--hardware` flag is now required on the `profile`, `benchmark`, and `run-all` commands; the previous `auto` default has been removed.