Profine

v0.5.0 Breaking

This release includes 1 breaking change that platform teams should review before upgrading.

Published 2mo Model Serving & MLOps

✓ No known CVEs patched

Topics

ai-agents automated-optimization benchmark cli machine-learning gpu gpu-profiling llm-agents mingpt mixed-precision mlops modal model-training performance-optimization profiling python pytorch torch-compile

Affected surfaces

breaking_upgrade

Summary

AI summary

--hardware is now required on profile, benchmark, and run-all CLI commands.

Changes in this release

Type	Severity	Summary	Source	Confidence	CVE
Breaking	High	`--hardware` is now required on profile, benchmark, and run-all commands.	granite4.1:8b-q6_K@2026-05-19	high	—
Feature	Low	Added `profine telemetry doctor` to probe telemetry endpoint status and latency.	granite4.1:8b-q6_K@2026-05-19	high	—
Feature	Low	Update-check nudge prints version lag warning on CLI startup, silenced via PROFINE_NO_UPDATE_CHECK.	granite4.1:8b-q6_K@2026-05-19	high	—
Feature	Low	Added low-sample warning when fewer than 10 step samples survive warmup stripping.	granite4.1:30b@2026-05-19-audit	low	—
Feature	Low	Introduced `PROFINE_TELEMETRY_RETRY_BACKOFF` env var to control telemetry retry backoff (default 2.0s).	granite4.1:30b@2026-05-19-audit	low	—
Feature	Low	Reader now feeds sibling modules to the analyzer LLM for accurate default detection.	granite4.1:30b@2026-05-19-audit	low	—
Performance	Medium	Telemetry HTTP timeout increased from 5s to 15s with one retry and 2s backoff.	granite4.1:30b@2026-05-19-audit	low	—
Performance	Medium	Added exponential-backoff retry for LLM backends with env-tunable attempts (max 3).	granite4.1:30b@2026-05-19-audit	low	—
Bugfix	Medium	Fixed divide-by-zero in `_projected_savings` when speedup approached 100%.	granite4.1:30b@2026-05-19-audit	low	—
Bugfix	Medium	Corrected step-time estimate poisoning by torch.compile cold start.	granite4.1:30b@2026-05-19-audit	low	—
Bugfix	Medium	Prevented `_strip_warmup` from stripping more samples than exist, preserving at least 3 samples.	granite4.1:30b@2026-05-19-audit	low	—
Bugfix	Medium	Fixed `--edit-dir` outside `--output` resolution to correctly apply BF16 tolerance widening.	granite4.1:30b@2026-05-19-audit	low	—
Bugfix	Medium	Ensured `_resolve_hardware` prefers explicit hardware argument over stored profile record.	granite4.1:30b@2026-05-19-audit	low	—
Bugfix	Low	Filtered benign Inductor autotune log spam in Modal executor.	granite4.1:30b@2026-05-19-audit	low	—
Bugfix	Low	Wrapped stacked edits in try/except to surface individual LLM candidate failures without losing prior edits.	granite4.1:30b@2026-05-19-audit	low	—
Bugfix	Low	File-not-found errors now hint to run `prepare.py` when missing tokenized dataset paths are detected.	granite4.1:30b@2026-05-19-audit	low	—
Refactor	Low	Removed `auto_select_hardware()` helper and param-bucket preset table.	granite4.1:30b@2026-05-19-audit	low	—
Refactor	Low	Deleted six empty package directories (heuristics, modifiers, output, preflight, search, resources).	granite4.1:30b@2026-05-19-audit	low	—

Full changelog

A multi-rep minGPT benchmark surfaced four product bugs. This release fixes all of them, adds 9 regression tests, makes one breaking CLI change, and overhauls telemetry resilience so that rows are no longer silently dropped when the backend is cold.

pip install -U profine

⚠️ Breaking change

`--hardware` is now required on `profile`, `benchmark`, and `run-all`. The previous `auto` default silently chose a "smallest preset that fits" using a heuristic that mis-sized GPUs for unknown architectures; making the choice explicit prevents that footgun. Pick one of: `1x_t4`, `1x_l4`, `1x_a10g`, `1x_a100`, `1x_h100`. The `auto_select_hardware()` helper and the param-bucket preset table have been removed.

If you were running `profine run-all train.py`, change it to `profine run-all train.py --hardware 1x_a100` (or your preferred preset).
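For teams that launch Profine from a wrapper script, the migration might look roughly like the sketch below. The `with_hardware` helper is hypothetical and not part of Profine; only the preset names and the `--hardware` flag come from the notes above. It treats the `--hardware=<preset>` spelling as already set purely as a precaution, since the notes only show the space-separated form.

```python
# Presets listed in the v0.5.0 release notes.
HARDWARE_PRESETS = ("1x_t4", "1x_l4", "1x_a10g", "1x_a100", "1x_h100")


def with_hardware(argv: list[str], hardware: str) -> list[str]:
    """Hypothetical wrapper helper: append --hardware to an existing argv.

    Leaves argv unchanged if a --hardware option is already present.
    """
    if hardware not in HARDWARE_PRESETS:
        raise ValueError(
            f"unknown hardware preset {hardware!r}; "
            f"pick one of: {', '.join(HARDWARE_PRESETS)}"
        )
    already_set = any(
        arg == "--hardware" or arg.startswith("--hardware=") for arg in argv
    )
    if already_set:
        return list(argv)
    return [*argv, "--hardware", hardware]


# The pre-0.5.0 invocation from the notes, upgraded in place.
old_argv = ["profine", "run-all", "train.py"]
print(" ".join(with_hardware(old_argv, "1x_a100")))
# profine run-all train.py --hardware 1x_a100
```

Rejecting unknown presets in the wrapper (including the removed `auto` value) surfaces the problem before a run is launched.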

Added

profine telemetry doctor. Synchronous probe of the telemetry endpoint that reports consent state, endpoint URL, HTTP status code, and per-attempt latency. Use this to verify the round-trip works (or to warm a sleeping Render dyno before a real run).
Update-check nudge on CLI startup. Profine now checks PyPI for the latest release once every 24 hours (cached in ~/.profine/) and prints a one-line nudge if your installed version is behind. Silenced via PROFINE_NO_UPDATE_CHECK=1.
Low-sample warning. Benchmark reports now warn when fewer than 10 step samples survive warmup stripping, so users notice when the median is built on thin data.
PROFINE_TELEMETRY_RETRY_BACKOFF env var. Test-and-CI knob for the telemetry retry backoff. Defaults to 2.0s in production.
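The two environment variables above could be set for a CI job along these lines. This is a sketch under assumptions: the `ci_environment` helper is hypothetical, and it assumes `PROFINE_TELEMETRY_RETRY_BACKOFF` accepts a plain number of seconds, which the notes imply (default 2.0s) but do not spell out.

```python
import os


def ci_environment(base=None, retry_backoff_seconds=0.0):
    """Hypothetical helper: build an environment dict for a CI run.

    Silences the update-check nudge and shortens the telemetry retry
    backoff so tests do not wait the production default of 2.0s.
    """
    env = dict(os.environ if base is None else base)
    env["PROFINE_NO_UPDATE_CHECK"] = "1"
    # Assumed format: seconds as a decimal string.
    env["PROFINE_TELEMETRY_RETRY_BACKOFF"] = str(retry_backoff_seconds)
    return env


env = ci_environment()
# The dict can then be passed to a subprocess, for example:
#   subprocess.run(["profine", "telemetry", "doctor"], env=env, check=False)
print(env["PROFINE_NO_UPDATE_CHECK"], env["PROFINE_TELEMETRY_RETRY_BACKOFF"])
```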

Changed

Telemetry HTTP transport: timeout 5s → 15s, one retry with 2s backoff. The anon endpoint is hosted on Render's free/starter tier, where the first request after idle takes ~9s to wake the dyno. Under the old 5s timeout that first POST was always silently dropped. Final-attempt failures now log at WARNING (was DEBUG), so data loss is no longer silent.
Verdict string for fast-but-wrong runs now reads FAIL (correctness; speedup measured but loss diverged) instead of leading with PASS. A run that ships incorrect numerics is not a pass, regardless of its step time.
README results section replaced with a median-of-3 multi-GPU table (A10G + A100). Honest framing of variance + range rather than a single fast-run headline.
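The telemetry transport change can be illustrated with a minimal sketch. This is not Profine's code: `send_with_retry` and its injected `post` callable are hypothetical, and retrying on non-2xx responses is an assumption, since the notes only state the timeout, retry count, backoff, and log level.

```python
import logging
import time

log = logging.getLogger("telemetry_sketch")


def send_with_retry(post, timeout=15.0, retries=1, backoff=2.0, sleep=time.sleep):
    """Call post(timeout) up to retries + 1 times; return True on a 2xx status.

    `post` is a caller-supplied function that performs the HTTP POST and
    returns the status code, raising OSError (which includes TimeoutError)
    on network failure.
    """
    attempts = retries + 1
    for attempt in range(1, attempts + 1):
        try:
            status = post(timeout)
            if 200 <= status < 300:
                return True
            error = f"HTTP {status}"
        except OSError as exc:
            error = repr(exc)
        if attempt < attempts:
            log.debug("telemetry attempt %d failed (%s); retrying", attempt, error)
            sleep(backoff)
        else:
            # Final failure is logged at WARNING so dropped rows are visible.
            log.warning("telemetry dropped after %d attempts (%s)", attempts, error)
    return False
```

With a ~9s cold start, the first attempt now fits inside the 15s timeout, and the single retry covers a transient failure.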

Fixed

_projected_savings divide-by-zero when speedup approached 100% (zero-sample candidate). Clamped fraction_saved to 0.99.
_maybe_adapt step-time estimate poisoned by torch.compile cold-start. The adaptive step controller previously used elapsed / steps_completed, which is dominated by a 2.8s first-step compile when the steady state is ~17ms. Now uses median of recorded step times when available.
_strip_warmup could strip more samples than existed, producing a zero-sample comparison with a bogus "100% faster / ∞× speedup" result. Stripping is now capped so that at least 3 samples remain, in both benchmarker.benchmarker and profiler.orchestrator.
--edit-dir outside --output now correctly resolves the suggest report via edit_dir.parent / "suggest". Without this, the BF16-aware tolerance widening never fired on standalone benchmark invocations, and every BF16-stack benchmark spuriously failed correctness.
_resolve_hardware in telemetry/emit.py now prefers the explicit hardware_name argument over profile_record.hardware_name. Batch / replay callers re-emitting from on-disk artifacts for a different GPU than the one that produced the profile record were having their rows mis-tagged.
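The three numeric fixes above can be sketched as follows. The function names and bodies here are illustrative assumptions rather than Profine's implementation; in particular, the notes do not show the formula inside `_projected_savings`, so `speedup_factor` only demonstrates why clamping the saved fraction to 0.99 avoids a zero denominator.

```python
from statistics import median

MIN_SAMPLES_KEPT = 3        # floor stated in the notes
LOW_SAMPLE_THRESHOLD = 10   # low-sample warning threshold stated in the notes
MAX_FRACTION_SAVED = 0.99   # clamp stated in the notes


def strip_warmup(samples, warmup):
    """Drop up to `warmup` leading samples, keeping at least MIN_SAMPLES_KEPT."""
    max_strip = max(0, len(samples) - MIN_SAMPLES_KEPT)
    n = min(max(0, warmup), max_strip)
    return list(samples[n:])


def step_time_estimate(step_times, elapsed, steps_completed):
    """Prefer the median of recorded step times over elapsed / steps."""
    if step_times:
        return median(step_times)
    return elapsed / steps_completed if steps_completed else 0.0


def speedup_factor(fraction_saved):
    """Assumed form 1 / (1 - fraction_saved), with the fraction clamped."""
    clamped = min(max(fraction_saved, 0.0), MAX_FRACTION_SAVED)
    return 1.0 / (1.0 - clamped)


# One 2.8s compile step followed by nine 17ms steady-state steps.
times = [2.8] + [0.017] * 9
mean_estimate = sum(times) / len(times)                       # roughly 0.295s
median_estimate = step_time_estimate(times, sum(times), 10)   # 0.017s
kept = strip_warmup(times, warmup=20)                         # keeps the last 3
low_sample = len(kept) < LOW_SAMPLE_THRESHOLD                 # True here
print(mean_estimate, median_estimate, len(kept), low_sample)
```

In this example the mean is more than ten times the steady-state step time, which is the kind of distortion the median avoids.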

Internal

9 new regression tests pinning each surface bug above; 584 tests total.
Six empty package directories deleted (heuristics/, modifiers/, output/, preflight/, search/, resources/) — vestigial scaffolding from a past refactor.
LLM backends (profine/llm/backend.py) gained exponential-backoff retry for transient API errors (timeouts, 5xx, rate limits), bounded at 3 attempts and env-tunable.
Modal executor (profine/modal/executor.py) filters benign Inductor autotune log spam (No valid triton configs, OutOfMemoryError: out of resource: triton_mm) so successful autotune sweeps don't read as crashes; also wires PROFINE_WALL_CLOCK_LIMIT so the script's StepController stays below Modal's container timeout.
Stacked edits in profine/editor/editor.py are wrapped in try/except so one bad LLM candidate surfaces as a non-applied EditResult instead of blowing away previously-successful edits.
Reader feeds sibling modules to the analyzer LLM, so defaults defined in imported files (e.g. mingpt/model.py) no longer come back as "guessed" zeros.
File-not-found errors now hint that a sibling prepare.py needs to run when the missing path looks like a tokenized dataset (nanoGPT/minGPT layout).
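The stacked-edit behaviour described above might look roughly like this. It is a sketch under assumptions: only the `EditResult` name appears in the notes, so its fields and the `apply_stacked` function are hypothetical, and edits are modelled as plain functions from source text to source text.

```python
from dataclasses import dataclass


@dataclass
class EditResult:
    # Field names are assumptions; the notes only mention the class name.
    name: str
    applied: bool
    error: str = ""


def apply_stacked(source, edits):
    """Apply (name, edit_fn) pairs in order, isolating each failure.

    A failing edit is recorded as non-applied and skipped; the text
    produced by earlier successful edits is preserved.
    """
    current = source
    results = []
    for name, edit_fn in edits:
        try:
            candidate = edit_fn(current)
        except Exception as exc:  # one bad candidate must not abort the stack
            results.append(EditResult(name, applied=False, error=str(exc)))
            continue
        current = candidate
        results.append(EditResult(name, applied=True))
    return current, results
```

Assigning `current` only after the edit function returns means a half-finished edit never replaces the last good version of the text.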

Breaking Changes

The `--hardware` flag is now required on the `profile`, `benchmark`, and `run-all` commands; the previous `auto` default has been removed.
