Profine

v0.5.0 Breaking

This release includes one breaking change that platform teams should review before upgrading.

Published 16d Model Serving & MLOps
✓ No known CVEs patched

Topics

ai-agents automated-optimization benchmark cli machine-learning gpu gpu-profiling llm-agents mingpt mixed-precision mlops modal model-training performance-optimization profiling python pytorch torch-compile

Affected surfaces

breaking_upgrade

Summary

AI summary

--hardware is now required on profile, benchmark, and run-all CLI commands.

Changes in this release

Breaking High

`--hardware` is now required on profile, benchmark, and run-all commands.

Source: granite4.1:8b-q6_K@2026-05-19

Confidence: high

Feature Low

Added `profine telemetry doctor` to probe telemetry endpoint status and latency.

Source: granite4.1:8b-q6_K@2026-05-19

Confidence: high

Feature Low

CLI startup now prints a one-line nudge when the installed version is behind the latest PyPI release; silence it with `PROFINE_NO_UPDATE_CHECK=1`.

Source: granite4.1:8b-q6_K@2026-05-19

Confidence: high

Feature Low

Added low‑sample warning when fewer than 10 step samples survive warmup stripping.

Source: granite4.1:30b@2026-05-19-audit

Confidence: low

Feature Low

Introduced `PROFINE_TELEMETRY_RETRY_BACKOFF` env var to control telemetry retry backoff (default 2.0s).

Source: granite4.1:30b@2026-05-19-audit

Confidence: low

Feature Low

Reader now feeds sibling modules to the analyzer LLM for accurate default detection.

Source: granite4.1:30b@2026-05-19-audit

Confidence: low

Performance Medium

Telemetry HTTP timeout increased from 5s to 15s with one retry and 2s backoff.

Source: granite4.1:30b@2026-05-19-audit

Confidence: low

Performance Medium

Added exponential‑backoff retry for LLM backends with env‑tunable attempts (max 3).

Source: granite4.1:30b@2026-05-19-audit

Confidence: low

Bugfix Medium

Fixed divide‑by‑zero in `_projected_savings` when speedup approached 100%.

Source: granite4.1:30b@2026-05-19-audit

Confidence: low

Bugfix Medium

Fixed the `_maybe_adapt` step-time estimate being skewed by the torch.compile cold start; it now uses the median of recorded step times when available.

Source: granite4.1:30b@2026-05-19-audit

Confidence: low

Bugfix Medium

Prevented `_strip_warmup` from stripping more samples than exist, preserving at least 3 samples.

Source: granite4.1:30b@2026-05-19-audit

Confidence: low

Bugfix Medium

Fixed suggest-report resolution when `--edit-dir` lies outside `--output`, so BF16 tolerance widening is applied on standalone benchmark runs.

Source: granite4.1:30b@2026-05-19-audit

Confidence: low

Bugfix Medium

Ensured `_resolve_hardware` prefers explicit hardware argument over stored profile record.

Source: granite4.1:30b@2026-05-19-audit

Confidence: low

Bugfix Low

Filtered benign Inductor autotune log spam in Modal executor.

Source: granite4.1:30b@2026-05-19-audit

Confidence: low

Bugfix Low

Wrapped stacked edits in try/except to surface individual LLM candidate failures without losing prior edits.

Source: granite4.1:30b@2026-05-19-audit

Confidence: low

Bugfix Low

File‑not‑found errors now hint to run `prepare.py` when missing tokenized dataset paths are detected.

Source: granite4.1:30b@2026-05-19-audit

Confidence: low

Refactor Low

Removed `auto_select_hardware()` helper and param‑bucket preset table.

Source: granite4.1:30b@2026-05-19-audit

Confidence: low

Refactor Low

Deleted six empty package directories (heuristics, modifiers, output, preflight, search, resources).

Source: granite4.1:30b@2026-05-19-audit

Confidence: low

Full changelog

A multi-rep minGPT benchmark surfaced four product bugs and prompted a breaking CLI change and a telemetry-resilience overhaul. All bugs are fixed, 9 regression tests were added, and telemetry no longer silently drops rows when the backend is cold.

pip install -U profine

⚠️ Breaking change

  • --hardware is now required on profile, benchmark, and run-all. The previous auto default silently chose a "smallest preset that fits" using a heuristic that mis-sized GPUs for unknown architectures; making it explicit prevents that footgun. Pick one of: 1x_t4, 1x_l4, 1x_a10g, 1x_a100, 1x_h100. The auto_select_hardware() helper and the param-bucket preset table have been removed.

If you were running profine run-all train.py, change it to profine run-all train.py --hardware 1x_a100 (or your preferred preset).
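For scripts or CI jobs that assemble Profine invocations programmatically, a small guard can catch a missing or misspelled preset before the CLI is launched. The sketch below is illustrative only: `build_profine_command` is a hypothetical helper, not part of Profine, and it relies solely on the subcommand and preset names listed above.

```python
import shlex

# Preset names as listed in the v0.5.0 release notes.
HARDWARE_PRESETS = ("1x_t4", "1x_l4", "1x_a10g", "1x_a100", "1x_h100")
# Subcommands that require --hardware as of v0.5.0.
REQUIRES_HARDWARE = ("profile", "benchmark", "run-all")


def build_profine_command(subcommand, script, hardware=None):
    """Return an argv list for a profine invocation (hypothetical helper)."""
    argv = ["profine", subcommand, script]
    if subcommand in REQUIRES_HARDWARE:
        if hardware is None:
            raise ValueError(f"--hardware is required for '{subcommand}' since v0.5.0")
        if hardware not in HARDWARE_PRESETS:
            raise ValueError(
                f"unknown preset {hardware!r}; expected one of {HARDWARE_PRESETS}"
            )
        argv += ["--hardware", hardware]
    return argv


# Render the command as a shell-safe string for logging or copy/paste.
print(shlex.join(build_profine_command("run-all", "train.py", "1x_a100")))
```

The returned list can be passed to `subprocess.run` directly; failing early in the wrapper avoids launching a job that the CLI would reject anyway.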

Added

  • profine telemetry doctor. Synchronous probe of the telemetry endpoint that reports consent state, endpoint URL, HTTP status code, and per-attempt latency. Use this to verify the round-trip works (or to warm a sleeping Render dyno before a real run).
  • Update-check nudge on CLI startup. Profine now checks PyPI for the latest release once every 24 hours (cached in ~/.profine/) and prints a one-line nudge if your installed version is behind. Silenced via PROFINE_NO_UPDATE_CHECK=1.
  • Low-sample warning. Benchmark reports surface a warning when fewer than 10 step samples survive warmup stripping — so users notice when the median is built on thin data.
  • PROFINE_TELEMETRY_RETRY_BACKOFF env var. Test-and-CI knob for the telemetry retry backoff. Defaults to 2.0s in production.
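The update-check cadence and the low-sample warning described above both reduce to small decision rules. The following is a minimal sketch of those rules, not Profine's implementation: the function names are hypothetical, the format of the cache in ~/.profine/ is not documented here so the last-check timestamp is passed in directly, and treating only the value `1` as the opt-out is an assumption.

```python
import time

CHECK_INTERVAL_SECONDS = 24 * 60 * 60  # "once every 24 hours" per the notes
MIN_STEP_SAMPLES = 10  # threshold named in the low-sample warning entry


def should_check_for_update(env, last_check_ts, now=None):
    """Decide whether to query PyPI now (illustrative; not Profine's code)."""
    if env.get("PROFINE_NO_UPDATE_CHECK") == "1":
        return False  # documented opt-out
    if last_check_ts is None:
        return True  # no cached check yet
    now = time.time() if now is None else now
    return now - last_check_ts >= CHECK_INTERVAL_SECONDS


def low_sample_warning(samples_after_warmup):
    """Return a warning string when the median rests on thin data, else None."""
    n = len(samples_after_warmup)
    if n < MIN_STEP_SAMPLES:
        return f"only {n} step samples survived warmup stripping; median may be unreliable"
    return None
```

Keeping both rules as pure functions (environment and clock passed in) makes them easy to exercise in tests without touching the network or the home directory.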

Changed

  • Telemetry HTTP transport: timeout 5s → 15s, one retry with 2s backoff. The anon endpoint is hosted on Render's free/starter tier, where the first request after idle takes ~9s to wake the dyno. Under the old 5s timeout that first POST was always silently dropped. Final-attempt failures now log at WARNING (was DEBUG) so silent data loss is no longer invisible.
  • Verdict string for fast-but-wrong runs now reads FAIL (correctness; speedup measured but loss diverged) instead of leading with PASS. A run that ships incorrect numerics is not a pass, regardless of its step time.
  • README results section replaced with a median-of-3 multi-GPU table (A10G + A100). Honest framing of variance + range rather than a single fast-run headline.
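The telemetry transport change above (15s timeout, one retry, 2s backoff, WARNING on final failure) can be sketched with the standard library. This is an illustration under stated assumptions rather than the code in Profine: `post_with_retry` and `_http_post` are hypothetical names, and only the timeout, retry count, backoff default, and the `PROFINE_TELEMETRY_RETRY_BACKOFF` variable come from the notes.

```python
import logging
import os
import time
import urllib.request

log = logging.getLogger("telemetry_sketch")

TIMEOUT_SECONDS = 15.0  # raised from 5s in v0.5.0
MAX_ATTEMPTS = 2        # initial attempt plus one retry


def _http_post(url, payload, timeout):
    """POST raw bytes and return the HTTP status code."""
    req = urllib.request.Request(
        url, data=payload, method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def post_with_retry(url, payload, send=_http_post, sleep=time.sleep):
    """Try the POST, retry once after a backoff, and warn if both attempts fail."""
    backoff = float(os.environ.get("PROFINE_TELEMETRY_RETRY_BACKOFF", "2.0"))
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return send(url, payload, TIMEOUT_SECONDS)
        except OSError as exc:  # URLError and socket timeouts subclass OSError
            if attempt == MAX_ATTEMPTS:
                # Final failure is logged at WARNING so data loss is visible.
                log.warning("telemetry POST failed after %d attempts: %s", attempt, exc)
                return None
            sleep(backoff)
```

With a ~9s cold start, the first attempt now fits inside the 15s timeout; the retry covers the case where it still does not.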

Fixed

  • _projected_savings divide-by-zero when speedup approached 100% (zero-sample candidate). Clamped fraction_saved to 0.99.
  • _maybe_adapt step-time estimate poisoned by torch.compile cold-start. The adaptive step controller previously used elapsed / steps_completed, which is dominated by a 2.8s first-step compile when the steady state is ~17ms. Now uses median of recorded step times when available.
  • _strip_warmup could strip more samples than existed, producing a zero-sample comparison with a bogus "100% faster / ∞× speedup" result. Capped to keep at least 3 samples on both benchmarker.benchmarker and profiler.orchestrator.
  • --edit-dir outside --output now correctly resolves the suggest report via edit_dir.parent / "suggest". Without this, the BF16-aware tolerance widening never fired on standalone benchmark invocations, and every BF16-stack benchmark spuriously failed correctness.
  • _resolve_hardware in telemetry/emit.py now prefers the explicit hardware_name argument over profile_record.hardware_name. Batch / replay callers re-emitting from on-disk artifacts for a different GPU than the one that produced the profile record were having their rows mis-tagged.
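Three of the fixes above are numeric guards that are easy to picture in isolation. The sketch below uses hypothetical function names and signatures; only the 0.99 clamp, the 3-sample floor, and the switch to a median come from the notes. In particular, the `1 / (1 - fraction_saved)` conversion is an assumed formula chosen to show where the division by zero would arise, not the actual body of `_projected_savings`.

```python
import statistics

MAX_FRACTION_SAVED = 0.99  # clamp value stated in the release notes
MIN_KEPT_SAMPLES = 3       # floor stated in the release notes


def speedup_factor(fraction_saved):
    """Convert a fraction of time saved into an N-times speedup (assumed formula)."""
    clamped = min(fraction_saved, MAX_FRACTION_SAVED)
    return 1.0 / (1.0 - clamped)  # denominator can no longer reach zero


def strip_warmup(step_times, warmup_steps):
    """Drop leading warmup samples but always keep at least MIN_KEPT_SAMPLES."""
    max_strippable = max(len(step_times) - MIN_KEPT_SAMPLES, 0)
    return step_times[min(warmup_steps, max_strippable):]


def estimate_step_time(elapsed, steps_completed, recorded_step_times):
    """Prefer the median of recorded steps; a slow first step skews the mean."""
    if recorded_step_times:
        return statistics.median(recorded_step_times)
    return elapsed / max(steps_completed, 1)
```

For example, with one 2.8s compile step followed by nine 17ms steps, `elapsed / steps_completed` is roughly 0.3s, while the median stays at 17ms.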

Internal

  • 9 new regression tests pin each bug surfaced above; 584 tests total.
  • Six empty package directories deleted (heuristics/, modifiers/, output/, preflight/, search/, resources/) — vestigial scaffolding from a past refactor.
  • LLM backends (profine/llm/backend.py) gained exponential-backoff retry for transient API errors (timeouts, 5xx, rate limits), bounded at 3 attempts and env-tunable.
  • Modal executor (profine/modal/executor.py) filters benign Inductor autotune log spam (No valid triton configs, OutOfMemoryError: out of resource: triton_mm) so successful autotune sweeps don't read as crashes; also wires PROFINE_WALL_CLOCK_LIMIT so the script's StepController stays below Modal's container timeout.
  • Stacked edits in profine/editor/editor.py are wrapped in try/except so one bad LLM candidate surfaces as a non-applied EditResult instead of blowing away previously-successful edits.
  • Reader feeds sibling modules to the analyzer LLM, so defaults defined in imported files (e.g. mingpt/model.py) no longer come back as "guessed" zeros.
  • File-not-found errors now hint that a sibling prepare.py needs to run when the missing path looks like a tokenized dataset (nanoGPT/minGPT layout).
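The exponential-backoff retry mentioned for the LLM backends follows a common pattern, sketched below. This is not the code in profine/llm/backend.py: `TransientError`, `call_with_backoff`, and the `EXAMPLE_LLM_MAX_ATTEMPTS` variable are hypothetical stand-ins (the notes do not name the real environment variable), and reading "bounded at 3 attempts and env-tunable" as "the environment may lower but not raise the cap" is an assumption.

```python
import os
import time


class TransientError(Exception):
    """Stand-in for timeouts, 5xx responses, and rate limits."""


ATTEMPT_CAP = 3  # upper bound stated in the release notes


def call_with_backoff(fn, base_delay=1.0, sleep=time.sleep):
    """Call fn, retrying transient failures with exponentially growing delays."""
    # EXAMPLE_LLM_MAX_ATTEMPTS is a hypothetical name used only in this sketch.
    requested = int(os.environ.get("EXAMPLE_LLM_MAX_ATTEMPTS", ATTEMPT_CAP))
    attempts = max(1, min(requested, ATTEMPT_CAP))
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt)  # delays grow 1x, 2x, 4x, ...
```

Only errors classified as transient are retried; anything else propagates immediately so genuine bugs are not masked by the retry loop.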

Breaking Changes

  • --hardware flag is now required on `profile`, `benchmark`, and `run-all` commands; the previous `auto` default has been removed.
