This release includes 10 notable changes for engineering teams evaluating rollout: 1 feature, 6 performance improvements, and 3 build and CI changes.
✓ No known CVEs patched in this version
Summary
AI summary: Updates What's changed since v0.1.50, store, and Requirements across a mixed release.
Changes in this release
| Type | Severity | Summary | CVE |
|---|---|---|---|
| Feature | Low | Adds 4-bit quantized scan replica with exact rerank and budget-driven operation. | — |
| Performance | Medium | Cross-file IMAGE batching in index improves full pass and live updates. | — |
| Performance | Medium | bf16-i/o SDPA in vision achieves 9.9 images per second. | — |
| Performance | Medium | GPU top-C candidate fast path in search reduces host work to O(C). | — |
| Performance | Medium | Pageable host vectors using unlinked-scratch mmap in quant mode optimize memory usage. | — |
| Performance | Medium | Slim resident metadata with canonical strings and lazy snippets reduces memory footprint. | — |
| Performance | Low | Releases old base before rebuilding store, halving transient GPU burst during rebuild. | — |
| Refactor | Low | Adds optional explicit-version input for CI releases (minor/major cuts). | — |
| Refactor | Low | Regenerate Omni.xcodeproj when project.yml is newer. | — |
| Refactor | Low | Preserve dev team entries when auto-regenerating xcodeproj. | — |

Source for all entries: llm_adapter@2026-06-11. Confidence: high.
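The Feature entry pairs a compact 4-bit scan with an exact rerank. The two-stage pattern can be sketched roughly as below, in pure Python. This is an illustration under stated assumptions, not Omni's implementation: the per-vector min/max scaling, the dot-product score, and the names `quantize_4bit`, `dequantize`, `dot`, and `search` are all hypothetical, and the budget-driven behavior is not modeled.

```python
import heapq


def quantize_4bit(vec):
    """Quantize floats to 4-bit codes (0..15), packed two codes per byte."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 15 or 1.0  # fall back to 1.0 for constant vectors
    codes = [round((x - lo) / scale) for x in vec]
    if len(codes) % 2:
        codes.append(0)  # pad to an even count so codes pair up into bytes
    packed = bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))
    return packed, lo, scale, len(vec)


def dequantize(packed, lo, scale, n):
    """Unpack 4-bit codes and map them back to approximate floats."""
    codes = []
    for byte in packed:
        codes.append(byte >> 4)
        codes.append(byte & 0x0F)
    return [lo + c * scale for c in codes[:n]]


def dot(a, b):
    """Plain dot product, used here as the similarity score (an assumption)."""
    return sum(x * y for x, y in zip(a, b))


def search(query, vectors, replica, k, candidates):
    """Scan the quantized replica, then rerank a shortlist exactly."""
    # Stage 1: approximate scores from the 4-bit replica.
    approx = [(dot(query, dequantize(*entry)), i) for i, entry in enumerate(replica)]
    shortlist = heapq.nlargest(candidates, approx)
    # Stage 2: exact scores for the shortlist only, from full-precision vectors.
    exact = [(dot(query, vectors[i]), i) for _, i in shortlist]
    return [i for _, i in heapq.nlargest(k, exact)]
```

In this sketch the replica takes half a byte per dimension, and only `candidates` full-precision vectors are touched per query. A larger shortlist trades speed for a lower chance that quantization error drops a true top-`k` result.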
Full changelog
What's changed since v0.1.50
- ci: optional explicit-version input for releases (minor/major cuts)
- perf(index): cross-file IMAGE batching in both the full pass and live updates
- perf(vision): bf16-i/o SDPA - 9.9 images/s; custom-kernel project settled by data
- perf: silu-gate fusion, pooled-token gather, host-built audio windows
- perf(index): cross-file batching for live updates; addMM + weight-cast cache in towers
- perf(search): GPU top-C candidate fast path - O(C) host work after the scan
- build: preserve the dev team when auto-regenerating the xcodeproj
- build: regenerate Omni.xcodeproj when project.yml is newer
- perf(store): pageable host vectors - unlinked-scratch mmap in quant mode
- feat(store): 4-bit quantized scan replica with exact rerank, budget-driven
- perf(store): release the old base before rebuilding - halves the rebuild's transient GPU burst
- perf(store): slim resident metadata - canonical strings, lazy snippets
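The build entry about regenerating Omni.xcodeproj when project.yml is newer describes a staleness check. A minimal sketch of that idea follows, assuming file modification times are the signal. The helper names `needs_regeneration` and `demo` are hypothetical, and the actual build script is not shown in these notes.

```python
import os
import tempfile


def needs_regeneration(spec_path, generated_path):
    """Return True when the generated project is missing or older than its spec."""
    if not os.path.exists(generated_path):
        return True
    return os.path.getmtime(spec_path) > os.path.getmtime(generated_path)


def demo():
    """Exercise the check on temporary files with explicit timestamps."""
    with tempfile.TemporaryDirectory() as tmp:
        spec = os.path.join(tmp, "project.yml")
        generated = os.path.join(tmp, "Omni.xcodeproj")
        open(spec, "w").close()
        # No generated project yet: regeneration is needed.
        missing = needs_regeneration(spec, generated)
        open(generated, "w").close()
        # Spec older than the generated project: nothing to do.
        os.utime(spec, (100, 100))
        os.utime(generated, (200, 200))
        fresh = needs_regeneration(spec, generated)
        # Spec edited after generation: regeneration is needed again.
        os.utime(spec, (300, 300))
        stale = needs_regeneration(spec, generated)
    return missing, fresh, stale
```

A real .xcodeproj is a directory, and a directory's modification time only changes when entries are added or removed, so comparing against a file inside the bundle may be more reliable than comparing against the bundle itself.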
Install
Open the DMG and drag Omni onto Applications. No Gatekeeper prompt - the build is notarized.
Requirements
- Apple Silicon Mac (M1 or later), macOS 14+.
- First launch downloads the on-device search model (~1.9 GB), then runs offline.