This release includes breaking changes for platform teams planning a safe upgrade.
Full changelog
0.2.0 — 2026-04-27
Single-binary local stack: STT (whisper.cpp) and LLM cleanup
(llama.cpp) now ship together in one statically-linked fono binary,
out of the box, with hardware-accelerated CPU SIMD selected at runtime.
Added — single-binary local STT + LLM
- `llama-local` is now part of the `default` features set. The previous
  `compile_error!` guard in `crates/fono/src/lib.rs` is gone: both
  `whisper-rs` and `llama-cpp-2` link into the same ELF.
- `.cargo/config.toml` adds `-Wl,--allow-multiple-definition` to
  deduplicate the otherwise-colliding `ggml` symbols vendored by both sys
  crates. Both copies originate from the same `ggerganov` upstream and
  are ABI-compatible; the linker keeps one set, no UB at runtime.
- New `accel-cuda` / `accel-metal` / `accel-vulkan` / `accel-rocm` /
  `accel-coreml` / `accel-openblas` features on `crates/fono` that
  forward to matching `whisper-rs` / `llama-cpp-2` features for opt-in
  GPU acceleration.
- Startup banner prints a new `hw accel : <accelerators> + CPU <SIMD>`
  line (runtime SIMD probe: AVX512 / AVX2 / AVX / SSE4.2 + FMA + F16C on
  x86; NEON + DotProd + FP16 on aarch64).
- `LlamaLocal::run_inference` redirects llama.cpp / ggml's internal
  printf-style logging through `tracing` (matches the existing
  `whisper_rs::install_logging_hooks` pattern). Default verbosity now
  emits a single `LLM ready: <model> (<MB>, <threads> threads, ctx=<n>) in <ms>`
  line; cosmetic load-time warnings (control-token type,
  `n_ctx_seq < n_ctx_train`) are silenced. Re-enable on demand with
  `FONO_LOG=llama-cpp-2=info`.
- New smoke test `crates/fono/tests/local_backends_coexist.rs` boots
  `WhisperLocal` and `LlamaLocal` in the same process to lock in the
  no-collision contract.
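As an illustration of the runtime SIMD probe behind the new banner line, here is a minimal sketch using only the standard library's feature-detection macros. The function names `cpu_simd_summary` and `hw_accel_line`, the `none` and `baseline` placeholders, and the choice to report only the widest vector extension are assumptions made for this example, not Fono's actual implementation.

```rust
/// Summarize the CPU SIMD features detected at runtime (not at compile time).
/// Hypothetical helper; the strings mirror the banner described above.
pub fn cpu_simd_summary() -> String {
    #[allow(unused_mut)] // stays empty on architectures without a probe below
    let mut parts: Vec<&str> = Vec::new();

    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        // Report only the widest vector extension, then the add-on features.
        if std::arch::is_x86_feature_detected!("avx512f") {
            parts.push("AVX512");
        } else if std::arch::is_x86_feature_detected!("avx2") {
            parts.push("AVX2");
        } else if std::arch::is_x86_feature_detected!("avx") {
            parts.push("AVX");
        } else if std::arch::is_x86_feature_detected!("sse4.2") {
            parts.push("SSE4.2");
        }
        if std::arch::is_x86_feature_detected!("fma") {
            parts.push("FMA");
        }
        if std::arch::is_x86_feature_detected!("f16c") {
            parts.push("F16C");
        }
    }

    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            parts.push("NEON");
        }
        if std::arch::is_aarch64_feature_detected!("dotprod") {
            parts.push("DotProd");
        }
        if std::arch::is_aarch64_feature_detected!("fp16") {
            parts.push("FP16");
        }
    }

    if parts.is_empty() {
        "baseline".to_string() // assumed placeholder when nothing is detected
    } else {
        parts.join(" + ")
    }
}

/// Format the banner line. `accelerators` lists compiled-in GPU backends;
/// the "none" fallback is an assumption for this sketch.
pub fn hw_accel_line(accelerators: &[&str], simd: &str) -> String {
    let accel = if accelerators.is_empty() {
        "none".to_string()
    } else {
        accelerators.join(" + ")
    };
    format!("hw accel : {accel} + CPU {simd}")
}
```

Calling `hw_accel_line(&["CUDA"], &cpu_simd_summary())` would produce a line in the documented `hw accel : <accelerators> + CPU <SIMD>` shape; the exact text depends on the host CPU.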
Added — wizard local LLM path
- First-run wizard now offers `Local LLM cleanup (qwen2.5, private, offline)`
  as a top-level option in both the Local and Mixed paths, in
  addition to `Skip` and `Cloud`. New `configure_local_llm` helper picks
  a tier-aware model: `qwen2.5-3b-instruct` (HighEnd),
  `qwen2.5-1.5b-instruct` (Recommended/Comfortable),
  `qwen2.5-0.5b-instruct` (Minimum/Unsuitable). All Apache-2.0 per
  ADR 0004.
- The wizard's auto-download now fires for either local STT or local
  LLM (was STT-only).
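The tier-to-model mapping described above can be sketched as a single `match`. The enum shape and the function name `local_llm_model_for` are hypothetical; only the tier names and model identifiers come from the notes above, and the real `configure_local_llm` helper may look quite different.

```rust
/// Hardware tiers as named in this changelog (the type itself is a sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LocalTier {
    Unsuitable,
    Minimum,
    Comfortable,
    Recommended,
    HighEnd,
}

/// Pick the Qwen2.5 model size for a tier, following the mapping listed above.
/// Hypothetical function name, for illustration only.
pub fn local_llm_model_for(tier: LocalTier) -> &'static str {
    match tier {
        LocalTier::HighEnd => "qwen2.5-3b-instruct",
        LocalTier::Recommended | LocalTier::Comfortable => "qwen2.5-1.5b-instruct",
        LocalTier::Minimum | LocalTier::Unsuitable => "qwen2.5-0.5b-instruct",
    }
}
```

Because the `match` is exhaustive, adding a sixth tier later would fail to compile until a model is chosen for it.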
Added — tray UX
- Tray STT and LLM submenus now show a `●` marker beside the active
  backend (was missing: `active_backends()` returned the trait `name()`
  while the comparison logic expected the canonical config-string
  identifier).
- Switching to the local STT or LLM backend from the tray now ensures
  the corresponding model file is on disk first, with a "downloading…"
  notification, a "ready" notification on completion, and a clear error
  notification on failure (with the orchestrator reload skipped to keep
  the user on a working backend).
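The marker fix comes down to comparing like with like: the active backend has to be matched by its canonical config identifier rather than by a display name. A minimal sketch, assuming a hypothetical `BackendEntry` type and made-up identifiers (`local`, `groq`):

```rust
/// One tray submenu entry. `id` is the canonical config-string identifier,
/// `display_name` is what a trait `name()` might return. Both are assumptions.
pub struct BackendEntry {
    pub id: &'static str,
    pub display_name: &'static str,
}

/// Render submenu labels, prefixing the active backend with a `●` marker.
/// The comparison uses the canonical id, never the display name.
pub fn menu_labels(entries: &[BackendEntry], active_id: &str) -> Vec<String> {
    entries
        .iter()
        .map(|e| {
            if e.id == active_id {
                format!("● {}", e.display_name)
            } else {
                format!("  {}", e.display_name)
            }
        })
        .collect()
}
```

Passing a display name such as `"Groq"` as `active_id` reproduces the old symptom in this sketch: no entry matches, so no marker is drawn.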
Changed — hotkey defaults
- `toggle = "F9"` (was `Ctrl+Alt+Space`). Single key, no default
  binding on any major desktop, easy to fire blind.
- `hold = "F8"` (was `Ctrl+Alt+Grave`). Adjacent to F9 for natural
  push-to-talk muscle memory.
- `cancel = "Escape"` unchanged (only grabbed while recording).
- `paste_last` hotkey removed. The tray's "Recent transcriptions"
  submenu and the `fono paste-last` CLI cover the same need with a
  better UX (re-paste any of the last 10, not just the newest).
  `Request::PasteLast` IPC and `Cmd::PasteLast` CLI are preserved and
  now route directly to `orch.on_paste_last()`.
Changed — release profile size
- `[profile.release]` now sets `strip = "symbols"` and `lto = "thin"`,
  trimming the dev `cargo build --release` artifact from ~23 MB → ~19 MB
  (no code removal, only `.symtab` / `.strtab` deduplication).
- `release-slim` (used by packaging CI) is unchanged at ~15 MB.
Documented
- `docs/status.md`: new entries for hotkey ergonomics and the
  single-binary local-stack resolution.
- `docs/troubleshooting.md`, `docs/wayland.md`, `README.md` updated for
  the new default hotkeys.
- New plans:
  `plans/2026-04-27-shared-ggml-static-binary-v1.md` (the
  shared-ggml strategy that informed the linker-dedupe shortcut),
  `plans/2026-04-27-llama-dynamic-link-sota-v1.md`,
  `plans/2026-04-27-candle-backend-benchmark-v1.md`,
  `plans/2026-04-27-local-stt-llm-resolution-v1.md`.
0.1.0 — 2026-04-25
First public release. Pipeline (audio → STT → LLM → inject) is fully wired
end-to-end; default release ships local whisper.cpp out of the box.
Added — pipeline
- `SessionOrchestrator` (`crates/fono/src/session.rs`) glues hotkey FSM →
  cpal capture → silence trim → STT → optional LLM cleanup → text injection
  → SQLite history. Hot-swappable backends behind `RwLock<Arc<dyn …>>`.
- `fono record`: one-shot CLI dictation (microphone → stdout / inject).
- `fono transcribe <wav>`: runs a WAV file through the same pipeline; useful
  for verifying API keys without a microphone.
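To illustrate the hot-swappable backend pattern mentioned above, here is a small sketch of a slot holding `RwLock<Arc<dyn Trait>>`. The `Stt` trait shape, `BackendSlot`, and the two fake backends are all hypothetical and stand in for whatever the real orchestrator uses.

```rust
use std::sync::{Arc, RwLock};

/// Simplified stand-in for an STT backend trait (assumed shape).
pub trait Stt: Send + Sync {
    fn name(&self) -> &'static str;
    fn transcribe(&self, samples: &[f32]) -> String;
}

/// A slot whose backend can be replaced while the process keeps running.
pub struct BackendSlot {
    inner: RwLock<Arc<dyn Stt>>,
}

impl BackendSlot {
    pub fn new(initial: Arc<dyn Stt>) -> Self {
        Self { inner: RwLock::new(initial) }
    }

    /// Clone the `Arc` under a short read lock, so a long transcription
    /// never holds the lock and never blocks a swap.
    pub fn current(&self) -> Arc<dyn Stt> {
        let guard = self.inner.read().expect("backend lock poisoned");
        Arc::clone(&*guard)
    }

    /// Replace the backend; in-flight callers keep their old `Arc` alive.
    pub fn swap(&self, next: Arc<dyn Stt>) {
        *self.inner.write().expect("backend lock poisoned") = next;
    }
}

/// Fake backends used only to exercise the slot.
struct FakeLocal;
struct FakeCloud;

impl Stt for FakeLocal {
    fn name(&self) -> &'static str { "local" }
    fn transcribe(&self, samples: &[f32]) -> String {
        format!("local:{}", samples.len())
    }
}

impl Stt for FakeCloud {
    fn name(&self) -> &'static str { "cloud" }
    fn transcribe(&self, samples: &[f32]) -> String {
        format!("cloud:{}", samples.len())
    }
}
```

The point of the design is that a swap only changes what the next caller of `current()` sees; a dictation already in progress finishes on the backend it started with.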
Added — providers
- STT: local whisper.cpp (small / base / medium models), Groq cloud
  (whisper-large-v3-turbo), OpenAI cloud, optional Deepgram / AssemblyAI /
  Cartesia stubs.
- LLM cleanup: optional, off-by-default. OpenAI-compatible endpoints
  (Cerebras, Groq, OpenAI, OpenRouter, Ollama) and Anthropic.
- `STT` and `TextFormatter` traits with `prewarm()` so the first dictation
  after daemon start is not cold (latency plan L2/L3).
- `fono use {stt,llm,cloud,local,show}`: one-command provider switching;
  rewrites config atomically and hot-reloads the orchestrator (no restart).
- `fono keys {list,add,remove,check}`: multi-provider API-key vault with
  reachability probes.
- Per-call overrides: `fono record --stt openai --llm anthropic`.
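An atomic config rewrite, as `fono use` is described as doing, is commonly implemented by writing a temporary file next to the target and renaming it into place. The sketch below shows that general technique with the standard library only; `write_atomically` and the `.tmp` suffix are assumptions, and the changelog does not say which mechanism Fono actually uses.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Replace `path` with `contents` so readers see either the old or the new
/// file, never a half-written one. Hypothetical helper for illustration.
pub fn write_atomically(path: &Path, contents: &str) -> io::Result<()> {
    // Keep the temp file in the same directory so the rename stays on one
    // filesystem (a cross-filesystem rename would fail).
    let mut tmp_name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?
        .to_os_string();
    tmp_name.push(".tmp");
    let tmp_path = path.with_file_name(tmp_name);

    let mut file = File::create(&tmp_path)?;
    file.write_all(contents.as_bytes())?;
    file.sync_all()?; // flush data to disk before the rename publishes it
    drop(file);

    // `rename` replaces an existing destination file.
    fs::rename(&tmp_path, path)
}
```

A caller would then trigger whatever reload it needs only after `write_atomically` returns `Ok(())`, so a failed write leaves the previous config in place.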
Added — hardware-adaptive setup
- `crates/fono-core/src/hwcheck.rs`: pure-Rust probe of physical/logical
  cores, RAM, free disk, and CPU features (AVX2/NEON/FMA). Maps to a
  five-level `LocalTier` (`Unsuitable`, `Minimum`, `Comfortable`,
  `Recommended`, `High-end`).
- Wizard prints the live tier and steers the user toward local vs cloud
  based on what the machine can sustain.
- `fono hwprobe [--json]` exposes the snapshot for scripts.
- `fono doctor` shows the active hardware tier alongside provider
  reachability and the chosen injector.
Added — input / output
- Default key-injection backend `Injector::XtestPaste`: pure-Rust X11 XTEST
  paste via `x11rb` + `xsel` / `wl-copy` / `xclip` clipboard write. No system
  dependencies beyond a clipboard tool. Shift+Insert is the default paste
  shortcut (universal X11 binding).
- Override paste shortcut via `[inject].paste_shortcut = "ctrl-v"` in config
  or `FONO_PASTE_SHORTCUT=ctrl-shift-v` env var.
- Always-clipboard safety net: every successful dictation also writes to both
  CLIPBOARD and PRIMARY selections (`general.also_copy_to_clipboard = true`).
- Always-notify: `notify-rust` toast on every dictation
  (`general.notify_on_dictation = true`).
- `fono test-inject "<text>" [--shortcut <variant>]`: smoke-tests injection
  and clipboard delivery without speaking.
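One plausible way to resolve the paste shortcut from the two override sources is sketched below. The precedence (environment variable over config over the Shift+Insert default), the `shift-insert` spelling, and the type and function names are all assumptions; only `ctrl-v`, `ctrl-shift-v`, and the Shift+Insert default appear in the notes above.

```rust
/// Paste shortcuts mentioned in this changelog (enum shape is a sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PasteShortcut {
    ShiftInsert,
    CtrlV,
    CtrlShiftV,
}

impl PasteShortcut {
    /// Parse a config or env string; unknown values yield `None`.
    /// The "shift-insert" spelling is assumed, not confirmed by the source.
    pub fn parse(s: &str) -> Option<Self> {
        match s.trim().to_ascii_lowercase().as_str() {
            "shift-insert" => Some(Self::ShiftInsert),
            "ctrl-v" => Some(Self::CtrlV),
            "ctrl-shift-v" => Some(Self::CtrlShiftV),
            _ => None,
        }
    }
}

/// Assumed precedence: env var first, then config, then the default.
/// Invalid values fall through to the next source instead of erroring.
pub fn resolve_paste_shortcut(
    env_value: Option<&str>,
    config_value: Option<&str>,
) -> PasteShortcut {
    env_value
        .and_then(PasteShortcut::parse)
        .or_else(|| config_value.and_then(PasteShortcut::parse))
        .unwrap_or(PasteShortcut::ShiftInsert)
}
```

In a real program the first argument would come from something like `std::env::var("FONO_PASTE_SHORTCUT").ok().as_deref()`; keeping the function pure makes the precedence easy to test.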
Added — tray
- `Recent transcriptions ▸` submenu with the last 10 dictations; click to
  re-paste.
- `STT: <active> ▸` and `LLM: <active> ▸` submenus for live provider
  switching from the tray (same code path as `fono use`).
- Open history folder (was misrouted to Dolphin in pre-release; now opens
  the directory itself via `xdg-open`).
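A "last 10 dictations" list can be kept as a small bounded queue, newest first. This is a sketch of that idea only; `RecentTranscriptions` is a hypothetical type, and the changelog indicates the real history is stored in SQLite.

```rust
use std::collections::VecDeque;

/// Bounded, newest-first list of recent dictations (illustrative type).
pub struct RecentTranscriptions {
    cap: usize,
    items: VecDeque<String>,
}

impl RecentTranscriptions {
    pub fn new(cap: usize) -> Self {
        Self { cap, items: VecDeque::with_capacity(cap) }
    }

    /// Add the newest dictation at the front, evicting the oldest when full.
    pub fn push(&mut self, text: impl Into<String>) {
        if self.cap == 0 {
            return;
        }
        if self.items.len() == self.cap {
            self.items.pop_back();
        }
        self.items.push_front(text.into());
    }

    /// Index 0 is the most recent dictation, matching a menu's top entry.
    pub fn get(&self, index: usize) -> Option<&str> {
        self.items.get(index).map(String::as_str)
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }
}
```

Re-pasting "any of the last 10" then maps to `get(i)` for the clicked menu index `i`.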
Added — safety + observability
- Per-stage tracing breadcrumbs at `info`:
  `capture=…ms trim=…ms stt=…ms llm=…ms inject=…ms (raw_chars → cleaned_chars)`.
- Pipeline in-flight guard refuses concurrent recordings with a toast.
- Skip-LLM-when-short heuristic (configurable `llm.skip_if_words_lt`) saves
  150–800 ms per short dictation.
- Trim leading/trailing silence pre-STT (`audio.trim_silence`); ~30 % faster
  STT on 5 s utterances with 1.5 s of tail silence.
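The two latency savers above can be sketched in a few lines. Both functions are hypothetical: the word-count check mirrors what a setting named `llm.skip_if_words_lt` suggests, and the silence trim uses a simple per-sample amplitude threshold, whereas a production trimmer would more likely use windowed energy.

```rust
/// Skip LLM cleanup when the transcript has fewer words than the threshold.
/// Assumed semantics for a setting like `llm.skip_if_words_lt`.
pub fn should_skip_llm(raw_text: &str, skip_if_words_lt: usize) -> bool {
    raw_text.split_whitespace().count() < skip_if_words_lt
}

/// Drop leading and trailing samples whose amplitude is at or below
/// `threshold`. Quiet samples between loud ones are kept. Simplified sketch.
pub fn trim_silence(samples: &[f32], threshold: f32) -> &[f32] {
    let first = match samples.iter().position(|s| s.abs() > threshold) {
        Some(i) => i,
        None => return &samples[..0], // everything is silence
    };
    let last = samples
        .iter()
        .rposition(|s| s.abs() > threshold)
        .unwrap_or(first);
    &samples[first..=last]
}
```

Returning a sub-slice avoids copying audio; the caller hands the shorter slice to STT, which is where the reported speed-up would come from.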
Added — benchmark harness
- New `crates/fono-bench/` crate: 6-language LibriVox fixture set (en, es,
  fr, de, it, ro), Word Error Rate + per-stage latency report, criterion
  benchmark, regression gate. CI-fast (network-free) and full-stack modes.
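For readers unfamiliar with the metric, Word Error Rate is the word-level edit distance between a reference transcript and the STT output, divided by the number of reference words. The sketch below is the textbook computation, not the code in `fono-bench`; it skips the case and punctuation normalization a real harness would likely apply, and its handling of an empty reference is an arbitrary convention.

```rust
/// Word Error Rate: (substitutions + deletions + insertions) / reference words.
/// Textbook Levenshtein over whitespace-separated tokens; illustrative only.
pub fn word_error_rate(reference: &str, hypothesis: &str) -> f64 {
    let r: Vec<&str> = reference.split_whitespace().collect();
    let h: Vec<&str> = hypothesis.split_whitespace().collect();

    if r.is_empty() {
        // WER is undefined for an empty reference; this convention is assumed.
        return if h.is_empty() { 0.0 } else { 1.0 };
    }

    // prev[j] = edit distance between the first i reference words
    // and the first j hypothesis words.
    let mut prev: Vec<usize> = (0..=h.len()).collect();
    for (i, rw) in r.iter().enumerate() {
        let mut cur = vec![i + 1; h.len() + 1];
        for (j, hw) in h.iter().enumerate() {
            let substitute = prev[j] + usize::from(rw != hw);
            let delete = prev[j + 1] + 1;
            let insert = cur[j] + 1;
            cur[j + 1] = substitute.min(delete).min(insert);
        }
        prev = cur;
    }

    prev[h.len()] as f64 / r.len() as f64
}
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words, so a regression gate should not assume a 0 to 1 range.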
Documented
- `docs/plans/2026-04-25-fono-pipeline-wiring-v1.md` (W1–W22, all landed).
- `docs/plans/2026-04-25-fono-latency-v1.md` (L1–L30, 17 landed, 13
  deferred-to-v0.2 with rationale).
- `docs/plans/2026-04-25-fono-local-default-v1.md` (H1–H25).
- `docs/plans/2026-04-25-fono-provider-switching-v1.md` (S1–S27).
- `docs/plans/2026-04-25-fono-roadmap-v2.md` (post-v0.1 roadmap).
- ADR `docs/decisions/0007-local-models-build.md`: glibc-linked default
  release vs musl-slim cloud-only artifact.
Models locked in v0.1.0
| Provider | Model | License | First-run download |
|---|---|---|---|
| Whisper local | ggml-small.bin (multilingual) | MIT | ~466 MB |
| Whisper local (light) | ggml-base.bin | MIT | ~142 MB |
| Groq cloud STT | whisper-large-v3-turbo | (cloud, no license) | n/a |
| OpenAI cloud STT | whisper-1 | (cloud) | n/a |
| Cerebras cloud LLM | llama-3.3-70b | (cloud) | n/a |
| Groq cloud LLM | llama-3.3-70b-versatile | (cloud) | n/a |
Local LLM (Qwen2.5 / SmolLM2) is opt-in behind the llama-local Cargo
feature and ships fully wired in v0.2.
Verification
- 86 unit + integration tests; 2 latency-smoke `#[ignore]` tests.
- `cargo clippy --workspace --no-deps -- -D warnings` clean (pedantic +
  nursery).
- DCO sign-off enforced on every commit.
Known limitations
- No streaming STT/LLM yet (latency plan L6/L7/L8 deferred to v0.2). Latency
  on cloud Groq+Cerebras is ~1 s end-to-end on a 5 s utterance.
- Wayland global hotkey requires compositor binding to `fono toggle`
  (`org.freedesktop.portal.GlobalShortcuts` not yet stable in upstream
  compositors).
- Local LLM cleanup (Qwen / SmolLM) is opt-in / preview.
- Real `winit + softbuffer` overlay window is a stub (event channel only).