This release includes 1 breaking change for platform teams planning a safe upgrade.
Published 10d
AI Agents & Assistants
✓ No known CVEs patched
Topics
assistant
dictation
linux
llm
local-first
rust
speach-to-text
stt
vulkan
whisper
wyoming
Summary
AI summary: Updates Breaking, F8, and streaming across a mixed release.
Full changelog
A quality-of-life release: two more cloud providers, polish on the
"Pondering" pause UI, headless servers install themselves, and a handful
of papercuts gone.
Added
- Deepgram speech-to-text now actually works. Picking Deepgram in
  fono setup (or running fono use stt deepgram) had been broken
  since v0.8.0: it offered the option but failed at startup. The full
  pipeline is now wired: both the batch endpoint and a real WebSocket
  for live dictation, with the newer Nova-3 model as the default
  (Nova-2 is still selectable for languages Nova-3 doesn't cover yet).
- Cartesia speech-to-text. Same story: was advertised, now
  delivered. Batch transcription via the ink-whisper family;
  realtime ink-2 will follow in a future release.
- Cartesia text-to-speech now picks a native voice per language.
  Speak Romanian, hear a Romanian voice; switch to English in the
  same session, hear an English voice. No more one-voice-fits-all.
- Auto-stop on silence is now wired end-to-end. If you enable
  "Auto-stop after pause" in the tray, dictation actually stops once
  you've been quiet for the configured time; previously the
  PONDERING label appeared but nothing committed.
- sudo fono install is friendlier on servers. Headless boxes
  (no graphical session, multi-user systemd target) are now detected
  and the systemd lane runs by default, so no --server flag is needed.
  A new --desktop flag forces the desktop lane on hosts that just
  look headless.
- Server installs auto-enable LAN sharing. fono install --server
  now turns on the Wyoming STT listener on port 10300 out of the box,
  probes that it actually bound, and prints the address so other
  machines on the LAN can discover it immediately. fono uninstall
  on a server also cleans up /var/cache/fono (multi-GB model
  blobs).
- A diagnostic VU bar. [overlay].volume_bar = "advanced" paints
  a dBFS-axis meter with reference ticks for your speaking level and
  the silence threshold, which is useful for tuning auto-stop without
  guesswork. The default simple bar is unchanged.
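For readers unfamiliar with a dBFS axis, the level such a meter plots can be sketched as follows. This is a minimal Python illustration of the standard RMS-to-dBFS conversion, not Fono's implementation; the function names, the -120 dB floor, and the -40 dBFS silence threshold are assumptions made up for the example.

```python
import math

def rms_dbfs(samples, floor_db=-120.0):
    """Return the RMS level of float samples (range -1.0..1.0) in dBFS.

    0 dBFS is a full-scale signal; quieter audio is negative.
    floor_db is a hypothetical lower bound used for empty or all-zero input.
    """
    if not samples:
        return floor_db
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square <= 0.0:
        return floor_db
    # 10 * log10(mean_square) is the same as 20 * log10(rms).
    return max(floor_db, 10.0 * math.log10(mean_square))

def is_silence(samples, threshold_db=-40.0):
    """True when the block is quieter than the (assumed) silence threshold."""
    return rms_dbfs(samples) < threshold_db
```

Reading the meter against a threshold like this is what makes tuning possible: if normal speech sits around -20 dBFS and room noise around -55 dBFS, any threshold between the two separates them.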
Changed
- The "PONDERING" pause indicator is consistent everywhere.
  - It now shows up on the assistant flow (F8) too, in the green
    assistant palette, with the same auto-stop behaviour as
    dictation.
  - It only appears when you've actually enabled auto-stop, so no
    more PONDERING under your finger if you've opted out.
  - It works in live (streaming) dictation, not just batch.
  - It doesn't flicker on a single breath, chair creak, or mouse
    click during a real pause.
- Tray "Auto-stop after pause" presets reworked from
  Off / 0.8 s / 1.5 s / 3 s (chat-app numbers) to Off / 3 s / 5 s
  (prose-dictation numbers). Default stays Off.
- Tray "Visualization" picker now turns the VU bar on automatically
  for the Transcript style and off for the others: a sensible default,
  still overridable from config.toml.
- fono hwprobe matches what the setup wizard actually picks.
  The recommendation table used to promise large-v3-turbo on
  CPU-only boxes that the wizard would then quietly downgrade. Now
  the report and the wizard agree.
- Hotkey reliability on Wayland. Switching the overlay style from
  the tray now takes effect on the very next hotkey press (no
  restart). GNOME 47's portal hotkey rejection is detected upfront so
  Fono falls back to gsettings/X11 instead of silently dropping
  presses.
- Local Whisper picks better defaults out of the box. Model names
  now resolve through a quality-tested quantization ladder
  (tiny → q5_1, small → q5_1, small.en → q8_0,
  large-v3-turbo → q8_0); CPU threads default to the physical core
  count, which doubles throughput on Zen 3 / Zen 4 SMT systems where
  the previous default over-subscribed logical threads.
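The two Local Whisper defaults in the last item can be pictured with a short Python sketch. The ladder entries are copied from the note above; everything else (the function names, returning None for unlisted models, and counting cores from /proc/cpuinfo-style text) is an assumption for illustration and not how Fono itself does it.

```python
# Ladder entries taken from the release note; the lookup itself is hypothetical.
QUANT_LADDER = {
    "tiny": "q5_1",
    "small": "q5_1",
    "small.en": "q8_0",
    "large-v3-turbo": "q8_0",
}

def resolve_model(name):
    """Return (model, quantization); quantization is None if the name is not listed."""
    return name, QUANT_LADDER.get(name)

def physical_cores(cpuinfo_text):
    """Count unique (physical id, core id) pairs in /proc/cpuinfo-style text.

    SMT siblings share a pair, so they are counted once. Returns 0 when the
    text carries no "core id" fields; a caller would need its own fallback.
    """
    cores = set()
    physical_id = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "physical id":
            physical_id = value
        elif key == "core id":
            cores.add((physical_id, value))
    return len(cores)
```

On a 2-core, 4-thread machine the text lists four processors but only two distinct pairs, so a thread count derived this way is half the logical count.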
Fixed
- Wayland overlay no longer steals keyboard focus, paints as an
  opaque rectangle, or lands top-left on GNOME / Mutter. The overlay
  now runs through a pluggable backend layer: native
  wlr-layer-shell on KDE / wlroots / COSMIC / Hyprland; X11 via
  Xwayland on GNOME (which doesn't implement layer-shell). Set
  FONO_OVERLAY_BACKEND=… to force a specific backend.
- PipeWire audio playback (pw-play) no longer fails on every
  assistant reply; the --raw flag was missing.
- LAN dictation against a Wyoming peer that advertises IPv6 no
  longer fails with EINVAL when the peer's first-listed address is
  a link-local. Discovery now prefers routable IPv4 / IPv6.
- History database rebuilds itself when it carries an older
  schema, instead of warning on every dictation.
- The dictation key held down while pausing no longer flips the
  overlay into PONDERING and (with auto-stop on) no longer ends the
  session out from under you.
- Shipped binaries no longer SIGILL on pre-VNNI / pre-AVX-512 CPUs.
  The release build inherited ggml's GGML_NATIVE=ON default, which
  appends -march=native to the C/C++ compile line. On the GitHub
  Actions Linux runner (AMD EPYC 7763, Zen 3) the C compiler's
  auto-vectoriser baked VPDPBUSD (AVX-VNNI) into the binary, causing
  immediate SIGILLs on users' Kaby Lake, 8th-gen Intel, and earlier
  laptops. The shipped binary now pins an explicit
  AVX2 / FMA / F16C / BMI2 baseline (Intel Haswell ≥ 2013, AMD
  Excavator ≥ 2015) via .cargo/config.toml, so what CI builds is
  what users download, regardless of which CPU GitHub puts in its
  runner pool. A/B verified on Lunar Lake: zero throughput loss
  (±7% noise) because ggml's hand-written VNNI kernels are separately
  gated by GGML_AVX_VNNI (also off by default), so -march=native
  was costing portability without delivering any actual VNNI speedup.
- Hotkeys work immediately after sudo fono install on
  GNOME-Wayland. The post-install autostart spawned the daemon via
  runuser -u $SUDO_USER, which inherited the sudo-scrubbed
  environment: DISPLAY=:0 was preserved but WAYLAND_DISPLAY,
  XDG_RUNTIME_DIR, and DBUS_SESSION_BUS_ADDRESS were not. With
  only DISPLAY set, the daemon's hotkey-backend detector picked
  the X11 listener, the GNOME-gsettings shim never ran, and F7 / F8
  fell through in every native-Wayland app. Users only saw working
  hotkeys after logging out and back in (when the XDG autostart entry
  fired with a real session env). The installer now reconstructs the
  graphical-session env from /run/user/$(id -u) inside the spawn
  command, after the user-switch, so the first daemon launched by
  sudo fono install is identical to what the next-login autostart
  would have produced. Drive-by: shutdown_existing_daemon no
  longer panics with "Cannot start a runtime from within a runtime"
  when re-running install while a previous daemon is still alive.
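The environment reconstruction described in the last item can be sketched roughly as below. This is a hedged Python illustration of the general idea, not the installer's code: the function name is hypothetical, and it relies on two common conventions (a compositor socket named wayland-<n> and a systemd user bus socket named bus inside the runtime directory) that the note itself does not spell out.

```python
import os

def graphical_session_env(runtime_dir):
    """Guess session variables from the contents of a /run/user/<uid> directory.

    Hypothetical helper: returns only the variables it can infer.
    """
    env = {"XDG_RUNTIME_DIR": runtime_dir}
    try:
        entries = sorted(os.listdir(runtime_dir))
    except OSError:
        return env  # directory missing or unreadable: no live session to infer
    # Assumed convention: the compositor socket is wayland-<n>; skip .lock files.
    for name in entries:
        if name.startswith("wayland-") and not name.endswith(".lock"):
            env["WAYLAND_DISPLAY"] = name
            break
    # Assumed convention: the user session bus socket lives at <runtime_dir>/bus.
    if "bus" in entries:
        env["DBUS_SESSION_BUS_ADDRESS"] = "unix:path=" + os.path.join(runtime_dir, "bus")
    return env
```

The lookup has to run as the target user, after the user switch, because /run/user/<uid> is normally readable only by its owner; that matches the ordering the note describes.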
Removed
- 14 inert config keys (the always-warm-mic flag, eight commit-tuning
knobs, three session-budget knobs, and two more) — all of them
were silently ignored at runtime. Defaults are unchanged.
Breaking
[overlay].volume_bar is now "off" | "simple" | "advanced"
instead of a boolean, and defaults to "off". Existing configs
need a one-line edit: volume_bar = true → "simple",
volume_bar = false → "off". The tray picker handles new
installs automatically.
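For anyone with many machines to update, the one-line edit can be scripted. The following is a minimal Python sketch under stated assumptions: the function name is hypothetical, it works on the config text line by line rather than with a TOML parser, and it only recognises a plain [overlay] header with no trailing comment. Hand-editing remains the documented path.

```python
import re

# Matches: volume_bar = true|false, with an optional trailing comment.
_BOOL_LINE = re.compile(r'^(\s*volume_bar\s*=\s*)(true|false)(\s*(?:#.*)?)$')

def migrate_volume_bar(config_text):
    """Rewrite boolean volume_bar values under [overlay] to the new strings."""
    out = []
    section = None
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            section = stripped  # track which table we are inside
        elif section == "[overlay]":
            m = _BOOL_LINE.match(line)
            if m:
                value = '"simple"' if m.group(2) == "true" else '"off"'
                line = m.group(1) + value + m.group(3)
        out.append(line)
    # splitlines() drops a final newline, so put it back if it was there.
    return "\n".join(out) + ("\n" if config_text.endswith("\n") else "")
```

Values that are already strings, and volume_bar keys in other tables, are left untouched, so running it twice is harmless.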
Full Changelog: https://github.com/bogdanr/fono/compare/v0.8.0...v0.8.1