Fono

v0.8.1 Breaking

This release includes 1 breaking change for platform teams planning a safe upgrade.

Published 2mo AI Agents & Assistants

✓ No known CVEs patched

Topics

assistant dictation linux llm local-first rust speach-to-text stt vulkan whisper wyoming

Full changelog

A quality-of-life release: two more cloud providers, polish on the
"Pondering" pause UI, headless servers install themselves, and a handful
of papercuts gone.

Added

Deepgram speech-to-text now actually works. Picking Deepgram in
fono setup (or running fono use stt deepgram) had been broken
since v0.8.0 — it offered the option but failed at startup. The full
pipeline is now wired: both the batch endpoint and a real WebSocket
for live dictation, with the newer Nova-3 model as the default
(Nova-2 is still selectable for languages Nova-3 doesn't cover yet).
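
A rough sketch of the two Deepgram endpoints, for orientation. Deepgram's public API takes pre-recorded audio at https://api.deepgram.com/v1/listen over HTTPS and live audio over a WebSocket on the same path (wss://), and the model is chosen with the model query parameter. The DeepgramModel enum and listen_url helper below are hypothetical names, not Fono's code. A real request also carries an API key and audio-format parameters, which are left out.

```rust
/// Deepgram models mentioned in the release notes; Nova-3 is the default.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
enum DeepgramModel {
    #[default]
    Nova3,
    Nova2, // still selectable for languages Nova-3 does not cover yet
}

impl DeepgramModel {
    /// Value for Deepgram's `model` query parameter.
    fn as_param(self) -> &'static str {
        match self {
            DeepgramModel::Nova3 => "nova-3",
            DeepgramModel::Nova2 => "nova-2",
        }
    }
}

/// Hypothetical helper: batch transcription uses HTTPS, live dictation
/// uses a WebSocket on the same /v1/listen path.
fn listen_url(streaming: bool, model: DeepgramModel, language: &str) -> String {
    let scheme = if streaming { "wss" } else { "https" };
    format!(
        "{scheme}://api.deepgram.com/v1/listen?model={}&language={language}",
        model.as_param()
    )
}
```
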
Cartesia speech-to-text. Same story: it was advertised, and is now
delivered. Batch transcription via the ink-whisper family;
realtime ink-2 will follow in a future release.

Cartesia text-to-speech now picks a native voice per language.
Speak Romanian, hear a Romanian voice; switch to English in the
same session, hear an English voice. No more one-voice-fits-all.

Auto-stop on silence is now wired end-to-end. If you enable
"Auto-stop after pause" in the tray, dictation actually stops once
you've been quiet for the configured time. Previously the
PONDERING label appeared but nothing was committed.
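
The auto-stop rule boils down to a silence timer. The sketch below assumes audio arrives in fixed-length frames that have already been classified as speech or silence against the threshold. Speech resets the timer, and the session commits once the timer reaches the configured pause. AutoStop and Verdict are hypothetical names. The notes don't say how soon the PONDERING label appears, so here it shows on the first silent frame.

```rust
use std::time::Duration;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Verdict {
    Listening, // speech, or auto-stop disabled
    Pondering, // silent, but not yet long enough to stop
    Commit,    // silence reached the configured pause: stop and commit
}

/// Hypothetical auto-stop timer; `after == None` means "Off" in the tray.
struct AutoStop {
    after: Option<Duration>,
    silence: Duration,
}

impl AutoStop {
    fn new(after: Option<Duration>) -> Self {
        AutoStop { after, silence: Duration::ZERO }
    }

    /// Feed one frame of duration `len`, already classified against the
    /// silence threshold.
    fn push(&mut self, len: Duration, is_speech: bool) -> Verdict {
        let Some(after) = self.after else {
            return Verdict::Listening; // auto-stop off: never ponder, never stop
        };
        if is_speech {
            self.silence = Duration::ZERO;
            return Verdict::Listening;
        }
        self.silence += len;
        if self.silence >= after { Verdict::Commit } else { Verdict::Pondering }
    }
}
```
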
sudo fono install is friendlier on servers. Headless boxes
(no graphical session, multi-user systemd target) are now detected
and the systemd lane runs by default — no --server flag needed.
A new --desktop flag forces the desktop lane on hosts that just
look headless.
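
Headless detection can be expressed as a pure decision. The sketch below assumes two inputs: the default systemd target, as printed by systemctl get-default (for example multi-user.target or graphical.target), and whether a graphical session was found. The notes don't describe how Fono actually probes for a session. choose_lane is a hypothetical name, and letting --desktop take precedence over --server is an assumption.

```rust
#[derive(Debug, PartialEq)]
enum InstallLane {
    Server,  // systemd service lane
    Desktop, // per-user desktop lane
}

/// Hypothetical lane chooser. `default_target` is what `systemctl get-default`
/// prints; `graphical_session` is whatever session probe the installer uses.
fn choose_lane(
    default_target: &str,
    graphical_session: bool,
    server_flag: bool,
    desktop_flag: bool,
) -> InstallLane {
    // Explicit flags override detection (precedence between them is assumed).
    if desktop_flag {
        return InstallLane::Desktop;
    }
    if server_flag {
        return InstallLane::Server;
    }
    // Headless = no graphical session and the box boots to multi-user.target.
    let headless = !graphical_session && default_target.trim() == "multi-user.target";
    if headless { InstallLane::Server } else { InstallLane::Desktop }
}
```
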
Server installs auto-enable LAN sharing. fono install --server
now turns on the Wyoming STT listener on port 10300 out of the box,
probes that it actually bound, and prints the address so other
machines on the LAN can discover it immediately. fono uninstall
on a server also cleans up /var/cache/fono (multi-GB model
blobs).
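
You can check that the listener actually bound using only the standard library. Bind on all IPv4 interfaces, then confirm that a loopback TCP connection to the bound port succeeds. bind_and_probe is a hypothetical helper, not Fono's code. A server install would pass 10300; the example test uses port 0 to get a free ephemeral port.

```rust
use std::io;
use std::net::{Ipv4Addr, SocketAddr, TcpListener, TcpStream};
use std::time::Duration;

/// Hypothetical helper: bind the listener on every IPv4 interface, then prove
/// the port is really accepting by connecting to it over loopback.
fn bind_and_probe(port: u16) -> io::Result<(TcpListener, SocketAddr)> {
    let listener = TcpListener::bind((Ipv4Addr::UNSPECIFIED, port))?;
    let bound = listener.local_addr()?; // resolves port 0 to the real port
    let loopback = SocketAddr::from((Ipv4Addr::LOCALHOST, bound.port()));
    // The kernel completes the handshake from the listen backlog, so no
    // accept() call is needed for this probe to succeed.
    TcpStream::connect_timeout(&loopback, Duration::from_secs(1))?;
    Ok((listener, bound))
}
```
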
A diagnostic VU bar. [overlay].volume_bar = "advanced" paints
a dBFS-axis meter with reference ticks for your speaking level and
the silence threshold — useful for tuning auto-stop without
guesswork. The default simple bar is unchanged.
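
The dBFS axis is standard. For samples normalised to [-1.0, 1.0], level = 20 · log10(RMS), so full scale is 0 dBFS and quieter signals read negative. The sketch below assumes f32 samples and a hypothetical -60 dBFS floor for the bar; the silence-threshold tick would be drawn at bar_position(threshold, floor). Both function names are illustrative.

```rust
/// RMS level of samples normalised to [-1.0, 1.0], in dBFS.
/// Uses 10·log10(mean square), which equals 20·log10(RMS).
fn rms_dbfs(samples: &[f32]) -> f32 {
    if samples.is_empty() {
        return f32::NEG_INFINITY;
    }
    let mean_square = samples.iter().map(|s| s * s).sum::<f32>() / samples.len() as f32;
    if mean_square == 0.0 {
        return f32::NEG_INFINITY; // digital silence
    }
    10.0 * mean_square.log10()
}

/// Hypothetical mapping of a dBFS value onto a bar running from
/// `floor_db` (position 0.0) up to 0 dBFS (position 1.0).
fn bar_position(dbfs: f32, floor_db: f32) -> f32 {
    ((dbfs - floor_db) / -floor_db).clamp(0.0, 1.0)
}
```
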

Changed

The "PONDERING" pause indicator is consistent everywhere.
- It now shows up on the assistant flow (F8) too, in the green
  assistant palette, with the same auto-stop behaviour as
  dictation.
- It only appears when you've actually enabled auto-stop — no
  more PONDERING under your finger if you've opted out.
- It works in live (streaming) dictation, not just batch.
- It doesn't flicker on a single breath, chair creak, or mouse
  click during a real pause.
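
The "no flicker" behaviour is a hysteresis problem. The sketch below shows one way to get it, under stated assumptions. The indicator enters PONDERING only after a continuous run of silence and leaves it only after a continuous run of speech, so a breath or click shorter than that run does not flip it. With auto-stop disabled the indicator never shows. PauseDebouncer and both durations are hypothetical, not Fono's actual tuning.

```rust
use std::time::Duration;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Indicator {
    Recording,
    Pondering,
}

/// Hypothetical debouncer with hysteresis: PONDERING needs `enter_after` of
/// unbroken silence to appear and `leave_after` of unbroken speech to clear,
/// so a breath, chair creak, or click shorter than `leave_after` is ignored.
struct PauseDebouncer {
    auto_stop_enabled: bool, // indicator never appears when auto-stop is off
    enter_after: Duration,
    leave_after: Duration,
    state: Indicator,
    run: Duration, // how long consecutive frames have disagreed with `state`
}

impl PauseDebouncer {
    fn new(auto_stop_enabled: bool, enter_after: Duration, leave_after: Duration) -> Self {
        PauseDebouncer {
            auto_stop_enabled,
            enter_after,
            leave_after,
            state: Indicator::Recording,
            run: Duration::ZERO,
        }
    }

    fn push(&mut self, len: Duration, is_speech: bool) -> Indicator {
        if !self.auto_stop_enabled {
            return Indicator::Recording;
        }
        let (disagrees, needed, next) = match self.state {
            Indicator::Recording => (!is_speech, self.enter_after, Indicator::Pondering),
            Indicator::Pondering => (is_speech, self.leave_after, Indicator::Recording),
        };
        if !disagrees {
            self.run = Duration::ZERO; // the run was broken; start counting again
        } else {
            self.run += len;
            if self.run >= needed {
                self.state = next;
                self.run = Duration::ZERO;
            }
        }
        self.state
    }
}
```
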
Tray "Auto-stop after pause" presets were reworked from
Off / 0.8 s / 1.5 s / 3 s (chat-app numbers) to Off / 3 s / 5 s
(prose-dictation numbers). Default stays Off.

Tray "Visualization" picker now turns the VU bar on automatically
for the Transcript style and off for the others: a sensible default,
still overridable from config.toml.

fono hwprobe matches what the setup wizard actually picks.
The recommendation table used to promise large-v3-turbo on
CPU-only boxes that the wizard would then quietly downgrade. Now
the report and the wizard agree.

Hotkey reliability on Wayland. Switching the overlay style from
the tray now takes effect on the very next hotkey press (no
restart). GNOME 47's portal hotkey rejection is detected up front so
Fono falls back to gsettings/X11 instead of silently dropping
presses.

Local Whisper picks better defaults out of the box. Model names
now resolve through a quality-tested quantization ladder
(tiny → q5_1, small → q5_1, small.en → q8_0,
large-v3-turbo → q8_0); CPU threads default to the physical core
count, which doubles throughput on Zen 3 / Zen 4 SMT systems where
the previous default over-subscribed logical threads.
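
The quantization ladder is a plain lookup. On Linux, the physical core count can be derived from /proc/cpuinfo by counting distinct (physical id, core id) pairs, because SMT siblings share a pair. The sketch below uses hypothetical function names. Models outside the four listed return None, because the notes don't say what they map to.

```rust
use std::collections::HashSet;

/// The quantization ladder from the release notes; other names are unspecified.
fn default_quant(model: &str) -> Option<&'static str> {
    match model {
        "tiny" | "small" => Some("q5_1"),
        "small.en" | "large-v3-turbo" => Some("q8_0"),
        _ => None,
    }
}

/// Counts physical cores in /proc/cpuinfo text. Each logical CPU block lists
/// "physical id" (socket) before "core id"; SMT siblings repeat the same pair.
fn physical_cores(cpuinfo: &str) -> Option<usize> {
    let mut cores: HashSet<(&str, &str)> = HashSet::new();
    let mut package = "0"; // single-socket fallback if "physical id" is absent
    for line in cpuinfo.lines() {
        let Some((key, value)) = line.split_once(':') else { continue };
        match key.trim() {
            "physical id" => package = value.trim(),
            "core id" => {
                cores.insert((package, value.trim()));
            }
            _ => {}
        }
    }
    (!cores.is_empty()).then_some(cores.len())
}
```

In practice one would read the text with std::fs::read_to_string("/proc/cpuinfo") and fall back to std::thread::available_parallelism(), which counts logical CPUs, when the file is missing or has no core ids.
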

Fixed

Wayland overlay no longer steals keyboard focus, paints as an
opaque rectangle, or lands top-left on GNOME / Mutter. The overlay
now runs through a pluggable backend layer: native
wlr-layer-shell on KDE / wlroots / COSMIC / Hyprland; X11 via
Xwayland on GNOME (which doesn't implement layer-shell). Set
FONO_OVERLAY_BACKEND=… to force a specific backend.
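
The backend split can be pictured as a lookup on the desktop name. The sketch below assumes the XDG_CURRENT_DESKTOP convention, a colon-separated list such as ubuntu:GNOME. GNOME gets the Xwayland path because Mutter lacks wlr-layer-shell, and every other desktop is assumed to support layer-shell. The enum and function are hypothetical. The FONO_OVERLAY_BACKEND override is left out because its accepted values aren't listed here.

```rust
#[derive(Debug, PartialEq)]
enum OverlayBackend {
    LayerShell,     // native wlr-layer-shell (KDE, wlroots, COSMIC, Hyprland)
    X11ViaXwayland, // GNOME / Mutter, which has no layer-shell
}

/// Hypothetical picker keyed on XDG_CURRENT_DESKTOP (colon-separated list).
fn pick_overlay_backend(xdg_current_desktop: &str) -> OverlayBackend {
    let is_gnome = xdg_current_desktop
        .split(':')
        .any(|d| d.eq_ignore_ascii_case("gnome"));
    if is_gnome { OverlayBackend::X11ViaXwayland } else { OverlayBackend::LayerShell }
}
```

At runtime the input would come from std::env::var("XDG_CURRENT_DESKTOP").unwrap_or_default().
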
PipeWire audio playback (pw-play) no longer fails on every
assistant reply; the --raw flag was missing.

LAN dictation against a Wyoming peer that advertises IPv6 no
longer fails with EINVAL when the peer's first-listed address is
a link-local one. Discovery now prefers routable IPv4 / IPv6 addresses.
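
The first-listed address mattered because an IPv6 link-local address (fe80::/10) is only reachable with an interface scope ID attached, and connecting to one without it can fail with EINVAL. The sketch below shows a preference order, assuming discovery yields a list of IpAddr values: routable IPv4 first, then routable IPv6, then link-local and loopback. Putting IPv4 ahead of IPv6 is an assumption; the notes only say routable addresses win.

```rust
use std::net::IpAddr;

/// Lower is better: 0 = routable IPv4, 1 = routable IPv6, 2 = link-local or loopback.
fn preference(addr: &IpAddr) -> u8 {
    match addr {
        IpAddr::V4(v4) if v4.is_link_local() || v4.is_loopback() => 2, // 169.254/16, 127/8
        IpAddr::V4(_) => 0,
        IpAddr::V6(v6) => {
            let link_local = (v6.segments()[0] & 0xffc0) == 0xfe80; // fe80::/10
            if link_local || v6.is_loopback() { 2 } else { 1 }
        }
    }
}

/// Hypothetical helper: reorder a peer's advertised addresses so routable ones
/// are tried first. The sort is stable, so the peer's order survives within a rank.
fn prefer_routable(mut addrs: Vec<IpAddr>) -> Vec<IpAddr> {
    addrs.sort_by_key(preference);
    addrs
}
```
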
History database rebuilds itself when it carries an older
schema, instead of warning on every dictation.

The dictation key held down while pausing no longer flips the
overlay into PONDERING and (with auto-stop on) no longer ends the
session out from under you.

Shipped binaries no longer SIGILL on pre-VNNI / pre-AVX-512 CPUs.
The release build inherited ggml's GGML_NATIVE=ON default, which
appends -march=native to the C/C++ compile line. On the GitHub
Actions Linux runner (AMD EPYC 7763, Zen 3) the C compiler's
auto-vectoriser baked VPDPBUSD (AVX-VNNI) into the binary, causing
immediate SIGILLs on users' Kaby Lake, 8th-gen Intel, and earlier
laptops. The shipped binary now pins an explicit
AVX2 / FMA / F16C / BMI2 baseline (Intel Haswell ≥ 2013, AMD
Excavator ≥ 2015) via .cargo/config.toml, so what CI builds is
what users download, regardless of which CPU GitHub puts in its
runner pool. A/B verified on Lunar Lake: no measurable throughput
loss (differences within ±7% noise), because ggml's hand-written
VNNI kernels are separately gated by GGML_AVX_VNNI (also off by
default), so -march=native was costing portability without
delivering any actual VNNI speedup.
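
The compile-time pin has a natural runtime counterpart. The sketch below is a startup check that reports which of the four baseline extensions a CPU lacks, before the first unsupported instruction can raise SIGILL. It uses Rust's standard is_x86_feature_detected! macro. The notes don't say whether Fono performs such a check, and they don't show the .cargo/config.toml contents, so this does not reproduce them.

```rust
/// Baseline extensions the v0.8.1 binaries are pinned to.
const BASELINE: [&str; 4] = ["avx2", "fma", "f16c", "bmi2"];

/// Returns the baseline extensions this CPU lacks; empty means the binary can run.
#[cfg(target_arch = "x86_64")]
fn missing_baseline_features() -> Vec<&'static str> {
    // The macro only accepts string literals, so probe each feature in BASELINE order.
    let detected = [
        is_x86_feature_detected!("avx2"),
        is_x86_feature_detected!("fma"),
        is_x86_feature_detected!("f16c"),
        is_x86_feature_detected!("bmi2"),
    ];
    BASELINE
        .iter()
        .zip(detected)
        .filter(|(_, present)| !present)
        .map(|(name, _)| *name)
        .collect()
}

/// On other architectures the x86-64 baseline cannot be met at all.
#[cfg(not(target_arch = "x86_64"))]
fn missing_baseline_features() -> Vec<&'static str> {
    BASELINE.to_vec()
}
```
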
Hotkeys work immediately after sudo fono install on
GNOME-Wayland. The post-install autostart spawned the daemon via
runuser -u $SUDO_USER, which inherited the sudo-scrubbed
environment: DISPLAY=:0 was preserved but WAYLAND_DISPLAY,
XDG_RUNTIME_DIR, and DBUS_SESSION_BUS_ADDRESS were not. With
only DISPLAY set, the daemon's hotkey-backend detector picked
the X11 listener, the GNOME-gsettings shim never ran, and F7 / F8
fell through in every native-Wayland app — users only saw working
hotkeys after logging out and back in (when the XDG autostart entry
fired with a real session env). The installer now reconstructs the
graphical-session env from /run/user/$(id -u) inside the spawn
command — after the user-switch — so the first daemon launched by
sudo fono install is identical to what the next-login autostart
would have produced. Drive-by: shutdown_existing_daemon no
longer panics with "Cannot start a runtime from within a runtime"
when re-running install while a previous daemon is still alive.
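
The environment reconstruction can be sketched as a pure function, assuming the common freedesktop layout. XDG_RUNTIME_DIR is /run/user/<uid>, the D-Bus session socket is the file named bus inside it, and Wayland sockets are named wayland-N, with wayland-N.lock lock files beside them. session_env is a hypothetical helper. It takes the uid and the directory's entry names and returns the variables to hand to the spawned daemon, for example through std::process::Command::envs. DISPLAY is left alone because the notes say sudo already preserved it.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: rebuild the graphical-session variables that sudo
/// scrubbed, from the target user's uid and the entries in /run/user/<uid>.
fn session_env(uid: u32, runtime_entries: &[&str]) -> BTreeMap<&'static str, String> {
    let dir = format!("/run/user/{uid}");
    let mut env = BTreeMap::new();
    // The session bus socket conventionally lives at $XDG_RUNTIME_DIR/bus.
    if runtime_entries.contains(&"bus") {
        env.insert("DBUS_SESSION_BUS_ADDRESS", format!("unix:path={dir}/bus"));
    }
    // Pick the first wayland-N socket, ignoring its .lock companion file.
    let mut wayland: Vec<&str> = runtime_entries
        .iter()
        .copied()
        .filter(|e| e.starts_with("wayland-") && !e.ends_with(".lock"))
        .collect();
    wayland.sort();
    if let Some(sock) = wayland.first() {
        env.insert("WAYLAND_DISPLAY", sock.to_string());
    }
    env.insert("XDG_RUNTIME_DIR", dir);
    env
}
```
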

Removed

14 inert config keys (the always-warm-mic flag, eight commit-tuning
knobs, three session-budget knobs, and two more) — all of them
were silently ignored at runtime. Defaults are unchanged.

Breaking

[overlay].volume_bar is now "off" | "simple" | "advanced"
instead of a boolean, and defaults to "off". Existing configs
need a one-line edit: volume_bar = true → "simple",
volume_bar = false → "off". The tray picker handles new
installs automatically.
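
For anyone scripting the upgrade, the new setting is a three-valued enum. The sketch below parses it and maps the old booleans with a hand-rolled migration helper; VolumeBar and from_legacy_bool are hypothetical names. Per the notes, Fono expects the one-line config edit rather than accepting booleans, so the legacy mapping here only mirrors that edit.

```rust
use std::str::FromStr;

/// The three values [overlay].volume_bar accepts as of v0.8.1.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
enum VolumeBar {
    #[default]
    Off,
    Simple,
    Advanced,
}

impl FromStr for VolumeBar {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "off" => Ok(VolumeBar::Off),
            "simple" => Ok(VolumeBar::Simple),
            "advanced" => Ok(VolumeBar::Advanced),
            other => Err(format!(
                "volume_bar = {other:?} is not valid; use \"off\", \"simple\" or \"advanced\""
            )),
        }
    }
}

impl VolumeBar {
    /// Hypothetical migration helper mirroring the one-line edit:
    /// true -> "simple", false -> "off".
    fn from_legacy_bool(enabled: bool) -> Self {
        if enabled { VolumeBar::Simple } else { VolumeBar::Off }
    }
}
```
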

Full Changelog: https://github.com/bogdanr/fono/compare/v0.8.0...v0.8.1
