Fono

v0.3.5 Breaking

This release includes 5 breaking changes that platform teams should review before upgrading.

Published 2mo AI Agents & Assistants

✓ No known CVEs patched

Topics

assistant dictation linux llm local-first rust speach-to-text stt vulkan whisper wyoming

Summary

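Fixes Whisper trailing-closer hallucinations on silent tails, removes three config fields and several notification toasts, and routes Linux desktop notifications through notify-send.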

Full changelog

Fixed

Whisper trailing-closer hallucinations ("Thank you", "Bye", "Thanks
for watching") on silent tails. Three layers, root-cause-first:
- Layer A — local whisper-rs now opts in to the four
  hallucination guards that FullParams::new() leaves disabled by
  default: set_no_speech_thold(0.6), set_logprob_thold(-1.0),
  set_compress_thold(2.4), set_temperature_inc(0.2). Matches
  the canonical whisper.cpp CLI defaults.
- Layer B — new [stt.prompts] config: a per-language
  HashMap<bcp47, String> whose entry for the request's resolved
  language is sent as the Whisper initial_prompt (local) or
  prompt (Groq + OpenAI form-data field). When no entry matches
  the resolved language, no prompt is sent — preserving today's
  unbiased behaviour for languages the user hasn't configured.
  English-only Whisper variants (e.g. tiny.en, small.en,
  *-en-q5_1) auto-seed prompts.en with a neutral professional-
  dictation default unless the user already set one.
- Layer C — interactive.hold_release_grace_ms default
  lowered from 300 ms to 150 ms. Halves the silent tail Whisper
  sees on F8 release. Smoke-test: if trailing words get truncated,
  raise back to 300.
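
The per-language lookup in Layer B can be sketched as follows. This is a minimal illustration, not Fono's actual code: the function names, the token-based model-name heuristic, and the wording of the default prompt are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical neutral default; the real wording Fono seeds is not given in the changelog.
const DEFAULT_EN_PROMPT: &str = "Professional dictation.";

/// Assumed heuristic: a model is English-only when its name contains an
/// `en` token, as in tiny.en, small.en, or names ending in -en-q5_1.
fn is_english_only(model: &str) -> bool {
    model
        .split(|c| c == '.' || c == '-' || c == '_')
        .any(|token| token == "en")
}

/// Seed prompts["en"] for English-only models unless the user already set one.
fn seed_prompts(prompts: &mut HashMap<String, String>, model: &str) {
    if is_english_only(model) {
        prompts
            .entry("en".to_string())
            .or_insert_with(|| DEFAULT_EN_PROMPT.to_string());
    }
}

/// Return the prompt for the resolved language, or None so that no prompt is sent.
fn resolve_prompt<'a>(prompts: &'a HashMap<String, String>, lang: &str) -> Option<&'a str> {
    prompts.get(lang).map(String::as_str)
}
```

Returning None rather than falling back to another language's prompt mirrors the stated goal of keeping unconfigured languages unbiased.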

LLM cleanup observability: new INFO line
llm: cleanup added=N removed=M chars
after each successful cleanup, so users can see whether the LLM is
doing real work or operating as a near-no-op pass-through.
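
The changelog does not say how the added and removed counts are computed. One plausible way, shown here purely as an assumption, is to treat the longest common subsequence of characters as the unchanged part and count everything else. The function names are hypothetical.

```rust
/// Returns (added, removed) character counts between the raw transcript and
/// the cleaned text, using the longest common subsequence as the unchanged part.
/// Assumption: Fono's real metric may be computed differently.
fn cleanup_delta(before: &str, after: &str) -> (usize, usize) {
    let a: Vec<char> = before.chars().collect();
    let b: Vec<char> = after.chars().collect();
    // prev[j] holds the LCS length of the processed prefix of `a` and b[..j].
    let mut prev = vec![0usize; b.len() + 1];
    for &ca in &a {
        let mut cur = vec![0usize; b.len() + 1];
        for (j, &cb) in b.iter().enumerate() {
            cur[j + 1] = if ca == cb {
                prev[j] + 1
            } else {
                prev[j + 1].max(cur[j])
            };
        }
        prev = cur;
    }
    let lcs = prev[b.len()];
    (b.len() - lcs, a.len() - lcs)
}

/// Formats the counts in the shape quoted in the changelog.
fn cleanup_log_line(before: &str, after: &str) -> String {
    let (added, removed) = cleanup_delta(before, after);
    format!("llm: cleanup added={} removed={} chars", added, removed)
}
```

With this metric, a pass-through cleanup reports added=0 removed=0, which is the near-no-op case the log line is meant to expose.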

Removed

[stt.cloud].streaming config field. Streaming for cloud Groq is
now derived from [interactive].enabled — the master live-
dictation switch — so there is no separate per-backend opt-in. A
user who picks Groq and turns on live mode gets the pseudo-stream
client automatically; cost can be bounded via
interactive.streaming_interval > 3.0 (finalize-only mode) or
interactive.budget_ceiling_per_minute_umicros. Existing configs
with streaming = true parse without warning (serde silently
ignores unknown fields); the value is no longer consulted. Plan:
plans/2026-04-29-streaming-config-collapse-v1.md.
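
As a rough sketch of the derivation described above, the cloud mode can be computed from the [interactive] section alone. The struct, enum, and function names are assumptions for illustration; only the enabled switch and the streaming_interval > 3.0 rule come from the changelog.

```rust
/// How the cloud Groq backend is driven (hypothetical names).
#[derive(Debug, PartialEq)]
enum CloudSttMode {
    /// Live dictation is off: one request per recording.
    Batch,
    /// Live dictation is on: previews plus finalize requests.
    PseudoStream,
    /// Live dictation is on, but the interval is large enough that only finalize requests fire.
    FinalizeOnly,
}

/// Assumed shape of the relevant [interactive] fields.
struct Interactive {
    enabled: bool,
    streaming_interval: f32,
}

/// Derive the mode from [interactive]; there is no per-backend streaming flag.
fn cloud_stt_mode(cfg: &Interactive) -> CloudSttMode {
    if !cfg.enabled {
        CloudSttMode::Batch
    } else if cfg.streaming_interval > 3.0 {
        CloudSttMode::FinalizeOnly
    } else {
        CloudSttMode::PseudoStream
    }
}
```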
[interactive].overlay config field. The live-dictation overlay
is now always shown when [interactive].enabled = true — it is
the only feedback surface for live previews, so a per-section
toggle was incoherent. The previous warn-and-ignore code path
(added in v0.3.3) is gone. [overlay].enabled continues to
control the passive recording indicator in batch mode.
Wizard's third question on the cloud-STT path ("Enable Groq
streaming dictation?"). Live-mode users on Groq now go straight
through; users who want batch-only Groq just leave
[interactive].enabled = false.
general.notify_on_dictation config field. Redundant with the
existing clipboard-fallback notification: when injection works, the
cleaned text is already at the cursor (the actual feedback); when
it falls back to the clipboard, the dedicated
"Fono — copied to clipboard" toast at session.rs:171 fires with a
Ctrl+V hint. The per-dictation toast only duplicated the first case.
"Fono — live dictation active" toast on first F9 toggle-on.
The on-screen overlay is the user-visible indicator.
"Fono — STT switched" / "Fono — LLM switched" tray success toasts.
The user just clicked the tray menu and the tray label updates to
reflect the change. Switch failures still fire critical-urgency
notifications.

Changed

Linux desktop notifications now route through notify-send (libnotify
CLI) instead of notify-rust's pure-Rust zbus path. Fixes a class of
"no notification appeared" bugs in non-canonical environments (root
sessions without XDG_RUNTIME_DIR/DBUS_SESSION_BUS_ADDRESS,
systemd --user units without PassEnvironment=, container
desktops, Flatpak/Snap launchers, etc.) where libnotify's autolaunch
succeeds but zbus fails with "No such file or directory". notify-rust
is retained behind cfg(any(target_os = "macos", target_os = "windows")) for the future cross-platform ports. New
fono_core::notify::send() helper funnels every notification through
one code path; ~40 inline notify_rust::Notification::new() call
sites in daemon.rs/session.rs removed.
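
A single-entry helper that shells out to notify-send might look roughly like this. The signature and the Urgency type are assumptions, not the real fono_core::notify::send() API; the --app-name and --urgency options are standard notify-send flags.

```rust
use std::io;
use std::process::{Command, ExitStatus};

/// Urgency levels accepted by notify-send.
#[derive(Clone, Copy)]
enum Urgency {
    Low,
    Normal,
    Critical,
}

/// Build the notify-send argument list. Kept separate from the spawn so it
/// can be checked without a desktop session.
fn notify_send_args(summary: &str, body: &str, urgency: Urgency) -> Vec<String> {
    let level = match urgency {
        Urgency::Low => "low",
        Urgency::Normal => "normal",
        Urgency::Critical => "critical",
    };
    vec![
        "--app-name=Fono".to_string(),
        format!("--urgency={}", level),
        summary.to_string(),
        body.to_string(),
    ]
}

/// Hypothetical single code path for every notification.
#[allow(dead_code)]
fn send(summary: &str, body: &str, urgency: Urgency) -> io::Result<ExitStatus> {
    Command::new("notify-send")
        .args(notify_send_args(summary, body, urgency))
        .status()
}
```

Passing the summary and body as separate arguments, rather than through a shell string, avoids quoting problems in notification text.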

Added

interactive.hold_release_grace_ms config (default 300). On F8
release (and F9 toggle-off), the orchestrator now waits this many
milliseconds before signalling the capture thread to stop. Closes a
truncation bug where the last 100–300 ms of audio buffered in the
cpal host callback were abandoned when the user released the hotkey
early on a short utterance.
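
The grace delay can be pictured with a blocking sketch like the one below. It assumes a plain channel between orchestrator and capture thread; Fono's real orchestrator may be asynchronous, and the function name is hypothetical.

```rust
use std::sync::mpsc::Sender;
use std::thread;
use std::time::Duration;

/// On hotkey release, wait `grace_ms` so audio still buffered in the host
/// callback can drain, then tell the capture thread to stop.
fn stop_after_grace(stop_tx: &Sender<()>, grace_ms: u64) {
    thread::sleep(Duration::from_millis(grace_ms));
    // A closed channel means the capture thread already exited; nothing to do.
    let _ = stop_tx.send(());
}
```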
Desktop notification on cloud STT rate-limit (HTTP 429), deduped to
at most once per dictation session (per F8/F9 press). Surfaces via
notify-rust in the default build; slim builds without the notify
feature still emit a tracing::warn! line. A defensive 120 s
auto-reset re-arms the flag if the orchestrator's reset path is
skipped (e.g. by panic).
60-second preview-lane throttle after any cloud STT 429. The
streaming pseudo-stream loop checks
rate_limit_notify::is_throttled() before each preview tick and
skips it; only VAD-boundary finalize requests fire during the
throttle window. Self-clears after 60 s.
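
The once-per-session notification and the 60-second preview throttle can be modelled together as a small state machine. This sketch passes the clock in explicitly to keep it testable; the real rate_limit_notify module presumably uses shared global state, and every name except is_throttled is an assumption.

```rust
use std::time::{Duration, Instant};

const NOTIFY_REARM: Duration = Duration::from_secs(120);
const PREVIEW_THROTTLE: Duration = Duration::from_secs(60);

/// Tracks when the user was last notified and when the last 429 arrived.
#[derive(Default)]
struct RateLimitState {
    notified_at: Option<Instant>,
    last_429: Option<Instant>,
}

impl RateLimitState {
    /// Record a 429 at `now`. Returns true when a notification should be shown:
    /// either none was shown this session, or the 120 s safety re-arm elapsed.
    fn on_rate_limited(&mut self, now: Instant) -> bool {
        self.last_429 = Some(now);
        let armed = match self.notified_at {
            None => true,
            Some(t) => now.saturating_duration_since(t) >= NOTIFY_REARM,
        };
        if armed {
            self.notified_at = Some(now);
        }
        armed
    }

    /// Called at the start of each dictation session (F8/F9 press).
    fn reset_session(&mut self) {
        self.notified_at = None;
    }

    /// Preview ticks are skipped while this returns true.
    fn is_throttled(&self, now: Instant) -> bool {
        matches!(self.last_429, Some(t) if now.saturating_duration_since(t) < PREVIEW_THROTTLE)
    }
}
```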
Single-instance guard via the IPC socket. The daemon now probes the
Unix socket on startup with UnixStream::connect; if a previous
daemon answers, we bail before duplicating hotkey grabs and model
loads. Stale sockets from crashed prior runs yield
ConnectionRefused and proceed normally. No PID file parsing, no
process probing — the socket itself is the source of truth.
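
A minimal version of the socket probe, using only the standard library, could look like this. It treats a successful connect as "a daemon answered"; the enum and function names are assumptions, and treating a missing socket file (NotFound) as free is an addition the changelog does not mention.

```rust
use std::io::{self, ErrorKind};
use std::os::unix::net::UnixStream;
use std::path::Path;

/// Result of probing the IPC socket at startup.
#[derive(Debug, PartialEq)]
enum Probe {
    /// Another daemon accepted the connection: bail out.
    AlreadyRunning,
    /// No socket file, or a stale one left by a crashed run: proceed.
    Free,
}

fn probe_socket(path: &Path) -> io::Result<Probe> {
    match UnixStream::connect(path) {
        Ok(_) => Ok(Probe::AlreadyRunning),
        Err(e) if matches!(e.kind(), ErrorKind::ConnectionRefused | ErrorKind::NotFound) => {
            Ok(Probe::Free)
        }
        // Anything else (for example a permission error) is reported to the caller.
        Err(e) => Err(e),
    }
}
```

After a Free result, the daemon would still need to remove any stale socket file before binding its own listener.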

Changed

Hotkey dispatch and live-dictation start/stop now log at DEBUG;
the existing pipeline ok: capture=… stt=… llm=… inject=…
summary at INFO is enough at default verbosity. Bump
RUST_LOG=fono=debug to see the per-event detail.
429 sites upgraded from tracing::info! to tracing::warn! so they
appear at default log level, with the verbose JSON body now
compacted to a single human-readable line (model + RPM ceiling
+ retry-in seconds) instead of being dumped raw. Streaming
finalize and preview lanes detect 429 in the closure-error
string and trip the same warn + notification + throttle path
the batch backend uses.

Fixed

Hotkey-grab conflicts on X11 no longer print the bare
X Error of failed request: BadAccess … X_GrabKey to stderr.
A custom XSetErrorHandler is installed at daemon startup that
converts BadAccess-on-XGrabKey into an actionable
tracing::error! message naming the conflict and pointing at
[hotkeys].hold / [hotkeys].toggle in the config. Other X11
errors are surfaced at WARN with their numeric codes instead of
being printed by Xlib's default handler.
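
The decision the custom handler makes can be sketched as a pure function over the two numeric fields of an X error event. The FFI registration through XSetErrorHandler is omitted; the function name, the report type, and the message wording are assumptions. The numeric constants are the standard X11 values for BadAccess and X_GrabKey.

```rust
/// X11 protocol error code for BadAccess (X.h).
const BAD_ACCESS: u8 = 10;
/// X11 major opcode for the GrabKey request (Xproto.h).
const X_GRAB_KEY: u8 = 33;

/// What the handler would log, and at which level.
#[derive(Debug, PartialEq)]
enum X11Report {
    Error(String),
    Warn(String),
}

/// Map an X error event's error_code and request_code to a log message.
fn classify_x11_error(error_code: u8, request_code: u8) -> X11Report {
    if error_code == BAD_ACCESS && request_code == X_GRAB_KEY {
        // Hypothetical wording; the real message is not quoted in the changelog.
        X11Report::Error(
            "hotkey is already grabbed by another application; \
             change [hotkeys].hold / [hotkeys].toggle in the config"
                .to_string(),
        )
    } else {
        X11Report::Warn(format!(
            "X11 error code={} request={}",
            error_code, request_code
        ))
    }
}
```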

Breaking Changes

[stt.cloud].streaming config field removed; streaming now derived from [interactive].enabled
[interactive].overlay config field removed; overlay always shown when [interactive].enabled = true
general.notify_on_dictation config field removed (redundant with clipboard-fallback notification)
"Fono — live dictation active" toast on first F9 toggle-on removed
"Fono — STT switched" and "LLM switched" tray success toasts removed
