Fono

v0.8.0 Breaking

This release includes 2 breaking changes for platform teams planning a safe upgrade.

Published 2mo AI Agents & Assistants

✓ No known CVEs patched

Topics

assistant dictation linux llm local-first rust

speach-to-text stt vulkan whisper wyoming

Summary

AI summary

Updates docs/decisions/0026-live-preview-as-overlay-style.md and docs/decisions/0025-cloud-provider-catalogue.md, alongside a mix of added, changed, fixed, and removed items.

Full changelog

Changed

Live preview is now a waveform style, not a separate toggle. The
tray "Waveform style" submenu gains a fifth entry — Transcript (live preview — more CPU / tokens) — that replaces the old
config-file-only [interactive].enabled flag. Picking Transcript
both swaps the overlay to streaming text and routes the
dictation hotkey through the live pipeline (this is the fix for
"live transcription only worked for the assistant, not for
dictation"). Fft remains the first-run default; live preview stays
opt-in because it costs more CPU on local STT and more tokens on
any cloud backend that bills per-second of streamed audio.
Internally Config::live_preview() is the single source of truth,
defined as overlay.style == Transcript. See
ADR 0026.
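A minimal sketch of the "single source of truth" idea, assuming a simplified config layout: the type and field names other than Config::live_preview() and the Transcript / Fft styles are illustrative, not Fono's actual definitions.

```rust
/// Overlay styles offered by the tray "Waveform style" submenu.
/// Variant names other than `Fft` and `Transcript` are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum OverlayStyle {
    Bars,
    Oscilloscope,
    /// First-run default, per the changelog entry above.
    #[default]
    Fft,
    Heatmap,
    /// Live preview: streaming text instead of a waveform.
    Transcript,
}

#[derive(Debug, Default)]
struct Overlay {
    style: OverlayStyle,
}

#[derive(Debug, Default)]
struct Config {
    overlay: Overlay,
}

impl Config {
    /// Live preview is on exactly when the overlay style is `Transcript`,
    /// so no separate boolean can drift out of sync with the style.
    fn live_preview(&self) -> bool {
        self.overlay.style == OverlayStyle::Transcript
    }
}
```

Deriving the flag from the style means the tray submenu and the dictation pipeline cannot disagree about whether live preview is active.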

Removed

[interactive].enabled config field (Fono has no users yet, so no
migration is provided — the field is just gone). The rest of the
[interactive] block — boundary heuristics, drain grace,
cleanup_on_finalize, prosody/filler vocab, chunk timing — stays
put as streaming-pipeline tuning that applies whenever Transcript is
active.

Added

scripts/capture-overlay.sh — reproducible overlay screencast
helper for the README. Three modes: overlay (tight 640×≤240 crop),
paste (overlay + target-app window for "lands in a real app"
demos), and gallery (records each waveform style — bars,
oscilloscope, FFT, heatmap — labels them, and stitches the clips
via ffmpeg -f concat or a 2×2 xstack grid). Detects
X11 vs Wayland, resolves monitor geometry via xrandr / wlr-randr /
swaymsg, encodes MP4 + GIF (palette pipeline with 5 MB soft / 9.5 MB
hard budget auto-tiering) + animated WebP, and probes deps with
per-distro install hints. Dev-only; not part of the shipped binary.
See docs/troubleshooting.md → "Capturing screencasts".
Onboarding auto-start and contextual tray left-click. Three
small UX changes that turn the first-launch path into a one-command
experience:
1. sudo fono install (and therefore curl -fsSL https://fono.page/install | sh) now starts fono in the
  background as the invoking user — picked up from $SUDO_USER
  and launched via runuser/sudo with setsid detachment — and
  then runs the fono setup wizard interactively in the same
  terminal (also as $SUDO_USER, with stdio inherited so the
  prompts reach the user). Running the installer as bare root (no
  sudo wrapper) is a fully supported path: fono spawns and the
  wizard runs as root, writing under /root/.config/fono/ — fono
  is allowed to run as root if that's what you want.
  packaging/install.sh re-attaches </dev/tty to the install
  invocation under the curl | sh transport so the wizard's
  stdin still has a real terminal when curl is piping the script
  in. The backgrounded daemon's stdout/stderr now append to
  $XDG_STATE_HOME/fono/fono.log (typically
  ~/.local/state/fono/fono.log, or /root/.local/state/fono/fono.log
  for the bare-root install path) — matching Paths::log_file()
  so tail -f and what fono itself considers its log path are the
  same file. Previously the spawn redirected to /dev/null, which
  made post-install troubleshooting needlessly hard. Each step now
  reports a precise outcome (started / setup completed / skipped
  because headless / spawn failed) so users always know exactly
  what happened. Skipped on headless boxes (no
  DISPLAY/WAYLAND_DISPLAY/XDG_RUNTIME_DIR) and bypassable
  with FONO_INSTALL_NO_START=1 for packagers and CI. The XDG
  autostart entry still handles next-login start. The server-mode
  install path is unchanged — systemd's systemctl enable --now
  was already starting the unit (logs via journalctl -u fono.service).
2. The daemon now fires a single low-urgency desktop notification
  on startup when no TTS backend is configured, prompting the user
  to run fono setup. Once per process; suppressed once setup
  completes (the daemon's IPC Reload hook refreshes the
  onboarding snapshot atomically so no restart is required).
3. The tray icon's SNI left-click is now contextual: when TTS is
  not yet configured it nudges toward fono setup; once configured
  it shows the current hotkey cheat sheet (dictation / assistant /
  cancel). The "Show last transcription" menu entry continues to
  work for users who want it; the left-click no longer fires that
  action.
Implemented without adding any config field — the question "is setup
finished?" is answered by the new Config::tts_configured(&Secrets)
helper, which folds the existing configured_tts_backends logic.
packaging/install.sh is now the canonical source for the
https://fono.page/install one-liner and lives next to the binary
it ships.
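The start-or-skip decision described in step 1 can be sketched as below. This is an illustration under stated assumptions, not the installer's code: the function names are hypothetical, and "headless" is read here as "none of the three variables is set to a non-empty value".

```rust
/// Decide whether the installer should start the daemon right away.
/// `lookup` abstracts environment access; a real caller could pass
/// `|k| std::env::var(k).ok()`.
fn should_autostart(lookup: impl Fn(&str) -> Option<String>) -> bool {
    // Packagers and CI opt out explicitly.
    if lookup("FONO_INSTALL_NO_START").as_deref() == Some("1") {
        return false;
    }
    // Assumption: a box counts as headless when none of these is set.
    ["DISPLAY", "WAYLAND_DISPLAY", "XDG_RUNTIME_DIR"]
        .iter()
        .any(|&name| lookup(name).map_or(false, |v| !v.is_empty()))
}

/// The user to run the daemon and wizard as: `$SUDO_USER` under sudo,
/// otherwise root (the bare-root install path).
fn target_user(lookup: impl Fn(&str) -> Option<String>) -> String {
    lookup("SUDO_USER")
        .filter(|u| !u.is_empty())
        .unwrap_or_else(|| "root".to_string())
}
```

Passing the environment lookup as a closure keeps the decision logic free of global state, which makes the headless and opt-out branches easy to exercise.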
Unified log file at /var/log/fono.log. Single-user-box
convention: every fono process writes there (world-writable 0666,
pre-created by fono install). Paths::log_file() now points at
that path. The daemon's tracing formatter forces ANSI on, so the
file preserves colors. fono doctor appends the last 10 log lines
to its report; fono doctor -f (or --follow) streams the file in
real time via tail -F, ANSI escapes intact. The background spawn
in fono install falls back to /dev/null if /var/log/fono.log
is not writable, so a permissions hiccup never blocks startup.
Colorized fono doctor output. Section headers in bold cyan,
ready / present / exists in green, FAIL / MISSING /
FAILED TO LOAD / NONE in bold red, disabled / (unset) /
(fallback) dimmed, active-provider * highlighted. Auto-disabled
when stdout is not a TTY (pipes, redirects, CI) or when NO_COLOR
is set, so scripts parsing the output remain unaffected.
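A rough sketch of the color gating, assuming the usual NO_COLOR convention (any non-empty value disables color). The helper names are hypothetical; std::io::IsTerminal is the standard-library TTY check.

```rust
use std::io::IsTerminal;

/// Pure decision: color only on a TTY and only when NO_COLOR is not set.
fn color_enabled(stdout_is_tty: bool, no_color: Option<&str>) -> bool {
    // Assumption: an empty NO_COLOR value does not count as "set".
    let no_color_set = no_color.map_or(false, |v| !v.is_empty());
    stdout_is_tty && !no_color_set
}

/// Reads the real process state and applies the decision above.
fn color_enabled_from_env() -> bool {
    let no_color = std::env::var("NO_COLOR").ok();
    color_enabled(std::io::stdout().is_terminal(), no_color.as_deref())
}

/// Wraps `text` in an ANSI SGR sequence (e.g. "1;36" for bold cyan,
/// "32" for green, "1;31" for bold red) when color is enabled.
fn paint(text: &str, sgr: &str, enabled: bool) -> String {
    if enabled {
        format!("\x1b[{sgr}m{text}\x1b[0m")
    } else {
        text.to_string()
    }
}
```

With color disabled, paint returns the text unchanged, so piped output contains no escape sequences.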
Animated "POLISHING" overlay for local STT/LLM. The
standalone-waveform overlay's post-release phase used to show a
static "POLISHING" panel while STT (and optional LLM cleanup) ran;
with a local whisper.cpp backend that's a 1–3 s dead patch where
the user has no signal the dictation is actually progressing. The
overlay now reuses the assistant's per-style thinking animation
(FFT bell sweep, neural-strand heatmap, oscilloscope standing
wave, centre-out bars) during that phase whenever the active STT
backend reports is_local() — or whenever LLM cleanup is enabled
and the LLM is local. Cloud STT+LLM runs (sub-second) keep the static
panel, since the animation would only flash. Implemented via a new
OverlayState::Polishing variant that shares the amber accent +
"POLISHING" label with the existing Processing state but is
consumed by the same synthetic-frame renderer path as
AssistantThinking. New default is_local() method on both the
SpeechToText and StreamingStt traits (also TextFormatter),
overridden to true only in the whisper-local and llama-local
backends.
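The default-method pattern can be sketched as follows. The trait and state names come from the entry above; the trait bodies, backend structs, and the post_release_state helper are simplified assumptions.

```rust
trait SpeechToText {
    /// Defaults to `false`; only on-device backends override it.
    fn is_local(&self) -> bool {
        false
    }
}

trait TextFormatter {
    fn is_local(&self) -> bool {
        false
    }
}

// Hypothetical stand-ins for real backends.
struct WhisperLocal;
struct CloudStt;
struct LlamaLocal;

impl SpeechToText for WhisperLocal {
    fn is_local(&self) -> bool {
        true
    }
}
impl SpeechToText for CloudStt {} // inherits the `false` default
impl TextFormatter for LlamaLocal {
    fn is_local(&self) -> bool {
        true
    }
}

#[derive(Debug, PartialEq, Eq)]
enum OverlayState {
    Processing, // static "POLISHING" panel
    Polishing,  // animated variant
}

/// Animate when STT is local, or when LLM cleanup is enabled and local.
fn post_release_state(
    stt: &dyn SpeechToText,
    cleanup: Option<&dyn TextFormatter>,
) -> OverlayState {
    let llm_local = cleanup.map_or(false, |llm| llm.is_local());
    if stt.is_local() || llm_local {
        OverlayState::Polishing
    } else {
        OverlayState::Processing
    }
}
```

Because the method has a default body, existing cloud backends need no changes; only the local ones opt in.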

Fixed

OpenRouter TTS default swapped from openai/gpt-4o-mini-tts-… to
openai/tts-1 (default voice alloy). The LLM-based
gpt-4o-mini-tts model produced higher-quality voices but its
streaming output was not reliably forwarded by OpenRouter's
/audio/speech proxy: the proxy flushed a ~9.6 KB preamble and
then buffered the rest of the synthesised body until upstream
finished (~30+ s for a typical 200-character reply), exceeding
every reasonable client timeout. Verified via the fono.http
instrumentation's one-shot stall hex dump — bytes were valid PCM,
just never delivered. Classical tts-1 produces audio in
~0.5-2 s regardless of length and the whole body is forwarded in
one go, sidestepping the proxy-buffering problem entirely. Users
who want the LLM-based voice can pin
[tts.cloud] model = "openai/gpt-4o-mini-tts-2025-12-15" in
config.toml and accept the failure mode on long replies, or
switch to OpenAI direct (where streaming works correctly).
OpenRouter TTS second-sentence stalls eliminated by disabling
HTTP/2 connection-pool reuse on the TTS client. Previously, the
first sentence of an assistant turn synthesised correctly but every
subsequent sentence stalled identically (~9.6 KB chunk arrived,
then 15 s of silence, then watchdog fired) — symptomatic of
OpenRouter's /audio/speech proxy mishandling multiplexed HTTP/2
streams. The TTS reqwest client now runs with
pool_max_idle_per_host(0) and http1_only(), forcing a fresh
TCP+TLS handshake per request (~200-400 ms overhead, negligible
against multi-second LLM-based synthesis). Other backends (LLM,
STT, assistant chat) keep their HTTP/2 pooling because no
equivalent stall pattern was observed there.
TTS inter-chunk watchdog set to 20 s. Empirically OpenRouter's
/audio/speech proxy delivers a small preamble (~9.6 KB across ~8
chunks) and then pauses for several seconds before resuming the
audio stream proper. The previous 15 s watchdog tripped during that
pause and produced false-stall failures on otherwise-healthy
synthesis; 20 s keeps headroom for that pause while still catching
genuinely wedged connections far faster than the overall 30 s
request timeout. A one-shot warn!-level hex dump of the partial
body fires on the first TTS stall per process lifetime, surfacing
whether the preamble bytes are SSE framing, JSON metadata, or
genuine PCM — diagnostic data for the next round of investigation.
Structured-log chunks field now reports the truth on stalled
/ transport-error outcomes. Previously hardcoded to 0 in the TTS,
LLM, and STT consumers, which made it impossible to distinguish
"proxy sent one chunk then hung" from "nothing ever arrived" in
fono.http=debug logs. New BodyError::chunks() and
BodyError::after_ms() accessors expose the underlying watchdog
state to all consumers uniformly.
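To illustrate what an inter-chunk watchdog with truthful chunk accounting looks like, here is a standard-library sketch that reads chunks from a channel rather than an HTTP body. StallError and drain_with_watchdog are hypothetical names; only the chunks() / after_ms() accessor shape mirrors the entry above.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// What the consumer learns when the stream stalls.
#[derive(Debug)]
struct StallError {
    chunks: usize,
    after_ms: u128,
}

impl StallError {
    /// Chunks received before the stall (not a hardcoded 0).
    fn chunks(&self) -> usize {
        self.chunks
    }
    /// Milliseconds from the start of the read until the stall fired.
    fn after_ms(&self) -> u128 {
        self.after_ms
    }
}

/// Collects chunks until the sender closes, failing if the gap between
/// two consecutive chunks exceeds `max_gap` (e.g. 20 s for TTS).
fn drain_with_watchdog(
    rx: &Receiver<Vec<u8>>,
    max_gap: Duration,
) -> Result<Vec<u8>, StallError> {
    let started = Instant::now();
    let mut body = Vec::new();
    let mut chunks = 0usize;
    loop {
        match rx.recv_timeout(max_gap) {
            Ok(chunk) => {
                chunks += 1;
                body.extend_from_slice(&chunk);
            }
            // Sender dropped: the stream ended normally.
            Err(RecvTimeoutError::Disconnected) => return Ok(body),
            // No chunk within the allowed gap: report how far we got.
            Err(RecvTimeoutError::Timeout) => {
                return Err(StallError {
                    chunks,
                    after_ms: started.elapsed().as_millis(),
                })
            }
        }
    }
}
```

The timeout applies per chunk rather than to the whole body, which is what lets a log line distinguish "one chunk then hung" from "nothing ever arrived".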
OpenRouter TTS time-to-first-audio collapsed from ~30 s to ~2-4 s
by sending stream_format: "audio" on /audio/speech requests for
models that benefit from it (OpenRouter's gpt-4o-mini-tts and
OpenAI direct). Without this field, OpenAI's LLM-based TTS models
buffer the entire synthesis server-side before opening the response
body — visible in the fono.http instrumentation as a ~30 s
headers_ms followed by a ~200 ms body_ms. With it, the upstream
streams raw audio bytes as they are generated and headers_ms drops
to sub-second. The catalogue gates the new field per provider:
enabled for OpenAI and OpenRouter, intentionally omitted for Groq's
Orpheus deployment (which is conservative about unknown request
fields). Classical models like tts-1 are unaffected — they already
stream by default and accept the field as a no-op.
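The per-provider gating can be sketched like this. The Provider enum and both functions are hypothetical; only the stream_format field, its "audio" value, and the OpenAI / OpenRouter / Groq split come from the entry above.

```rust
#[derive(Debug, Clone, Copy)]
enum Provider {
    OpenAi,
    OpenRouter,
    Groq,
}

/// Catalogue-style gate: which providers receive `stream_format`.
fn sends_stream_format(provider: Provider) -> bool {
    matches!(provider, Provider::OpenAi | Provider::OpenRouter)
}

/// Builds the key/value fields of an /audio/speech request body.
/// A real client would serialise these as JSON.
fn speech_request_fields(
    provider: Provider,
    model: &str,
    voice: &str,
    input: &str,
) -> Vec<(&'static str, String)> {
    let mut fields = vec![
        ("model", model.to_string()),
        ("voice", voice.to_string()),
        ("input", input.to_string()),
    ];
    if sends_stream_format(provider) {
        // Omitted for Groq, which is conservative about unknown fields.
        fields.push(("stream_format", "audio".to_string()));
    }
    fields
}
```

Keeping the gate in one function means a new provider only needs a decision in one place instead of a conditional at every call site.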

Added

Structured HTTP instrumentation across every cloud-backed
pipeline (STT transcribe, LLM cleanup chat, voice-assistant
streaming chat, TTS /audio/speech, wizard key validation). A new
fono-http crate provides a single per-stage stopwatch
(RequestTimings), an inter-chunk body watchdog
(read_body_with_watchdog), and one chokepoint
(emit_http_debug) that funnels every consumer through the same
schema (stage, provider, endpoint, status, headers_ms,
ttfb_ms, body_ms, decode_ms, total_ms, body_bytes,
content_length, chunks, request_id, attempt, outcome).
Silent by default; opt in per session with
RUST_LOG=info,fono.http=debug fono daemon. Detects stalled
bodies in 15-30 s (per stage) rather than waiting for the global
60 s reqwest timeout, surfaces the upstream x-request-id /
request-id on every response (success and failure), and on TTS
retries once automatically when the upstream stalls mid-body
(typical OpenRouter proxy hiccup). The improved error surface for
stalled TTS now reads e.g. openrouter TTS body read failed (request_id=or-…, attempt=2) instead of the previous bare
reading openrouter TTS response body. Per-stage chunk watchdogs:
TTS 15 s (overall cap reduced from 60 s to 30 s), STT 30 s, LLM
cleanup 30 s, assistant SSE 20 s inter-event.
OpenRouter app attribution is now sent on every outbound
request to openrouter.ai (STT transcribe + prewarm, LLM chat +
prewarm, voice-assistant chat stream + prewarm, TTS
/audio/speech, and the wizard's validate_cloud_key probe),
not just from the STT backend as before. The three static headers
are HTTP-Referer: https://fono.page,
X-OpenRouter-Title: Fono, and
X-OpenRouter-Categories: personal-agent,writing-assistant —
identical across every install, no per-user or per-machine
identifier embedded, no request body changes. Fono now appears on
https://openrouter.ai/rankings, in the "Apps" tab of each model
it routes through, and gets a public dashboard at
https://openrouter.ai/apps?url=https://fono.page. The previous
STT-only attribution used the GitHub repo URL as the Referer; the
switch to fono.page is a deliberate one-time reset onto the
canonical project homepage. See
https://openrouter.ai/docs/app-attribution and the new
fono_core::openrouter_attribution module.
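As a sketch, the three static headers can live in one table that every client consults. The header names and values are those listed above; the attribution_headers helper and its host check are assumptions, not the fono_core::openrouter_attribution API.

```rust
/// The three static attribution headers, identical across every install.
static OPENROUTER_ATTRIBUTION: [(&str, &str); 3] = [
    ("HTTP-Referer", "https://fono.page"),
    ("X-OpenRouter-Title", "Fono"),
    ("X-OpenRouter-Categories", "personal-agent,writing-assistant"),
];

/// Headers to attach for a given request host: the attribution set for
/// openrouter.ai, nothing for any other host.
fn attribution_headers(host: &str) -> &'static [(&'static str, &'static str)] {
    if host == "openrouter.ai" {
        &OPENROUTER_ATTRIBUTION
    } else {
        &[]
    }
}
```

Because the table is static data with no per-user input, it cannot leak a user or machine identifier.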
fono setup now hot-reloads the daemon when it finishes.
Previously, running the wizard while fono was already running
saved the new config but the daemon kept using the old one until
manually restarted. The wizard now sends Request::Reload over
IPC after config.toml / secrets.toml are written, and prints
Daemon reloaded — new settings are live. (or a friendly
fallback hint when no daemon is running).
Desktop notification when a configured backend's API key is
missing at startup or after a config reload. Previously, a
rotated key or a wizard pick whose secret hadn't been added yet
surfaced only as a single tracing::WARN line (e.g. TTS unavailable; assistant replies will be silent: Cartesia TTS API key "CARTESIA_API_KEY" not found in secrets.toml or environment).
A new ErrorClass::MissingKey variant is now classified from
reload errors and fired as a Critical-urgency popup with copy
that names the env var and the fono keys add <KEY> command.
Wired through the LLM / TTS / Assistant reload paths; subject to
the existing session cascade cap.

Changed

OpenRouter TTS default swapped from hexgrad/kokoro-82m to
openai/gpt-4o-mini-tts-2025-12-15 for native multilingual output
(default voice coral, $0.60 / 1 M characters). Kokoro voices are
monolingual and prefixed by language code, so every non-English
synthesis was routed through an American-English voice; OpenAI Mini
TTS speaks French, German, Spanish, Romanian, Mandarin, etc.
natively with no per-call language argument or per-language voice
map needed. Existing users who prefer Kokoro can pin
[tts.cloud] model = "hexgrad/kokoro-82m" and
voice = "af_heart" in config.toml; full Kokoro support is
deferred to a future local+cloud-symmetric backend (see
plans/2026-05-14-kokoro-local-and-cloud-parity-v1.md).
Voice assistant wizard step now renders as an aligned three-
column table (Provider · Model · Key). Model names are
human-readable (GPT-5.4 mini, Claude Haiku 4.5,
GPT-OSS 120B, Qwen 3 235B, …) rather than raw catalogue ids,
and the key-status column reads set / missing instead of the
earlier (key already set) / (will ask for key) parenthetical.
Assistant TTS auto-picked from the same key. When the chosen
assistant chat provider also offers TTS (e.g. OpenAI for both),
the wizard reuses the same provider + key for the spoken reply
and prints TTS: <provider> (same key as the assistant — no extra prompt). instead of running the explicit TTS picker. The
picker still runs when the chat provider has no TTS capability.
Comfortable-tier first-run latency budget bumped from 1500 ms
to 2000 ms. The earlier 1.5 s ceiling tripped first-dictation
warnings on perfectly usable mid-range hardware (laptops on
battery, slower SSDs). 2.0 s reflects measured p50 latency on
the lower end of the Comfortable tier; tiers above it (HighEnd
600 ms / Recommended 1000 ms) are unchanged.
Tray TTS submenu drops the redundant cloud, prefix and greys
out unavailable backends. Every cloud backend was annotated
(cloud, will ask for key) or (cloud, key already set) — but
clicking the entry never asked for a key, so the message was
misleading. The submenu now shows backends whose key is missing
as non-clickable (greyed-out) rows with a plain (no key)
suffix; backends with a configured key remain clickable. A new
DISABLED_SENTINEL prefix in fono-tray lets daemon submenus
opt rows out of activation without per-row plumbing.

Fixed

Groq TTS rejected response_format: pcm with HTTP 400
(response_format must be one of [wav]). Groq's Orpheus
deployment only emits WAV-wrapped audio. The OpenAI-compat TTS
client now reads its response_format from the catalogue
(OpenAiCompat { base_url, response_format }) and strips the
RIFF/WAVE header transparently when the provider returns WAV,
yielding the same raw 24 kHz int16 LE PCM the playback path
expects. OpenAI and OpenRouter keep pcm (lowest latency).
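Stripping a RIFF/WAVE header amounts to walking the chunk list until the data chunk. The function below is an illustrative sketch that works on a fully buffered response; the real client may well operate on a stream and is not shown in the changelog.

```rust
/// Returns the raw PCM payload if `bytes` is a RIFF/WAVE container,
/// otherwise returns the input unchanged (already raw PCM).
fn strip_wav_header(bytes: &[u8]) -> &[u8] {
    if bytes.len() < 12 || &bytes[0..4] != b"RIFF" || &bytes[8..12] != b"WAVE" {
        return bytes;
    }
    // After the 12-byte RIFF header come chunks: 4-byte id, 4-byte
    // little-endian size, then the payload (padded to an even length).
    let mut pos = 12;
    while pos + 8 <= bytes.len() {
        let id = &bytes[pos..pos + 4];
        let size = u32::from_le_bytes([
            bytes[pos + 4],
            bytes[pos + 5],
            bytes[pos + 6],
            bytes[pos + 7],
        ]) as usize;
        let start = pos + 8;
        if id == b"data" {
            // Clamp: streamed WAVs may declare a size larger than the buffer.
            let end = start.saturating_add(size).min(bytes.len());
            return &bytes[start..end];
        }
        pos = start.saturating_add(size).saturating_add(size & 1);
    }
    // Well-formed header but no data chunk: nothing to play.
    &bytes[bytes.len()..]
}
```

Searching for the data chunk, rather than assuming a fixed 44-byte header, tolerates extra chunks that some encoders emit before the audio.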
Groq TTS rejected the default voice (tara) with HTTP 400
(voice must be one of the following voices: [autumn diana hannah austin daniel troy]). Fono's catalogue defaulted to tara,
which is part of Canopy Labs' open-source Orpheus voice set but
not part of Groq's hosted six-voice subset for
canopylabs/orpheus-v1-english. The Groq TTS default voice is
now hannah (neutral female, in Groq's curated set). Users with
an explicit [tts.cloud.groq].voice override pinned to a Canopy-
only voice (tara/leah/jess/leo/dan/mia/zac/zoe)
must edit to one of autumn/diana/hannah/austin/daniel/
troy to get audio out of Groq.

Added

Desktop notification when a TTS/STT/LLM/assistant model requires
terms acceptance. Providers like Groq return HTTP 400 with
model_terms_required when an org admin hasn't accepted a model's
terms (e.g. Orpheus, PlayAI). The critical-notify classifier now
recognises that shape as a new TermsRequired class, and the
notification body embeds the acceptance URL extracted from the
provider response so the user can click straight through to the
console. Subject to the existing session cascade cap.
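A minimal sketch of such a classifier, assuming the provider's error body is available as text. The enum shape, function names, and the URL-extraction heuristic are hypothetical; only the HTTP 400 + model_terms_required signal and the TermsRequired class name come from the entry above.

```rust
#[derive(Debug, PartialEq, Eq)]
enum ErrorClass {
    /// The model needs terms acceptance; `url` is where to accept them.
    TermsRequired { url: Option<String> },
    Other,
}

fn classify(status: u16, body: &str) -> ErrorClass {
    if status == 400 && body.contains("model_terms_required") {
        ErrorClass::TermsRequired { url: first_url(body) }
    } else {
        ErrorClass::Other
    }
}

/// Heuristic: take the first https:// token, stopping at whitespace or a
/// quote, and drop trailing sentence punctuation.
fn first_url(text: &str) -> Option<String> {
    let start = text.find("https://")?;
    let rest = &text[start..];
    let end = rest
        .find(|c: char| c.is_whitespace() || c == '"' || c == '\'')
        .unwrap_or(rest.len());
    let url = rest[..end].trim_end_matches(|c: char| matches!(c, '.' | ',' | ')'));
    Some(url.to_string())
}
```

Carrying the URL inside the error class lets the notification layer stay generic: it renders whatever link the classifier found, or none.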

Fixed

Anthropic LLM cleanup 400 stop_sequences: each stop sequence must contain non-whitespace. The client was sending
stop_sequences = ["\n\n"] which Anthropic now rejects. The
blank-line heuristic is dropped; cleanup output length is bounded by
max_tokens = 512 alone.
Groq assistant returned 404 (model_not_found) because the
catalogue advertised llama-4-maverick-17b-128e-instruct as Groq's
multimodal model and the new default of prefer_vision = true
caused the runtime to swap to it. That model isn't available on
Groq today. Groq's multimodal_model is now None; the assistant
uses openai/gpt-oss-120b (the existing text_model) for every
Groq request.
Groq TTS model decommissioned. The previously catalogued
playai-tts model (voice Fritz-PlayAI) was retired by Groq and
now returns model_not_found. Groq's catalogue entry now points
at canopylabs/orpheus-v1-english (Canopy Labs' Orpheus, OpenAI-
compatible audio/speech on Groq) with default voice tara. The
endpoint URL and auth header are unchanged.
OpenAI assistant requests rejected by chat/completions when
prefer_web_search was on (Invalid value: 'web_search_preview').
The web_search_preview tool descriptor is Responses-API-only;
chat/completions rejects unknown tool types with a 400. OpenAI's
catalogue entry now advertises web_search = None; the default of
[assistant].prefer_web_search has been flipped to false.
Anthropic's web_search_20250305 (Messages API) is unaffected. A
future commit will re-enable OpenAI web search via the Responses
API migration. As a belt-and-braces defence, the OpenAI
chat/completions client now drops any web-search tool descriptor
at request build time and emits a one-shot tracing::warn! so a
hand-edited prefer_web_search = true no longer surfaces a 400 to
the user.
Cloud STT clients (OpenAI, Deepgram) were missing from the default
build. crates/fono/Cargo.toml listed fono-stt and fono-llm
with no feature selection, so the default release shipped only the
per-crate default features (Groq + Wyoming STT, OpenAI-compat +
Groq LLM). A user picking OpenAI as primary in the wizard hit a
STT not compiled in warning at daemon startup. fono-stt is now
built with groq + openai + deepgram + wyoming; fono-llm is
built with cerebras + openai-compat + anthropic. The cloud-all
meta-feature is widened to match. (Cartesia / AssemblyAI STT
clients are not yet wired as fono-stt features — tracked
separately.)

Added

Cloud provider capability catalogue. A single
fono_core::provider_catalog::CLOUD_PROVIDERS table is the source of
truth for which cloud providers offer STT / LLM cleanup / assistant
chat / vision / web search / TTS. The wizard, tray, fono use cloud,
and fono doctor all consume the catalogue, eliminating the five
duplicated match blocks the wizard used to carry. (Phase A, #9; see
ADR 0025.)
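The catalogue idea can be sketched as a static table plus small query helpers. The struct layout, capability enum, and helper names are assumptions; the three rows are illustrative and follow the wizard description elsewhere in this changelog, not the actual table.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Capability {
    Stt,
    LlmCleanup,
    AssistantChat,
    Tts,
}

struct CloudProvider {
    id: &'static str,
    capabilities: &'static [Capability],
}

/// Illustrative subset; the real table also tracks vision and web search.
static CLOUD_PROVIDERS: &[CloudProvider] = &[
    CloudProvider {
        id: "openai",
        capabilities: &[
            Capability::Stt,
            Capability::LlmCleanup,
            Capability::AssistantChat,
            Capability::Tts,
        ],
    },
    CloudProvider {
        id: "groq",
        capabilities: &[
            Capability::Stt,
            Capability::LlmCleanup,
            Capability::AssistantChat,
            Capability::Tts,
        ],
    },
    CloudProvider {
        id: "anthropic",
        capabilities: &[Capability::LlmCleanup, Capability::AssistantChat],
    },
];

/// Providers offering `cap`, in catalogue order.
fn providers_with(cap: Capability) -> Vec<&'static str> {
    CLOUD_PROVIDERS
        .iter()
        .filter(|p| p.capabilities.contains(&cap))
        .map(|p| p.id)
        .collect()
}

/// Capabilities from `wanted` that provider `id` does not cover; an
/// unknown provider covers nothing.
fn missing_capabilities(id: &str, wanted: &[Capability]) -> Vec<Capability> {
    let have: &[Capability] = CLOUD_PROVIDERS
        .iter()
        .find(|p| p.id == id)
        .map(|p| p.capabilities)
        .unwrap_or(&[]);
    wanted.iter().copied().filter(|c| !have.contains(c)).collect()
}
```

A wizard built on such a table can ask follow-up questions only for the result of missing_capabilities, instead of hard-coding one match block per provider.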
Multi-provider TTS for the voice assistant (#11). The assistant
audio path now supports Groq (PlayAI playai-tts), OpenRouter
(Kokoro hexgrad/kokoro-82m), Cartesia (sonic-2), and Deepgram
(aura-2-thalia-en) in addition to OpenAI and Wyoming. Users on a
non-OpenAI primary can run the full record → STT → LLM → TTS loop
without obtaining a second key. CARTESIA_API_KEY and
DEEPGRAM_API_KEY already present in secrets.toml from STT usage
are reused automatically; the wizard's TTS picker orders providers
with stored keys first.
Optional assistant extras. Two new [assistant] toggles surface
in the wizard's Optional extras MultiSelect when the chosen primary
supports them: prefer_vision swaps the assistant chat model for the
provider's multimodal variant (OpenAI / Anthropic / Groq / Gemini),
and prefer_web_search attaches the provider's native web-search
tool to every assistant request (OpenAI's web_search_preview,
Anthropic's web_search_20250305; Gemini's google_search is
catalogued for forward compatibility). Both default to true.
Desktop notifications for critical pipeline failures. Total STT
pipeline failures (auth errors, network errors, 5xx) and LLM-cleanup
auth-class failures now fire a Critical-urgency desktop notification
in addition to the existing error!/warn! log line, so an
expired API key is no longer silently buried in journalctl. Dedup
is per-session and per (stage, provider, error class), so a stuck
key pops exactly once per F8/F9 press and an STT-auth + LLM-auth
failure in the same session each get their own surface. LLM
transient errors (network blips, 5xx) keep the existing silent
fallback to the raw STT transcript — only configuration-class
failures pop a notification.
Critical-failure notification coverage extended (issue #8). TTS
(assistant-mode reply playback), Assistant chat (both stream-open
and mid-stream errors), and text-injection failures now route
through the same critical_notify surface as STT/LLM, so a
rotated API key in any stage produces a Critical-urgency popup
instead of journal-only output. The LLM cleanup path also now
notifies on Network-class failures (previously Auth-only), so
an offline endpoint is visibly surfaced.
Daemon-startup-failure notification. When fono daemon exits
with an error (bad config, locked single-instance socket, hotkey
backend init failure), a one-shot Critical-urgency notification
fires before the process exits, pointing the user at
journalctl --user -u fono and fono doctor. This addresses the
systemd --user autostart case where stderr is invisible.

Changed

Assistant extras default policy. prefer_vision stays
default-on (no API impact — the multimodal model is the same model
on OpenAI/Anthropic, just with image input capability advertised).
prefer_web_search now defaults off: the only provider whose
chat/completions API supports it natively today is Anthropic, and
OpenAI's chat/completions endpoint hard-rejects the
web_search_preview descriptor. The default flips back to true
once the OpenAI client migrates to the Responses API.
Wizard first-run UX corrections (pre-release polish).
- The step-1 path picker is now a fixed-order two-column table
  (Local / Cloud / Customize) instead of a tier-dependent
  paragraph-shaped list. Column padding is computed from the
  longest option name + 2 spaces so future variants stay aligned.
- The language picker is skipped entirely when the OS reports at
  least one detected language; the picker only renders for the
  zero-detection fallback. A one-line info trace records the
  detected codes and points the user at the tray's Languages
  submenu for editing.
- The "Enable live dictation?" question is dropped from every
  branch — the tray's existing toggle is the editing surface, and
  config.interactive.enabled already defaults to false.
- The cloud-assistant fast-path is now automatic: when the chosen
  primary covers chat, the assistant is enabled without a Confirm.
  Two info lines state the configuration; pick_tts_for_assistant
  still runs when no TTS was set, and prompt_assistant_extras
  keeps vision / web-search as explicit opt-ins. The legacy
  Confirm("Enable the voice assistant?") survives only for the
  local-LLM branch where no catalogue primary matches.
Wizard cloud branch collapsed onto a single primary-provider
picker (#9). Picking OpenAI or Groq now configures STT, LLM
cleanup, the voice assistant, and TTS from one API-key prompt;
picking Anthropic / Cerebras / OpenRouter configures LLM + Assistant
and asks an opt-in follow-up only for the capabilities the primary
doesn't cover. The wizard label list shows runtime-derived capability
badges (STT · LLM · Assistant · TTS · Vision · Search), capped at
six per row.
PathChoice::Mixed renamed to PathChoice::Customize. The
advanced wizard branch now appears in the top-level menu as
"Customize each capability (advanced)". Legacy configs that still
carry mixed semantics continue to load — there is no on-disk
enum to migrate.
Re-running the wizard reuses stored keys silently. Every
cloud-key prompt now routes through prompt_or_reuse_key, which
prints a single reusing <KEY> from secrets.toml line instead of
re-asking. A returning user with a populated secrets.toml sees
zero key prompts on a wizard re-run.
Cascade cap on critical notifications (issue #8). When a single
root cause (e.g. a rotated cloud API key) cascade-fails through
STT → LLM → Assistant → TTS in the same dictation session, the
user now sees exactly one notification — the first stage to
fail — instead of one per stage. Downstream failures still go to
the journal at warn!. The cap auto-resets at the start of each
new F8/F9/F10 press and after 120 s of dictation inactivity.
Stage is now #[non_exhaustive] so future stages can be added
without breaking matches.
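One way to picture the cascade cap is a per-session latch with two reset triggers. This is a sketch, not Fono's implementation: the struct, method names, and the use of plain second counters are assumptions, and it treats a reported failure as activity for the idle timer.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[non_exhaustive]
enum Stage {
    Stt,
    Llm,
    Assistant,
    Tts,
}

/// Idle period after which the cap resets, per the entry above.
const IDLE_RESET_SECS: u64 = 120;

#[derive(Default)]
struct CascadeCap {
    notified: bool,
    last_activity_secs: u64,
}

impl CascadeCap {
    /// A new hotkey press starts a fresh session.
    fn on_hotkey_press(&mut self, now_secs: u64) {
        self.notified = false;
        self.last_activity_secs = now_secs;
    }

    /// Returns true only for the first failing stage of a session;
    /// later stages would still be logged by the caller at warn level.
    fn should_notify(&mut self, _stage: Stage, now_secs: u64) -> bool {
        if now_secs.saturating_sub(self.last_activity_secs) >= IDLE_RESET_SECS {
            self.notified = false;
        }
        self.last_activity_secs = now_secs;
        let first = !self.notified;
        self.notified = true;
        first
    }
}
```

Passing the clock in as a parameter keeps the latch deterministic, so the 120 s reset can be checked without sleeping.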
Hotkeys auto-detect toggle vs push-to-talk per press. A short tap
(under one second) on the dictation or assistant hotkey toggles
recording on; pressing-and-holding for at least a second flips the
same key into push-to-talk and recording stops on release. The
global [hotkeys].mode = "toggle" | "hold" setting is removed —
there is now one consistent behaviour across both keys with no
configuration required.
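The per-press detection reduces to comparing the hold time against a one-second threshold. A small sketch, with hypothetical names:

```rust
use std::time::Duration;

/// Presses shorter than this toggle; presses at least this long act as
/// push-to-talk (threshold taken from the entry above).
const HOLD_THRESHOLD: Duration = Duration::from_secs(1);

#[derive(Debug, PartialEq, Eq)]
enum PressKind {
    /// Short tap: toggles recording.
    ToggleTap,
    /// Press-and-hold: recording stops on release.
    PushToTalk,
}

/// Classifies a key press by how long the key was held down.
fn classify_press(held: Duration) -> PressKind {
    if held < HOLD_THRESHOLD {
        PressKind::ToggleTap
    } else {
        PressKind::PushToTalk
    }
}
```

Since the decision depends only on the measured hold time, the same rule applies to both the dictation and assistant hotkeys without any configuration.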

Removed

[hotkeys].mode configuration field. Old configs that still set
mode = "toggle" or mode = "hold" continue to load (serde
silently ignores the unknown field); the value has no effect. The
HotkeyMode enum is dropped from fono_core::config.

Full Changelog: https://github.com/bogdanr/fono/compare/v0.7.1...v0.8.0

Breaking Changes

[interactive].enabled config field removed; no migration is provided, and all related logic is now driven by overlay.style == Transcript.
[hotkeys].mode configuration field removed; hotkey behavior is now auto-detected and cannot be overridden.
