
Fono

v0.8.0 Breaking

This release includes 2 breaking changes for platform teams planning a safe upgrade.

✓ No known CVEs patched in this version

Topics

assistant dictation linux llm local-first rust
speach-to-text stt vulkan whisper wyoming

Summary

AI summary

Updates docs/decisions/0026-live-preview-as-overlay-style.md and docs/decisions/0025-cloud-provider-catalogue.md, and greys out unavailable TTS backends in the tray, across a mixed release.

Full changelog

Changed

  • Live preview is now a waveform style, not a separate toggle. The
    tray "Waveform style" submenu gains a fifth entry — Transcript (live preview — more CPU / tokens) — that replaces the old
    config-file-only [interactive].enabled flag. Picking Transcript
    both swaps the overlay to streaming text and routes the
    dictation hotkey through the live pipeline (this is the fix for
    "live transcription only worked for the assistant, not for
    dictation"). Fft remains the first-run default; live preview stays
    opt-in because it costs more CPU on local STT and more tokens on
    any cloud backend that bills per-second of streamed audio.
    Internally Config::live_preview() is the single source of truth,
    defined as overlay.style == Transcript. See
    ADR 0026.
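
The "single source of truth" rule above can be sketched in a few lines. This is a minimal illustration, not Fono's actual config types: the struct layout is assumed, and the variant names other than Fft and Transcript are guesses based on the style names mentioned elsewhere in this changelog.

```rust
/// Overlay styles. Only `Fft` and `Transcript` are named as identifiers in the
/// changelog; the other variant names are assumptions for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum OverlayStyle {
    Bars,
    Oscilloscope,
    #[default]
    Fft, // first-run default per the changelog
    Heatmap,
    Transcript,
}

/// Hypothetical stand-in for the `[overlay]` config block.
#[derive(Debug, Default)]
pub struct Overlay {
    pub style: OverlayStyle,
}

/// Hypothetical stand-in for the top-level config.
#[derive(Debug, Default)]
pub struct Config {
    pub overlay: Overlay,
}

impl Config {
    /// Live preview is on exactly when the overlay style is `Transcript`;
    /// there is no separate flag to keep in sync.
    pub fn live_preview(&self) -> bool {
        self.overlay.style == OverlayStyle::Transcript
    }
}
```

Deriving the answer from the style, rather than storing a second boolean, means the tray menu and the dictation pipeline cannot disagree about whether live preview is active.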

Removed

  • [interactive].enabled config field (Fono has no users yet, so no
    migration is provided — the field is just gone). The rest of the
    [interactive] block — boundary heuristics, drain grace,
    cleanup_on_finalize, prosody/filler vocab, chunk timing — stays
    put as streaming-pipeline tuning that applies whenever Transcript is
    active.

Added

  • scripts/capture-overlay.sh — reproducible overlay screencast
    helper for the README. Three modes: overlay (tight 640×≤240 crop),
    paste (overlay + target-app window for "lands in a real app"
    demos), and gallery (records each waveform style — bars,
    oscilloscope, FFT, heatmap — labels them, and stitches the clips
    via ffmpeg -f concat or a 2×2 xstack grid). Detects
    X11 vs Wayland, resolves monitor geometry via xrandr / wlr-randr /
    swaymsg, encodes MP4 + GIF (palette pipeline with 5 MB soft / 9.5 MB
    hard budget auto-tiering) + animated WebP, and probes deps with
    per-distro install hints. Dev-only; not part of the shipped binary.
    See docs/troubleshooting.md → "Capturing screencasts".

  • Onboarding auto-start and contextual tray left-click. Three
    small UX changes that turn the first-launch path into a one-command
    experience:

    1. sudo fono install (and therefore curl -fsSL https://fono.page/install | sh) now starts fono in the
      background as the invoking user — picked up from $SUDO_USER
      and launched via runuser/sudo with setsid detachment — and
      then runs the fono setup wizard interactively in the same
      terminal (also as $SUDO_USER, with stdio inherited so the
      prompts reach the user). Running the installer as bare root (no
      sudo wrapper) is a fully supported path: fono spawns and the
      wizard runs as root, writing under /root/.config/fono/ — fono
      is allowed to run as root if that's what you want.
      packaging/install.sh re-attaches </dev/tty to the install
      invocation under the curl | sh transport so the wizard's
      stdin still has a real terminal when curl is piping the script
      in. The backgrounded daemon's stdout/stderr now append to
      $XDG_STATE_HOME/fono/fono.log (typically
      ~/.local/state/fono/fono.log, or /root/.local/state/fono/fono.log
      for the bare-root install path) — matching Paths::log_file()
      so tail -f and what fono itself considers its log path are the
      same file. Previously the spawn redirected to /dev/null, which
      made post-install troubleshooting needlessly hard. Each step now
      reports a precise outcome (started / setup completed / skipped
      because headless / spawn failed) so users always know exactly
      what happened. Skipped on headless boxes (no
      DISPLAY/WAYLAND_DISPLAY/XDG_RUNTIME_DIR) and bypassable
      with FONO_INSTALL_NO_START=1 for packagers and CI. The XDG
      autostart entry still handles next-login start. The server-mode
      install path is unchanged — systemd's systemctl enable --now
      was already starting the unit (logs via journalctl -u fono.service).
    2. The daemon now fires a single low-urgency desktop notification
      on startup when no TTS backend is configured, prompting the user
      to run fono setup. Once per process; suppressed once setup
      completes (the daemon's IPC Reload hook refreshes the
      onboarding snapshot atomically so no restart is required).
    3. The tray icon's SNI left-click is now contextual: when TTS is
      not yet configured it nudges toward fono setup; once configured
      it shows the current hotkey cheat sheet (dictation / assistant /
      cancel). The "Show last transcription" menu entry continues to
      work for users who want it; the left-click no longer fires that
      action.

    Implemented without adding any config field — the question "is setup
    finished?" is answered by the new Config::tts_configured(&Secrets)
    helper, which folds the existing configured_tts_backends logic.
    packaging/install.sh is now the canonical source for the
    https://fono.page/install one-liner and lives next to the binary
    it ships.

  • Unified log file at /var/log/fono.log. Single-user-box
    convention: every fono process writes there (world-writable 0666,
    pre-created by fono install). Paths::log_file() now points at
    that path. The daemon's tracing formatter forces ANSI on, so the
    file preserves colors. fono doctor appends the last 10 log lines
    to its report; fono doctor -f (or --follow) streams the file in
    real time via tail -F, ANSI escapes intact. The background spawn
    in fono install falls back to /dev/null if /var/log/fono.log
    is not writable, so a permissions hiccup never blocks startup.

  • Colorized fono doctor output. Section headers in bold cyan,
    ready / present / exists in green, FAIL / MISSING /
    FAILED TO LOAD / NONE in bold red, disabled / (unset) /
    (fallback) dimmed, active-provider * highlighted. Auto-disabled
    when stdout is not a TTY (pipes, redirects, CI) and when NO_COLOR
    is set, so scripts parsing the output remain unaffected.

  • Animated "POLISHING" overlay for local STT/LLM. The
    standalone-waveform overlay's post-release phase used to show a
    static "POLISHING" panel while STT (and optional LLM cleanup) ran;
    with a local whisper.cpp backend that's a 1–3 s dead patch where
    the user has no signal the dictation is actually progressing. The
    overlay now reuses the assistant's per-style thinking animation
    (FFT bell sweep, neural-strand heatmap, oscilloscope standing
    wave, centre-out bars) during that phase whenever the active STT
    backend reports is_local() — or whenever LLM cleanup is enabled
    and the LLM is local. Cloud STT+LLM (sub-second) keep the static
    panel so it doesn't just flash. Implemented via a new
    OverlayState::Polishing variant that shares the amber accent +
    "POLISHING" label with the existing Processing state but is
    consumed by the same synthetic-frame renderer path as
    AssistantThinking. New default is_local() method on both the
    SpeechToText and StreamingStt traits (also TextFormatter),
    overridden to true only in the whisper-local and llama-local
    backends.
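
The is_local() rule from the "POLISHING" entry above can be sketched as a default trait method plus one decision function. The single Backend trait here is a simplified, hypothetical stand-in for the SpeechToText, StreamingStt, and TextFormatter traits, and post_release_state is an invented helper name.

```rust
/// Hypothetical, simplified stand-in for the real backend traits.
pub trait Backend {
    /// Defaults to `false`; only local backends override it.
    fn is_local(&self) -> bool {
        false
    }
}

pub struct CloudStt;
pub struct WhisperLocal;

// Cloud backends inherit the default (`false`).
impl Backend for CloudStt {}

impl Backend for WhisperLocal {
    fn is_local(&self) -> bool {
        true
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum OverlayState {
    Processing, // static "POLISHING" panel
    Polishing,  // animated "POLISHING" panel
}

/// Pick the post-release overlay state.
/// `llm` is `None` when LLM cleanup is disabled.
pub fn post_release_state(stt: &dyn Backend, llm: Option<&dyn Backend>) -> OverlayState {
    let llm_local = llm.map_or(false, |l| l.is_local());
    if stt.is_local() || llm_local {
        OverlayState::Polishing
    } else {
        OverlayState::Processing
    }
}
```

With a default of false, a newly added cloud backend needs no extra code to keep the static panel; only local backends have to opt in.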

Fixed

  • OpenRouter TTS default swapped from openai/gpt-4o-mini-tts-… to
    openai/tts-1 (default voice alloy). The LLM-based
    gpt-4o-mini-tts model produced higher-quality voices but its
    streaming output was not reliably forwarded by OpenRouter's
    /audio/speech proxy: the proxy flushed an ~9.6 KB preamble and
    then buffered the rest of the synthesised body until upstream
    finished (~30+ s for a typical 200-character reply), exceeding
    every reasonable client timeout. Verified via the fono.http
    instrumentation's one-shot stall hex dump — bytes were valid PCM,
    just never delivered. Classical tts-1 produces audio in
    ~0.5-2 s regardless of length and the whole body is forwarded in
    one go, sidestepping the proxy-buffering problem entirely. Users
    who want the LLM-based voice can pin
    [tts.cloud] model = "openai/gpt-4o-mini-tts-2025-12-15" in
    config.toml and accept the failure mode on long replies, or
    switch to OpenAI direct (where streaming works correctly).

  • OpenRouter TTS second-sentence stalls eliminated by disabling
    HTTP/2 connection-pool reuse on the TTS client. Previously, the
    first sentence of an assistant turn synthesised correctly but every
    subsequent sentence stalled identically (~9.6 KB chunk arrived,
    then 15 s of silence, then watchdog fired) — symptomatic of
    OpenRouter's /audio/speech proxy mishandling multiplexed HTTP/2
    streams. The TTS reqwest client now runs with
    pool_max_idle_per_host(0) and http1_only(), forcing a fresh
    TCP+TLS handshake per request (~200-400 ms overhead, negligible
    against multi-second LLM-based synthesis). Other backends (LLM,
    STT, assistant chat) keep their HTTP/2 pooling because no
    equivalent stall pattern was observed there.

  • TTS inter-chunk watchdog set to 20 s. Empirically OpenRouter's
    /audio/speech proxy delivers a small preamble (~9.6 KB across ~8
    chunks) and then pauses for several seconds before resuming the
    audio stream proper. The previous 5 s watchdog tripped during that
    pause and produced false-stall failures on otherwise-healthy
    synthesis; 20 s keeps headroom for that pause while still catching
    genuinely wedged connections far faster than the overall 30 s
    request timeout. A one-shot warn!-level hex dump of the partial
    body fires on the first TTS stall per process lifetime, surfacing
    whether the preamble bytes are SSE framing, JSON metadata, or
    genuine PCM — diagnostic data for the next round of investigation.

  • Structured-log chunks field now reports the truth on stalled
    / transport-error outcomes. Previously hardcoded to 0 in the TTS,
    LLM, and STT consumers, which made it impossible to distinguish
    "proxy sent one chunk then hung" from "nothing ever arrived" in
    fono.http=debug logs. New BodyError::chunks() and
    BodyError::after_ms() accessors expose the underlying watchdog
    state to all consumers uniformly.

  • OpenRouter TTS time-to-first-audio collapsed from ~30 s to ~2-4 s
    by sending stream_format: "audio" on /audio/speech requests for
    models that benefit from it (OpenRouter's gpt-4o-mini-tts and
    OpenAI direct). Without this field, OpenAI's LLM-based TTS models
    buffer the entire synthesis server-side before opening the response
    body — visible in the fono.http instrumentation as a ~30 s
    headers_ms followed by a ~200 ms body_ms. With it, the upstream
    streams raw audio bytes as they are generated and headers_ms drops
    to sub-second. The catalogue gates the new field per provider:
    enabled for OpenAI and OpenRouter, intentionally omitted for Groq's
    Orpheus deployment (which is conservative about unknown request
    fields). Classical models like tts-1 are unaffected — they already
    stream by default and accept the field as a no-op.
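
The inter-chunk watchdog and the chunks / after_ms reporting described in the entries above can be illustrated with a standard-library channel standing in for a streamed HTTP body. This is a sketch under that assumption, not the fono-http API: read_with_watchdog and Stalled are hypothetical names.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Hypothetical error: how many chunks arrived before the stall, and when it fired.
#[derive(Debug)]
pub struct Stalled {
    pub chunks: usize,
    pub after_ms: u128,
}

/// Collect chunks until the sender closes, failing if the gap between two
/// consecutive chunks exceeds `max_gap`.
pub fn read_with_watchdog(
    rx: &Receiver<Vec<u8>>,
    max_gap: Duration,
) -> Result<Vec<u8>, Stalled> {
    let started = Instant::now();
    let mut body = Vec::new();
    let mut chunks = 0usize;
    loop {
        match rx.recv_timeout(max_gap) {
            Ok(chunk) => {
                chunks += 1;
                body.extend_from_slice(&chunk);
            }
            // Sender dropped: the body is complete.
            Err(RecvTimeoutError::Disconnected) => return Ok(body),
            // No chunk within the allowed gap: report how far we got
            // instead of a hardcoded zero.
            Err(RecvTimeoutError::Timeout) => {
                return Err(Stalled {
                    chunks,
                    after_ms: started.elapsed().as_millis(),
                });
            }
        }
    }
}
```

Because the timeout restarts on every chunk, a slow but steady stream passes while a connection that goes quiet fails after one gap, and the error distinguishes "one chunk then hung" from "nothing ever arrived".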

Added

  • Structured HTTP instrumentation across every cloud-backed
    pipeline (STT transcribe, LLM cleanup chat, voice-assistant
    streaming chat, TTS /audio/speech, wizard key validation). A new
    fono-http crate provides a single per-stage stopwatch
    (RequestTimings), an inter-chunk body watchdog
    (read_body_with_watchdog), and one chokepoint
    (emit_http_debug) that funnels every consumer through the same
    schema (stage, provider, endpoint, status, headers_ms,
    ttfb_ms, body_ms, decode_ms, total_ms, body_bytes,
    content_length, chunks, request_id, attempt, outcome).
    Silent by default; opt in per session with
    RUST_LOG=info,fono.http=debug fono daemon. Detects stalled
    bodies in 15-30 s (per stage) rather than waiting for the global
    60 s reqwest timeout, surfaces the upstream x-request-id /
    request-id on every response (success and failure), and on TTS
    retries once automatically when the upstream stalls mid-body
    (typical OpenRouter proxy hiccup). The improved error surface for
    stalled TTS now reads e.g. openrouter TTS body read failed (request_id=or-…, attempt=2) instead of the previous bare
    reading openrouter TTS response body. Per-stage chunk watchdogs:
    TTS 15 s (overall cap reduced from 60 s to 30 s), STT 30 s, LLM
    cleanup 30 s, assistant SSE 20 s inter-event.

  • OpenRouter app attribution is now sent on every outbound
    request to openrouter.ai (STT transcribe + prewarm, LLM chat +
    prewarm, voice-assistant chat stream + prewarm, TTS
    /audio/speech, and the wizard's validate_cloud_key probe),
    not just from the STT backend as before. The three static headers
    are HTTP-Referer: https://fono.page,
    X-OpenRouter-Title: Fono, and
    X-OpenRouter-Categories: personal-agent,writing-assistant.
    They are identical across every install, embed no per-user or
    per-machine identifier, and leave request bodies unchanged. Fono now appears on
    https://openrouter.ai/rankings, in the "Apps" tab of each model
    it routes through, and gets a public dashboard at
    https://openrouter.ai/apps?url=https://fono.page. The previous
    STT-only attribution used the GitHub repo URL as the Referer; the
    switch to fono.page is a deliberate one-time reset onto the
    canonical project homepage. See
    https://openrouter.ai/docs/app-attribution and the new
    fono_core::openrouter_attribution module.

  • fono setup now hot-reloads the daemon when it finishes.
    Previously, running the wizard while fono was already running
    saved the new config but the daemon kept using the old one until
    manually restarted. The wizard now sends Request::Reload over
    IPC after config.toml / secrets.toml are written, and prints
    Daemon reloaded — new settings are live. (or a friendly
    fallback hint when no daemon is running).

  • Desktop notification when a configured backend's API key is
    missing at startup or after a config reload. Previously, a
    rotated key or a wizard pick whose secret hadn't been added yet
    surfaced only as a single tracing::WARN line (e.g. TTS unavailable; assistant replies will be silent: Cartesia TTS API key "CARTESIA_API_KEY" not found in secrets.toml or environment).
    A new ErrorClass::MissingKey variant is now classified from
    reload errors and fired as a Critical-urgency popup with copy
    that names the env var and the fono keys add <KEY> command.
    Wired through the LLM / TTS / Assistant reload paths; subject to
    the existing session cascade cap.
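
The OpenRouter attribution entry above describes three static headers sent on every request to openrouter.ai. A minimal sketch of that idea follows; the header names and values come from the changelog, while with_attribution and the host check are assumptions and do not reflect the real fono_core::openrouter_attribution module.

```rust
/// The three static attribution headers listed in the changelog.
pub const OPENROUTER_ATTRIBUTION: [(&str, &str); 3] = [
    ("HTTP-Referer", "https://fono.page"),
    ("X-OpenRouter-Title", "Fono"),
    ("X-OpenRouter-Categories", "personal-agent,writing-assistant"),
];

/// Hypothetical helper: append the attribution headers only when the request
/// targets openrouter.ai, leaving requests to other providers untouched.
pub fn with_attribution(
    host: &str,
    mut headers: Vec<(String, String)>,
) -> Vec<(String, String)> {
    if host == "openrouter.ai" || host.ends_with(".openrouter.ai") {
        for (name, value) in OPENROUTER_ATTRIBUTION.iter() {
            headers.push((name.to_string(), value.to_string()));
        }
    }
    headers
}
```

Keeping the values in one constant is what makes "identical across every install" easy to audit: nothing in the list depends on the user or the machine.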

Changed

  • OpenRouter TTS default swapped from hexgrad/kokoro-82m to
    openai/gpt-4o-mini-tts-2025-12-15 for native multilingual output
    (default voice coral, $0.60 / 1 M characters). Kokoro voices are
    monolingual and prefixed by language code, so every non-English
    synthesis was routed through an American-English voice; OpenAI Mini
    TTS speaks French, German, Spanish, Romanian, Mandarin, etc.
    natively with no per-call language argument or per-language voice
    map needed. Existing users who prefer Kokoro can pin
    [tts.cloud] model = "hexgrad/kokoro-82m" and
    voice = "af_heart" in config.toml; full Kokoro support is
    deferred to a future local+cloud-symmetric backend (see
    plans/2026-05-14-kokoro-local-and-cloud-parity-v1.md).

  • Voice assistant wizard step now renders as an aligned
    three-column table (Provider · Model · Key). Model names are
    human-readable (GPT-5.4 mini, Claude Haiku 4.5,
    GPT-OSS 120B, Qwen 3 235B, …) rather than raw catalogue ids,
    and the key-status column reads set / missing instead of the
    earlier (key already set) / (will ask for key) parenthetical.

  • Assistant TTS auto-picked from the same key. When the chosen
    assistant chat provider also offers TTS (e.g. OpenAI for both),
    the wizard reuses the same provider + key for the spoken reply
    and prints TTS: <provider> (same key as the assistant — no extra prompt). instead of running the explicit TTS picker. The
    picker still runs when the chat provider has no TTS capability.

  • Comfortable-tier first-run latency budget bumped from 1500 ms
    to 2000 ms. The earlier 1.5 s ceiling tripped first-dictation
    warnings on perfectly usable mid-range hardware (laptops on
    battery, slower SSDs). 2.0 s reflects measured p50 latency on
    the lower end of the Comfortable tier; tiers above it (HighEnd
    600 ms / Recommended 1000 ms) are unchanged.

  • Tray TTS submenu drops the redundant cloud, prefix and greys
    out unavailable backends. Every cloud backend was annotated
    (cloud, will ask for key) or (cloud, key already set) — but
    clicking the entry never asked for a key, so the message was
    misleading. The submenu now shows backends whose key is missing
    as non-clickable (greyed-out) rows with a plain (no key)
    suffix; backends with a configured key remain clickable. A new
    DISABLED_SENTINEL prefix in fono-tray lets daemon submenus
    opt rows out of activation without per-row plumbing.
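
The DISABLED_SENTINEL idea from the tray entry above, a label prefix that marks a row as non-clickable, can be sketched as an encode/decode pair. The sentinel value, MenuRow, encode_row, and decode_row are all hypothetical; the changelog does not give the real constant or the fono-tray API.

```rust
/// Hypothetical sentinel value; the real constant in fono-tray is not given
/// in the changelog.
pub const DISABLED_SENTINEL: &str = "\u{0}disabled:";

#[derive(Debug, PartialEq, Eq)]
pub struct MenuRow {
    pub label: String,
    pub enabled: bool,
}

/// Daemon side: encode a backend row as a plain string. Rows without a key
/// get the sentinel prefix and a "(no key)" suffix.
pub fn encode_row(name: &str, has_key: bool) -> String {
    if has_key {
        name.to_string()
    } else {
        format!("{}{} (no key)", DISABLED_SENTINEL, name)
    }
}

/// Tray side: strip the sentinel, if present, and mark the row greyed-out.
pub fn decode_row(raw: &str) -> MenuRow {
    match raw.strip_prefix(DISABLED_SENTINEL) {
        Some(label) => MenuRow { label: label.to_string(), enabled: false },
        None => MenuRow { label: raw.to_string(), enabled: true },
    }
}
```

Carrying the flag inside the label string is what lets existing submenus, which pass plain strings, opt rows out without any new per-row fields.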

Fixed

  • Groq TTS rejected response_format: pcm with HTTP 400
    (response_format must be one of [wav]). Groq's Orpheus
    deployment only emits WAV-wrapped audio. The OpenAI-compat TTS
    client now reads its response_format from the catalogue
    (OpenAiCompat { base_url, response_format }) and strips the
    RIFF/WAVE header transparently when the provider returns WAV,
    yielding the same raw 24 kHz int16 LE PCM the playback path
    expects. OpenAI and OpenRouter keep pcm (lowest latency).

  • Groq TTS rejected the default voice (tara) with HTTP 400
    (voice must be one of the following voices: [autumn diana hannah austin daniel troy]).
    Fono's catalogue defaulted to tara,
    which is part of Canopy Labs' open-source Orpheus voice set but
    not part of Groq's hosted six-voice subset for
    canopylabs/orpheus-v1-english. The Groq TTS default voice is
    now hannah (neutral female, in Groq's curated set). Users with
    an explicit [tts.cloud.groq].voice override pinned to a Canopy-
    only voice (tara/leah/jess/leo/dan/mia/zac/zoe)
    must edit to one of autumn/diana/hannah/austin/daniel/
    troy to get audio out of Groq.
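
Stripping the RIFF/WAVE header, as described in the response_format entry above, amounts to walking the RIFF chunk list until the data chunk. The sketch below is a generic illustration with a hypothetical function name, not Fono's actual code, and it works on a fully buffered response rather than a stream.

```rust
/// Return the PCM payload of a RIFF/WAVE buffer, or `None` if the buffer is
/// not a WAV file or has no `data` chunk.
pub fn strip_wav_header(bytes: &[u8]) -> Option<&[u8]> {
    if bytes.len() < 12 || &bytes[0..4] != b"RIFF" || &bytes[8..12] != b"WAVE" {
        return None;
    }
    let mut pos = 12;
    // Each chunk is a 4-byte id, a 4-byte little-endian size, then the payload.
    while bytes.len().saturating_sub(pos) >= 8 {
        let id = &bytes[pos..pos + 4];
        let size = u32::from_le_bytes([
            bytes[pos + 4],
            bytes[pos + 5],
            bytes[pos + 6],
            bytes[pos + 7],
        ]) as usize;
        let start = pos + 8;
        if id == b"data" {
            // Streaming encoders may write a placeholder size, so clamp the
            // end to the bytes that actually arrived.
            let end = start.saturating_add(size).min(bytes.len());
            return Some(&bytes[start..end]);
        }
        // Skip this chunk; chunks are padded to an even length.
        pos = start.saturating_add(size).saturating_add(size & 1);
    }
    None
}
```

Searching for the data chunk, rather than assuming a fixed 44-byte header, keeps the helper correct when a provider inserts extra chunks such as LIST before the audio.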

Added

  • Desktop notification when a TTS/STT/LLM/assistant model requires
    terms acceptance. Providers like Groq return HTTP 400 with
    model_terms_required when an org admin hasn't accepted a model's
    terms (e.g. Orpheus, PlayAI). The critical-notify classifier now
    recognises that shape as a new TermsRequired class, and the
    notification body embeds the acceptance URL extracted from the
    provider response so the user can click straight through to the
    console. Subject to the existing session cascade cap.
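
The terms-acceptance classification above can be sketched as a status-plus-body check followed by a URL scan. The response body shape, the classify and first_url helpers, and the reduced ErrorClass enum are assumptions for illustration; the changelog only states that the class is recognised from an HTTP 400 carrying model_terms_required and that the URL is extracted from the provider response.

```rust
/// Reduced, hypothetical version of the notifier's error classes.
#[derive(Debug, PartialEq, Eq)]
pub enum ErrorClass {
    TermsRequired { url: Option<String> },
    Other,
}

/// Classify a provider error response.
pub fn classify(status: u16, body: &str) -> ErrorClass {
    if status == 400 && body.contains("model_terms_required") {
        ErrorClass::TermsRequired { url: first_url(body) }
    } else {
        ErrorClass::Other
    }
}

/// Return the first https:// URL found in `text`, if any.
fn first_url(text: &str) -> Option<String> {
    let start = text.find("https://")?;
    let rest = &text[start..];
    // Stop at whitespace or at characters that end a JSON string.
    let end = rest
        .find(|c: char| c.is_whitespace() || c == '"' || c == '\\')
        .unwrap_or(rest.len());
    // Drop trailing sentence punctuation that is not part of the URL.
    let url = rest[..end].trim_end_matches(|c: char| c == '.' || c == ',');
    Some(url.to_string())
}
```

Returning the URL as an Option keeps the notification useful even when a provider omits the link: the class still fires, just without a click-through target.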

Fixed

  • Anthropic LLM cleanup failed with HTTP 400 (stop_sequences: each stop sequence must contain non-whitespace). The client was sending
    stop_sequences = ["\n\n"] which Anthropic now rejects. The
    blank-line heuristic is dropped; cleanup output length is bounded by
    max_tokens = 512 alone.

  • Groq assistant returned 404 (model_not_found) because the
    catalogue advertised llama-4-maverick-17b-128e-instruct as Groq's
    multimodal model and the new default of prefer_vision = true
    caused the runtime to swap to it. That model isn't available on
    Groq today. Groq's multimodal_model is now None; the assistant
    uses openai/gpt-oss-120b (the existing text_model) for every
    Groq request.

  • Groq TTS model decommissioned. The previously catalogued
    playai-tts model (voice Fritz-PlayAI) was retired by Groq and
    now returns model_not_found. Groq's catalogue entry now points
    at canopylabs/orpheus-v1-english (Canopy Labs' Orpheus, OpenAI-
    compatible audio/speech on Groq) with default voice tara. The
    endpoint URL and auth header are unchanged.

  • OpenAI assistant requests rejected by chat/completions when
    prefer_web_search was on (Invalid value: 'web_search_preview').
    The web_search_preview tool descriptor is Responses-API-only;
    chat/completions rejects unknown tool types with a 400. OpenAI's
    catalogue entry now advertises web_search = None; the default of
    [assistant].prefer_web_search has been flipped to false.
    Anthropic's web_search_20250305 (Messages API) is unaffected. A
    future commit will re-enable OpenAI web search via the Responses
    API migration. As a belt-and-braces safeguard, the OpenAI
    chat/completions client now drops any web-search tool descriptor
    at request build time and emits a one-shot tracing::warn! so a
    hand-edited prefer_web_search = true no longer surfaces a 400 to
    the user.

  • Cloud STT clients (OpenAI, Deepgram) were missing from the default
    build. crates/fono/Cargo.toml listed fono-stt and fono-llm
    with no feature selection, so the default release shipped only the
    per-crate default features (Groq + Wyoming STT, OpenAI-compat +
    Groq LLM). A user picking OpenAI as primary in the wizard hit a
    STT not compiled in warning at daemon startup. fono-stt is now
    built with groq + openai + deepgram + wyoming; fono-llm is
    built with cerebras + openai-compat + anthropic. The cloud-all
    meta-feature is widened to match. (Cartesia / AssemblyAI STT
    clients are not yet wired as fono-stt features — tracked
    separately.)
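
The defensive step in the OpenAI web-search entry above, dropping the tool descriptor at request build time and warning once, can be sketched with an atomic flag. Tool and drop_web_search are hypothetical names, and eprintln! stands in for the tracing::warn! call mentioned in the changelog so the example needs only the standard library.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Set once the warning has been emitted for this process.
static WARNED: AtomicBool = AtomicBool::new(false);

/// Hypothetical, minimal tool descriptor.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Tool {
    pub kind: String,
}

/// Remove web-search tools before a chat/completions request is built.
/// Returns `true` if this call emitted the one-shot warning.
pub fn drop_web_search(tools: &mut Vec<Tool>) -> bool {
    let before = tools.len();
    tools.retain(|t| !t.kind.starts_with("web_search"));
    let dropped = tools.len() != before;
    // `swap` returns the previous value, so only the first dropping call warns.
    if dropped && !WARNED.swap(true, Ordering::Relaxed) {
        eprintln!("warning: web search is not supported on chat/completions; dropping the tool");
        return true;
    }
    false
}
```

The descriptor is removed on every call, but the log line appears only once, so a hand-edited config does not flood the log on each assistant request.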

Added

  • Cloud provider capability catalogue. A single
    fono_core::provider_catalog::CLOUD_PROVIDERS table is the source of
    truth for which cloud providers offer STT / LLM cleanup / assistant
    chat / vision / web search / TTS. The wizard, tray, fono use cloud,
    and fono doctor all consume the catalogue, eliminating the five
    duplicated match blocks the wizard used to carry. (Phase A, #9; see
    ADR 0025.)
  • Multi-provider TTS for the voice assistant (#11). The assistant
    audio path now supports Groq (PlayAI playai-tts), OpenRouter
    (Kokoro hexgrad/kokoro-82m), Cartesia (sonic-2), and Deepgram
    (aura-2-thalia-en) in addition to OpenAI and Wyoming. Users on a
    non-OpenAI primary can run the full record → STT → LLM → TTS loop
    without obtaining a second key. CARTESIA_API_KEY and
    DEEPGRAM_API_KEY already present in secrets.toml from STT usage
    are reused automatically; the wizard's TTS picker orders providers
    with stored keys first.
  • Optional assistant extras. Two new [assistant] toggles surface
    in the wizard's Optional extras MultiSelect when the chosen primary
    supports them: prefer_vision swaps the assistant chat model for the
    provider's multimodal variant (OpenAI / Anthropic / Groq / Gemini),
    and prefer_web_search attaches the provider's native web-search
    tool to every assistant request (OpenAI's web_search_preview,
    Anthropic's web_search_20250305; Gemini's google_search is
    catalogued for forward compatibility). Both default to false.
  • Desktop notifications for critical pipeline failures. Total STT
    pipeline failures (auth errors, network errors, 5xx) and LLM-cleanup
    auth-class failures now fire a Critical-urgency desktop notification
    in addition to the existing error!/warn! log line, so an
    expired API key is no longer silently buried in journalctl. Dedup
    is per-session and per (stage, provider, error class), so a stuck
    key pops exactly once per F8/F9 press and an STT-auth + LLM-auth
    failure in the same session each get their own surface. LLM
    transient errors (network blips, 5xx) keep the existing silent
    fallback to the raw STT transcript — only configuration-class
    failures pop a notification.
  • Critical-failure notification coverage extended (issue #8). TTS
    (assistant-mode reply playback), Assistant chat (both stream-open
    and mid-stream errors), and text-injection failures now route
    through the same critical_notify surface as STT/LLM, so a
    rotated API key in any stage produces a Critical-urgency popup
    instead of journal-only output. The LLM cleanup path also now
    notifies on Network-class failures (previously Auth-only), so
    an offline endpoint is visibly surfaced.
  • Daemon-startup-failure notification. When fono daemon exits
    with an error (bad config, locked single-instance socket, hotkey
    backend init failure), a one-shot Critical-urgency notification
    fires before the process exits, pointing the user at
    journalctl --user -u fono and fono doctor. This addresses the
    systemd --user autostart case where stderr is invisible.
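
The per-session deduplication described in the critical-failure entry above, keyed on (stage, provider, error class), can be sketched with a HashSet. The type and method names are hypothetical and the enums are cut down to two variants each; this is not the real critical_notify implementation.

```rust
use std::collections::HashSet;

/// Reduced, hypothetical pipeline stages.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Stage {
    Stt,
    Llm,
}

/// Reduced, hypothetical error classes.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ErrorClass {
    Auth,
    Network,
}

/// Tracks which (stage, provider, class) triples already notified this session.
#[derive(Default)]
pub struct SessionDedup {
    seen: HashSet<(Stage, String, ErrorClass)>,
}

impl SessionDedup {
    /// Returns `true` the first time a triple is seen in the current session.
    /// `HashSet::insert` already reports whether the value was new.
    pub fn should_notify(&mut self, stage: Stage, provider: &str, class: ErrorClass) -> bool {
        self.seen.insert((stage, provider.to_string(), class))
    }

    /// Start a new session, for example on the next hotkey press.
    pub fn reset(&mut self) {
        self.seen.clear();
    }
}
```

Keying on the full triple is what lets an STT-auth failure and an LLM-auth failure in the same session each surface once, while a repeated identical failure stays quiet.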

Changed

  • Assistant extras default policy. prefer_vision stays
    default-on (no API impact — the multimodal model is the same model
    on OpenAI/Anthropic, just with image input capability advertised).
    prefer_web_search now defaults off: the only provider whose
    chat API supports it natively today is Anthropic (via the
    Messages API), and OpenAI's chat/completions endpoint hard-rejects
    the web_search_preview descriptor. The default flips back to true
    once the OpenAI client migrates to the Responses API.

  • Wizard first-run UX corrections (pre-release polish).

    • The step-1 path picker is now a fixed-order two-column table
      (Local / Cloud / Customize) instead of a tier-dependent
      paragraph-shaped list. Column padding is computed from the
      longest option name + 2 spaces so future variants stay aligned.
    • The language picker is skipped entirely when the OS reports at
      least one detected language; the picker only renders for the
      zero-detection fallback. A one-line info trace records the
      detected codes and points the user at the tray's Languages
      submenu for editing.
    • The "Enable live dictation?" question is dropped from every
      branch — the tray's existing toggle is the editing surface, and
      config.interactive.enabled already defaults to false.
    • The cloud-assistant fast-path is now automatic: when the chosen
      primary covers chat, the assistant is enabled without a Confirm.
      Two info lines state the configuration; pick_tts_for_assistant
      still runs when no TTS was set, and prompt_assistant_extras
      keeps vision / web-search as explicit opt-ins. The legacy
      Confirm("Enable the voice assistant?") survives only for the
      local-LLM branch where no catalogue primary matches.
  • Wizard cloud branch collapsed onto a single primary-provider
    picker (#9). Picking OpenAI or Groq now configures STT, LLM
    cleanup, the voice assistant, and TTS from one API-key prompt;
    picking Anthropic / Cerebras / OpenRouter configures LLM + Assistant
    and asks an opt-in follow-up only for the capabilities the primary
    doesn't cover. The wizard label list shows runtime-derived capability
    badges (STT · LLM · Assistant · TTS · Vision · Search), capped at
    six per row.

  • PathChoice::Mixed renamed to PathChoice::Customize. The
    advanced wizard branch now appears in the top-level menu as
    "Customize each capability (advanced)". Legacy configs that still
    carry mixed semantics continue to load — there is no on-disk
    enum to migrate.

  • Re-running the wizard reuses stored keys silently. Every
    cloud-key prompt now routes through prompt_or_reuse_key, which
    prints a single reusing <KEY> from secrets.toml line instead of
    re-asking. A returning user with a populated secrets.toml sees
    zero key prompts on a wizard re-run.

  • Cascade cap on critical notifications (issue #8). When a single
    root cause (e.g. a rotated cloud API key) cascade-fails through
    STT → LLM → Assistant → TTS in the same dictation session, the
    user now sees exactly one notification — the first stage to
    fail — instead of one per stage. Downstream failures still go to
    the journal at warn!. The cap auto-resets at the start of each
    new F8/F9/F10 press and after 120 s of dictation inactivity.
    Stage is now #[non_exhaustive] so future stages can be added
    without breaking matches.

  • Hotkeys auto-detect toggle vs push-to-talk per press. A short tap
    (under one second) on the dictation or assistant hotkey toggles
    recording on; pressing-and-holding for at least a second flips the
    same key into push-to-talk and recording stops on release. The
    global [hotkeys].mode = "toggle" | "hold" setting is removed —
    there is now one consistent behaviour across both keys with no
    configuration required.
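
The tap-versus-hold rule in the hotkey entry above reduces to comparing the press duration against a one-second threshold. The sketch below uses hypothetical names and assumes recording starts on key-down in both cases, which the changelog does not state explicitly.

```rust
use std::time::Duration;

/// Presses shorter than this are taps; anything longer is a hold.
pub const HOLD_THRESHOLD: Duration = Duration::from_secs(1);

#[derive(Debug, PartialEq, Eq)]
pub enum PressKind {
    Tap,
    Hold,
}

/// Classify a completed key press by how long the key was held.
pub fn classify_press(held_for: Duration) -> PressKind {
    if held_for >= HOLD_THRESHOLD {
        PressKind::Hold
    } else {
        PressKind::Tap
    }
}

/// Whether recording should be active once the key is released.
pub fn recording_after_release(was_recording_before_press: bool, held_for: Duration) -> bool {
    match classify_press(held_for) {
        // Push-to-talk: releasing the key always stops recording.
        PressKind::Hold => false,
        // Toggle: a tap flips whatever state was in effect before the press.
        PressKind::Tap => !was_recording_before_press,
    }
}
```

Deciding at release time means the same key serves both modes without a config field: the user's own timing selects the behaviour on each press.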

Removed

  • [hotkeys].mode configuration field. Old configs that still set
    mode = "toggle" or mode = "hold" continue to load (serde
    silently ignores the unknown field); the value has no effect. The
    HotkeyMode enum is dropped from fono_core::config.

Full Changelog: https://github.com/bogdanr/fono/compare/v0.7.1...v0.8.0

Breaking Changes

  • [interactive].enabled config field removed — no migration provided; all related logic now driven by overlay.style == Transcript.
  • [hotkeys].mode configuration field removed; hotkey behavior is now auto-detected and cannot be overridden.
