
Fono

v0.7.0 Feature

This is a feature release; its headline addition is an F10 hold-to-talk voice assistant.

✓ No known CVEs patched

Topics

assistant dictation linux llm local-first rust speech-to-text stt vulkan whisper wyoming

Summary

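Adds streaming chat and TTS playback behind a second push-to-talk key, independent [assistant] and [tts] backend selection, cloud assistant backends including Cerebras (qwen-3-235b-a22b-instruct-2507) and Groq (openai/gpt-oss-120b), refreshed cloud cleanup defaults, and fixes for FSM, playback-worker, and overlay bugs.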

Full changelog

Added

  • Voice assistant — F10 hold-to-talk, streaming chat, TTS playback.
    A second push-to-talk key (F10 by default) turns Fono into an
    offline-capable voice assistant. The pipeline diverges after STT:
    instead of cleaning the transcript and injecting it, Fono asks a
    chat-capable LLM, streams the reply sentence-by-sentence into a TTS
    backend, and plays the audio. The first sentence starts playing before
    the model finishes generating, so time-to-first-audio is bounded by
    one sentence's synthesis latency rather than by the full reply.
  • [assistant] and [tts] config blocks. Backend selection is independent
    of the [llm] cleanup pipeline: pick a fast local 3B model for cleanup
    and a bigger cloud model for the assistant, or mix and match.
    Multi-turn rolling history with a configurable time window (default
    5 minutes) and max-turn cap (default 12). Pressing the dictation key
    clears assistant context (configurable); pressing F10 again mid-reply
    barges in with history retained; Escape stops playback ("shut up")
    without forgetting.
  • Cloud assistant backends. Anthropic (Claude Haiku 4.5) and the
    full OpenAI-compatible family — OpenAI (gpt-5.4-mini), Cerebras
    (qwen-3-235b-a22b-instruct-2507), Groq (openai/gpt-oss-120b),
    OpenRouter, and Ollama. Each ships in the default binary; one feature
    flag per family lets slim builds drop unused providers.
  • Cloud cleanup model defaults refreshed to replace retired models
    with newly released ones: Cerebras llama3.1-8b, Groq
    openai/gpt-oss-20b, OpenAI gpt-5.4-nano, Anthropic
    claude-haiku-4-5-20251001. The OpenAI-compatible client now sends
    max_completion_tokens (the field name newer OpenAI models require;
    older models still accept it).
  • TTS backends. Wyoming protocol client (any
    wyoming-piper-style server on the LAN), the OpenAI
    /v1/audio/speech API (24 kHz PCM stream), and an in-process
    Piper stub that points users at wyoming-piper for now (the
    static-musl ship build can't yet pull in onnxruntime). Audio
    playback uses paplay on the Linux release variant (no libasound
    link, matches the existing parec capture path) or cpal behind the
    cpal-backend feature.
  • CLI surface. fono use assistant <backend>,
    fono use tts <backend> [--uri tcp://host:port],
    fono assistant {press,release,stop} for scripted end-to-end
    testing.
  • Tray. New Stop assistant and Forget conversation entries;
    Assistant backend and TTS backend submenus mirror the existing
    STT/LLM submenus and switch backends live via Reload. Tray icon
    flips amber while the assistant is active.
  • Wizard. fono setup ends with an opt-in assistant + TTS step;
    reuses any cloud key already entered earlier in the flow so a
    single OPENAI_API_KEY powers both chat and TTS without a second
    prompt.
  • Doctor. fono doctor exercises both the assistant and TTS backend
    factories at startup so a missing API key or unreachable Wyoming
    server surfaces in one place; new Providers (assistant) and
    Providers (TTS) tables show key/URI status per backend with an
    active marker.
  • Overlay feedback for the assistant flow. Recording paints
    green ("ASSISTANT") with the chosen waveform style; the post-
    release thinking + speaking phase paints amber ("THINKING") with
    per-style synthetic animations distinct from the real-audio
    recording shape:
    • FFT — Gaussian "scanner" (σ ≈ 8 bins out of 100) sweeps
      across the panel; per-bin breathing baseline blends in via
      a screen composite so the bell emerges smoothly.
    • Bars — symmetric centre-out, peak at midline rippling
      outward.
    • Oscilloscope — two interfering sine waves with edge taper
      pinning x = 0 / x = 1 to the centerline; central antinode
      reaches ±1.0 without clipping.
    • Heatmap — two anti-phased Gaussian "neural strands"
      crossing over the rolling 6 s window; transitions seamlessly
      from recording-FFT data without clearing the cache.
    Default [overlay].style flipped from Bars to FFT, the most active
    visualisation across both phases.
  • Runtime overlay style swap. Changing [overlay].style via
    fono use, the tray Waveform style submenu, or fono config edit
    now applies on the next frame instead of waiting for a daemon
    restart.
  • Smoke-test binary (cargo run --release --example smoke_assistant -p fono)
    exercises each cloud assistant and the OpenAI TTS path end-to-end.
    The release CI's new cloud-assistant job runs the --ci subset (Groq
    and Cerebras, the providers whose API keys are stored as GitHub
    Secrets).
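The Voice assistant entry says replies are streamed into TTS sentence by sentence, so the first sentence can be synthesised while the model is still generating. The sketch below shows one way to cut a token stream into sentences. SentenceSplitter, its methods, and the boundary rule ("a '.', '!' or '?' followed by whitespace ends a sentence") are assumptions for illustration, not Fono's actual code.

```rust
/// Buffers streamed LLM text and emits complete sentences as soon as they end.
/// Hypothetical helper; the sentence-boundary rule is an assumption.
#[derive(Default)]
pub struct SentenceSplitter {
    buf: String,
}

impl SentenceSplitter {
    /// Feed one streamed chunk; returns the sentences it completed, in order.
    pub fn push(&mut self, chunk: &str) -> Vec<String> {
        self.buf.push_str(chunk);
        let mut out = Vec::new();
        loop {
            // A sentence ends at '.', '!' or '?' when the next char is whitespace.
            let cut = self
                .buf
                .char_indices()
                .zip(self.buf.chars().skip(1))
                .find(|((_, c), next)| matches!(*c, '.' | '!' | '?') && next.is_whitespace())
                .map(|((i, c), _)| i + c.len_utf8());
            let Some(end) = cut else { break };
            let sentence = self.buf[..end].trim().to_string();
            // Keep only the text after the terminator for the next round.
            self.buf.replace_range(..end, "");
            if !sentence.is_empty() {
                out.push(sentence); // ready to hand to the TTS backend
            }
        }
        out
    }

    /// Flush the remaining text when the stream ends.
    pub fn finish(&mut self) -> Option<String> {
        let rest = self.buf.trim().to_string();
        self.buf.clear();
        (!rest.is_empty()).then_some(rest)
    }
}
```

A terminator at the very end of a chunk is held back until the next chunk shows whether whitespace follows, which also keeps a number like "3.14" from being split; finish flushes the tail when the stream closes.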
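The [assistant] entry describes multi-turn history bounded by a time window (default 5 minutes) and a max-turn cap (default 12), cleared by the dictation key. A minimal sketch of that pruning policy follows, assuming turns are stored oldest-first; the History and Turn types are hypothetical and do not reflect Fono's internal structures.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// One exchange in the conversation (hypothetical type).
pub struct Turn {
    pub at: Instant,
    pub user: String,
    pub assistant: String,
}

/// Rolling assistant history bounded by age and by turn count (hypothetical).
pub struct History {
    turns: VecDeque<Turn>, // oldest turn at the front
    window: Duration,
    max_turns: usize,
}

impl History {
    pub fn new(window: Duration, max_turns: usize) -> Self {
        Self { turns: VecDeque::new(), window, max_turns }
    }

    /// Append a finished turn, evicting the oldest ones once the cap is exceeded.
    pub fn push(&mut self, turn: Turn) {
        self.turns.push_back(turn);
        while self.turns.len() > self.max_turns {
            self.turns.pop_front();
        }
    }

    /// Drop turns older than the window, measured from `now`.
    pub fn prune(&mut self, now: Instant) {
        while let Some(oldest) = self.turns.front() {
            if now.saturating_duration_since(oldest.at) > self.window {
                self.turns.pop_front();
            } else {
                break; // everything behind this turn is newer
            }
        }
    }

    /// What a dictation-key press would do when clearing is enabled.
    pub fn clear(&mut self) {
        self.turns.clear();
    }

    pub fn len(&self) -> usize {
        self.turns.len()
    }

    pub fn turns(&self) -> impl Iterator<Item = &Turn> {
        self.turns.iter()
    }
}
```

Pruning by age before each request and capping by count on each push keeps the prompt bounded in both wall-clock staleness and token size.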
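The OpenAI TTS backend consumes a 24 kHz PCM stream from /v1/audio/speech. OpenAI documents its pcm response format as raw 16-bit signed little-endian samples with no header; the sketch below additionally assumes mono audio and shows how to decode HTTP body chunks that may split a sample across a boundary. PcmDecoder and duration_secs are hypothetical names.

```rust
/// Sample rate of the stream, per the changelog (24 kHz).
pub const SAMPLE_RATE_HZ: u32 = 24_000;

/// Decodes 16-bit little-endian PCM that arrives in arbitrary byte chunks.
#[derive(Default)]
pub struct PcmDecoder {
    carry: Option<u8>, // low byte of a sample split across two chunks
}

impl PcmDecoder {
    /// Decode one network chunk into samples, carrying an odd trailing byte over.
    pub fn push(&mut self, chunk: &[u8]) -> Vec<i16> {
        let mut bytes = Vec::with_capacity(chunk.len() + 1);
        if let Some(b) = self.carry.take() {
            bytes.push(b);
        }
        bytes.extend_from_slice(chunk);
        let mut pairs = bytes.chunks_exact(2);
        let samples: Vec<i16> = pairs
            .by_ref()
            .map(|p| i16::from_le_bytes([p[0], p[1]]))
            .collect();
        // An odd byte count leaves one byte for the next chunk.
        self.carry = pairs.remainder().first().copied();
        samples
    }
}

/// Playback duration of `samples` mono samples at SAMPLE_RATE_HZ.
pub fn duration_secs(samples: usize) -> f64 {
    samples as f64 / f64::from(SAMPLE_RATE_HZ)
}
```

Carrying a single byte is enough because a 16-bit sample can straddle at most one chunk boundary.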
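The overlay entry describes the FFT thinking animation as a Gaussian "scanner" with σ ≈ 8 bins out of 100, blended with a per-bin breathing baseline via a screen composite. The sketch shows those two pieces of math. The triangle-wave sweep, the 2 s period, and the function names are assumptions, and the breathing baseline is taken as an input rather than reproduced.

```rust
/// Number of FFT bins in the panel ("out of 100" in the changelog).
pub const BINS: usize = 100;
/// Width of the scanner bell in bins (σ ≈ 8 in the changelog).
pub const SIGMA: f32 = 8.0;

/// Gaussian bell with peak 1.0 centred at `centre` (measured in bins).
pub fn scanner(centre: f32) -> [f32; BINS] {
    let mut bell = [0.0f32; BINS];
    for (i, v) in bell.iter_mut().enumerate() {
        let d = i as f32 - centre;
        *v = (-(d * d) / (2.0 * SIGMA * SIGMA)).exp();
    }
    bell
}

/// Screen composite: 1 - (1 - a)(1 - b). Stays in [0, 1] for inputs in [0, 1]
/// and is never darker than either input, so the bell emerges smoothly.
pub fn screen(a: f32, b: f32) -> f32 {
    1.0 - (1.0 - a) * (1.0 - b)
}

/// Assumed sweep: a triangle wave going 0 -> BINS-1 -> 0 once per `period` seconds.
pub fn sweep_centre(t: f32, period: f32) -> f32 {
    let phase = (t / period).rem_euclid(1.0);
    let tri = 1.0 - (2.0 * phase - 1.0).abs();
    tri * (BINS - 1) as f32
}

/// One frame: the moving bell screened over a caller-supplied breathing baseline.
/// The 2.0 s sweep period is an illustrative assumption.
pub fn frame(t: f32, baseline: &[f32; BINS]) -> [f32; BINS] {
    let bell = scanner(sweep_centre(t, 2.0));
    let mut out = [0.0f32; BINS];
    for ((o, &b), &base) in out.iter_mut().zip(bell.iter()).zip(baseline.iter()) {
        *o = screen(b, base);
    }
    out
}
```

Screen blending, rather than taking the maximum, avoids a visible seam where the bell's tail drops below the baseline.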

Fixed

  • FSM stuck on a sub-300 ms F10 tap. When a brief F10 tap was
    released before MIN_RECORDING, the orchestrator's
    on_assistant_hold_release returned early without firing
    ProcessingDone; the FSM sat in AssistantThinking forever and
    silently rejected subsequent F8/F9/F10 presses. Every early-return
    path now emits ProcessingDone, and AssistantRecording also accepts
    ProcessingDone as a safety net.
  • Audio playback worker dying after every cancel. pb.stop()
    used to send Cmd::Stop, which made the worker break out of its
    loop; the next turn's enqueue then failed with "audio playback
    worker stopped". Cmd::Drain now drains queued items and clears the
    abort flag without exiting the worker, so multi-turn conversations
    keep working across barge-ins, Forget, and dictation pivots.
  • Frozen overlay during the post-release phase. The level task
    was aborted in stop_and_drain the moment capture ended, leaving
    the waveform on its last pre-release frame for 4–5 s while STT +
    LLM ran. The overlay now switches into the synthetic thinking
    animation as soon as F10 is released, and the FFT thinking
    visualisation gets even-spaced inter-bar gaps via integer-aligned
    slot widths.
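The first fix guarantees ProcessingDone on every early-return path of on_assistant_hold_release. One idiomatic Rust way to get that guarantee is a drop guard, sketched below; this is an alternative technique, not necessarily how Fono implements the fix. The State and Event enums are simplified, hypothetical stand-ins, MIN_RECORDING = 300 ms is inferred from the "sub-300 ms" wording, and the assumption that the AssistantRecording safety net returns to Idle is ours.

```rust
use std::time::Duration;

/// Taps shorter than this are discarded (inferred from "sub-300 ms").
pub const MIN_RECORDING: Duration = Duration::from_millis(300);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State { Idle, AssistantRecording, AssistantThinking }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event { AssistantPress, AssistantRelease, ProcessingDone }

/// Simplified FSM; unknown (state, event) pairs leave the state unchanged.
pub fn transition(state: State, event: Event) -> State {
    use Event::*;
    use State::*;
    match (state, event) {
        (Idle, AssistantPress) => AssistantRecording,
        (AssistantRecording, AssistantRelease) => AssistantThinking,
        (AssistantThinking, ProcessingDone) => Idle,
        // Safety net: a ProcessingDone that arrives while recording also unsticks the FSM.
        (AssistantRecording, ProcessingDone) => Idle,
        (s, _) => s,
    }
}

/// Pushes ProcessingDone when dropped, i.e. on every return path.
struct DoneGuard<'a>(&'a mut Vec<Event>);

impl Drop for DoneGuard<'_> {
    fn drop(&mut self) {
        self.0.push(Event::ProcessingDone);
    }
}

/// Hypothetical release handler: early returns can no longer skip ProcessingDone.
pub fn on_hold_release(recorded: Duration, outbox: &mut Vec<Event>) -> Option<String> {
    // Bound to a named variable: `let _ = ...` would drop the guard immediately.
    let _done = DoneGuard(outbox);
    if recorded < MIN_RECORDING {
        return None; // too short to transcribe, but the guard still fires
    }
    Some(format!("processed {} ms", recorded.as_millis()))
}
```

The guard turns "remember to emit on every path" into a property the compiler enforces, so a future early return cannot reintroduce the hang.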
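The playback fix replaces a terminating Cmd::Stop with a Cmd::Drain that re-arms the worker. Below is a sketch of that pattern using a std::sync::mpsc channel and an abort flag; Playback, run_worker, the Shutdown command, and the sample counting (standing in for real audio output) are hypothetical. Because the channel is FIFO, clips queued before a stop are dequeued while the flag is still set and are skipped; Drain then clears the flag instead of breaking out of the loop.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

pub enum Cmd {
    Play(Vec<i16>),
    Drain,    // discard queued audio, keep the worker alive
    Shutdown, // only used when the daemon exits
}

pub struct Playback {
    tx: Sender<Cmd>,
    abort: Arc<AtomicBool>,
    worker: Option<JoinHandle<usize>>,
}

impl Playback {
    pub fn new() -> Self {
        let (tx, rx) = mpsc::channel();
        let abort = Arc::new(AtomicBool::new(false));
        let flag = Arc::clone(&abort);
        let worker = thread::spawn(move || run_worker(rx, flag));
        Self { tx, abort, worker: Some(worker) }
    }

    pub fn play(&self, clip: Vec<i16>) -> Result<(), mpsc::SendError<Cmd>> {
        self.tx.send(Cmd::Play(clip))
    }

    /// Barge-in, Escape, or Forget: skip what is queued, keep the worker running.
    pub fn stop(&self) {
        self.abort.store(true, Ordering::SeqCst);
        let _ = self.tx.send(Cmd::Drain);
    }

    /// Stop the worker and return how many samples it "played".
    pub fn shutdown(mut self) -> usize {
        let _ = self.tx.send(Cmd::Shutdown);
        self.worker.take().map(|h| h.join().unwrap_or(0)).unwrap_or(0)
    }
}

fn run_worker(rx: Receiver<Cmd>, abort: Arc<AtomicBool>) -> usize {
    let mut played = 0;
    while let Ok(cmd) = rx.recv() {
        match cmd {
            // Counting samples stands in for writing them to an audio sink.
            Cmd::Play(clip) if !abort.load(Ordering::SeqCst) => played += clip.len(),
            // Clip was queued before a stop(): drop it.
            Cmd::Play(_) => {}
            // Re-arm for the next turn instead of exiting the loop.
            Cmd::Drain => abort.store(false, Ordering::SeqCst),
            Cmd::Shutdown => break,
        }
    }
    played
}
```

With the old break-on-stop behaviour, the play call after a stop would have returned a SendError once the worker's receiver was gone; here it keeps succeeding.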
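The overlay fix mentions even inter-bar gaps via integer-aligned slot widths. The sketch below shows the general idea, assuming each bar gets an integer slot of panel_w / bars pixels and the leftover pixels become a centred margin; bar_rects is a hypothetical helper, not Fono's renderer. With fractional slot widths, rounding each bar's edges independently can make neighbouring gaps differ by a pixel.

```rust
/// Returns (x, width) in pixels for each bar so that every inter-bar gap
/// is exactly `gap` pixels (hypothetical helper).
pub fn bar_rects(panel_w: u32, bars: u32, gap: u32) -> Vec<(u32, u32)> {
    if bars == 0 {
        return Vec::new();
    }
    let slot = panel_w / bars; // integer slot width: the same for every bar
    // Keep bars at least 1 px wide; if the slot is too narrow the gap shrinks.
    let bar_w = slot.saturating_sub(gap).max(1);
    // Pixels that don't divide evenly are split into left/right margins.
    let margin = (panel_w - slot * bars) / 2;
    (0..bars).map(|i| (margin + i * slot, bar_w)).collect()
}
```

Because every bar starts on a multiple of the same integer slot, the gap between consecutive bars is slot minus bar width everywhere on the panel.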

Full Changelog: https://github.com/bogdanr/fono/compare/v0.6.1...v0.7.0
