This release adds 3 notable features for engineering teams evaluating rollout.
Published 1mo
AI Agents & Assistants
✓ No known CVEs patched
Topics
assistant
dictation
linux
llm
local-first
rust
speach-to-text
stt
vulkan
whisper
wyoming
Summary
AI summary: Updates configurable, qwen-3-235b-a22b-instruct-2507, and openai/gpt-oss-120b across a mixed release.
Full changelog
Added
- Voice assistant — F10 hold-to-talk, streaming chat, TTS playback.
A second push-to-talk key (F10 by default) turns Fono into an
offline-capable voice assistant. The pipeline diverges after STT:
instead of cleaning the transcript and injecting it, Fono asks a
chat-capable LLM, streams the reply sentence-by-sentence into a TTS
backend, and plays the audio. First sentence starts speaking before
the model finishes generating, so time-to-first-audio is bounded by
one sentence's synth latency rather than the full reply.
[assistant] and [tts] config blocks. Independent backend
selection from the [llm] cleanup pipeline: pick a fast local 3B
for cleanup and a bigger cloud model for the assistant, or any
mix-and-match. Multi-turn rolling history with a configurable time
window (default 5 minutes) and max-turn cap (default 12). Pressing
the dictation key clears assistant context (configurable);
pressing F10 again mid-reply barges in with history retained;
Escape stops playback ("shut up") without forgetting.
- Cloud assistant backends. Anthropic (Claude Haiku 4.5) and the
full OpenAI-compatible family — OpenAI (gpt-5.4-mini), Cerebras
(qwen-3-235b-a22b-instruct-2507), Groq (openai/gpt-oss-120b),
OpenRouter, Ollama. Each
ships in the default binary; one feature flag per family lets slim
builds drop unused providers.
- Cloud cleanup model defaults refreshed to match retired and
newly-released models: Cerebras llama3.1-8b, Groq
openai/gpt-oss-20b, OpenAI gpt-5.4-nano, Anthropic
claude-haiku-4-5-20251001. The OpenAI-compat client now sends
max_completion_tokens (the new field name that newer OpenAI models
require; older models still accept it).
- TTS backends. Wyoming protocol client (any
wyoming-piper-style server on the LAN), the OpenAI
/v1/audio/speech API (24 kHz PCM stream), and an in-process
Piper stub that points users at Wyoming-piper for now (the
static-musl ship build can't yet pull in onnxruntime). Audio
playback uses paplay on the Linux release variant (no libasound
link, matches the existing parec capture path) or cpal behind the
cpal-backend feature.
- CLI surface. fono use assistant <backend>,
fono use tts <backend> [--uri tcp://host:port],
fono assistant {press,release,stop} for scripted end-to-end
testing.
- Tray. New Stop assistant and Forget conversation entries;
Assistant backend and TTS backend submenus mirror the existing
STT/LLM submenus and switch backends live via Reload. Tray icon
flips amber while the assistant is active.
- Wizard. fono setup ends with an opt-in assistant + TTS step;
reuses any cloud key already entered earlier in the flow so a
single OPENAI_API_KEY powers both chat and TTS without a second
prompt.
- Doctor. fono doctor exercises both factories at startup so a
missing API key or unreachable Wyoming server surfaces in one
place; new Providers (assistant) and Providers (TTS) tables
show key/URI status per backend with an active marker.
- Overlay feedback for the assistant flow. Recording paints
green ("ASSISTANT") with the chosen waveform style; the post-
release thinking + speaking phase paints amber ("THINKING") with
per-style synthetic animations distinct from the real-audio
recording shape:
  - FFT: Gaussian "scanner" (σ ≈ 8 bins out of 100) sweeps
    across the panel; per-bin breathing baseline blends in via
    a screen composite so the bell emerges smoothly.
  - Bars: symmetric centre-out, peak at midline rippling
    outward.
  - Oscilloscope: two interfering sine waves with edge taper
    pinning x = 0 / x = 1 to the centerline; central antinode
    reaches ±1.0 without clipping.
  - Heatmap: two anti-phased Gaussian "neural strands"
    crossing over the rolling 6 s window; transitions seamlessly
    from recording-FFT data without clearing the cache.

Default [overlay].style flipped from Bars → FFT, the most active
visualisation across both phases.
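The FFT "scanner" described in the overlay entry above can be sketched roughly as follows. This is an illustration only, not Fono's rendering code: the function names, the triangle-wave sweep, and the way the baseline is combined are assumptions. Only the Gaussian shape, σ ≈ 8 bins out of 100, and the use of a screen composite come from the notes.

```rust
/// Bar heights for a Gaussian bell centred at `center` (in bin units)
/// with standard deviation `sigma` bins. Each value lies in (0, 1].
/// Hypothetical helper, not a Fono API.
fn scanner_frame(bins: usize, center: f32, sigma: f32) -> Vec<f32> {
    (0..bins)
        .map(|i| {
            let d = i as f32 - center;
            (-(d * d) / (2.0 * sigma * sigma)).exp()
        })
        .collect()
}

/// Centre of the bell for an animation `phase` in [0, 1].
/// Assumption: the scanner sweeps left to right and back (triangle wave).
fn scanner_center(bins: usize, phase: f32) -> f32 {
    let tri = 1.0 - (2.0 * phase - 1.0).abs(); // 0 -> 1 -> 0
    tri * (bins as f32 - 1.0)
}

/// Standard "screen" blend of two values in [0, 1]; the result is never
/// darker than either input, so a faint baseline lifts the bell's tails.
fn screen(a: f32, b: f32) -> f32 {
    1.0 - (1.0 - a) * (1.0 - b)
}
```

A frame for phase 0.5 with the numbers from the notes would be `scanner_frame(100, scanner_center(100, 0.5), 8.0)`, with each bin then passed through `screen` together with its baseline value.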
- Runtime overlay style swap. Changing [overlay].style via
fono use, the tray Waveform style submenu, or fono config edit
now applies on the next frame instead of waiting for a
daemon restart.
- Smoke-test binary (cargo run --release --example smoke_assistant -p fono)
exercises each cloud assistant + the
OpenAI TTS path end-to-end. The release CI's new
cloud-assistant job runs the --ci subset (Groq + Cerebras,
the providers whose API keys are stored as GitHub Secrets).
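The sentence-by-sentence streaming in the voice assistant entry above is what bounds time-to-first-audio: each sentence can go to TTS as soon as it is complete. Below is a minimal sketch of that idea. `SentenceChunker` is a hypothetical name, and the splitting rule (a `.`, `!` or `?` followed by whitespace) is a deliberately naive assumption, not Fono's actual splitter.

```rust
/// Accumulates streamed LLM text and hands back complete sentences.
/// Hypothetical helper for illustration; not part of Fono's API.
struct SentenceChunker {
    buf: String,
}

impl SentenceChunker {
    fn new() -> Self {
        Self { buf: String::new() }
    }

    /// Feed one streamed chunk; returns every sentence it completed.
    /// A sentence ends at '.', '!' or '?' followed by whitespace, so
    /// "3.14" is not split, while a trailing "Done." waits for more input.
    fn push(&mut self, chunk: &str) -> Vec<String> {
        self.buf.push_str(chunk);
        let mut out = Vec::new();
        loop {
            // Byte index of the first whitespace that follows a terminator.
            let mut cut = None;
            let mut prev_is_term = false;
            for (i, c) in self.buf.char_indices() {
                if prev_is_term && c.is_whitespace() {
                    cut = Some(i);
                    break;
                }
                prev_is_term = matches!(c, '.' | '!' | '?');
            }
            let i = match cut {
                Some(i) => i,
                None => break,
            };
            let rest = self.buf.split_off(i);
            out.push(self.buf.trim().to_string());
            self.buf = rest.trim_start().to_string();
        }
        out
    }

    /// Flush the remainder once the model's stream has ended.
    fn finish(&mut self) -> Option<String> {
        let rest = self.buf.trim().to_string();
        self.buf.clear();
        if rest.is_empty() {
            None
        } else {
            Some(rest)
        }
    }
}
```

In a pipeline like the one described, each string returned by `push` would be sent to the TTS backend immediately, and `finish` would be called when the stream closes. Abbreviations such as "e.g." would split early under this rule; a real splitter would need more care.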
Fixed
- FSM stuck on a sub-300 ms F10 tap. Brief F10 taps released
before MIN_RECORDING left the orchestrator's
on_assistant_hold_release early-returning without firing
ProcessingDone; the FSM sat in AssistantThinking forever and
silently rejected subsequent F8/F9/F10 presses. Every early-return
path now emits ProcessingDone; AssistantRecording also accepts
ProcessingDone as a safety net.
- Audio playback worker dying after every cancel.
pb.stop() used to send Cmd::Stop, which made the worker break out of its
loop; the next turn's enqueue then failed with "audio playback
worker stopped". Cmd::Drain now drains queued items + clears the
abort flag without exiting the worker, so multi-turn conversations
keep working across barge-ins, Forget, and dictation pivots.
- Frozen overlay during the post-release phase. The level task
was aborted in stop_and_drain the moment capture ended, leaving
the waveform on its last pre-release frame for 4–5 s while STT +
LLM ran. The overlay now switches into the synthetic thinking
animation as soon as F10 is released, and the FFT thinking
visualisation gets even-spaced inter-bar gaps via integer-aligned
slot widths.
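The playback-worker fix above comes down to a command that empties the queue without leaving the worker loop. The sketch below shows that pattern with std channels. Only the `Cmd::Drain` name comes from the notes; `Enqueue`, `Len`, `spawn_playback_worker`, and the string stand-in for audio are assumptions, and the abort flag is left out for brevity.

```rust
use std::collections::VecDeque;
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Commands understood by the (hypothetical) playback worker.
enum Cmd {
    /// Queue one clip; a String stands in for PCM audio here.
    Enqueue(String),
    /// Drop everything queued, but keep the worker running.
    Drain,
    /// Inspection hook for this sketch: reply with the queue length.
    Len(Sender<usize>),
}

/// Spawns the worker and returns the handle used to send it commands.
fn spawn_playback_worker() -> Sender<Cmd> {
    let (tx, rx) = channel::<Cmd>();
    thread::spawn(move || {
        let mut queue: VecDeque<String> = VecDeque::new();
        // The loop ends only when every Sender has been dropped. No
        // command breaks out of it, so a cancel cannot kill the worker.
        for cmd in rx {
            match cmd {
                Cmd::Enqueue(clip) => queue.push_back(clip),
                Cmd::Drain => queue.clear(),
                Cmd::Len(reply) => {
                    // Ignore the error if the caller stopped listening.
                    let _ = reply.send(queue.len());
                }
            }
        }
    });
    tx
}
```

With a `Stop` variant that did `break` instead, the receiver would be dropped on the first cancel and every later `send` would return an error, which matches the "audio playback worker stopped" failure described in the entry.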
Full Changelog: https://github.com/bogdanr/fono/compare/v0.6.1...v0.7.0