Release history

speakr releases

Speakr is a personal, self-hosted web application designed for transcribing audio recordings.

All releases

v0.8.20-alpha Breaking risk
⚠ Upgrade required
  • Upgrade from v0.8.19-alpha or earlier promptly; if unable, mitigate by stripping the `next` query parameter on `/login` via a reverse proxy or blocking values that start with `//`, `\`, or contain a scheme.
Security fixes
  • CVE pending — open‑redirect fix in `is_safe_url()` (CWE-601)
Full changelog

v0.8.20-alpha — Security: open-redirect fix in is_safe_url

Security patch release on top of v0.8.19-alpha.

Fixed

  • Open redirect via the next parameter (CWE-601). The is_safe_url() helper validated urljoin(request.host_url, target) while redirect() was called with the raw target. A scheme-relative input such as ////evil.com resolved to a same-host URL during validation but was emitted verbatim in the Location header, where browsers interpret it as a network-path-relative redirect to an attacker-controlled host. is_safe_url() now validates the raw target against a local-path allowlist: leading / required, scheme-relative URLs (//, /\), backslashes, control characters, and any value that produces a scheme or netloc when parsed are rejected. The duplicate copy in src/api/auth.py was removed; password login and the SSO next / callback flow share one validator. Regression tests in tests/test_open_redirect.py.

    Reported by RacerZ and Fushuling. Tracked as a GitHub Security Advisory; CVE pending.
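
The allowlist described above can be sketched as follows. This is a minimal illustration, not Speakr's actual is_safe_url(): the function name is_safe_local_path and the exact order of the checks are assumptions made for this example. The key point it shows is that the raw target is validated directly, so the value checked is the value that ends up in the Location header.

```python
from urllib.parse import urlsplit


def is_safe_local_path(target: str) -> bool:
    """Hypothetical helper: accept only plain same-site paths like "/recordings/42"."""
    if not target or not target.startswith("/"):
        return False  # a leading slash is required
    if target.startswith(("//", "/\\")):
        return False  # scheme-relative forms that browsers resolve to a host
    if "\\" in target:
        return False  # browsers may treat backslashes as forward slashes
    if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in target):
        return False  # CR, LF, tab and other control characters
    parts = urlsplit(target)
    # Anything that still parses to a scheme or a network location is rejected.
    return not parts.scheme and not parts.netloc
```

A caller would redirect to the target only when the check passes and fall back to a fixed local page otherwise.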

Tests

  • New tests/test_open_redirect.py — 7 cases covering scheme-relative URLs, absolute URLs, backslash variants, javascript: / data: schemes, CRLF/control-character injection, missing leading slash, and accepted local paths.
  • tests/test_transcription_model_override.py — pre-existing env-bleed flake fixed; the helper now isolates the call from any admin-saved transcription_default_model SystemSetting that may exist in a dev DB. 275 backend tests passing.

No new features, no breaking changes

Upgrade is the usual docker compose pull && docker compose up -d. Users on v0.8.19-alpha or earlier should upgrade promptly. Those who cannot should front Speakr with a reverse proxy that strips the next query parameter on the /login route, or that blocks requests where next starts with //, starts with \, or contains a scheme.

v0.8.19-alpha Breaking risk

Vectorised chunk similarity search cuts per-query search time from 13-20 seconds to under one second (a 4-query Inquire turn drops from roughly 60 seconds to 2-3 seconds) and adds retries for transient embedding failures.

Full changelog

v0.8.19-alpha — Inquire-mode performance and re-embed reliability

Patch release on top of v0.8.18-alpha. No new features, no breaking changes.

Performance

  • Vectorised chunk similarity search. The chunk search loop in Inquire mode previously called sklearn's cosine_similarity once per chunk, costing 13-20 seconds per enriched query on a ~17k-chunk library because each call wrapped what is ultimately one dot product in Python and sklearn overhead. Replaced with a single batched np.vstack plus one cosine_similarity(query, matrix) call, with np.argpartition for top-k selection. Per-query search now completes in under one second; a 4-enriched-query inquire turn drops from roughly 60 seconds to 2-3 seconds end-to-end. Eliminates the chat UI timeout that users with large libraries or slower embedding endpoints were hitting.
  • Dimension-mismatch warnings are folded into a single summary log line per search, instead of one warning per stale chunk, so a partially-migrated embedding configuration does not flood the log on every query.
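
The batched approach can be sketched in a few lines. This is an illustration under assumptions, not the project's code: the function name top_k_chunks is hypothetical, and cosine similarity is computed with plain NumPy rather than sklearn's cosine_similarity so the example has a single dependency. The result is the same quantity, the dot product of L2-normalised vectors.

```python
import numpy as np


def top_k_chunks(query_vec, chunk_vecs, k=5):
    """Return (indices, scores) of the k chunks most similar to query_vec."""
    if len(chunk_vecs) == 0 or k <= 0:
        return np.array([], dtype=int), np.array([])

    matrix = np.vstack(chunk_vecs).astype(np.float64)  # shape (n_chunks, dim)
    query = np.asarray(query_vec, dtype=np.float64)

    # One matrix-vector product replaces n_chunks separate similarity calls.
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
    scores = (matrix @ query) / np.maximum(norms, 1e-12)  # guard zero vectors

    k = min(k, scores.shape[0])
    # argpartition moves the k largest scores into the last k slots, unordered.
    top = np.argpartition(scores, -k)[-k:]
    # Only those k entries are then sorted, best first.
    top = top[np.argsort(scores[top])[::-1]]
    return top, scores[top]
```

np.argpartition avoids sorting the whole score array when only a handful of results are needed, which matters more as the library grows.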

Reliability

  • Embedding API retries. _api_embed now retries transient errors (rate limits, timeouts, 5xx, connection blips) with exponential backoff and jitter. Defaults to 3 attempts, tunable via EMBEDDING_API_MAX_RETRIES and EMBEDDING_API_BACKOFF_SECONDS. Auth and model-not-found errors fail fast since retrying will not help.
  • process_recording_chunks no longer silently loses chunks on partial API failure. Previously the function deleted the recording's existing chunks before calling generate_embeddings, then iterated zip(chunks, embeddings). If the embedding call returned fewer vectors than there were chunks (transient provider failure, exhausted retries), zip stopped at the shorter sequence, yielding nothing when no vectors came back, and the function still returned True with the deletion already committed. The recording could be left with zero chunks. The function now verifies vector count matches chunk count and rolls back the transaction on mismatch, preserving the recording's existing chunks for a later retry.
  • Re-embed all retry passes. The admin Re-embed all loop now does up to two retry passes over any recording that failed in the first pass, with backoff between passes. Tunable via the retry_passes field in the request body. Combined with _api_embed's internal retries, a single click can survive several layers of transient provider failure.
  • Re-embed all picks up stale-chunk recordings regardless of status. The original query filtered strictly on status == 'COMPLETED'. Recordings that had stale chunks but were temporarily in another state at click time were silently skipped, leaving old vectors behind. The query now also matches any recording whose id appears in the transcript_chunk table, so existing stale vectors get refreshed even on recordings that are mid-reprocess.
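
Two of the reliability changes above lend themselves to a short sketch: retrying transient failures with exponential backoff and jitter, and refusing to pair chunks with vectors when the counts differ. Everything named here is hypothetical (the exception classes, embed_with_retry, pair_chunks_with_vectors, and the one-second base delay); only the three-attempt default comes from the notes.

```python
import random
import time


class TransientEmbeddingError(Exception):
    """Hypothetical: rate limit, timeout, 5xx or connection blip."""


class PermanentEmbeddingError(Exception):
    """Hypothetical: auth failure or model-not-found."""


def embed_with_retry(call, texts, max_retries=3, backoff_seconds=1.0, sleep=time.sleep):
    """Call call(texts), retrying transient failures with backoff and jitter."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(1, max_retries + 1):
        try:
            return call(texts)
        except PermanentEmbeddingError:
            raise  # fail fast: retrying cannot fix credentials or a model name
        except TransientEmbeddingError:
            if attempt == max_retries:
                raise  # retries exhausted, let the caller decide
            delay = backoff_seconds * 2 ** (attempt - 1)  # 1x, 2x, 4x, ...
            sleep(delay + random.uniform(0, delay))  # jitter spreads out retries


def pair_chunks_with_vectors(chunks, vectors):
    """Pair chunks with vectors only when the counts match exactly."""
    if len(vectors) != len(chunks):
        # Raising here lets the caller roll back before any old chunks are lost.
        raise ValueError(f"expected {len(chunks)} vectors, got {len(vectors)}")
    return list(zip(chunks, vectors))
```

The explicit length check matters because a bare zip() stops at the shorter input without raising, which hides a partial response.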

No new features

This release is purely fixes and a performance improvement. Existing functionality is unchanged for users who were not affected by the issues above. Users hitting Inquire timeouts or stuck stale-chunk warnings should see both resolved after upgrading.

v0.8.17-alpha Breaking risk
⚠ Upgrade required
  • Nginx reverse‑proxy example updated: replace `proxy_set_header Connection "upgrade"` with `proxy_set_header Connection $http_connection` to avoid 500 errors on file uploads.
  • Documentation added for Nginx Proxy Manager: default `client_max_body_size` is 2000m; per‑host overrides go in `/data/nginx/proxy_host/<id>.conf`; global tweaks in `/data/nginx/custom/{http_top,server_proxy}.conf`.
Full changelog

v0.8.17-alpha — Bug fixes and CI maintenance

Patch release on top of v0.8.16-alpha. No new features, no breaking changes.

Fixed

  • Reprocess summary modal: the prompt-variables panel and the Append/Replace mode toggle now reflect the prompt source the user actually picked. Previously the panel showed the recording's original tag's variables even after the user switched to a different tag (or a custom prompt or the default), and the Append/Replace toggle was offered for the "Use prompt from tag" source where it does not apply (a tag selection always replaces by definition).
  • Docs — reverse-proxy nginx example: the example in getting-started/installation.md was setting proxy_set_header Connection "upgrade" unconditionally inside location /, which causes Gunicorn to return 500s on file uploads through the proxy. Replaced with proxy_set_header Connection $http_connection so the client's actual Connection header is forwarded (matches what Nginx Proxy Manager generates internally). Added a warning callout pointing at the one-word fix for anyone who copied the older example.
  • Docs — Nginx Proxy Manager section: new subsection covering NPM specifically. Documents that NPM's default client_max_body_size is 2000m (inherited from the bundled nginx.conf), where to put per-host overrides (the Advanced tab → /data/nginx/proxy_host/<id>.conf), and where global tweaks go (/data/nginx/custom/{http_top,server_proxy}.conf).

Infrastructure

  • GitHub Actions Node 24: bumped all action versions to clear the September 2025 Node 20 deprecation warnings. Affects actions/checkout, actions/setup-python, actions/upload-pages-artifact, actions/deploy-pages, actions/github-script, and the full docker/* family. No functional change.

Tests

276 backend tests passing plus 32 frontend Vitest tests, unchanged from v0.8.16-alpha.

v0.8.16-alpha Breaking risk
Notable features
  • Prompt template variables (`{{name}}`): a tag whose prompt contains `{{agenda}}` exposes an agenda input on upload; values are substituted at summarisation time and capped at 8,000 characters per value, 32,000 in total.
  • Per‑upload / per‑tag / per‑folder transcription model selection via dropdown; admin can curate list from dashboard.
  • Full LLM prompt structure preview in admin Default Prompts page and user Customise‑prompts tab with colour‑coded placeholder chips.
Full changelog

v0.8.16-alpha — Prompt Templating, Transcription UX, and Observability

New

Prompt templating and summary control

  • Prompt template variables — tag, folder, user-default, and admin-default summary prompts can contain {{name}} placeholders. Selecting a tag with {{agenda}} exposes an agenda input on the upload form; the value is stored on the recording, substituted at summarisation time, and remains editable from the reprocess summary modal. Caps: 8,000 chars per value, 32,000 total. Single-pass re.sub substitution so values cannot introduce new placeholders or reach Python attributes.
  • Append vs Replace mode — the reprocess summary modal and the new Customise summary prompt modal each let you Append text to the resolved prompt or Replace it entirely. Append mode runs variable substitution after the append step so appended text can use the same {{var}} placeholders.
  • Customise summary prompt split-button (discussion #253) — a control next to Generate Summary opens the Append/Replace modal for recordings that don't have a summary yet, so one-off context (an agenda, custom focus instructions) can be passed in without rewriting your saved prompt.
  • Full LLM prompt structure preview — both the admin Default Prompts page and the user Customise-prompts tab now show the complete two-message payload (system prompt with context block, user message with transcription wrapper and language directive). Placeholder chips colour-code system tokens (blue, replaced by the framework) versus user-supplied variables (amber). The user-side preview re-renders live as you type into your custom prompt.
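
The single-pass substitution described for prompt template variables can be sketched as below. This is an assumption-laden illustration rather than the project's implementation: render_prompt and the constant names are hypothetical, the placeholder pattern is assumed to be word characters only, and raising on an oversized value (rather than truncating) is a choice made for this example. The caps themselves come from the notes.

```python
import re

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")  # assumed syntax: {{name}}
MAX_VALUE_CHARS = 8_000   # per-value cap from the release notes
MAX_TOTAL_CHARS = 32_000  # total cap from the release notes


def render_prompt(template: str, variables: dict) -> str:
    """Substitute {{name}} placeholders in one pass over the template."""
    for name, value in variables.items():
        if len(value) > MAX_VALUE_CHARS:
            raise ValueError(f"value for {name!r} exceeds {MAX_VALUE_CHARS} chars")
    if sum(len(v) for v in variables.values()) > MAX_TOTAL_CHARS:
        raise ValueError(f"variables exceed {MAX_TOTAL_CHARS} chars in total")

    def replace(match: re.Match) -> str:
        # Plain dict lookup: no attribute access, no format-spec parsing.
        # Unknown names are left as written (an assumption for this sketch).
        return variables.get(match.group(1), match.group(0))

    # re.sub scans the original template once, so text inserted by a value
    # is never rescanned and cannot introduce new placeholders.
    return PLACEHOLDER.sub(replace, template)
```

Because values are looked up in a dict and never passed through str.format, a value such as {{focus}} or {0.__class__} is inserted as literal text.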

Per-recording transcription control

  • Per-upload / per-tag / per-folder transcription model selection (#266) — set TRANSCRIPTION_MODELS_AVAILABLE and the upload form, reprocess modal, and tag/folder edit forms gain a model dropdown. Optional TRANSCRIPTION_MODEL_LABELS for human-friendly names. Tag and folder edit forms warn if a previously-selected default is no longer in the configured list. The dropdown is hidden when only one option would be visible.
  • Admin-managed transcription model list — when the connector exposes /v1/models discovery, admins can curate the list from the dashboard rather than via env var. Stored in the database; overrides TRANSCRIPTION_MODELS_AVAILABLE when set.
  • WhisperX runtime model switching — the asr_endpoint connector forwards request.model as ?model=... on the WhisperX /asr call, so per-upload selection actually changes which model transcribes each file.
  • Per-connector capability gating — added HOTWORDS and INITIAL_PROMPT capabilities. Hotwords, initial-prompt, and speaker-count UI elements are hidden for connectors that don't support them, instead of accepting input that is silently ignored. Hotwords now show for OpenAI / Whisper / Azure / Mistral / VibeVoice with each connector mapping the field to its own underlying API.
  • Mistral Voxtral chunking (#267) — MISTRAL_ENABLE_CHUNKING=true plus MISTRAL_MAX_DURATION_SECONDS opts the Mistral connector into app-side chunked transcription for recordings approaching Voxtral's 3-hour timeout. Mistral does not return voice embeddings, so speakers are remapped per chunk.

ASR transcript editor

  • Autosave — saves edits 2 seconds after the last keystroke when the user opts in (Account → Preferences → Autosave editor).
  • Save without closing + Ctrl+S — new button keeps the editor open after saving; Ctrl+S triggers a save from anywhere in the editor.
  • Scroll memory — reopening the editor restores the previous scroll position instead of jumping to the top.
  • Double-click to edit — double-clicking a transcript row in the simple view jumps into the editor at that segment. The target row is briefly highlighted so it stands out.

Account preferences

  • Preferences tab — account settings has a new Preferences tab (split from the Account Information tab) using a two-column layout for transcript display, editor behaviour, and language preferences.
  • Compact timestamps in simple view — optional mm:ss (or h:mm:ss) timestamps in the simple transcript view, rendered as a two-part pill alongside the speaker label. The leading segment shows "Start" instead of 00:00.
  • Persist recording-list sort choice (discussion #263) — the Created date / Meeting date toggle now sticks across reloads and sessions on the same browser.
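
The compact timestamp rule (mm:ss, switching to h:mm:ss past one hour, with "Start" on the leading segment) can be expressed as a small formatter. This is a sketch only: the real feature lives in the frontend, and the function name, the is_first flag, and truncating fractional seconds are assumptions made here.

```python
def compact_timestamp(seconds: float, is_first: bool = False) -> str:
    """Format a segment start time for the simple transcript view."""
    if is_first:
        return "Start"  # the leading segment shows a label instead of 00:00
    total = int(seconds)  # assumption: fractional seconds are dropped
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"
```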

Embeddings and inquire mode

  • Configurable embedding model (#262) — EMBEDDING_MODEL swaps all-MiniLM-L6-v2 for any sentence-transformers model. Speakr records the model name on first startup and warns if it changes later.
  • OpenAI-compatible API mode for embeddings: EMBEDDING_BASE_URL, EMBEDDING_API_KEY, and EMBEDDING_DIMENSIONS route embeddings through any OpenAI-compatible provider (vLLM, OpenRouter, OpenAI, Together, etc.). Useful for the lite Docker image, low-RAM hosts, or consolidating providers. The Inquire startup banner reflects the active provider.
  • Re-embed all — admin Vector Store tab gained a Re-embed all action so you can rebuild the index after switching EMBEDDING_MODEL or EMBEDDING_BASE_URL.

Observability and admin

  • Per-operation token stats — admin Token Usage card splits into LLM and embedding panels with their own totals, charts, and per-operation breakdown (title, summary, chat, event extraction, embeddings).
  • Granular token budgets: TITLE_MAX_TOKENS and EVENT_MAX_TOKENS join the existing SUMMARY_MAX_TOKENS / CHAT_MAX_TOKENS so reasoning models that consume budget on hidden thinking tokens can be tuned per operation. The resolved max_tokens is logged with each LLM call.
  • LLM timeout diagnostics — configured LLM_REQUEST_TIMEOUT is logged at startup, and APITimeoutError log entries include elapsed time so it is clear whether the timeout was the actual bound that fired.

API v1

  • Folder CRUD endpoints — new /api/v1/folders for list, create, update, delete.
  • Connector discovery endpoint — exposes the active transcription connector and its capabilities for companion-app integrations.
  • Recording field parity (#274) — /api/v1/recordings and /api/v1/recordings/{id} now include audio_duration, transcription_duration_seconds, summarization_duration_seconds, folder_id, folder, events (detail only), deletion_exempt, prompt_variables, and the per-recording transcription model.
  • Forwarded per-request overrides: /api/v1/recordings/{id}/transcribe accepts transcription_model, hotwords, and initial_prompt.

Localisation

  • Portuguese Brazilian translation (PR #271, lhpereira) — full pt-BR locale added, with backfill of all v0.8.16-alpha keys integrated during merge. All seven locales (en, fr, de, es, ru, zh, pt-BR) now sit at parity with zero missing and zero orphaned keys.
  • Locale parity cleanup — removed 149 stale keys from zh.json that no longer reference any code path, backfilled 10 keys missing from non-English locales, and added seven additional language codes (pl, uk, vi, th, tr, id, sv) to the transcription dropdown.

Fixed

  • Reprocessing applies tag/folder/user default hotwords + initial_prompt (#265) — previously these only flowed through at upload time. Reprocess now walks the same precedence chain, and the reprocess modal gained the two text fields (gated on the active connector's capabilities).
  • Language code normalization (#256) — old user records with transcription_language="français" were crashing WhisperX with HTTP 500. Added a normalize-on-save helper plus a one-shot migration that maps display names and locale codes to ISO 639-1 on upgrade.
  • Title generation Unicode escapes (#260) — for non-ASCII transcripts (Cyrillic, Chinese, etc.) titles were occasionally generated with literal \uXXXX escape sequences. Root cause was slicing the raw transcription JSON before parsing; the slice could land mid-Unicode-escape, the JSON parse failed, and the raw escapes leaked through. Fixed by formatting first, then truncating.
  • Reprocess modal hid hotwords / initial prompt / model dropdown for non-WhisperX connectors — the gating accidentally required connectorSupportsSpeakerCount for the entire block. Fixed via the new capability split.
  • Technical details panel always populated on transcription failures — when the ASR endpoint returns an HTTP error, Speakr now captures the upstream response body before raising, so the recording's "Technical details" section shows the real failure message (for example faster-whisper's "Invalid model size") instead of a bare status code.
  • Vector Store "recordings to process" message — Vue's custom ${...} delimiter was tripping over the nested braces in the i18n call; rewritten to use the t(key, params) parameter form.
  • CSRF token on the Preferences form — was missing, causing submissions to be rejected.
  • Test isolation — synthetic users and recordings created during the test suite are now cleaned up at module teardown so the dev DB stays free of leaked admin flags between runs.
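
The title-generation fix (#260) comes down to ordering: parse the full transcript JSON first, then truncate the readable text. The sketch below illustrates why. The function name and the "sentence" field are hypothetical, and the transcript shape is assumed to be a JSON list of segment objects.

```python
import json


def transcript_excerpt(raw_json: str, max_chars: int) -> str:
    """Parse the whole transcript first, then truncate the readable text."""
    segments = json.loads(raw_json)  # full document, so escapes decode cleanly
    text = " ".join(segment["sentence"] for segment in segments)
    return text[:max_chars]  # slicing decoded text cannot split an escape


# json.dumps escapes non-ASCII by default, so the stored form holds \uXXXX runs.
raw = json.dumps([{"sentence": "Привет, мир"}])

try:
    json.loads(raw[:30])  # the old order: slice the raw JSON, then parse
    sliced_ok = True
except json.JSONDecodeError:
    sliced_ok = False  # the cut lands inside an escape and parsing fails

excerpt = transcript_excerpt(raw, 6)
```

With the old order, a failed parse meant the raw escaped string was used as-is, which is how literal \uXXXX sequences reached generated titles.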

Docs

  • New nginx reverse-proxy guidance: proxy_request_buffering off and client_max_body_size in the recommended config (resolves the 500-error class from #273)
  • Google Gemini OpenAI-compatible setup example for TEXT_MODEL_BASE_URL=https://generativelanguage.googleapis.com/v1beta/openai/ (#254)
  • Prompt template variables guide in user-guide/settings.md
  • Per-upload / per-tag / per-folder model selection documentation in admin-guide/model-configuration.md
  • EMBEDDING_BASE_URL API mode documentation across inquire-mode, vector-store, and troubleshooting
  • ASR editor enhancements (autosave, Ctrl+S, scroll memory, double-click) and Append/Replace summary mode in user-guide/transcripts.md
  • Re-embed all action and embedding token tracking in admin-guide/vector-store.md
  • Per-operation token stats in admin-guide/statistics.md

Infrastructure

  • Vitest frontend tests — pure-helper modules in static/js/modules/utils/ are now covered by Vitest. Run npm test. Currently exercises the prompt-variable extraction and priority-chain logic.

Tests

276 backend tests passing plus 32 frontend tests, including new regression suites for the title truncation bug (#260), reprocess hotwords precedence (#265), language normalization (#256), API v1 parity (#274), the per-upload/tag/folder model override chain (#266), prompt-variable substitution, and the priority-chain helpers.

v0.8.15-alpha New feature
Notable features
  • Mistral/Voxtral Connector
  • VibeVoice Connector
  • Upload API metadata fields
v0.8.14-alpha New feature
Notable features
  • Fullscreen Video Mode
  • Custom Vocabulary (Hotwords)
  • Initial Prompt
v0.8.13-alpha Bug fix

Fixed large video files being silently stripped of their video stream by implementing a file-size-scaled probe timeout and extension-based fallback validation.

v0.8.12-alpha New feature
Notable features
  • Speaker name filter with bulk operation support
  • Auto-scroll follow-along mode for shared pages
v0.8.11-alpha New feature
Breaking changes
  • Speaker profiles now preserved by default instead of auto-deleted
Notable features
  • Video Retention
  • Parallel Uploads
  • Duplicate Detection
v0.8.10-alpha Bug fix

Fixed incognito recording visibility, language parameter handling in ASR, PostgreSQL syntax errors, UI z-index stacking, and added PostgreSQL integration testing.

v0.8.9-alpha Bug fix

Fixed scipy and scikit-learn version incompatibility that crashed the app on startup; pinned scipy version and made sklearn import conditional.

v0.8.8-alpha New feature
Notable features
  • Lightweight lite Docker image tag at ~725MB
  • Multi-stage Dockerfile with static ffmpeg
  • Improved text search fallback without embeddings
v0.8.7.2 Bug fix

Improved recording view UI with collapsible accordion sections for notes and upload settings, pinned action buttons always visible, smart defaults showing most relevant section, and navigation back to upload screen on discard.

v0.8.7.1-alpha Bug fix

Fixed transcription language setting being ignored when uploading files. User's default language from Account Settings now properly applies instead of falling back to auto-detect.

v0.8.7-alpha New feature
Notable features
  • Customizable Export Templates
  • Localized Export Labels
  • Consolidated Templates Tab
v0.8.6.1-alpha Bug fix

Fixed PostgreSQL migration syntax error for BOOLEAN columns and added proper handling for reserved SQL keywords in index creation.

v0.8.6-alpha New feature
Notable features
  • Folders Organization
  • Auto Speaker Labeling
  • Per-User Auto-Summarization
v0.8.5.1-alpha New feature
Notable features
  • Incognito Mode for Microphone Recordings
  • Default Incognito Configuration
v0.8.5-alpha New feature
Notable features
  • Multi-Select Mode
  • Incognito Mode
  • Playback Speed Control
v0.8.4 New feature
Notable features
  • Email verification for new registrations
  • Password reset via email with SMTP support
  • Registration domain restriction capability
v0.8.3-alpha New feature
Notable features
  • Custom naming templates with regex extraction
  • REST API v1 upload endpoint
  • Drag-and-drop tag reordering
v0.8.2-alpha New feature
Notable features
  • Per-user transcription minute budgets with alerts
  • Cost estimation for OpenAI and self-hosted ASR
  • Admin usage dashboard with per-user breakdowns
v0.8.0-alpha New feature
Notable features
  • Modular connector-based transcription with auto-detection
  • Complete REST API v1 with Swagger docs
  • Batch operations and chat API endpoints
v0.7.2-alpha New feature
Notable features
  • LLM token tracking with monthly budgets and alerts
  • Virtual scrolling for transcript segments
  • Audio player drag-to-seek and independent modals