samson-art/transcriptor-mcp

MCP Data & Storage

An MCP server that fetches video transcripts and metadata from 11 major platforms, with optional Whisper audio transcription as a fallback.

TypeScript · Latest v1.0.0 · 1mo ago

Features

  • Multi‑platform transcript extraction (YouTube, Twitter/X, Instagram, TikTok, Twitch, Vimeo, Facebook, Bilibili, VK, Dailymotion, Reddit)
  • Provides cleaned text and raw SRT/VTT subtitles with pagination for large responses
  • Whisper fallback transcription when subtitles are unavailable (local or OpenAI API)
  • Optional Redis caching to reduce yt‑dlp calls
  • Docker‑first deployment; can run locally via Docker or remotely through mcp‑proxy
  • Prometheus metrics on the optional Fastify REST API

Recent releases

View all 19 releases →
v1.0.0 New feature
Security fixes
  • Server card now sets `authentication.required: false` to avoid advertising unsupported OAuth schemes, deferring auth enforcement to edge‑layer token policies.
Notable features
  • Added `docs/edge-smithery-gate.md` and `docs/mcp-edge-rate-limit.md` with policies for `X-MCP-Api-Token`.
  • Added `scripts/generate-server-card.mjs` and npm scripts to auto‑generate `.well-known/mcp/server-card.json` after build (SEP-1649).
  • Updated MCP config schema in `.well-known/mcp-config` to document and map `apiToken` (`X-MCP-Api-Token`).
Full changelog

Added

  • MCP HTTP edge guidance: Added documentation and examples for deploying stdio mcp-proxy behind an external edge (reverse proxy or API gateway) with token auth and traffic control.
  • Edge/operator guides: Added docs/edge-smithery-gate.md and docs/mcp-edge-rate-limit.md with concrete policies for X-MCP-Api-Token, Smithery-shaped traffic gating, and reverse-proxy rate-limit strategies.
  • Build-time server-card generation: Added scripts/generate-server-card.mjs and npm scripts (generate:server-card, postbuild) to produce .well-known/mcp/server-card.json automatically after build for SEP-1649/Smithery discovery.
  • MCP config schema support for apiToken: .well-known/mcp-config now documents and maps apiToken (X-MCP-Api-Token) in addition to authToken.
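
The split between authToken and apiToken can be illustrated with a small header-mapping sketch. It follows the header names given in this changelog (Authorization/Bearer and X-MCP-Api-Token); the `SessionConfig` interface and `buildEdgeHeaders` function are hypothetical names for illustration, not part of the repository.

```typescript
// Hypothetical session config shape. The field names follow the changelog,
// but the real schema in .well-known/mcp-config may differ.
interface SessionConfig {
  authToken?: string; // forwarded as "Authorization: Bearer <token>"
  apiToken?: string; // forwarded as "X-MCP-Api-Token: <token>"
}

// Build the HTTP headers a gateway could forward to the edge layer.
// Illustrative helper only; not the project's actual mapping code.
function buildEdgeHeaders(config: SessionConfig): Record<string, string> {
  const headers: Record<string, string> = {};
  if (config.authToken) {
    headers["Authorization"] = `Bearer ${config.authToken}`;
  }
  if (config.apiToken) {
    headers["X-MCP-Api-Token"] = config.apiToken;
  }
  return headers;
}
```

Omitted fields produce no header, so a client that only holds a pool token sends X-MCP-Api-Token alone.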

Changed

  • Smithery session config contract: smithery.yaml now separates authToken (Authorization/Bearer for self-hosted edge auth) from apiToken (X-MCP-Api-Token for token pools/quotas), with explicit header mapping metadata.
  • Docs alignment around MCP architecture: README and docs now consistently describe this repo’s MCP model as stdio + external mcp-proxy, clarify that Node app RATE_LIMIT_* applies to REST API only, and move MCP auth/rate-limit responsibilities to infrastructure edge layers.
  • Monitoring documentation scope: docs/monitoring.md clarifies that /metrics is exposed by the REST API only, while MCP-over-HTTP observability belongs to proxy/WAF metrics, logs, or Sentry.
  • Quick-start and public-url guidance: MCP quick-start/public URL docs now include stronger guidance for edge auth, /mcp and /sse protection, and safer .well-known behavior for catalog discovery.
  • Pre-commit checks: .husky/pre-commit now runs make prepare && make check-no-smoke.

Security

  • Safer MCP auth signaling in server card: Generated server card keeps authentication.required: false to avoid advertising unsupported OAuth schemes while relying on edge-enforced X-MCP-Api-Token/Bearer policies documented for operators.
v0.7.1 New feature
⚠ Upgrade required
  • If using reverse proxies, set `MCP_TRUST_PROXY=true` (or a hop count or proxy-addr-style value) so that `request.ip` reflects the client address
  • To avoid high‑cardinality metrics, disable per‑client IP counting by setting `MCP_METRICS_HTTP_REQUESTS_BY_CLIENT_IP=false`
Notable features
  • `MCP_TRUST_PROXY` env variable to control Fastify `trustProxy` and expose real client IP
  • Prometheus counter `mcp_http_requests_by_client_ip_total` with route, method, and client_ip labels (optional disable via `MCP_METRICS_HTTP_REQUESTS_BY_CLIENT_IP`)
  • Anonymous MCP quota support for HTTP requests using normalized client IP as quota material
Full changelog

Added

  • MCP_TRUST_PROXY: Parsed in src/env.ts (parseMcpTrustProxyEnv) and passed to Fastify trustProxy in src/mcp-http.ts so request.ip reflects the client behind reverse proxies (X-Forwarded-For). Supports boolean-ish strings, hop counts, or proxy-addr-style strings (unset/empty defaults to true).
  • Prometheus mcp_http_requests_by_client_ip_total: Counter with route, method, client_ip (src/metrics.ts, recordMcpHttpRequestByClientIp); incremented on onResponse with stable route labels (routeLabelForMcpHttpMetrics). Optional disable via MCP_METRICS_HTTP_REQUESTS_BY_CLIENT_IP (isMcpMetricsHttpRequestsByClientIpEnabled) to avoid high cardinality.
  • Anonymous MCP quota by client (HTTP): When there is no X-Api-Key, resolveLimit / enforceMcpToolQuota accept optional anonymous material (hashed as anon:<material>); HTTP supplies normalized client IP via McpRequestContext.anonymousQuotaMaterial and createMcpServer({ getAnonymousQuotaMaterial }). Stdio MCP omits the resolver and keeps the legacy single global anonymous bucket (__mcp_quota_anonymous_v1__).
  • IP helpers for quota/metrics: normalizeIpStringForQuota, normalizeMcpClientIp in src/mcp-http.ts (trim, bracketed IPv6, zone id strip, lowercase).
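
The normalization steps listed for the IP helpers (trim, bracketed IPv6, zone id strip, lowercase) can be sketched as follows. This is a minimal re-implementation under the assumption that the steps run in that order; `normalizeIpForQuotaSketch` is a hypothetical name, and the real helpers in src/mcp-http.ts may handle more cases.

```typescript
// Hypothetical sketch of the normalization described in the changelog.
// It does not handle "[addr]:port" forms or validate the address.
function normalizeIpForQuotaSketch(raw: string): string {
  let ip = raw.trim();
  // Bracketed IPv6: "[::1]" becomes "::1"
  if (ip.startsWith("[") && ip.endsWith("]")) {
    ip = ip.slice(1, -1);
  }
  // Zone id: "fe80::1%eth0" becomes "fe80::1"
  const zoneIndex = ip.indexOf("%");
  if (zoneIndex !== -1) {
    ip = ip.slice(0, zoneIndex);
  }
  // IPv6 hex digits are case-insensitive, so fold to lowercase
  return ip.toLowerCase();
}
```

Normalizing before hashing means that `[FE80::1%eth0]` and `fe80::1` land in the same anonymous quota bucket and the same metric label.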

Changed

  • MCP HTTP: GET /sse session setup runs inside runWithMcpRequestContext(buildMcpHttpRequestContext(...)) so SSE tool calls see the same API key and anonymous quota material as streamable /mcp and /message.
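
As a rough illustration of why running session setup inside a request context matters, here is a minimal AsyncLocalStorage sketch. The `RequestContextSketch` shape and the function names are assumptions made for this example; only `AsyncLocalStorage` itself is the real Node.js API.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical context shape; the real McpRequestContext may carry more fields.
interface RequestContextSketch {
  clientApiKey?: string;
  anonymousQuotaMaterial?: string;
}

const contextStorage = new AsyncLocalStorage<RequestContextSketch>();

// Run a callback with the given context visible to everything it calls.
function runWithContextSketch<T>(ctx: RequestContextSketch, fn: () => T): T {
  return contextStorage.run(ctx, fn);
}

// Read the context from anywhere down the call stack (undefined outside a run).
function currentContextSketch(): RequestContextSketch | undefined {
  return contextStorage.getStore();
}

// A tool handler that never receives the request object directly.
async function resolveQuotaKeySketch(): Promise<string> {
  await Promise.resolve(); // the store survives async boundaries
  const ctx = currentContextSketch();
  return ctx?.clientApiKey ?? `anon:${ctx?.anonymousQuotaMaterial ?? "global"}`;
}
```

Code that runs outside `runWithContextSketch` sees no store at all, which is the situation the SSE change above avoids for tool calls made on that transport.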

Tests

  • src/env.test.ts: parseMcpTrustProxyEnv, isMcpMetricsHttpRequestsByClientIpEnabled.
  • src/mcp-http.test.ts: IP normalization and route label helpers.
  • src/mcp-quota.test.ts: Distinct anonymous buckets, enforceMcpToolQuota with anonymous material vs stdio global bucket.
  • src/mcp-request-context.test.ts: anonymousQuotaMaterial in context.
  • src/metrics.test.ts: mcp_http_requests_by_client_ip_total export.
  • E2E src/e2e/api-smoke.ts: Asserts mcp_http_requests_by_client_ip_total on GET /metrics when MCP quota metrics are checked.
v0.7.0 New feature
Security fixes
  • .gitignore now excludes the `secrets/` directory to prevent accidental commit of local key material.
Notable features
  • Optional per-client MCP tool call quota, enabled via `MCP_QUOTA_ENABLED`, with default limits (`MCP_QUOTA_DEFAULT_MAX`, `MCP_QUOTA_DEFAULT_WINDOW`) and an optional strict mode (`MCP_QUOTA_REJECT_UNREGISTERED`).
  • Client API key registry supporting hashed secrets from a file (`MCP_CLIENT_API_KEYS_FILE`) or env JSON (`MCP_CLIENT_API_KEYS_JSON`), combined with a pepper (`MCP_CLIENT_API_KEY_PEPPER`).
  • Prometheus metrics for quota enforcement: `mcp_quota_checks_total`, `mcp_quota_exceeded_total`, `mcp_quota_tool_calls_blocked_total`, `mcp_quota_http_429_total`, and the latency metric `mcp_quota_check_duration_seconds`.
Full changelog

Added

  • MCP tool call quota (optional): Per-client limits keyed by X-Api-Key on MCP HTTP (src/mcp-quota.ts, src/mcp-core.ts). Enable with MCP_QUOTA_ENABLED; defaults MCP_QUOTA_DEFAULT_MAX / MCP_QUOTA_DEFAULT_WINDOW; optional strict mode MCP_QUOTA_REJECT_UNREGISTERED; customizable messages via MCP_QUOTA_CONTACT_MESSAGE, MCP_QUOTA_MESSAGE_NO_KEY, MCP_QUOTA_MESSAGE_INVALID_KEY.
  • Client API key registry (hashed secrets only): JSON file or inline env — MCP_CLIENT_API_KEYS_FILE (preferred) or MCP_CLIENT_API_KEYS_JSON, plus MCP_CLIENT_API_KEY_PEPPER. Validation, prefix keys, and lookup in src/api-key-registry.ts; map-based registry builder for fast hash lookup; src/mcp-quota-registry.ts re-exports loader helpers.
  • HTTP request context for quota: src/mcp-request-context.ts (AsyncLocalStorage) so streamable /mcp, POST /sse, and /message expose the client key to createMcpServer({ getClientApiKey }) (src/mcp-http.ts).
  • Prometheus metrics (MCP): mcp_quota_checks_total, mcp_quota_exceeded_total, mcp_quota_tool_calls_blocked_total, mcp_quota_http_429_total, mcp_quota_check_duration_seconds in src/metrics.ts; resetMetricsRegistryForTests() for unit tests.
  • Quota counter store: Fixed-window buckets in src/mcp-quota-store.ts — in-memory (MemoryQuotaCounterStore) for single-process; RedisQuotaCounterStore (Lua INCR + PEXPIRE) for shared Redis when wired in.
  • Duration parsing: parseQuotaWindowMs() in src/env.ts for quota windows (e.g. 24h, 30m, 1 minute).
  • MCP session config / Smithery: Optional apiKey in MCP_SESSION_CONFIG_SCHEMA and .well-known/mcp-config — gateway maps form field to X-Api-Key (distinct from MCP_AUTH_TOKEN / Bearer).
  • Monitoring stack (repo): monitoring/prometheus.yml, Grafana provisioning (monitoring/grafana/provisioning/...) and MCP quota dashboard JSON monitoring/grafana/provisioning/dashboards/files/mcp-quota.json. docs/monitoring.md — new quota metrics, PromQL snippets, dashboard mount notes.
  • Docs and repo hygiene: CONTRIBUTING.md (dev setup, make prepare / make check); SECURITY.md (supported versions, private reporting via GitHub Security Advisories). docs/configuration.md and .env.example — full quota and registry variable list.
  • Tests: src/api-key-registry.test.ts, src/mcp-quota.test.ts, src/mcp-quota-store.test.ts, src/mcp-request-context.test.ts, src/metrics.test.ts; MCP quota scenarios in src/mcp-core.test.ts; schema assertions in src/mcp-http.test.ts. E2E src/e2e/api-smoke.ts — optional MCP_QUOTA_ENABLED / high default max, asserts quota-related series on GET /metrics (skip with SMOKE_SKIP_MCP_QUOTA_METRICS).
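
The quota window parsing and fixed-window counting described above can be sketched together. This is an illustrative sketch, not the code in src/env.ts or src/mcp-quota-store.ts: `parseWindowMsSketch` and `FixedWindowCounterSketch` are hypothetical names, the accepted unit list is an assumption, and windows here are aligned to multiples of the window length, which may differ from how the real stores start a window.

```typescript
// Assumed unit table; the real parser may accept a different set of units.
const UNIT_MS: Record<string, number> = {
  s: 1_000, second: 1_000, seconds: 1_000,
  m: 60_000, minute: 60_000, minutes: 60_000,
  h: 3_600_000, hour: 3_600_000, hours: 3_600_000,
  d: 86_400_000, day: 86_400_000, days: 86_400_000,
};

// Parse strings such as "24h", "30m", or "1 minute" into milliseconds.
function parseWindowMsSketch(input: string): number | undefined {
  const match = /^(\d+)\s*([a-z]+)$/i.exec(input.trim());
  if (!match) return undefined;
  const unit = UNIT_MS[(match[2] ?? "").toLowerCase()];
  return unit === undefined ? undefined : Number(match[1]) * unit;
}

// In-memory fixed-window counter: at most `max` hits per key per window.
class FixedWindowCounterSketch {
  private readonly buckets = new Map<string, { windowStart: number; count: number }>();
  private readonly max: number;
  private readonly windowMs: number;

  constructor(max: number, windowMs: number) {
    this.max = max;
    this.windowMs = windowMs;
  }

  // Record one call and report whether it is still within the limit.
  hit(key: string, now: number = Date.now()): boolean {
    const windowStart = Math.floor(now / this.windowMs) * this.windowMs;
    const bucket = this.buckets.get(key);
    if (!bucket || bucket.windowStart !== windowStart) {
      // First hit in a new window resets the count
      this.buckets.set(key, { windowStart, count: 1 });
      return this.max >= 1;
    }
    bucket.count += 1;
    return bucket.count <= this.max;
  }
}
```

A fixed window is cheap to keep in memory or in a single Redis key, at the cost of allowing a burst of up to twice the limit across a window boundary.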

Changed

  • Dockerfile: Build stage node:20-alpine; runtime base node:20-bookworm-slim (Debian, apt-get for yt-dlp/ffmpeg stack); comment clarifying Alpine vs Debian for system packages.
  • npm test: Jest runs with --forceExit to avoid hanging on open handles in CI.

Security

  • .gitignore: Ignore secrets/ to reduce risk of committing local key material.
v0.6.9 New feature
⚠ Upgrade required
  • Set YT_DLP_PLAYLIST_IGNORE_ERRORS=0 in .env to opt out of ignoring errors during playlist runs
  • Enable YT_DLP_VERBOSE_ON_ERROR=1 for diagnostic verbose logging on full failures with no partial output
Notable features
  • --ignore-errors flag (default on) keeps a single bad entry from aborting the whole playlist run; disable via YT_DLP_PLAYLIST_IGNORE_ERRORS=0
  • YT_DLP_VERBOSE_ON_ERROR env var triggers a verbose yt-dlp rerun and logs stderr when a full failure occurs with no partial files
  • get_playlist_transcripts returns a discriminated DownloadPlaylistSubtitlesOutcome (ok/results or failure) and includes partial results if any subtitle files were written
Full changelog

Added

  • get_playlist_transcripts hardening (downloadPlaylistSubtitles in src/youtube.ts): Returns a discriminated DownloadPlaylistSubtitlesOutcome (ok + results or failure) instead of null on error. On yt-dlp failure, still scans the temp directory and returns partial results when any subtitle files were written (aligned with single-video runYtDlpAndExtractSubtitles).
  • --ignore-errors for playlist subtitle runs so one bad entry does not abort the batch. Opt out with YT_DLP_PLAYLIST_IGNORE_ERRORS=0. Documented in docs/configuration.md and .env.example.
  • YT_DLP_VERBOSE_ON_ERROR: When set to 1, after a failed playlist run with no partial files, runs yt-dlp once more with -v and without --quiet/--no-progress and logs stderr for diagnostics. Documented in docs/configuration.md and .env.example.
  • collectExecFileErrorDetails() and ExecFileErrorDetails: Normalized fields from failed execFile / yt-dlp runs (message, exitCode, signal, cmd, stdout, stderr) for structured logs.
  • formatPlaylistDownloadFailureMessage(): Builds the MCP/API-facing error string (message, exit code, stderr tail, operational hints).
  • appendYtDlpEnvArgs options: Optional third argument AppendYtDlpEnvArgsOptions with quiet: false to omit --no-progress and --quiet (used for verbose replay).
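
The move from a null return to a discriminated outcome can be sketched as below. The type and field names (`PlaylistOutcomeSketch`, `videoId`, `subtitlePath`, `failure`) are assumptions for illustration; the actual DownloadPlaylistSubtitlesOutcome in src/youtube.ts may be shaped differently.

```typescript
// Hypothetical per-entry result
interface PlaylistEntrySketch {
  videoId: string;
  subtitlePath: string;
}

// Hypothetical failure details, loosely modeled on the fields named above
interface FailureSketch {
  message: string;
  exitCode?: number;
}

// Discriminated union: `ok` tells the compiler which branch is present.
type PlaylistOutcomeSketch =
  | { ok: true; results: PlaylistEntrySketch[] }
  | { ok: false; failure: FailureSketch };

// Prefer partial results over a hard failure when any files were written.
function toOutcomeSketch(partial: PlaylistEntrySketch[], error?: FailureSketch): PlaylistOutcomeSketch {
  if (error && partial.length === 0) {
    return { ok: false, failure: error };
  }
  return { ok: true, results: partial };
}

// Callers must check `ok` before touching `results` or `failure`.
function summarizeSketch(outcome: PlaylistOutcomeSketch): string {
  if (outcome.ok) {
    return `fetched ${outcome.results.length} transcript(s)`;
  }
  const code = outcome.failure.exitCode ?? "unknown";
  return `playlist download failed (exit code ${code}): ${outcome.failure.message}`;
}
```

Compared with returning null, the failure branch carries enough detail to build a useful error message for the caller.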

Changed

  • Playlist failure logging: Logs exitCode, signal, and cmd when present, in addition to stdout/stderr, which can be empty under --quiet.
  • MCP get_playlist_transcripts: On full failure, throws an error whose message comes from formatPlaylistDownloadFailureMessage instead of the generic Failed to fetch playlist subtitles.

Tests

  • src/youtube.test.ts: Coverage for collectExecFileErrorDetails, formatPlaylistDownloadFailureMessage, --ignore-errors / YT_DLP_PLAYLIST_IGNORE_ERRORS=0, failure outcome shape, and appendYtDlpEnvArgs with quiet: false.
v0.6.8 New feature
⚠ Upgrade required
  • Update configuration docs: WHISPER_TIMEOUT now only limits client wait time; background transcription may continue.
  • Background cache population requires Redis caching; enable Redis to benefit from late cache writes.
  • Optionally add the `WHISPER_BACKGROUND_TIMEOUT` env var to deployment configs; when unset, the default is max(1800000, 3 × WHISPER_TIMEOUT) ms.
Notable features
  • Background Whisper jobs continue after WHISPER_TIMEOUT to populate Redis subtitle cache
  • `WHISPER_BACKGROUND_TIMEOUT` env var sets the timeout of the long-running HTTP client used by background Whisper jobs (default = max(1800000, 3 × WHISPER_TIMEOUT); 0 = no client-side abort)
  • Prometheus gauge `whisper_background_jobs_active` reports active deduplicated background Whisper jobs
Full changelog

Added

  • Background Whisper jobs and late cache write: When the client hits WHISPER_TIMEOUT but Whisper finishes afterward, the transcript is still saved to Redis (same subtitle cache keys as a normal success) so the next request for that video can be a cache hit. Implemented via deduplicated in-flight jobs in src/whisper-jobs.ts (startOrReuseWhisperJob), Promise.race against getWhisperConfig().timeout in src/validation.ts for auto-discovery and explicit type/lang flows, and optional timeoutMs on transcribeWithWhisper / local+API helpers (0 = no fetch abort).
  • WHISPER_BACKGROUND_TIMEOUT: Env var for the long-running Whisper HTTP client used by background jobs (unset = max(1800000, 3 × WHISPER_TIMEOUT); 0 = no client-side abort). Documented in docs/configuration.md, docs/caching.md, .env.example, and docker-compose.example.yml.
  • Prometheus gauge whisper_background_jobs_active: Tracks in-flight deduplicated background Whisper jobs; setWhisperBackgroundJobsActive() in src/metrics.ts.
  • Tests: src/whisper-jobs.test.ts; src/whisper.test.ts asserts fetch is called without signal when timeoutMs === 0; src/validation.test.ts covers cache.set after simulated timeout for auto-discover and explicit lang.
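
The combination of a client-facing deadline, deduplicated in-flight jobs, and a late cache write can be sketched as follows. This is a simplified model under stated assumptions: a `Map` stands in for the Redis subtitle cache, and all function names are hypothetical rather than the ones in src/whisper-jobs.ts or src/validation.ts. Only the default formula max(1800000, 3 × WHISPER_TIMEOUT) and the meaning of 0 come from the changelog.

```typescript
// Default background timeout as described above; 0 means "no client-side abort".
function resolveBackgroundTimeoutMsSketch(whisperTimeoutMs: number, configured?: number): number {
  if (configured !== undefined) return configured;
  return Math.max(1_800_000, 3 * whisperTimeoutMs);
}

const cacheSketch = new Map<string, string>(); // stand-in for the Redis subtitle cache
const inFlightSketch = new Map<string, Promise<string>>();

// Start a job for `key`, or reuse the one already running for it.
function startOrReuseJobSketch(key: string, run: () => Promise<string>): Promise<string> {
  const existing = inFlightSketch.get(key);
  if (existing) return existing;
  const job = run()
    .then((text) => {
      cacheSketch.set(key, text); // written even if no client is still waiting
      return text;
    })
    .finally(() => {
      inFlightSketch.delete(key);
    });
  inFlightSketch.set(key, job);
  return job;
}

// Wait up to `timeoutMs` for the job; undefined means the client gave up.
async function transcribeWithDeadlineSketch(
  key: string,
  run: () => Promise<string>,
  timeoutMs: number,
): Promise<string | undefined> {
  const job = startOrReuseJobSketch(key, run);
  job.catch(() => {}); // avoid an unhandled rejection if the job fails after the deadline
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<undefined>((resolve) => {
    timer = setTimeout(() => resolve(undefined), timeoutMs);
  });
  try {
    return await Promise.race([job, deadline]);
  } finally {
    clearTimeout(timer);
  }
}
```

Because the cache write is attached to the job rather than to the waiting request, a transcript that finishes after the deadline is still available to the next request for the same key.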

Changed

  • WHISPER_TIMEOUT semantics (docs): Clarified as the per-request wait before returning 404 to the client; background transcription may continue for cache population when Redis is enabled.
  • docs/monitoring.md: Documented whisper_background_jobs_active for API and MCP metrics tables.

About

Stars
10
Forks
4
Languages
TypeScript JavaScript Makefile
Downloads/week
4 ↓42%
NPM Maintainers
2
Contributors
2
TypeScript
Types included ✓

Install & Platforms

Install via
docker