Skip to content

Hister

MCP Search & Web

A privacy‑focused local web search engine that indexes the full text of visited pages and supports advanced queries, file indexing, multi‑user hosting, and optional AI semantic search.

Go Latest v0.15.0 · 10h ago Security brief →

Features

  • Privacy‑focused: index browsing history locally without remote services
  • Full‑text indexing of web page content you have visited
  • Advanced query language for precise search results
  • Local file knowledge base indexing
  • Crawler support (headless browser or traditional crawler) for fast expansion
  • Multi‑user hosting and optional AI‑enhanced semantic search

Recent releases

View all 15 releases →
No immediate action
v0.15.0 Mixed

Versioning + Priority Rules + PDF Indexing

v0.14.0 New feature
Notable features
  • Mastodon Extractor indexes each toot as its own document with `metadata.type:toot`.
  • Metadata query filtering via `metadata.key:value` syntax.
  • Full‑screen preview mode accessible from search and history pages.
Full changelog

New Features

Mastodon Extractor

A dedicated extractor for Mastodon detects Mastodon pages and indexes each
toot as its own separate document rather than one big blob of text. Works
with any Mastodon instance without configuration. Every toot gets a
metadata.type:toot field so you can filter toots in search queries.
Combined with a search alias (!toot → metadata.type:toot) this makes
finding past toots fast and convenient.

Metadata Query Filtering

Documents can now be filtered by arbitrary metadata fields using the
metadata.key:value query syntax. Extractors (including the newMastodon
extractor) populate these fields at index time.

Full Screen Preview

The split-screen preview can now be toggled into a full-screen mode that
occupies the entire content area (closes #401). The URL changes to
/preview/[id] so the view survives a page reload. Pressing the "view
result content" hotkey switches between split and full-screen. Full-screen
preview is also available on the history page.

Preview Panel on History Page

The history page now shows the same interactive preview panel as the search
page (closes #395). Hotkey navigation between history entries works the same
way as on the search page.

Infinite Scroll

Search results now load more pages automatically as you scroll to the bottom
of the list, removing the need to manually page through results (closes #1).

Image Lightbox

Images displayed inside the preview panel can be clicked to open a full-size
lightbox view.

Delete All Results

A new "delete all" action removes every document that matches the current
search query at once, without having to delete results one by one.

Keyboard Shortcut for Result Deletion

A dedicated hotkey deletes the currently focused search result directly from
the keyboard.

Quick Skip Rule

A skip-rule component lets you add a URL to your skip list directly from a
search result, available in both the web app and the browser extension
(closes #380).

robots.txt Support

The crawler and hister index now respect robots.txt by default (closes
#386). A new ignore_robots_txt config option disables this check when
needed.

Document Labeling

Documents can be tagged with custom labels when indexing from the CLI
(--label) or from the browser extension. Labels are stored as metadata
and can be used in search queries (closes #156, #373).

Editable Rules and Aliases

Existing skip/priority rules and search aliases can now be edited in the
web UI instead of having to delete and recreate them (closes #270).

Exact Phrase Matching

Multi-word queries now also attempt an exact phrase match across title and
text, so searching for open source surfaces pages that contain that exact
phrase more prominently (closes #394).

Extractor Templates and Extra Document Creation

Extractors can now supply a custom preview template, giving each content
type its own presentation in the preview panel. Extractors can also produce
additional sub-documents from a single page (used by the Mastodon extractor
to create one document per toot). A template scaffold is included to make
writing new extractors easier.

WebDriver BiDi Crawler Backend

A new bidi crawler backend uses the W3C WebDriver BiDi
protocol to drive an already-running browser over a WebSocket connection.
Unlike the chromedp backend, it does not launch a browser process: you start
the browser yourself (headless or not) and point Hister at it:

# Firefox
firefox --remote-debugging-port 9222

# Chrome / Chromium
chromium --remote-debugging-port=9222
crawler:
  backend: bidi
  backend_options:
    host: '127.0.0.1'
    port: '9222'
    capture_delay: 1.5 # extra seconds to wait after load for JS rendering

Supported by Firefox (≥ 102), Chrome/Chromium (≥ 106), and Edge. Options:
socket (full WebSocket URL), host, port, capture_delay. The crawler
reuses a single BiDi session for all URLs in one hister index run, making
multi-URL indexing significantly more efficient than opening a new browser
session per URL (closes #284).

Enhancements

  • Resizable preview panel: drag the divider in split-screen view to
    adjust the panel width; the chosen width persists across sessions
  • History hotkey navigation: keyboard navigation between history entries
    on the history page
  • Secondary date sorting: when search scores are equal, results are
    sorted by indexed date
  • yt-dlp multi-language subtitles: configure which subtitle languages
    the yt-dlp extractor indexes
  • MCP date filtering: MCP search requests can now be filtered by date
  • Semantic search chunking: punctuation-based boundaries used for
    chunk splitting, improving relevance for sentence-level queries
  • OIDC enhancements: userinfo endpoint is now configurable for
    providers without auto-discovery (#279); password login can be disabled
    when OAuth is the only configured auth method
  • Clearer CLI error messages: client-side HTTP errors now explain the
    problem in plain words and suggest which flag to use (#400)
  • Duplicate rule prevention: the server and UI both reject duplicate
    skip/priority rules and aliases (#399)
  • Deletion error feedback: errors during document deletion are now
    surfaced in the UI
  • Rules/aliases UX: input fields moved above their respective lists
  • Version string in server log: the server start message now includes
    the version number (#372)
  • Silent WebSocket disconnect: closing the browser tab no longer shows
    a connection-error message

Bug Fixes

  • Search terms are now properly escaped before query execution
  • Focused result index computed correctly when priority results are present
  • Phrase queries are only applied when no field-specific terms exist in the
    query
  • /api/delete requests from the browser extension are now accepted
  • text and html fields always included in search results (#374)
  • Language field included in document search results
  • OIDC scopes correctly forwarded to the provider (#371)
  • Auth page is scrollable on small screens (#370)
  • TUI client now passes the access token to the server (#368)
  • Optional peer dependencies no longer excluded on non-Linux platforms (#299)
v0.13.0 New feature
Notable features
  • Semantic vector search with SQLite (`sqlite-vec`) or PostgreSQL (`pgvector`) backends, configurable via `semantic` config section.
  • OAuth 2.0 and OpenID Connect providers can be added as login methods through the new `server.oauth` configuration block.
Full changelog

New Features

Semantic Search

Full vector search support via sentence-embedding models. Documents are chunked
and embedded at index time; search queries are embedded at query time and ranked
by cosine similarity. Two storage backends are supported:

  • SQLite (default, via bundled sqlite-vec) zero extra infrastructure required
  • PostgreSQL with pgvector auto-selected when the database is Postgres

Configure the embedding API endpoint, model, dimensions, and chunking parameters
in the new semantic config section. Semantic search is opt-in and off by default.
Relevance scores are shown alongside results when semantic search is active.

OAuth / SSO Authentication

OAuth 2.0 and OpenID Connect (OIDC) providers can now be configured as login
methods. Add one or more entries to the new server.oauth config section with
client_id, client_secret, configuration_url (for OIDC auto-discovery), or
manual auth_url / token_url, and optional scopes. Multiple providers can be
active at the same time alongside the built-in username/password login.

MCP Server

Hister now exposes a Model Context Protocol
endpoint at /api/mcp, enabling LLM agents and MCP-compatible tools to search
the index directly.

Persistent Crawler State Management

Recursive crawl jobs (hister index -r) are now stored in the database and
survive interruptions. Each job gets a unique ID (auto-generated or set via
--job-id). Pass --job-id <id> without --recursive to resume an
interrupted crawl from exactly where it left off, including original validator
rules and visited-URL counts.

New Extractors

  • Wikipedia extracts the article body and infobox, rewrites relative links, and sanitizes the output
  • GitHub project extracts repository descriptions and README content from GitHub project pages
  • Lobste.rs dedicated extractor for Lobste.rs story and comment pages
  • yt-dlp extracts video metadata (title, description, channel) from video pages via yt-dlp
  • JSON-LD surfaces structured metadata (@type, headline, description) from pages that embed JSON-LD

All extractors now expose a Description() method, and an extractor information
page is available at /extractors in the web UI.

OpenSearch Suggestions

The server now serves an OpenSearch suggestions endpoint (/api/suggest),
allowing browsers to display search-as-you-type completions when Hister is
configured as a search engine.

Enhancements

Crawler Backend for All Index Operations

The --backend flag (and --backend-option) is now available on both
hister index (plain and --recursive) and hister import-browser, allowing
a headless Chrome/Chromium backend for JavaScript-heavy pages without running a
full recursive crawl:

hister index --backend chromedp https://example.com
hister import-browser --backend chromedp --backend-option exec_path=/usr/bin/chromium

Headers and cookies can also be injected per-invocation:

hister index --header "Accept-Language=en" --cookie "session=abc; Domain=example.com" https://example.com

Cookies use standard Set-Cookie format with a required Domain attribute.

CLI Search Improvements

  • --limit N flag caps the number of results returned
  • --fields flag selects which document fields to include in output
  • --html flag includes raw HTML content in the output
  • Paging support added to both CLI search and list-urls
  • list-urls now fetches results from the server by default; --offline connects directly to the index without a running server

Quoted Field Queries

Field-qualified queries now support quoted values, enabling correct deletion and
lookup of URLs that contain spaces (common on Windows file paths):

url:"file:///C:/Users/My Documents/notes.txt"

Preview Panel Polish

  • Preview title is now clickable (opens the result URL)
  • Preview panel maximises available content width
  • JSON-LD metadata surfaced inside the preview panel
  • Dark theme font colors fixed in preview popup

NixOS / Nix Module

  • systemd and launchd hardening applied to the Hister service units
  • New services.hister.environmentFile option for secrets injection
  • openFirewall now requires explicit opt-in
  • services.hister.config renamed to services.hister.settings

Other

  • Executable size reduced ~70 MB by switching to a trimmed lingua-go fork
  • Sensitive content rejection errors surfaced in the browser extension
  • --verbose flag on hister delete lists matched URLs before deleting
  • Priority result deduplication now copies body text from the original result
  • /suggest endpoint protected by auth middleware and Sec-Fetch-Site header check
  • Version information included in the MCP endpoint response
  • Timezone data bundled into the binary for environments without a system tzdata

Bug Fixes

  • File URLs (file://) now handled correctly in the UI for both opening and deletion (#362)
  • Browser extension authentication documentation corrected (#366)
  • URLs no longer lowercased during query building, preventing mismatches on case-sensitive paths
  • History view correctly filtered per-user in multi-user mode (#314)
  • Token authentication middleware now respects NoAuth flag (#348)
  • Documents with no HTML content no longer attempt HTML extraction (#351)
  • Extension no longer resubmits documents after a 406 Not Acceptable response
  • Priority results correctly deduplicated against standard results
  • File indexing fixed on Windows
  • Wide tables no longer overflow the preview panel
  • Score field populated correctly in search responses
  • aws_access_key sensitive content pattern tightened to reduce false positives
  • Home-manager service units correctly gated on host platform in Nix module
v0.12.0 Breaking risk
⚠ Upgrade required
  • Rename any usage of the deprecated `import` command to `import-browser`
  • When migrating from SQLite to PostgreSQL, update `server.database` with a `postgres://` connection string
Breaking changes
  • `import` command renamed to `import-browser`
Notable features
  • Recursive web crawler (`hister index -r`) with Go HTTP and headless Chrome backends, configurable depth, link limits, domain filters, and URL patterns
  • Full PostgreSQL support via `postgres://` connection string in `server.database`
  • Extractor pipeline overhaul: extractors now have explicit states (continue/done), expose a `Preview()` method, added pkg.go.dev and Stack Overflow extractors
Full changelog

New Features

Web Crawler

New hister index -r <url> command crawls sites recursively.
Supports the standard Go HTTP backend and a headless Chrome backend (chromedp).
Configurable depth, link count, allowed/excluded domains, and URL patterns.

PostgreSQL Backend

Full PostgreSQL support as an alternative to SQLite.
Configure via a postgres:// connection string in server.database.

Extractor Pipeline Overhaul

Extractors are now configurable, have explicit states (continue/done), and expose
a Preview() method used by the preview panel. New extractors included:

  • Custom pkg.go.dev extractor for Go documentation pages
  • Basic Stack Overflow extractor

Desktop Readability Panel

Focused search results load automatically in a split-pane reader on the right side
on screens wider than 1280 px. The panel is togglable and its open/closed state persists.

Enhancements

  • HTML sanitizer (bluemonday) applied to all extracted content
  • metadata field added to documents for arbitrary key/value data
  • search input type attribute on search fields for better mobile UX
  • Build commit ID shown in the version string
  • Admin users can create global indexes or indexes on behalf of other users
  • hister index skips already-indexed URLs by default; pass --force to reindex them
  • URL and domain wildcard matching automatically anchors to start and end
  • Table of contents added to the API docs page
  • Document indexed date shown in the preview panel
  • Search query reflected in the browser tab title
  • WebSocket communication optimised to reduce redundant round-trips
  • Automatic redirect on zero results is now optional (configurable)
  • import command renamed to import-browser to free import for index import/export

Bug Fixes

  • Browser history database opened read-only to avoid lock conflicts (#304)
  • History entries now deleted when their associated document is deleted (#303)
  • Crawler user-agent correctly applied after redirect handling (#302)
  • Fixed field-specific alternation parts in query parser (#274)
  • Negated query terms no longer trimmed twice
  • HTML field no longer leaks into search results (#268)
  • Expanded query hint only shown when the expansion is longer than the original query
  • URL changes after HTTP redirects now resolved correctly
  • Crawler no longer stops on HTTP errors
  • Crawler timeout now applied during browser history import (#278)
  • Pinned result titles no longer truncated on narrow screens
  • Dark mode handled correctly in the preview panel
  • Mobile layout no longer introduces unwanted line breaks
v0.11.0 New feature
Notable features
  • Multi-user support
  • Better authentication method for the extension
  • Enhanced delete with query and `--dry` flag
Full changelog

What's Changed

  • Multi-user support.
  • Better authentication method for the extension.
  • Enhanced delete. Queries can be used to delete multiple items. --dry flag.
  • Reindex does not require server stop anymore.
  • Enhanced history view.
  • Extension feedback if the active tab has a matching skip rule.
  • Add pages/domains to skip rules directly from the extension popup.
  • Plenty of bugfixes.

Weekly OSS security release digest.

The CVE patches and breaking changes that affected production tools this week. One email, every Sunday.

No spam, unsubscribe anytime.

About

Stars
1,117
Forks
59
Languages
Go Svelte TypeScript

Community & Support

Beta — feedback welcome: [email protected]