Hister
MCP Search & WebA privacy‑focused local web search engine that indexes the full text of visited pages and supports advanced queries, file indexing, multi‑user hosting, and optional AI semantic search.
Features
- Privacy‑focused: index browsing history locally without remote services
- Full‑text indexing of web page content you have visited
- Advanced query language for precise search results
- Local file knowledge base indexing
- Crawler support (headless browser or traditional crawler) for fast expansion
- Multi‑user hosting and optional AI‑enhanced semantic search
Recent releases
View all 15 releases →- Mastodon Extractor indexes each toot as its own document with `metadata.type:toot`.
- Metadata query filtering via `metadata.key:value` syntax.
- Full‑screen preview mode accessible from search and history pages.
Full changelog
New Features
Mastodon Extractor
A dedicated extractor for Mastodon detects Mastodon pages and indexes each
toot as its own separate document rather than one big blob of text. Works
with any Mastodon instance without configuration. Every toot gets a
metadata.type:toot field so you can filter toots in search queries.
Combined with a search alias (!toot → metadata.type:toot) this makes
finding past toots fast and convenient.
Metadata Query Filtering
Documents can now be filtered by arbitrary metadata fields using the
metadata.key:value query syntax. Extractors (including the newMastodon
extractor) populate these fields at index time.
Full Screen Preview
The split-screen preview can now be toggled into a full-screen mode that
occupies the entire content area (closes #401). The URL changes to
/preview/[id] so the view survives a page reload. Pressing the "view
result content" hotkey switches between split and full-screen. Full-screen
preview is also available on the history page.
Preview Panel on History Page
The history page now shows the same interactive preview panel as the search
page (closes #395). Hotkey navigation between history entries works the same
way as on the search page.
Infinite Scroll
Search results now load more pages automatically as you scroll to the bottom
of the list, removing the need to manually page through results (closes #1).
Image Lightbox
Images displayed inside the preview panel can be clicked to open a full-size
lightbox view.
Delete All Results
A new "delete all" action removes every document that matches the current
search query at once, without having to delete results one by one.
Keyboard Shortcut for Result Deletion
A dedicated hotkey deletes the currently focused search result directly from
the keyboard.
Quick Skip Rule
A skip-rule component lets you add a URL to your skip list directly from a
search result, available in both the web app and the browser extension
(closes #380).
robots.txt Support
The crawler and hister index now respect robots.txt by default (closes
#386). A new ignore_robots_txt config option disables this check when
needed.
Document Labeling
Documents can be tagged with custom labels when indexing from the CLI
(--label) or from the browser extension. Labels are stored as metadata
and can be used in search queries (closes #156, #373).
Editable Rules and Aliases
Existing skip/priority rules and search aliases can now be edited in the
web UI instead of having to delete and recreate them (closes #270).
Exact Phrase Matching
Multi-word queries now also attempt an exact phrase match across title and
text, so searching for open source surfaces pages that contain that exact
phrase more prominently (closes #394).
Extractor Templates and Extra Document Creation
Extractors can now supply a custom preview template, giving each content
type its own presentation in the preview panel. Extractors can also produce
additional sub-documents from a single page (used by the Mastodon extractor
to create one document per toot). A template scaffold is included to make
writing new extractors easier.
WebDriver BiDi Crawler Backend
A new bidi crawler backend uses the W3C WebDriver BiDi
protocol to drive an already-running browser over a WebSocket connection.
Unlike the chromedp backend, it does not launch a browser process: you start
the browser yourself (headless or not) and point Hister at it:
# Firefox
firefox --remote-debugging-port 9222
# Chrome / Chromium
chromium --remote-debugging-port=9222
crawler:
backend: bidi
backend_options:
host: '127.0.0.1'
port: '9222'
capture_delay: 1.5 # extra seconds to wait after load for JS rendering
Supported by Firefox (≥ 102), Chrome/Chromium (≥ 106), and Edge. Options:
socket (full WebSocket URL), host, port, capture_delay. The crawler
reuses a single BiDi session for all URLs in one hister index run, making
multi-URL indexing significantly more efficient than opening a new browser
session per URL (closes #284).
Enhancements
- Resizable preview panel: drag the divider in split-screen view to
adjust the panel width; the chosen width persists across sessions - History hotkey navigation: keyboard navigation between history entries
on the history page - Secondary date sorting: when search scores are equal, results are
sorted by indexed date - yt-dlp multi-language subtitles: configure which subtitle languages
the yt-dlp extractor indexes - MCP date filtering: MCP search requests can now be filtered by date
- Semantic search chunking: punctuation-based boundaries used for
chunk splitting, improving relevance for sentence-level queries - OIDC enhancements: userinfo endpoint is now configurable for
providers without auto-discovery (#279); password login can be disabled
when OAuth is the only configured auth method - Clearer CLI error messages: client-side HTTP errors now explain the
problem in plain words and suggest which flag to use (#400) - Duplicate rule prevention: the server and UI both reject duplicate
skip/priority rules and aliases (#399) - Deletion error feedback: errors during document deletion are now
surfaced in the UI - Rules/aliases UX: input fields moved above their respective lists
- Version string in server log: the server start message now includes
the version number (#372) - Silent WebSocket disconnect: closing the browser tab no longer shows
a connection-error message
Bug Fixes
- Search terms are now properly escaped before query execution
- Focused result index computed correctly when priority results are present
- Phrase queries are only applied when no field-specific terms exist in the
query /api/deleterequests from the browser extension are now acceptedtextandhtmlfields always included in search results (#374)- Language field included in document search results
- OIDC scopes correctly forwarded to the provider (#371)
- Auth page is scrollable on small screens (#370)
- TUI client now passes the access token to the server (#368)
- Optional peer dependencies no longer excluded on non-Linux platforms (#299)
- Semantic vector search with SQLite (`sqlite-vec`) or PostgreSQL (`pgvector`) backends, configurable via `semantic` config section.
- OAuth 2.0 and OpenID Connect providers can be added as login methods through the new `server.oauth` configuration block.
Full changelog
New Features
Semantic Search
Full vector search support via sentence-embedding models. Documents are chunked
and embedded at index time; search queries are embedded at query time and ranked
by cosine similarity. Two storage backends are supported:
- SQLite (default, via bundled
sqlite-vec) zero extra infrastructure required - PostgreSQL with
pgvectorauto-selected when the database is Postgres
Configure the embedding API endpoint, model, dimensions, and chunking parameters
in the new semantic config section. Semantic search is opt-in and off by default.
Relevance scores are shown alongside results when semantic search is active.
OAuth / SSO Authentication
OAuth 2.0 and OpenID Connect (OIDC) providers can now be configured as login
methods. Add one or more entries to the new server.oauth config section with
client_id, client_secret, configuration_url (for OIDC auto-discovery), or
manual auth_url / token_url, and optional scopes. Multiple providers can be
active at the same time alongside the built-in username/password login.
MCP Server
Hister now exposes a Model Context Protocol
endpoint at /api/mcp, enabling LLM agents and MCP-compatible tools to search
the index directly.
Persistent Crawler State Management
Recursive crawl jobs (hister index -r) are now stored in the database and
survive interruptions. Each job gets a unique ID (auto-generated or set via
--job-id). Pass --job-id <id> without --recursive to resume an
interrupted crawl from exactly where it left off, including original validator
rules and visited-URL counts.
New Extractors
- Wikipedia extracts the article body and infobox, rewrites relative links, and sanitizes the output
- GitHub project extracts repository descriptions and README content from GitHub project pages
- Lobste.rs dedicated extractor for Lobste.rs story and comment pages
- yt-dlp extracts video metadata (title, description, channel) from video pages via yt-dlp
- JSON-LD surfaces structured metadata (
@type,headline, description) from pages that embed JSON-LD
All extractors now expose a Description() method, and an extractor information
page is available at /extractors in the web UI.
OpenSearch Suggestions
The server now serves an OpenSearch suggestions endpoint (/api/suggest),
allowing browsers to display search-as-you-type completions when Hister is
configured as a search engine.
Enhancements
Crawler Backend for All Index Operations
The --backend flag (and --backend-option) is now available on both
hister index (plain and --recursive) and hister import-browser, allowing
a headless Chrome/Chromium backend for JavaScript-heavy pages without running a
full recursive crawl:
hister index --backend chromedp https://example.com
hister import-browser --backend chromedp --backend-option exec_path=/usr/bin/chromium
Headers and cookies can also be injected per-invocation:
hister index --header "Accept-Language=en" --cookie "session=abc; Domain=example.com" https://example.com
Cookies use standard Set-Cookie format with a required Domain attribute.
CLI Search Improvements
--limit Nflag caps the number of results returned--fieldsflag selects which document fields to include in output--htmlflag includes raw HTML content in the output- Paging support added to both CLI search and
list-urls list-urlsnow fetches results from the server by default;--offlineconnects directly to the index without a running server
Quoted Field Queries
Field-qualified queries now support quoted values, enabling correct deletion and
lookup of URLs that contain spaces (common on Windows file paths):
url:"file:///C:/Users/My Documents/notes.txt"
Preview Panel Polish
- Preview title is now clickable (opens the result URL)
- Preview panel maximises available content width
- JSON-LD metadata surfaced inside the preview panel
- Dark theme font colors fixed in preview popup
NixOS / Nix Module
systemdandlaunchdhardening applied to the Hister service units- New
services.hister.environmentFileoption for secrets injection openFirewallnow requires explicit opt-inservices.hister.configrenamed toservices.hister.settings
Other
- Executable size reduced ~70 MB by switching to a trimmed
lingua-gofork - Sensitive content rejection errors surfaced in the browser extension
--verboseflag onhister deletelists matched URLs before deleting- Priority result deduplication now copies body text from the original result
/suggestendpoint protected by auth middleware andSec-Fetch-Siteheader check- Version information included in the MCP endpoint response
- Timezone data bundled into the binary for environments without a system
tzdata
Bug Fixes
- File URLs (
file://) now handled correctly in the UI for both opening and deletion (#362) - Browser extension authentication documentation corrected (#366)
- URLs no longer lowercased during query building, preventing mismatches on case-sensitive paths
- History view correctly filtered per-user in multi-user mode (#314)
- Token authentication middleware now respects
NoAuthflag (#348) - Documents with no HTML content no longer attempt HTML extraction (#351)
- Extension no longer resubmits documents after a
406 Not Acceptableresponse - Priority results correctly deduplicated against standard results
- File indexing fixed on Windows
- Wide tables no longer overflow the preview panel
- Score field populated correctly in search responses
aws_access_keysensitive content pattern tightened to reduce false positives- Home-manager service units correctly gated on host platform in Nix module
- Rename any usage of the deprecated `import` command to `import-browser`
- When migrating from SQLite to PostgreSQL, update `server.database` with a `postgres://` connection string
- `import` command renamed to `import-browser`
- Recursive web crawler (`hister index -r`) with Go HTTP and headless Chrome backends, configurable depth, link limits, domain filters, and URL patterns
- Full PostgreSQL support via `postgres://` connection string in `server.database`
- Extractor pipeline overhaul: extractors now have explicit states (continue/done), expose a `Preview()` method, added pkg.go.dev and Stack Overflow extractors
Full changelog
New Features
Web Crawler
New hister index -r <url> command crawls sites recursively.
Supports the standard Go HTTP backend and a headless Chrome backend (chromedp).
Configurable depth, link count, allowed/excluded domains, and URL patterns.
PostgreSQL Backend
Full PostgreSQL support as an alternative to SQLite.
Configure via a postgres:// connection string in server.database.
Extractor Pipeline Overhaul
Extractors are now configurable, have explicit states (continue/done), and expose
a Preview() method used by the preview panel. New extractors included:
- Custom
pkg.go.devextractor for Go documentation pages - Basic Stack Overflow extractor
Desktop Readability Panel
Focused search results load automatically in a split-pane reader on the right side
on screens wider than 1280 px. The panel is togglable and its open/closed state persists.
Enhancements
- HTML sanitizer (bluemonday) applied to all extracted content
metadatafield added to documents for arbitrary key/value datasearchinput type attribute on search fields for better mobile UX- Build commit ID shown in the version string
- Admin users can create global indexes or indexes on behalf of other users
hister indexskips already-indexed URLs by default; pass--forceto reindex them- URL and domain wildcard matching automatically anchors to start and end
- Table of contents added to the API docs page
- Document indexed date shown in the preview panel
- Search query reflected in the browser tab title
- WebSocket communication optimised to reduce redundant round-trips
- Automatic redirect on zero results is now optional (configurable)
importcommand renamed toimport-browserto freeimportfor index import/export
Bug Fixes
- Browser history database opened read-only to avoid lock conflicts (#304)
- History entries now deleted when their associated document is deleted (#303)
- Crawler user-agent correctly applied after redirect handling (#302)
- Fixed field-specific alternation parts in query parser (#274)
- Negated query terms no longer trimmed twice
- HTML field no longer leaks into search results (#268)
- Expanded query hint only shown when the expansion is longer than the original query
- URL changes after HTTP redirects now resolved correctly
- Crawler no longer stops on HTTP errors
- Crawler timeout now applied during browser history import (#278)
- Pinned result titles no longer truncated on narrow screens
- Dark mode handled correctly in the preview panel
- Mobile layout no longer introduces unwanted line breaks
- Multi-user support
- Better authentication method for the extension
- Enhanced delete with query and `--dry` flag
Full changelog
What's Changed
- Multi-user support.
- Better authentication method for the extension.
- Enhanced delete. Queries can be used to delete multiple items.
--dryflag. - Reindex does not require server stop anymore.
- Enhanced history view.
- Extension feedback if the active tab has a matching skip rule.
- Add pages/domains to skip rules directly from the extension popup.
- Plenty of bugfixes.
Weekly OSS security release digest.
The CVE patches and breaking changes that affected production tools this week. One email, every Sunday.
No spam, unsubscribe anytime.