cameronrye/openzim-mcp

v2.0.0b3 Feature

This release adds 3 notable features for engineering teams evaluating rollout.

Published 2mo MCP Data & Storage

✓ No known CVEs patched

Topics

kiwix mcp mcp-server openzim zim

ReleasePort's take

Light signal

editorial:auto 2mo

This release fixes several parsing and handling bugs related to trailing/politeness tokens across multiple components, adds a new invariant test for regex modal class sharing, performs refactors to unify politeness processing, and confirms no performance or security regressions.

Why it matters: Patch now if your code relies on correct handling of multi‑token possessives or trailing politeness; the fixes resolve incorrect query stripping and decomposition issues. No migration deadline is imposed, but testing in dev is recommended to verify unchanged behavior.

Summary

AI summary

A mixed release: four pass-1 defect fixes, one sibling fix each from pass-2 and pass-3, one deferred out-of-scope item, and notes on how the sweep methodology evolved.

Changes in this release

Type	Severity	Summary	Source	Confidence	CVE
Security	Medium	No security vulnerabilities introduced; all scans passed.	llm_adapter@2026-05-22	low	—
Feature	Medium	New invariant test ensures leading and trailing politeness regexes share modal class.	llm_adapter@2026-05-22	low	—
Performance	Medium	No performance regression reported; benchmarks unchanged across CI runs.	llm_adapter@2026-05-22	low	—
Bugfix	Medium	Trailing modal politeness ≥2 words now correctly stripped from user queries.	llm_adapter@2026-05-22	high	—
Bugfix	Medium	Reranker telemetry comment now emitted on no-results searches.	llm_adapter@2026-05-22	high	—
Bugfix	Medium	Compact filtered search now retains the “filtered” qualifier in results.	llm_adapter@2026-05-22	high	—
Bugfix	Medium	Possessive topic handling now retries decomposition for multi‑token possessives.	llm_adapter@2026-05-22	low	—
Bugfix	Medium	Possessive multi-token topics now retry decomposition in _handle_tell_me_about.	granite4.1:30b@2026-05-22-audit	low	—
Refactor	Medium	Universal trailing‑politeness regex now shares modal class with leading counterpart.	llm_adapter@2026-05-22	low	—
Refactor	Medium	Chained‑intent guidance now strips trailing politeness from both halves before rendering.	llm_adapter@2026-05-22	low	—

Full changelog

Post-b2 sweep packaged from PR #167 (commits 45de8da → cc26b3d).
Sweep shape: 4 → 1 → 1 across pass-1, pass-2, pass-3. All eight b2
user-facing fix families verified clean on live MCP first; sweep then
probed the adversarial shapes the b2 fixes unlocked. Both pass-2 and
pass-3 surfaced single narrow-scope siblings of pass-1 fixes —
consistent with the "narrow-scope sibling" pattern (now 8 sweeps
strong) and the "fix unlocks new paths" pattern (now 9 sweeps strong).

Pass-1 defects (4, `45de8da`)

D1 — trailing modal politeness ≥2 words falls through. The
trailing-politeness regex in `_extract_tell_me_about` only matched
`please` / `to me` / `for me`; the LEADING regex (line ~374)
recognised the modal class (`could`/`can`/`would`/`will` + `you`) but
the trailing twin was missing. Live: `tell me about Tokyo if you would`
→ `Would` (verb stub); `... if you could` → `Could`; `... would you`
→ `Would_You` disambig. Fix: add a trailing pattern
symmetric to the leading one (both branches require a `you` so a
bare trailing modal verb in real article titles isn't stripped).
D2 — reranker telemetry comment suppressed on no-results. The
b1 D-1 in-band telemetry contract promised a reranker comment
on every multi-token search. The `_handle_search` compact path
early-returned on `total == 0` BEFORE reaching
`_maybe_rerank_compact`, so neither `_RERANKER_SKIPPED_NO_RESULTS`
nor `_RERANKER_SKIPPED_NOT_INSTALLED` bumped and the envelope
writer skipped the comment. Live: `search for asdfqwerzxcv nonexistent`
→ no reranker comment. Fix: invoke
`_maybe_rerank_compact` on the empty payload before the bail
(no-op aside from the counter bump; the rerank singleton is
cached).
D3 — Rule 2 + multi-token possessive picks wrong token. Live:
`tell me about Photosythesis's reproduction` → `Reproduction`
article (expected `Photosynthesis`). Rule 2's affix retry
correctly fires (`Photosythesis's` → `Photosynthesis's`), but
the b1 P1-D5 fix unlocked the path: pre-fix returned
`No search results found`, post-fix returns a SILENT WRONG ANSWER.
Root cause: Rule 4's `_POSSESSIVE_RE` is `^...$`-anchored and
runs against the FULL query at parse time; the verb prefix
prevents the match. Fix: in `_handle_tell_me_about`, when no
decomposition hint was attached AND the topic carries an
apostrophe-s followed by another token, retry
`_decompose_x_of_y` on the bare topic. Scope narrowed to
the possessive shape ONLY (NOT `X of Y`) to avoid regressing
non-canonical X-of-Y queries.
D4 — compact filtered search drops "filtered" qualifier.
Live: `search Berlin in namespace C` → `Found 3 matches for "Berlin"`
(legacy non-compact path emits
`Found N filtered matches for "X"<filter_text>`). Both paths shared
`_format_search_text`; pre-fix the formatter had no filter
awareness. Fix: add optional `filter_text` kwarg to
`_format_search_text` (mirrors `display_query`); compact filtered
call site threads through `_format_filter_text` helper. Symmetric
treatment for filtered no-results.
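
As an illustration of the D4 fix shape, here is a minimal sketch of a formatter that takes an optional `filter_text` keyword. It is not the project's code: `format_search_text` is a hypothetical stand-in for `_format_search_text`, and both the no-results wording and the `" (namespace: C)"` filter string are assumptions made for the example.

```python
from typing import Optional


def format_search_text(
    total: int, query: str, filter_text: Optional[str] = None
) -> str:
    """Hypothetical stand-in for _format_search_text (illustration only)."""
    # Unfiltered callers pass nothing and keep the original wording.
    qualifier = "filtered " if filter_text else ""
    suffix = filter_text or ""
    if total == 0:
        # Assumed wording for the symmetric no-results case.
        return f'No {qualifier}matches found for "{query}"{suffix}'
    noun = "match" if total == 1 else "matches"
    return f'Found {total} {qualifier}{noun} for "{query}"{suffix}'
```

Under these assumptions, `format_search_text(3, "Berlin")` keeps the unfiltered wording, while passing `filter_text=" (namespace: C)"` adds both the qualifier and the filter suffix. Making the keyword optional is what lets the other call sites stay unchanged.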

Pass-2 sibling (1, `ed674b5`)

D1 universal-layer mirror. Pass-1 added the modal-politeness
strip inside `_extract_tell_me_about` only, but the universal
`_TRAILING_POLITENESS_RE` (called by `_strip_trailing_politeness`
at `parse_intent` line 1048) was added by the post-a20 PD2-1
sweep specifically so every extractor sees the cleaned query.
Every NON-`tell_me_about` intent kept leaking the modal class:
`search for biology if you would` → `query="biology if you would"`;
`find article titled Berlin if you would` → looks up
`Berlin if you would` (not found). Fix: lift the modal class into
`_TRAILING_POLITENESS_RE`. Pass-1 extractor-level strip kept as
defense-in-depth. New invariant pinned:
`TestD1RegexSync.test_leading_and_trailing_share_modal_class`
(leading + trailing politeness regexes must share the modal class).
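
One way to picture the shared modal class and its invariant is the sketch below. It assumes both regexes are built from a single fragment; the names `MODAL_CLASS`, `LEADING_POLITENESS_RE`, `TRAILING_POLITENESS_RE` and `strip_trailing_politeness` are hypothetical simplifications, and the project's real patterns are likely broader.

```python
import re

# Hypothetical single source of truth for the modal verbs.
MODAL_CLASS = r"(?:could|can|would|will)"

# Leading form, e.g. "could you tell me about X".
LEADING_POLITENESS_RE = re.compile(
    rf"^\s*(?:please\s+)?{MODAL_CLASS}\s+you\s+(?:please\s+)?", re.IGNORECASE
)

# Trailing form, e.g. "... if you would" / "... would you" / "... please".
# Both modal branches require "you", so a bare trailing modal is kept.
TRAILING_POLITENESS_RE = re.compile(
    rf"[\s,]+(?:please|to\s+me|for\s+me"
    rf"|if\s+you\s+{MODAL_CLASS}|{MODAL_CLASS}\s+you)"
    r"[\s.!?]*$",
    re.IGNORECASE,
)


def strip_trailing_politeness(query: str) -> str:
    """Peel trailing politeness until nothing more matches."""
    previous = None
    while previous != query:
        previous = query
        query = TRAILING_POLITENESS_RE.sub("", query)
    return query.strip()


def test_leading_and_trailing_share_modal_class() -> None:
    # Invariant in the spirit of TestD1RegexSync: neither side may drift.
    assert MODAL_CLASS in LEADING_POLITENESS_RE.pattern
    assert MODAL_CLASS in TRAILING_POLITENESS_RE.pattern
```

With these simplified patterns, `strip_trailing_politeness("search for biology if you would")` yields `search for biology`, while a title-like input such as `I Will` is left alone because no `you` follows or precedes the modal.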

Pass-3 sibling (1, `cc26b3d`)

Chained-intent trailing-politeness leak.
`_chained_intent_guidance` runs UPSTREAM of `parse_intent` on the
raw user query. The post-a24 P1-D6 sweep mirrored the param-leak
strip there; the equivalent mirror of `_strip_trailing_politeness`
was never added. Pre-fix every trailing-politeness token (the
full set, including the pass-2 modal class) leaked into chain
rejection bullets: `tell me about Tokyo if you would then list namespaces`
produced a rejection whose left bullet read
`tell me about Tokyo if you would` verbatim, modal politeness and
all. The caller would copy the suggested left half back,
re-introducing the politeness on every iteration. Same structural
sibling pattern as the post-a24 P1-D6 param-leak version. Fix:
apply `_strip_trailing_politeness` to BOTH chain halves after the
existing connector / punct trim loop, before bullets render.
Per-half rather than full-query because the politeness can appear
inside the chain (not just at the very end). Structurally safe:
`_CHAINED_OPERATION_PREFIX_RE` checks the LEADING op verb, which
the trailing strip never touches.
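
The per-half strip can be sketched as follows. This is a simplified illustration, not the project's `_chained_intent_guidance`: the connector regex, the politeness regex and the `chained_halves` helper are all hypothetical and cover only the shapes mentioned above.

```python
import re
from typing import Optional, Tuple

# Simplified, hypothetical stand-ins for the project's regexes.
_TRAILING_POLITENESS = re.compile(
    r"[\s,]+(?:please|if\s+you\s+(?:could|can|would|will)"
    r"|(?:could|can|would|will)\s+you)[\s.!?]*$",
    re.IGNORECASE,
)
_CONNECTOR = re.compile(r"\s+(?:and\s+then|then)\s+", re.IGNORECASE)


def strip_trailing_politeness(text: str) -> str:
    return _TRAILING_POLITENESS.sub("", text).strip()


def chained_halves(raw_query: str) -> Optional[Tuple[str, str]]:
    """Split a chained query and clean each half before it is rendered."""
    parts = _CONNECTOR.split(raw_query, maxsplit=1)
    if len(parts) != 2:
        return None  # not a chained query in this simplified model
    # Trim stray punctuation first, then strip politeness per half,
    # because politeness can sit in the middle of the chain.
    left, right = (part.strip(" ,;.") for part in parts)
    return strip_trailing_politeness(left), strip_trailing_politeness(right)
```

Stripping only the full query would miss `if you would` in the left half of `tell me about Tokyo if you would then list namespaces`, since it is not at the end of the string. Stripping each half after the split handles both positions.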

Out of scope (deferred design call)

D5 — `death of stalin` → `Death_and_state_funeral_of_Joseph_Stalin`
instead of the 2017 Iannucci film. P1-D3 probe-gate correctly
suppressed the `Stalin` disambig misroute; title-probe picked a
different canonical X-related title rather than the film
(canonical is `The_Death_of_Stalin`). Picking the film would
require either a prefix-widening probe (`The <query>`), which risks
unwanted side effects on arbitrary bare topics, or a popularity
ranker. Both are design choices beyond the b2 sweep scope.

D2 / D3 / D4 sibling audits clean

D2: _handle_filtered_search always routes through
_maybe_rerank_compact; _handle_search_all uses its own
rerank apply that bumps a counter on every path. _handle_search
was the only early-return gap.
D3: _handle_tell_me_about is the only handler that
auto-fetches a single article based on the extracted topic.
Other intents take the topic literally; synthesize uses RAG-style
passage retrieval where decomposition would lose the attribute
context (pre-existing design out of scope).
D4: _format_search_text has three call sites — only the
compact filtered one needed filter_text.
search_with_filters_with_canonical_splice (non-compact filtered)
already uses _format_filtered_response which natively emits
the qualifier.
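
The D2 audit point (every search path must reach the rerank hook, even with zero results) can be illustrated with a small sketch. The counter names, payload shape and function names here are hypothetical; only the ordering of the hook call before the empty-result bail reflects the fix described above.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical telemetry counters, for illustration only.
counters: Counter = Counter()


def maybe_rerank_compact(payload: Dict) -> Dict:
    """Stand-in for _maybe_rerank_compact: always records what it did."""
    if not payload["results"]:
        counters["reranker_skipped_no_results"] += 1
        return payload
    counters["reranker_applied"] += 1
    # Placeholder ordering; a real reranker would score the results.
    payload["results"] = sorted(payload["results"])
    return payload


def handle_search(query: str, results: List[str]) -> str:
    payload = {"query": query, "results": results, "total": len(results)}
    # Call the hook BEFORE the empty-result bail so a counter is bumped
    # on every path; on an empty payload it is otherwise a no-op.
    payload = maybe_rerank_compact(payload)
    if payload["total"] == 0:
        return f'No search results found for "{query}"'
    return f'Found {payload["total"]} matches for "{query}"'
```

A regression guard for this shape would assert that a no-results search still increments a skip counter, which is what lets the envelope writer emit its comment.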

Cross-feature composition verified

`search for Photosythesis's reproduction in namespace C if you would`
→ universal trailing strip peels `if you would` → intent
= `filtered_search` → `_maybe_rerank_compact` bumps counter →
`_format_search_text` renders with `filter_text`. D1+D2+D4
compose.
`tell me about Photosythesis's reproduction if you would` →
universal strip peels `if you would` → intent = `tell_me_about`
→ D3 retry fires on possessive topic → `photosynthesis`. D1+D3
compose.
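
The D3 retry that the second composition relies on can be approximated with the sketch below. It is a deliberately simplified assumption: the real fix retries `_decompose_x_of_y` on the bare topic, whereas here a hypothetical regex extracts the possessor directly. Affix correction of the misspelling (Rule 2) is out of scope for the example.

```python
import re
from typing import Optional

# Hypothetical pattern: an apostrophe-s head followed by another token.
_POSSESSIVE_TOPIC = re.compile(r"^(?P<head>.+?)(?:'s|’s)\s+(?P<attribute>\S.*)$")


def possessive_head(topic: str) -> Optional[str]:
    """Return the possessor of "X's Y", or None for any other shape."""
    match = _POSSESSIVE_TOPIC.match(topic.strip())
    return match.group("head") if match else None


def resolve_topic(topic: str, decomposition_hint: Optional[str] = None) -> str:
    # Retry only when the parser attached no hint AND the topic is
    # possessive; "X of Y" topics are deliberately left untouched.
    if decomposition_hint is None:
        head = possessive_head(topic)
        if head is not None:
            return head
    return decomposition_hint or topic
```

Keeping the retry limited to the possessive shape mirrors the narrowed scope of the fix: a topic such as `death of stalin` passes through unchanged, while `Photosynthesis's reproduction` resolves to its possessor.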

Tests

40 new tests in `tests/test_post_b2_beta_fixes.py` across 10
classes, including `TestD1TrailingModalPoliteness`,
`TestD1ParseIntentEndToEnd`, `TestD1SiblingUniversalTrailingModal`,
`TestD1RegexSync`, `TestD1Pass3ChainedIntentPolitenessLeak`,
`TestD2RerankerCounterOnNoResults`,
`TestD3PossessiveDecompositionRetry`,
`TestD4FilteredSearchEchoQualifier` and `TestRegressionGuards`.
Full suite: 2360 passing, 54 skipped, 38 deselected. mypy
clean across 52 source files. black + flake8 clean. CI checks
all green (CodeQL, SonarCloud, bandit, security scanning,
6 OS × Python matrix, both [reranker]-extra suites,
performance benchmarks).

Methodology evolution

"Narrow-scope sibling" pattern — now 8 sweeps strong. Both
pass-2 and pass-3 surfaced a single sibling of pass-1's D1
fix-family: pass-2 caught the universal-layer mirror (modal
class missing from _TRAILING_POLITENESS_RE); pass-3 caught the
upstream-chained-guidance mirror (trailing-politeness strip
missing from _chained_intent_guidance). Both are STRUCTURAL
mirrors of fixes already shipped — pass-2's sibling mirrors the
post-a20 PD2-1 universal-strip extension, pass-3's sibling
mirrors the post-a24 P1-D6 param-leak strip placement.
"Fix unlocks new paths" — 9th consecutive sweep. D3 is
particularly nasty because the failure mode changed from
explicit No search results found (pre-b1 P1-D5) to silent
wrong answer (post-b1 P1-D5 affix retry → post-b2 D3 retry).
New invariants pinned via canonical-source tests — two
feature-level guards: (a) leading + trailing politeness regexes
must share the modal class; (b) the no-results early-return path
in _handle_search must route through _maybe_rerank_compact.
These pin the "added X to one side, forgot the other side"
drift class that drove both pass-2 and pass-3 defects.

About cameronrye/openzim-mcp

Modern, secure MCP server for accessing ZIM format knowledge bases offline. Enables AI models to search and navigate Wikipedia, educational content, and other compressed knowledge archives with smart retrieval, caching, and comprehensive API.

Related context

Earlier breaking changes

v2.0.0a15 `_attribute_sections` falls back to the first section when no section brackets the located passage.
v2.0.0a13 canonical‑splice gate tightened to require exact path equality, fixing H2/H3 surface end‑to‑end behavior across all shapes.
v2.0.0a11 Exposed `content_offset` as top-level `zim_query` parameter, validated >=0, threaded through options.
v2.0.0a10 `get article M/<key>` now returns ZIM metadata entry rather than aliased C-namespace article body.
v2.0.0a10 `metadata for <file>` returns concise metadata strings instead of full article bodies for new-scheme archives.