cameronrye/openzim-mcp

v2.0.0b3 Feature

This release adds 3 notable features for engineering teams evaluating rollout.

Published 13d MCP Data & Storage

✓ No known CVEs patched in this version

Topics

kiwix mcp mcp-server openzim zim

ReleasePort's take

Light signal
editorial:auto 13d

This release fixes several parsing and handling bugs related to trailing/politeness tokens across multiple components, adds a new invariant test for regex modal class sharing, performs refactors to unify politeness processing, and confirms no performance or security regressions.

Why it matters: Patch now if your code relies on correct handling of multi‑token possessives or trailing politeness; the fixes resolve incorrect query stripping and decomposition issues. No migration deadline is imposed, but testing in dev is recommended to verify unchanged behavior.

Summary

AI summary

A mixed release: four pass-1 defect fixes, one pass-2 and one pass-3 sibling fix, a deferred out-of-scope design call, and methodology notes.

Changes in this release

Security Medium

No security vulnerabilities introduced; all scans passed.

Source: llm_adapter@2026-05-22

Confidence: low

Feature Medium

New invariant test ensures leading and trailing politeness regexes share modal class.

Source: llm_adapter@2026-05-22

Confidence: low

Performance Medium

No performance regression reported; benchmarks unchanged across CI runs.

Source: llm_adapter@2026-05-22

Confidence: low

Bugfix Medium

Trailing modal politeness ≥2 words now correctly stripped from user queries.

Source: llm_adapter@2026-05-22

Confidence: high

Bugfix Medium

Reranker telemetry comment now emitted on no-results searches.

Source: llm_adapter@2026-05-22

Confidence: high

Bugfix Medium

Compact filtered search now retains the “filtered” qualifier in results.

Source: llm_adapter@2026-05-22

Confidence: high

Bugfix Medium

Possessive topic handling now retries decomposition for multi‑token possessives.

Source: llm_adapter@2026-05-22

Confidence: low

Bugfix Medium

Possessive multi-token topics now retry decomposition in _handle_tell_me_about.

Source: granite4.1:30b@2026-05-22-audit

Confidence: low

Refactor Medium

Universal trailing‑politeness regex now shares modal class with leading counterpart.

Source: llm_adapter@2026-05-22

Confidence: low

Refactor Medium

Chained‑intent guidance now strips trailing politeness from both halves before rendering.

Source: llm_adapter@2026-05-22

Confidence: low

Full changelog

Post-b2 sweep packaged from PR #167 (commits 45de8da..cc26b3d).
Sweep shape: 4 → 1 → 1 across pass-1, pass-2, pass-3. All eight b2
user-facing fix families verified clean on live MCP first; sweep then
probed the adversarial shapes the b2 fixes unlocked. Both pass-2 and
pass-3 surfaced single narrow-scope siblings of pass-1 fixes —
consistent with the "narrow-scope sibling" pattern (now 8 sweeps
strong) and the "fix unlocks new paths" pattern (now 9 sweeps strong).

Pass-1 defects (4, 45de8da)

  • D1 — trailing modal politeness ≥2 words falls through. The
    trailing-politeness regex in _extract_tell_me_about only matched
    please / to me / for me; the LEADING regex (line ~374)
    recognised the modal class (could/can/would/will + you) but
    the trailing twin was missing. Live: tell me about Tokyo if you would →
    Would (verb stub); ... if you could → Could; ... would you → Would_You
    disambig. Fix: add a trailing pattern
    symmetric to the leading one (both branches require a you so a
    bare trailing modal verb in real article titles isn't stripped).
  • D2 — reranker telemetry comment suppressed on no-results. The
    b1 D-1 in-band telemetry contract promised <!-- reranker=<state> -->
    on every multi-token search. _handle_search compact path
    early-returned on total == 0 BEFORE reaching
    _maybe_rerank_compact, so neither _RERANKER_SKIPPED_NO_RESULTS
    nor _RERANKER_SKIPPED_NOT_INSTALLED bumped and the envelope
    writer skipped the comment. Live: search for asdfqwerzxcv nonexistent → no reranker comment. Fix: invoke
    _maybe_rerank_compact on the empty payload before the bail
    (no-op aside from the counter bump; the rerank singleton is
    cached).
  • D3 — Rule 2 + multi-token possessive picks wrong token. Live:
    tell me about Photosythesis's reproduction → Reproduction
    article (expected Photosynthesis). Rule 2's affix retry
    correctly fires (Photosythesis's → Photosynthesis's), but
    the b1 P1-D5 fix unlocked the path: pre-fix returned No search
    results found, post-fix returns a SILENT WRONG ANSWER.
    Root cause: Rule 4's _POSSESSIVE_RE is ^...$-anchored and
    runs against the FULL query at parse time; the verb prefix
    prevents the match. Fix: in _handle_tell_me_about, when no
    decomposition hint was attached AND the topic carries an
    apostrophe-s followed by another token, retry
    _decompose_x_of_y on the bare topic. Scope narrowed to
    the possessive shape ONLY (NOT X of Y) to avoid regressing
    non-canonical X-of-Y queries.
  • D4 — compact filtered search drops "filtered" qualifier.
    Live: search Berlin in namespace C → Found 3 matches for "Berlin"
    (legacy non-compact path emits Found N filtered matches for
    "X"<filter_text>). Both paths shared
    _format_search_text; pre-fix the formatter had no filter
    awareness. Fix: add optional filter_text kwarg to
    _format_search_text (mirrors display_query); compact filtered
    call site threads through _format_filter_text helper. Symmetric
    treatment for filtered no-results.
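
The D1 fix above can be sketched as follows. This is a minimal illustration, not the project's actual regex: the names MODAL, TRAILING_MODAL_RE and strip_trailing_modal are hypothetical, and the only facts taken from the changelog are the modal class (could/can/would/will) and the rule that both branches require a you.

```python
import re

# Hypothetical modal class, taken from the changelog's description of
# the leading regex (could/can/would/will + you).
MODAL = r"(?:could|can|would|will)"

# Assumed trailing shapes: "if you would" and "would you". Both branches
# require "you", so a bare trailing modal verb is left alone.
TRAILING_MODAL_RE = re.compile(
    rf"[\s,]+(?:if\s+you\s+{MODAL}|{MODAL}\s+you)\s*[.!?]*\s*$",
    re.IGNORECASE,
)


def strip_trailing_modal(topic: str) -> str:
    """Remove one trailing modal-politeness phrase, if present."""
    return TRAILING_MODAL_RE.sub("", topic).strip()
```

With this sketch, "Tokyo if you would" reduces to "Tokyo", while a title such as "Free Will" is untouched because no you follows the modal.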

Pass-2 sibling (1, ed674b5)

  • D1 universal-layer mirror. Pass-1 added the modal-politeness
    strip inside _extract_tell_me_about only, but the universal
    _TRAILING_POLITENESS_RE (called by _strip_trailing_politeness
    at parse_intent line 1048) was added by the post-a20 PD2-1
    sweep specifically so every extractor sees the cleaned query.
    Every NON-tell_me_about intent kept leaking the modal class:
    search for biology if you would → query="biology if you would";
    find article titled Berlin if you would → looks up
    Berlin if you would (not found). Fix: lift the modal class into
    _TRAILING_POLITENESS_RE. Pass-1 extractor-level strip kept as
    defense-in-depth. New invariant pinned:
    TestD1RegexSync.test_leading_and_trailing_share_modal_class:
    leading + trailing politeness regexes must share the modal class.
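
One way such an invariant could be pinned is to build both regexes from a single shared constant and assert on the pattern source. The sketch below is an assumption about the approach: the body of the real test is not shown in the changelog, and _MODAL_CLASS, LEADING_POLITENESS_RE, TRAILING_POLITENESS_RE and strip_politeness are illustrative names and patterns only.

```python
import re

# Single source of truth for the modal class (assumed contents).
_MODAL_CLASS = r"(?:could|can|would|will)"

# Hypothetical leading/trailing patterns built from the shared constant.
LEADING_POLITENESS_RE = re.compile(
    rf"^\s*(?:please\s+)?{_MODAL_CLASS}\s+you\s+(?:please\s+)?",
    re.IGNORECASE,
)
TRAILING_POLITENESS_RE = re.compile(
    r"[\s,]+(?:please|to\s+me|for\s+me"
    rf"|if\s+you\s+{_MODAL_CLASS}|{_MODAL_CLASS}\s+you)"
    r"\s*[.!?]*\s*$",
    re.IGNORECASE,
)


def test_leading_and_trailing_share_modal_class() -> None:
    # Pin the invariant on the pattern source, so adding a modal to one
    # regex without the other fails loudly.
    assert _MODAL_CLASS in LEADING_POLITENESS_RE.pattern
    assert _MODAL_CLASS in TRAILING_POLITENESS_RE.pattern


def strip_politeness(query: str) -> str:
    """Strip one leading and one trailing politeness phrase."""
    query = LEADING_POLITENESS_RE.sub("", query)
    return TRAILING_POLITENESS_RE.sub("", query).strip()
```

Because the strip runs before any extractor, every intent sees the cleaned query; in this sketch "search for biology if you would" becomes "search for biology".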

Pass-3 sibling (1, cc26b3d)

  • Chained-intent trailing-politeness leak.
    _chained_intent_guidance runs UPSTREAM of parse_intent on the
    raw user query. The post-a24 P1-D6 sweep mirrored the param-leak
    strip there; the equivalent mirror of _strip_trailing_politeness
    was never added. Pre-fix every trailing-politeness token (the
    full set, including the pass-2 modal class) leaked into chain
    rejection bullets — tell me about Tokyo if you would then list namespaces produced a rejection whose left bullet read
    tell me about Tokyo if you would verbatim, modal politeness and
    all. Caller would copy the suggested left half back,
    re-introducing the politeness on every iteration. Same structural
    sibling pattern as the post-a24 P1-D6 param-leak version. Fix:
    apply _strip_trailing_politeness to BOTH chain halves after the
    existing connector / punct trim loop, before bullets render.
    Per-half rather than full-query because the politeness can appear
    inside the chain (not just at the very end). Structurally safe —
    _CHAINED_OPERATION_PREFIX_RE checks the LEADING op verb, which
    the trailing strip never touches.
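
The per-half strip described above might look like the sketch below. It assumes a simplified connector (then / and then) and a simplified politeness pattern; TRAILING_RE, CONNECTOR_RE and chain_bullets are hypothetical stand-ins, not the code in _chained_intent_guidance.

```python
import re

# Assumed trailing-politeness pattern; the real _TRAILING_POLITENESS_RE
# is not shown in the changelog.
TRAILING_RE = re.compile(
    r"[\s,]+(?:please|if\s+you\s+(?:could|can|would|will))\s*[.!?]*\s*$",
    re.IGNORECASE,
)
# Assumed connector; the real chain splitter may recognise more.
CONNECTOR_RE = re.compile(r"\s+(?:and\s+)?then\s+", re.IGNORECASE)


def chain_bullets(raw_query: str) -> list[str]:
    """Split a chained query once and clean each half for display."""
    halves = CONNECTOR_RE.split(raw_query, maxsplit=1)
    # Strip per half: politeness can sit inside the chain, not only at
    # the very end of the full query.
    return [TRAILING_RE.sub("", half).strip() for half in halves]
```

Stripping only the full query would miss the if you would that precedes then in the example from the changelog, which is why each half is cleaned separately.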

Out of scope (deferred design call)

  • D5 — death of stalin → Death_and_state_funeral_of_Joseph_Stalin
    instead of the 2017 Iannucci film. P1-D3 probe-gate correctly
    suppressed the Stalin disambig misroute; title-probe picked a
    different canonical X-related title rather than the film
    (canonical is The_Death_of_Stalin). Picking the film would
    require a prefix-widening probe (The <query>) — unwanted side
    effects on arbitrary bare topics — or a popularity ranker. Both
    are design choices beyond the b2 sweep scope.

D2 / D3 / D4 sibling audits clean

  • D2: _handle_filtered_search always routes through
    _maybe_rerank_compact; _handle_search_all uses its own
    rerank apply that bumps a counter on every path. _handle_search
    was the only early-return gap.
  • D3: _handle_tell_me_about is the only handler that
    auto-fetches a single article based on the extracted topic.
    Other intents take the topic literally; synthesize uses RAG-style
    passage retrieval where decomposition would lose the attribute
    context (pre-existing design out of scope).
  • D4: _format_search_text has three call sites — only the
    compact filtered one needed filter_text.
    search_with_filters_with_canonical_splice (non-compact filtered)
    already uses _format_filtered_response which natively emits
    the qualifier.
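
A formatter with an optional filter_text kwarg, as described for D4, could be sketched like this. The function name, the no-results wording and the example filter string are assumptions; only the Found N filtered matches for "X"<filter_text> shape comes from the changelog.

```python
def format_search_text(query: str, total: int, filter_text: str = "") -> str:
    """Render the search header; filter_text marks a filtered search."""
    # Callers that pass no filter_text get the unqualified wording, so
    # existing call sites keep their output unchanged.
    qualifier = "filtered " if filter_text else ""
    if total == 0:
        # Assumed wording for the filtered no-results case.
        return f'No {qualifier}matches found for "{query}"{filter_text}'
    noun = "match" if total == 1 else "matches"
    return f'Found {total} {qualifier}{noun} for "{query}"{filter_text}'
```

Making the kwarg optional with an empty default is what lets the other call sites stay untouched while the compact filtered path opts in.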

Cross-feature composition verified

  • search for Photosythesis's reproduction in namespace C if you would → universal trailing strip peels if you would → intent
    = filtered_search → _maybe_rerank_compact bumps counter →
    _format_search_text renders with filter_text. D1+D2+D4
    compose.
  • tell me about Photosythesis's reproduction if you would →
    universal strip peels if you would → intent = tell_me_about
    → D3 retry fires on possessive topic → photosynthesis. D1+D3
    compose.
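
The D3 retry condition that the second composition relies on can be illustrated with a small guard. This sketch only decides whether the retry should fire and how the topic splits; POSSESSIVE_TOPIC_RE and possessive_retry are hypothetical names, and the spelling correction itself (Rule 2's affix retry) and the real _decompose_x_of_y are not modelled.

```python
import re
from typing import Optional, Tuple

# Assumed shape check: "<owner>'s <attribute...>", accepting both the
# straight and the typographic apostrophe.
POSSESSIVE_TOPIC_RE = re.compile(r"^(?P<owner>.+?)['’]s\s+(?P<attr>\S.*)$")


def possessive_retry(topic: str, has_hint: bool) -> Optional[Tuple[str, str]]:
    """Return (owner, attribute) when the retry should fire, else None."""
    if has_hint:
        return None  # parse-time decomposition already attached a hint
    match = POSSESSIVE_TOPIC_RE.match(topic.strip())
    if match is None:
        return None  # not the possessive shape; X of Y is left alone
    return match.group("owner"), match.group("attr")
```

Matching against the bare topic rather than the full query sidesteps the anchoring problem described for Rule 4, where the verb prefix prevents a match.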

Tests

  • 40 new tests in tests/test_post_b2_beta_fixes.py across 10
    classes (TestD1TrailingModalPoliteness,
    TestD1ParseIntentEndToEnd, TestD1SiblingUniversalTrailingModal,
    TestD1RegexSync, TestD1Pass3ChainedIntentPolitenessLeak,
    TestD2RerankerCounterOnNoResults,
    TestD3PossessiveDecompositionRetry,
    TestD4FilteredSearchEchoQualifier, TestRegressionGuards).
  • Full suite: 2360 passing, 54 skipped, 38 deselected. mypy
    clean across 52 source files. black + flake8 clean. CI checks
    all green (CodeQL, SonarCloud, bandit, security scanning,
    6 OS × Python matrix, both [reranker]-extra suites,
    performance benchmarks).

Methodology evolution

  • "Narrow-scope sibling" pattern — now 8 sweeps strong. Both
    pass-2 and pass-3 surfaced a single sibling of pass-1's D1
    fix-family: pass-2 caught the universal-layer mirror (modal
    class missing from _TRAILING_POLITENESS_RE); pass-3 caught the
    upstream-chained-guidance mirror (trailing-politeness strip
    missing from _chained_intent_guidance). Both are STRUCTURAL
    mirrors of fixes already shipped — pass-2's sibling mirrors the
    post-a20 PD2-1 universal-strip extension, pass-3's sibling
    mirrors the post-a24 P1-D6 param-leak strip placement.
  • "Fix unlocks new paths" — 9th consecutive sweep. D3 is
    particularly nasty because the failure mode changed from
    explicit No search results found (pre-b1 P1-D5) to silent
    wrong answer (post-b1 P1-D5 affix retry → post-b2 D3 retry).
  • New invariants pinned via canonical-source tests — two
    feature-level guards: (a) leading + trailing politeness regexes
    must share the modal class; (b) the no-results early-return path
    in _handle_search must route through _maybe_rerank_compact.
    These pin the "added X to one side, forgot the other side"
    drift class that drove both pass-2 and pass-3 defects.
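
Invariant (b) can be illustrated with a toy handler that calls the rerank hook before the empty-result bail. Everything here is a stand-in: counters, maybe_rerank_compact, handle_search and the state strings are assumptions, and only the <!-- reranker=<state> --> comment shape is taken from the changelog.

```python
from collections import Counter
from typing import List

counters: Counter = Counter()


def maybe_rerank_compact(hits: List[str]) -> List[str]:
    """Stand-in reranker hook: always records a state, even when empty."""
    if not hits:
        counters["skipped_no_results"] += 1
        return hits
    counters["applied"] += 1
    return sorted(hits)  # placeholder for a real rerank


def handle_search(hits: List[str]) -> str:
    # Call the hook BEFORE the empty-result bail so the counter moves
    # and the telemetry comment can still be written.
    ranked = maybe_rerank_compact(hits)
    state = "skipped_no_results" if not ranked else "applied"
    body = "No search results found" if not ranked else "\n".join(ranked)
    return f"{body}\n<!-- reranker={state} -->"
```

A test for the invariant then only needs to assert that the counter moved and the comment is present on an empty result set.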

About cameronrye/openzim-mcp

Modern, secure MCP server for accessing ZIM format knowledge bases offline. Enables AI models to search and navigate Wikipedia, educational content, and other compressed knowledge archives with smart retrieval, caching, and comprehensive API.

Related context

Earlier breaking changes

  • v2.0.0a15 _attribute_sections falls back to the first section when no section brackets the located passage.
  • v2.0.0a13 canonical‑splice gate tightened to require exact path equality, fixing H2/H3 surface end‑to‑end behavior across all shapes.
  • v2.0.0a11 Exposed `content_offset` as top-level `zim_query` parameter, validated >=0, threaded through options.
  • v2.0.0a10 `get article M/<key>` now returns ZIM metadata entry rather than aliased C-namespace article body.
  • v2.0.0a10 `metadata for <file>` returns concise metadata strings instead of full article bodies for new-scheme archives.
