cameronrye/openzim-mcp

v2.0.0a16 Bugfix

This release fixes issues for SREs watching stability and regressions.

Published 2mo MCP Data & Storage

✓ No known CVEs patched

Topics

kiwix mcp mcp-server openzim zim

Summary

AI summary

Fixed multiple parsing, validation, and response‑format defects affecting disambiguation handling, intent extraction, namespace walking, browsing, suggestions, chained intents, and politeness stripping.

Full changelog

The multi-pass live sweep of a15 against
wikipedia_en_all_maxi_2026-02.zim (~118 GB, ~27.2 M entries) ran
across seven passes. Pass 1 surfaced four user-facing defects (D4 in
the tell_me_about disambig-page handling for Mercury-class bare
titles; D5 in the intent parser's politeness-prefix regex; D6 in
find_by_title's response to namespace-prefixed input; D7 a
schema-consistency gap in walk_namespace). Pass 2 self-audited
every D-fix in both verbose and compact rendering modes and
exercised the canonical-article paths (Berlin / Apollo 11 / Java)
that the disambig-detection logic must not regress. Pass 3 re-tested
across a broader disambig set (Mars, Sun, Moon, Paris, Apollo bare),
walked empty namespaces B / X / Z, and exercised cross-fix
interactions (could you find article titled M/Title); both
passes 2 and 3 found zero new defects. Pass 4 then deliberately
stress-tested the four landed D-fixes from angles the earlier
passes hadn't probed (more bare-title disambigs, pathological
politeness combinations, find_by_title edge cases, walk_namespace
malformed args) AND exercised the intent paths the earlier passes
had barely touched (synthesize, browse namespace, show structure
of, links in, suggestions for, search in namespace); it surfaced
three more defects (P4-D1 / P4-D2 / P4-D3). Pass 5 verified those
three fixes; zero new defects. Pass 6 went deeper — a source-level
audit of every intent handler for the silent-default pattern
P4-D3 fixed (params.get("X", DEFAULT)) caught the same shape in
_handle_browse, and a parallel audit of every intent extractor
for the trigger-word-capture pattern P4-D1 fixed caught a sibling
extractor permissiveness in _extract_browse; plus a leading-
politeness probe surfaced a third defect (P6-D3) — please tell me about X leaks the leading politeness into the parsed topic
just like the original D5 did for modal verbs. Pass 7 verified
all ten fixes and audited cumulative regressions across the three
commits; zero new defects.

Fixed

D4: tell me about Mercury no longer attaches a misleading
_May also refer to: Mercury_Monterey — use tell me about <full title>_ footer to the disambiguation-page body. Two cooperating
bugs: SimpleToolsHandler._is_disambig_lead returned False
whenever pre_h2 exceeded 400 chars — Mercury's 628-char pre-H2
(the "most commonly refers to" preamble, three top-level entries,
and the "may also refer to" header) blew past the cap, so the
existing disambig-page detection in _lead_with_toc never fired;
AND the trailing-footer block in _handle_tell_me_about had no
way to suppress the disambig_twin_path / related_extends_paths
hints when the resolved body was itself a disambig page. Fixed
by checking only the trailing 400 characters of pre_h2 (the
regex-free endswith stays bounded, but long preambles now
trigger) and by gating both trailing footers on a fresh
body_is_disambig_page check on the fetched body. Canonical
pages with disambig twins (Berlin) keep their footer; canonical
pages with extends-topic siblings (Apollo 11 → anniversaries /
lunar sample display / goodwill messages) keep their footer.
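
The length-cap half of D4 can be illustrated with a minimal sketch. It assumes the lead check is a plain `endswith` against a few marker phrases; the marker tuple and both function names below are hypothetical, and only the 400-character window comes from the entry above.

```python
# Hypothetical marker phrases; the real handler's list is not shown in this changelog.
DISAMBIG_TAIL_MARKERS = ("may also refer to:", "may refer to:")
TAIL_WINDOW = 400  # window size named in the D4 entry


def is_disambig_lead_old(pre_h2: str) -> bool:
    """Pre-fix shape: any lead longer than the cap is rejected outright."""
    if len(pre_h2) > TAIL_WINDOW:
        return False
    return pre_h2.rstrip().lower().endswith(DISAMBIG_TAIL_MARKERS)


def is_disambig_lead(pre_h2: str) -> bool:
    """Post-fix shape: inspect only the trailing window, whatever the total length."""
    tail = pre_h2[-TAIL_WINDOW:].rstrip().lower()
    return tail.endswith(DISAMBIG_TAIL_MARKERS)


# A long preamble that still ends in a disambiguation header.
long_lead = "Mercury most commonly refers to: " + "x" * 600 + "\nMercury may also refer to:"
```

With this sketch, `is_disambig_lead_old(long_lead)` rejects the lead purely on length, while `is_disambig_lead(long_lead)` still sees the trailing header.
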
D5: could you tell me about Photosynthesis now parses
topic = "Photosynthesis" instead of leaking the modal lead-in
into the topic. The verb-prefix regex in
_extract_tell_me_about anchored at ^\s* and never matched
"could you" / "can you" / "would you" / "will you", so the whole
query fell through to the topic = query.strip() fallback and
downstream relied on the tail-probe entity rescue to find the
article anyway. Fixed by stripping the modal scaffold
((?:could|can|would|will)\s+(?:you|we|i)\s+(?:please\s+)?) before
the verb regex runs. Leaves non-modal queries unchanged; combines
cleanly with the existing trailing-politeness strip
(could you tell me about X please → topic=X).
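
As a rough illustration of the D5 ordering (modal strip first, then the verb regex, then the trailing-politeness strip), here is a sketch. The modal pattern is the one quoted in the entry; the `^\s*` anchor, the case-insensitive flag, the verb-prefix pattern, the trailing pattern, and the function name are all assumptions made for the example.

```python
import re

# Modal scaffold quoted in the D5 entry; anchor and IGNORECASE are assumptions.
MODAL_SCAFFOLD_RE = re.compile(
    r"^\s*(?:could|can|would|will)\s+(?:you|we|i)\s+(?:please\s+)?",
    re.IGNORECASE,
)
# Hypothetical stand-in for the existing verb-prefix regex.
VERB_PREFIX_RE = re.compile(r"^\s*(?:tell\s+me\s+about|describe)\s+", re.IGNORECASE)
# Hypothetical stand-in for the existing trailing-politeness strip.
TRAILING_POLITE_RE = re.compile(r"\s+please\s*[.!?]*\s*$", re.IGNORECASE)


def extract_topic(query: str) -> str:
    """Strip the modal lead-in before the verb prefix so neither leaks into the topic."""
    cleaned = MODAL_SCAFFOLD_RE.sub("", query, count=1)
    cleaned = VERB_PREFIX_RE.sub("", cleaned, count=1)
    cleaned = TRAILING_POLITE_RE.sub("", cleaned, count=1)
    return cleaned.strip()
```

Non-modal input such as `tell me about Photosynthesis` passes through the first substitution untouched, which matches the "leaves non-modal queries unchanged" behaviour described above.
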
D6: find article titled M/Title now redirects to get article M/Title instead of returning a silent 0_hits. The title index
only stores titles (M/Title's title is "Title"), so passing a ZIM
namespace path through the title-lookup backend was guaranteed to
return nothing — with no signal to the caller that the wrong tool
was in use. _handle_find_by_title now detects the
uppercase-letter + slash + non-empty-suffix shape upfront and
returns a structured Namespace Path, Not a Title message that
points at both get article <path> (direct lookup) and find article titled <stripped> (title-only fallback). Lowercase
prefixes (a/b) and titles without the namespace shape pass
through to the backend unchanged.
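
The shape check in D6 (uppercase letter, slash, non-empty suffix) might look roughly like the following. The regex, the function name, and the dictionary layout are illustrative assumptions; only the message title and the two suggested follow-up queries come from the entry.

```python
import re
from typing import Optional

# One uppercase ASCII letter, a slash, then a non-empty suffix.
NAMESPACE_PATH_RE = re.compile(r"^([A-Z])/(.+)$")


def namespace_path_redirect(title: str) -> Optional[dict]:
    """Return a structured redirect for namespace-shaped input, else None."""
    match = NAMESPACE_PATH_RE.match(title.strip())
    if match is None:
        return None  # ordinary title: let the title-index backend handle it
    namespace, stripped = match.groups()
    return {
        "error": "Namespace Path, Not a Title",
        "suggestions": [
            f"get article {namespace}/{stripped}",   # direct path lookup
            f"find article titled {stripped}",       # title-only fallback
        ],
    }
```

In this sketch `a/b` and `M/` both return `None`, so lowercase prefixes and empty suffixes fall through to the backend.
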
D7: walk namespace A (and any other empty new-scheme
namespace) now includes namespace_entry_count: 0 in the
response. The short-circuit at
openzim_mcp/zim/namespace.py for new-scheme non-C/M/W namespaces
built an empty result without passing namespace_entry_count to
_build_walk_result, so the field was omitted entirely while
walk-M and walk-W (which surface their bounded totals) included
it. Downstream consumers had to special-case "missing" vs "zero".
Fixed by passing namespace_entry_count=0 in the short-circuit.
Updated the walk_A_10 golden to reflect the new schema; walk-M
and walk-W goldens are unchanged (already carried the field).
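
The "missing" versus "zero" distinction in D7 can be shown with a small sketch. The signature and the `entries` key are assumptions; the real `_build_walk_result` is not reproduced in this changelog.

```python
from typing import Any, Dict, Iterable, Optional


def build_walk_result(
    entries: Iterable[str],
    namespace_entry_count: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical result builder: the count field appears only when a value is passed."""
    result: Dict[str, Any] = {"entries": list(entries)}
    if namespace_entry_count is not None:
        result["namespace_entry_count"] = namespace_entry_count
    return result


# Pre-fix short-circuit: the field is silently absent.
before = build_walk_result([])
# Post-fix short-circuit: an explicit zero, so consumers need no special case.
after = build_walk_result([], namespace_entry_count=0)
```

The `is not None` test matters here: a truthiness check would drop the explicit zero again.
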
P4-D1: suggestions for (no actual prefix) now returns the
structured "Missing Search Term" error instead of silently
autocompleting against the literal word "for". The regex's
optional (?:for\s+)? group failed to match without trailing
whitespace, so the mandatory capture greedily swallowed "for"
itself; the handler's existing missing-arg guard then saw a
non-empty partial_query and ran the suggestion fallback (which
spent ~70 s scanning for "for" — a high-frequency English token).
Fixed in _extract_suggestions by discarding a bare-"for"
capture so the guard takes over. Legitimate prefixes that happen
to start with "for" (e.g., suggestions for forest) still work.
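
The greedy-capture behaviour behind P4-D1 is easy to reproduce. The pattern below is a hypothetical reconstruction of the shape described (an optional `for\s+` group followed by a mandatory capture), not the project's actual regex, and the function name is illustrative.

```python
import re
from typing import Optional

# Hypothetical pattern with the same shape: optional "for " then a mandatory capture.
SUGGESTIONS_RE = re.compile(r"^\s*suggestions\s+(?:for\s+)?(.+?)\s*$", re.IGNORECASE)


def extract_partial_query(query: str) -> Optional[str]:
    """Return the prefix to autocomplete, or None so the missing-arg guard fires."""
    match = SUGGESTIONS_RE.match(query)
    if match is None:
        return None
    partial = match.group(1)
    # Without trailing whitespace the optional group cannot match, so the
    # capture swallows the trigger word itself; discard that bare "for".
    if partial.lower() == "for":
        return None
    return partial
```

Here `SUGGESTIONS_RE.match("suggestions for").group(1)` is the literal word `for`, which is the capture the fix discards, while `suggestions for forest` still yields `forest`.
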
P4-D2: chained-intent detector no longer bypassed by a modal
lead-in. _chained_intent_guidance's
_CHAINED_OPERATION_PREFIX_RE is anchored at ^ and only
recognised operation verbs at position 0, so could you tell me about Photosynthesis then list namespaces shifted the verb past
the anchor — left_is_op evaluated False, the chain gate failed,
and the query fell through to normal intent classification where
the higher-confidence list_namespaces won and silently dropped
the tell me about half. The D5 modal-strip lives inside
_extract_tell_me_about; it only runs AFTER the chain detector
has already decided. Fixed by pre-stripping the same modal
scaffold ((?:could|can|would|will)\s+(?:you|we|i)\s+(?:please\s+)?) at the top of _chained_intent_guidance so
detection sees the cleaned query.
P4-D3: walk namespace with a malformed argument now returns
a structured "Missing or Invalid Namespace" error instead of
silently walking C. Multi-char (AB), digit (1), special
(_), and missing-argument forms all fell through to
params.get("namespace", "C") in _handle_walk_namespace with
no signal to the caller that the input was rejected. Sibling
tools (find_by_title, links_in, suggestions,
tell_me_about) already return structured missing-arg errors;
this one didn't. Fixed by adding an upfront guard that mirrors
their shape (rule / examples) before the C-default kicks in.
P6-D1 + P6-D2: browse namespace now reaches input-validation
parity with walk namespace. Two cooperating gaps — the
handler _handle_browse had the same
params.get("namespace", "C") silent-default that P4-D3 fixed
for walk; AND the extractor _extract_browse accepted multi-char,
digit, and special-character namespace arguments
(browse namespace AB / 1 / _) without uppercasing lowercase
input — diverging from the strict
_extract_walk_namespace. The two siblings now agree: regex
tightened to namespace\s+['"]?([A-Za-z])\b['"]? with .upper()
on the captured letter, and the handler returns a structured
"Missing or Invalid Namespace" error when the extractor produces
nothing.
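
A sketch of the tightened extractor and the handler guard from P6-D1 + P6-D2 follows. The regex is the one quoted in the entry; the function names, the use of `re.search`, and the response dictionaries are assumptions for illustration.

```python
import re
from typing import Optional

# Pattern quoted in the entry: exactly one ASCII letter, optionally quoted.
NAMESPACE_ARG_RE = re.compile(r"""namespace\s+['"]?([A-Za-z])\b['"]?""")


def extract_namespace(query: str) -> Optional[str]:
    """Return the uppercased single-letter namespace, or None if the argument is malformed."""
    match = NAMESPACE_ARG_RE.search(query)
    return match.group(1).upper() if match else None


def handle_browse(query: str) -> dict:
    """Hypothetical handler: no silent default to C when extraction fails."""
    namespace = extract_namespace(query)
    if namespace is None:
        return {
            "error": "Missing or Invalid Namespace",
            "rule": "pass exactly one letter",          # illustrative wording
            "examples": ["browse namespace C", "browse namespace M"],
        }
    return {"namespace": namespace}
```

The `\b` after the captured letter is what rejects `AB`: there is no word boundary between `A` and `B`, so the match fails instead of capturing only the first letter.
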
P6-D3: leading please / kindly now strip cleanly from the
parsed topic. please tell me about Photosynthesis and
kindly describe Photosynthesis previously parsed with the
politeness phrase leaking into the topic — same shape as the
pass-1 D5 defect but for non-modal politeness words. The article
still resolved via tail-probe rescue, but the parsed topic was
wrong. Fix extends the leading-strip in _extract_tell_me_about
to cover please / kindly AND wraps both the modal-strip and
the politeness-strip in a loop so composite phrases
(please could you tell me about X, please please tell me about X) peel cleanly. Same loop also applied to the chain-
detector's _chained_intent_guidance pre-strip so leading
politeness doesn't bypass chain detection (mirror of P4-D2).
Leaves the existing trailing-politeness strip alone, so
tell me about X please still works, and the leading-only
anchor (^\s*) prevents stripping mid-query mentions of
please / kindly that are legitimately part of the topic.
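
The peel loop described in P6-D3 could be sketched as below. The modal pattern is the one quoted in this changelog; the politeness pattern, the case-insensitive flag, and the function name are assumptions.

```python
import re

MODAL_RE = re.compile(
    r"^\s*(?:could|can|would|will)\s+(?:you|we|i)\s+(?:please\s+)?",
    re.IGNORECASE,
)
# Hypothetical leading-politeness pattern, anchored so mid-query words survive.
POLITE_RE = re.compile(r"^\s*(?:please|kindly)\s+", re.IGNORECASE)


def strip_leading_politeness(query: str) -> str:
    """Peel modal and politeness prefixes repeatedly until nothing more comes off."""
    while True:
        stripped = MODAL_RE.sub("", query, count=1)
        stripped = POLITE_RE.sub("", stripped, count=1)
        if stripped == query:
            return stripped  # fixed point: no prefix left to strip
        query = stripped
```

Each pass that changes the string also shortens it, so the loop terminates. Applying the same helper before chain detection would cover the P4-D2 mirror case mentioned above.
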

Tests

tests/test_post_a15_beta_fixes.py — 80 regression tests
pinning all ten defects. Each defect gets:
- The fix-case test (Mercury body has no misleading trailer;
  could you tell me about X parses topic=X; find article titled M/Title returns redirect; _build_walk_result exposes the
  zero-count field; suggestions for triggers the missing-arg
  guard; could you tell me about X then list namespaces is
  detected as chained; walk namespace AB returns the missing-
  namespace error; browse namespace AB returns the same error
  and browse namespace c uppercases to "C"; please tell me about X strips cleanly).
- Negative self-audit cases (Berlin keeps its disambig-twin
  footer; non-modal queries unchanged; lowercase a/b not
  redirected by find_by_title; namespace_entry_count omitted
  when caller passes None; legitimate suggestions for forest
  still captures the prefix; non-chained could you tell me about X not tripped by the chain detector; trailing please still
  works; mid-query please in linguistics not stripped).
- Cross-defect probes (Java disambig body suppresses
  disambig_twin_path footer too; please could you tell me about X peels both layers; please tell me about X then list namespaces trips chain detector).

About cameronrye/openzim-mcp

Modern, secure MCP server for accessing ZIM format knowledge bases offline. Enables AI models to search and navigate Wikipedia, educational content, and other compressed knowledge archives with smart retrieval, caching, and comprehensive API.

Related context

Earlier breaking changes

v2.0.0a15 _attribute_sections falls back to the first section when no section brackets the located passage
v2.0.0a13 canonical‑splice gate tightened to require exact path equality, fixing H2/H3 surface end‑to‑end behavior across all shapes.
v2.0.0a11 Exposed `content_offset` as top-level `zim_query` parameter, validated >=0, threaded through options.
v2.0.0a10 `get article M/<key>` now returns ZIM metadata entry rather than aliased C-namespace article body.
v2.0.0a10 `metadata for <file>` returns concise metadata strings instead of full article bodies for new-scheme archives.