cameronrye/openzim-mcp

v2.0.0a22 Feature

This release adds 3 notable features for engineering teams evaluating rollout.

Published 2mo MCP Data & Storage

✓ No known CVEs patched

Topics

kiwix mcp mcp-server openzim zim

Affected surfaces

auth rce_ssrf

Summary

AI summary

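Extends trailing-politeness stripping to British/texting and multi-language tokens, adds a multi-entity chain warning, and expands regression testing in a mixed security and bugfix release.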

Changes in this release

Type	Severity	Summary	CVE
Security	High	Defence-in-depth dispatcher-edge strip removes trailing politeness from request parameters (P1-D1). Source: granite4.1:30b@2026-05-20-audit Confidence: high	None
Bugfix	Medium	Multi-entity chain warning for 3+ entity bare-topic chains fixed (P1-D2 / P1-D3 / P1-D4). Source: granite4.1:8b-q6_K@2026-05-20 Confidence: high	None
Bugfix	Medium	Trailing politeness regex extended to British/texting and multi-language tokens (P1-D6 / P1-D7). Source: granite4.1:8b-q6_K@2026-05-20 Confidence: high	None
Bugfix	Medium	`Search Terms Required` B4 guard now peels politeness from the tail before the empty check (P1-D8). Source: granite4.1:8b-q6_K@2026-05-20 Confidence: high	None
Bugfix	Medium	`_Q_EMITTING_CURSOR_TOOLS` drift guard added to prevent missing q-emitting tools (P1-D5). Source: granite4.1:8b-q6_K@2026-05-20 Confidence: high	None
Bugfix	Medium	PD2-2 sibling docstring path-bait sweep replaced literal paths with placeholders (P1-D9). Source: granite4.1:8b-q6_K@2026-05-20 Confidence: high	None
Bugfix	Medium	PD2-4 recovery hint now preserves the original error reason (P1-D10). Source: granite4.1:8b-q6_K@2026-05-20 Confidence: high	None
Bugfix	Medium	Defence-in-depth dispatcher-edge politeness strip applied to params in `handle_zim_query` (P1-D1). Source: granite4.1:8b-q6_K@2026-05-20 Confidence: low	None
Bugfix	Medium	`limit` docstring clarified for atomic intents (T-D1). Source: granite4.1:8b-q6_K@2026-05-20 Confidence: low	None

Full changelog

Live-MCP beta sweep against `wikipedia_en_all_maxi_2026-02.zim` on
the freshly deployed v2.0.0a21 build, plus a review of small-model
failure transcripts. Smoke gates landed 3/4 green pre-fix; the
politeness-strip gate (`search for biology please` →
`Found 5000 matches for "biology please"`) leaked on the live MCP
even though the source-side `parse_intent` strip works correctly
under direct unit test. The most likely cause is an in-process
module cache on the live server that loaded only part of PR #152's
diff. The user-visible defect class is the same regardless of
root cause, so a defence-in-depth dispatcher-edge strip lands here
(P1-D1).

Pass 2 source-level audit found no new sibling defects across the
landed fix sites. The recurring "fix unlocks new paths" cycle
recurred three times this round: post-a20 P1-D2 (alias-fallback
widening to 2-entity asymmetric chains) didn't address chains of
3+ entities (this sweep's P1-D2/D3/D4 catch them); post-a20 PD2-1
(`parse_intent` politeness strip) didn't widen the token set
(P1-D6/D7 add British/texting/multi-language variants); and post-a20
PD2-2 (`zim_query` docstring de-bait) didn't sweep the sibling
advanced-tool docstrings (P1-D9 widens the regression net).
Live-transcript review remains a distinct test surface: T-D1
came from a Qwen3-8B-Q4 transcript and was not reachable through
adversarial query probing alone.

Fixed

Multi-entity chain warning for 3+ entity bare-topic chains
(P1-D2 / P1-D3 / P1-D4). Three observed shapes bypassed the
existing 2-entity `_soft_connector_footer` alias-fallback:
`tell me about Köln, München, and Berlin` returned Berlin plus a
footer suggesting `tell me about Köln, München,` (a still-chained,
recursive suggestion: re-running it re-triggers the same defect);
`tell me about Berlin or 東京 or Tokyo` silently fell through to
"No search results found"; and `tell me about Berlin and München and Köln`
returned Cologne (the Köln alias) with no footer about the dropped
Berlin / München. Fix: a new `_multi_entity_chain_guidance` detects
3+ substantive parts split by the combined soft connectors
(`and` / `or` / `,` / `&` / `vs` / `/`) AND probes the title index
for the whole topic. Clean single-title hits (the band
Earth, Wind & Fire; the idiom Lions, Tigers, and Bears) suppress
the warning; without a clean hit, a structured
`Multi-Entity Chain Detected` rejection names each entity.
Iterative single-pattern splits (no combined alternation regex)
keep SonarCloud's S5852 polynomial-backtracking flag quiet; string
prefix/suffix scans (not regex) handle leftover leading/trailing
conjunctions for the same reason.
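The detection flow above (split on each connector in turn, trim leftover conjunctions with string scans, then let a whole-topic title hit veto the warning) can be sketched as follows. This is a minimal illustration, not the project's implementation: `split_entities`, `multi_entity_chain_guidance`, the connector patterns, and the rejection wording are hypothetical, and the title index is assumed to be a plain set of casefolded titles.

```python
import re
from typing import List, Optional, Set

# Hypothetical connector patterns, applied one at a time rather than as one
# combined alternation, mirroring the changelog's S5852-motivated approach.
_CONNECTOR_PATTERNS = [
    re.compile(r"\s*,\s*"),
    re.compile(r"\s+and\s+", re.IGNORECASE),
    re.compile(r"\s+or\s+", re.IGNORECASE),
    re.compile(r"\s*&\s*"),
    re.compile(r"\s+vs\.?\s+", re.IGNORECASE),
    re.compile(r"\s*/\s*"),
]


def _trim_conjunctions(part: str) -> str:
    """Drop a leftover leading/trailing 'and'/'or' with plain string scans."""
    part = part.strip()
    for conj in ("and", "or"):
        if part.lower().startswith(conj + " "):
            part = part[len(conj) + 1:].strip()
        if part.lower().endswith(" " + conj):
            part = part[: -(len(conj) + 1)].strip()
    return part


def split_entities(topic: str) -> List[str]:
    """Split a bare topic on each soft connector in turn."""
    parts = [topic]
    for pattern in _CONNECTOR_PATTERNS:
        parts = [piece for part in parts for piece in pattern.split(part)]
    trimmed = (_trim_conjunctions(p) for p in parts)
    return [p for p in trimmed if p]


def multi_entity_chain_guidance(topic: str, title_index: Set[str]) -> Optional[str]:
    """Return a rejection for 3+ entity chains, or None to proceed normally.

    title_index is a hypothetical set of casefolded article titles standing
    in for the archive's title-index probe.
    """
    entities = split_entities(topic)
    if len(entities) < 3:
        return None  # 2-entity chains stay on the existing alias-fallback path
    if topic.strip().casefold() in title_index:
        return None  # clean single-title hit, e.g. "Earth, Wind & Fire"
    return (
        f"Multi-Entity Chain Detected: {len(entities)} entities "
        f"({', '.join(entities)}); query each one separately."
    )
```

Note that `Köln, München, and Berlin` first splits on commas into `and Berlin`, which the `\s+and\s+` pattern cannot split (no leading whitespace); the prefix scan in `_trim_conjunctions` is what cleans it up.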

Trailing politeness regex extended to British/texting and
multi-language tokens (P1-D6 / P1-D7). Post-a20 PD2-1
enumerated only `please` / `kindly` / `thanks` / `thank you|u`.
Live probes showed that `ta` / `cheers` / `thx` / `ty` / `pls`
(British/texting) and `bitte` / `danke` / `merci` / `gracias` /
`por favor` (multi-language) all leaked silently into the search
query / topic / title. Several new tokens are short (`ta` and
`ty` are 2 chars), so the leading anchor tightens from
`\s*[,;.!?]?\s*` to `(?:^|\s+|[,;.!?]\s*)`: embedded
substrings in longer words (`cantata` / `feta` / `Dante`) no
longer get their last two chars eaten.
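A minimal sketch of the widened strip, assuming the token list above and the new leading anchor `(?:^|\s+|[,;.!?]\s*)`. The trailing-punctuation tail `[\s.!?]*$` and the repeat-until-stable loop are assumptions for illustration; the project's actual pattern may differ.

```python
import re

# Sketch of the extended token set; the exact pattern in the project may differ.
_POLITENESS_TOKENS = (
    r"please|kindly|thanks|thank\s+(?:you|u)"  # post-a20 PD2-1 tokens
    r"|ta|cheers|thx|ty|pls"  # British/texting additions
    r"|bitte|danke|merci|gracias|por\s+favor"  # multi-language additions
)

# Leading anchor: start of string, whitespace, or punctuation. A token is
# never glued to preceding letters, so "cantata" keeps its final "ta".
_TRAILING_POLITENESS = re.compile(
    r"(?:^|\s+|[,;.!?]\s*)(?:" + _POLITENESS_TOKENS + r")[\s.!?]*$",
    re.IGNORECASE,
)


def strip_trailing_politeness(text: str) -> str:
    """Remove trailing politeness tokens, repeating until nothing changes."""
    previous = None
    while text != previous:
        previous = text
        text = _TRAILING_POLITENESS.sub("", text).rstrip()
    return text
```

Because the anchor must consume whitespace, punctuation, or the string start, `feta` and `party` pass through unchanged while `biology please thanks` loses both tokens over two iterations.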

`Search Terms Required` B4 guard now peels politeness from
the tail before the empty check (P1-D8). Pre-fix,
`_search_query_tail(query)` ran on the ORIGINAL query, so
trailing politeness wasn't stripped before the empty-tail
check; `search for please` silently dispatched with
`query="for"` (the literal word from the `search for` verb phrase)
and returned a 200k-hit response dominated by stop-word collisions.
The same shape applies to any `search for <politeness>` after the
P1-D6 extension. Fix: apply `IntentParser._strip_trailing_politeness`
to the tail before the B4 emptiness check.
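The fix is purely an ordering change: strip first, then test for emptiness. The sketch below shows that order with hypothetical stand-ins (`search_query_tail`, `b4_guard`, and a deliberately short token list); it is not the project's `_search_query_tail` or `IntentParser` code.

```python
import re
from typing import Optional

# Compact stand-in for IntentParser._strip_trailing_politeness; the real
# token set is larger (see P1-D6 / P1-D7).
_POLITENESS = re.compile(
    r"(?:^|\s+|[,;.!?]\s*)(?:please|pls|thanks|ta)[\s.!?]*$", re.IGNORECASE
)
_SEARCH_VERB = re.compile(r"^\s*search\s+for\b", re.IGNORECASE)


def strip_trailing_politeness(text: str) -> str:
    return _POLITENESS.sub("", text).strip()


def search_query_tail(query: str) -> str:
    """Hypothetical helper: whatever follows a leading 'search for'."""
    return _SEARCH_VERB.sub("", query, count=1).strip()


def b4_guard(query: str) -> Optional[str]:
    """Return a 'Search Terms Required' message, or None to dispatch."""
    # P1-D8 ordering: peel politeness off the tail *before* the empty check.
    tail = strip_trailing_politeness(search_query_tail(query))
    if not tail:
        return "Search Terms Required: say what to search for, e.g. 'search for biology'."
    return None
```

Without the strip, the tail of `search for please` is the non-empty string `please`, which is why the pre-fix guard never fired.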

Defence-in-depth dispatcher-edge politeness strip on params
(P1-D1). The live-MCP sweep observed
`Found 5000 matches for "biology please"` for the query
`search for biology please` despite the post-a20 PD2-1 fix.
Source-side, the strip works correctly under direct unit test;
the most likely cause is an in-process module cache on the
live server that loaded only part of PR #152. Fix: in
`handle_zim_query`, after the `parse_intent` call, apply
`IntentParser._strip_trailing_politeness` to each of the
user-supplied content fields in `params` (`query` / `topic` /
`title` / `entry_path` / `partial_query`). The strip is idempotent
when `parse_intent` has already cleaned them, and acts as a
belt-and-suspenders catch for any future regression that bypasses
`parse_intent`.
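A sketch of the dispatcher-edge pass over the five content fields named above. The helper name `strip_params_at_dispatcher_edge`, the copy-instead-of-mutate choice, and the reduced token list are assumptions; only the field names come from the changelog.

```python
import re
from typing import Any, Dict

# Reduced stand-in for IntentParser._strip_trailing_politeness.
_POLITENESS = re.compile(
    r"(?:^|\s+|[,;.!?]\s*)(?:please|thanks|bitte|merci)[\s.!?]*$", re.IGNORECASE
)

# The user-supplied content fields listed in the changelog.
_CONTENT_FIELDS = ("query", "topic", "title", "entry_path", "partial_query")


def strip_trailing_politeness(text: str) -> str:
    return _POLITENESS.sub("", text).rstrip()


def strip_params_at_dispatcher_edge(params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: strip politeness from every string content field."""
    cleaned = dict(params)  # leave the caller's dict untouched
    for field in _CONTENT_FIELDS:
        value = cleaned.get(field)
        if isinstance(value, str):  # skip None and non-string values
            cleaned[field] = strip_trailing_politeness(value)
    return cleaned
```

Running it on already-clean params returns an equal dict, which is the idempotence property that makes it safe to apply after `parse_intent`.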

`_Q_EMITTING_CURSOR_TOOLS` drift guard (P1-D5). The
post-a20 P1-D1 fix introduced
`SimpleToolsHandler._Q_EMITTING_CURSOR_TOOLS` as a hand-maintained
frozenset of tool names whose cursors legitimately
carry an `s.q` field. If a future contributor adds a new
q-emitting tool (a new `Cursor.encode(state={..., "q": ...})`
callsite) but forgets to update the set, the dispatcher's
q-overlap guard silently degrades to a no-op for that tool, and
paginating with the wrong query proceeds silently. A new
regression test scans every `Cursor.encode(tool=...)` callsite
in `zim/search.py` and pins membership equality with the set;
encode-callsite comments now point at the set so the
cross-module link is greppable from either side.
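One way to write such a drift test is to walk the module's AST and collect the literal `tool=` keyword of every `Cursor.encode(...)` call, then compare against the set. This sketch uses `ast` and an inline source sample; the tool names and the sample are invented, and the real test's matching strategy is not stated in the changelog.

```python
import ast

# Hypothetical stand-ins: in the project the set is
# SimpleToolsHandler._Q_EMITTING_CURSOR_TOOLS and the callsites live in
# zim/search.py. The tool names below are invented for illustration.
Q_EMITTING_CURSOR_TOOLS = frozenset({"example_search", "example_filtered_search"})

SAMPLE_SEARCH_SOURCE = '''
cursor = Cursor.encode(tool="example_search", state={"o": offset, "q": query})
next_cursor = Cursor.encode(
    tool="example_filtered_search",
    state={"o": offset, "q": query, "ns": namespace},
)
'''


def encode_callsite_tools(source: str) -> frozenset:
    """Collect the literal tool= value of every Cursor.encode(...) call."""
    tools = set()
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "encode"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "Cursor"
        ):
            for keyword in node.keywords:
                if keyword.arg == "tool" and isinstance(keyword.value, ast.Constant):
                    tools.add(keyword.value.value)
    return frozenset(tools)


def test_q_emitting_cursor_tools_in_sync() -> None:
    found = encode_callsite_tools(SAMPLE_SEARCH_SOURCE)
    assert found == Q_EMITTING_CURSOR_TOOLS, (
        f"add to set: {sorted(found - Q_EMITTING_CURSOR_TOOLS)}; "
        f"remove from set: {sorted(Q_EMITTING_CURSOR_TOOLS - found)}"
    )
```

In a real suite the source would be read from `zim/search.py` on disk rather than an inline string; an AST walk also catches callsites whose arguments span several lines.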

PD2-2 sibling docstring path-bait sweep (P1-D9). Post-a20
PD2-2 only pinned the `zim_query` docstring in `server.py`.
Sibling literal path examples lived in advanced-tool docstrings:
`structure_tools.get_entry_summary` (`"/path/to/wiki.zim"`),
`structure_tools.get_table_of_contents` (`"/path/to/wiki.zim"`),
`structure_tools.get_binary_entry` (`"/path/file.zim"`), and
`content_tools.get_zim_entries` (`"/path/x.zim"`). Small models
copy these verbatim too; this is the same weak-instruction-follower
class PD2-2 was designed to break. Fix: replace literal paths
with `<zim_path>` placeholders that don't validate as
filesystem paths, and widen the regression test to scan every
`openzim_mcp/tools/*.py` for `/path/...\.zim` or
`/data/...\.zim` shapes.

PD2-4 recovery hint now preserves the original error reason
(P1-D10). The PD2-4 detector substring-matched "access denied"
in the exception message, so it fired on `OpenZimMcpSecurityError`'s
"Access denied - Path is outside allowed directories" message
in addition to the intended file-not-found
`OpenZimMcpValidationError`. The replacement body dropped the
security-specific reason on the floor; callers saw only the
generic "doesn't match any loaded archive" hint. Fix: surface
the original exception message as a new **Reason** line
alongside the recovery hint so the security-specific context
isn't lost.
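The shape of the fix, sketched with a stand-in exception class: the detector still rewrites the message, but it appends the original text as a **Reason** line instead of discarding it. `format_error` and the hint wording (apart from the quoted "doesn't match any loaded archive") are hypothetical.

```python
class OpenZimMcpSecurityError(Exception):
    """Stand-in for the project's security exception (illustration only)."""


def format_error(error: Exception) -> str:
    """Build the user-facing error text for a failed archive lookup."""
    message = str(error)
    # PD2-4-style detector: a case-insensitive substring match, which also
    # fires on security errors whose message starts "Access denied".
    if "access denied" in message.lower():
        return (
            "The requested path doesn't match any loaded archive; "
            "list the available ZIM files and retry with one of them.\n"
            f"**Reason**: {message}"  # P1-D10: keep the original cause visible
        )
    return message
```

Keeping the detector broad but preserving the reason is the lower-risk change: callers on either error path still get the recovery hint, and security rejections remain distinguishable.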

`limit` docstring nudge for atomic intents (T-D1). A live
small-model transcript (Qwen3-8B-Q4) showed the model passing
`limit=5` on a `tell_me_about` query. The pre-fix docstring
said "Max search/browse results (default: 3)", which was silent
about whether `limit` applies to atomic intents. Fix: a docstring
nudge that explicitly enumerates the atomic intents that ignore
`limit` (`tell me about` / `get article` / `show structure` /
`links in` / `articles related to` / `main_page` /
`list_namespaces` / `metadata for` / `list_files`).
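A docstring contract pin can be a plain string check over `__doc__`. The function below is a hypothetical stand-in for the tool signature, not the real `zim_query`; only the intent list and the "default: 3" wording come from the changelog.

```python
ATOMIC_INTENTS = (
    "tell me about", "get article", "show structure", "links in",
    "articles related to", "main_page", "list_namespaces",
    "metadata for", "list_files",
)


def zim_query_sketch(query: str, limit: int = 3) -> str:
    """Answer a natural-language request against the loaded ZIM archives.

    Args:
        query: The request, e.g. "search for biology".
        limit: Max search/browse results (default: 3). Atomic intents
            ignore it: tell me about, get article, show structure,
            links in, articles related to, main_page, list_namespaces,
            metadata for, list_files.
    """
    raise NotImplementedError("only the docstring contract is sketched here")


def test_limit_docstring_names_atomic_intents() -> None:
    doc = " ".join((zim_query_sketch.__doc__ or "").split())  # collapse wraps
    assert "default: 3" in doc
    missing = [intent for intent in ATOMIC_INTENTS if intent not in doc]
    assert not missing, f"limit docstring omits atomic intents: {missing}"
```

Collapsing whitespace before matching keeps the pin stable when the docstring is re-wrapped.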

Testing

54 new regression tests in `tests/test_post_a21_beta_fixes.py`
cover all eleven defects:

- `TestP1D6P1D7TrailingPolitenessExtensions` (29 parametric strip
  cases + word-boundary safety + full-parse integration)
- `TestP1D8SearchTermsRequiredAfterPolitenessStrip` (8 parametric
  guard-fires)
- `TestP1D1DispatcherEdgePolitenessStrip` (buggy-parse-stub
  regression)
- `TestMultiEntityChainGuidance` (8 cases: 3-entity AND/OR/4-entity
  chains, 2-entity guard, real-title suppression via title-index
  probe, search-intent isolation, leading-conjunction split,
  Lions/Tigers/Bears idiom suppression)
- `TestP1D5QEmittingCursorToolsDrift` (3 cases: set value,
  search-encode comment hook, parametric `Cursor.encode` scan)
- `TestP1D9DocstringPathBaitSiblings` (directory-wide scan of
  `tools/*.py`)
- `TestP1D10RecoveryHintMarkerDiscriminatesSecurityError` (1 case:
  surfaces the original `OpenZimMcpSecurityError` reason)
- `TestTD1LimitDocstringClarifiesAtomicIntents` (1 case: docstring
  contract pin)

Full suite: 1954 passed, 50 skipped. mypy clean across all
45 source files.

Release process

After this changelog lands on main, push the `v2.0.0a22` tag
on main to trigger `.github/workflows/release.yml`, which publishes
to PyPI and creates GitHub release notes auto-extracted from the
matching CHANGELOG section.

About cameronrye/openzim-mcp

Modern, secure MCP server for accessing ZIM format knowledge bases offline. Enables AI models to search and navigate Wikipedia, educational content, and other compressed knowledge archives with smart retrieval, caching, and comprehensive API.

Related context

Earlier breaking changes

v2.0.0a15 `_attribute_sections` falls back to the first section when no section brackets the located passage
v2.0.0a13 canonical-splice gate tightened to require exact path equality, fixing H2/H3 surface end-to-end behavior across all shapes.
v2.0.0a11 Exposed `content_offset` as top-level `zim_query` parameter, validated >=0, threaded through options.
v2.0.0a10 `get article M/<key>` now returns ZIM metadata entry rather than aliased C-namespace article body.
v2.0.0a10 `metadata for <file>` returns concise metadata strings instead of full article bodies for new-scheme archives.