This release fixes issues for SREs watching stability and regressions.
✓ No known CVEs patched in this version
Summary
AI summary: Updates Pass-2 source-level audit, P1-D1, and P1-D2 across a mixed release.
Changes in this release
| Type | Severity | Summary | Confidence | CVE |
|---|---|---|---|---|
| Feature | Medium | Soft-connector footer recognises short non-Latin proper nouns as substantive (P1-D3). | high | None |
| Dependency | Medium | Dedupe cursor encode helpers committed in `8745012` for Sonar quality gate. | low | None |
| Bugfix | Medium | `search for X` rejects cross-tool cursors (P1-D1). | high | None |
| Bugfix | Medium | `search X in namespace C` and `links in X` reject cross-tool cursors (P1-D2). | high | None |

Source for all rows: granite4.1:8b-q6_K@2026-05-19
Full changelog
Live-MCP beta sweep against wikipedia_en_all_maxi_2026-02.zim on
the freshly-deployed v2.0.0a19 build. Pass 1 confirmed all six
prior fixes (post-a17 P1-D1/P1-D2/P1-D3 and post-a18 P3-D1/P3-D2/
P1-D4) still work as designed in production, then surfaced three
new user-facing defects. Pass 2 source-level self-audit found
zero new siblings.
All three defects follow the recurring "fixes unlock previously-
broken code paths" pattern: a17's Unicode tail-tokenisation fix
made non-Latin topics REACHABLE; a18's soft-connector alias
fallback + table-dominated subject-attribute fix landed on those
paths; THIS sweep found that the substantiveness filter guarding
the soft-connector footer wasn't Unicode-aware (P1-D3) AND that
the cross-tool cursor guard from a18's P1-D4 hadn't widened to
the search/filtered-search/links siblings (P1-D1, P1-D2 — the
deferred follow-up explicitly flagged by post-a18).
Fixed
- `search for X` rejects cross-tool cursors (P1-D1). A
  `walk_namespace` or `browse_namespace` cursor passed to
  `search for Photosynthesis` previously decoded `s.o=3` into
  `options["offset"]`, and search returned `showing 4-6 of 4237`
  instead of `showing 1-3`. Simple-tools-layer mirror of the
  post-a18 P1-D4 fix that landed for `_handle_browse` /
  `_handle_walk_namespace`. The advanced `search_zim_file` tool
  already enforces tool-binding via
  `Cursor.decode(expected_tool=...)`; this restores the check at
  the simple-tools handler edge with
  `_cursor_tool_mismatch(options, "search_zim_file")` at the
  top of `_handle_search`. User now sees the structured
  `Cursor / Tool Mismatch` rejection before any backend call.
- `search X in namespace C` and `links in X` reject cross-tool
  cursors (P1-D2). Same shape in `_handle_filtered_search`:
  `_cursor_tool_mismatch(options, "search_with_filters")` guard
  added. Defence-in-depth: `_handle_links` hardcodes `offset=0`
  today so the live shape didn't reproduce, but it IS a
  cursor-emitting handler and the guard
  (`_cursor_tool_mismatch(options, "extract_article_links")`)
  keeps the boundary consistent with sibling handlers and
  prevents a future offset-reading change from regressing
  silently. All four `options.get("offset")` sites in
  `simple_tools.py` (`_handle_browse`, `_handle_walk_namespace`,
  `_handle_search`, `_handle_filtered_search`) are now guarded.
- Soft-connector footer recognises short non-Latin proper nouns
  as substantive (P1-D3). `tell me about Berlin and 東京`
  resolved correctly to 東京 (right-promote via a18's Unicode
  tail fix), but the soft-connector footer was silently
  suppressed because `_is_substantive_topic("東京")` returned
  `False`. The ASCII-length-5 heuristic was tuned for English
  particles (`Then` / `Both` / `Here` / `Now`) and didn't
  account for non-Latin scripts where each character carries
  syllable-level lexical weight: `東京` is 2 chars but names
  the capital of Japan; `Köln` is 4 chars but names Germany's
  fourth-largest city. Same shape for `京都` / `北京` / `上海`.
  Fix: keep the original ASCII path (multi-token OR len≥5 OR
  digit-containing), and add a relaxed branch: when the string
  contains a non-ASCII letter, accept at len≥2. ASCII
  abbreviations (`Dr.` / `St.` / `Mt.`) remain rejected because
  they have no non-ASCII characters; single CJK ideograms (`京`)
  remain rejected because of the len≥2 floor. Both the chain
  detector and the soft-connector footer now fire correctly for
  non-Latin halves.
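
The P1-D3 rule described above can be sketched as a pure function. This is a minimal illustration under stated assumptions, not the project's code: the name `is_substantive_topic_sketch` is hypothetical, and the real `_is_substantive_topic` may split tokens, strip input, or detect digits differently.

```python
def is_substantive_topic_sketch(text: str) -> bool:
    """Hypothetical sketch of the substantiveness rule from the notes.

    Original ASCII path: multi-token OR len >= 5 OR digit-containing.
    Relaxed branch: contains a non-ASCII letter AND len >= 2.
    """
    stripped = text.strip()
    if not stripped:
        return False  # empty / whitespace-only input is never a topic

    # Original ASCII path, kept unchanged.
    if len(stripped.split()) > 1:
        return True
    if len(stripped) >= 5:
        return True
    if any(ch.isdigit() for ch in stripped):
        return True

    # Relaxed branch: a short string is accepted when it contains at
    # least one non-ASCII letter, subject to a two-character floor.
    has_non_ascii_letter = any(
        ch.isalpha() and not ch.isascii() for ch in stripped
    )
    return has_non_ascii_letter and len(stripped) >= 2
```

Under this sketch, `東京` and `Köln` are accepted through the relaxed branch, while `Then`, `Dr.`, and the single ideogram `京` are still rejected, which matches the behaviour the notes describe.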
Tests
19 regression tests in `tests/test_post_a19_beta_fixes.py`:
- P1-D1 (4): walk-cursor-to-search rejected;
  browse-cursor-to-search rejected; same-tool search-cursor
  round-trips cleanly; no-cursor passthrough unaffected.
- P1-D2 (3): walk-cursor-to-filtered-search rejected;
  walk-cursor-to-links rejected; filtered-search no-cursor
  passthrough unaffected.
- P1-D3 (12): CJK 2-char accept (`東京` / `北京` / `京都` /
  `上海`); umlaut 4-char accept (`Köln`); ASCII particles still
  rejected (`Then` / `Both` / `Here` / `Now` / `This`);
  abbreviations still rejected (`Dr.` / `St.` / `Mt.` / `Jr.`);
  single CJK char still rejected (`京` / `北`); regression guards
  for ASCII long topics, multi-token, digit topics, empty /
  whitespace; Cyrillic short topic via existing 5-char path +
  relaxed branch; end-to-end soft-connector footer fires with
  CJK dropped half + umlaut dropped half.
Full test suite: 1842 passed, 50 skipped (up from 1823 in a19).
Pass-2 source-level audit (no siblings)
- P1-D1 / P1-D2: all 4 sites in `simple_tools.py` that read
  `options.get("offset", 0)` (`_handle_browse`,
  `_handle_walk_namespace`, `_handle_search`,
  `_handle_filtered_search`) are now guarded by
  `_cursor_tool_mismatch`. `_handle_search_all` and
  `_handle_related` don't read `options["offset"]` at all. No
  siblings remaining.
- P1-D3: `_is_substantive_topic` is called from two sites:
  the chain detector right-promote branch
  (`simple_tools.py:983-984`) and `_soft_connector_footer`
  (`simple_tools.py:1156`). Both benefit from the fix. Searched
  for other `len(stripped) >= N` ASCII-length heuristics across
  `simple_tools.py` / `intent_parser.py` / `title_promotion.py` /
  `synthesize.py`; `intent_parser.py:1012` already has explicit
  Unicode handling for the analogous `_looks_like_topic_ask`
  check. No other ASCII-only thresholds on user-provided strings.
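
The handler-edge guard audited above can be illustrated with a small sketch. Everything structural here is an assumption made for illustration: the base64-JSON cursor layout, the `cursor` option key, and the helper names are hypothetical and do not reflect the project's actual cursor codec or the real signature of `_cursor_tool_mismatch`. Only the tool names and the `Cursor / Tool Mismatch` label are taken from the notes.

```python
import base64
import json
from typing import Any, Dict, Optional


def encode_cursor(tool: str, offset: int) -> str:
    """Hypothetical codec: bind the issuing tool name into the cursor."""
    payload = json.dumps({"t": tool, "s": {"o": offset}}).encode("utf-8")
    return base64.urlsafe_b64encode(payload).decode("ascii")


def decode_cursor(token: str) -> Dict[str, Any]:
    """Inverse of encode_cursor (same hypothetical layout)."""
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))


def cursor_tool_mismatch(
    options: Dict[str, Any], expected_tool: str
) -> Optional[str]:
    """Return a rejection message if the cursor came from another tool."""
    token = options.get("cursor")
    if token is None:
        return None  # no cursor supplied: plain passthrough
    issuing_tool = decode_cursor(token).get("t")
    if issuing_tool != expected_tool:
        return (
            "Cursor / Tool Mismatch: cursor was issued by "
            f"{issuing_tool!r}, not {expected_tool!r}"
        )
    return None


def handle_search(query: str, options: Dict[str, Any]) -> str:
    """Sketch of a handler that checks the guard before reading an offset."""
    mismatch = cursor_tool_mismatch(options, "search_zim_file")
    if mismatch is not None:
        return mismatch  # reject before any backend call
    offset = 0
    if options.get("cursor") is not None:
        offset = decode_cursor(options["cursor"])["s"]["o"]
    return f"search {query!r}: showing {offset + 1}-{offset + 3}"
```

Placing the check first means a foreign cursor's offset is never read, so a `walk_namespace` cursor cannot silently shift a search result window.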
Pass-3 live re-probe deferred following the post-a17 methodology:
the three fixes are narrow handler-edge guards + a pure-function
heuristic with no cross-module contract changes, no cursor codec
/ serialization changes. The 19 mock-based regression tests
cover the exact surfaces a live re-probe would.
PR: #149.
Commits on the sweep branch: cc9eb64 (pass-1 fixes + tests),
8745012 (dedupe cursor encode helpers — Sonar quality gate).
About cameronrye/openzim-mcp
Modern, secure MCP server for accessing ZIM format knowledge bases offline. Enables AI models to search and navigate Wikipedia, educational content, and other compressed knowledge archives with smart retrieval, caching, and comprehensive API.
Earlier breaking changes
- v2.0.0a15 _attribute_sections falls back to first section when no section brackets located passage
- v2.0.0a13 canonical‑splice gate tightened to require exact path equality, fixing H2/H3 surface end‑to‑end behavior across all shapes.
- v2.0.0a11 Exposed `content_offset` as top-level `zim_query` parameter, validated >=0, threaded through options.
- v2.0.0a10 `get article M/<key>` now returns ZIM metadata entry rather than aliased C-namespace article body.
- v2.0.0a10 `metadata for <file>` returns concise metadata strings instead of full article bodies for new-scheme archives.