Skip to content

cameronrye/openzim-mcp

v2.0.0b13 Bugfix

This release fixes issues for SREs watching stability and regressions.

Published 11d MCP Data & Storage
✓ No known CVEs patched
Read the diff → Tool health → What is this tool? →

✓ No known CVEs patched in this version

Topics

kiwix mcp mcp-server openzim zim

ReleasePort's take

Light signal
editorial:auto 11d

The release extends the disambiguation lead‑phrase list with a "may refer also to" variant, addressing silent incorrect answers. It also cleans up annotations and refactors test utilities.

Why it matters: Fixes misinterpretation of ambiguous leads; refactor reduces code redundancy. No measurable gate or trigger specified in the fact text.

Summary

AI summary

Extended disambiguation lead‑phrase list to include "may refer also to", fixing silent wrong answers.

Changes in this release

Bugfix Medium

Extend _DISAMBIG_LEAD_PHRASES with "may refer also to" variant.

Extend _DISAMBIG_LEAD_PHRASES with "may refer also to" variant.

Source: llm_adapter@2026-05-24

Confidence: high

Refactor Medium

Remove redundant string quotes from three annotations in synthesize.py.

Remove redundant string quotes from three annotations in synthesize.py.

Source: llm_adapter@2026-05-24

Confidence: low

Refactor Medium

Extract make_disambig_handler to shared fixture _promote_fixtures.make_disambig_handler.

Extract make_disambig_handler to shared fixture _promote_fixtures.make_disambig_handler.

Source: llm_adapter@2026-05-24

Confidence: low

Other Medium

Add 7 new tests covering disambig lead phrase variants and integration cases.

Add 7 new tests covering disambig lead phrase variants and integration cases.

Source: llm_adapter@2026-05-24

Confidence: low

Full changelog

Post-b12 live-MCP verification confirmed the Z4 multi-token canonical
fix lands cleanly (7/8 historical defects now route correctly) and
the Sub-pattern C disambig rejection works for Lincoln / O'Brien.
One new silent-wrong-answer slipped through:
Shakespeare England plays at v2.0.0b12 still ships Play
(disambig page) at cert=0.85.

Root cause — phrasing variant not in _DISAMBIG_LEAD_PHRASES

_is_disambig_lead runs a trailing-tail endswith check against
the phrase set ("may refer to", "may also refer to"). The
Wikipedia Play disambig template ends its pre-H2 with:

Play may refer also to:

Word order: may-refer-also-to (NOT may-also-refer-to). The
two-phrase set misses this variant, so _is_disambig_lead returns
False, the b12 Sub-pattern C rejection doesn't fire, and the Play
disambig page is served as the tell_me_about answer.

The b11 implementation comment at simple_tools.py:2660 explicitly
anticipated this: "easier to extend with new phrasings if ZIM
exporters ever produce them".

Fix — extend _DISAMBIG_LEAD_PHRASES with the third variant

One-line tuple extension:

_DISAMBIG_LEAD_PHRASES = (
    "may refer to",
    "may also refer to",
    "may refer also to",  # b13 fix: Play-style word order
)

No regex, no backtracking risk, no architectural change. The
trailing-tail endswith check still position-anchors against
false-positives where the phrase appears earlier in the body but
not at the tail.

Verification

Live-MCP probe of all documented preserved cases plus the 8 Z4
defect repros from b11. After b13: Shakespeare England plays
falls to BM25 (Z4 + Sub-pattern C combine to reject Shakespeare's_Kings
AND Play disambig). All other 7 Z4 defects continue routing
correctly (4 to head bios, 3 to tail concepts / BM25). 13/13
preserved cases hold; no regressions.

CodeQL alert #231 — unquote forward refs to TYPE_CHECKING imports

CodeQL's py/unused-import flagged RerankerConfig as unused
in synthesize.py because two annotations used explicit string-
quoting ("Optional[RerankerConfig]") which the static analyzer
treats as opaque string literals rather than deferred forward
references.

Under from __future__ import annotations (line 14 of synthesize.py),
ALL annotations are automatically stringified at runtime — explicit
quotes are redundant and serve only to hide the import usage from
static analyzers. Fix: remove the redundant string quotes from three
annotations (lines 1041 / 1444 / 1538). mypy / runtime behavior
unchanged.

Test dedupe — extract make_disambig_handler to shared fixtures

SonarCloud flagged 6.2% new-code duplication (threshold 3%) because
the b13 sweep's TestPlayDisambigRejection._make_handler was a
copy of b11's TestSubPatternCDisambigRejection._make_handler.
Extracted to tests/_promote_fixtures.make_disambig_handler,
both sweep files now import the shared helper. Same dedup pattern
the post-b8 sweep used when it created _promote_fixtures.py.

Tests

7 new tests in tests/test_post_b12_beta_fixes.py:

  • 5 direct unit tests on _is_disambig_lead covering all three
    phrase variants + Play-style full pre-H2 + false-positive defense.
  • 2 integration tests: Shakespeare England plays (multi-token →
    BM25 fallback) and tell me about Play (bare-head → preserve
    disambig).
2562 passed, 54 skipped (full suite, ~28s)

mypy / black / flake8 / pip-audit all clean.

Methodology — "fix unlocks new paths" 20 sweeps strong

Smallest sweep since b6 — one-line phrase extension. The b11
Sub-pattern C rejection architecture was solid; only the underlying
detection primitive needed a phrase variant added. This is the
"easy to extend" promise of the b11 design paying off.

Weekly OSS security release digest.

The CVE patches and breaking changes that affected production tools this week. One email, every Sunday.

No spam, unsubscribe anytime.

Share this release

Track cameronrye/openzim-mcp

Get notified when new releases ship.

Sign up free

About cameronrye/openzim-mcp

Modern, secure MCP server for accessing ZIM format knowledge bases offline. Enables AI models to search and navigate Wikipedia, educational content, and other compressed knowledge archives with smart retrieval, caching, and comprehensive API.

All releases →

Related context

Earlier breaking changes

  • v2.0.0a15 _attribute_sections falls back to first section when no section brackets located passage
  • v2.0.0a13 canonical‑splice gate tightened to require exact path equality, fixing H2/H3 surface end‑to‑end behavior across all shapes.
  • v2.0.0a11 Exposed `content_offset` as top-level `zim_query` parameter, validated >=0, threaded through options.
  • v2.0.0a10 `get article M/<key>` now returns ZIM metadata entry rather than aliased C-namespace article body.
  • v2.0.0a10 `metadata for <file>` returns concise metadata strings instead of full article bodies for new-scheme archives.

Beta — feedback welcome: [email protected]