cameronrye/openzim-mcp

v2.0.0a9 Breaking

This release includes 1 breaking change that platform teams should review before upgrading.

Published 2mo MCP Data & Storage

✓ No known CVEs patched

Topics

kiwix mcp mcp-server openzim zim

Affected surfaces

auth breaking_upgrade

ReleasePort's take

Light signal

editorial:auto 2mo

The v2.0.0a9 release fixes cache accounting bugs and adds clearer search error messages.

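Why it matters: Upgrade to v2.0.0a9 if you run the cache with persistence enabled. Before this release, the max_bytes budget was silently ignored after a warm start, so eviction did not fire until new writes alone crossed the cap.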

Summary

AI summary

Cache accounting fixes prevent silent eviction bypass and improve search error handling.

Changes in this release

Type	Severity	Summary	Source	Confidence	CVE
Breaking	Medium	HTTP rate-limiter client_id now derived from token or IP; defaults to "default" fallback.	llm_adapter@2026-05-21	low	none
Feature	Medium	get_related_articles response includes scan_truncated, scan_total_internal, scan_limit and _meta.reason when cap fires.	llm_adapter@2026-05-21	high	none
Performance	Medium	Removed unnecessary JSON parse outside lock in _load_from_disk, now inside critical section.	llm_adapter@2026-05-21	low	none
Performance	Medium	Removed three mypy errors by narrowing types and adding explicit casts.	llm_adapter@2026-05-21	low	none
Performance	Medium	Moved JSON parsing inside the _load_from_disk lock critical section to avoid race conditions.	granite4.1:30b@2026-05-23-audit	low	none
Deprecation	Medium	openzim_mcp.types module removed; consumers must use openzim_mcp.tool_schemas.	llm_adapter@2026-05-21	low	none
Bugfix	Medium	create_snippet no longer collapses to bare "..." on leading-highlight truncation.	llm_adapter@2026-05-21	high	none
Bugfix	Medium	render_search_all emits distinct error hint when all archives errored.	llm_adapter@2026-05-21	high	none
Bugfix	Medium	_restore_entry now updates _total_bytes symmetrically with set and _remove.	llm_adapter@2026-05-21	low	none
Bugfix	Medium	_load_from_disk enforces max_size or max_bytes against loaded snapshot after load.	llm_adapter@2026-05-21	low	none
Refactor	Medium	Deleted dead module alpha clean break per v2 plan.	llm_adapter@2026-05-21	low	none
Refactor	Medium	Deleted openzim_mcp/types module and its test suite as dead code.	llm_adapter@2026-05-21	low	none
Refactor	Low	Deleted dead code module (plus its test suite) as part of the v2 clean break plan.	granite4.1:30b@2026-05-23-audit	low	none

Full changelog

Follow-up review wave after the post-a8 batch (commit d3e310e). Four
parallel code-reviewer agents covered Phases A/B/C plus cross-cutting
concerns. Of the 13 findings examined, 8 were withdrawn after a closer
read (the suspected bug was either already correct or by-design) and 5
were real defects. The 4 items the post-a8 batch explicitly deferred
("bigger than this batch") are now closed.

Net: 1420 tests pass (11 new red-green-verified regression tests),
50 skipped. One module + its test suite deleted as dead code — alpha
clean break per v2 plan.

Fixed — Critical (post-a9)

A1: cache _restore_entry skipped _total_bytes accounting.
After a warm-start with persistence enabled, max_bytes eviction
read zero for _total_bytes — the while self._total_bytes > max_bytes
loop in set() never fired even on a snapshot that already
exceeded the configured cap. The byte budget was silently
inoperative across every restart until enough new sets accumulated
to cross the threshold from zero. Now _restore_entry updates
_total_bytes += entry.size_bytes symmetrically with set() and
_remove().
A2: cache _load_from_disk did not enforce max_size or
max_bytes against the loaded snapshot. Operators tightening
caps between restarts saw the prior caps until eviction was
triggered by new sets. Added a post-load eviction pass using the
same LRU heap set() maintains.
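
The two cache fixes above can be sketched with a toy cache. This is a minimal illustration under stated assumptions, not the project's implementation: the class name ByteBudgetCache, the Entry dataclass, and the load_snapshot and _evict_to_caps helpers are hypothetical, and an OrderedDict stands in for the LRU heap the changelog mentions. Only the names set, _remove, _restore_entry, _total_bytes, max_size and max_bytes come from the notes.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Entry:
    value: Any
    size_bytes: int


class ByteBudgetCache:
    """Toy LRU cache with entry-count and byte caps (illustrative only)."""

    def __init__(self, max_size: int, max_bytes: int) -> None:
        self.max_size = max_size
        self.max_bytes = max_bytes
        # Insertion order doubles as recency order (oldest first).
        self._entries: "OrderedDict[str, Entry]" = OrderedDict()
        self._total_bytes = 0

    def _remove(self, key: str) -> None:
        entry = self._entries.pop(key)
        self._total_bytes -= entry.size_bytes

    def _evict_to_caps(self) -> None:
        # Drop the oldest entries until both caps hold.
        while self._entries and (
            len(self._entries) > self.max_size
            or self._total_bytes > self.max_bytes
        ):
            self._remove(next(iter(self._entries)))

    def set(self, key: str, value: Any, size_bytes: int) -> None:
        if key in self._entries:
            self._remove(key)
        self._entries[key] = Entry(value, size_bytes)
        self._total_bytes += size_bytes
        self._evict_to_caps()

    def _restore_entry(self, key: str, entry: Entry) -> None:
        # A1: restored entries must count toward the byte budget,
        # mirroring the bookkeeping in set() and _remove().
        self._entries[key] = entry
        self._total_bytes += entry.size_bytes

    def load_snapshot(self, snapshot: Dict[str, Entry]) -> None:
        # Assumes a freshly constructed cache with no overlapping keys.
        for key, entry in snapshot.items():
            self._restore_entry(key, entry)
        # A2: apply the currently configured caps to the loaded data.
        self._evict_to_caps()


cache = ByteBudgetCache(max_size=10, max_bytes=100)
cache.load_snapshot(
    {"a": Entry("x", 60), "b": Entry("y", 60), "c": Entry("z", 30)}
)
print(list(cache._entries), cache._total_bytes)  # ['b', 'c'] 90
```

Without the increment in _restore_entry, _total_bytes would stay at zero after the load and the byte cap would not trigger until new writes crossed it on their own, which is the behavior A1 describes.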

Fixed — Medium (post-a9)

A3: create_snippet collapsed to bare "..." on leading-highlight
truncation. When the post-highlight slice began with ** at
position 0 (an unpaired marker landing inside the first highlighted
term), sliced[:0] produced "" and the caller saw a content-free
ellipsis. Now drops the orphan ** marker and keeps the term text.
A4: render_search_all blamed the query when every archive
errored. files_with_hits == 0 emitted "Try suggestions for X"
prose for both "no matches" and "all archives failed" cases, sending
the model to chase a query-correction fix for a server-side problem.
Now branches on files_failed >= files_searched and emits a
targeted "all archives errored" hint.
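
The A4 branch can be illustrated as follows. This is a sketch assuming a simplified signature: the function name search_all_hint and the hint wording are hypothetical, and the files_searched > 0 guard is an addition of this sketch. Only the counters files_with_hits, files_failed and files_searched come from the changelog.

```python
def search_all_hint(
    query: str,
    files_searched: int,
    files_with_hits: int,
    files_failed: int,
) -> str:
    """Pick a hint for a multi-archive search that returned nothing."""
    if files_with_hits > 0:
        return ""  # There are results, so no hint is needed.
    # Guard against an empty search set (an assumption of this sketch).
    if files_searched > 0 and files_failed >= files_searched:
        # Server-side problem: rewording the query will not help.
        return "All archives errored; check the server, not the query."
    return f"No matches. Try suggestions for {query!r}."


print(search_all_hint("photosynthesis", 3, 0, 3))
print(search_all_hint("photosynthesis", 3, 0, 1))
```

The point of the split is that the two zero-hit cases call for different follow-ups: one is a query problem, the other is not.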

Added — Opportunity (post-a9)

A5: get_related_articles surfaces scan-truncation signal. Hub
articles ("List of …", "Index of …") routinely carry 1000–5000
internal links; the underlying extract_article_links_data was
called with limit=500 and the frequency rank was operating on a
document-head-biased sample with no signal to callers. Response now
carries optional scan_truncated / scan_total_internal /
scan_limit and _meta.reason="scan_truncated" when the cap fired.
Added to the RelatedArticlesResponse TypedDict in tool_schemas.py.
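
A caller might consume the new keys roughly like this. The sketch treats the response as a plain dict and assumes scan_truncated is a boolean and scan_total_internal is the article's full internal-link count; the helper name describe_scan and the sample values are hypothetical.

```python
from typing import Any, Dict, Optional


def describe_scan(response: Dict[str, Any]) -> Optional[str]:
    """Return a caveat when the related-articles scan hit its cap."""
    # The keys are optional, so older-shaped responses simply yield None.
    if not response.get("scan_truncated"):
        return None
    limit = response.get("scan_limit")
    total = response.get("scan_total_internal")
    reason = response.get("_meta", {}).get("reason")
    return (
        f"ranked from the first {limit} of {total} internal links "
        f"(reason: {reason})"
    )


sample = {
    "scan_truncated": True,
    "scan_total_internal": 3200,
    "scan_limit": 500,
    "_meta": {"reason": "scan_truncated"},
}
print(describe_scan(sample))
# ranked from the first 500 of 3200 internal links (reason: scan_truncated)
print(describe_scan({}))  # None
```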

Deferred items resolved (post-a9)

D1 (cross-cutting H1): HTTP rate-limiter client_id always
"default". Every check_rate_limit call across tools/*.py
passed no client_id, so the per-(client_id, operation) bucket
infrastructure was dead in HTTP mode — one aggressive caller could
exhaust the global bucket for everyone. Added
openzim_mcp/request_context.py with a ContextVar[str];
BearerTokenAuthMiddleware derives client_id from the presented
token ("bearer:<sha256-8>") or remote IP ("ip:<host>") and sets
the context var on every request; check_rate_limit reads the var
when client_id=None (the default at every tool call site). Stdio
transport has no middleware so the ContextVar reads its "default"
fallback — single-bucket behavior preserved. No tool call sites
changed.
D2 (cross-cutting H3): _load_from_disk JSON parse moved inside
the _lock critical section. The prior window (file open +
json.load outside the lock, restore inside) was narrow — only
__init__-time threads could race — but a foreign-thread regression
probe now verifies the lock is held during open(). Single brief
startup blocking window, no contention in production.
D3 (Phase B HIGH-4): openzim_mcp/types.py + tests/test_types.py
deleted. The module last shipped in v1.0.0; its TypedDicts
(SearchResponse with total_results / has_more, NamespaceInfo
with entry_count / has_more / offset / limit) contradicted
the live Phase B contract in tool_schemas.py. Only the test file
imported from it (32 tests pinning dead code). Removed both —
v2 alpha allows clean breaks per the v2 plan.
D4: 3 pre-existing mypy errors fixed.
content_processor._cell_belongs_to_infobox narrowed via intermediate
node_bound: Tag local so the closure default carries the post-guard type;
simple_tools._splice_title_match_into_search call site added
explicit cast(SearchResponse, ...) / cast(Dict[str, Any], ...)
bridges between the TypedDict and the splice helper signature.

Withdrawn findings (post-a9, 8)

After verification each was either correct as-written or by-design:

browse_namespace sampled-cache poisoning: the underlying per-namespace listing is cached separately by archive_stat_token, so per-page responses are deterministic after the first call.
bundle parent_stack not popped for dropped sections: the pop loop is level-relative and correctly handles dropped sections.
synthesize outer / _meta total_chars divergence: by intentional design (outer = answer length, _meta = pre-cap chars).
heading regex mandatory space: html2text always emits the space.
_find_entry_typo_fallback extra_probes cap overshoot: the cap holds; initial analysis was wrong.
cursor ns field bypasses sanitize_input: sanitize_input IS called on the post-cursor namespace value at the tool layer.
_walk_new_scheme_metadata missing ai field: only fires when validated_path=None, which does not happen in production.
synthesize.fallback_used semantics on empty hits: the TypedDict's Literal constraint precludes a more accurate value.

Wire-format / surface changes

openzim_mcp.types module removed. Any external consumer importing from openzim_mcp.types must move to openzim_mcp.tool_schemas. The v1 shapes (total_results / has_more) are gone; the v2 Phase B shapes (total / done / next_cursor / page_info) are authoritative.
get_related_articles response gains optional keys. scan_truncated, scan_total_internal, scan_limit plus _meta.reason="scan_truncated" when the 500-link scan cap fired. Existing callers that ignore the new keys see no behavior change.
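
For consumers moving off the v1 has_more shape, a cursor-driven loop over the v2 keys might look like the sketch below. Everything here apart from the done and next_cursor key names is an assumption: the changelog does not define their semantics, so the sketch treats done=True or a missing next_cursor as the end of the listing, uses a hypothetical "results" key for the page payload, and substitutes a stand-in fake_fetch for a real tool call.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_results(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Any]:
    """Yield items page by page until the server reports completion."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        # "results" is a hypothetical payload key for this sketch.
        yield from page.get("results", [])
        cursor = page.get("next_cursor")
        # Assumed semantics: stop on done=True or when no cursor is given.
        if page.get("done", True) or cursor is None:
            break


def fake_fetch(cursor: Optional[str]) -> Page:
    """Stand-in for a paginated tool call (not part of openzim_mcp)."""
    pages: Dict[Optional[str], Page] = {
        None: {"results": [1, 2], "done": False, "next_cursor": "c1"},
        "c1": {"results": [3], "done": True, "next_cursor": None},
    }
    return pages[cursor]


print(list(iter_results(fake_fetch)))  # [1, 2, 3]
```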

Breaking Changes

Removed the `openzim_mcp/types.py` module; import from `openzim_mcp.tool_schemas` instead.

About cameronrye/openzim-mcp

Modern, secure MCP server for accessing ZIM format knowledge bases offline. Enables AI models to search and navigate Wikipedia, educational content, and other compressed knowledge archives with smart retrieval, caching, and comprehensive API.

Earlier breaking changes

v2.0.0a15 _attribute_sections falls back to first section when no section brackets located passage
v2.0.0a13 canonical‑splice gate tightened to require exact path equality, fixing H2/H3 surface end‑to‑end behavior across all shapes.
v2.0.0a11 Exposed `content_offset` as top-level `zim_query` parameter, validated >=0, threaded through options.
v2.0.0a10 `get article M/<key>` now returns ZIM metadata entry rather than aliased C-namespace article body.
v2.0.0a10 `metadata for <file>` returns concise metadata strings instead of full article bodies for new-scheme archives.