This release includes 1 breaking change for platform teams planning a safe upgrade.
✓ No known CVEs patched in this version
ReleasePort's take
Light signal: The v2.0.0a9 release fixes cache accounting bugs and adds clearer search error messages.
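Why it matters: Upgrade to v2.0.0a9 promptly if you run the cache with persistence enabled; before this release the `max_bytes` budget was silently ignored after a restart until new writes accumulated past the cap.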
Summary
AI summary: Cache accounting fixes prevent silent eviction bypass and improve search error handling.
Changes in this release
| Type | Severity | Summary | Source | Confidence | CVE |
|---|---|---|---|---|---|
| Breaking | Medium | HTTP rate-limiter `client_id` now derived from token or IP; defaults to `"default"` fallback. | llm_adapter@2026-05-21 | low | None |
| Feature | Medium | `get_related_articles` response includes `scan_truncated`, `scan_total_internal`, `scan_limit` and `_meta.reason` when cap fires. | llm_adapter@2026-05-21 | high | None |
| Performance | Medium | Removed unnecessary JSON parse outside lock in `_load_from_disk`, now inside critical section. | llm_adapter@2026-05-21 | low | None |
| Performance | Medium | Removed three mypy errors by narrowing types and adding explicit casts. | llm_adapter@2026-05-21 | low | None |
| Performance | Medium | Moved JSON parsing inside the `_load_from_disk` lock critical section to avoid race conditions. | granite4.1:30b@2026-05-23-audit | low | None |
| Deprecation | Medium | `openzim_mcp.types` module removed; consumers must use `openzim_mcp.tool_schemas`. | llm_adapter@2026-05-21 | low | None |
| Bugfix | Medium | `create_snippet` no longer collapses to bare `"..."` on leading-highlight truncation. | llm_adapter@2026-05-21 | high | None |
| Bugfix | Medium | `render_search_all` emits distinct error hint when all archives errored. | llm_adapter@2026-05-21 | high | None |
| Bugfix | Medium | `_restore_entry` now updates `_total_bytes` symmetrically with `set` and `_remove`. | llm_adapter@2026-05-21 | low | None |
| Bugfix | Medium | `_load_from_disk` enforces `max_size` or `max_bytes` against loaded snapshot after load. | llm_adapter@2026-05-21 | low | None |
| Refactor | Medium | Deleted dead module (alpha clean break per v2 plan). | llm_adapter@2026-05-21 | low | None |
| Refactor | Medium | Deleted `openzim_mcp/types` module and its test suite as dead code. | llm_adapter@2026-05-21 | low | None |
| Refactor | Low | Deleted dead code module (plus its test suite) as part of the v2 clean break plan. | granite4.1:30b@2026-05-23-audit | low | None |
Full changelog
Follow-up review wave after the post-a8 batch (commit d3e310e). 4
parallel code-reviewer agents covered Phases A/B/C plus cross-cutting
concerns; 13 findings were verified, 8 were withdrawn after closer
read (the suspected bug was either already correct or by-design), 5
were real defects. The 4 items the post-a8 batch explicitly deferred
("bigger than this batch") are now closed.
Net: 1420 tests pass (11 new red-green-verified regression tests),
50 skipped. One module + its test suite deleted as dead code — alpha
clean break per v2 plan.
Fixed — Critical (post-a9)
- A1: cache `_restore_entry` skipped `_total_bytes` accounting. After a warm-start with persistence enabled, `max_bytes` eviction read zero for `_total_bytes`: the `while self._total_bytes > max_bytes` loop in `set()` never fired even on a snapshot that already exceeded the configured cap. The byte budget was silently inoperative across every restart until enough new sets accumulated to cross the threshold from zero. Now `_restore_entry` updates `_total_bytes += entry.size_bytes` symmetrically with `set()` and `_remove()`.
- A2: cache `_load_from_disk` did not enforce `max_size` or `max_bytes` against the loaded snapshot. Operators tightening caps between restarts saw the prior caps until eviction was triggered by new sets. Added a post-load eviction pass using the same LRU heap `set()` maintains.
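The two cache fixes above can be illustrated with a toy model. This is a minimal sketch, not the project's implementation: the class name `ByteBudgetCache`, the `Entry` dataclass, `load_snapshot`, and `_evict_to_caps` are hypothetical, and an `OrderedDict` stands in for the LRU heap the changelog mentions. Only `_restore_entry`, `_remove`, `set`, `_total_bytes`, `max_size`, and `max_bytes` are names taken from the notes.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Entry:
    value: Any
    size_bytes: int


class ByteBudgetCache:
    """Toy LRU cache with an entry cap and a byte cap (illustrative only)."""

    def __init__(self, max_size: int, max_bytes: int) -> None:
        self.max_size = max_size
        self.max_bytes = max_bytes
        # Insertion order doubles as recency order in this simplified model.
        self._entries: "OrderedDict[str, Entry]" = OrderedDict()
        self._total_bytes = 0

    def _remove(self, key: str) -> None:
        entry = self._entries.pop(key)
        self._total_bytes -= entry.size_bytes

    def _evict_to_caps(self) -> None:
        # Drop the oldest entries until both caps hold.
        while self._entries and (
            len(self._entries) > self.max_size
            or self._total_bytes > self.max_bytes
        ):
            self._remove(next(iter(self._entries)))

    def set(self, key: str, value: Any, size_bytes: int) -> None:
        if key in self._entries:
            self._remove(key)
        self._entries[key] = Entry(value, size_bytes)
        self._total_bytes += size_bytes
        self._evict_to_caps()

    def _restore_entry(self, key: str, entry: Entry) -> None:
        # A1: restoring counts bytes, symmetric with set() and _remove().
        self._entries[key] = entry
        self._total_bytes += entry.size_bytes

    def load_snapshot(self, snapshot: Dict[str, Entry]) -> None:
        for key, entry in snapshot.items():
            self._restore_entry(key, entry)
        # A2: enforce the *current* caps against what was just loaded.
        self._evict_to_caps()


# A 150-byte snapshot loaded under a 100-byte cap: the oldest entry is evicted.
cache = ByteBudgetCache(max_size=10, max_bytes=100)
cache.load_snapshot({"a": Entry("x", 60), "b": Entry("y", 60), "c": Entry("z", 30)})
```

Without the `+=` in `_restore_entry`, `_total_bytes` would stay at zero after the load and the eviction loop would have nothing to act on, which is the failure mode A1 describes.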
Fixed — Medium (post-a9)
- A3: `create_snippet` collapsed to bare `"..."` on leading-highlight truncation. When the post-highlight slice began with `**` at position 0 (an unpaired marker landing inside the first highlighted term), `sliced[:0]` produced `""` and the caller saw a content-free ellipsis. Now drops the orphan `**` marker and keeps the term text.
- A4: `render_search_all` blamed the query when every archive errored. `files_with_hits == 0` emitted "Try `suggestions for X`" prose for both "no matches" and "all archives failed" cases, sending the model to chase a query-correction fix for a server-side problem. Now branches on `files_failed >= files_searched` and emits a targeted "all archives errored" hint.
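The A4 branching can be sketched as follows. The function name `search_all_hint` and the hint wording are hypothetical; only the counters `files_with_hits`, `files_failed`, and `files_searched` and the `files_failed >= files_searched` comparison come from the notes. The extra `files_searched > 0` guard is an assumption added here so that an empty archive list is not reported as a failure.

```python
def search_all_hint(
    query: str, files_searched: int, files_failed: int, files_with_hits: int
) -> str:
    """Return a follow-up hint for a multi-archive search (illustrative only)."""
    if files_with_hits > 0:
        return ""  # results exist, no hint needed
    # Assumption: treat "nothing searched" as a no-match case, not a failure.
    if files_searched > 0 and files_failed >= files_searched:
        return "All archives errored; this is a server-side problem, not a query problem."
    return f"No matches. Try `suggestions for {query}`."
```

Separating the two zero-hit cases keeps a caller from rewriting a query that was never actually evaluated.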
Added — Opportunity (post-a9)
- A5: `get_related_articles` surfaces scan-truncation signal. Hub articles ("List of …", "Index of …") routinely carry 1000–5000 internal links; the underlying `extract_article_links_data` was called with `limit=500` and the frequency rank was operating on a document-head-biased sample with no signal to callers. Response now carries optional `scan_truncated` / `scan_total_internal` / `scan_limit` and `_meta.reason="scan_truncated"` when the cap fired. Added to the `RelatedArticlesResponse` TypedDict in `tool_schemas.py`.
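A rough sketch of how a capped scan could attach the truncation signal described in A5. The function `build_related_response`, the `related` key, and the `top_n` parameter are hypothetical; the optional keys and the `_meta.reason` value are the ones named in the notes. The real response shape lives in `RelatedArticlesResponse` and may differ.

```python
from collections import Counter
from typing import Any, Dict, List


def build_related_response(
    internal_links: List[str], scan_limit: int = 500, top_n: int = 3
) -> Dict[str, Any]:
    """Rank links by frequency over a capped scan and flag truncation."""
    scanned = internal_links[:scan_limit]  # document-head sample when capped
    ranked = [path for path, _ in Counter(scanned).most_common(top_n)]
    response: Dict[str, Any] = {"related": ranked}  # "related" is a made-up key
    if len(internal_links) > scan_limit:
        # Optional keys appear only when the cap actually fired.
        response["scan_truncated"] = True
        response["scan_total_internal"] = len(internal_links)
        response["scan_limit"] = scan_limit
        response["_meta"] = {"reason": "scan_truncated"}
    return response
```

Emitting the keys only on truncation matches the notes' statement that callers ignoring them see no behavior change.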
Deferred items resolved (post-a9)
- D1 (cross-cutting H1): HTTP rate-limiter `client_id` always `"default"`. Every `check_rate_limit` call across `tools/*.py` passed no `client_id`, so the per-(client_id, operation) bucket infrastructure was dead in HTTP mode: one aggressive caller could exhaust the global bucket for everyone. Added `openzim_mcp/request_context.py` with a `ContextVar[str]`; `BearerTokenAuthMiddleware` derives client_id from the presented token (`"bearer:<sha256-8>"`) or remote IP (`"ip:<host>"`) and sets the context var on every request; `check_rate_limit` reads the var when `client_id=None` (the default at every tool call site). Stdio transport has no middleware so the ContextVar reads its `"default"` fallback, preserving single-bucket behavior. No tool call sites changed.
- D2 (cross-cutting H3): `_load_from_disk` JSON parse moved inside the `_lock` critical section. The prior window (file open + `json.load` outside the lock, restore inside) was narrow, since only `__init__`-time threads could race, but a foreign-thread regression probe now verifies the lock is held during `open()`. Single brief startup blocking window, no contention in production.
- D3 (Phase B HIGH-4): `openzim_mcp/types.py` + `tests/test_types.py` deleted. The module last shipped in v1.0.0; its TypedDicts (`SearchResponse` with `total_results` / `has_more`, `NamespaceInfo` with `entry_count` / `has_more` / `offset` / `limit`) contradicted the live Phase B contract in `tool_schemas.py`. Only the test file imported from it (32 tests pinning dead code). Removed both; v2 alpha allows clean breaks per the v2 plan.
- D4: 3 pre-existing mypy errors fixed. `content_processor._cell_belongs_to_infobox` narrowed via intermediate `node_bound: Tag` local so the closure default carries the post-guard type; `simple_tools._splice_title_match_into_search` call site added explicit `cast(SearchResponse, ...)` / `cast(Dict[str, Any], ...)` bridges between the TypedDict and the splice helper signature.
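The D1 mechanism can be sketched with the standard-library `contextvars` module. The names `current_client_id`, `derive_client_id`, and `resolve_client_id` are hypothetical, and reading `<sha256-8>` as the first eight hex characters of the token's SHA-256 digest is an assumption; the notes do not spell out the exact encoding.

```python
import hashlib
from contextvars import ContextVar
from typing import Optional

# Falls back to "default" when no middleware has set a value (e.g. stdio).
current_client_id: ContextVar[str] = ContextVar("client_id", default="default")


def derive_client_id(token: Optional[str], remote_host: Optional[str]) -> str:
    """Build a per-caller id from a bearer token, else from the remote host."""
    if token:
        # Assumption: "<sha256-8>" = first 8 hex chars of the SHA-256 digest.
        digest = hashlib.sha256(token.encode("utf-8")).hexdigest()[:8]
        return f"bearer:{digest}"
    if remote_host:
        return f"ip:{remote_host}"
    return "default"


def resolve_client_id(client_id: Optional[str] = None) -> str:
    """What a rate limiter would use: explicit id first, else the context var."""
    return client_id if client_id is not None else current_client_id.get()


# A middleware would set the var per request and reset it afterwards.
reset_token = current_client_id.set(derive_client_id(None, "10.0.0.1"))
try:
    per_request_id = resolve_client_id()
finally:
    current_client_id.reset(reset_token)
```

Because the value travels through the context rather than through function arguments, existing call sites that pass no `client_id` pick up per-caller buckets without being edited, which is consistent with the "no tool call sites changed" remark.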
Withdrawn findings (post-a9, 8)
After verification each was either correct as-written or by-design:
- browse_namespace sampled-cache poisoning: the underlying per-namespace listing is cached separately by archive_stat_token, so per-page responses are deterministic after the first call.
- bundle parent_stack not popped for dropped sections: the pop loop is level-relative, correctly handles dropped sections.
- synthesize outer / `_meta` total_chars divergence: by intentional design (outer = answer length, `_meta` = pre-cap chars).
- heading regex mandatory space: html2text always emits the space.
- `_find_entry_typo_fallback` extra_probes cap overshoot: the cap holds; initial analysis was wrong.
- cursor `ns` field bypasses `sanitize_input`: `sanitize_input` IS called on the post-cursor namespace value at the tool layer.
- `_walk_new_scheme_metadata` missing `ai` field: only fires when `validated_path=None`, which does not happen in production.
- `synthesize.fallback_used` semantics on empty hits: the TypedDict's `Literal` constraint precludes a more accurate value.
Wire-format / surface changes
- `openzim_mcp.types` module removed. Any external consumer importing from `openzim_mcp.types` must move to `openzim_mcp.tool_schemas`. The v1 shapes (`total_results` / `has_more`) are gone; the v2 Phase B shapes (`total` / `done` / `next_cursor` / `page_info`) are authoritative.
- `get_related_articles` response gains optional keys. `scan_truncated`, `scan_total_internal`, `scan_limit` plus `_meta.reason="scan_truncated"` when the 500-link scan cap fired. Existing callers that ignore the new keys see no behavior change.
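On the consumer side, the new keys are optional, so a client can read them defensively. A small sketch, assuming the response arrives as a plain dictionary; the helper name `describe_scan` and its wording are hypothetical.

```python
from typing import Any, Dict


def describe_scan(response: Dict[str, Any]) -> str:
    """Summarize whether a get_related_articles result came from a capped scan."""
    if not response.get("scan_truncated", False):
        return "full scan"
    total = response.get("scan_total_internal")
    limit = response.get("scan_limit")
    return f"partial scan: first {limit} of {total} internal links"
```

Using `.get()` with a default keeps the same code working against responses from earlier alphas that never include these keys.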
Breaking Changes
- Removed the `openzim_mcp/types.py` module; import from `openzim_mcp.tool_schemas` instead.
About cameronrye/openzim-mcp
Modern, secure MCP server for accessing ZIM format knowledge bases offline. Enables AI models to search and navigate Wikipedia, educational content, and other compressed knowledge archives with smart retrieval, caching, and comprehensive API.
Earlier breaking changes
- v2.0.0a15 _attribute_sections falls back to first section when no section brackets located passage
- v2.0.0a13 canonical‑splice gate tightened to require exact path equality, fixing H2/H3 surface end‑to‑end behavior across all shapes.
- v2.0.0a11 Exposed `content_offset` as top-level `zim_query` parameter, validated >=0, threaded through options.
- v2.0.0a10 `get article M/<key>` now returns ZIM metadata entry rather than aliased C-namespace article body.
- v2.0.0a10 `metadata for <file>` returns concise metadata strings instead of full article bodies for new-scheme archives.