This release adds 2 notable features for engineering teams evaluating rollout.
✓ No known CVEs patched in this version
Summary
AI summary: Canonical splice now requires exact path equality, fixing missing canonical articles like Berlin.
Changes in this release
| Type | Severity | Summary | CVE |
|---|---|---|---|
| Breaking | Medium | Canonical-splice gate tightened to require exact path equality, fixing H2/H3 surface end-to-end behavior across all shapes. Source: llm_adapter@2026-05-21. Confidence: high. | — |
| Feature | Medium | Block-cell joiner now emits `"; "` between block boundaries for distinct item tokenisation by downstream LLMs. Source: llm_adapter@2026-05-21. Confidence: low. | — |
| Feature | Medium | Chained-intent splitter detects bare-topic chains like `Biology; Chemistry` and continuation prefixes like `tell me about X and then about Y`. Source: granite4.1:30b@2026-05-22-audit. Confidence: low. | — |
| Bugfix | Medium | `search for Berlin in namespace C` now returns canonical `Berlin` at rank #1. Source: llm_adapter@2026-05-21. Confidence: high. | — |
| Bugfix | Medium | Orphan-bullet sub-rows consistently anchored to section parent. Source: llm_adapter@2026-05-21. Confidence: high. | — |
| Bugfix | Medium | `list_namespaces` reports M=12, matching `walk namespace M` and `metadata for`. Source: llm_adapter@2026-05-21. Confidence: high. | — |
| Bugfix | Medium | `Biology; Chemistry` detected as chained query, no longer resolves to journal title. Source: llm_adapter@2026-05-21. Confidence: high. | — |
| Bugfix | Medium | Infobox cells with `<br>`-separated values render with `"; "` separator between block boundaries. Source: llm_adapter@2026-05-21. Confidence: high. | — |
| Bugfix | Medium | `tell me about X and then about Y` detected as chained query. Source: llm_adapter@2026-05-21. Confidence: low. | — |
| Bugfix | Medium | L2 chained-intent trim handles both orphan connectors and trailing punctuation. Source: llm_adapter@2026-05-21. Confidence: low. | — |
| Bugfix | Medium | Legacy not-found responses structured with intent telemetry and recovery commands for `show structure of`, `summary of`, `get article`, `links in`. Source: llm_adapter@2026-05-21. Confidence: low. | — |
Full changelog
A three-pass beta test of v2.0.0a12 was run against the same 118 GB
Wikipedia ZIM (Feb 2026 snapshot) that the a8 → a12 cuts targeted, via
the simple-mode `zim_query` MCP surface. The defect count per pass
continues to shrink across the alpha series (a10: 22+6+3, a11: 11+3+1,
a12: ~6+2+0 across the same three-pass shape): the first pass surfaced
six defects, the second pass two structural gaps, and the third pass
nothing new.
The single most user-visible defect was `search for Berlin in namespace C`
rendering `List_of_songs_about_Berlin` at rank #1 with the canonical
`Berlin` article absent. The H2 canonical-splice gate short-circuited to
the legacy `search_with_filters` whenever the top BM25 hit
token-prefix-matched the topic: `is_strong_title_match` returns True for
any candidate that extends the topic (`Berlin_(disambiguation)` extends
`Berlin`), so the splice never fired for new-scheme archives that have a
disambig page for the topic. Tightening the gate to require exact path
equality fixes the H2/H3 surface end-to-end for every shape, not just
the case the a12 third-pass self-audit addressed.
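The difference between the two gates can be illustrated with a minimal sketch. It is not the project's code: `token_prefix_match` is a hypothetical stand-in for the prefix behaviour described above (the real `is_strong_title_match` takes more arguments and may do more), and `exact_path_gate` simply mirrors the path-equality check named in this release.

```python
import re


def _tokens(path: str) -> list[str]:
    # Split a path such as "Berlin_(disambiguation)" into lowercase word tokens.
    return re.findall(r"[a-z0-9]+", path.lower())


def token_prefix_match(topic: str, candidate_path: str) -> bool:
    """Hypothetical old-style gate: True when the candidate's tokens start
    with the topic's tokens, so any path that extends the topic matches."""
    topic_tokens = _tokens(topic)
    candidate_tokens = _tokens(candidate_path)
    return bool(topic_tokens) and candidate_tokens[: len(topic_tokens)] == topic_tokens


def exact_path_gate(top_path: str, canonical_path: str) -> bool:
    """Tightened gate: short-circuit only when the top hit IS the canonical."""
    return top_path == canonical_path


if __name__ == "__main__":
    top_hit = "Berlin_(disambiguation)"
    print(token_prefix_match("Berlin", top_hit))  # old gate fires on the disambig page
    print(exact_path_gate(top_hit, "Berlin"))     # new gate lets the splice run
```

With the prefix gate, a disambiguation page at the top of the results is enough to skip the splice. With the equality gate, the short-circuit only happens when there is nothing left for the splice to promote.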
The recurring infobox-cell concatenation bug (`5th in Europe1st in Germany`)
got its final user-visible fix this cycle: the a10/a11 sweep added a
space separator between block-level cell children, but a downstream
small LLM still tokenised `5th in Europe 1st in Germany` as one phrase.
The block-cell joiner now emits `"; "` between block boundaries so each
value reads as a distinct item.
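A rough sketch of the joiner behaviour, built on the standard-library `html.parser` instead of whatever HTML tooling the project actually uses. The class name `CellText`, the function `render_cell`, and the exact set of block tags are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Tags treated as block boundaries inside a table cell (assumed set).
BLOCK_TAGS = {"br", "li", "p"}


class CellText(HTMLParser):
    """Collects cell text, starting a new block at each block-level tag."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks = [""]

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.blocks.append("")

    def handle_endtag(self, tag):
        # A closing </li> or </p> also ends the current block.
        if tag in BLOCK_TAGS and tag != "br":
            self.blocks.append("")

    def handle_data(self, data):
        # Inline children (spans, links) append to the current block directly.
        self.blocks[-1] += data


def render_cell(cell_html: str) -> str:
    parser = CellText()
    parser.feed(cell_html)
    parser.close()
    # Join non-empty blocks with "; " so each value reads as a distinct item.
    return "; ".join(block.strip() for block in parser.blocks if block.strip())


if __name__ == "__main__":
    print(render_cell("<td>5th in Europe<br>1st in Germany</td>"))
    print(render_cell("<td><span>3,913</span><span>,644</span></td>"))
```

Block boundaries produce the `"; "` separator, while adjacent inline spans still concatenate with nothing in between, which is what keeps number formatting intact.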
Net: 1513 tests pass (+20 over v2.0.0a12), 50 skipped, 38
deselected. black / isort / flake8 / mypy all clean.
Fixed — High (post-a12 beta sweep)
- D1: orphan-bullet sub-rows chained the previous row's full label
  as their parent. `tell me about France` rendered
  `**Government — • President:** Macron` (correct) but then
  `**• President — • Prime Minister:** Lecornu`,
  `**• Prime Minister — • President of the Senate:** Larcher`
  (wrong: the parent kept shifting). Same shape in the USA infobox.
  Berlin's `Government` sub-rows happened to render correctly because
  Wikipedia marked them differently in HTML. Root cause: the
  virtual-parent extractor for orphan-bullet rows used
  `prev_label.split(" — ", 1)[-1]` (trailing segment) instead of
  `[0]` (original parent). Each bullet row's parent inherited the
  PREVIOUS bullet's label rather than the constant section parent.
  Fixed by taking the original parent.
- D2: `list_namespaces` reports M=13 while `walk namespace M` /
  `metadata for` report 12. The a12 M1 fix plumbed the shared
  `is_human_readable_metadata_key` predicate to two of three
  reporting surfaces but missed `_add_new_scheme_metadata_namespace`
  in the namespace walker. `list_namespaces` reported the raw libzim
  count (13, including the `Illustration_48x48@1` binary entry)
  while the other two filtered. Added the predicate to the third
  site so all three surfaces agree on 12.
- D3 / D4: chained-intent splitter missed two recurring-set
  shapes. `Biology; Chemistry` (bare topics, `;` connector) fell
  through to topic-fetch and resolved to
  `Computational_Biology_&_Chemistry` (a journal).
  `tell me about Photosynthesis and then about DNA`
  (single-imperative-prefix continuation, right side is
  `about DNA`) fell through to full-text search on the literal
  phrase. The splitter required an operation verb on BOTH sides of
  the connector. D3 adds a bare-topic-chain branch that wraps both
  halves with `tell me about` when the connector is unambiguous
  (`;` / `then` / `and then` / `after that` / `, then`) AND both
  halves are topic-shaped (≤6 tokens, no internal connectors). D4
  adds a continuation-prefix branch that re-prefixes the right half
  with the left's verb when the right starts with
  `about` / `of` / `for` / `with` / `on` / `in` / `into` / `to`. A
  negative-case guard prevents the bare-topic branch from
  over-triggering when a half is JUST an operation verb prefix with
  no topic content (`tell me about then and now`: the connector
  was inside the topic name, not a chain marker).
- D5: H2 canonical-splice early-return fired on any token-prefix
  strong match. The gate at the top of the populated-results
  branch invoked `is_strong_title_match(query, top.path, top.title)`
  to decide whether to short-circuit to the legacy
  `search_with_filters` path (avoiding canonical duplication when
  BM25 already returned a strong hit). But the matcher returns True
  for any candidate that extends the topic via prefix
  (`Berlin_(disambiguation)` extends `Berlin`,
  `Apollo_(disambiguation)` extends `Apollo`,
  `List_of_…_named_after_X` extends `X`). For new-scheme Wikipedia
  archives, where a disambig page nearly always sits next to the
  canonical, the gate fired on the disambig and the splice never
  ran. Tightened to `top_path == canonical_path` so the splice's
  reorder logic handles canonical promotion in every other shape.
  As a side effect this also unblocks H3's list-article demote,
  which lives inside the same splice block.
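The D1 root cause is easy to reproduce in isolation. The sketch below is a simplified model, not the project's extractor: `label_rows` and both `virtual_parent_*` functions are hypothetical names, and only the `split(" — ", 1)` indexing comes from the changelog.

```python
SEP = " — "  # label separator used in the rendered infobox rows


def virtual_parent_buggy(prev_label: str) -> str:
    # Trailing segment: picks up the previous bullet, so the parent drifts.
    return prev_label.split(SEP, 1)[-1]


def virtual_parent_fixed(prev_label: str) -> str:
    # Leading segment: always the original section parent.
    return prev_label.split(SEP, 1)[0]


def label_rows(section: str, bullets: list[str], parent_of) -> list[str]:
    """Label consecutive orphan-bullet rows, deriving each parent from the
    previous row's label (simplified model of the behaviour described above)."""
    labels = []
    prev_label = section
    for bullet in bullets:
        label = f"{parent_of(prev_label)}{SEP}{bullet}"
        labels.append(label)
        prev_label = label
    return labels


if __name__ == "__main__":
    bullets = ["• President", "• Prime Minister", "• President of the Senate"]
    for row in label_rows("Government", bullets, virtual_parent_buggy):
        print("buggy:", row)
    for row in label_rows("Government", bullets, virtual_parent_fixed):
        print("fixed:", row)
```

The first row is identical under both variants, which matches the report that only the second and later bullet rows rendered with a shifting parent.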
Fixed — Medium (post-a12 beta sweep)
- D6: L2 trailing-punctuation trim only stripped one category per
  call. `tell me about DNA, and then tell me about Photosynthesis`
  split on `then` to left=`tell me about DNA, and` (after
  trimming) → only the orphan `and` got stripped, the trailing `,`
  stayed. The `for`/`else` shape entered the punctuation branch only
  when no connector matched. Reworked to loop until stable so the
  trim handles any combination of orphan connector word + trailing
  `;` / `,` in any order.
- D7: block-level cell separator was a bare space (final fix).
  The a10/a11 fix turned `5th in Europe1st in Germany` into
  `5th in Europe 1st in Germany` (space separator at block
  boundaries) so cells with `<br>` / `<li>` / `<p>` children no longer
  concatenated without a separator. But downstream LLMs still
  tokenised the space-separated form as a single phrase. Upgraded
  the block-cell joiner to emit `"; "` between block boundaries so a
  population-rank cell like
  `<td>5th in Europe<br>1st in Germany</td>` renders as
  `5th in Europe; 1st in Germany`: two distinct values, same row
  label. Inline span groups (number formatting `3,913,644`,
  coordinates `52°31′N`) still concatenate directly per the a11
  second-pass invariant.
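The loop-until-stable trim from D6 can be sketched as follows. This is an illustration under assumptions: the function name `trim_left_half` and the connector list are hypothetical, and the real trim may recognise more connector words.

```python
# Orphan connector words that may be left dangling on the left half (assumed list).
ORPHAN_CONNECTORS = ("and", "then")
TRAILING_PUNCT = ";,"


def trim_left_half(text: str) -> str:
    """Strip orphan connectors and trailing punctuation, in any order,
    repeating until a full pass changes nothing."""
    current = text.strip()
    while True:
        before = current
        # Pass 1: trailing punctuation and whitespace.
        current = current.rstrip(TRAILING_PUNCT + " ")
        # Pass 2: one dangling connector word, matched as a whole word.
        for word in ORPHAN_CONNECTORS:
            suffix = " " + word
            if current.lower().endswith(suffix):
                current = current[: -len(suffix)]
                break
        if current == before:
            return current


if __name__ == "__main__":
    print(trim_left_half("tell me about DNA, and"))
    print(trim_left_half("tell me about DNA and,"))
```

A single pass removes only the `and` from `tell me about DNA, and` and leaves the comma, which is the pre-fix behaviour. Repeating until nothing changes removes both regardless of their order.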
Fixed — Low (post-a12 beta sweep)
- D8: legacy unstructured `**Error Processing Query**` template
  on four not-found surfaces. `show structure of nonexistent_x`,
  `summary of nonexistent_x`, `get article nonexistent_x`, and
  `links in nonexistent_x` all let their backend exception
  propagate to the top-level `handle_zim_query` `except` block,
  which emitted a generic template with: no intent telemetry
  comment (`<!-- intent=... cert=... -->` was added in a12 L1 but
  only for the structured early-return paths), Python helper-name
  leakage (`Try using search_zim_file()` / `browse_namespace()`,
  none of which are MCP-surface commands), and unhelpful
  troubleshooting refs (`Check server logs`, not accessible from
  the MCP surface). `articles related to nonexistent_x` was
  already modernised in a10 F3. Added a
  `_render_not_found_recovery` helper that returns the modernised
  shape (``**Article not found: `path`**`` + `suggestions for` /
  `find article titled` / `search for` recovery) and wrapped the
  four handler delegations with `try`/`except`. The outer
  `handle_zim_query` now layers the intent telemetry on success
  because the handlers return a string instead of raising.
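The D8 wrapping pattern can be sketched roughly as below. Everything here is illustrative: `render_not_found_recovery`, `with_not_found_recovery`, the toy `get_article` backend, the choice of `LookupError`, and the wording of the recovery text are all assumptions. Only the heading shape and the three recovery command names come from the changelog.

```python
from typing import Callable


def render_not_found_recovery(path: str) -> str:
    """Illustrative stand-in for the recovery helper; the wording is assumed."""
    lines = [
        f"**Article not found: `{path}`**",
        "",
        "Try one of:",
        f"- `suggestions for {path}`",
        f"- `find article titled {path}`",
        f"- `search for {path}`",
    ]
    return "\n".join(lines)


def with_not_found_recovery(handler: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a handler so a missing entry becomes a returned string
    instead of an exception that reaches the top-level except block."""

    def wrapped(path: str) -> str:
        try:
            return handler(path)
        except LookupError:  # assumed exception type for "not found"
            return render_not_found_recovery(path)

    return wrapped


def get_article(path: str) -> str:
    # Toy backend: a dict lookup raises KeyError (a LookupError) when missing.
    articles = {"Berlin": "Berlin is the capital of Germany."}
    return articles[path]


if __name__ == "__main__":
    safe_get_article = with_not_found_recovery(get_article)
    print(safe_get_article("Berlin"))
    print(safe_get_article("nonexistent_x"))
```

Because the wrapped handler always returns a string, an outer dispatcher can treat the not-found case like any other successful response and decorate it uniformly.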
Wire-format / surface changes (alpha-line clean breaks)
- `tell me about France` renders consecutive bullet sub-rows
  consistently anchored to the section parent. Pre-fix every
  Wikipedia country article showed a chained sequence like
  `**• President — • Prime Minister:** ...` /
  `**• Prime Minister — • President of the Senate:** ...`. Post-fix
  each row reads `**Government — • Prime Minister:** ...`.
- `list_namespaces` reports M=12 (matching `walk namespace M` and
  `metadata for`) for archives whose only non-human-readable M
  entry is `Illustration_*`. Pre-fix M=13.
- `Biology; Chemistry` is detected as a chained query. Pre-fix
  it silently resolved to `Computational_Biology_&_Chemistry`.
  Other bare-topic chains (`DNA then Photosynthesis`,
  `Berlin and then Munich`) likewise.
- `tell me about X and then about Y` is detected as a chained
  query. Pre-fix the right half (`about Y`) wasn't recognised as
  an op verb continuation; the query fell through to full-text
  search on the literal phrase.
- `tell me about then and now` (a topic whose name contains
  `then`) passes through unchanged. The bare-topic chain branch
  guards against incomplete-verb halves so connector-in-topic
  queries aren't mis-classified.
- `search for Berlin in namespace C` returns the canonical
  `Berlin` at rank #1. Pre-fix it returned
  `[List_of_songs_about_Berlin, Berlin_(disambiguation), Timeline_of_Berlin]`
  with the canonical absent. Similar shape on
  every namespace-C archive that has a disambig page for the topic.
- L2 chained-intent trim handles both orphan connectors and
  trailing punctuation. `tell me about DNA, and then …` renders
  the left op as `tell me about DNA` (no trailing `,` or `and`).
- Wikipedia infobox cells with `<br>`-separated values render with
  a `"; "` separator between values.
  `**Rank:** 5th in Europe; 1st in Germany` instead of
  `5th in Europe 1st in Germany`. Inline
  span groups (number formatting, coordinates) unchanged.
- `show structure of` / `summary of` / `get article` / `links in`
  not-found responses are structured guidance with intent telemetry
  and concrete recovery commands. Same shape `articles related to`
  has carried since a10. Pre-fix these four used a legacy
  template with no telemetry and Python helper-name leakage.
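The chained-query behaviour listed above (bare-topic chains, continuation prefixes, and the connector-in-topic guard) can be approximated with a small sketch. It is not the project's splitter: `split_chained`, the verb-prefix list, and the way the right half is re-prefixed are assumptions. The connector set, the continuation words, and the six-token limit are taken from the D3 / D4 notes.

```python
import re
from typing import Optional

VERB_PREFIXES = ("tell me about", "summary of", "search for")  # assumed subset
CONTINUATION_WORDS = {"about", "of", "for", "with", "on", "in", "into", "to"}
MAX_TOPIC_TOKENS = 6

# Connectors from the changelog: ";", ", then", "and then", "after that", "then".
_CONNECTOR_RE = re.compile(
    r"\s*(?:;|,\s*then\b|\band then\b|\bafter that\b|\bthen\b)\s*",
    re.IGNORECASE,
)


def _verb_prefix(half: str) -> Optional[str]:
    lowered = half.lower()
    for prefix in VERB_PREFIXES:
        if lowered == prefix or lowered.startswith(prefix + " "):
            return prefix
    return None


def _is_topic_shaped(half: str) -> bool:
    return len(half.split()) <= MAX_TOPIC_TOKENS and not _CONNECTOR_RE.search(half)


def split_chained(query: str) -> list[str]:
    match = _CONNECTOR_RE.search(query)
    if not match:
        return [query]
    left, right = query[: match.start()].strip(), query[match.end():].strip()
    if not left or not right:
        return [query]
    # Guard: a half that is ONLY a verb prefix means the connector was
    # inside the topic name ("tell me about then and now").
    if left.lower() in VERB_PREFIXES or right.lower() in VERB_PREFIXES:
        return [query]
    left_verb, right_verb = _verb_prefix(left), _verb_prefix(right)
    if left_verb and right_verb:
        return [left, right]
    if left_verb and not right_verb:
        # Continuation prefix: reuse the left verb for the right half.
        parts = right.split(None, 1)
        if len(parts) == 2 and parts[0].lower() in CONTINUATION_WORDS:
            return [left, f"{left_verb} {parts[1]}"]
        return [query]
    if not left_verb and not right_verb:
        # Bare-topic chain: wrap both halves when both look like topics.
        if _is_topic_shaped(left) and _is_topic_shaped(right):
            return [f"tell me about {left}", f"tell me about {right}"]
    return [query]


if __name__ == "__main__":
    for q in ("Biology; Chemistry",
              "tell me about Photosynthesis and then about DNA",
              "tell me about then and now"):
        print(q, "->", split_chained(q))
```

The guard runs before either branch, so a query whose connector word is part of the topic name falls through unchanged instead of being split into an empty-topic operation.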
About cameronrye/openzim-mcp
Modern, secure MCP server for accessing ZIM format knowledge bases offline. Enables AI models to search and navigate Wikipedia, educational content, and other compressed knowledge archives with smart retrieval, caching, and comprehensive API.
Earlier breaking changes
- v2.0.0a15 `_attribute_sections` falls back to the first section when no section brackets the located passage.
- v2.0.0a11 Exposed `content_offset` as top-level `zim_query` parameter, validated >=0, threaded through options.
- v2.0.0a10 `get article M/<key>` now returns ZIM metadata entry rather than aliased C-namespace article body.
- v2.0.0a10 `metadata for <file>` returns concise metadata strings instead of full article bodies for new-scheme archives.
- v2.0.0a10 Infobox extraction now emits trailing rows without the preceding "GDP —" label, changing bullet-label strings.