This release adds 1 notable feature for engineering teams evaluating rollout.
✓ No known CVEs patched in this version
ReleasePort's take
Light signal: The v2.0.0b9 release fixes a non-possessive multi-token tail hijack issue and adds possessive fuzzy suggestion support.
Why it matters: Fixes incorrect canonical selection in the title-suggest promotion path. The new carve-out lets accept_possessive_promotion accept a fuzzy_suggest match for a possessive topic when the canonical contains a possessor token; such queries previously fell through to BM25.
Summary
AI summary: A mixed release with one HIGH-severity bugfix (Z3) and one MEDIUM-severity feature (OPP-1).
Changes in this release
| Type | Severity | Summary | CVE |
|---|---|---|---|
| Feature | Medium | Adds possessive fuzzy_suggest acceptance when canonical contains possessor token. Source: llm_adapter@2026-05-23. Confidence: high | None |
| Feature | Medium | Adds 26 new tests covering post-b8 fixes and integration guards. Source: llm_adapter@2026-05-23. Confidence: low | None |
| Bugfix | High | Prevents non-possessive multi-token tail hijack in fuzzy_suggest matches. Source: granite4.1:30b@2026-05-23-audit. Confidence: low | None |
| Bugfix | Medium | Fixes non-possessive multi-token tail hijack causing wrong canonical selection. Source: llm_adapter@2026-05-23. Confidence: low | None |
| Refactor | Medium | Refactors accept_possessive_promotion into helper functions and extracts shared test fixtures to reduce duplication. Source: llm_adapter@2026-05-23. Confidence: low | None |
Full changelog
Post-b8 sweep packaged from PR #178. Live-MCP verification against
v2.0.0b8 confirmed all prior b6/b7/b8 fixes land cleanly. ONE HIGH-
severity defect + ONE MEDIUM opportunity unlocked by deeper probing
of the non-possessive 3+ token shape.
Z3 (HIGH) — Non-possessive multi-token tail-hijack
The b4 D2 raised-min_len floor protected possessive topics from
trailing 1-token tails winning at strict 1.0. Non-possessive
multi-token queries still leaked the same hijack at Pass 0
(_promote_topic_via_title_index): libzim's title-suggest
fuzzy-matches a STRONG single token in the topic at score 0.95 and
returns just that token's canonical alone. The full-topic probe at
min_score=0.95 (added in b3) accepts the row because
accept_possessive_promotion returned True for any
non-possessive topic.
Live silent-wrong-answer repros at v2.0.0b8 (all cert=0.85):
- Stalin USSR Russia → Russia (user wanted Stalin)
- Hitler Germany Berlin → Berlin (user wanted Hitler)
- Marie Curie polonium discovery → Discovery (a disambig page!)
- Marie Curie radioactivity → Radioactive_(Redniss_book) (an obscure 2010 graphic novel surfaced via stemming match)
- Big Rapids Michigan tourism → Tourism (contradicts the iter_query_windows docstring's own canonical example, Big_Rapids,_Michigan)
- O'Brien character 1984 → 1984 (the year article)
Fix — non-possessive fuzzy_suggest gate
Two narrow rejections in the non-possessive branch when
match_type="fuzzy_suggest" and the topic has 3+ tokens:
- Tail-token hijack: canonical is a single token equal to the topic's LAST token. The user typed <subject> ... <generic>; libzim returned the generic article. Hamlet Denmark prince → Hamlet stays accepted because the canonical sits at the HEAD position, not the tail.
- Zero-overlap stemming hit: canonical's tokens have zero exact-overlap with topic's tokens (the match was via stemming only). The graphic novel surfaced for Marie Curie radioactivity because libzim's title index stems radioactivity to radioactive; no other topic token matches the canonical, so the hit is one-stem-token-deep, too thin a signal to auto-fetch.
Topics with fewer than 3 tokens are unaffected.
Counter-cases the fix preserves: Hamlet Denmark prince →
Hamlet, Napoleon France emperor → Napoleon,
Apollo 11 moon landing → Moon_landing,
quantum mechanics Einstein → Albert_Einstein,
Lincoln Gettysburg Address → Gettysburg_Address,
Berlin Germany → Berlin.
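The two rejections and the counter-cases above can be sketched as a small predicate. This is a minimal illustration, not the project's code: `accept_non_possessive_sketch`, the `tokens` helper, and its regex are hypothetical stand-ins for the real `_accept_non_possessive` and `_TOKEN_RE`, whose bodies are not shown in these notes.

```python
import re

# Assumption: a simple tokenizer that splits on apostrophes, underscores,
# and any other non-alphanumeric character. The real _TOKEN_RE may differ.
_SKETCH_TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokens(text: str) -> list[str]:
    """Lowercase the text and return its alphanumeric runs in order."""
    return _SKETCH_TOKEN_RE.findall(text.lower())


def accept_non_possessive_sketch(topic: str, canonical: str, match_type: str) -> bool:
    """Hypothetical gate for non-possessive topics (illustration only)."""
    topic_tokens = tokens(topic)
    # Only fuzzy_suggest rows for topics with 3+ tokens are gated.
    if match_type != "fuzzy_suggest" or len(topic_tokens) < 3:
        return True
    canonical_tokens = tokens(canonical)
    # Rejection 1: tail-token hijack (single-token canonical == last topic token).
    if len(canonical_tokens) == 1 and canonical_tokens[0] == topic_tokens[-1]:
        return False
    # Rejection 2: zero exact overlap (the match came from stemming alone).
    if not set(canonical_tokens) & set(topic_tokens):
        return False
    return True


if __name__ == "__main__":
    cases = [
        ("Stalin USSR Russia", "Russia"),
        ("Marie Curie radioactivity", "Radioactive_(Redniss_book)"),
        ("Hamlet Denmark prince", "Hamlet"),
        ("Apollo 11 moon landing", "Moon_landing"),
    ]
    for topic, canonical in cases:
        decision = accept_non_possessive_sketch(topic, canonical, "fuzzy_suggest")
        print(f"{topic!r} -> {canonical!r}: {'ACCEPT' if decision else 'REJECT'}")
```

Under these assumptions the head-position case (Hamlet) passes because the single canonical token is not the topic's last token, while the Redniss book is rejected because none of its exact tokens appear in the topic.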
OPP-1 (MEDIUM) — Possessive fuzzy_suggest carve-out
The b6 D1 rule REJECTS every match_type="fuzzy_suggest" row
for a possessive topic. Live probe found this is too strict:
Newton's gravity falls to BM25 even though
Newton's_law_of_universal_gravitation is the obvious rank-1
BM25 canonical AND contains the possessor token newton
literally.
Refinement
For possessive topics + fuzzy_suggest, ACCEPT iff the
canonical path tokens include any of the topic's possessor tokens.
The canonical literally preserves the user's named entity,
signalling it's a longer-form expansion rather than the
Darwin's evolution → Evolution shape that drops the
possessor.
Decision matrix for possessive + fuzzy_suggest:
| Topic | Canonical | Decision |
| --- | --- | --- |
| Newton's gravity | Newton's_law_of_universal_gravitation | ACCEPT (OPP-1) |
| Mary's lamb | Mary_Had_a_Little_Lamb | ACCEPT |
| Darwin's evolution | Evolution | REJECT (b6 D1 preserved) |
| Plato's republic philosophy | Czech_philosophy | REJECT (b6 Z1 preserved) |
Tokenization uses _TOKEN_RE (apostrophe-splitting), same as
the b8 Z1.1 subset rule for redirects, so newton's in the
canonical surfaces as the bare token newton for comparison.
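The possessor-in-canonical check behind the decision matrix can be sketched as follows. This is an illustration under stated assumptions: the function names, both regexes, and the "word directly before 's" definition of a possessor are hypothetical, since the notes do not show how `_accept_possessive_fuzzy_suggest` or `_TOKEN_RE` are actually written.

```python
import re

# Assumption: apostrophe- and underscore-splitting tokenizer (stand-in for _TOKEN_RE).
_SKETCH_TOKEN_RE = re.compile(r"[a-z0-9]+")
# Assumption: a possessor token is the word directly before a trailing 's.
_SKETCH_POSSESSOR_RE = re.compile(r"([a-z0-9]+)'s\b")


def possessor_tokens(topic: str) -> set[str]:
    """Return the lowercase possessor tokens found in a possessive topic."""
    normalised = topic.lower().replace("\u2019", "'")  # fold curly apostrophes
    return set(_SKETCH_POSSESSOR_RE.findall(normalised))


def accept_possessive_fuzzy_suggest_sketch(topic: str, canonical: str) -> bool:
    """Hypothetical carve-out: accept iff the canonical keeps a possessor token.

    Intended only for topics already classified as possessive; a topic with
    no possessor yields an empty set and is rejected here.
    """
    canonical_tokens = set(_SKETCH_TOKEN_RE.findall(canonical.lower()))
    return bool(possessor_tokens(topic) & canonical_tokens)


if __name__ == "__main__":
    matrix = [
        ("Newton's gravity", "Newton's_law_of_universal_gravitation"),
        ("Mary's lamb", "Mary_Had_a_Little_Lamb"),
        ("Darwin's evolution", "Evolution"),
        ("Plato's republic philosophy", "Czech_philosophy"),
    ]
    for topic, canonical in matrix:
        decision = accept_possessive_fuzzy_suggest_sketch(topic, canonical)
        print(f"{topic!r} -> {canonical!r}: {'ACCEPT' if decision else 'REJECT'}")
```

Because the tokenizer splits on the apostrophe, `Newton's` in the canonical contributes the bare token `newton`, which is what the comparison needs.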
Refactor (Sonar S3776 + duplication)
Quality-gate-driven follow-ups landed in the same PR:
- accept_possessive_promotion extracted three per-branch helpers (_accept_non_possessive, _accept_possessive_fuzzy_suggest, _accept_possessive_redirect) to bring cognitive complexity from 21 down under the 15 threshold. No behaviour change.
- The three shared sweep test fixtures (_make_simple_handler, _fake_find_title_match, _run_promote_simple) moved to tests/_promote_fixtures.py. b6/b7/b8 sweep test files now import from the shared module instead of duplicating locally.
Tests
26 new tests in tests/test_post_b8_beta_fixes.py across 5
classes (TestZ3NonPossessiveTailHijack,
TestZ3RegressionGuards, TestOPP1PossessorInCanonical,
TestZ3PromoteIntegration, TestStructuralGuards).
2448 passed, 54 skipped (full suite, ~28s)
pip-audit: no known vulnerabilities
mypy clean across 52 source files. black + flake8 + isort clean.
Methodology — "fix unlocks new paths" 16 sweeps strong
Each sweep peels back another layer; the post-b8 sweep generalised
the b4 D2 raised-min_len protection to non-possessive multi-token
topics, and relaxed b6 D1's blanket-reject when the canonical
preserves the possessor literally.
About cameronrye/openzim-mcp
Modern, secure MCP server for accessing ZIM format knowledge bases offline. Enables AI models to search and navigate Wikipedia, educational content, and other compressed knowledge archives with smart retrieval, caching, and comprehensive API.
Earlier breaking changes
- v2.0.0a15 `_attribute_sections` falls back to the first section when no section brackets the located passage.
- v2.0.0a13 canonical‑splice gate tightened to require exact path equality, fixing H2/H3 surface end‑to‑end behavior across all shapes.
- v2.0.0a11 Exposed `content_offset` as top-level `zim_query` parameter, validated >=0, threaded through options.
- v2.0.0a10 `get article M/<key>` now returns ZIM metadata entry rather than aliased C-namespace article body.
- v2.0.0a10 `metadata for <file>` returns concise metadata strings instead of full article bodies for new-scheme archives.