This release includes breaking changes that platform teams should review when planning a safe upgrade.
✓ No known CVEs patched in this version
ReleasePort's take
Light signal: v2.0.0a14 enhances entity resolution for long prose queries and boosts section relevance in synthesize mode, while introducing new configuration options and refactoring internal matching logic.
Why it matters: Improves accuracy of canonical entity extraction for lengthy inputs and surfaces the most pertinent sections; new `section_affinity_threshold` and `section_affinity_boost` parameters let teams fine‑tune behavior before adopting v2.0.0a14.
Summary
AI summary: Prose questions now resolve canonical entities and lead with the most relevant section when synthesize is enabled.
Changes in this release
| Type | Severity | Summary | CVE |
|---|---|---|---|
| Feature | Medium | Greedy length-down tail-probe entity resolution improves long prose query handling. (Source: llm_adapter@2026-05-21; confidence: high) | None |
| Feature | Medium | Section-heading affinity boost in synthesize mode promotes relevant sections to lead passage. (Source: llm_adapter@2026-05-21; confidence: high) | None |
| Feature | Medium | Multi-round handles added to `SynthesizeResponse` expose candidate articles and sections for follow-up turns. (Source: llm_adapter@2026-05-21; confidence: high) | None |
| Feature | Medium | `_boost_by_section_affinity` pipeline stage computes section affinity boost based on query and heading token overlap. (Source: llm_adapter@2026-05-21; confidence: high) | None |
| Feature | Medium | `SynthesizeConfig` parameters `section_affinity_threshold` and `section_affinity_boost` added for tuning affinity boosting. (Source: llm_adapter@2026-05-21; confidence: low) | None |
| Feature | Medium | `ConsideredArticle` and `ConsideredSection` TypedDicts defined in `tool_schemas.py` to structure candidate handles. (Source: llm_adapter@2026-05-21; confidence: low) | None |
| Feature | Medium | `SynthesizeResponse` TypedDict now `total=False` to accommodate new optional fields without affecting existing callers. (Source: llm_adapter@2026-05-21; confidence: low) | None |
| Dependency | Medium | `iter_query_tails` helper introduced in `title_promotion.py` for shared trailing-token iteration. (Source: llm_adapter@2026-05-21; confidence: low) | None |
| Refactor | Medium | `_promote_title_match` removed M26 4-token short-circuit, enabling full tail probing for long queries. (Source: llm_adapter@2026-05-21; confidence: high) | None |
| Refactor | Medium | `_promote_topic_via_title_index` rewritten as two-pass strict then fuzzy tail probe to prioritize exact matches. (Source: llm_adapter@2026-05-21; confidence: high) | None |
Full changelog
First post-beta-test alpha that ships a feature rather than a sweep:
natural-language prose questions now resolve to canonical entities
and (in synthesize=True mode) lead with the most relevant section
of the resolved article. Three coordinated changes:
- Greedy length-down tail-probe entity resolution. A shared
  `iter_query_tails` helper in `title_promotion.py` iterates the
  trailing 4 → 3 → 2 → 1 tokens of a query. Both the default
  `_handle_tell_me_about` path (via `_promote_topic_via_title_index`,
  two-pass strict-then-fuzzy) and the synthesize path (via
  `_promote_title_match`, single-pass strict) now probe each tail.
  This replaces the M26 4-token short-circuit that previously caused
  long prose queries like "who are some famous people from big
  rapids, michigan" to fall through to BM25 noise instead of
  resolving the canonical `Big_Rapids,_Michigan` entity.
- Section-heading affinity boost in synthesize. A new
  `_boost_by_section_affinity` pipeline stage runs after
  `_attribute_sections`. For each passage carrying a `#section_id`,
  it computes `|query_tokens ∩ heading_tokens| / |heading_tokens|`.
  When that ratio meets `SynthesizeConfig.section_affinity_threshold`
  (default `0.25`), the passage score is multiplied by
  `section_affinity_boost` (default `1.5`) and the list is
  re-sorted (with `rank` renumbered to match). Archive-agnostic:
  the archive's own section headings supply the matching
  vocabulary, no curated synonym tables.
- Multi-round handles on `SynthesizeResponse`. Two new optional
  fields surface the candidate space:
  `considered_articles` (top-3 article hits not featured) exposes
  `(archive, entry_path, title, score)` so a follow-up turn can pivot
  via `get_zim_entries`. `considered_sections` (top-10 sections of
  the featured article, in document order, minus the featured one)
  exposes `(section_id, title)` so a follow-up turn can pivot via
  `get_section`. `SynthesizeResponse` switches to
  `TypedDict(total=False)` to accommodate the additive shape;
  existing callers populating every field are unaffected.
  Compact-mode markdown rendering of these fields is deferred; the
  structured payload (`structuredContent`) always carries them.
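
The affinity ratio and boost described in the list above can be sketched in a few lines. This is an illustrative approximation, not the code in `openzim_mcp/synthesize.py`: the `Passage` shape, the `tokenize` helper, the function names, and the 1-based `rank` are all assumptions made for the example. Only the ratio formula, the `>=` threshold check, the two default values, and the re-sort with renumbered ranks come from the changelog.

```python
import re
from typing import TypedDict


class Passage(TypedDict):
    # Hypothetical passage shape, for illustration only.
    rank: int
    score: float
    heading: str  # title of the section the passage was attributed to


def tokenize(text: str) -> set[str]:
    # Assumed tokenizer: lowercase, then runs of [a-z0-9].
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def heading_affinity(query: str, heading: str) -> float:
    # |query_tokens ∩ heading_tokens| / |heading_tokens|
    heading_tokens = tokenize(heading)
    if not heading_tokens:
        return 0.0
    return len(tokenize(query) & heading_tokens) / len(heading_tokens)


def boost_by_section_affinity(query, passages, threshold=0.25, boost=1.5):
    boosted = []
    for passage in passages:
        score = passage["score"]
        if heading_affinity(query, passage["heading"]) >= threshold:
            score *= boost
        boosted.append({**passage, "score": score})
    # Re-sort by the new scores, then renumber rank to match the new order.
    boosted.sort(key=lambda p: p["score"], reverse=True)
    for rank, passage in enumerate(boosted, start=1):
        passage["rank"] = rank
    return boosted


passages = [
    {"rank": 1, "score": 1.0, "heading": "History"},
    {"rank": 2, "score": 0.9, "heading": "Notable people"},
]
query = "who are some famous people from big rapids, michigan"
result = boost_by_section_affinity(query, passages)
```

In this toy input, "Notable people" shares one of its two heading tokens with the query (ratio 0.5), so its score is multiplied by 1.5 and it overtakes "History", whose ratio is 0.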
The motivating query "who are some famous people from big rapids,
michigan" now traces:
- Default mode: tail probe resolves `Big_Rapids,_Michigan`, returns
  the article body. Better than today's BM25-noise outcome, though
  the response is not yet section-targeted in default mode.
- `synthesize=True`: tail probe resolves the entity, affinity boost
  promotes the `#Notable_people` section to the lead passage, and
  the response carries `considered_articles` + `considered_sections`
  handles for the next turn.
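
For readers consuming the new handles, the shapes described above might look roughly like the sketch below. The field names are taken from the changelog; the field types, the omission of all pre-existing `SynthesizeResponse` fields, and the `next_turn_section_ids` helper are assumptions for illustration and may not match `openzim_mcp/tool_schemas.py`.

```python
from typing import TypedDict


class ConsideredArticle(TypedDict):
    # Field names from the changelog; the types are assumed.
    archive: str
    entry_path: str
    title: str
    score: float


class ConsideredSection(TypedDict):
    section_id: str
    title: str


class SynthesizeResponse(TypedDict, total=False):
    # Pre-existing response fields are omitted from this sketch.
    considered_articles: list[ConsideredArticle]
    considered_sections: list[ConsideredSection]


def next_turn_section_ids(response: SynthesizeResponse) -> list[str]:
    # Hypothetical helper: with total=False the key may be absent,
    # so read it with .get() and a default instead of indexing.
    return [s["section_id"] for s in response.get("considered_sections", [])]
```

Because the TypedDict is declared with `total=False`, a consumer should not assume either key is present, which is why the helper falls back to an empty list.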
Added
- `iter_query_tails(query, *, max_len=4, min_len=1)` in
  `openzim_mcp/title_promotion.py`: greedy length-down
  trailing-token iterator, lowercased + `[a-z0-9]+` tokenized.
  Shared by both entity-resolution paths. Underscore is treated as a
  token boundary so path-form input like `Big_Rapids,_Michigan`
  tokenizes correctly.
- `_boost_by_section_affinity` pipeline stage in
  `openzim_mcp/synthesize.py` plus the `_section_titles_for` and
  `_maybe_boost_passage` helpers. Bundle-titles lookup is memoized
  per call; exceptions and `None` bundles are no-ops (score unchanged).
- `SynthesizeConfig.section_affinity_threshold` (default `0.25`,
  bounds `[0.0, 1.0]`) and `section_affinity_boost` (default `1.5`,
  bounds `[1.0, 10.0]`): Pydantic-validated tunables for the new
  stage.
- `ConsideredArticle` and `ConsideredSection` TypedDicts in
  `openzim_mcp/tool_schemas.py`.
- `_build_considered_articles` and `_build_considered_sections`
  helpers in `openzim_mcp/synthesize.py`. Featured article and
  section are excluded so the lists are alternatives, not
  duplicates of the featured citation.
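
To make the tail iteration concrete, here is a minimal sketch that follows the signature and tokenization rule listed above. It is not the project's implementation: in particular, yielding each tail as a space-joined string and clamping `min_len` to at least 1 are assumptions, since the changelog does not say what the iterator yields.

```python
import re
from typing import Iterator


def iter_query_tails(
    query: str, *, max_len: int = 4, min_len: int = 1
) -> Iterator[str]:
    # Lowercase, then keep runs of [a-z0-9]; underscores, commas and
    # spaces all act as token boundaries.
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    longest = min(max_len, len(tokens))
    # Greedy length-down: longest trailing window first, shortest last.
    for n in range(longest, max(min_len, 1) - 1, -1):
        yield " ".join(tokens[-n:])  # assumed output form


prose_tails = list(
    iter_query_tails("who are some famous people from big rapids, michigan")
)
path_tails = list(iter_query_tails("Big_Rapids,_Michigan"))
```

For the prose query this yields "from big rapids michigan", "big rapids michigan", "rapids michigan", then "michigan"; the path-form input yields the last three of those, which is how both spellings can reach the same entity.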
Changed
- `_promote_title_match` in `synthesize.py`: removed the M26 4-token
  short-circuit. Long prose queries with a clear entity tail now
  resolve canonically instead of falling through to BM25 noise.
- `_promote_topic_via_title_index` in `simple_tools.py`: rewritten
  as a two-pass tail-probe (strict 1.0-score gate across all tails
  first, then 0.8-score typo-tolerant gate across all tails). The
  two-pass ordering prevents a fuzzy 0.8 match on a long noisy tail
  from winning over an exact 1.0 match on a clean shorter tail.
- `SynthesizeResponse` TypedDict is now `total=False` to accommodate
  the new optional fields. Existing callers populating every field
  are unaffected.
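
The reason the two-pass ordering matters is easiest to see in code. The sketch below is a simplified stand-in for the rewritten probe, not the code in `simple_tools.py`: the `lookup` callable, the `Hit` tuple, and the function name are hypothetical, and the title index is replaced by a plain dictionary.

```python
from typing import Callable, Iterable, Optional, Tuple

Hit = Tuple[str, float]  # (entry_path, match score); hypothetical shape


def two_pass_tail_probe(
    tails: Iterable[str],
    lookup: Callable[[str], Optional[Hit]],
    strict_score: float = 1.0,
    fuzzy_score: float = 0.8,
) -> Optional[Hit]:
    tails = list(tails)
    # Pass 1 accepts only exact (1.0) hits across every tail; pass 2
    # relaxes the gate to typo-tolerant (0.8) hits across every tail.
    for gate in (strict_score, fuzzy_score):
        for tail in tails:
            hit = lookup(tail)
            if hit is not None and hit[1] >= gate:
                return hit
    return None


# Toy index: the long noisy tail only matches fuzzily, while the
# shorter clean tail matches exactly.
toy_index = {
    "from big rapids michigan": ("Some_Fuzzy_Neighbor", 0.8),
    "big rapids michigan": ("Big_Rapids,_Michigan", 1.0),
}
tails = ["from big rapids michigan", "big rapids michigan", "michigan"]
best = two_pass_tail_probe(tails, toy_index.get)
```

A single pass that accepted any score of 0.8 or higher would stop at the first, longest tail and return the fuzzy neighbor; running the strict gate over all tails first returns the exact match instead. The sketch calls `lookup` up to twice per tail for simplicity.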
Tests
- 46 new unit tests across
  `tests/test_iter_query_tails.py`,
  `tests/test_simple_tools_tail_probe.py`,
  `tests/test_synthesize_section_affinity.py`,
  `tests/test_synthesize_considered_handles.py`, and additions to
  `tests/test_synthesize_title_promotion_v2a9.py` and
  `tests/test_tool_schemas.py`. Test count: 1567 → 1566 (one fewer
  because two affinity-boost tests with identical setup blocks were
  merged into one combined assertion; SonarCloud flagged the
  intra-file duplication).
- Three golden snapshots refreshed
  (`synthesize_berlin_geography.json`, `synthesize_munich_history.json`,
  `synthesize_capital_city.json`): the new `considered_*` fields are
  always emitted, and the score change from `1.0 → 1.5` on
  entity-name section headings reflects the affinity boost firing.
- `test_metadata_namespace_from_metadata_keys` threshold relaxed
  from `>= 10` to `>= 5` after an upstream `zim-testing-suite`
  fixture refresh changed `nons/small.zim`'s metadata-key count
  from 10 to 9 (this broke comprehensive-testing on `main` before
  this alpha was cut).
About cameronrye/openzim-mcp
Modern, secure MCP server for accessing ZIM format knowledge bases offline. Enables AI models to search and navigate Wikipedia, educational content, and other compressed knowledge archives with smart retrieval, caching, and comprehensive API.
Related context
Earlier breaking changes
- v2.0.0a15 `_attribute_sections` falls back to the first section when no section brackets the located passage.
- v2.0.0a13 canonical‑splice gate tightened to require exact path equality, fixing H2/H3 surface end‑to‑end behavior across all shapes.
- v2.0.0a11 Exposed `content_offset` as top-level `zim_query` parameter, validated >=0, threaded through options.
- v2.0.0a10 `get article M/<key>` now returns ZIM metadata entry rather than aliased C-namespace article body.
- v2.0.0a10 `metadata for <file>` returns concise metadata strings instead of full article bodies for new-scheme archives.