This release adds 5 notable features for engineering teams evaluating rollout.
✓ No known CVEs patched in this version
Summary
Added fuzzy text matching normalization, structured revision extraction, DOCX diff generation to tracked changes, enhanced metadata stripping, and Presidio-powered PII redaction.
Full changelog
What's New
Fuzzy text matching (OOXML artifact normalization)
- Invisible character removal: soft hyphen (U+00AD), zero-width space (U+200B), word-joiner (U+2060) are stripped from both query and document text before matching — these commonly appear in copy-pasted legal text and were silently breaking find/replace
- Non-standard space normalization: thin space, figure space, hair space, narrow no-break space → regular space
- `ignore_case` parameter on `insert_text`, `delete_text`, `replace_text`
- Multi-run spanning: `replace_text` and `delete_text` now handle text that spans multiple `<w:r>` runs
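The normalization described above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the helper names `normalize` and `fuzzy_contains` are hypothetical, and the code point chosen for each named space (U+2009, U+2007, U+200A, U+202F) is an assumption based on the standard Unicode assignments.

```python
# Hypothetical sketch of the normalization step; names are illustrative only.
INVISIBLE = ("\u00ad", "\u200b", "\u2060")         # soft hyphen, zero-width space, word joiner
ODD_SPACES = ("\u2009", "\u2007", "\u200a", "\u202f")  # thin, figure, hair, narrow no-break

# One translation table: invisible characters are dropped, odd spaces become " ".
_TABLE = {ord(c): None for c in INVISIBLE}
_TABLE.update({ord(c): " " for c in ODD_SPACES})


def normalize(text: str, ignore_case: bool = False) -> str:
    """Apply the same cleanup to query and document text before matching."""
    cleaned = text.translate(_TABLE)
    return cleaned.casefold() if ignore_case else cleaned


def fuzzy_contains(document_text: str, query: str, ignore_case: bool = False) -> bool:
    """True if the normalized query occurs in the normalized document text."""
    return normalize(query, ignore_case) in normalize(document_text, ignore_case)


# Copy-pasted legal text with a soft hyphen, a thin space, and a zero-width space.
doc = "The Indem\u00adnifying\u2009Party\u200b shall"
print(fuzzy_contains(doc, "indemnifying party", ignore_case=True))
```

Normalizing both sides with the same table means a query pasted from the same source still matches, whichever side carries the stray characters.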
get_tracked_changes — structured revision extraction
Returns all pending w:ins / w:del tracked changes as structured JSON with type, change_id, author, date, para_id, and text fields.
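As a rough sketch of what this kind of extraction involves, the following walks a `document.xml` string with the standard library and collects `w:ins` / `w:del` elements. The function name `extract_tracked_changes`, the `"insertion"` / `"deletion"` values for `type`, and the sample XML are assumptions for illustration; the tool's actual output values are not given in these notes.

```python
import json
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W14 = "http://schemas.microsoft.com/office/word/2010/wordml"


def extract_tracked_changes(document_xml: str) -> list:
    """Collect w:ins / w:del revisions as dicts (hypothetical helper)."""
    root = ET.fromstring(document_xml)
    changes = []
    for para in root.iter(f"{{{W}}}p"):
        para_id = para.get(f"{{{W14}}}paraId")
        for node in para.iter():
            if node.tag == f"{{{W}}}ins":
                kind, text_tag = "insertion", f"{{{W}}}t"
            elif node.tag == f"{{{W}}}del":
                kind, text_tag = "deletion", f"{{{W}}}delText"
            else:
                continue
            changes.append({
                "type": kind,  # assumed labels, not confirmed by the release notes
                "change_id": node.get(f"{{{W}}}id"),
                "author": node.get(f"{{{W}}}author"),
                "date": node.get(f"{{{W}}}date"),
                "para_id": para_id,
                "text": "".join(t.text or "" for t in node.iter(text_tag)),
            })
    return changes


# Made-up sample paragraph: "30" deleted, "45" inserted.
SAMPLE = (
    f'<w:document xmlns:w="{W}" xmlns:w14="{W14}"><w:body>'
    '<w:p w14:paraId="1A2B3C4D">'
    '<w:r><w:t>Payment is due in </w:t></w:r>'
    '<w:del w:id="1" w:author="Alice" w:date="2024-01-01T00:00:00Z">'
    '<w:r><w:delText>30</w:delText></w:r></w:del>'
    '<w:ins w:id="2" w:author="Alice" w:date="2024-01-01T00:00:00Z">'
    '<w:r><w:t>45</w:t></w:r></w:ins>'
    '<w:r><w:t> days.</w:t></w:r>'
    '</w:p></w:body></w:document>'
)

changes = extract_tracked_changes(SAMPLE)
print(json.dumps(changes, indent=2))
```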
compare_documents — DOCX diff to tracked changes
Diffs two DOCX files at paragraph level (LCS via difflib) and produces a third document where changes appear as Word tracked revisions (w:ins / w:del), ready for review in Word or LibreOffice.
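The paragraph-level diff can be illustrated with `difflib.SequenceMatcher` on two lists of paragraph strings. This is a sketch under assumptions: `diff_paragraphs` and the `"keep"` / `"del"` / `"ins"` labels are hypothetical, and writing the result back as `w:ins` / `w:del` markup is left out.

```python
from difflib import SequenceMatcher


def diff_paragraphs(old: list, new: list) -> list:
    """Return (op, paragraph) pairs; op is 'keep', 'del', or 'ins' (hypothetical labels)."""
    ops = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops += [("keep", p) for p in old[i1:i2]]
        else:
            # 'replace' becomes deletions followed by insertions;
            # for 'delete' or 'insert' one of the two slices is empty.
            ops += [("del", p) for p in old[i1:i2]]
            ops += [("ins", p) for p in new[j1:j2]]
    return ops


old_doc = ["Title", "Clause A", "Clause B"]
new_doc = ["Title", "Clause A (amended)", "Clause B", "Clause C"]

for op, paragraph in diff_paragraphs(old_doc, new_doc):
    print(op, paragraph)
```

In a tracked-changes document, each `del` entry would correspond to a paragraph wrapped in `w:del` and each `ins` entry to one wrapped in `w:ins`, so a reviewer can accept or reject them individually.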
sanitize_metadata — three-level metadata stripping
- Level 1: remove rsid session-fingerprint attributes
- Level 2: anonymize tracked-change authors (`w:author`)
- Level 3: clear `dc:creator`, `cp:lastModifiedBy`, `cp:revision` in `core.xml`; clear `Company` in `app.xml`; remove `attachedTemplate` / `rsids` from `settings.xml`
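To make Levels 1 and 2 concrete, here is a minimal sketch operating on a `document.xml` string. The function name `sanitize`, the `"Author"` placeholder, and the assumption that each level includes the ones below it are illustrative guesses, not the tool's documented behavior. Level 3 touches other package parts and is not shown.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)  # keep the familiar "w:" prefix when serializing


def sanitize(document_xml: str, level: int = 1, placeholder: str = "Author") -> str:
    """Strip rsid attributes (level 1) and anonymize w:author (level 2). Hypothetical helper."""
    root = ET.fromstring(document_xml)
    for el in root.iter():
        if level >= 1:
            # w:rsidR, w:rsidRPr, w:rsidRDefault, ... all share the "rsid" prefix.
            for name in [n for n in el.attrib if n.startswith(f"{{{W}}}rsid")]:
                del el.attrib[name]
        if level >= 2 and f"{{{W}}}author" in el.attrib:
            el.set(f"{{{W}}}author", placeholder)
    return ET.tostring(root, encoding="unicode")


# Made-up sample with rsid attributes and a named revision author.
SAMPLE_XML = (
    f'<w:document xmlns:w="{W}"><w:body>'
    '<w:p w:rsidR="00A1B2C3" w:rsidRDefault="00D4E5F6">'
    '<w:ins w:id="1" w:author="Alice">'
    '<w:r w:rsidR="00A1B2C3"><w:t>hi</w:t></w:r>'
    '</w:ins></w:p></w:body></w:document>'
)

level1 = sanitize(SAMPLE_XML, level=1)
level2 = sanitize(SAMPLE_XML, level=2)
print(level2)
```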
scrub_pii — Presidio-powered PII redaction
- Detects PERSON, EMAIL_ADDRESS, PHONE_NUMBER, CREDIT_CARD, SSN, IP_ADDRESS, IBAN_CODE, and all other Presidio entity types via spaCy NER (`en_core_web_lg`)
- Deduplication pass: every occurrence of a detected entity string is redacted, including instances the NER model missed
- Redacted runs rendered as black bars (`w:highlight val="black"`), the standard legal redaction appearance in Word
- `dry_run=True` mode returns entity list without modifying the document
- `entities=[...]` filter for targeted redaction (e.g. email-only)
- Requires: `pip install "docx-mcp-server[pii]"` + `python -m spacy download en_core_web_lg`
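The deduplication pass is the least obvious step, so here is a small sketch of the idea in plain Python. It assumes detection has already produced a list of entity strings (hard-coded below as a stand-in for Presidio output) and uses a mask character in place of the black highlight the tool applies to runs. The name `redact_all_occurrences` is hypothetical.

```python
import re


def redact_all_occurrences(text: str, detected: list, mask: str = "\u2588") -> str:
    """Mask every occurrence of each detected string, not only the detected span."""
    # Longest strings first so "Jane Doe" wins over a shorter overlapping match.
    unique = sorted({d for d in detected if d}, key=len, reverse=True)
    if not unique:
        return text
    pattern = re.compile("|".join(re.escape(d) for d in unique))
    return pattern.sub(lambda m: mask * len(m.group()), text)


text = "Contact Jane Doe at jane@example.com. Jane Doe signed on Monday."
# Stand-in for detector output: pretend only the first "Jane Doe" was flagged.
detected = ["Jane Doe", "jane@example.com"]

redacted = redact_all_occurrences(text, detected)
print(redacted)
```

Because the second pass matches on the detected string rather than on model spans, a name flagged once is covered everywhere it appears verbatim.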
Install / Upgrade
pip install --upgrade docx-mcp-server
# For PII scrubbing:
pip install "docx-mcp-server[pii]"
python -m spacy download en_core_web_lg
Full Changelog: https://github.com/SecurityRonin/docx-mcp/compare/v0.3.1...v0.4.0
About SecurityRonin/docx-mcp
Read and edit Word (.docx) documents with track changes, comments, footnotes, and structural validation. The only MCP server combining w:ins/w:del tracked changes, threaded comments, and footnotes with OOXML-level paraId validation and document auditing. 18 tools, Python 3.10+.
Earlier breaking changes
- v0.6.1 Empty `document_handle` resolves to `__default__` slot, maintaining backward compatibility.