
SecurityRonin/docx-mcp

v0.4.0 Feature

This release adds 5 notable features for engineering teams evaluating rollout.

Published 28d MCP Developer Tools
✓ No known CVEs patched

Topics

ai-tools comments document-editing docx footnotes mcp mcp-server model-context-protocol ooxml track-changes word

Summary

AI summary

Adds fuzzy text matching with OOXML artifact normalization, structured extraction of tracked revisions, DOCX diffs rendered as tracked changes, three-level metadata stripping, and Presidio-powered PII redaction.

Full changelog

What's New

Fuzzy text matching (OOXML artifact normalization)

  • Invisible character removal: soft hyphen (U+00AD), zero-width space (U+200B), and word joiner (U+2060) are stripped from both the query and the document text before matching. These characters commonly appear in copy-pasted legal text and were silently breaking find/replace
  • Non-standard space normalization: thin space (U+2009), figure space (U+2007), hair space (U+200A), and narrow no-break space (U+202F) are each converted to a regular space
  • ignore_case parameter on insert_text, delete_text, replace_text
  • Multi-run spanning: replace_text and delete_text now handle text that spans multiple <w:r> runs
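
The normalization rules above can be sketched in a few lines of Python. This is an illustration only, not the project's code: the names normalize and fuzzy_find are hypothetical, and the use of casefold() for ignore_case is an assumption.

```python
# Hypothetical sketch of the normalization described above (not docx-mcp's code).

# Invisible characters named in the release notes: soft hyphen,
# zero-width space, word joiner.
INVISIBLE = "\u00ad\u200b\u2060"
# Non-standard spaces named in the notes: thin, figure, hair, narrow no-break.
ODD_SPACES = "\u2009\u2007\u200a\u202f"

# Translation table: drop invisible characters, map odd spaces to U+0020.
_TABLE = {ord(c): None for c in INVISIBLE}
_TABLE.update({ord(c): " " for c in ODD_SPACES})


def normalize(text: str, ignore_case: bool = False) -> str:
    """Apply the same cleanup to query and document text."""
    cleaned = text.translate(_TABLE)
    # casefold() is an assumed way to implement ignore_case.
    return cleaned.casefold() if ignore_case else cleaned


def fuzzy_find(document_text: str, query: str, ignore_case: bool = False) -> int:
    """Return the match offset in the *normalized* document text, or -1."""
    return normalize(document_text, ignore_case).find(normalize(query, ignore_case))
```

Note that the offset refers to the normalized text. A real implementation would also need to map it back to the original runs, which this sketch does not attempt.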

get_tracked_changes — structured revision extraction

Returns all pending w:ins / w:del tracked changes as structured JSON with type, change_id, author, date, para_id, and text fields.
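
As a rough sketch of what such an extraction involves, the following walks a WordprocessingML string with the standard library and collects the fields listed above. The function name tracked_changes, the type values "insertion" and "deletion", and the sample XML are assumptions made for illustration; the tool's actual output may differ.

```python
import xml.etree.ElementTree as ET

# Standard WordprocessingML namespaces (w: and w14:).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W14 = "http://schemas.microsoft.com/office/word/2010/wordml"

# Map revision elements to an assumed type label and the tag holding their text.
KINDS = {
    f"{{{W}}}ins": ("insertion", f"{{{W}}}t"),
    f"{{{W}}}del": ("deletion", f"{{{W}}}delText"),
}


def tracked_changes(document_xml: str) -> list[dict]:
    """Collect w:ins / w:del revisions, in document order, as plain dicts."""
    root = ET.fromstring(document_xml)
    changes = []
    for para in root.iter(f"{{{W}}}p"):
        para_id = para.get(f"{{{W14}}}paraId")
        for el in para.iter():
            if el.tag not in KINDS:
                continue
            kind, text_tag = KINDS[el.tag]
            changes.append({
                "type": kind,
                "change_id": el.get(f"{{{W}}}id"),
                "author": el.get(f"{{{W}}}author"),
                "date": el.get(f"{{{W}}}date"),
                "para_id": para_id,
                "text": "".join(t.text or "" for t in el.iter(text_tag)),
            })
    return changes


# Made-up sample paragraph with one deletion and one insertion.
SAMPLE = (
    f'<w:document xmlns:w="{W}" xmlns:w14="{W14}"><w:body>'
    '<w:p w14:paraId="1A2B3C4D">'
    '<w:r><w:t>Payment due in </w:t></w:r>'
    '<w:del w:id="7" w:author="Alice" w:date="2024-01-02T03:04:05Z">'
    '<w:r><w:delText>30</w:delText></w:r></w:del>'
    '<w:ins w:id="8" w:author="Alice" w:date="2024-01-02T03:04:05Z">'
    '<w:r><w:t>45</w:t></w:r></w:ins>'
    '<w:r><w:t> days.</w:t></w:r>'
    '</w:p></w:body></w:document>'
)

changes = tracked_changes(SAMPLE)
```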

compare_documents — DOCX diff to tracked changes

Diffs two DOCX files at paragraph level (LCS via difflib) and produces a third document where changes appear as Word tracked revisions (w:ins / w:del), ready for review in Word or LibreOffice.
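
A minimal sketch of a paragraph-level diff with difflib.SequenceMatcher, assuming each document has already been reduced to a list of paragraph strings. The function name diff_paragraphs and the "keep" / "del" / "ins" labels are hypothetical; writing the result back as w:ins / w:del markup is not shown.

```python
from difflib import SequenceMatcher


def diff_paragraphs(old: list[str], new: list[str]) -> list[tuple[str, str]]:
    """Return (op, paragraph) pairs where op is 'keep', 'del', or 'ins'."""
    ops = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.extend(("keep", p) for p in old[i1:i2])
        else:
            # 'replace', 'delete', and 'insert' all reduce to
            # "remove the old slice, add the new slice".
            ops.extend(("del", p) for p in old[i1:i2])
            ops.extend(("ins", p) for p in new[j1:j2])
    return ops


old_doc = ["Title", "Clause A", "Clause B"]
new_doc = ["Title", "Clause A (amended)", "Clause B", "Clause C"]
ops = diff_paragraphs(old_doc, new_doc)
```

In this sketch an edited paragraph shows up as a deletion followed by an insertion, which is how a whole-paragraph tracked revision would plausibly appear in Word.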

sanitize_metadata — three-level metadata stripping

  • Level 1: remove rsid session-fingerprint attributes
  • Level 2: anonymize tracked-change authors (w:author)
  • Level 3: clear dc:creator, cp:lastModifiedBy, cp:revision in core.xml; clear Company in app.xml; remove attachedTemplate / rsids from settings.xml
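
Levels 1 and 2 can be approximated with the standard library as follows. This is a sketch under stated assumptions: the helper names strip_rsids and anonymize_authors, the replacement name "Author", and the sample XML are all made up for illustration, and the level 3 edits to core.xml, app.xml, and settings.xml are not covered.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
# Keep the conventional "w:" prefix when serializing.
ET.register_namespace("w", W)


def strip_rsids(root: ET.Element) -> None:
    """Level 1 (sketch): drop every w:rsid* attribute (rsidR, rsidRDefault, ...)."""
    prefix = f"{{{W}}}rsid"
    for el in root.iter():
        for name in list(el.attrib):
            if name.startswith(prefix):
                del el.attrib[name]


def anonymize_authors(root: ET.Element, alias: str = "Author") -> None:
    """Level 2 (sketch): overwrite every w:author attribute with an alias."""
    key = f"{{{W}}}author"
    for el in root.iter():
        if key in el.attrib:
            el.set(key, alias)


# Made-up sample: one paragraph with rsid attributes and a tracked insertion.
SAMPLE = (
    f'<w:document xmlns:w="{W}"><w:body>'
    '<w:p w:rsidR="00A1B2C3" w:rsidRDefault="00D4E5F6">'
    '<w:ins w:id="1" w:author="Jane Roe">'
    '<w:r w:rsidR="00A1B2C3"><w:t>Hello</w:t></w:r></w:ins>'
    '</w:p></w:body></w:document>'
)

root = ET.fromstring(SAMPLE)
strip_rsids(root)
anonymize_authors(root)
cleaned = ET.tostring(root, encoding="unicode")
```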

scrub_pii — Presidio-powered PII redaction

  • Detects PERSON, EMAIL_ADDRESS, PHONE_NUMBER, CREDIT_CARD, SSN, IP_ADDRESS, IBAN_CODE, and all other Presidio entity types via spaCy NER (en_core_web_lg)
  • Deduplication pass: every occurrence of a detected entity string is redacted, including instances the NER model missed
  • Redacted runs rendered as black bars (w:highlight val="black") — standard legal redaction appearance in Word
  • dry_run=True mode returns entity list without modifying the document
  • entities=[...] filter for targeted redaction (e.g. email-only)
  • Requires: pip install "docx-mcp-server[pii]" + python -m spacy download en_core_web_lg
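
The deduplication pass can be illustrated without Presidio. The sketch below assumes the detector has already returned a list of entity strings and then finds every literal occurrence of each one. The names expand_detections and redact are hypothetical, and the block character stands in for the black highlight the tool applies to runs.

```python
import re


def expand_detections(text: str, detected: list[str]) -> list[tuple[int, int]]:
    """Return merged (start, end) spans for every occurrence of each detected string."""
    spans = []
    for entity in set(detected):
        if not entity:
            continue
        spans.extend(m.span() for m in re.finditer(re.escape(entity), text))
    spans.sort()
    merged: list[tuple[int, int]] = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Overlapping or touching spans collapse into one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def redact(text: str, spans: list[tuple[int, int]], mask: str = "\u2588") -> str:
    """Replace each span with mask characters of the same length."""
    out, pos = [], 0
    for start, end in spans:
        out.append(text[pos:start])
        out.append(mask * (end - start))
        pos = end
    out.append(text[pos:])
    return "".join(out)


text = "Contact Jane Roe. Jane Roe signed on behalf of Acme."
# Pretend the NER model reported the name once; the pass catches both mentions.
spans = expand_detections(text, ["Jane Roe"])
redacted = redact(text, spans)
```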

Install / Upgrade

pip install --upgrade docx-mcp-server

# For PII scrubbing:
pip install "docx-mcp-server[pii]"
python -m spacy download en_core_web_lg

Full Changelog: https://github.com/SecurityRonin/docx-mcp/compare/v0.3.1...v0.4.0


About SecurityRonin/docx-mcp

Read and edit Word (.docx) documents with track changes, comments, footnotes, and structural validation. The only MCP server combining w:ins/w:del tracked changes, threaded comments, and footnotes with OOXML-level paraId validation and document auditing. 18 tools, Python 3.10+.


Related context

Earlier breaking changes

  • v0.6.1 Empty `document_handle` resolves to `__default__` slot, maintaining backward compatibility.
