Rebuilding Microsoft Word’s Layout Engine in Rust+WASM: The Oxi Project at SSIM 0.89 After Six Weeks

TL;DR

Oxi is a Rust + WebAssembly OSS suite that parses, renders, and edits .docx / .xlsx / .pptx / PDF entirely in the browser — no server, no proprietary fonts at runtime, no DLL disassembly. The premise is simple: “LibreOffice breaks Word layouts, Google Docs requires a server, there is an empty seat between them.” Six weeks into the empirical Word-compatibility loop, the engine sits at SSIM mean = 0.8932 over 235 real-world .docx documents (410 pages). This post is the engineering postmortem of how we got here — including the methodology axioms that were empirically falsified along the way.

Clean-room Word compatibility via COM API black-box measurement (no disassembly)
Why the “single SSIM gate” methodology broke and got redesigned into phase-based gates
The GDI integer-rounding cascade: a 0.18pt/char drift becomes 10.8pt over 60 characters
Founding axioms that survived vs. those that were falsified (R30 measurement bug, R33 41-page regression)

Why “Word in the browser”

As of 2026, EU public-sector de-Microsoft-365 has moved from aspiration to policy. The French DINUM directive (2026-04-08) mandates that 2.5M public-sector PCs migrate to free-software stacks by 2027. The Swiss Federal Chancellery officially announced phased M365 reduction on 2026-04-18. Germany’s ZenDiS OpenDesk is already in production at Schleswig-Holstein, Thüringen, Baden-Württemberg, and — after US sanctions blocked Microsoft access — the International Criminal Court.

Every one of these transitions is missing the same piece: a rendering engine that opens existing .docx files indistinguishably from Microsoft Word. Without that, “migration” stops being “switch the app” and becomes “audit every document for visual divergence,” which no organization can staff. LibreOffice’s 20-year struggle in public-sector rollouts isn’t a feature problem; it’s a per-document visual-divergence problem that forces every site into a per-file audit.

Oxi’s goal is not “a better migration tool.” It’s the dissolution of the migration problem, via a mechanical convergence loop toward SSIM = 1.0 against Microsoft Word. Internally we call this loop Ra.

Architecture — why Rust + WASM

crates/
  oxi-common/      Shared OOXML utilities (ZIP, XML, relationships)
  oxidocs-core/    .docx engine — parser, IR, layout, font metrics, editor
  oxicells-core/   .xlsx engine — parser, IR, editor
  oxislides-core/  .pptx engine — parser, IR, editor
  oxipdf-core/     PDF 1.7 engine — parser, text extraction, generator
  oxihanko/        Japanese digital stamp (hanko) generator + PAdES signer
  oxi-wasm/        WebAssembly bindings (wasm-bindgen)
web/               Web demo (vanilla JS + Canvas)

The IR (Intermediate Representation) is deliberately format-agnostic:

Document → Page → Block (Paragraph | Table | Image) → Run

LibreOffice treats ODF as native and OOXML as an import, so OOXML degrades on round-trip. Word does the inverse to ODF. Oxi was built from day one so that neither format owns the IR — a prerequisite for the v2 dual-format goal (.docx + .odt as equally first-class citizens).
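
To make the hierarchy concrete, here is a minimal Python sketch of a format-agnostic IR. The real IR is Rust code inside oxidocs-core and is not shown in this post, so every class and field name below is hypothetical; the only thing taken from the text is the Document → Page → Block → Run shape.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Run:
    text: str
    font: str = "Calibri"   # hypothetical default, for illustration only
    size_pt: float = 11.0


@dataclass
class Paragraph:
    runs: List[Run] = field(default_factory=list)


@dataclass
class Table:
    # rows -> cells -> paragraphs inside each cell
    rows: List[List[List[Paragraph]]] = field(default_factory=list)


@dataclass
class Image:
    width_pt: float
    height_pt: float


# A block is one of the three kinds named in the text.
Block = Union[Paragraph, Table, Image]


@dataclass
class Page:
    blocks: List[Block] = field(default_factory=list)


@dataclass
class Document:
    pages: List[Page] = field(default_factory=list)

    def plain_text(self) -> str:
        """Join the text of every top-level paragraph, one paragraph per line."""
        lines = []
        for page in self.pages:
            for block in page.blocks:
                if isinstance(block, Paragraph):
                    lines.append("".join(run.text for run in block.runs))
        return "\n".join(lines)
```

Nothing in these types mentions .docx or .odt. In a design like this, format knowledge would stay in the parsers and writers on either side of the IR.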

Why WASM, practically:

Zero server cost — all processing runs client-side. Viable without a SaaS business model
Privacy by construction — documents never leave the device. Essential for legal / medical / government
Binary size — the whole suite is ~1.4 MB compiled .wasm
Memory safety — Word documents can be adversarial inputs (zip-bombs in public-sector .xlsx are a thing). Rust’s safety is a practical, not theoretical, benefit

“100% clean-room” — building compatibility without reading the DLL

Every layout specification in Oxi is derived from exactly two sources:

Published standards — OOXML (ISO/IEC 29500 / ECMA-376), PDF (ISO 32000)
Black-box measurement — observed via the Microsoft Office COM API

A typical measurement script looks like this (Word VBA or pywin32):

y1 = doc.Paragraphs(1).Range.Information(6)  # 6 = wdVerticalPositionRelativeToPage
y2 = doc.Paragraphs(2).Range.Information(6)
gap = y2 - y1  # = line_height + spacing, in points (1pt = 20 twips)

The critical rule: never use Format.LineSpacing. It returns the configured value, not the rendered line height. Word composes the actual line height from font metrics, docGrid settings, table-cell context, and more, and the result rarely equals the configured setting. The right answer is always to subtract two rendered positions.

Microsoft’s Open Specification Promise covers OOXML implementations against patent assertion. Observing only the COM outputs (never the DLL) keeps the project legally and technically clean-room.

Findings I actually enjoyed

1. GDI’s integer-pixel rounding accumulates to 10.8pt over 60 characters

Word’s text engine is built on GDI — a 30-year-old API that rounds advance widths to integer pixels and computes line height by rounding ascent and descent separately before summing. At Calibri 11pt, a 0.18pt-per-character drift accumulates to 10.8pt over 60 characters. That’s enough to change where lines wrap and where pages break.

The implication: a Word-compatible renderer cannot use modern floating-point metrics. It must reproduce GDI’s integer rounding bug-for-bug. Oxi uses a dual font engine — GDI for .docx, DirectWrite for .odt and PDF export — behind a shared FontEngine trait:

FontEngine trait
├── GdiEngine     — Word-compatible (integer px rounding)
└── DWriteEngine  — Cross-platform (floating-point precision)
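
The split can be sketched in Python to show how per-glyph pixel rounding turns into line-level drift. This is an illustration, not Oxi's Rust code: the method name advance_pt, the 96 DPI device resolution, the round-half-up tie rule, and the glyph metrics (1000 units at UPM 2048) are all assumptions.

```python
import math
from abc import ABC, abstractmethod


class FontEngine(ABC):
    """Shared interface: advance width of one glyph, in points."""

    @abstractmethod
    def advance_pt(self, advance_units: int, upm: int, size_pt: float) -> float:
        ...


class DWriteEngine(FontEngine):
    """Floating-point metrics with no device rounding."""

    def advance_pt(self, advance_units, upm, size_pt):
        return advance_units * size_pt / upm


class GdiEngine(FontEngine):
    """Rounds every advance to a whole device pixel before converting back."""

    def __init__(self, dpi: int = 96):  # 96 DPI is an assumption
        self.dpi = dpi

    def advance_pt(self, advance_units, upm, size_pt):
        px = advance_units * size_pt * self.dpi / (72 * upm)
        px_rounded = math.floor(px + 0.5)  # tie rule is an assumption
        return px_rounded * 72 / self.dpi


def line_width_pt(engine, advances, upm, size_pt):
    """Total width of a run of glyphs under the given engine."""
    return sum(engine.advance_pt(a, upm, size_pt) for a in advances)


# Hypothetical glyph repeated 60 times at 11pt.
advances = [1000] * 60
ideal = line_width_pt(DWriteEngine(), advances, 2048, 11.0)
gdi = line_width_pt(GdiEngine(), advances, 2048, 11.0)
drift = gdi - ideal
```

With these made-up metrics the GDI line is about 7.3pt narrower than the floating-point one. The 0.18pt-per-character figure above comes from measurements of real Calibri metrics, which this sketch does not reproduce.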

2. Word rounds character widths to 10 twips (0.5pt)

Not in the OOXML spec, but COM-confirmed across 13 font/size combinations and 181 characters: Word rounds the computed advance to the nearest 10 twips (0.5pt):

width_twips = round(advance × fontSize × 20 / UPM)
width_10tw  = round(width_twips / 10) × 10

Implementing this triggered a cascade of improvements (mid-paragraph page break: +0.041 SSIM; table-cell line_height reset: +0.66) that took the average from 0.7884 to 0.8584 in three weeks.
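
The two-step rounding above translates almost directly into Python. This is a sketch under one assumption: the post does not say how ties are broken, so round-half-up is used here instead of Python's built-in round(), which rounds halves to even. The function names and the sample metrics are hypothetical.

```python
import math


def round_half_up(x: float) -> int:
    """Round to the nearest integer, ties upward (an assumption)."""
    return math.floor(x + 0.5)


def advance_twips(advance_units: int, font_size_pt: float, upm: int) -> int:
    """Advance width in twips, snapped to the nearest 10 twips (0.5pt)."""
    width_twips = round_half_up(advance_units * font_size_pt * 20 / upm)
    return round_half_up(width_twips / 10) * 10
```

For a hypothetical glyph of 1000 units at UPM 2048 and 10.5pt, the first step gives 103 twips and the second snaps it to 100 twips, that is 5.0pt.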

3. Multiple line spacing accumulates with CEIL, not ROUND

Also undocumented. MS Mincho 10.5pt × 1.15 line spacing nominally computes to 310.5 twips, but Word actually uses 320 twips (16.0pt) — ceiling, not round. And the cumulative position math inherits CEIL through all subsequent lines. COM-confirmed across 8 of 9 measured positions before we shipped the fix.
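
A small sketch of the difference, assuming the ceiling snaps to a 10-twip grid (inferred from 310.5 becoming 320 rather than 311) and that every following line advances by the snapped pitch. Both function names are hypothetical.

```python
import math


def ceil_to_10_twips(nominal_twips: float) -> int:
    """Snap a nominal line pitch up to the next multiple of 10 twips."""
    return math.ceil(nominal_twips / 10) * 10


def line_tops(first_top_twips: int, nominal_twips: float, n_lines: int) -> list:
    """Top of each line when every line advances by the ceiled pitch."""
    pitch = ceil_to_10_twips(nominal_twips)
    return [first_top_twips + i * pitch for i in range(n_lines)]
```

ceil_to_10_twips(310.5) gives 320, where rounding to the nearest 10 would give 310. Under this model the 10-twip difference repeats on every line, so by the fourth line the two versions are 30 twips (1.5pt) apart.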

4. is_fullwidth is broader than OOXML says

Pulling Unicode East_Asian_Width=F/W isn’t enough — Word treats 7 additional blocks (Arrows, Math Operators, Letterlike Symbols, etc.) as fullwidth. Without this fix, “→” gets Western advance width and visually overlaps the adjacent CJK glyph.
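
A minimal sketch of such a check using Python's standard unicodedata module. Only the three blocks named above are included; the other four are not listed in this post, so they are left out rather than guessed. The block ranges are the standard Unicode ones, and the table name is hypothetical.

```python
import unicodedata

# Blocks treated as fullwidth in addition to East_Asian_Width F/W.
# Incomplete on purpose: only the three blocks named in the text.
EXTRA_FULLWIDTH_BLOCKS = [
    (0x2100, 0x214F),  # Letterlike Symbols
    (0x2190, 0x21FF),  # Arrows
    (0x2200, 0x22FF),  # Mathematical Operators
]


def is_fullwidth(ch: str) -> bool:
    """True if ch should get a fullwidth advance under this sketch."""
    if unicodedata.east_asian_width(ch) in ("F", "W"):
        return True
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EXTRA_FULLWIDTH_BLOCKS)
```

“→” (U+2192) has East_Asian_Width=A (ambiguous), so the Unicode property alone classifies it as not fullwidth; the block table is what catches it.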

5. Information(6) does not return paragraph-start Y

This one was a measurement-side bug and is the core of the R30 incident below. doc.Paragraphs(N).Range.Information(6) returns the active-end position. For paragraphs that span pages, you get “Y on the next page” and the per-paragraph page index is silently off by one. The fix is to collapse the range to zero width first:

rng = doc.Paragraphs(N).Range
y = doc.Range(rng.Start, rng.Start).Information(6)
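
Wrapped as a helper, the collapsed-range measurement might look like the sketch below. The function name is hypothetical. The two constants are the documented WdInformation values (3 = wdActiveEndPageNumber, 6 = wdVerticalPositionRelativeToPage), and doc is assumed to be a Word Document object obtained through pywin32.

```python
WD_ACTIVE_END_PAGE_NUMBER = 3
WD_VERTICAL_POSITION_RELATIVE_TO_PAGE = 6


def paragraph_start(doc, n):
    """Return (page, y) for the first character of paragraph n (1-based).

    The paragraph range is collapsed to zero width at its start, so the
    "active end" that Information() reports on is the paragraph start.
    """
    rng = doc.Paragraphs(n).Range
    start = doc.Range(rng.Start, rng.Start)
    page = start.Information(WD_ACTIVE_END_PAGE_NUMBER)
    y = start.Information(WD_VERTICAL_POSITION_RELATIVE_TO_PAGE)
    return page, y
```

Reading the page number from the same collapsed range keeps the page index and the Y value consistent with each other. That pairing is exactly what drifted when a paragraph spilled onto the next page.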

Founding axioms that got falsified

The Ra loop was originally built on a strong “No Excuses by Design” premise:

No layout difference has a valid excuse
Every value is measurable via COM API
The spec space is finite; measurement results are permanent assets
Fixing one spec gap improves multiple documents simultaneously (convergent structure)
Not “cannot do,” only “not yet done”

Sessions 38-45 (about a month of empirical pressure) falsified three of these.

R21 plateau — the bottom-5 gate stops moving

The initial merge gate required the bottom-5 documents’ SSIM sum to strictly increase. Reasonable on its face, but the bottom-5 each carried different structural problems (table charGrid, in-textbox wrap, vertical writing, …), each of which needed a multi-week refactor before any single document moved. The gate locked, and no PRs could land.

R30 — the measurement API itself had a bug

The Information(6) issue above. The axiom “every value is measurable via COM” assumed the API returns what you think it returns. It does not — for multi-page paragraphs it returns the active-end position. Months of measurement data carried a silent one-page drift. The corollary: “measurement results are permanent assets” is wrong; measurement methodology has to be re-validated periodically, not just the values.

R33 — a “minimal-case spec-correct” fix regressed 41 pages

A spec was derived from a minimal repro, the implementation matched the repro exactly, and the production baseline got worse by 41 pages. The cause: the spec was derived in a single context (a standalone paragraph), but Word actually composes the rendering from font-cascade × szCs × per-context wrap rules. What’s correct in the minimal repro can be wrong in other contexts.

The rule that survived: no EXCEPTION stacking. If a “confirmed” spec needs per-document, per-font, or per-context carve-outs, the spec isn’t incomplete — it’s wrong. Re-derive from a richer input space rather than stacking exceptions, because every exception is a future R33 in the making.

Redesign: phase-based gate

On 2026-04-28 we redesigned the methodology. The core admission: SSIM = 1.0 is an outcome, not a signal. Gating on the outcome directly produces R21-style plateaus, because outcomes don’t decompose into per-PR work. Instead, gate on cause-attributable signals staged by phase:

Phase	Primary gate	Why this signal
1 (current)	Pagination correctness — does Word’s paragraph N land on Oxi’s page N?	One page-break bug can cascade into 47 low-SSIM pages. Fixing the root needs a gate that won’t punish you for it
2	Element IoU mean ≥ 0.99 — bbox IoU per text block / image / cell	Position accuracy (not pixel). The dominant structural error left after pagination is correct
3	SSIM mean ≥ 0.99 + bottom-N floor	Once positions match, residual pixel diff = font hinting / sub-pixel rendering

SSIM is tracked at every phase but is the gate only at Phase 3. During Phase 1 it acts as a regression sentinel (any drop > 0.005 requires review). This unblocks fixes whose payoff is “47 low-SSIM pages all shift simultaneously after the root page-break bug is fixed” — exactly the shape of fix the old gate refused.
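
The three signals are simple enough to sketch. This is an illustration of the gate logic as described here, not the project's pipeline code; the function names, the bounding-box layout, and the input shapes are assumptions.

```python
from typing import Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (left, top, right, bottom)
SSIM_SENTINEL_DROP = 0.005  # a drop larger than this requires review


def pagination_matches(word_pages: Sequence[int], oxi_pages: Sequence[int]) -> bool:
    """Phase 1: paragraph N must land on the same page in both renderers."""
    return len(word_pages) == len(oxi_pages) and all(
        w == o for w, o in zip(word_pages, oxi_pages)
    )


def bbox_iou(a: BBox, b: BBox) -> float:
    """Phase 2: intersection-over-union of two element bounding boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def ssim_needs_review(before: float, after: float) -> bool:
    """Regression sentinel: flag a mean-SSIM drop greater than 0.005."""
    return before - after > SSIM_SENTINEL_DROP
```

The useful property is that each check points at a cause. A pagination mismatch names a paragraph and a low IoU names an element, whereas a low SSIM score only says that some pixels differ.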

Five weeks after the redesign:

Phase 1 pass rate: 25/55 → 46/55 (83.6%)
SSIM mean: 0.8699 → 0.8932 (+0.0233)
Individual PRs are merge-eligible when pagination or IoU moves — SSIM doesn’t have to budge

In this window, root-cause fixes that were unmergeable under the old gate landed as one- or two-line changes: vMerge cell height exclusion, fixed table column width preservation, widow control inheritance, grid-snapped lines extending into the bottom margin.

The verify pipeline (stale-binary incidents)

We made the same mistake on 2026-04-26 and again on 2026-05-07. Oxi has three render paths:

WASM (crates/oxi-wasm/pkg/) — used by the browser editor
Native GDI renderer (tools/oxi-gdi-renderer/) — used by the pagination measurer (Phase 1 gate)
Native DWrite renderer (tools/oxi-dwrite-renderer/) — the default for pipeline.verify

Running cargo build in the gdi-renderer directory does not rebuild the dwrite-renderer crate. We forgot this, ran verify against a stale DWrite binary, and got a falsely clean “0 improved / 0 regressed” report, because the pre-patch baseline was being compared to pre-patch output. The actual delta turned out to be -0.0911 and we had to revert.

The hygiene is now hard-coded in CLAUDE.md:

Rebuild both native renderers before every verify, and delete oxi_png/ + pagination_oxi/ caches for affected docs
WASM rebuild alone is insufficient — pipeline.verify does not use WASM
“0 improved, 0 regressed” is a red flag. Stale binaries return identical numbers, so 0 deltas are a stale-output signature
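
Rules like these can also be backed by a mechanical pre-flight check. The sketch below is not part of Oxi's pipeline: comparing file modification times is a technique swapped in here for illustration, both function names are hypothetical, and only .rs files are scanned, so a change to Cargo.toml or to a dependency would slip past it.

```python
from pathlib import Path


def binary_is_stale(binary: Path, source_root: Path) -> bool:
    """True if the binary is missing or older than any .rs file under source_root."""
    if not binary.exists():
        return True
    built_at = binary.stat().st_mtime
    return any(src.stat().st_mtime > built_at for src in source_root.rglob("*.rs"))


def report_is_suspicious(improved: int, regressed: int) -> bool:
    """A verify run with zero movement in either direction deserves a second look."""
    return improved == 0 and regressed == 0
```

A verify wrapper could refuse to run while binary_is_stale() is true for either native renderer, and print a warning instead of a green result whenever report_is_suspicious() fires.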

Methodology bugs are nastier than spec bugs: if undetected for a few days, they invalidate the entire measurement corpus accumulated during that window. The only working defense is to write project-specific hygiene rules somewhere enforcement can actually reach them — for an AI-driven loop, that’s the CLAUDE.md the agent reads on every session start.

Where this sits in the landscape

Solution	Approach	Limitation
LibreOffice / Collabora	C++ server-side rendering	Breaks Word layouts. Requires server. No pixel-fidelity goal
ZetaOffice	LibreOffice compiled to WASM	100MB+ download. Accuracy = LibreOffice. A port, not a rewrite
ONLYOFFICE	JS canvas rendering	Closest architecture, but AGPL and no COM-measured Word compat
Apryse (PDFTron)	C++ → WASM	Proprietary. Converts to internal format — not native OOXML render
docMentis	Rust+WASM viewer	WASM engine proprietary. Telemetry on by default
Google Docs	Server-rendered	Proprietary. Requires server. Intentionally diverges from Word
docx-rs / rdocx	Rust DOCX libraries	R/W only. No browser layout engine

Oxi’s intersection is “OSS (MIT) + Rust/WASM client-side + dual-format first-class + COM-measured pixel-perfect + zero server cost.” No other project occupies this seat.

Current numbers and roadmap

Parse success: 100% across 504 documents (Japanese government .docx/.xlsx/.pptx + generated). LibreOffice: 99.2% (4 large .xlsx files timed out > 45s)
SSIM mean: 0.8932 (235 docs, 410 pages, GDI baseline)
Phase 1 pass rate: 46/55 (83.6%)
.wasm size: ~1.4 MB for the full suite

Next milestones:

v1.x — .docx SSIM 0.95+, IME (Japanese/CJK input), .xlsx/.pptx layout engines, vertical writing + ruby
v2 — .odt parity. The Ra loop transfers to ODF; the reference renderer changes (LibreOffice headless) but the methodology does not
v2.x — oxi-hyde (TPM 2.0 + ML-KEM outer envelope). .docx.hyde / .odt.hyde are PGP-encrypted-PDF style: encryption is an outer wrapper, decryption restores a plain .docx / .odt openable in any client

Try it

Live demo: https://ryujiyasu.gitlab.io/oxi/
License: MIT
Contributing: every merged PR must improve pixel accuracy of at least one document. That’s the entire acceptance criterion

The methodologically interesting part — and a topic for another post — is that the Ra loop runs autonomously with an AI agent (Claude) in the inner loop: root-cause analysis, COM measurement script generation, fix implementation, verification. The human role is phase-gate review and direction. The falsified axioms above (spec space open-ended, single-context derivation is a trap) are not just layout-engine lessons; they are also a record of where human review must remain when running an AI-driven engineering loop on a hard convergence problem.

If you work on EU public-sector migration, ODF parity, or hardware-anchored document encryption — I’d particularly like to hear from you.
