TL;DR
Oxi is a Rust + WebAssembly OSS suite that parses, renders, and edits .docx / .xlsx / .pptx / PDF entirely in the browser — no server, no proprietary fonts at runtime, no DLL disassembly. The premise is simple: “LibreOffice breaks Word layouts, Google Docs requires a server, there is an empty seat between them.” Six weeks into the empirical Word-compatibility loop, the engine sits at SSIM mean = 0.8932 over 235 real-world .docx documents (410 pages). This post is the engineering postmortem of how we got here — including the methodology axioms that were empirically falsified along the way.
- Clean-room Word compatibility via COM API black-box measurement (no disassembly)
- Why the “single SSIM gate” methodology broke and got redesigned into phase-based gates
- The GDI integer-rounding cascade: a 0.18pt/char drift becomes 10.8pt over 60 characters
- Founding axioms that survived vs. those that were falsified (R30 measurement bug, R33 41-page regression)
Why “Word in the browser”
As of 2026, moving European public-sector IT off Microsoft 365 has shifted from aspiration to policy. The French DINUM directive (2026-04-08) mandates that 2.5M public-sector PCs migrate to free-software stacks by 2027. The Swiss Federal Chancellery officially announced phased M365 reduction on 2026-04-18. Germany’s ZenDiS OpenDesk is already in production at Schleswig-Holstein, Thüringen, Baden-Württemberg, and (after US sanctions blocked Microsoft access) the International Criminal Court.
Every one of these transitions is missing the same piece: a rendering engine that opens existing .docx files indistinguishably from Microsoft Word. Without that, “migration” stops being “switch the app” and becomes “audit every document for visual divergence,” which no organization can staff. LibreOffice’s 20-year struggle in public-sector rollouts isn’t a feature problem; it’s a per-document visual-divergence problem that forces every site into a per-file audit.
Oxi’s goal is not “a better migration tool.” It’s the dissolution of the migration problem, via a mechanical convergence loop toward SSIM = 1.0 against Microsoft Word. Internally we call this loop Ra.
Architecture — why Rust + WASM
```
crates/
  oxi-common/      Shared OOXML utilities (ZIP, XML, relationships)
  oxidocs-core/    .docx engine: parser, IR, layout, font metrics, editor
  oxicells-core/   .xlsx engine: parser, IR, editor
  oxislides-core/  .pptx engine: parser, IR, editor
  oxipdf-core/     PDF 1.7 engine: parser, text extraction, generator
  oxihanko/        Japanese digital stamp (hanko) generator + PAdES signer
  oxi-wasm/        WebAssembly bindings (wasm-bindgen)
web/               Web demo (vanilla JS + Canvas)
```
The IR (Intermediate Representation) is deliberately format-agnostic:
Document → Page → Block (Paragraph | Table | Image) → Run
LibreOffice treats ODF as native and OOXML as an import, so OOXML degrades on round-trip. Word does the reverse and treats ODF as the import. Oxi was built from day one so that neither format owns the IR. This is a prerequisite for the v2 dual-format goal (.docx + .odt as equally first-class citizens).
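To make the hierarchy concrete, here is a minimal Python sketch of a format-agnostic IR. All class and field names are hypothetical; Oxi’s real IR is written in Rust inside oxidocs-core. The point of the sketch is that nothing in these types mentions OOXML or ODF, so a .docx importer and an .odt importer can both target them.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class Run:
    text: str
    font: str = "Calibri"  # hypothetical default; real values come from the importer
    size_pt: float = 11.0


@dataclass
class Paragraph:
    runs: List[Run] = field(default_factory=list)


@dataclass
class Table:
    # rows -> cells -> blocks, so a cell can hold paragraphs or nested tables
    rows: "List[List[List[Block]]]" = field(default_factory=list)


@dataclass
class Image:
    width_pt: float
    height_pt: float


Block = Union[Paragraph, Table, Image]


@dataclass
class Page:
    blocks: List[Block] = field(default_factory=list)


@dataclass
class Document:
    pages: List[Page] = field(default_factory=list)


def iter_runs(doc: Document) -> Iterator[Run]:
    """Yield every Run in reading order, descending into table cells."""
    def walk(blocks):
        for block in blocks:
            if isinstance(block, Paragraph):
                yield from block.runs
            elif isinstance(block, Table):
                for row in block.rows:
                    for cell in row:
                        yield from walk(cell)
            # Image blocks carry no runs
    for page in doc.pages:
        yield from walk(page.blocks)
```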
Why WASM, practically:
- Zero server cost — all processing runs client-side. Viable without a SaaS business model
- Privacy by construction — documents never leave the device. Essential for legal / medical / government
- Binary size — the whole suite is ~1.4 MB compiled .wasm
- Memory safety — Word documents can be adversarial inputs (zip-bombs in public-sector .xlsx are a thing). Rust’s safety is a practical, not theoretical, benefit
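Here is the kind of guard an OOXML reader needs against adversarial containers. This minimal Python sketch inspects a ZIP’s central directory before inflating anything. The limits are hypothetical placeholders, not Oxi’s values. Declared sizes can be forged, so a real reader must also cap the bytes it actually decompresses.

```python
import io
import zipfile

# Hypothetical limits for illustration only.
MAX_TOTAL_UNCOMPRESSED = 512 * 1024 * 1024  # 512 MiB across all parts
MAX_RATIO = 100                             # uncompressed / compressed, per part


def check_ooxml_container(data: bytes) -> list:
    """Return human-readable problems found in the ZIP's central directory.

    Only declared sizes are inspected; nothing is decompressed here.
    """
    problems = []
    total = 0
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            total += info.file_size
            if info.compress_size > 0:
                ratio = info.file_size / info.compress_size
                if ratio > MAX_RATIO:
                    problems.append(f"{info.filename}: compression ratio {ratio:.0f}")
    if total > MAX_TOTAL_UNCOMPRESSED:
        problems.append(f"total uncompressed size {total} exceeds limit")
    return problems
```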
“100% clean-room” — building compatibility without reading the DLL
Every layout specification in Oxi is derived from exactly two sources:
- Published standards — OOXML (ISO/IEC 29500 / ECMA-376), PDF (ISO 32000)
- Black-box measurement — observed via the Microsoft Office COM API
A typical measurement script looks like this (Word VBA or pywin32):
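```python
import win32com.client  # pywin32; needs Windows with Word installed
word = win32com.client.Dispatch("Word.Application")
doc = word.Documents.Open(r"C:\path\to\sample.docx")  # placeholder path
y1 = doc.Paragraphs(1).Range.Information(6)  # 6 = wdVerticalPositionRelativeToPage (points)
y2 = doc.Paragraphs(2).Range.Information(6)
gap = y2 - y1  # = rendered line height + spacing (points)
```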
The critical rule: never use Format.LineSpacing. It returns the configured value, not the rendered line height. Word composes the actual line height from font metrics, docGrid settings, table-cell context, and more, and the result rarely equals the configured setting. The right answer is always to subtract two rendered positions.
Microsoft’s Open Specification Promise covers OOXML implementations against patent assertion. Observing only the COM outputs (never the DLL) keeps the project legally and technically clean-room.
Findings I actually enjoyed
1. GDI’s integer-pixel rounding accumulates to 10.8pt over 60 characters
Word’s text engine is built on GDI — a 30-year-old API that rounds advance widths to integer pixels and computes line height by rounding ascent and descent separately before summing. At Calibri 11pt, a 0.18pt-per-character drift accumulates to 10.8pt over 60 characters. That’s enough to change where lines wrap and where pages break.
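The accumulation is plain arithmetic, shown here as a minimal Python sketch. The inputs are assumptions chosen for illustration: a hypothetical glyph advance of 944 design units on a 2048-UPM font (not Calibri’s real metrics), 96 DPI, and half-up rounding to whole pixels. Real GDI also applies hinting, so the sketch reproduces only the shape of the cascade, not Word’s exact numbers.

```python
import math

DPI = 96                # assumed screen resolution
PT_PER_PX = 72 / DPI    # 0.75 pt per pixel at 96 DPI


def px(units: float, upm: int, size_pt: float) -> float:
    """Scale a design-unit advance to fractional pixels."""
    return units * size_pt * DPI / 72 / upm


def gdi_round(x: float) -> int:
    """Round half up to a whole pixel (assumed rounding mode)."""
    return math.floor(x + 0.5)


def accumulated_drift_pt(advance_units: int, upm: int, size_pt: float, n_chars: int) -> float:
    """Integer-pixel line width minus floating-point line width, in points."""
    exact = px(advance_units, upm, size_pt)
    rounded = gdi_round(exact)
    return (rounded - exact) * n_chars * PT_PER_PX
```

With these inputs, each character is about 6.76 px wide and rounds up to 7 px. That is roughly 0.18pt of error per character, and sixty characters accumulate about 10.8pt.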
The implication: a Word-compatible renderer cannot use modern floating-point metrics. It must reproduce GDI’s integer rounding bug-for-bug. Oxi uses a dual font engine — GDI for .docx, DirectWrite for .odt and PDF export — behind a shared FontEngine trait:
```
FontEngine trait
├── GdiEngine     Word-compatible (integer px rounding)
└── DWriteEngine  Cross-platform (floating-point precision)
```
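The split can be pictured with a Python analogue of the trait, using an abstract base class. This is a sketch, not Oxi’s Rust code. The method names, the 96 DPI assumption and the font metrics in the example (ascent 1480, descent 503, UPM 2048) are all hypothetical. The GDI path rounds ascent and descent to whole pixels separately before summing, as described above.

```python
import math
from abc import ABC, abstractmethod

DPI = 96  # assumed screen resolution


class FontEngine(ABC):
    """Stand-in for the shared FontEngine trait; method names are hypothetical."""

    def __init__(self, upm: int, ascent: int, descent: int):
        self.upm = upm          # units per em
        self.ascent = ascent    # design units above the baseline
        self.descent = descent  # design units below the baseline (positive)

    def _px(self, units: float, size_pt: float) -> float:
        return units * size_pt * DPI / 72 / self.upm

    @abstractmethod
    def advance_px(self, units: int, size_pt: float) -> float: ...

    @abstractmethod
    def line_height_px(self, size_pt: float) -> float: ...


class GdiEngine(FontEngine):
    """Word-compatible path: integer pixels, ascent and descent rounded separately."""

    def advance_px(self, units: int, size_pt: float) -> float:
        return math.floor(self._px(units, size_pt) + 0.5)

    def line_height_px(self, size_pt: float) -> float:
        asc = math.floor(self._px(self.ascent, size_pt) + 0.5)
        desc = math.floor(self._px(self.descent, size_pt) + 0.5)
        return asc + desc


class DWriteEngine(FontEngine):
    """Cross-platform path: keep floating-point precision throughout."""

    def advance_px(self, units: int, size_pt: float) -> float:
        return self._px(units, size_pt)

    def line_height_px(self, size_pt: float) -> float:
        return self._px(self.ascent + self.descent, size_pt)
```

With the hypothetical metrics at 11pt, the GDI path yields a 15 px line while the floating-point path yields about 14.2 px. A gap of that size is enough to move line and page breaks.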
2. Word rounds character widths to 10 twips (0.5pt)
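Word rounds the computed advance to the nearest 10 twips (0.5pt). This is not in the OOXML spec, but it is COM-confirmed across 13 font/size combinations and 181 characters. With advance and UPM in font design units and fontSize in points (20 twips per point):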
```
width_twips = round(advance × fontSize × 20 / UPM)
width_10tw  = round(width_twips / 10) × 10
```
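The two-step rounding above translates directly into a small Python function. The post does not say how exact halves are rounded, so half-up is an assumption here. Python’s built-in round() uses banker’s rounding (round(10.5) == 10), which would disagree on those cases. The function name is hypothetical.

```python
import math


def round_half_up(x: float) -> int:
    # Assumed rounding mode for exact .5 cases; not confirmed by the measurements.
    return math.floor(x + 0.5)


def word_char_width_twips(advance: int, font_size_pt: float, upm: int) -> int:
    """Advance width snapped to the 10-twip (0.5pt) grid observed via COM."""
    width_twips = round_half_up(advance * font_size_pt * 20 / upm)
    return round_half_up(width_twips / 10) * 10
```

For example, a fullwidth glyph (advance equal to UPM) at 10.5pt lands exactly on 210 twips. A 1000-unit glyph on a 2048-UPM font at the same size computes to 103 twips and snaps to 100.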
Implementing this triggered a cascade of improvements (mid-paragraph page break: +0.041 SSIM; table-cell line_height reset: +0.66) that took the average from 0.7884 to 0.8584 in three weeks.
3. Multiple line spacing accumulates with CEIL, not ROUND
Also undocumented. For MS Mincho 10.5pt at 1.15 line spacing, the nominal pitch is 310.5 twips (a 270-twip single line × 1.15), but Word actually uses 320 twips (16.0pt): it takes the ceiling, not the rounded value. The cumulative position math then inherits CEIL through all subsequent lines. This was COM-confirmed at 8 of 9 measured positions before we shipped the fix.
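A minimal Python sketch of the CEIL behavior, under stated assumptions:

- The 270-twip single-line height is back-computed from 310.5 / 1.15.
- The 1.15 multiple is expressed the way OOXML stores it, as w:line in 240ths of a line (276) with lineRule auto.
- The ceiling is taken on a 10-twip grid. The post only says CEIL; a 20-twip grid would also produce 320 here.

Integer arithmetic keeps 310.5 exact. The function names are hypothetical.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def line_pitch_twips(single_twips: int, line_240ths: int, grid_twips: int = 10) -> int:
    """Multiple line spacing: single-line height x (w:line / 240), ceiled to the grid."""
    return ceil_div(single_twips * line_240ths, 240 * grid_twips) * grid_twips


def line_tops(single_twips: int, line_240ths: int, n_lines: int) -> list:
    """Top of each line relative to the first; every line inherits the ceiled pitch."""
    pitch = line_pitch_twips(single_twips, line_240ths)
    return [i * pitch for i in range(n_lines)]
```

Ten line pitches down, the ceiled positions sit 95 twips (4.75pt) below where the nominal 310.5-twip pitch would put them. That is why rounding instead of ceiling shows up as a page-break difference on longer documents.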
4. is_fullwidth is broader than OOXML says
Using Unicode East_Asian_Width=F/W alone isn’t enough: Word treats 7 additional blocks (Arrows, Mathematical Operators, Letterlike Symbols, etc.) as fullwidth. Without this fix, “→” gets a Western advance width and visually overlaps the adjacent CJK glyph.
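A minimal Python sketch of the broadened check, using the standard unicodedata module. Only the three blocks the text names are included. The remaining four of the seven are not listed in this post, so they are left out rather than guessed.

```python
import unicodedata

# The three extra blocks named in the text; the other four are intentionally omitted.
EXTRA_FULLWIDTH_BLOCKS = [
    (0x2100, 0x214F),  # Letterlike Symbols
    (0x2190, 0x21FF),  # Arrows
    (0x2200, 0x22FF),  # Mathematical Operators
]


def is_fullwidth(ch: str) -> bool:
    """East_Asian_Width F/W, plus blocks Word also treats as fullwidth."""
    if unicodedata.east_asian_width(ch) in ("F", "W"):
        return True
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EXTRA_FULLWIDTH_BLOCKS)
```

“→” (U+2192) is East_Asian_Width=A (ambiguous), so a pure F/W check misses it. The block check catches it.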
5. Information(6) does not return paragraph-start Y
This one was a measurement-side bug and is the core of the R30 incident below. doc.Paragraphs(N).Range.Information(6) returns the active-end position. For paragraphs that span pages, you get “Y on the next page” and the per-paragraph page index is silently off by one. The fix is to collapse the range to zero width first:
```python
rng = doc.Paragraphs(n).Range                       # paragraph n (1-based)
y = doc.Range(rng.Start, rng.Start).Information(6)  # collapsed: paragraph-start Y
```
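Putting the fix into a loop gives Word’s side of a paragraph-to-page map. This is a minimal sketch that assumes a pywin32 Document object, and the function name is hypothetical. The constants are the documented WdInformation values: 3 is wdActiveEndPageNumber, and 6 is wdVerticalPositionRelativeToPage, which reports points.

```python
WD_ACTIVE_END_PAGE_NUMBER = 3
WD_VERTICAL_POSITION_RELATIVE_TO_PAGE = 6


def paragraph_starts(doc):
    """Return [(page, y_pt), ...] for every paragraph, measured at its first character.

    Each range is collapsed to zero width, so Information() cannot report the
    active end of a paragraph that continues onto the next page.
    """
    starts = []
    for i in range(1, doc.Paragraphs.Count + 1):  # COM collections are 1-based
        start = doc.Paragraphs(i).Range.Start
        point = doc.Range(start, start)
        starts.append((point.Information(WD_ACTIVE_END_PAGE_NUMBER),
                       point.Information(WD_VERTICAL_POSITION_RELATIVE_TO_PAGE)))
    return starts
```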
Founding axioms that got falsified
The Ra loop was originally built on a strong “No Excuses by Design” premise:
- No layout difference has a valid excuse
- Every value is measurable via COM API
- The spec space is finite; measurement results are permanent assets
- Fixing one spec gap improves multiple documents simultaneously (convergent structure)
- Not “cannot do,” only “not yet done”
Sessions 38-45 (about a month of empirical pressure) falsified three of these.
R21 plateau — the bottom-5 gate stops moving
The initial merge gate required the bottom-5 documents’ SSIM sum to strictly increase. That is reasonable on its face, but the bottom-5 documents each carried a different structural problem (table charGrid, in-textbox wrap, vertical writing, …), and each needed a multi-week refactor before any single document moved. The gate locked, and no PRs could land.
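A minimal Python sketch shows why the gate locked. The function name is hypothetical. Fixing the bottom five from the baseline, rather than recomputing them after the patch, is an assumption made for illustration.

```python
def bottom5_gate(before: dict, after: dict) -> bool:
    """Original merge gate: the SSIM sum of the 5 worst documents must strictly increase.

    Both dicts map document id -> SSIM; the bottom 5 are taken from `before`.
    """
    worst = sorted(before, key=before.get)[:5]
    return sum(after[d] for d in worst) > sum(before[d] for d in worst)
```

A PR that lifts twenty mid-ranked documents is still rejected when the five worst documents do not move. That is the R21 shape.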
R30 — the measurement API itself had a bug
The Information(6) issue above. The axiom “every value is measurable via COM” assumed the API returns what you think it returns. It does not — for multi-page paragraphs it returns the active-end position. Months of measurement data carried a silent one-page drift. The corollary: “measurement results are permanent assets” is wrong; measurement methodology has to be re-validated periodically, not just the values.
R33 — a “minimal-case spec-correct” fix regressed 41 pages
A spec was derived from a minimal repro, the implementation matched the repro exactly, and the production baseline got worse by 41 pages. The cause: the spec was derived in a single context (a standalone paragraph), but Word actually composes the rendering from font-cascade × szCs × per-context wrap rules. What’s correct in the minimal repro can be wrong in other contexts.
The rule that survived: no EXCEPTION stacking. If a “confirmed” spec needs per-document, per-font, or per-context carve-outs, the spec isn’t incomplete — it’s wrong. Re-derive from a richer input space rather than stacking exceptions, because every exception is a future R33 in the making.
Redesign: phase-based gate
On 2026-04-28 we redesigned the methodology. The core admission: SSIM = 1.0 is an outcome, not a signal. Gating on the outcome directly produces R21-style plateaus, because outcomes don’t decompose into per-PR work. Instead, gate on cause-attributable signals staged by phase:
| Phase | Primary gate | Why this signal |
|---|---|---|
| 1 (current) | Pagination correctness — does Word’s paragraph N land on Oxi’s page N? | One page-break bug can cascade into 47 low-SSIM pages. Fixing the root needs a gate that won’t punish you for it |
| 2 | Element IoU mean ≥ 0.99 — bbox IoU per text block / image / cell | Position accuracy (not pixel). The dominant structural error left after pagination is correct |
| 3 | SSIM mean ≥ 0.99 + bottom-N floor | Once positions match, residual pixel diff = font hinting / sub-pixel rendering |
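The Phase 1 and Phase 2 signals are cheap to compute once each side produces per-paragraph page numbers and per-element bounding boxes. A minimal Python sketch follows. The function names are hypothetical, and pairing elements by index is an assumption; a real harness first has to match Word’s elements to Oxi’s. A document passes the Phase 1 check when the mismatch list is empty.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    x0: float
    y0: float
    x1: float
    y1: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned bounding boxes."""
    ix = max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))
    iy = max(0.0, min(a.y1, b.y1) - max(a.y0, b.y0))
    inter = ix * iy
    union = (a.x1 - a.x0) * (a.y1 - a.y0) + (b.x1 - b.x0) * (b.y1 - b.y0) - inter
    return inter / union if union > 0 else 0.0


def element_iou_mean(word: List[Box], oxi: List[Box]) -> float:
    """Phase 2 signal: mean IoU over elements paired by index (assumed pairing)."""
    if not word or len(word) != len(oxi):
        raise ValueError("need equal-length, non-empty element lists")
    return sum(iou(w, o) for w, o in zip(word, oxi)) / len(word)


def pagination_mismatches(word_pages: List[int], oxi_pages: List[int]) -> List[int]:
    """Phase 1 signal: 1-based paragraph indices whose page differs between Word and Oxi."""
    if len(word_pages) != len(oxi_pages):
        raise ValueError("paragraph counts differ")
    return [i for i, (w, o) in enumerate(zip(word_pages, oxi_pages), start=1) if w != o]
```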
SSIM is tracked at every phase but is the gate only at Phase 3. During Phase 1 it acts as a regression sentinel (any drop > 0.005 requires review). This unblocks fixes whose payoff is “47 low-SSIM pages all shift simultaneously after the root page-break bug is fixed” — exactly the shape of fix the old gate refused.
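The Phase 1 merge rule can be written as a small decision function. The 0.005 sentinel and the “pagination or IoU moves” condition come from the text. The function name, the order of checks, and treating a pagination regression as a rejection are illustrative assumptions, not Oxi’s actual gate code.

```python
SSIM_SENTINEL = 0.005  # an SSIM drop larger than this requires review during Phase 1


def phase1_decision(pagination_before: int, pagination_after: int,
                    iou_before: float, iou_after: float,
                    ssim_before: float, ssim_after: float) -> str:
    """Classify a PR under the Phase 1 gate: "merge", "review", or "reject".

    pagination_* count documents whose paragraph-to-page map matches Word.
    """
    if ssim_before - ssim_after > SSIM_SENTINEL:
        return "review"  # regression sentinel trips regardless of other signals
    if pagination_after < pagination_before:
        return "reject"  # assumption: never trade pagination for IoU
    if pagination_after > pagination_before or iou_after > iou_before:
        return "merge"   # SSIM does not have to budge
    return "reject"      # nothing cause-attributable moved
```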
Five weeks after the redesign:
- Phase 1 pass rate: 25/55 → 46/55 (83.6%)
- SSIM mean: 0.8699 → 0.8932 (+0.0233)
- Individual PRs are merge-eligible when pagination or IoU moves — SSIM doesn’t have to budge
In this window, root-cause fixes that were unmergeable under the old gate landed as one- or two-line changes: vMerge cell height exclusion, fixed table column width preservation, widow control inheritance, grid-snapped lines extending into the bottom margin.
The verify pipeline (stale-binary incidents)
We made the same mistake on 2026-04-26 and again on 2026-05-07. Oxi has three render paths:
- WASM (crates/oxi-wasm/pkg/): used by the browser editor
- Native GDI renderer (tools/oxi-gdi-renderer/): used by the pagination measurer (Phase 1 gate)
- Native DWrite renderer (tools/oxi-dwrite-renderer/): the default for pipeline.verify
Running cargo build in the gdi-renderer directory does not rebuild the dwrite-renderer crate. We forgot this, ran verify against a stale DWrite binary, and got a false-positive “0 improved / 0 regressed” report, because the pre-patch baseline was being compared to pre-patch output. The actual delta turned out to be -0.0911, and we had to revert.
The hygiene is now hard-coded in CLAUDE.md:
- Rebuild both native renderers before every verify, and delete the oxi_png/ + pagination_oxi/ caches for affected docs
- A WASM rebuild alone is insufficient: pipeline.verify does not use WASM
- “0 improved, 0 regressed” is a red flag. Stale binaries return identical numbers, so 0 deltas are a stale-output signature
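A pre-verify check along these lines would have flagged both incidents. It is a sketch under stated assumptions. The function names, the *.rs glob and the modification-time heuristic are illustrative and cruder than cargo’s own change tracking. The check only compares timestamps; it does not rebuild anything. The source roots should include shared crates such as oxidocs-core, not just the renderer’s own directory.

```python
from pathlib import Path


def newest_mtime(root: Path, pattern: str = "*.rs") -> float:
    """Latest modification time among matching source files under root (0.0 if none)."""
    return max((p.stat().st_mtime for p in root.rglob(pattern)), default=0.0)


def is_stale(binary: Path, source_roots: list) -> bool:
    """True if the binary is missing or older than any source file it depends on."""
    if not binary.exists():
        return True
    newest = max((newest_mtime(r) for r in source_roots), default=0.0)
    return binary.stat().st_mtime < newest


def suspicious_report(improved: int, regressed: int) -> bool:
    """'0 improved, 0 regressed' after a code change is the stale-output signature."""
    return improved == 0 and regressed == 0
```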
Methodology bugs are nastier than spec bugs: if undetected for a few days, they invalidate the entire measurement corpus accumulated during that window. The only working defense is to write project-specific hygiene rules somewhere enforcement can actually reach them — for an AI-driven loop, that’s the CLAUDE.md the agent reads on every session start.
Where this sits in the landscape
| Solution | Approach | Limitation |
|---|---|---|
| LibreOffice / Collabora | C++ server-side rendering | Breaks Word layouts. Requires server. No pixel-fidelity goal |
| ZetaOffice | LibreOffice compiled to WASM | 100MB+ download. Accuracy = LibreOffice. A port, not a rewrite |
| ONLYOFFICE | JS canvas rendering | Closest architecture, but AGPL and no COM-measured Word compat |
| Apryse (PDFTron) | C++ → WASM | Proprietary. Converts to internal format — not native OOXML render |
| docMentis | Rust+WASM viewer | WASM engine proprietary. Telemetry on by default |
| Google Docs | Server-rendered | Proprietary. Requires server. Intentionally diverges from Word |
| docx-rs / rdocx | Rust DOCX libraries | R/W only. No browser layout engine |
Oxi’s intersection is “OSS (MIT) + Rust/WASM client-side + dual-format first-class + COM-measured pixel-perfect + zero server cost.” No other project occupies this seat.
Current numbers and roadmap
- Parse success: 100% across 504 documents (Japanese government .docx/.xlsx/.pptx + generated). LibreOffice: 99.2% (4 large .xlsx files timed out > 45s)
- SSIM mean: 0.8932 (235 docs, 410 pages, GDI baseline)
- Phase 1 pass rate: 46/55 (83.6%)
- .wasm size: ~1.4 MB for the full suite
Next milestones:
- v1.x — .docx SSIM 0.95+, IME (Japanese/CJK input), .xlsx/.pptx layout engines, vertical writing + ruby
- v2 — .odt parity. The Ra loop transfers to ODF; the reference renderer changes (LibreOffice headless) but the methodology does not
- v2.x — oxi-hyde (TPM 2.0 + ML-KEM outer envelope). .docx.hyde / .odt.hyde are PGP-encrypted-PDF style: encryption is an outer wrapper, and decryption restores a plain .docx / .odt openable in any client
Try it
- Live demo: https://ryujiyasu.gitlab.io/oxi/
- License: MIT
- Contributing: every merged PR must improve pixel accuracy of at least one document. That’s the entire acceptance criterion
The methodologically interesting part — and a topic for another post — is that the Ra loop runs autonomously with an AI agent (Claude) in the inner loop: root-cause analysis, COM measurement script generation, fix implementation, verification. The human role is phase-gate review and direction. The falsified axioms above (spec space open-ended, single-context derivation is a trap) are not just layout-engine lessons; they are also a record of where human review must remain when running an AI-driven engineering loop on a hard convergence problem.
If you work on EU public-sector migration, ODF parity, or hardware-anchored document encryption — I’d particularly like to hear from you.