We Built a Document Translator That Doesn't Break Your Formatting (And Why That Took Longer Than Expected)

Dmitriy Hulak

There is a particular kind of frustration that everyone who works with documents knows intimately. You spend two hours polishing a proposal. The headings are aligned. The tables look clean. The spacing between sections feels balanced. Then you run it through a translation service, download the result, and suddenly every careful decision you made about structure is just gone. Headers shift. Table cells collapse. Images disappear or land in strange places. Line breaks multiply for no reason. The translated document technically contains your words in another language, but it no longer feels like your document.

That frustration is not new, and it is not small. It is the kind of thing people learn to accept because most of the alternatives are worse. You could hire a professional translator and wait days. You could copy-paste everything into Google Translate and rebuild the layout manually afterward. You could pay for expensive enterprise tools that promise to preserve formatting but still produce weird artifacts half the time. None of these options feel good, so people just pick the least bad one and move on.

We decided that was not good enough. Not because we are obsessed with perfection, but because we kept running into this problem ourselves. CSS-Zone is a tool for people who care about details. Our users are designers, developers, and technical writers who notice when a shadow radius is off by two pixels. They definitely notice when a translated document comes back looking like someone threw it through a text blender. If we were going to add a translation tool to CSS-Zone, it had to actually respect the work people already put into their documents.

The technical challenge here is more subtle than it sounds. Translation is not just about swapping words between languages. It is about maintaining the invisible structure that makes a document readable. That structure lives in a strange mix of XML, styles, relationships, embedded images, header definitions, footer rules, and dozens of other pieces that Microsoft Word decided to wire together in ways that are only half-documented. When you parse a DOCX file, you are not just reading text. You are reading a small filesystem compressed into a ZIP archive, where the actual content is scattered across multiple XML files that reference each other through opaque identifiers.
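
To make the "small filesystem in a ZIP archive" point concrete, here is a minimal sketch that lists the parts inside a DOCX. It uses Python's standard zipfile module purely for illustration, and the sample archive is a hypothetical, deliberately incomplete stand-in for a real document.

```python
import io
import zipfile


def list_docx_parts(docx_bytes: bytes) -> list[str]:
    """Return the names of every part stored inside a DOCX (ZIP) archive."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return sorted(archive.namelist())


def build_sample_docx() -> bytes:
    """Build a tiny DOCX-like archive. The part names follow the usual layout,
    but the contents are placeholders, so Word would not open this file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("word/document.xml", "<document/>")
        archive.writestr("word/styles.xml", "<styles/>")
        archive.writestr("word/_rels/document.xml.rels", "<Relationships/>")
        archive.writestr("word/media/image1.png", b"")
    return buffer.getvalue()


for part_name in list_docx_parts(build_sample_docx()):
    print(part_name)
```

Running the same function over a real document typically shows many more parts, such as headers, footers, numbering definitions, and theme files.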

The naive approach to document translation is to extract all the text, send it somewhere, and dump the result back in. That works if you only care about getting translated words on screen. It fails the moment you care about anything else. Bold text becomes plain. Italics vanish. Tables lose their borders. Lists turn into flat paragraphs. Headers and footers disappear entirely. The output is technically a document, but it is not the same document. It is a ghost of it.

Dmytro Hulak
Founder & CEO of CSS-Zone

"Most translation tools treat formatting as optional decoration. We treat it as the structure that makes documents usable in the first place."

That quote is not dramatic for effect. It is the core design principle we kept coming back to during development. Formatting is not decoration. It is information. When you bold a word in a contract, you are not making it pretty. You are signaling importance. When you use a numbered list in a technical guide, you are not being aesthetic. You are creating a sequence that readers depend on to follow steps correctly. When you align a table in a financial report, you are making numbers comparable at a glance. All of that meaning lives in the formatting, and if you lose it during translation, you are not just losing style. You are losing clarity.

The first version of our document translator was embarrassingly simple. We used jszip to unpack the DOCX archive, found the main document XML, pulled out the text nodes, sent them to an AI model for translation, and wrote them back. It worked in the sense that it produced output. It failed in every other sense. Paragraphs merged into each other. Styles disappeared. Images turned into broken placeholder text. We knew this was going to happen. We still had to build it and watch it fail to understand exactly where the complexity lived.
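
A rough sketch of why a text-only pass loses structure, assuming a simplified two-paragraph document body. The sample XML and function names are hypothetical, and this is not the code from that first version; it only contrasts flat extraction with extraction that keeps paragraph boundaries.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical two-paragraph body; a real word/document.xml has far more markup.
DOCUMENT_XML = (
    f'<w:document xmlns:w="{W}"><w:body>'
    "<w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Payment terms</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Invoices are due in 30 days.</w:t></w:r></w:p>"
    "</w:body></w:document>"
)


def naive_extract(document_xml: str) -> str:
    """Collect every w:t text node into one string, ignoring all structure."""
    root = ET.fromstring(document_xml)
    return " ".join(node.text or "" for node in root.iter(f"{{{W}}}t"))


def extract_by_paragraph(document_xml: str) -> list[str]:
    """Collect text per w:p so paragraph boundaries survive the round trip."""
    root = ET.fromstring(document_xml)
    paragraphs = []
    for paragraph in root.iter(f"{{{W}}}p"):
        texts = [node.text or "" for node in paragraph.iter(f"{{{W}}}t")]
        paragraphs.append("".join(texts))
    return paragraphs


print(naive_extract(DOCUMENT_XML))
print(extract_by_paragraph(DOCUMENT_XML))
```

The flat string has already lost the paragraph break and the bold marker on the first run, so nothing downstream can put them back.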

The real problem was not extracting text. The real problem was preserving the invisible relationships between content and structure. A DOCX file does not wrap text in formatting tags the way HTML does. Direct formatting lives in property elements attached to each run and paragraph, and named styles live in a separate definitions file (word/styles.xml) that the document references through IDs. If you want to keep a paragraph bold after translation, you cannot just remember that it was bold. You have to remember which run property or style ID made it bold, preserve it through the translation process, ensure any referenced style definition still exists in the output file, and reconnect everything correctly. The same logic applies to tables, images, headers, footers, hyperlinks, bookmarks, and every other piece of structured content that Word supports.
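
One way to check that style references survive a round trip is to compare the style IDs a document uses against the IDs its styles part defines. The sketch below assumes simplified, hypothetical XML for both parts; the element and attribute names (w:pStyle, w:rStyle, w:tblStyle, w:styleId) are standard WordprocessingML, but the helper functions are illustrative only.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical document body that references two paragraph styles.
DOCUMENT_XML = (
    f'<w:document xmlns:w="{W}"><w:body>'
    '<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Overview</w:t></w:r></w:p>'
    '<w:p><w:pPr><w:pStyle w:val="Quote"/></w:pPr><w:r><w:t>Details</w:t></w:r></w:p>'
    "</w:body></w:document>"
)

# Hypothetical styles part that defines only one of them.
STYLES_XML = (
    f'<w:styles xmlns:w="{W}">'
    '<w:style w:type="paragraph" w:styleId="Heading1"/>'
    "</w:styles>"
)


def referenced_style_ids(document_xml: str) -> set[str]:
    """Style IDs used by paragraphs, runs, and tables in the document part."""
    root = ET.fromstring(document_xml)
    ids = set()
    for tag in ("pStyle", "rStyle", "tblStyle"):
        for node in root.iter(f"{{{W}}}{tag}"):
            value = node.get(f"{{{W}}}val")
            if value:
                ids.add(value)
    return ids


def defined_style_ids(styles_xml: str) -> set[str]:
    """Style IDs declared in the styles part."""
    root = ET.fromstring(styles_xml)
    return {style.get(f"{{{W}}}styleId") for style in root.iter(f"{{{W}}}style")}


def missing_style_ids(document_xml: str, styles_xml: str) -> list[str]:
    """Referenced IDs with no definition, which would lose their formatting."""
    return sorted(referenced_style_ids(document_xml) - defined_style_ids(styles_xml))


print(missing_style_ids(DOCUMENT_XML, STYLES_XML))
```

An empty result after translation suggests that every style the output refers to is still defined.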

Images were particularly annoying. When you drop an image into a Word document, Word does not just embed it inline. It creates a separate image file inside the DOCX archive, generates a unique relationship ID, references that ID in the main document XML, stores dimension and position metadata, and, if the image also appears in headers or footers, references it again from those parts' own relationship files. Translating the surrounding text is easy. Making sure the image stays in the right place, with the right size, in the right section, without breaking any of those internal references? That took three separate rewrites before we got it stable.
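
A minimal sketch of the kind of reference check this implies: every r:embed ID on an image in a part should resolve to an entry in that part's relationships file. The XML below is hypothetical and heavily flattened (real drawings nest the a:blip element several levels deeper), and the function names are illustrative.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
A = "http://schemas.openxmlformats.org/drawingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG = "http://schemas.openxmlformats.org/package/2006/relationships"

# Hypothetical part with two embedded images.
PART_XML = (
    f'<w:document xmlns:w="{W}" xmlns:a="{A}" xmlns:r="{R}"><w:body>'
    '<w:p><w:r><w:drawing><a:blip r:embed="rId5"/></w:drawing></w:r></w:p>'
    '<w:p><w:r><w:drawing><a:blip r:embed="rId9"/></w:drawing></w:r></w:p>'
    "</w:body></w:document>"
)

# Hypothetical relationships file that only knows about one of them.
RELS_XML = (
    f'<Relationships xmlns="{PKG}">'
    f'<Relationship Id="rId5" Type="{R}/image" Target="media/image1.png"/>'
    "</Relationships>"
)


def embedded_image_ids(part_xml: str) -> list[str]:
    """Relationship IDs referenced by a:blip elements, in document order."""
    root = ET.fromstring(part_xml)
    ids = []
    for blip in root.iter(f"{{{A}}}blip"):
        rel_id = blip.get(f"{{{R}}}embed")
        if rel_id:
            ids.append(rel_id)
    return ids


def relationship_targets(rels_xml: str) -> dict[str, str]:
    """Map each relationship ID to the target path it points at."""
    root = ET.fromstring(rels_xml)
    return {
        rel.get("Id"): rel.get("Target")
        for rel in root.iter(f"{{{PKG}}}Relationship")
    }


def dangling_image_ids(part_xml: str, rels_xml: str) -> list[str]:
    """Image references with no matching relationship entry."""
    targets = relationship_targets(rels_xml)
    return [rel_id for rel_id in embedded_image_ids(part_xml) if rel_id not in targets]


print(dangling_image_ids(PART_XML, RELS_XML))
```

Because headers and footers carry their own relationships files, a check like this would need to run once per part, not once per document.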

Tables were worse. Word tables are not simple grids. They have merged cells, nested tables, conditional borders, cell-specific padding, alignment rules that depend on paragraph styles, and width calculations that reference the page margins. When you translate the text inside a table cell, the text length changes. Different languages have different average word lengths. A three-word English heading might become a seven-word German heading. If you do not adjust the column widths intelligently, the table either overflows the page or looks absurdly stretched. We ended up building a small layout recalculation engine that estimates translated text length and adjusts spacing accordingly. It is not perfect, but it is good enough that most tables still look reasonable after translation.
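
As an illustration of length-based width adjustment, the sketch below grows each column in proportion to how much its text grew, then rescales so the table keeps its original total width and does not overflow the page. The function name, the minimum width, and the assumption that widths are in twips are all hypothetical; the actual engine is not described in enough detail to reproduce.

```python
def rescale_column_widths(
    widths: list[int],
    source_lengths: list[int],
    translated_lengths: list[int],
    min_width: int = 300,
) -> list[int]:
    """Redistribute a fixed total table width based on text growth per column.

    widths: current column widths (assumed to be twips, as in w:gridCol).
    source_lengths / translated_lengths: longest cell text per column,
    measured in characters before and after translation.
    """
    total = sum(widths)
    # Width each column would want if it grew exactly as much as its text did.
    wanted = [
        width * max(translated, 1) / max(source, 1)
        for width, source, translated in zip(widths, source_lengths, translated_lengths)
    ]
    # Scale back so the columns together still fit the original table width.
    scale = total / sum(wanted)
    # The floor can push the total slightly over when a column is very narrow.
    return [max(min_width, round(width * scale)) for width in wanted]


# The second column's text tripled in length, so it takes space from the first.
print(rescale_column_widths([3000, 3000], [10, 10], [10, 30]))
```

Character counts are only a proxy for rendered width, which is one reason an estimate like this stays approximate.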

Headers and footers are conceptually simple but technically fiddly. They live in separate XML files that the main document references through relationship mappings. If you translate the text in a header but forget to update the relationship file, Word will either show nothing or throw a cryptic error when you try to open the document. We spent an embarrassing amount of time debugging cases where the translated document looked fine in our preview but failed to open in Word because we missed one obscure relationship link buried in a file called word/_rels/document.xml.rels. The DOCX spec is thousands of pages long, and most of it is not about the happy path. It is about edge cases, legacy compatibility, and obscure features that nobody uses but still have to work.
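
For illustration, header and footer parts can be discovered by reading the main document's relationships file and filtering on the relationship type. The sample XML is hypothetical, and the helper is a sketch that assumes targets are relative to the word/ folder, which is the usual layout.

```python
import posixpath
import xml.etree.ElementTree as ET

PKG = "http://schemas.openxmlformats.org/package/2006/relationships"
REL_BASE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"

# Hypothetical contents of the main document's relationships file.
RELS_XML = (
    f'<Relationships xmlns="{PKG}">'
    f'<Relationship Id="rId1" Type="{REL_BASE}/styles" Target="styles.xml"/>'
    f'<Relationship Id="rId7" Type="{REL_BASE}/header" Target="header1.xml"/>'
    f'<Relationship Id="rId8" Type="{REL_BASE}/footer" Target="footer1.xml"/>'
    "</Relationships>"
)


def header_footer_parts(rels_xml: str, base_dir: str = "word") -> dict[str, str]:
    """Map relationship IDs to the archive paths of header and footer parts."""
    root = ET.fromstring(rels_xml)
    parts = {}
    for rel in root.iter(f"{{{PKG}}}Relationship"):
        # The last path segment of the Type URI names the kind of part.
        kind = rel.get("Type", "").rsplit("/", 1)[-1]
        if kind in ("header", "footer"):
            target = posixpath.join(base_dir, rel.get("Target", ""))
            parts[rel.get("Id")] = posixpath.normpath(target)
    return parts


print(header_footer_parts(RELS_XML))
```

Each path returned here is a separate XML part that needs the same translation pass as the main document, with its relationship ID left untouched.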

The AI part of the translation was almost easy compared to all that. We use Gemini for the actual translation work. The prompt is straightforward: preserve all HTML tags, maintain formatting, translate only the text content, keep the same tone. Gemini is good at this. It understands that <strong>important</strong> should become <strong>важливо</strong> in Ukrainian, not важливо without the tags. It respects line breaks, handles technical terms carefully, and does not try to rewrite sentences into something more "natural" that changes the meaning. The quality is comparable to professional translation for most business documents, and it is instant instead of taking days.
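
Even with a well-behaved model, it seems prudent to verify that tags survived before writing anything back. The sketch below assumes text goes to the model with inline tags, as the prompt implies, and compares tag counts instead of tag order, since word order can legitimately change between languages. The function names are hypothetical.

```python
import re
from collections import Counter

# Matches opening and closing tags such as <strong> or </strong>.
TAG_PATTERN = re.compile(r"</?[A-Za-z][^>]*>")


def tag_counts(text: str) -> Counter:
    """Count how many times each exact tag string appears in the text."""
    return Counter(TAG_PATTERN.findall(text))


def tags_preserved(source: str, translated: str) -> bool:
    """True when the translation contains exactly the same tags as the source."""
    return tag_counts(source) == tag_counts(translated)


source = "<strong>important</strong> notice"
print(tags_preserved(source, "<strong>важливо</strong> повідомлення"))
print(tags_preserved(source, "важливо повідомлення"))
```

A failed check could trigger a retry or a fallback to the untranslated segment, so a dropped tag never reaches the output file.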

We added language detection so people do not have to manually specify the source language. That was less about convenience and more about reducing errors. If someone uploads a Russian document but accidentally selects Ukrainian as the source, the translation will be garbage. Auto-detection is not perfect, but it catches most obvious mistakes. For edge cases where the language is ambiguous or mixed, users can still override the detection manually.
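
To show why the Russian and Ukrainian mix-up is catchable, here is a toy heuristic based on letters that appear in only one of the two alphabets. It is not the detector the tool uses, and the function names are hypothetical; a production detector would rely on a proper language identification model.

```python
from typing import Optional

# Letters used in Ukrainian but not in Russian, and the reverse.
UKRAINIAN_ONLY = set("іїєґ")
RUSSIAN_ONLY = set("ыэъё")


def guess_cyrillic_language(text: str) -> Optional[str]:
    """Return 'uk', 'ru', or None when the text gives no clear signal."""
    lowered = text.lower()
    ukrainian_hits = sum(ch in UKRAINIAN_ONLY for ch in lowered)
    russian_hits = sum(ch in RUSSIAN_ONLY for ch in lowered)
    if ukrainian_hits > russian_hits:
        return "uk"
    if russian_hits > ukrainian_hits:
        return "ru"
    return None


def source_language_mismatch(text: str, selected: str) -> bool:
    """True when the heuristic confidently disagrees with the user's choice."""
    guess = guess_cyrillic_language(text)
    return guess is not None and guess != selected


print(guess_cyrillic_language("Це важливий документ для підписання"))
print(source_language_mismatch("Это важный документ", "uk"))
```

Returning None for ambiguous or mixed text leaves room for the manual override described above.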

The quota system is intentionally generous for free users because we think this tool is genuinely useful for people who do not have translation budgets. Five documents per day is enough for casual use. If you need more, the Pro plan gives unlimited translations, which makes sense for professional workflows where you might be translating dozens of contracts, guides, or reports per week. We are not trying to gatekeep functionality. We just need some reasonable limit to prevent abuse and keep server costs manageable.
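
A daily quota like this can be sketched as a counter keyed by user and calendar day. The class below is an in-memory illustration with hypothetical names; a real service would presumably keep the counts in a shared store so they survive restarts.

```python
from collections import defaultdict
from datetime import date

FREE_DAILY_LIMIT = 5  # documents per day for free users, as stated above


class DailyQuota:
    """Track how many documents each free user has translated per day."""

    def __init__(self, limit: int = FREE_DAILY_LIMIT) -> None:
        self.limit = limit
        self._used: dict[tuple[str, date], int] = defaultdict(int)

    def try_consume(self, user_id: str, today: date, is_pro: bool = False) -> bool:
        """Record one translation if allowed and report whether it was."""
        if is_pro:
            return True  # Pro accounts are not counted at all
        key = (user_id, today)
        if self._used[key] >= self.limit:
            return False
        self._used[key] += 1
        return True


quota = DailyQuota()
day = date(2026, 1, 15)
results = [quota.try_consume("user-1", day) for _ in range(6)]
print(results)
```

Keying on the date means the count resets on its own when the day changes, with no cleanup job required for correctness.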

One design decision we spent a lot of time debating was whether to support other document formats beyond DOCX. PDFs are the obvious candidate. Everyone has PDFs. But PDF is a fundamentally hostile format for editing. It is designed for display, not for structure. Extracting text from a PDF is possible, but preserving layout is borderline impossible unless you are willing to invest months into a full PDF rendering engine. Even then, the results are inconsistent because PDFs can be generated in a thousand different ways, and the format does not require content to follow any particular logical structure. We decided DOCX was the right starting point because it is editable, widely used in professional contexts, and has a (mostly) documented structure. If enough people ask for PDF support, we will revisit it, but for now, DOCX covers the majority of real use cases.

Another thing we deliberately did not do: machine learning for layout optimization. A lot of translation tools use ML models to predict optimal column widths, paragraph spacing, and image placement based on the translated content. That sounds clever, but in practice it often produces layouts that look algorithmically weird. Humans have strong intuitions about what looks balanced, and ML predictions do not always match those intuitions. We decided to use simpler heuristics that are predictable and easy to understand. If a column needs to be wider because the translated text is longer, we make it wider proportionally. If a paragraph needs more vertical space, we add space based on line count. It is not fancy, but it is reliable, and users can fix edge cases manually if needed.
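
The line-count heuristic can be sketched in a few lines. The characters-per-line and line-height values below are hypothetical defaults chosen for illustration, and the function names are not taken from the actual tool.

```python
import math


def estimate_lines(text: str, chars_per_line: int = 80) -> int:
    """Estimate how many lines the text occupies at a fixed line length."""
    return max(1, math.ceil(len(text) / chars_per_line))


def extra_height(
    source: str,
    translated: str,
    line_height_pt: float = 14.0,
    chars_per_line: int = 80,
) -> float:
    """Vertical space, in points, to add when the translation runs longer."""
    added_lines = estimate_lines(translated, chars_per_line) - estimate_lines(
        source, chars_per_line
    )
    # Never shrink a paragraph; only add room when the text grew.
    return max(0, added_lines) * line_height_pt


# One line of source text becomes three lines after translation.
print(extra_height("a" * 80, "a" * 200))
```

The appeal of a rule like this is that a user can predict the result from the text length alone, which is harder to say of a learned layout model.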

The hardest part of building this feature was not the technical complexity. It was resisting the temptation to add fifty more options and settings. We could have added controls for translation style, formality level, regional language variants, custom glossaries, batch processing, version history, collaborative review workflows, and a dozen other features that some subset of users would love. But every feature adds cognitive load. Every option is a decision users have to make before they get value. We stripped the interface down to the absolute minimum: upload a document, pick a target language, translate. Advanced users can adjust source language detection or toggle formatting preservation if they want, but the default flow is just three clicks. That restraint was painful but necessary.

The result is a tool that does one thing well instead of twenty things adequately. You upload a DOCX document. It gets translated with formatting intact. You download it and keep working. The quality is good enough that most people can use the translated document immediately without manual cleanup. That is the standard we aimed for, and after several months of iteration, we actually hit it. Not perfectly, but consistently enough that we are comfortable shipping it publicly.

There are still edge cases. Complex nested tables sometimes have minor alignment issues. Custom fonts occasionally fall back to defaults. Documents with heavy macro usage might lose some dynamic behavior. We document these limitations clearly because we would rather be honest about what the tool can and cannot do than overpromise and disappoint people. For the vast majority of business documents, technical guides, reports, and proposals, the translator works as expected. That is enough.

One unexpected benefit of building this tool was how much it forced us to improve our infrastructure. Translation requires server-side processing, which meant we had to build a proper job queue, implement rate limiting, add better error handling, and scale our API layer to handle longer-running requests. Those improvements ripple out to every other part of CSS-Zone. The gradient generator is faster. The shadow previewer is more stable. The entire platform feels more solid because we invested in the unglamorous backend work that translation required.

We also learned a lot about what real users actually need versus what we assumed they needed. Early on, we thought people would want detailed translation logs showing exactly which phrases were translated and how. We built that feature. Almost nobody used it. What people actually wanted was a simple progress bar and a clear indication of when the download was ready. The lesson there is not that users are unsophisticated. It is that most people just want their work done efficiently without having to think about the machinery underneath. The best tools are invisible until you need them to be visible.

The document translator is live now at css-zone.com/document-translator. It is available in twelve languages: English, Ukrainian, Russian, German, French, Spanish, Italian, Polish, Portuguese, Japanese, Chinese, and Korean. The interface is fully localized, so Ukrainian users see Ukrainian text and English users see English text. That small detail matters more than it might seem. Translation is already cognitively demanding. Making users navigate an unfamiliar interface in a foreign language while trying to translate important documents is just cruel. We localized everything because that is what respectful software does.

This is not the most glamorous feature we have ever built. It is not going to trend on social media or win design awards. But it solves a real problem for real people, and it does so without forcing them to sacrifice the quality they already invested in their documents. That is the kind of work that feels worth doing, even when it takes longer than expected and involves far more XML parsing than anyone wanted to deal with. We built it because we needed it ourselves, and because we think other people who care about details will appreciate having one less frustrating tool in their workflow.

If you translate documents regularly and you are tired of layout destruction being the price of multilingual work, give it a try. It is free for casual use and unlimited for Pro subscribers. We think it is good. If you find edge cases or ways to break it, let us know. We will fix them. That is how tools get better.
