Translating a PDF is one of those tasks that sounds straightforward until you try it. Copy the text into a translation tool, paste it back — and you end up with a wall of plain text with no tables, no headings, and no structure. The formatting that made the original document readable is gone.
This guide explains how to translate a PDF while preserving its layout, what the common failure modes are, and how AI-powered translation handles the edge cases that trip up simpler approaches.
Why PDF Translation Is Harder Than It Looks
The core problem is that PDFs do not separate content from layout. When you extract text from a PDF, you typically get a linear stream of characters — the visual structure (columns, tables, headers, footers) is lost. Feeding that stream into a translation engine and pasting it back into a document gives you translated text but not a translated document.
A proper PDF translation workflow needs to do three things: extract the text along with its structural context (position, font, and whether it belongs to a heading, a table cell, or a column), translate each text segment without detaching it from that context, and reconstruct the document with the translated text in the correct positions and with the original formatting.
This is a non-trivial engineering problem, and it is why most free translation tools produce poor results on anything more complex than a simple single-column document.
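The three-step workflow above can be sketched in a few lines of Python. This is a minimal illustration, not DocuLens's implementation: it assumes extraction has already produced a list of text segments, and the `Segment` type, its fields, and the `toy_translate` glossary are hypothetical stand-ins for a real PDF parser and translation engine.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Segment:
    """One run of text plus the layout context it was extracted with (hypothetical model)."""
    text: str
    page: int
    bbox: tuple[float, float, float, float]  # (x, y, width, height) in PDF points
    font_size: float
    bold: bool = False
    role: str = "body"  # e.g. "heading", "body", "header", "footer"


def translate_segments(segments: list[Segment], translate: Callable[[str], str]) -> list[Segment]:
    """Steps 2 and 3: translate each segment on its own, then copy its layout unchanged.

    Step 1 (extraction) is assumed to have produced `segments` already.
    Because every segment is translated separately, a heading is never
    merged with the paragraph under it, and every output segment keeps
    the original page, position, font size and weight.
    """
    return [replace(seg, text=translate(seg.text)) for seg in segments]


# Toy stand-in for a real translation engine: a fixed English-to-German glossary.
GLOSSARY = {"Annual Report": "Jahresbericht", "Revenue grew.": "Der Umsatz ist gewachsen."}


def toy_translate(text: str) -> str:
    return GLOSSARY.get(text, text)


page = [
    Segment("Annual Report", page=1, bbox=(72, 720, 300, 24), font_size=18, bold=True, role="heading"),
    Segment("Revenue grew.", page=1, bbox=(72, 680, 450, 14), font_size=11),
]
for seg in translate_segments(page, toy_translate):
    print(seg.role, seg.font_size, seg.bold, seg.text)
# heading 18 True Jahresbericht
# body 11 False Der Umsatz ist gewachsen.
```

The key design choice is that the translation function only ever sees text; position, font size, weight, and role are carried through untouched, which is what lets the reconstruction step put each translated segment back where it came from.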
What "Layout Preservation" Actually Means
When we say a translation tool preserves layout, we mean several specific things.
Tables remain tables. Rows, columns, and cell boundaries are maintained in the output; the translated text fills the cells correctly, and the table structure is not flattened into plain text.
Headings remain headings, with the same visual hierarchy. A bold 18pt heading in the original becomes a bold 18pt heading in the translation, not a normal paragraph.
Multi-column layouts are preserved. Text in a two-column report does not merge into a single column after translation.
Images and diagrams stay in place. If the original has a chart on page 3, the translated document has the same chart on page 3.
Headers and footers are translated separately and remain in their correct positions.
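To see why tables have to be translated cell by cell, compare the two functions below. This is a toy illustration under stated assumptions: the table is modelled as a list of rows of strings, and `translate_table`, `flatten`, and the `toy_translate` glossary are hypothetical names, not part of any DocuLens API.

```python
from typing import Callable

Table = list[list[str]]  # a table as rows of cell strings


def translate_table(table: Table, translate: Callable[[str], str]) -> Table:
    """Translate each cell independently so the row/column grid survives."""
    return [[translate(cell) for cell in row] for row in table]


def flatten(table: Table) -> str:
    """Roughly what copy-and-paste extraction gives you: one run of text, no cell boundaries."""
    return " ".join(cell for row in table for cell in row)


# Toy English-to-French glossary standing in for a real translation engine.
GLOSSARY = {"Region": "Région", "Sales": "Ventes", "North": "Nord", "South": "Sud"}


def toy_translate(text: str) -> str:
    return GLOSSARY.get(text, text)  # numbers and unknown text pass through unchanged


table = [["Region", "Sales"], ["North", "1,200"], ["South", "950"]]
print(translate_table(table, toy_translate))
# [['Région', 'Ventes'], ['Nord', '1,200'], ['Sud', '950']]
print(flatten(table))
# Region Sales North 1,200 South 950
```

Once the table has been flattened, nothing records which words belonged to which cell, so a translation of that string cannot be put back into a grid. Translating per cell keeps the row and column counts identical by construction.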
Handling Text Expansion and Contraction
One of the trickiest aspects of document translation is that different languages need different amounts of space to express the same idea. German text is typically 20–30% longer than its English equivalent, while Chinese and Japanese usually need significantly fewer characters. As a result, a text box that fits perfectly in the original may overflow in the translation, or be left with an awkward gap when the translation is shorter.
A good translation tool handles this automatically by adjusting font size, line spacing, or text box dimensions to accommodate the translated text without breaking the layout. DocuLens uses AI to make these adjustments intelligently, prioritising readability over pixel-perfect reproduction when the two conflict.
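Of the adjustments mentioned above, shrinking the font is the easiest to sketch. The heuristic below is a hypothetical illustration, not how DocuLens's AI-based fitting works: it assumes every character is about half the font size wide (`char_width_ratio=0.5`), that line height is 1.2 times the font size, and that 7pt is the smallest readable size.

```python
def fit_font_size(
    text: str,
    box_width: float,
    box_height: float,
    font_size: float,
    min_font_size: float = 7.0,
    char_width_ratio: float = 0.5,
    line_spacing: float = 1.2,
) -> float:
    """Return the largest size <= font_size (in 0.5pt steps) at which `text` fits the box.

    Width is estimated crudely: each character is assumed to be
    char_width_ratio * size points wide. The search never goes below
    min_font_size, favouring readability over an exact fit.
    """
    size = font_size
    while size > min_font_size:
        chars_per_line = max(1, int(box_width // (char_width_ratio * size)))
        lines = -(-len(text) // chars_per_line)  # ceiling division
        if lines * size * line_spacing <= box_height:
            return size
        size -= 0.5
    return min_font_size


english = "a" * 60  # stand-in for a 60-character English sentence
german = "a" * 78   # the same sentence about 30% longer, as German often is
print(fit_font_size(english, box_width=200, box_height=30, font_size=12))  # 12
print(fit_font_size(german, box_width=200, box_height=30, font_size=12))   # 10.0
```

Stopping at a minimum size rather than shrinking until the text fits is one way to favour readability over an exact fit. When even the minimum is too large, this sketch simply returns the minimum; a fuller implementation would then fall back to enlarging the box or tightening the line spacing.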
Languages Supported
DocuLens supports translation to and from 50+ languages, including all major European languages (French, German, Spanish, Italian, Portuguese, Dutch, Polish, Swedish, Norwegian, Danish, Finnish), Asian languages (Chinese Simplified, Chinese Traditional, Japanese, Korean, Vietnamese, Thai, Indonesian, Malay), Middle Eastern languages (Arabic, Hebrew, Persian, Turkish), and others including Russian, Ukrainian, Hindi, Bengali, and Swahili.
For right-to-left languages (Arabic, Hebrew, Persian), DocuLens automatically adjusts the document direction in the output file, ensuring the translated text reads correctly without manual adjustment.
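One simple way to decide whether translated text needs a right-to-left layout is to look at the Unicode bidirectional class of each character, which Python exposes through `unicodedata.bidirectional`. This is a heuristic sketch, not DocuLens's method; the `is_rtl` name and the 50% threshold are assumptions, and in practice the target language is already known, so its direction can usually be looked up directly.

```python
import unicodedata


def is_rtl(text: str, threshold: float = 0.5) -> bool:
    """Guess whether text should be laid out right-to-left.

    Counts characters with a strong bidirectional class: 'R' (e.g. Hebrew)
    and 'AL' (Arabic-script letters, including Persian) are right-to-left,
    'L' is left-to-right. Digits, spaces and punctuation are ignored
    because their direction depends on the surrounding text.
    """
    rtl = ltr = 0
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):
            rtl += 1
        elif bidi == "L":
            ltr += 1
    strong = rtl + ltr
    return strong > 0 and rtl / strong > threshold


print(is_rtl("שלום עולם"))      # True  (Hebrew)
print(is_rtl("مرحبا بالعالم"))  # True  (Arabic)
print(is_rtl("Hello, world"))   # False
print(is_rtl("2024"))           # False (digits only, no strong characters)
```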
Scanned Documents
If your PDF is a scan of a physical document, translation requires an additional step: OCR. DocuLens runs OCR automatically before translation when it detects a scanned document, so you do not need to pre-process your files. The OCR step extracts the text with layout information, which is then passed to the translation engine.
The quality of translation on scanned documents depends on the quality of the scan. Clean, high-resolution scans of printed text produce excellent results. Low-quality scans, skewed pages, or handwritten text may produce lower-quality translations due to OCR errors in the source text.
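A common way to detect a scanned page is to notice that it has almost no extractable text, because the page is stored as an image. The sketch below assumes you already have the text of each page (for example from `pypdf`, via `PdfReader(path).pages` and `page.extract_text()`); the `pages_needing_ocr` name and the 25-character threshold are illustrative assumptions, not DocuLens's detection logic.

```python
def pages_needing_ocr(page_texts: list[str], min_chars: int = 25) -> list[int]:
    """Return the 0-based indices of pages whose text layer looks empty.

    A scanned page is normally stored as an image, so text extraction
    returns little or nothing for it. Whitespace is not counted.
    """
    return [
        i
        for i, text in enumerate(page_texts)
        if len("".join(text.split())) < min_chars
    ]


extracted = [
    "Quarterly report: revenue grew in every region compared with last year.",
    "",          # image-only page: nothing to extract
    "  \n\n  ",  # whitespace left over from an empty text layer
]
print(pages_needing_ocr(extracted))  # [1, 2]
```

Only the flagged pages would then go through OCR before translation, while pages with a usable text layer keep their original, exact text.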
Output Formats
DocuLens outputs translated documents in the same format as the input where possible. A translated PDF is output as a PDF, and a translated DOCX as a DOCX. The same applies to spreadsheets: a translated XLSX keeps the same structure as the original, with the cell contents translated.
If you need the translated text in a different format — for example, you want a translated PDF but you also want the text in a DOCX for editing — you can chain the Translate and Convert actions using DocuLens's pipeline builder (Pro and Business).
Professional Translation vs. AI Translation
AI translation has improved dramatically and is now suitable for most business documents. For legal documents, marketing copy, or anything that will be published externally, human review of the AI translation is still recommended. DocuLens produces a high-quality first draft that a human translator can review and refine in a fraction of the time it would take to translate from scratch.
For internal documents, technical documentation, and informational content, AI translation is typically sufficient without human review.