@heynavtoor: Rongxin Ouyang solved the one problem every researcher outside the English-speaking world has been silently suffering f…
Summary
PDFMathTranslate is an open-source tool that translates scientific PDFs while preserving math formulas, charts, tables, and layout. It was accepted at EMNLP 2025 and is freely available under the MIT license.
Cached at: 06/22/26, 05:49 PM
Rongxin Ouyang solved the one problem every researcher outside the English-speaking world has been silently suffering from.
It’s called PDFMathTranslate. It translates entire scientific papers while keeping every math formula, every chart, every table, and every layout element perfectly intact.
No copy-pasting into Google Translate. No losing your equations. No reformatting 47 pages by hand.
Think of it as Google Translate meets LaTeX, but it actually works on real PDFs.
Here’s what this thing does:
→ Drop in any PDF. Choose your language. Hit translate.
→ Math formulas, charts, tables, footnotes. All preserved.
→ Bilingual PDF output. Original and translation side by side.
→ Google Translate, DeepL, OpenAI, Ollama. Your choice.
→ Run locally. Your papers never leave your machine.
→ GUI, CLI, Docker, Zotero plugin, MCP server.
Here’s the wildest part:
Most PDF translators destroy the layout. Formulas become garbled text. Tables lose their structure. Charts disappear. You spend more time fixing the translation than reading it.
PDFMathTranslate uses AI layout detection to understand where every element sits on the page, translates only the text, and reconstructs the entire document with the original formatting intact.
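The detect, translate, and reconstruct idea described above can be sketched in a few lines of Python. This is only a toy illustration, not PDFMathTranslate's actual code: the `Region` type, the region kinds, `translate_page`, and the glossary-based `toy_translate` are all hypothetical names invented for this example, and treating only `"text"` regions as translatable is an assumption.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Region:
    """One detected page element (hypothetical structure for this sketch)."""
    kind: str                         # e.g. "text", "formula", "table", "figure"
    bbox: Tuple[int, int, int, int]   # (x0, y0, x1, y1) position on the page
    content: str


def translate_page(regions: List[Region],
                   translate: Callable[[str], str]) -> List[Region]:
    """Translate only text regions; pass every other region through unchanged.

    Bounding boxes are never modified, so each element keeps its place
    when the page is rebuilt.
    """
    result = []
    for region in regions:
        if region.kind == "text":
            result.append(replace(region, content=translate(region.content)))
        else:
            result.append(region)
    return result


# Stand-in translator: a tiny glossary lookup instead of a real service.
GLOSSARY = {"The energy is": "Die Energie ist"}


def toy_translate(text: str) -> str:
    return GLOSSARY.get(text, text)


page = [
    Region("text", (72, 100, 300, 115), "The energy is"),
    Region("formula", (72, 120, 300, 140), r"E = mc^2"),
    Region("figure", (72, 150, 300, 400), "<image bytes>"),
]

translated = translate_page(page, toy_translate)
for region in translated:
    print(region.kind, region.bbox, region.content)
```

In this sketch the formula and figure regions come back byte-for-byte identical, and only the text region's content changes. A real system would replace `toy_translate` with a call to a translation backend and add the layout-detection and PDF-rendering steps that are stubbed out here.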
This was accepted at EMNLP 2025, one of the top NLP conferences in the world. This is not a side project. This is peer-reviewed research turned into free software.
222,000+ downloads. 49,000+ Docker pulls. Topped GitHub’s global trending for over a week.
The researchers who built this are from NUS Singapore and Tsinghua University. They open sourced it because language barriers should not stop science.
Professional PDF translation services charge $0.10 to $0.25 per word. A 30-page research paper has roughly 10,000 words. That is $1,000 to $2,500 per paper.
PDFMathTranslate does it in minutes. For $0.
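The per-paper estimate for professional services quoted above is simple multiplication. The short Python check below reproduces it from the post's own figures; the variable names are illustrative, and rates are kept in whole cents to avoid floating-point rounding.

```python
WORDS_PER_PAPER = 10_000     # rough word count of a 30-page paper (from the post)
RATE_CENTS_LOW = 10          # $0.10 per word
RATE_CENTS_HIGH = 25         # $0.25 per word


def cost_dollars(words: int, rate_cents: int) -> int:
    """Total cost in whole dollars for a per-word rate given in cents."""
    return words * rate_cents // 100


low = cost_dollars(WORDS_PER_PAPER, RATE_CENTS_LOW)
high = cost_dollars(WORDS_PER_PAPER, RATE_CENTS_HIGH)
print(f"${low:,} to ${high:,} per paper")  # prints "$1,000 to $2,500 per paper"
```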
This is the tool every non-English-speaking researcher has been waiting 20 years for.
It runs on your laptop. One command to install.
100% Open Source. MIT License.
Similar Articles
PDFMathTranslate: Scientific Document Translation Preserving Layouts
This paper introduces PDFMathTranslate, an open-source tool for translating scientific documents while preserving their original layout, leveraging large language models and precise layout detection.
Scaling Accessible Mathematics on arXiv: HTML Conversion and MathML 4
arXiv reports on its ongoing HTML Papers project, highlighting improved conversion fidelity, corpus-scale HTML conversion reaching a 75% error-free rate, initial MathML 4 Intent annotations for accessible speech, and a Rust port of LaTeXML to reduce costs.
@atomic_chat_hq: Mistral OCR 4 turned a handwritten calculus exam into clean LaTeX! We gave it a photo of a hand-written exam page. The …
Mistral OCR 4 converts handwritten calculus exams into clean LaTeX, accurately reading formulas and accounting for graphs, though it does not redraw them. The model provides structured output with bounding boxes and confidence scores in 170 languages.
@amiaoapp: Tired of slogging through foreign literature and tech blogs? Every time you copy a PDF paragraph into a web translator, you get gibberish line breaks and messy formatting. Have to manually delete spaces and newlines line by line? Infuriatingly slow. Ditch this archaic copy-paste nightmare and send it to the shredder! Recommend a Git…
Recommends CopyTranslator, an open-source, cross-platform real-time translation tool. It automatically cleans up the broken line breaks in text copied from PDFs and translates on copy, which speeds up reading foreign-language literature.
ForMaT: Dataset for Visually-Grounded Multilingual PDF Translation
This paper introduces ForMaT, a parallel corpus of 3,956 PDFs across 15 language pairs designed for visually-grounded multilingual translation, preserving layout metadata to benchmark layout-aware MT systems.