
Battle of the Bots: Which AI Translator Is Best? We put three to the test


As AI translation tools spread into classrooms and newsrooms, we decided to put a few of them to the test. We asked three leading language models – ChatGPT, Claude, and Google Gemini – to translate the same technical tutorial on web typography into Russian and Chinese.

The results reveal both significant progress and persistent challenges in how machines process language. Let’s take a look.

How We Ran the Test

To keep the process fair, we used identical prompts.

For Russian:

Translate the following article from English to Russian. Keep the same conversational tone as in the original text. For any HTML or CSS code, translate only the commented text between /* and */ and leave all code itself in English. Do not add your own explanations or notes — only translate the text. Keep formatting and paragraph breaks exactly as in the original.

For Chinese:

Translate the following article from English to Simplified Chinese. Keep the same conversational tone as in the original text. For any HTML or CSS code, translate only the commented text between /* and */ and leave all code itself in English. Do not add your own explanations or notes — only translate the text. Keep formatting and paragraph breaks exactly as in the original.
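To make the code-handling rule concrete, here is a hypothetical CSS snippet of the kind found in a web typography tutorial, shown as a model would ideally return it for the Russian version: the comment between /* and */ is translated (here, "Base font for the article body"), while the code itself stays in English.

```css
/* Базовый шрифт для основного текста статьи */
body {
  font-family: Georgia, "Times New Roman", serif;
  font-size: 1.125rem;
  line-height: 1.6;
}
```

Only the commented text changes; selectors, properties, and values are left untouched — exactly what our reviewers checked for.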

Next, native speakers from our team were asked to choose the best translation out of the three and analyze the errors.

Russian: ChatGPT leads

In Russian, ChatGPT was judged the most accurate of the three systems, delivering the most natural draft. Our native-language reviewer said it read like something a human journalist might write. Sentences flowed well, paragraph breaks felt logical, and the tone matched the friendly tutorial style of the original CSS article.


It also handled formatting correctly, translating only the commented text inside the code blocks and leaving the code itself untouched, exactly as requested in the prompt. 

This combination of fluent phrasing and careful formatting meant that only light editing was required before publication.

What went wrong?

Still, ChatGPT’s draft was not flawless. Our editor noted the following machine-translation tells:

  1. Overuse of long dashes instead of standard punctuation.
  2. Awkward or literal sentence structures when dealing with idiomatic phrases.
  3. Occasional clumsy introductions or missing transitional words that make Russian writing feel natural.

These issues didn’t distort meaning, but they required human revision to make the final article fully polished and ready for readers.

In Chinese, ChatGPT also took the lead, though with similar errors

The results in Simplified Chinese mirrored the Russian experience. ChatGPT again stood out for natural flow and correct formatting. It retained the conversational tone of the English original while translating only the commented code text. Native Chinese editors said it required only light polishing on a few technical terms.

Where ChatGPT Fell Short in Chinese

There were still some minor problems: reviewers noted a few literal translations of idiomatic expressions and some overly formal structures that felt unusual in casual Mandarin. These issues were easy to fix.

But what about Claude and Gemini?

In both Russian and Chinese, Claude and Gemini consistently lagged behind ChatGPT. Our editors described their drafts as mechanical and stiff, with sentences that felt translated word-for-word rather than written for a human reader.

Claude in particular tended to mirror the English syntax, which resulted in awkward phrasing that broke the natural flow of both Russian and Chinese prose. Gemini showed a similar weakness and sometimes failed to keep code comments separate from code, creating formatting confusion, an especially serious issue for a step-by-step CSS tutorial.


While the overall meaning of these translations was understandable, native-speaking reviewers agreed that neither system delivered the smooth, journalistic style needed for publication without heavy editing.

What We Learned

This experiment showed that today’s AI models can translate surprisingly well when given the appropriate prompt. With a thoughtfully designed prompt, users can obtain a translation that (almost) seamlessly maintains meaning, structure, and style.

Among the tools we evaluated, ChatGPT performed head and shoulders above the competition, producing the most natural and readable Russian and Chinese drafts, which required only light edits. Claude and Gemini were useful for a basic understanding of the text, but their translations were less fluid and sometimes clumsy. If the goal is a quick sense of the content, those models can help. But for high-quality translation, we can confidently say that ChatGPT remains the most reliable choice based on this experiment.

Why It Matters

AI translation technologies are rapidly becoming integrated into newsroom operations. When used properly, they can save human labor while preserving accuracy. Nonetheless, they need careful oversight, including well-chosen prompts, human review, and clear labeling.

Our takeaway? AI translation isn’t magic—it’s a powerful tool that demands skill to wield effectively. Learn when to trust it and when to override it, and you’ll unlock something invaluable: more time to focus on the stories, insights, and connections that algorithms can’t create.

Miss our original tutorial last week? Check it out in your language of choice below! 

English version
Russian version
Chinese version
