Does Amazon Translate glue words together for you too?

I am experiencing an issue with Amazon Translate where words in the translated text sometimes get glued together (i.e. spaces between them are dropped). The bug affects some but not all target languages; unfortunately English is among those affected. It correlates with the presence of paired HTML tags in the source text, and is reliably reproducible when the entire multi-word string is surrounded by a pair of HTML tags like <p></p>: it is precisely the first two words after the opening tag that get glued together. The bug affects both realtime and batch translation in any document format. Realtime example:

"Madness? Thisis Gnisis!"

Since I generally don't speak the languages I am translating into, and a typical translation job contains hundreds of such buggy cases, the output is effectively useless. Is anyone else experiencing this issue? Has anyone managed to overcome it?
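Because glued words are hard to spot without speaking the target language, one possible way to triage them automatically is to translate each string twice: once with its tags, and once with the tags replaced by spaces. The string is flagged when the tagged result has fewer words. Below is a minimal sketch of that heuristic. It assumes `translate` is any function from source text to translated text (for example, a thin wrapper around boto3's `translate_text`). `looks_glued`, `strip_tags` and `word_count` are hypothetical helper names. The approach doubles the number of API calls, can produce false positives when the two translations legitimately differ, and is meaningless for target languages that don't separate words with spaces.

```python
import re
from typing import Callable

# Crude tag matcher; good enough for short CMS fragments, not a full HTML parser.
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(text: str) -> str:
    """Replace tags with spaces and collapse runs of whitespace."""
    return " ".join(TAG_RE.sub(" ", text).split())


def word_count(text: str) -> int:
    """Count whitespace-separated words, ignoring any HTML tags."""
    return len(strip_tags(text).split())


def looks_glued(source_html: str, translate: Callable[[str], str]) -> bool:
    """Heuristic: flag the string if translating it WITH tags yields fewer
    words than translating the same text WITHOUT tags (a dropped space
    merges two words into one)."""
    with_tags = translate(source_html)
    without_tags = translate(strip_tags(source_html))
    return word_count(with_tags) < word_count(without_tags)
```

Flagged strings can then be routed to a human reviewer instead of being checked one by one.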

Rulatir
asked 2 years ago, 405 views
2 Answers

I reproduced your case, and have submitted a feedback entry on the console.

For now, I'd suggest translating the text without the HTML tags, which gives better results for your use case, as Marrick mentioned in the other answer.
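Here is a minimal sketch of that workaround for strings that are wholly wrapped in a single tag pair such as `<p>…</p>`. It strips the outer pair, translates only the inner text, and puts the tags back. `translate_unwrapped` and `make_aws_translator` are hypothetical names. The latter wraps boto3's `translate_text` and is the only part that touches AWS. Strings that contain further markup inside the wrapper are passed through unchanged, so they may still hit the bug.

```python
import re
from typing import Callable

# A string wholly wrapped in ONE paired tag, e.g. '<p class="x">...</p>'.
# The backreference \2 requires the closing tag name to match the opening one.
WRAPPED_RE = re.compile(r"^\s*(<([A-Za-z][\w-]*)\b[^>]*>)(.*)(</\2\s*>)\s*$", re.S)


def translate_unwrapped(text: str, translate: Callable[[str], str]) -> str:
    """Translate only the inner text of a single wrapping tag pair."""
    m = WRAPPED_RE.match(text)
    if m is None:
        return translate(text)  # not wrapped: send as-is
    open_tag, _name, inner, close_tag = m.groups()
    if "<" in inner:
        return translate(text)  # nested markup: this sketch doesn't handle it
    return open_tag + translate(inner) + close_tag


def make_aws_translator(source: str, target: str) -> Callable[[str], str]:
    """Wrap the Amazon Translate client as a str -> str function."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    client = boto3.client("translate")

    def _translate(text: str) -> str:
        resp = client.translate_text(
            Text=text, SourceLanguageCode=source, TargetLanguageCode=target
        )
        return resp["TranslatedText"]

    return _translate
```

Usage would look like `translate_unwrapped(fragment, make_aws_translator("de", "en"))`, with language codes of your choice.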

Another alternative is to run a grammar check on the translated text after you receive it.

AWS
Mia C
answered 2 years ago
SUPPORT ENGINEER
reviewed 2 years ago
  • We need to translate large collections of HTML fragments. Nothing short of Amazon actually fixing the bug will work for us.

  • Any progress?

Hey! It seems like you've stumbled on a fairly interesting bug. I've managed to reproduce the issue using the AWS CLI, which shows that this is a bug in Translate itself, not just in the console.

[Screenshot: bug reproduced in AWS CLI]

Translate seems to handle other kinds of brackets fine, but angle brackets definitely trip it up a little. According to the Translate API documentation, strings with angle brackets should be supported. However, angle brackets are also used to specify do-not-translate tags, so this issue likely lies in the implementation of that feature.

Anyway, it sounds like you may be trying to translate HTML documents specifically? If so, you may find the example code for translating a web page useful: https://docs.aws.amazon.com/translate/latest/dg/examples-web.html. Instead of translating the raw HTML, it uses an HTML parser to translate only the text parts of the page while leaving the tags unchanged.
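The idea of translating only the text parts can be sketched with Python's standard-library `html.parser`. This is not the code from the linked AWS example. It is a hypothetical `translate_html` helper that assumes `translate` is a str -> str function, such as a wrapper around boto3's `translate_text`. Tags (with their original attributes), comments, doctype declarations and `<script>`/`<style>` contents are copied through. Each non-blank text node is translated on its own.

```python
from html import escape
from html.parser import HTMLParser
from typing import Callable, List

# Elements whose contents must never be sent for translation.
RAW_TEXT_TAGS = {"script", "style"}


class TextNodeTranslator(HTMLParser):
    """Copies markup through verbatim and translates only text nodes."""

    def __init__(self, translate: Callable[[str], str]) -> None:
        super().__init__(convert_charrefs=True)
        self.translate = translate
        self.parts: List[str] = []
        self.raw_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        self.parts.append(self.get_starttag_text())  # original text, attributes intact
        if tag in RAW_TEXT_TAGS:
            self.raw_depth += 1

    def handle_startendtag(self, tag, attrs):
        self.parts.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")
        if tag in RAW_TEXT_TAGS and self.raw_depth:
            self.raw_depth -= 1

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")  # e.g. <!DOCTYPE html>

    def handle_data(self, data):
        if self.raw_depth or not data.strip():
            self.parts.append(data)  # script/style body or whitespace-only text
            return
        lead = data[: len(data) - len(data.lstrip())]
        trail = data[len(data.rstrip()):]
        # Entities were decoded by the parser, so re-escape the translated text;
        # surrounding whitespace is kept outside the translated core.
        self.parts.append(lead + escape(self.translate(data.strip()), quote=False) + trail)


def translate_html(html_text: str, translate: Callable[[str], str]) -> str:
    parser = TextNodeTranslator(translate)
    parser.feed(html_text)
    parser.close()  # flush any trailing text
    return "".join(parser.parts)
```

The trade-off is that text split by inline tags (e.g. `Hello <b>world</b>`) is translated in separate pieces, losing sentence context, and each text node costs one API call.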

Hope that helps!

Marrick.

Marrick
answered 2 years ago
  • We considered the "translate whole pages" approach at the beginning and decided against it. Our requirement is to translate CMS content exhaustively, not just selected partial views of it (i.e. "pages"). We crawl the CMS database, extract strings into a catalog, and upload it to localise.biz; before switching to machine translation, we would simply hand the localise.biz project to a human translator. We still want the catalog on localise.biz for human intervention, but now we export XLIFF from there and batch-translate it. This used to work; the bug is new.
