Does Amazon Translate glue words together for you too?


I am experiencing an issue with Amazon Translate where words in the translated text sometimes get glued together (i.e. the spaces between them are dropped). The bug affects some but not all target languages; unfortunately, English is among those affected. It correlates with the presence of paired HTML tags in the source text and is reliably reproducible when the entire multi-word string is surrounded by a pair of HTML tags like <p></p>: it is precisely the first two words after the opening tag that get glued together. The bug affects both real-time and batch translation, in any document format. Real-time example:

"Madness? Thisis Gnisis!"
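A minimal harness for comparing the two cases might look like the following. It assumes a boto3 Translate client (or any object with a compatible `translate_text` method) and sends the same string once as-is and once wrapped in a `<p>` tag pair, so the two outputs can be diffed. The helper name `compare_wrapped` is hypothetical.

```python
def compare_wrapped(client, text, source_lang, target_lang, tag="p"):
    """Translate `text` once as-is and once wrapped in <tag></tag>.

    `client` is assumed to be a boto3 Translate client (or any object with a
    compatible translate_text method). Returns (bare_output, wrapped_output).
    """
    def translate(s):
        # Real-time TranslateText call; the translation is in "TranslatedText".
        response = client.translate_text(
            Text=s,
            SourceLanguageCode=source_lang,
            TargetLanguageCode=target_lang,
        )
        return response["TranslatedText"]

    bare = translate(text)
    wrapped = translate(f"<{tag}>{text}</{tag}>")
    return bare, wrapped
```

With credentials configured, a call such as `compare_wrapped(boto3.client("translate"), text, "de", "en")` (language codes are only an example) would show whether the glued words appear only in the wrapped variant.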

Since in general I don't speak the language I am translating into, and there are typically hundreds of such buggy cases per translation job, the output is effectively useless. Is anyone else experiencing this issue? Has anyone managed to work around it?

Rulatir · asked 2 years ago · 420 views

2 answers

I reproduced your case, and have submitted a feedback entry on the console.

For now, I'd suggest translating the text without the HTML tag, which gives a better result for your use case, as Marrick mentioned.
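For fragments where a single tag pair encloses the whole string, this workaround can be automated: peel off the outer tags, translate only the inner text, and put the tags back. A minimal sketch, assuming a regex heuristic is acceptable in place of a real HTML parser; `unwrap` and `translate_unwrapped` are hypothetical names, and `translate` stands for whatever call you make to Amazon Translate.

```python
import re

# Heuristic, not an HTML parser: matches a fragment whose whole content sits
# between one opening tag and a closing tag with the same name. Whitespace
# outside the outer tags is matched but not preserved.
_WRAPPED = re.compile(
    r"^\s*(<([A-Za-z][A-Za-z0-9]*)\b[^>]*>)(.*)(</\2\s*>)\s*$", re.DOTALL
)


def unwrap(fragment):
    """Split '<p ...>inner</p>' into (open_tag, inner, close_tag), or return None."""
    m = _WRAPPED.match(fragment)
    if m is None:
        return None
    open_tag, name, inner, close_tag = m.groups()
    # If the same tag name occurs inside, the regex cannot tell which closing
    # tag pairs with the opening one (e.g. "<p>a</p><p>b</p>"), so give up.
    if re.search(rf"</?{name}\b", inner, re.IGNORECASE):
        return None
    return open_tag, inner, close_tag


def translate_unwrapped(fragment, translate):
    """Translate only the inner text of a single-tag-wrapped fragment."""
    parts = unwrap(fragment)
    if parts is None:
        return translate(fragment)  # not safely unwrappable: send as-is
    open_tag, inner, close_tag = parts
    return open_tag + translate(inner) + close_tag
```

This only sidesteps the reported trigger (tags surrounding the whole string); inline tags inside the inner text are still sent to Translate, and fragments that cannot be unwrapped safely are passed through unchanged.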

Another alternative is running a grammar check after receiving the translated text.
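A full grammar checker is one option; a narrower heuristic aimed at this specific symptom is to flag tokens that are not dictionary words but split cleanly into two words that are (like "Thisis"). The sketch below assumes you can supply a lower-case word list for the target language; `find_glued_words` is a hypothetical helper, not part of any AWS SDK.

```python
import re


def find_glued_words(text, vocabulary):
    """Return (token, left, right) for each token that is not in `vocabulary`
    but can be split into two parts that both are.

    `vocabulary` is assumed to be a set of lower-case words for the target
    language, supplied by the caller (e.g. loaded from a word list).
    """
    hits = []
    # [^\W\d_]+ matches runs of letters, including non-ASCII letters.
    for token in re.findall(r"[^\W\d_]+", text):
        word = token.lower()
        if word in vocabulary:
            continue
        # Try every split point; report the first one where both halves are words.
        for i in range(1, len(word)):
            if word[:i] in vocabulary and word[i:] in vocabulary:
                hits.append((token, token[:i], token[i:]))
                break
    return hits
```

Expect false positives for legitimate compounds missing from the word list, and no detection when two glued words happen to form another valid word.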

Mia C (AWS) · answered 2 years ago
Reviewed by a Support Engineer 2 years ago
  • We need to translate large collections of HTML fragments. Nothing short of Amazon actually fixing the bug will work for us.

  • Any progress?


Hey! It seems like you've stumbled on a fairly interesting bug. I managed to reproduce the issue using the AWS CLI, which shows that this is a bug in Translate itself, not just in the console.

[Screenshot: bug reproduced in AWS CLI]

Translate seems to handle other kinds of brackets fine, but angle brackets definitely trip it up. According to the Translate API documentation, strings with angle brackets should be supported. However, angle brackets are also used to specify do-not-translate tags, so the issue likely lies in the implementation of that feature.

Anyway, it sounds like you may be trying to translate HTML documents specifically? If so, you may find the example code for translating a web page useful: https://docs.aws.amazon.com/translate/latest/dg/examples-web.html. Instead of translating the raw HTML, it uses an HTML parser to translate only the text parts of the page while leaving the tags unchanged.
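The same idea (translate only the text, leave the markup alone) can be sketched with Python's standard-library `html.parser`; this is not the code from the linked AWS example. `translate` is any str-to-str callable you supply, and `TextNodeTranslator`/`translate_html` are hypothetical names. Each text run is sent on its own, with its surrounding whitespace kept locally, so, assuming the bug is triggered only by markup in the request, the spaces should survive.

```python
from html import escape
from html.parser import HTMLParser


class TextNodeTranslator(HTMLParser):
    """Rebuild an HTML fragment, passing only non-blank text runs through
    `translate`. Tags, comments, declarations and script/style bodies are
    copied unchanged."""

    def __init__(self, translate):
        super().__init__(convert_charrefs=True)  # text arrives with entities decoded
        self._translate = translate
        self._out = []
        self._raw_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._raw_depth += 1
        self._out.append(self.get_starttag_text())  # original tag text, attributes intact

    def handle_startendtag(self, tag, attrs):
        self._out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._raw_depth:
            self._raw_depth -= 1
        self._out.append(f"</{tag}>")

    def handle_data(self, data):
        if self._raw_depth or not data.strip():
            self._out.append(data)  # leave code and whitespace-only runs alone
            return
        # Keep leading/trailing whitespace here so the translator cannot drop it.
        lead = data[: len(data) - len(data.lstrip())]
        trail = data[len(data.rstrip()):]
        translated = self._translate(data.strip())
        self._out.append(lead + escape(translated, quote=False) + trail)

    def handle_comment(self, data):
        self._out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self._out.append(f"<?{data}>")

    def result(self):
        return "".join(self._out)


def translate_html(fragment, translate):
    parser = TextNodeTranslator(translate)
    parser.feed(fragment)
    parser.close()  # flush any buffered trailing text
    return parser.result()
```

The trade-off is context: in `<p>Hello <b>big</b> world</p>` the three text runs are translated separately, which can hurt quality for languages with a different word order.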

Hope that helps!

Marrick.

Marrick · answered 2 years ago
  • We considered the "translate whole pages" approach at the beginning and decided against it. Our requirement is to translate CMS content exhaustively, not just selected partial views of it (i.e. "pages"). We crawl the CMS database, extract strings into a catalog, and upload it to localise.biz; before switching to machine translation, we would simply hand the localise.biz project to a human translator. We still want the catalog on localise.biz for human intervention, but we now export XLIFF from there and batch-translate it. This used to work; the bug is new.
