The documentation for custom tags (https://docs.aws.amazon.com/polly/latest/dg/supportedtags.html#custom-tag) says:
Placing a Custom Tag in Your Text
<mark>
This tag is supported by both neural and standard TTS formats.
To put a custom tag within the text, use the <mark> tag. Amazon Polly takes no action on the tag, but returns the location of the tag in the SSML metadata. This tag can be anything you want to call out, as long as it maintains the following format:
<mark name="tag_name"/>
For example, suppose that the tag name is "animal" and the input text is:
<speak>
Mary had a little <mark name="animal"/>lamb.
</speak>
Amazon Polly might return the following SSML metadata:
{"time":767,"type":"ssml","start":25,"end":46,"value":"animal"}
The passage above is quoted from the official AWS documentation.
Yet when I run this code:
public PollyResult synthesizeLongSpeechmarks(
        String fullText, String outbucket, String customerId, String documentId, PollyParams params, LambdaLogger logger) {
    //Replace image tags with SSML format
    // String processedText = replaceImageTagsWithSSML(fullText);
    String text = ssmlTextService.getSsmlText(params.getDomain(), fullText, params.getSpeekingRate(), logger);
    String destinationBucket = outbucket;
    String pollyRegion = System.getenv("-----");
    if (!System.getenv("AWS_REGION").equals(pollyRegion)) {
        destinationBucket = destinationBucket + "." + pollyRegion;
    }
    // Hard-coded SSML from the AWS documentation example, used instead of `text` to reproduce the issue
    String ssmlString = "<speak> Mary had a little <mark name=\"animal\"/>lamb. </speak>";
    StartSpeechSynthesisTaskRequest request = new StartSpeechSynthesisTaskRequest()
            .withOutputS3BucketName(destinationBucket)
            .withOutputS3KeyPrefix("polly_speechmarks/" + customerId + "." + documentId)
            .withOutputFormat(OutputFormat.Json)
            .withSpeechMarkTypes(SpeechMarkType.Ssml, SpeechMarkType.Sentence)
            .withVoiceId(params.getVoiceId())
            .withTextType(TextType.Ssml)
            .withSampleRate(params.getSampleRate())
            .withSnsTopicArn(System.getenv("--- --"))
            .withEngine(params.getEngine())
            .withLanguageCode(params.getLanguageCode())
            .withText(ssmlString);
    try {
        StartSpeechSynthesisTaskResult result = pollyClient.startSpeechSynthesisTask(request);
        SynthesisTask task = result.getSynthesisTask();
        return new PollyResult(true, task.getTaskId(), task.getRequestCharacters());
    } catch (TextLengthExceededException e) {
        return new PollyResult(false, null, null);
    }
}
The speech mark file I get is this:
{"time":0,"type":"sentence","start":8,"end":52,"value":"Mary had a little <mark name="animal"/>lamb."}
{"time":937,"type":"ssml","start":26,"end":47,"value":"animal"}
What I expected is the same output, but with the tag left out of the sentence value:
{"time":0,"type":"sentence","start":8,"end":52,"value":"Mary had a little lamb."}
{"time":937,"type":"ssml","start":26,"end":47,"value":"animal"}
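For reference, the only stopgap I can think of is cleaning the sentence value on the client after downloading the speech mark file. The sketch below is not an AWS API: the class and method names are hypothetical, and it assumes the tag shows up in the already-decoded "value" string exactly in the `<mark name="..."/>` form used in the input SSML.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the AWS SDK.
class SpeechMarkCleaner {

    // Matches a self-closing mark tag such as <mark name="animal"/>.
    private static final Pattern MARK_TAG =
            Pattern.compile("<mark\\s+name\\s*=\\s*\"[^\"]*\"\\s*/>");

    // Returns the sentence text with every <mark .../> tag removed.
    static String stripMarkTags(String sentenceValue) {
        return MARK_TAG.matcher(sentenceValue).replaceAll("");
    }

    public static void main(String[] args) {
        String value = "Mary had a little <mark name=\"animal\"/>lamb.";
        System.out.println(stripMarkTags(value)); // prints: Mary had a little lamb.
    }
}
```

This only changes the text of the value. The start and end offsets of the sentence mark would still refer to the original SSML input, tag included.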