Ground Truth Text Format


We are trying to set up labeling for text with Ground Truth. Is it possible to format the source text to be labeled in some way, e.g. HTML or Markdown? Even a new line would already help a lot. I could not find any documentation on this here


asked 4 years ago, 127 views
3 Answers

When you post tasks to Ground Truth they are displayed to workers using a web interface so HTML is the best bet for formatting the text you want annotated. For example, you could simply use the following to create a multi-line input:

{"source": "Lorem ipsum <br/>dolor sit amet"}

Unfortunately, inputs are HTML-escaped by default to prevent confusion between your variable text and the template's HTML. As a result, if you use the Text Classification widget, the text above will be displayed to workers literally as Lorem ipsum <br/>dolor sit amet. To pass those values through unescaped, you'll need to create a custom template that applies a filter to the variable so that it is not escaped.
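To make the escaping concrete, here is a minimal sketch of what happens to the example text. Ground Truth does its escaping inside the template engine; Python's html.escape is only a stand-in here to illustrate the effect, not the service's actual implementation.

```python
import html

source = "Lorem ipsum <br/>dolor sit amet"

# Escaping turns the angle brackets into entities, so a browser shows the
# tag as literal text instead of rendering a line break.
escaped = html.escape(source)
print(escaped)
```

The angle brackets come out as &lt; and &gt;, which is why workers see the tag itself rather than a new line.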

To set up a custom task, start by creating Lambda functions to handle the required pre- and post-processing. Information on setting these up can be found at

I've included simple pre- and post-processing Lambda examples below that I find are a good place to start for text annotation. Note that the post Lambda doesn't do any answer consolidation; it simply passes back all of the answers provided by workers.

Create a custom template in Ground Truth and use the Sentiment Analysis template to create a starter task for text annotation. To prevent it from escaping HTML values, update the Liquid variables to include the skip_autoescape filter.

{{ task.input.text | skip_autoescape }}
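Once skip_autoescape is in place, the value is rendered as raw HTML, so any stray < or & in your own text will be treated as markup too. One way to handle that, sketched below, is to escape the plain text yourself and only then turn newlines into <br/> tags when building the input manifest. The helper names to_safe_html and build_manifest_lines are hypothetical, and the sketch assumes a JSON Lines manifest with one "source" entry per line, as in the example above.

```python
import html
import json


def to_safe_html(text):
    """Escape markup in plain text, then turn newlines into <br/> tags."""
    escaped = html.escape(text, quote=False)
    # Normalize Windows line endings before inserting the line breaks.
    return escaped.replace("\r\n", "\n").replace("\n", "<br/>")


def build_manifest_lines(texts):
    """Return one JSON manifest line per input text."""
    return [json.dumps({"source": to_safe_html(t)}) for t in texts]


lines = build_manifest_lines(["Lorem ipsum\ndolor sit amet", "a < b\nc"])
for line in lines:
    print(line)
```

Escaping before inserting the tags means only the <br/> tags you add yourself are rendered as HTML.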

You can find more info on using Liquid template values here:


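# Pre-annotation Lambda: passes the manifest's "source" text to the template.
def lambda_handler(event, context):
    source = event['dataObject'].get('source')

    if source:
        print("text is {}".format(source))
    else:
        print("Missing text in dataObject")
        return {}

    # The template reads this value as task.input.text
    response = {
        "taskInput": {
            "text": source
        }
    }
    return response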


import json
import boto3
from urllib.parse import urlparse

def lambda_handler(event, context):

    payload = get_payload(event)

    consolidated_response = []
    for dataset in payload:
        annotations = dataset['annotations']
        responses = []
        for annotation in annotations:
            response = json.loads(annotation['annotationData']['content'])
            if 'annotatedResult' in response:
                response = response['annotatedResult']

            # Keep every worker's answer as-is (no consolidation)
            responses.append({
                'workerId': annotation['workerId'],
                'annotation': response
            })

        consolidated_response.append({
            'datasetObjectId': dataset['datasetObjectId'],
            'consolidatedAnnotation' : {
                'content': {
                    event['labelAttributeName']: {
                        'responses': responses
                    }
                }
            }
        })

    return consolidated_response

def get_payload(event):
    if 'payload' in event:
        parsed_url = urlparse(event['payload']['s3Uri'])
        s3 = boto3.client('s3')
        text_file = s3.get_object(Bucket=parsed_url.netloc, Key=parsed_url.path[1:])
        return json.loads(text_file['Body'].read())
    else:
        return event.get('test_payload',[])
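Since get_payload falls back to a test_payload key when no S3 payload is present, the pass-through logic can be tried locally without any AWS access. The sketch below repeats that logic in a standalone function (collect_responses is a hypothetical name) and feeds it a made-up event. The field names follow the Lambda above, while the worker IDs, label name, and answers are invented purely for illustration.

```python
import json


def collect_responses(payload, label_attribute_name):
    """Gather every worker's answer per data object, without consolidation."""
    results = []
    for dataset in payload:
        responses = []
        for annotation in dataset["annotations"]:
            content = json.loads(annotation["annotationData"]["content"])
            # Unwrap the answer when it is nested under 'annotatedResult'.
            if "annotatedResult" in content:
                content = content["annotatedResult"]
            responses.append(
                {"workerId": annotation["workerId"], "annotation": content}
            )
        results.append({
            "datasetObjectId": dataset["datasetObjectId"],
            "consolidatedAnnotation": {
                "content": {label_attribute_name: {"responses": responses}}
            },
        })
    return results


# Made-up event for local testing; all values here are illustrative only.
test_event = {
    "labelAttributeName": "sentiment",
    "test_payload": [
        {
            "datasetObjectId": "0",
            "annotations": [
                {
                    "workerId": "worker-a",
                    "annotationData": {
                        "content": json.dumps({"sentiment": {"label": "Positive"}})
                    },
                },
                {
                    "workerId": "worker-b",
                    "annotationData": {
                        "content": json.dumps(
                            {"annotatedResult": {"label": "Negative"}}
                        )
                    },
                },
            ],
        }
    ],
}

output = collect_responses(
    test_event["test_payload"], test_event["labelAttributeName"]
)
print(json.dumps(output, indent=2))
```

Each data object ends up with a list of per-worker responses under the label attribute name, which matches the shape the post Lambda returns.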
answered 4 years ago

Ah, great. That was exactly what I was missing. I simply changed the template to {{ task.input.taskObject | skip_autoescape }} and left out the pre-processing lambda.

Edited by: nicolasdoodle on Apr 28, 2019 11:56 PM

answered 4 years ago

I'm trying to do the same thing but couldn't make it work. Did you still use the post-processing Lambda, or did you just change the template.liquid file?

Edited by: apoorvsrivastava on Jun 18, 2021 9:28 AM

answered a year ago
