Ground Truth Text Format


We are trying to set up labeling for text with Ground Truth. Is it possible to format the source text to be labeled in some way, e.g. with HTML or Markdown? Even a new line would already help a lot. I could not find any documentation on this.


Asked 5 years ago, 701 views
3 Answers

When you post tasks to Ground Truth, they are displayed to workers using a web interface, so HTML is the best bet for formatting the text you want annotated. For example, you could simply use the following to create a multi-line input:

{"source": "Lorem ipsum <br/>dolor sit amet"}

Unfortunately, by default, inputs are HTML-escaped to prevent confusion between your variable text and HTML. As a result, if you use the Text Classification widget, the text above will be displayed to workers literally as Lorem ipsum <br/>dolor sit amet, with the tag visible instead of a line break. To pass those values without escaping them, you'll need to create a custom template that includes a filter on the variable to prevent it from being escaped.
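To see why the tag shows up as literal text, here is a minimal sketch of what HTML escaping does to the example input. It uses Python's standard html.escape as a stand-in; that Ground Truth applies exactly this routine is an assumption, not something confirmed in this thread.

```python
import html

# Example input value, as in the manifest line above.
source = "Lorem ipsum <br/>dolor sit amet"

# html.escape stands in for the template's auto-escaping (an assumption:
# the exact routine Ground Truth uses is not stated in this thread).
escaped = html.escape(source)

print(escaped)  # Lorem ipsum &lt;br/&gt;dolor sit amet
```

A browser renders &lt;br/&gt; as the visible characters <br/> rather than as a line break, which matches what workers see.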

To set up a custom task, start by creating Lambdas to handle the pre- and post-processing required. Information on setting these up can be found at

I've included simple pre and post Lambda examples below that I find are a good place to start for text annotation. Note that the post Lambda doesn't do any answer consolidation; it simply passes back all of the answers provided by workers.

Create a custom template in Ground Truth and use the Sentiment Analysis template to create a starter task for text annotation. To prevent it from escaping HTML values, update the Liquid variables to include the skip_autoescape filter.

{{ task.input.text | skip_autoescape }}
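Once escaping is skipped, any < or & already present in the text to be labeled would presumably be treated as markup too. One way to handle that is to escape the raw text yourself and only then insert the line-break tags. A minimal sketch, assuming the input manifest is a JSON Lines file with a "source" key as in the example above; the helper name to_manifest_line is hypothetical.

```python
import html
import json


def to_manifest_line(text: str) -> str:
    """Return one JSON Lines manifest entry for a piece of plain text."""
    # Escape the raw text first so stray '<' or '&' are shown literally.
    safe = html.escape(text, quote=False)
    # Then turn line breaks into <br/> tags, the only markup we want rendered.
    safe = safe.replace("\r\n", "\n").replace("\n", "<br/>")
    return json.dumps({"source": safe})


print(to_manifest_line("Lorem ipsum\ndolor sit amet"))
# {"source": "Lorem ipsum<br/>dolor sit amet"}
```

Writing one such line per item to a file gives a manifest whose text carries only the markup you added deliberately.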

You can find more info on using Liquid template values here:


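# Pre-annotation Lambda: passes the manifest's "source" text to the template.
def lambda_handler(event, context):
    source = event['dataObject'].get('source')

    if source:
        print("text is {}".format(source))
    else:
        # Nothing to annotate for this data object.
        print("Missing text in dataObject")
        return {}

    # The template reads this value as task.input.text.
    response = {
        "taskInput": {
            "text": source
        }
    }
    return response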


import json
import boto3
from urllib.parse import urlparse

# Post-annotation Lambda: returns every worker's answer, with no consolidation.
def lambda_handler(event, context):

    payload = get_payload(event)

    consolidated_response = []
    for dataset in payload:
        annotations = dataset['annotations']
        responses = []
        for annotation in annotations:
            # Each worker's answer arrives as a JSON string.
            response = json.loads(annotation['annotationData']['content'])
            if 'annotatedResult' in response:
                response = response['annotatedResult']

            responses.append({
                'workerId': annotation['workerId'],
                'annotation': response
            })

        # One entry per data object, keyed by the job's label attribute name.
        consolidated_response.append({
            'datasetObjectId': dataset['datasetObjectId'],
            'consolidatedAnnotation' : {
                'content': {
                    event['labelAttributeName']: {
                        'responses': responses
                    }
                }
            }
        })

    return consolidated_response

def get_payload(event):
    if 'payload' in event:
        # Normal invocation: the annotations are stored in S3.
        parsed_url = urlparse(event['payload']['s3Uri'])
        s3 = boto3.client('s3')
        text_file = s3.get_object(Bucket=parsed_url.netloc, Key=parsed_url.path[1:])
        return json.loads(text_file['Body'].read())
    else:
        # Local testing: pass the annotations inline under 'test_payload'.
        return event.get('test_payload',[])
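Since the post Lambda above returns every answer unconsolidated, a simple majority vote could be applied to the 'responses' list afterwards. A minimal sketch, assuming each response has the 'workerId' and 'annotation' keys produced above; the function name majority_annotation and the example annotation shape are hypothetical, since the real shape depends on your template.

```python
import json
from collections import Counter


def majority_annotation(responses):
    """Return the most common annotation among worker responses, or None."""
    if not responses:
        return None
    # Serialize each annotation with sorted keys so equal answers compare equal.
    counts = Counter(json.dumps(r["annotation"], sort_keys=True) for r in responses)
    winner, _ = counts.most_common(1)[0]
    return json.loads(winner)


# Hypothetical worker responses; the annotation shape depends on the template.
example = [
    {"workerId": "w1", "annotation": {"label": "positive"}},
    {"workerId": "w2", "annotation": {"label": "negative"}},
    {"workerId": "w3", "annotation": {"label": "positive"}},
]
print(majority_annotation(example))  # {'label': 'positive'}
```

On a tie, this sketch returns whichever answer was seen first, so a real job may want an explicit tie-breaking rule.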
Answered 5 years ago

Ah, great. That was exactly what I was missing. I simply changed the template to {{ task.input.taskObject | skip_autoescape }} and left out the pre-processing lambda.

Edited by: nicolasdoodle on Apr 28, 2019 11:56 PM

Answered 5 years ago

I'm trying to do the same thing but couldn't make it work. Did you still use the post-processing Lambda, or did you just change the template.liquid file?

Edited by: apoorvsrivastava on Jun 18, 2021 9:28 AM

Answered 3 years ago


