Ground Truth Text Format

Hello
We are trying to set up labeling for text with Ground Truth. Is it possible to format the source text to be labeled in some way, e.g. HTML or Markdown? Even a new line would already help a lot. I could not find any documentation on this at https://docs.aws.amazon.com/sagemaker/latest/dg/sms-data-input.html

Thanks
Nicolas

asked 5 years ago, 707 views
3 answers

When you post tasks to Ground Truth, they are displayed to workers in a web interface, so HTML is the best bet for formatting the text you want annotated. For example, you could use the following to create a multi-line input:

{"source": "Lorem ipsum <br/>dolor sit amet"}

Unfortunately, by default, inputs are HTML-escaped to prevent confusion between your variable text and HTML. As a result, if you use the Text Classification widget, the text above will be displayed to workers literally as Lorem ipsum <br/>dolor sit amet. To pass those values without escaping them, you'll need to create a custom template that includes a filter on the variable to prevent it from being escaped.
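
To illustrate what escaping does to the example above, here is a minimal sketch that uses Python's html.escape as a stand-in. Ground Truth does its escaping inside the template engine, so treat this only as an approximation of the effect, not as the service's actual implementation.

```python
import html

# The manifest value from the example above.
source = "Lorem ipsum <br/>dolor sit amet"

# Escaping turns the angle brackets into entities, so the browser
# shows the tag as literal text instead of rendering a line break.
escaped = html.escape(source)
print(escaped)
```

The browser renders the entities back as the characters < and >, which is why workers see the tag itself rather than a new line.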

To set up a custom task, start by creating the Lambda functions that handle the required pre- and post-processing. Information on setting these up can be found at https://docs.aws.amazon.com/sagemaker/latest/dg/sms-custom-templates-step3.html

I've included simple pre and post Lambda examples below that I find are a good place to start for text annotation. Note that the post Lambda doesn't do any answer consolidation; it simply passes back all of the answers provided by workers.

Create a custom template in Ground Truth and use the Sentiment Analysis template to create a starter task for text annotation. To prevent it from escaping HTML values, update the Liquid variables to include the skip_autoescape filter.

{{ task.input.text | skip_autoescape }}

You can find more info on using Liquid template values here:
https://docs.aws.amazon.com/sagemaker/latest/dg/sms-custom-templates-step2.html#sms-custom-templates-step2-automate
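
Once skip_autoescape is in place, any markup in the source text is rendered as-is, so it may be worth escaping the raw text yourself and only then adding the line breaks. The following is a minimal sketch of that idea; to_manifest_line is a hypothetical helper name, and the only manifest key assumed is "source", as in the example above.

```python
import html
import json


def to_manifest_line(text: str) -> str:
    """Build one JSON Lines manifest record from plain text (hypothetical helper)."""
    # Escape characters that would otherwise be interpreted as HTML.
    safe = html.escape(text, quote=False)
    # Normalize Windows line endings, then turn newlines into <br/> tags.
    with_breaks = safe.replace("\r\n", "\n").replace("\n", "<br/>")
    return json.dumps({"source": with_breaks})


line = to_manifest_line("First line\nSecond line with <b> & stuff")
print(line)
```

Each returned string is one line of the input manifest; write one such line per item to be labeled.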

Pre

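def lambda_handler(event, context):
    """Pre-annotation Lambda: pass the manifest's "source" text to the template."""
    # Log the incoming event for debugging.
    print(event)
    # "source" holds the inline text from the input manifest line.
    source = event['dataObject'].get('source')

    if source:
        print("text is {}".format(source))
    else:
        print("Missing text in dataObject")
        return {}

    # Values under "taskInput" are what the template reads as task.input.*
    response = {
        "taskInput": {
            "text": source
        }
    }
    print(response)
    return response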

Post

import json
import boto3
from urllib.parse import urlparse


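def lambda_handler(event, context):
    """Post-annotation Lambda: return every worker answer without consolidating."""
    print(json.dumps(event))

    # Load the list of annotated dataset objects (from S3 or a test event).
    payload = get_payload(event)
    print(json.dumps(payload))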

    consolidated_response = []
    for dataset in payload:
        annotations = dataset['annotations']
        responses = []
        for annotation in annotations:
            response = json.loads(annotation['annotationData']['content'])
            if 'annotatedResult' in response:
                response = response['annotatedResult']

            responses.append({
                'workerId': annotation['workerId'],
                'annotation': response
            })

        consolidated_response.append({
            'datasetObjectId': dataset['datasetObjectId'],
            'consolidatedAnnotation': {
                'content': {
                    event['labelAttributeName']: {
                        'responses': responses
                    }
                }
            }
        })

    print(json.dumps(consolidated_response))
    return consolidated_response


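def get_payload(event):
    """Read the annotations from S3, or fall back to an inline test payload."""
    if 'payload' in event:
        # The event points at a JSON file in S3 that holds the annotations.
        parsed_url = urlparse(event['payload']['s3Uri'])
        s3 = boto3.client('s3')
        text_file = s3.get_object(Bucket=parsed_url.netloc, Key=parsed_url.path[1:])
        return json.loads(text_file['Body'].read())
    else:
        # Allows local or console testing with a 'test_payload' key.
        return event.get('test_payload', [])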
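
If you later want the post Lambda to consolidate answers instead of passing them all through, a simple majority vote is one option. This is only a sketch under assumptions: majority_label is a hypothetical helper, and it assumes each worker's annotation is a dict with a "label" key, which depends entirely on what your template submits.

```python
from collections import Counter


def majority_label(responses, key="label"):
    """Return the most common label among worker responses, or None if there are none.

    Assumes entries shaped like {"workerId": ..., "annotation": {key: ...}};
    that shape is an assumption, so adapt it to your template's output.
    """
    labels = [
        r["annotation"][key]
        for r in responses
        if isinstance(r.get("annotation"), dict) and key in r["annotation"]
    ]
    if not labels:
        return None
    return Counter(labels).most_common(1)[0][0]


# Same structure as the 'responses' list built in the post Lambda above.
responses = [
    {"workerId": "w1", "annotation": {"label": "positive"}},
    {"workerId": "w2", "annotation": {"label": "negative"}},
    {"workerId": "w3", "annotation": {"label": "positive"}},
]
print(majority_label(responses))
```

The result could be stored next to, or instead of, the 'responses' list under event['labelAttributeName'].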

answered 5 years ago

Ah, great. That was exactly what I was missing. I simply changed the template to {{ task.input.taskObject | skip_autoescape }} and left out the pre-processing lambda.

Edited by: nicolasdoodle on Apr 28, 2019 11:56 PM

answered 5 years ago

I'm trying to do the same thing, but I couldn't make it work. Did you still use the post-processing Lambda, or did you just change the template.liquid file?

Edited by: apoorvsrivastava on Jun 18, 2021 9:28 AM

answered 3 years ago
