When you post tasks to Ground Truth, they are displayed to workers through a web interface, so HTML is the best way to format the text you want annotated. For example, you could use the following to create a multi-line input:
{"source": "Lorem ipsum <br/>dolor sit amet"}
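If your raw text contains ordinary newlines, a line like the one above can be generated rather than written by hand. This is a minimal sketch, assuming the input manifest is a JSON Lines file with one object per line; the helper name `to_manifest_line` is hypothetical and not part of any AWS library.

```python
import json


def to_manifest_line(text):
    """Return one manifest entry, with newlines replaced by <br/> tags."""
    # splitlines() drops the newline characters; joining with <br/> puts
    # an HTML line break where each newline used to be.
    html_text = "<br/>".join(text.splitlines())
    return json.dumps({"source": html_text})


# Hypothetical raw documents; each becomes one line of the manifest.
documents = ["Lorem ipsum \ndolor sit amet"]
manifest = "\n".join(to_manifest_line(doc) for doc in documents) + "\n"
print(manifest, end="")
```

The resulting string can be written to a file and uploaded to S3 as the input manifest for the labeling job.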
Unfortunately, inputs are HTML-escaped by default to prevent confusion between your variable text and HTML. As a result, if you use the Text Classification widget, the text above is displayed to workers literally as Lorem ipsum <br/>dolor sit amet. To pass those values without escaping them, you'll need to create a custom template that applies a filter to the variable to prevent it from being escaped.
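To see why the tag shows up as literal text, it helps to look at what escaping does to the string. The sketch below uses Python's standard `html.escape` as a stand-in; it is an assumption that Ground Truth's own escaping behaves the same way for these characters, since its implementation is not shown here.

```python
import html

source = "Lorem ipsum <br/>dolor sit amet"

# Escaping replaces the angle brackets with HTML entities, so the browser
# renders the tag as visible text instead of treating it as a line break.
escaped = html.escape(source)
print(escaped)
```

The angle brackets become `&lt;` and `&gt;`, which is why workers see the `<br/>` characters rather than a line break.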
To set up a custom task, start by creating Lambda functions to handle the required pre- and post-processing. Information on setting these up can be found at https://docs.aws.amazon.com/sagemaker/latest/dg/sms-custom-templates-step3.html
I've included simple pre and post Lambda examples below that I find are a good starting point for text annotation. Note that the post Lambda doesn't do any answer consolidation; it simply passes back all of the answers provided by workers.
Next, create a custom template in Ground Truth, using the Sentiment Analysis template as a starter task for text annotation. To prevent it from escaping HTML values, update the Liquid variables to include the skip_autoescape filter:
{{ task.input.text | skip_autoescape }}
You can find more info on using Liquid template values here:
https://docs.aws.amazon.com/sagemaker/latest/dg/sms-custom-templates-step2.html#sms-custom-templates-step2-automate
Pre
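def lambda_handler(event, context):
    # Log the incoming event to help with debugging
    print(event)

    # The text to annotate comes from the "source" key of the manifest line
    source = event['dataObject'].get('source')
    if source:
        print("text is {}".format(source))
    else:
        print("Missing text in dataObject")
        return {}

    # Expose the text to the template as task.input.text
    response = {
        "taskInput": {
            "text": source
        }
    }
    print(response)
    return response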
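Before deploying, the pre Lambda's behavior can be checked locally. This is a minimal sketch with a condensed copy of the handler (logging removed) and a hypothetical, trimmed event that contains only the `dataObject` key the handler reads; a real Ground Truth event carries additional fields.

```python
def lambda_handler(event, context):
    """Condensed copy of the pre Lambda above, without the print calls."""
    source = event["dataObject"].get("source")
    if not source:
        return {}
    return {"taskInput": {"text": source}}


# Hypothetical trimmed event: only the field the handler actually uses.
sample_event = {"dataObject": {"source": "Lorem ipsum <br/>dolor sit amet"}}
result = lambda_handler(sample_event, None)
print(result)

# A data object without "source" yields an empty response.
missing = lambda_handler({"dataObject": {}}, None)
print(missing)
```

The `text` key under `taskInput` is what the template reads as `task.input.text`.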
Post
import json
import boto3
from urllib.parse import urlparse


def lambda_handler(event, context):
    print(json.dumps(event))
    payload = get_payload(event)
    print(json.dumps(payload))

    consolidated_response = []
    for dataset in payload:
        annotations = dataset['annotations']
        responses = []
        # Collect every worker's answer; no consolidation is performed
        for annotation in annotations:
            response = json.loads(annotation['annotationData']['content'])
            if 'annotatedResult' in response:
                response = response['annotatedResult']
            responses.append({
                'workerId': annotation['workerId'],
                'annotation': response
            })
        consolidated_response.append({
            'datasetObjectId': dataset['datasetObjectId'],
            'consolidatedAnnotation': {
                'content': {
                    event['labelAttributeName']: {
                        'responses': responses
                    }
                }
            }
        })
    print(json.dumps(consolidated_response))
    return consolidated_response


def get_payload(event):
    # Read the annotations from S3, or fall back to an inline
    # 'test_payload' when testing without S3
    if 'payload' in event:
        parsed_url = urlparse(event['payload']['s3Uri'])
        s3 = boto3.client('s3')
        text_file = s3.get_object(Bucket=parsed_url.netloc, Key=parsed_url.path[1:])
        return json.loads(text_file['Body'].read())
    else:
        return event.get('test_payload', [])
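The `test_payload` fallback makes it possible to exercise the pass-through logic without S3. The sketch below copies that logic into a standalone function (named `collect_responses` here, a hypothetical name) so it runs without boto3; the worker IDs, label attribute name, and annotation contents are made-up sample values, not real Ground Truth output.

```python
import json


def collect_responses(payload, label_attribute_name):
    """Same pass-through logic as the post Lambda above, without S3 access."""
    consolidated = []
    for dataset in payload:
        responses = []
        for annotation in dataset["annotations"]:
            response = json.loads(annotation["annotationData"]["content"])
            if "annotatedResult" in response:
                response = response["annotatedResult"]
            responses.append({
                "workerId": annotation["workerId"],
                "annotation": response,
            })
        consolidated.append({
            "datasetObjectId": dataset["datasetObjectId"],
            "consolidatedAnnotation": {
                "content": {label_attribute_name: {"responses": responses}}
            },
        })
    return consolidated


# Hypothetical test event: two workers answered the same data object.
test_event = {
    "labelAttributeName": "sentiment",
    "test_payload": [{
        "datasetObjectId": "0",
        "annotations": [
            {"workerId": "worker-a",
             "annotationData": {"content": json.dumps({"annotatedResult": {"label": "positive"}})}},
            {"workerId": "worker-b",
             "annotationData": {"content": json.dumps({"label": "negative"})}},
        ],
    }],
}

result = collect_responses(test_event["test_payload"], test_event["labelAttributeName"])
print(json.dumps(result, indent=2))
```

Both answers come back under the label attribute name, one entry per worker, which matches the note that no consolidation takes place.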
Ah, great. That was exactly what I was missing. I simply changed the template to
{{ task.input.taskObject | skip_autoescape }}
and left out the pre-processing lambda.
Edited by: nicolasdoodle on Apr 28, 2019 11:56 PM
I'm trying to do the same thing but couldn't make it work. Did you still use the post-processing Lambda, or did you just change the template.liquid file?
Edited by: apoorvsrivastava on Jun 18, 2021 9:28 AM