Textract all image data in one file


Hello everyone. After using bulk file upload, is there a way to download the results in bulk? Having to open several zip files takes too much time.

Thanks in advance.

asked a year ago, 326 views
3 Answers
Accepted answer

There is no built-in way to do this, but there is a workaround.

First, store all the output in S3:

https://aws.amazon.com/blogs/machine-learning/store-output-in-custom-amazon-s3-bucket-and-encrypt-using-aws-kms-for-multi-page-document-processing-with-amazon-textract/

Then use AWS Lambda or a similar service to combine all the result files into one file, for example:

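import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('your_bucket_name')
output_key = 'combined_results.txt'

parts = []

# objects.all() lists every object in the bucket; use objects.filter(Prefix=...) to narrow it.
for obj in bucket.objects.all():
    if obj.key == output_key:
        continue  # Skip a combined file left over from a previous run.
    body = obj.get()['Body'].read()
    parts.append(body.decode('utf-8') + "\n")  # Assumes UTF-8 text files. You'll need to adjust for other formats.

# Write a single object holding the contents of every file.
bucket.put_object(Key=output_key, Body="".join(parts).encode('utf-8'))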

Finally, download the combined file.
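
If the result objects are Textract JSON responses rather than plain text, the decode step in the loop would need to pull the text out of each response first. A minimal sketch, assuming each file holds a standard Textract response with a top-level "Blocks" list; the helper name lines_from_textract and the sample data are made up for illustration:

```python
import json


def lines_from_textract(raw_json: str) -> list[str]:
    """Return the text of every LINE block in one Textract JSON response."""
    response = json.loads(raw_json)
    # Textract lists detected items under "Blocks"; LINE blocks carry a "Text" field.
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


# Hypothetical sample, trimmed to the fields used here.
sample = json.dumps({
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Invoice 001"},
        {"BlockType": "WORD", "Text": "Invoice"},
        {"BlockType": "LINE", "Text": "Total: 42"},
    ]
})

combined = "\n".join(lines_from_textract(sample))
print(combined)
```

WORD blocks repeat the text already present in LINE blocks, so the sketch keeps only LINE blocks to avoid duplicating content in the combined file.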

EXPERT
answered a year ago

To extract several zip files at once, you can use command-line tools.

Windows:

@echo off
for /R "C:\path\to\your\zips" %%I in (*.zip) do (
    "C:\Program Files\7-Zip\7z.exe" x -o"C:\path\to\extract\to" "%%~fI"
)
pause

Linux:

find /path/to/your/zips -name '*.zip' -exec unzip {} -d /path/to/extract/to \;

If you're processing documents in bulk with Amazon Textract and want to keep the results for later use, you would typically set up an Amazon S3 bucket to hold both the documents and the results. When you call Textract to process a document, you specify the bucket where the document is located, and you can then store the returned data as another object in that bucket.
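
For the asynchronous Textract operations, the output location can be set on the request itself through OutputConfig. A minimal sketch, assuming boto3 is installed and AWS credentials are configured; the bucket names, document key, and prefix are placeholders, and build_request and start_job are hypothetical helper names:

```python
def build_request(input_bucket: str, document_key: str,
                  output_bucket: str, output_prefix: str) -> dict:
    """Build the arguments for an asynchronous text-detection job."""
    return {
        # Where Textract reads the source document from.
        "DocumentLocation": {"S3Object": {"Bucket": input_bucket, "Name": document_key}},
        # Where Textract writes the job's result files.
        "OutputConfig": {"S3Bucket": output_bucket, "S3Prefix": output_prefix},
    }


def start_job(request: dict) -> str:
    """Start the job and return its JobId (needs AWS credentials)."""
    import boto3  # imported here so build_request works without boto3

    textract = boto3.client("textract")
    return textract.start_document_text_detection(**request)["JobId"]


request = build_request("my-input-bucket", "scans/page1.pdf",
                        "my-output-bucket", "textract-output")
print(request["OutputConfig"])
# job_id = start_job(request)  # uncomment to actually submit the job
```

Using one shared output prefix for every job keeps all results under a single S3 path, which makes the bulk download step simpler.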

To download the results in bulk, you could then download the objects from your S3 bucket. The AWS CLI includes a sync command that can be used to download all objects in a bucket:

aws s3 sync s3://mybucket .
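
Once the objects are on disk, they can be bundled into a single archive. A minimal sketch using only the Python standard library; zip_directory is a hypothetical helper, and the directory and archive names passed to it are up to you:

```python
import zipfile
from pathlib import Path


def zip_directory(source_dir: str, archive_path: str) -> list[str]:
    """Write every file under source_dir into one zip; return the stored names."""
    source = Path(source_dir)
    archive = Path(archive_path).resolve()
    stored = []
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(source.rglob("*")):
            # Skip directories, and the archive itself if it lives inside source_dir.
            if not path.is_file() or path.resolve() == archive:
                continue
            name = path.relative_to(source).as_posix()
            zf.write(path, arcname=name)
            stored.append(name)
    return stored
```

For example, calling zip_directory(".", "all_results.zip") in the directory used for the sync would produce one zip containing every downloaded result file.
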
EXPERT
answered a year ago

Thanks for your answer, but is there a way to get all the results in one file or zip? Having 7-8 different files takes too much time to process.

answered a year ago
