UnsupportedDocumentException: Request has unsupported document format


I am running into issues with Textract on certain PDF files. After uploading a PDF to S3 and then calling AnalyzeDocumentCommand from @aws-sdk/client-textract in Node.js, I keep getting an unsupported document format error.

Is there a certain way Textract expects a PDF to be encoded before it is uploaded to S3? The Textract demo in the AWS console works perfectly with the same PDF that fails when I go through the JavaScript SDK.

Riley
asked a year ago · 1549 views
1 Answer

Hi,

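No special encoding is required when using the SDK: Textract reads the PDF straight from S3, provided the object holds the original file bytes. Please find a code example below: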
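Since the SDK applies no encoding of its own, the object in S3 should simply be the PDF's raw bytes. A common way to end up with an unreadable object is reading the file with a text encoding such as 'utf8', or converting it to base64, before PutObject. Below is a minimal sketch of an upload that avoids this, assuming @aws-sdk/client-s3 is installed. looksLikePdf and uploadPdfUnchanged are hypothetical helper names, and the "%PDF-" header check is only a quick sanity check, not full PDF validation.

```javascript
const { readFile } = require('fs/promises');

// A well-formed PDF begins with the ASCII header "%PDF-". A file that was read
// as text or converted to base64 before upload will no longer start with it.
function looksLikePdf(bytes) {
  return Buffer.from(bytes.subarray(0, 5)).toString('latin1') === '%PDF-';
}

// Hypothetical helper: uploads a local PDF to S3 exactly as it is on disk.
// The S3 SDK is required lazily (default parameter), so looksLikePdf works
// even where @aws-sdk/client-s3 is not installed.
async function uploadPdfUnchanged(bucket, key, filePath, {
  region = 'us-east-1', // should match the bucket's region and the one used for Textract
  sdk = require('@aws-sdk/client-s3'),
} = {}) {
  // No encoding argument: readFile returns a Buffer of the raw file bytes.
  const body = await readFile(filePath);
  if (!looksLikePdf(body)) {
    throw new Error(`${filePath} does not start with %PDF-; it may not be a raw PDF`);
  }
  const s3 = new sdk.S3Client({ region });
  await s3.send(new sdk.PutObjectCommand({
    Bucket: bucket,
    Key: key,
    Body: body, // pass the Buffer itself, not body.toString() or a base64 string
    ContentType: 'application/pdf',
  }));
}

module.exports = { looksLikePdf, uploadPdfUnchanged };
```

For example, `await uploadPdfUnchanged('my-bucket', 'my-file.pdf', './my-file.pdf')` stores the file under the bucket and key that the AnalyzeDocument request then references.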

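// AnalyzeDocumentCommand must be imported alongside the client
const { TextractClient, AnalyzeDocumentCommand } = require('@aws-sdk/client-textract');

async function main() {
  // Use the same region as the S3 bucket that holds the document
  const client = new TextractClient({
    region: 'us-east-1',
  });

  // Point Textract at the object already in S3; no encoding step is needed
  const request = {
    Document: {
      S3Object: {
        Bucket: 'my-bucket',
        Name: 'my-file.pdf',
      },
    },
    FeatureTypes: ['TABLES', 'FORMS'],
  };

  try {
    // Synchronous call: the response contains the analysis Blocks directly
    const result = await client.send(new AnalyzeDocumentCommand(request));
    console.log(result);
  } catch (err) {
    console.error(err);
  }
}

main();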

This code sends a request to Amazon Textract to analyze the PDF file my-file.pdf stored in the my-bucket S3 bucket, and the analysis results are returned in the result variable. The FeatureTypes parameter specifies which features to extract from the document, in this case tables and forms. You can find more information about the available feature types in the Textract documentation.
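One frequent cause of UnsupportedDocumentException with PDFs is worth ruling out here: AnalyzeDocument is a synchronous operation, and the Textract documentation limits synchronous operations to single-page PDF and TIFF files. Multi-page PDFs have to go through the asynchronous StartDocumentAnalysis and GetDocumentAnalysis operations. Whether this applies to your files is an assumption to check. Below is a minimal polling sketch, assuming the same bucket, key and feature types as above. The name analyzePdfAsync, the 5-second poll interval and the injectable sdk parameter are illustrative choices, not part of the SDK.

```javascript
// Multi-page PDFs: start an asynchronous analysis job, poll until it finishes,
// then follow NextToken to collect the Blocks from every result page.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function analyzePdfAsync(client, bucket, key, {
  featureTypes = ['TABLES', 'FORMS'],
  pollMs = 5000,
  sdk = require('@aws-sdk/client-textract'), // loaded only if not supplied
} = {}) {
  const { StartDocumentAnalysisCommand, GetDocumentAnalysisCommand } = sdk;

  // The async API takes DocumentLocation rather than Document.
  const { JobId } = await client.send(new StartDocumentAnalysisCommand({
    DocumentLocation: { S3Object: { Bucket: bucket, Name: key } },
    FeatureTypes: featureTypes,
  }));

  // Poll until the job is no longer IN_PROGRESS.
  let page;
  do {
    await sleep(pollMs);
    page = await client.send(new GetDocumentAnalysisCommand({ JobId }));
  } while (page.JobStatus === 'IN_PROGRESS');

  if (page.JobStatus === 'FAILED') {
    throw new Error(`Textract job ${JobId} failed: ${page.StatusMessage}`);
  }

  // Results are paginated; gather Blocks from every response.
  const blocks = [...(page.Blocks ?? [])];
  while (page.NextToken) {
    page = await client.send(new GetDocumentAnalysisCommand({ JobId, NextToken: page.NextToken }));
    blocks.push(...(page.Blocks ?? []));
  }
  return blocks;
}

module.exports = { analyzePdfAsync };
```

Usage would be `const blocks = await analyzePdfAsync(new TextractClient({ region: 'us-east-1' }), 'my-bucket', 'my-file.pdf')`. Polling keeps the sketch short; for production use, StartDocumentAnalysis also accepts a NotificationChannel (an SNS topic) so completion can be signaled instead of polled.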

I hope this helps!

Jady (AWS)
answered a year ago
