
How do I sync documents larger than 50 MB in AWS Kendra?


Hi, I am unable to upload documents larger than 50 MB, such as PDFs and images, to AWS Kendra using AEM as the data source connector. Can you please help me with this as soon as possible?

Asked a year ago · 503 views
2 Answers

One approach is to split large documents into smaller parts, each below the 50 MB size limit of AWS Kendra, before they are ingested.

Document pre-processing: Before uploading documents to AWS Kendra, run a pre-processing script that splits large PDFs or images into smaller parts.

For PDFs, you can use a library like PyPDF2 or a tool like pdfsplit to split the file into smaller parts. For images, you can use an image processing library like Pillow to split or compress them.

Python code:

    import os

    # PyPDF2 3.x names; the older PdfFileReader/PdfFileWriter classes were removed in 3.0
    from PyPDF2 import PdfReader, PdfWriter


    def split_pdf(input_pdf_path, output_dir, chunk_size):
        """Split a PDF into files of at most chunk_size pages each."""
        os.makedirs(output_dir, exist_ok=True)  # create the output folder if it is missing
        pdf_reader = PdfReader(input_pdf_path)
        total_pages = len(pdf_reader.pages)

        for i in range(0, total_pages, chunk_size):
            pdf_writer = PdfWriter()
            # Copy the next chunk_size pages (fewer for the last chunk)
            for j in range(i, min(i + chunk_size, total_pages)):
                pdf_writer.add_page(pdf_reader.pages[j])

            output_pdf_path = f"{output_dir}/chunk_{i // chunk_size + 1}.pdf"
            with open(output_pdf_path, "wb") as output_pdf:
                pdf_writer.write(output_pdf)


    # Example usage: split into chunks of 10 pages each
    split_pdf("large_document.pdf", "output_chunks", 10)
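Splitting by page count does not by itself guarantee that every part ends up below the size limit, since a few image-heavy pages can still be large. A minimal sketch of a follow-up check is shown here. It assumes the limit is 50 MB counted as 50 * 1024 * 1024 bytes, and the helper name `oversized_chunks` is hypothetical, not part of any library.

```python
import os

# Assumption: the limit from the question, counted in binary megabytes.
KENDRA_MAX_BYTES = 50 * 1024 * 1024


def oversized_chunks(output_dir, max_bytes=KENDRA_MAX_BYTES):
    """Return (filename, size_in_bytes) for each PDF in output_dir above max_bytes."""
    too_big = []
    for name in sorted(os.listdir(output_dir)):
        path = os.path.join(output_dir, name)
        # Only look at regular files with a .pdf extension
        if name.lower().endswith(".pdf") and os.path.isfile(path):
            size = os.path.getsize(path)
            if size > max_bytes:
                too_big.append((name, size))
    return too_big
```

If `oversized_chunks("output_chunks")` returns any entries, one option is to split the source again with a smaller number of pages per chunk.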

If you believe your use case justifies it, contact AWS Support to request an increase in the document size limit. However, note that AWS Kendra's default limits are typically set for performance and reliability reasons, and increasing these limits might not always be feasible or advisable.

Expert
Answered a year ago
Expert
Reviewed a year ago

Hi,

The 50 MB maximum is one of Kendra's adjustable quotas: see https://docs.aws.amazon.com/kendra/latest/dg/quotas.html

This means you can open a support request (via the AWS console) to raise this limit after explaining your use case.

As an interim solution, you can use a Lambda function with the PreExtraction hook to reduce the size of content that is too big.

See https://docs.aws.amazon.com/kendra/latest/dg/custom-document-enrichment.html#advanced-data-manipulation (see section "Lambda functions: extract and change metadata or content")
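A rough sketch of what such a PreExtraction Lambda function might look like follows. It rests on several assumptions that should be checked against the linked documentation: the event is assumed to carry `s3Bucket` and `s3ObjectKey`, and the response is assumed to use `version`, `s3ObjectKey`, and `metadataUpdates`. The `shrink_document` function and the `reduced/` key prefix are hypothetical placeholders, and whether the hook runs before the size limit is enforced is not verified here.

```python
def shrink_document(data: bytes) -> bytes:
    """Hypothetical placeholder: return the bytes unchanged.

    Replace this with real size reduction (for example, image recompression).
    """
    return data


def process(event, s3_client, shrink):
    """Read the document from S3, shrink it, store the result, and build the response."""
    bucket = event["s3Bucket"]      # assumed event field
    key = event["s3ObjectKey"]      # assumed event field
    original = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    reduced = shrink(original)
    new_key = f"reduced/{key}"      # hypothetical location for the smaller copy
    s3_client.put_object(Bucket=bucket, Key=new_key, Body=reduced)
    # Assumed response shape: point Kendra at the reduced object, no metadata changes
    return {"version": "v0", "s3ObjectKey": new_key, "metadataUpdates": []}


def lambda_handler(event, context):
    import boto3  # imported lazily; provided by the AWS Lambda Python runtime
    return process(event, boto3.client("s3"), shrink_document)
```

Passing the S3 client and the shrink function into `process` keeps the logic easy to test locally with a stub client before deploying.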

Best,

Didier

Expert
Answered a year ago
