Amazon Comprehend limits are documented at https://docs.aws.amazon.com/comprehend/latest/dg/guidelines-and-limits.html, and for entity detection a document indeed cannot be larger than 1 MB. I would suggest going with your latest approach: split each document into chunks of at most 1 MB and run entity detection on each chunk. When building the Kendra index, you can then aggregate the entities from all chunks and associate them with the original document.
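To make the split-and-aggregate idea concrete, here is a minimal sketch in Python. It is not an official pattern. The names `split_utf8`, `aggregate_entities`, and `MAX_BYTES` are made up for illustration, and the 1 MB figure is simply the limit mentioned above, so check the limits page for the operation you actually call. The `detect` argument is assumed to be any function that takes a text chunk and returns a list of entity dicts with `Type` and `Text` keys, which is the shape Comprehend uses for detected entities.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

# Assumption: limit taken from the answer above; adjust for your API operation.
MAX_BYTES = 1_000_000


def split_utf8(text: str, max_bytes: int = MAX_BYTES) -> List[str]:
    """Split text into chunks whose UTF-8 encoding is at most max_bytes each."""
    if max_bytes < 4:
        # A single UTF-8 character can take up to 4 bytes.
        raise ValueError("max_bytes must be at least 4")
    data = text.encode("utf-8")
    chunks = []
    start = 0
    while start < len(data):
        end = min(start + max_bytes, len(data))
        # Continuation bytes look like 0b10xxxxxx; back up so the cut
        # never lands in the middle of a multi-byte character.
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        chunks.append(data[start:end].decode("utf-8"))
        start = end
    return chunks


def aggregate_entities(
    chunks: Iterable[str],
    detect: Callable[[str], List[dict]],
) -> Dict[str, List[str]]:
    """Run detect on every chunk and merge the results by entity type."""
    merged = defaultdict(set)
    for chunk in chunks:
        for entity in detect(chunk):
            merged[entity["Type"]].add(entity["Text"])
    # Sorted lists are easier to attach to a document as attributes.
    return {etype: sorted(texts) for etype, texts in merged.items()}
```

In practice `detect` could be a thin wrapper around the Comprehend entity detection call you already use, and the merged result could be attached to the original document when you index it in Kendra. Note that this sketch cuts purely by byte count, so an entity that straddles a chunk boundary can be missed or truncated; splitting at paragraph or sentence boundaries would reduce that risk.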
According to https://docs.aws.amazon.com/comprehend/latest/dg/guidelines-and-limits.html#limits-custom-entity-recognition, the maximum document size should be 50 MB for PDF documents and 5 MB for Word documents, while the maximum size for UTF-8 encoded plain-text documents is 1 MB.