Hmm... Strange. I am not able to reply to {message:id=893763} but can reply here.
The workaround mentioned in the other thread, printing the PDF document to a new PDF, works for me. However, I needed a programmatic solution, so I dug a little deeper. Using PyPDF2, I found that each of my documents that failed with INVALID_DOCUMENT_TYPE also raised the warning "Xref table not zero-indexed. ID numbers for objects will be corrected." when read with PyPDF2. So I used
PyPDF2.PdfFileReader(my_bad_pdf_stream, strict=False)
which fixed the faulty Xref table. All of my previously failing PDFs now work.
Note that this is not a complete solution, because the PyPDF2 code introduced errors in some files that were not previously problematic. That other problem will have to wait, as I can process all my files for now.
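For anyone who wants to try the same thing, here is a rough sketch of what the non-strict read could look like end to end. The post only shows the PdfFileReader call, so the step that writes the pages back out with PdfFileWriter is my assumption about how the repaired file gets produced, and the function name rewrite_pdf_non_strict is made up for illustration. It also assumes an older PyPDF2 release that still has the PdfFileReader / PdfFileWriter names (newer releases rename these to PdfReader and PdfWriter).

```python
import io


def rewrite_pdf_non_strict(pdf_bytes):
    """Read a PDF with strict=False and re-serialize its pages.

    Hypothetical helper: the write-back step is an assumption, not
    something the original post spells out.
    """
    # Imported here so the module still loads where PyPDF2 is missing.
    # Assumes a PyPDF2 release that still exposes the legacy
    # PdfFileReader / PdfFileWriter names used in the post.
    import PyPDF2

    # strict=False lets PyPDF2 correct object IDs instead of failing
    # on an Xref table that is not zero-indexed.
    reader = PyPDF2.PdfFileReader(io.BytesIO(pdf_bytes), strict=False)

    writer = PyPDF2.PdfFileWriter()
    for index in range(reader.getNumPages()):
        writer.addPage(reader.getPage(index))

    out = io.BytesIO()
    writer.write(out)
    return out.getvalue()
```

Given the caveat above about rewritten files sometimes breaking, it is probably safer to run this only on documents that have already failed, and to keep the originals.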
Hope that helps.
Edited by: wchan on Jul 25, 2019 10:39 PM
Edited by: wchan on Jul 25, 2019 10:43 PM
Not sure if you ever got an answer to this, but I am running into it as well, and I think it's because my PDFs are too wide for the service.
This is still a problem. INVALID_DOCUMENT_TYPE provides no information about what is actually wrong; more granular messages would be invaluable. In my case, I think my documents are failing because they exceed page size limits, but there's no way to know for sure until you "fix" it and see if it succeeds.
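If page size is the suspect, one way to narrow it down before resubmitting is to flag pages whose dimensions exceed whatever limit applies. The sketch below is only an illustration: the function name oversized_pages is hypothetical, the limit is a parameter you would have to look up for your service (nothing in this thread confirms a number), and the page sizes are assumed to be in PDF points as read from each page's media box with a PDF library of your choice.

```python
POINTS_PER_INCH = 72  # default PDF user-space units per inch


def oversized_pages(page_sizes_pt, max_inches):
    """Return indices of pages whose width or height exceeds max_inches.

    page_sizes_pt: iterable of (width, height) tuples in PDF points.
    max_inches: the service's page-size limit; an input, not a known fact.
    """
    limit_pt = max_inches * POINTS_PER_INCH
    return [
        index
        for index, (width_pt, height_pt) in enumerate(page_sizes_pt)
        if width_pt > limit_pt or height_pt > limit_pt
    ]
```

For example, calling oversized_pages([(612, 792), (3600, 792)], max_inches=40) would flag only the second page, since 3600 points is 50 inches.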