Hi, the best approach is to use Amazon Textract to extract the text from your PDF, and then train your ML model on that text.
AWS Textract service page: https://aws.amazon.com/textract/
Textract developer guide: https://docs.aws.amazon.com/textract/latest/dg/what-is.html
For a detailed use case of Textract applied to ML, this video is worth watching: https://www.youtube.com/watch?v=WA0T8dy0aGQ
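To make the Textract step concrete, here is a minimal sketch in Python with boto3. It assumes the PDF has already been uploaded to S3 and that your credentials allow Textract to read it; the function names and the `bucket`/`key` parameters are placeholders of mine, not something from the links above. It uses the asynchronous text detection API, collects the LINE blocks, and follows pagination.

```python
import time


def lines_from_blocks(blocks):
    """Return the text of every LINE block, in the order Textract listed them."""
    return [b["Text"] for b in blocks if b.get("BlockType") == "LINE"]


def extract_pdf_text(bucket, key, poll_seconds=5):
    """Run asynchronous text detection on a PDF in S3 and return its text.

    `bucket` and `key` are hypothetical placeholders for your own S3 location.
    """
    import boto3  # imported here so lines_from_blocks works without boto3 installed

    textract = boto3.client("textract")
    job = textract.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    job_id = job["JobId"]

    # Poll until the job leaves the IN_PROGRESS state.
    while True:
        result = textract.get_document_text_detection(JobId=job_id)
        if result["JobStatus"] != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)

    if result["JobStatus"] != "SUCCEEDED":
        raise RuntimeError(f"Textract job ended with status {result['JobStatus']}")

    lines = lines_from_blocks(result.get("Blocks", []))

    # Results are paginated; keep following NextToken until it disappears.
    while "NextToken" in result:
        result = textract.get_document_text_detection(
            JobId=job_id, NextToken=result["NextToken"]
        )
        lines.extend(lines_from_blocks(result.get("Blocks", [])))

    return "\n".join(lines)
```

Calling `extract_pdf_text("my-bucket", "docs/report.pdf")` (both values made up) would return the document as plain text, one detected line per row. For production use, an SNS notification is usually preferable to polling.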
To apply this to Llama 2 fine-tuning: https://www.anyscale.com/blog/fine-tuning-llama-2-a-comprehensive-case-study-for-tailoring-models-to-unique-applications
To run that fine-tuning on SageMaker: https://www.linkedin.com/pulse/enhancing-language-models-qlora-efficient-fine-tuning-vraj-routu
There is also a SageMaker notebook for it: https://github.com/philschmid/huggingface-llama-2-samples/blob/master/training/sagemaker-notebook.ipynb
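Before fine-tuning, the extracted text usually needs to be split into training records. The sketch below is one simple way to do that, not the method used in the linked articles: it cuts the text into fixed-size word chunks and writes them as JSON Lines. The `max_words` value and the `"text"` field name are assumptions; check what format your training script or notebook actually expects.

```python
import json


def chunk_text(text, max_words=200):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def to_jsonl(chunks):
    """Serialize chunks as JSON Lines, one {"text": ...} record per line.

    The "text" field name is an assumption, not a requirement of any tool.
    """
    return "\n".join(json.dumps({"text": chunk}) for chunk in chunks)
```

You could then write `to_jsonl(chunk_text(document_text))` to a file and upload it to S3 as the training dataset.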
Best,
Didier