Hi, the best path is to use AWS Textract to extract the text from your PDF files and then train your ML model on that text.
AWS Textract service page: https://aws.amazon.com/textract/
Textract developer guide: https://docs.aws.amazon.com/textract/latest/dg/what-is.html
For a detailed use case of Textract applied to ML, this video is worth watching: https://www.youtube.com/watch?v=WA0T8dy0aGQ
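As a rough illustration of the extraction step, here is a minimal sketch using boto3's asynchronous Textract calls, which are the ones that accept multi-page PDFs stored in S3. The function names `lines_from_blocks` and `extract_pdf_text`, the polling interval, and the bucket/key arguments are my own placeholders, and it assumes your credentials and region are already configured.

```python
import time


def lines_from_blocks(blocks):
    """Return the text of every LINE block, in the order Textract returned them."""
    return [b["Text"] for b in blocks if b.get("BlockType") == "LINE"]


def extract_pdf_text(bucket, key, poll_seconds=5):
    """Run an async Textract text-detection job on a PDF in S3 and return its text.

    bucket/key are placeholders for your own S3 location.
    """
    import boto3  # imported here so lines_from_blocks can be used without AWS access

    client = boto3.client("textract")
    job = client.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    job_id = job["JobId"]

    # Poll until the job leaves the IN_PROGRESS state.
    while True:
        result = client.get_document_text_detection(JobId=job_id)
        status = result["JobStatus"]
        if status != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)

    if status != "SUCCEEDED":
        raise RuntimeError(f"Textract job {job_id} ended with status {status}")

    # Results are paginated: follow NextToken until it is absent.
    lines = lines_from_blocks(result.get("Blocks", []))
    next_token = result.get("NextToken")
    while next_token:
        result = client.get_document_text_detection(JobId=job_id, NextToken=next_token)
        lines.extend(lines_from_blocks(result.get("Blocks", [])))
        next_token = result.get("NextToken")

    return "\n".join(lines)
```

You would call it as `extract_pdf_text("my-bucket", "docs/report.pdf")` with your own bucket and key. For many documents, an SNS notification is usually a better fit than polling, but polling keeps the sketch short.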
Next, to apply this to Llama 2 fine-tuning: https://www.anyscale.com/blog/fine-tuning-llama-2-a-comprehensive-case-study-for-tailoring-models-to-unique-applications
Finally, to run that fine-tuning on SageMaker: https://www.linkedin.com/pulse/enhancing-language-models-qlora-efficient-fine-tuning-vraj-routu
There is also a SageMaker notebook for it: https://github.com/philschmid/huggingface-llama-2-samples/blob/master/training/sagemaker-notebook.ipynb
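Between extraction and fine-tuning you will need to turn the raw text into training records. The sketch below is only one possible way to do that: it splits the text into fixed-size word chunks and serializes them as JSON Lines with a `text` field. The chunk size, the field name, and the helper names `chunk_words` and `to_jsonl` are assumptions on my side, not something taken from the linked articles or notebook, so adapt them to whatever format your training script expects.

```python
import json


def chunk_words(text, words_per_chunk=256):
    """Split text into consecutive chunks of at most words_per_chunk words."""
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


def to_jsonl(chunks):
    """Serialize each chunk as one JSON object per line, under a 'text' key."""
    return "\n".join(json.dumps({"text": chunk}) for chunk in chunks)


if __name__ == "__main__":
    sample = "This is a short piece of extracted text used only as a demo."
    print(to_jsonl(chunk_words(sample, words_per_chunk=5)))
```

Word-based chunking is crude because it ignores the tokenizer and sentence boundaries. It is meant as a starting point before you move to token-aware packing.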
Best,
Didier