Reading AWS Glue table from S3 - GetObject API costs

Dear AWS re:Post

For my ETL jobs, I read most of my data from RDS, but some I read directly from a table whose data sits on S3. I only discovered today that each job run generates a not insignificant cost through the GetObject API, and I'm trying to reconstruct how the calls work.

I have approximately 60 000 files on S3 for this table, but I'm using a push down predicate to read only 6 000 of them for my ETL.

I think the GetObject cost associated with my ETL corresponds to around 50 000 000 GetObject calls (storage class is S3 Standard), i.e. $20 at $0.0004 per 1 000 requests (20 / 0.0004 × 1 000).

As I'm only expecting 5 000 to 6 000 files to be read, I'm assuming that create_dynamic_frame.from_catalog reads each file on S3 using the partNumber option to cut the files into pieces. As the maximum number of parts is 10 000, that more or less fits my estimates. However, I couldn't find more details in the documentation on how the S3 calls work in create_dynamic_frame.from_catalog.
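
To make my estimate explicit, here is a small sketch of the arithmetic behind it. The inputs are just the figures quoted above (about $20 of GetObject cost, $0.0004 per 1 000 requests, 5 000 to 6 000 files read); the variable names are my own and the result is only a rough estimate, not a measurement of what Glue actually does.

```python
# Back-of-the-envelope estimate; every input is an assumption taken from the post.
GET_PRICE_PER_1000_USD = 0.0004   # assumed S3 Standard price per 1 000 GET requests
observed_cost_usd = 20.0          # approximate GetObject cost I observed
files_read_low = 5_000            # lower bound of files expected after the predicate
files_read_high = 6_000           # upper bound of files expected after the predicate

# Number of GetObject requests implied by the observed cost.
estimated_get_calls = round(observed_cost_usd / GET_PRICE_PER_1000_USD * 1000)

# Implied number of requests per file, for both bounds of the file count.
calls_per_file_high = estimated_get_calls / files_read_low
calls_per_file_low = estimated_get_calls / files_read_high

print(f"Estimated GetObject calls: {estimated_get_calls:,}")
print(f"Implied calls per file: {calls_per_file_low:,.0f} to {calls_per_file_high:,.0f}")
```

The implied number of calls per file is what made me think of the 10 000 part limit, but this is only a guess on my side.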

Thanks a lot for your help!
