Reading AWS Glue table from S3 - GetObject API costs


Dear AWS re:Post,

For my ETL jobs, I read most of my data from RDS, but some of it I read directly from a Glue table whose data sits on S3. I only discovered today that each job run generates a not-insignificant cost from GetObject API calls, and I'm trying to reconstruct how those calls are made.

I have approximately 60 000 files on S3 for this table, but I'm using a push down predicate so that my ETL reads only about 6 000 of them.

I estimate that the GetObject cost of my ETL corresponds to around 50 000 000 GetObject calls (storage class is S3 Standard), i.e. $20 / ($0.0004 per 1 000 requests) = 50 000 × 1 000 requests.
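To sanity-check that arithmetic, here is a minimal sketch that backs out the number of requests from the billed amount. It assumes the $0.0004 per 1 000 GET requests price used above (S3 Standard; the actual price depends on the region), and `requests_from_cost` is just a hypothetical helper name. `Decimal` is used so the division is exact rather than subject to floating-point rounding.

```python
from decimal import Decimal

# Assumed S3 Standard GET price in USD per 1,000 requests (the figure used above).
# Prices vary by region, so check the S3 pricing page for your own region.
GET_PRICE_PER_1000 = Decimal("0.0004")


def requests_from_cost(cost_usd: str, price_per_1000: Decimal = GET_PRICE_PER_1000) -> int:
    """Back out the number of GetObject requests implied by a billed amount."""
    cost = Decimal(cost_usd)
    # cost / price gives thousands of requests; multiplying by 1,000 gives requests.
    return int(cost / price_per_1000 * 1000)


if __name__ == "__main__":
    n = requests_from_cost("20")
    print(f"{n:,} GetObject requests")
```

Running it for the $20 observed per job run gives 50 000 000 requests, matching the estimate.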

As I'm only expecting 5 000-6 000 files to be read, I'm assuming that create_dynamic_frame.from_catalog reads each file on S3 using the partNumber option to split the files into pieces. Since the maximum number of parts is 10 000, 5 000 files × 10 000 parts = 50 000 000 calls, which more or less fits my estimate. However, I couldn't find more details in the documentation about how the S3 calls work in create_dynamic_frame.from_catalog.
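The consistency check behind this hypothesis can be written out as a small sketch. It only compares the implied calls per file with the 10 000-part ceiling; it does not confirm how Glue actually issues its S3 requests, and the helper names are hypothetical.

```python
# Hypothesis from the question (unconfirmed): each file is fetched as up to
# 10,000 parts, the S3 multipart part limit, with one GetObject call per part.
MAX_PARTS_PER_OBJECT = 10_000


def implied_calls_per_file(total_calls: int, files_read: int) -> float:
    """Average number of GetObject calls per file implied by the billed total."""
    return total_calls / files_read


def max_calls_under_hypothesis(files_read: int, max_parts: int = MAX_PARTS_PER_OBJECT) -> int:
    """Upper bound on calls if every file were read as max_parts separate part requests."""
    return files_read * max_parts


if __name__ == "__main__":
    total = 50_000_000  # estimated GetObject calls per job run
    for files in (5_000, 6_000):
        per_file = implied_calls_per_file(total, files)
        bound = max_calls_under_hypothesis(files)
        print(f"{files:,} files: ~{per_file:,.0f} calls/file (hypothesis bound {bound:,})")
```

With 5 000 files the implied average is exactly 10 000 calls per file, and with 6 000 files it is about 8 333, so the billed total sits at or just under the ceiling the hypothesis allows.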

Thanks a lot for your help!
