Writing from Athena PySpark to S3 is a lot slower today than yesterday. How is that possible?


I have a 10 GB dataset loaded in a PySpark DataFrame.

df.coalesce(1).write.mode('overwrite').parquet("s3://xxxxxxxxxx-eu-west-1-athena-results-bucket-h1snx89wnc/output-data-parquet2") 

Yesterday, the Parquet file was created in 6 to 7 minutes. Today, the write does not even finish: I get disconnected from the AWS console before it completes, so it is taking at least 45 minutes.

Is that possible, or did I do something wrong? (The source file hasn't changed.)

lalvaro
asked a year ago, 221 views
1 Answer

Hello @lalvaro,

It looks like you are referring to an issue you faced during a specific Athena Spark session. It is difficult to make a recommendation without additional information. We will need the Calculation ID to investigate the Parquet write issue further. If your account has a Premium Support subscription, please submit a support case with the Session ID and Calculation ID, and an engineer will work with you on this issue.

On the other hand, if the calculation has already been submitted, it keeps running even after your AWS console session disconnects. You will be able to find the result of the calculation in the calculation history.
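If you prefer to check on a submitted calculation from outside the console, one option is to poll its status through the Athena API. The following is only a minimal sketch, assuming you use boto3 and already know the Calculation ID; the helper name `wait_for_calculation`, the polling interval, and the poll limit are illustrative choices, not part of any AWS tooling.

```python
import time

# Terminal states reported by Athena for a Spark calculation.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELED"}


def wait_for_calculation(athena_client, calculation_id, poll_seconds=30, max_polls=120):
    """Poll a calculation until it reaches a terminal state or max_polls is hit.

    athena_client is expected to be a boto3 Athena client (or any object that
    exposes get_calculation_execution_status with the same shape).
    Returns the last state that was observed.
    """
    state = None
    for _ in range(max_polls):
        response = athena_client.get_calculation_execution_status(
            CalculationExecutionId=calculation_id
        )
        state = response["Status"]["State"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
    return state


# Hypothetical usage (region and ID are placeholders, not values from this thread):
#   import boto3
#   athena = boto3.client("athena", region_name="eu-west-1")
#   print(wait_for_calculation(athena, "<your-calculation-id>"))
```

Because the calculation runs on the Athena side, this script can be started, stopped, or re-run from any machine with valid credentials without affecting the write itself.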

As for the console/role timeout, please work with your AWS account administrator to have your IAM role timeout extended.

AWS
Support Engineer
answered a year ago
