Spark UI logs are incomplete

0

I have enabled the Spark UI in AWS Glue as described in the documentation. Using Docker, I can browse the different Spark executions.

However, I see 2 unexpected behaviours:

  • All jobs appear in "incomplete applications".
  • When a job has finished, not all of its Spark tasks appear as finished; the information shown seems to be a snapshot from a few seconds before completion.

How can I fix this?

asked 7 months ago · 297 views
2 Answers
1

The job doesn't wait for the logs to be published, so for short jobs of a few minutes (not sure why) it's common for the final log not to be published.
What you can do is add a bit of sleep at the end of the job (e.g. 30 seconds), so it at least has the chance to update the "inprogress" log with the job completion, as in the sketch below.
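
A minimal sketch of what that looks like at the end of a Glue job; the job skeleton and the 30-second value are just illustrative, adjust them to your own script:

```python
import sys
import time

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# ... your ETL logic ...

job.commit()

# Give the asynchronous uploader a chance to replace the ".inprogress"
# event log on S3 with the final one before the driver exits.
# The 30-second value is the suggestion from this answer; tune as needed.
time.sleep(30)
```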

AWS
EXPERT
answered 7 months ago
  • It is not what I expected, but I will try that. Thanks!

  • I tried the solution, but it only works partially. It is true that it sometimes works.

    However, the real problem is that Glue generates two files: one while the job is running ("spark-application-1695890646202.inprogress") and another when it has finished ("spark-application-1695890646202"). Then, when you launch the Spark UI, it detects both files and only shows the incomplete one. To fix this, Glue would have to delete the "inprogress" file before writing the final one.

  • Yes, that's normal. In a local environment Spark would delete the "inprogress" file after creating the final one, but in this case the logs are uploaded to S3 asynchronously. On S3 that's normally not an issue because the final file is listed before the in-progress one, but in some cases you might have to restart the history server or even delete the "inprogress" file manually, as sketched below.
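
For example, a one-off manual deletion with boto3; the bucket and prefix here are hypothetical, so point them at the S3 location you configured for your Spark event logs (the key reuses the file name from the comment above):

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket/prefix; use the S3 path your Glue job writes
# its Spark event logs to.
s3.delete_object(
    Bucket="my-spark-ui-bucket",
    Key="spark-logs/spark-application-1695890646202.inprogress",
)
```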

0

Sometimes you might get only the complete Spark UI logs instead of the incomplete ones. Right now there are many conditions under which history files get left around with the ".inprogress" extension. The cleaner doesn't remove these because it can't distinguish between a running application and leftover abandoned files. This is a known limitation of Spark.

As a workaround, you can write a script to remove the unnecessary files yourself.
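
A minimal sketch of such a cleanup script with boto3, assuming a hypothetical bucket and prefix; it only deletes an ".inprogress" file when the matching final log already exists, so applications that are still running are left alone:

```python
import boto3

BUCKET = "my-spark-ui-bucket"  # hypothetical; your event-log bucket
PREFIX = "spark-logs/"         # hypothetical; your event-log prefix

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# Collect all keys under the prefix.
keys = set()
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        keys.add(obj["Key"])

# Delete a leftover ".inprogress" log only if its final counterpart exists.
for key in keys:
    if key.endswith(".inprogress"):
        final_key = key[: -len(".inprogress")]
        if final_key in keys:
            s3.delete_object(Bucket=BUCKET, Key=key)
            print(f"Deleted leftover {key}")
```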

answered 7 months ago
