EMR 7.0.0 on EC2: shell script steps do not start/stay pending


Hi,

After EMR 7.0.0 was released last week, we wanted to start using it.

Problem

We have shell script EMR steps that are executed when the cluster starts. These steps are never started after the cluster finishes bootstrapping; they stay "Pending" even though the cluster state is "Running". The same happens if we start the cluster without any steps and add them only after it has bootstrapped. An example can be seen here:

[Screenshot: Step is pending and not started]

Running the same script in the same way works with EMR 6.15.0. The only thing that changed is the EMR version. PySpark EMR steps also still work.

Is there a known bug or something that needs to be changed on our side? What can we do to run the shell scripts as previously done?

If any information is missing, please let us know. Thank you in advance!

EMR setup

Amazon EMR version: emr-7.0.0

Installed applications:

  • Hadoop 3.3.6
  • JupyterEnterpriseGateway 2.6.0
  • Livy 0.7.1
  • Spark 3.5.0

Instances: 1 primary instance (m5.2xlarge) with four 32 GB EBS volumes

EGeist
asked 4 months ago, viewed 339 times
2 Answers

Hello,

I did not find any issue executing a shell script as a step on EMR 7.0.0. I tried both running the step as part of cluster provisioning and adding it afterwards through the Add Step API, via the console and the CLI. Both methods worked as expected.

I presume that in your case something specific to the shell script or the configuration is blocking the step. I recommend logging in to the primary node and running the script manually to check whether it works. You should find the step logs under /mnt/var/log/hadoop/steps.

You can also try adding the step through the CLI, or alternatively through the console as follows:

  1. Choose Add step and set the type to Custom JAR.
  2. Provide the step name and set the JAR location to s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar, replacing us-east-1 with your cluster's region.
  3. In the Arguments field, enter the S3 location of your shell script, for example s3://<Your bucket>/scripts/test.sh
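The console steps above can also be expressed programmatically. The following is only a sketch in Python using boto3's `add_job_flow_steps` call; the helper names (`script_runner_jar`, `build_shell_step`, `submit_step`), the step name, and the `CONTINUE` failure action are assumptions chosen for illustration, not something taken from your setup.

```python
def script_runner_jar(region: str) -> str:
    """Return the regional script-runner.jar location used in step 2."""
    return f"s3://{region}.elasticmapreduce/libs/script-runner/script-runner.jar"


def build_shell_step(name: str, region: str, script_s3_uri: str) -> dict:
    """Build a custom JAR step that runs a shell script via script-runner."""
    return {
        "Name": name,
        # Assumption: keep the cluster running if the script fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": script_runner_jar(region),
            # The script location is passed as the only argument (step 3).
            "Args": [script_s3_uri],
        },
    }


def submit_step(cluster_id: str, step: dict, region: str) -> str:
    """Add the step to a running cluster and return the new step ID."""
    import boto3  # imported lazily so the builder above works without boto3

    emr = boto3.client("emr", region_name=region)
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]
```

For example, `build_shell_step("test-script", "eu-central-1", "s3://<Your bucket>/scripts/test.sh")` produces the step definition, which `submit_step` then sends to the cluster whose ID you pass in.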

If these methods do not help to identify the issue, please feel free to contact AWS Premium Support for further assistance.

AWS
Support Engineer
answered 4 months ago
  • Hello,

    I added the step via the console exactly as you described, with the corresponding region in the JAR location, and the step doesn't start at all, even though nothing else is running on the cluster. As mentioned before, the cluster goes into state "Running", but the step stays "Pending", is never started, and therefore doesn't write any logs. The script is probably not even downloaded. I added a screenshot to the initial post.

    I will try reaching out to AWS Support. If there are any other ideas, please let me know!

  • I tested in eu-central-1 as well, but unfortunately could not reproduce your issue. I suspect there could also be a network-level issue. It would be worth contacting AWS Support to troubleshoot this further.

Accepted Answer

The problem was the script that we ran.

In the script that we ran, openssl-devel was first removed and then a more up-to-date version of it was installed via yum (needed on EMR 6.15.0 and below to compile newer Python versions).

This removal of openssl-devel led to a failure of "hadoop-state-pusher", which is apparently responsible for communicating the state of an EMR step back to AWS. Because it failed, the cluster kept looking as if the EMR step had never run, although it had probably already finished internally.
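Since the reported state can lag behind what actually happened on the node, it may help to flag steps that stay "Pending" for suspiciously long. Below is a rough sketch that works on the `Steps` list returned by boto3's `list_steps`; the function name `find_stuck_steps` and the 15 minute threshold are arbitrary assumptions, and a step that is merely queued behind another running step would be flagged too.

```python
from datetime import datetime, timedelta, timezone


def find_stuck_steps(steps, now, max_pending=timedelta(minutes=15)):
    """Return IDs of steps that have been PENDING longer than max_pending.

    `steps` is expected to look like the "Steps" list in a list_steps
    response: each entry has "Id" and "Status" with "State" and
    "Timeline"/"CreationDateTime" (a timezone-aware datetime).
    """
    stuck = []
    for step in steps:
        status = step["Status"]
        if status["State"] != "PENDING":
            continue
        created = status["Timeline"]["CreationDateTime"]
        if now - created > max_pending:
            stuck.append(step["Id"])
    return stuck


def list_stuck_steps(cluster_id: str, region: str):
    """Fetch pending steps for a cluster and apply the check above."""
    import boto3  # imported lazily so find_stuck_steps works without boto3

    emr = boto3.client("emr", region_name=region)
    response = emr.list_steps(ClusterId=cluster_id, StepStates=["PENDING"])
    return find_stuck_steps(response["Steps"], datetime.now(timezone.utc))
```

A non-empty result on an otherwise idle cluster would be a hint to look at the node itself rather than at the step definition.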

As the openssl-devel version is newer from EMR 7.0.0 onwards anyway, this replacement is probably no longer needed. We were able to run our script by NOT removing openssl-devel.
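If the same script has to serve both old and new clusters, one option is to decide from the release label whether the openssl-devel replacement should run at all. This is just a sketch of that idea; the function names are made up for illustration, and how the release label reaches the script (for example as a step argument) is left as an assumption.

```python
def parse_release_label(label: str) -> tuple:
    """Turn a release label such as "emr-6.15.0" into (6, 15, 0)."""
    prefix = "emr-"
    if not label.startswith(prefix):
        raise ValueError(f"unexpected release label: {label!r}")
    return tuple(int(part) for part in label[len(prefix):].split("."))


def should_replace_openssl_devel(label: str) -> bool:
    """Only replace openssl-devel on releases older than EMR 7.0.0.

    On 7.0.0 and later, removing the package broke step state
    reporting for us, so the replacement is skipped there.
    """
    return parse_release_label(label) < (7, 0, 0)
```

With this check, the removal and reinstall would only be executed when `should_replace_openssl_devel` returns `True` for the cluster's release label.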

EGeist
answered 4 months ago
AWS
Support Engineer
reviewed 1 month ago
