I'm trying to cancel an Amazon EMR step. When I run the cancel-steps command, I get the following error: "Cannot cancel the step. It is already RUNNING."
Short description
This error affects Amazon EMR versions 5.27.x and earlier. In these release versions, the cancel-steps command cancels pending steps only. To cancel a running step, kill either the YARN application (for YARN steps) or the process (for non-YARN steps) that runs the step. You identify these by application ID and process ID, respectively.
In Amazon EMR versions 5.28.0 and later, you can use cancel-steps to cancel both pending and running steps. For more information, see Work with steps using the AWS CLI and console.
Note: If you receive errors when running AWS CLI commands, make sure that you’re using the most recent version of the AWS CLI.
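The version cutoff described above can be checked from the cluster's release label before you choose a method. The following is a minimal sketch, not an official tool: the function name can_cancel_running_steps is hypothetical, and it assumes release labels of the form emr-x.y.z.

```shell
# Hypothetical helper (not part of the AWS CLI): succeeds (exit status 0)
# when the given release label is 5.28.0 or later, which is when
# cancel-steps can also cancel running steps.
# Assumes labels of the form "emr-x.y.z".
can_cancel_running_steps() {
  label="${1#emr-}"       # strip the "emr-" prefix, for example "5.27.0"
  major="${label%%.*}"    # text before the first dot
  rest="${label#*.}"      # text after the first dot
  minor="${rest%%.*}"     # second version component
  [ "$major" -gt 5 ] && return 0
  [ "$major" -eq 5 ] && [ "$minor" -ge 28 ]
}

# Usage sketch (the cluster and step IDs are placeholders):
#   label=$(aws emr describe-cluster --cluster-id j-XXXXXXXXXXXXX \
#             --query 'Cluster.ReleaseLabel' --output text)
#   if can_cancel_running_steps "$label"; then
#     aws emr cancel-steps --cluster-id j-XXXXXXXXXXXXX --step-ids s-XXXXXXXXXXXXX
#   else
#     echo "Use one of the manual methods in the Resolution section."
#   fi
```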
Resolution
Use one of the following methods to cancel running steps in Amazon EMR versions 5.27.x and earlier.
Cancel YARN applications
1. Connect to the primary node using SSH.
2. To find the step's application ID, run the following command to list all running applications.
yarn application -list
3. Run the following command to stop the application. Replace application_id with your application ID, such as "application_1505786029486_002".
Note: This command stops all pending steps in the queue.
yarn application -kill application_id
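When many applications are running, it can help to filter the list output before you kill anything. The following sketch assumes that you can recognize the step's application by some text in its row, such as the application name. The function name find_yarn_app_ids and the name my-spark-job are hypothetical.

```shell
# Hypothetical helper: reads `yarn application -list` output on stdin and
# prints the application ID (first column) of every row that contains the
# given text. Header lines are skipped because their first field does not
# start with "application_". The match is case sensitive.
find_yarn_app_ids() {
  awk -v pat="$1" '$1 ~ /^application_/ && index($0, pat) > 0 { print $1 }'
}

# Usage sketch on the primary node ("my-spark-job" is a placeholder):
#   yarn application -list 2>/dev/null | find_yarn_app_ids "my-spark-job"
# Review the printed IDs, then stop the one that belongs to your step:
#   yarn application -kill application_id
```

Printing the IDs first, instead of piping them straight into the kill command, leaves room to confirm that the match is the application you intend to stop.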
Cancel non-YARN applications
1. Connect to the primary node using SSH.
2. Run the following command to get the process ID (PID). Replace step_id with your step identifier, such as s-Y9XXXXXXAPMD.
ps -ef | grep -i step_id
In the following example output for step s-2RNURIK9Z2JUH, the process ID (the second column) is 2366:
hadoop 2366 4664 0 16:20 ? 00:00:01 /etc/alternatives/jre/bin/java -Xmx1000m -server -XX:OnOutOfMemoryError=kill -9 %p -Dhadoop.log.dir=/mnt/var/log/hadoop/steps/s-2RNURIK9Z2JUH -Dhadoop.log.file=syslog -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA -Djava.library.path=:/usr/lib/hadoop-lzo/lib/native:/usr/lib/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.io.tmpdir=/mnt/var/lib/hadoop/steps/s-2RNURIK9Z2JUH/tmp -Dhadoop.security.logger=INFO,NullAppender -Dsun.net.inetaddr.ttl=30 org.apache.hadoop.util.RunJar /var/lib/aws/emr/step-runner/hadoop-jars/command-runner.jar bash -c envsubst < /home/hadoop/truffle_suffle.json.template
3. Run the following command to kill the process. Replace 2366 with the process identifier for your step.
Note: This command stops all pending steps in the queue.
kill -9 2366
The status of the step changes from Running to Failed.
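The grep command in step 2 can also match the grep process itself, so reading the PID by eye takes some care. As a rough sketch, the lookup can be wrapped in a small filter that prints only the PID column. The function name find_step_pids is hypothetical, and the filter assumes the standard ps -ef column layout, where the PID is the second field and the command starts at the eighth.

```shell
# Hypothetical helper: reads `ps -ef` output on stdin and prints the PID
# (second column) of every process whose command line contains the step ID.
# Rows for grep or awk are skipped, because those processes carry the step
# ID only as a search argument. Unlike `grep -i`, the match is case
# sensitive, so pass the step ID exactly as EMR shows it.
find_step_pids() {
  awk -v id="$1" 'index($0, id) > 0 && $8 != "grep" && $8 != "awk" { print $2 }'
}

# Usage sketch on the primary node (the step ID is the one from the
# example output above):
#   ps -ef | find_step_pids s-2RNURIK9Z2JUH
# Review the printed PID, then stop the process as in step 3:
#   kill -9 2366
```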
Related information
Canceling steps