EMR Cluster does not terminate when launched from SageMaker

I launched an EMR cluster from SageMaker using a CloudFormation template stored in Service Catalog. In the template, KeepJobFlowAliveWhenNoSteps was not specified in JobFlowInstancesConfig for the EMR cluster.

According to the documentation here, KeepJobFlowAliveWhenNoSteps should default to false and not keep the cluster running without steps.

Why, then, did the cluster stay alive for a number of days without any steps? There was an IdleTimeout of 1 hour (3600 seconds) set in SageMaker. This seems to be a serious defect - the behavior does not occur when running clusters with the EMR interface directly.

1 Answer

Hi jsullivan43,

Let's dig into your issue:


Clarifying the Issue

Your EMR cluster stays alive when launched from SageMaker using a CloudFormation template. Although the KeepJobFlowAliveWhenNoSteps parameter defaults to false (terminating the cluster when no steps are running), the cluster persists beyond the IdleTimeout of 1 hour (3600 seconds) set in SageMaker.

This behavior may stem from how SageMaker’s Service Catalog integration processes EMR configurations. While KeepJobFlowAliveWhenNoSteps defaults to false, inconsistencies can arise when settings aren't explicitly specified.


Key Areas to Investigate

  1. IdleTimeout Setting in SageMaker
    Check that the IdleTimeout configured in SageMaker (3600 seconds in your case) actually reaches the EMR cluster. If the cluster definition carries no idle timeout of its own, the cluster may not behave the same way as one launched from CloudFormation directly.

  2. CloudFormation Template Review
    Revisit your template stored in Service Catalog and verify whether KeepJobFlowAliveWhenNoSteps is explicitly set to false. In an AWS::EMR::Cluster resource, this setting belongs under the Instances property (whose type is JobFlowInstancesConfig):

    "Instances": {
        "KeepJobFlowAliveWhenNoSteps": false
    }

    Even though the documented default is false, setting the value explicitly removes any doubt about which value is applied when the template is launched.

  3. Shimomura Template Reference
    The following sample template, shared by Tomonori Shimomura, sets both termination properties explicitly. If your template is based on it, check that yours still includes them:

    {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "EMRCluster": {
                "Type": "AWS::EMR::Cluster",
                "Properties": {
                    "Name": "EMRClusterFromServiceCatalog",
                    "ReleaseLabel": "emr-6.3.0",
                    "Applications": [{"Name": "Hadoop"}],
                    "Instances": {
                        "MasterInstanceGroup": {
                            "InstanceType": "m5.xlarge",
                            "InstanceCount": 1
                        },
                        "CoreInstanceGroup": {
                            "InstanceType": "m5.xlarge",
                            "InstanceCount": 2
                        },
                        "TerminationProtected": false,
                        "KeepJobFlowAliveWhenNoSteps": false
                    },
                    "JobFlowRole": "EMR_EC2_DefaultRole",
                    "ServiceRole": "EMR_DefaultRole",
                    "VisibleToAllUsers": true,
                    "AutoTerminationPolicy": {
                        "IdleTimeout": 3600
                    }
                }
            }
        }
    }

    This template explicitly sets KeepJobFlowAliveWhenNoSteps to false and includes an IdleTimeout of 3600 seconds (1 hour).

  4. SageMaker-Specific Overrides
    Service Catalog or SageMaker may override certain EMR default settings. As a workaround:

    • Explicitly set KeepJobFlowAliveWhenNoSteps in the template.
    • Test the same configuration outside SageMaker to confirm the behavior.

  5. Step Execution Validation
    Confirm that no lingering steps are being queued or pending, which may inadvertently keep the cluster alive.
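
To make the template review in items 2 and 3 repeatable, here is a minimal Python sketch that reports which termination settings each AWS::EMR::Cluster resource in a JSON template sets explicitly. It uses only the standard library. The function name check_emr_template and the SAMPLE template are made up for illustration, and it assumes the template is JSON (not YAML) with literal values rather than intrinsic functions.

```python
import json


def check_emr_template(template: dict) -> dict:
    """Report, per AWS::EMR::Cluster resource, which termination settings are explicit.

    A value of None means the key is absent from the template, so whatever
    default the launch path applies will be used.
    """
    report = {}
    for name, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::EMR::Cluster":
            continue  # skip non-EMR resources
        props = resource.get("Properties", {})
        instances = props.get("Instances", {})
        policy = props.get("AutoTerminationPolicy", {})
        report[name] = {
            "KeepJobFlowAliveWhenNoSteps": instances.get("KeepJobFlowAliveWhenNoSteps"),
            "IdleTimeout": policy.get("IdleTimeout"),
        }
    return report


# Hypothetical template that omits both settings, as in the question.
SAMPLE = json.loads("""
{
    "Resources": {
        "EMRCluster": {
            "Type": "AWS::EMR::Cluster",
            "Properties": {
                "Name": "EMRClusterFromServiceCatalog",
                "Instances": {"TerminationProtected": false}
            }
        }
    }
}
""")

print(check_emr_template(SAMPLE))
```

Any entry that comes back as None is a setting the template leaves to defaults. Those are the ones worth setting explicitly before you launch again.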


Next Steps

By explicitly setting KeepJobFlowAliveWhenNoSteps and testing the behavior both within and outside SageMaker, you can determine whether SageMaker's Service Catalog introduces overrides or inconsistencies.
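
When you compare a cluster launched from SageMaker with one launched directly, it helps to look at what the running cluster actually reports. Below is a rough sketch, assuming a boto3 EMR client. The function name summarize_cluster and the cluster ID are placeholders. As I understand it, AutoTerminate in the describe_cluster response is the inverse of KeepJobFlowAliveWhenNoSteps. For brevity, the sketch does not page through long step lists.

```python
def summarize_cluster(emr, cluster_id):
    """Collect the termination-related facts for one EMR cluster.

    `emr` is expected to be a boto3 EMR client, e.g. boto3.client("emr").
    """
    cluster = emr.describe_cluster(ClusterId=cluster_id)["Cluster"]
    # The policy is missing from the response when none is attached.
    policy = emr.get_auto_termination_policy(ClusterId=cluster_id).get(
        "AutoTerminationPolicy"
    )
    # Steps that are still queued or running.
    steps = emr.list_steps(
        ClusterId=cluster_id, StepStates=["PENDING", "RUNNING"]
    )["Steps"]
    return {
        "state": cluster["Status"]["State"],
        "auto_terminate": cluster.get("AutoTerminate"),
        "idle_timeout": policy.get("IdleTimeout") if policy else None,
        "active_steps": [step["Name"] for step in steps],
    }


# Example usage (requires boto3 and AWS credentials; the ID is a placeholder):
#   import boto3
#   print(summarize_cluster(boto3.client("emr"), "j-XXXXXXXXXXXXX"))
```

If the SageMaker-launched cluster shows auto_terminate as False or idle_timeout as None while the directly launched one does not, that points to the launch path rather than to your template.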

Credit goes to Tomonori Shimomura for sharing the relevant template reference! Let me know if you need further clarification or additional steps. 😊

Cheers! Aaron 🚀

answered a year ago
