Trouble configuring AWS DataPipeline to use Spot Instances instead of On-Demand Instances

Hi Team,

I have set up an AWS DataPipeline to run my EMR jobs on On-Demand Instances, and I now want to switch to Spot Instances to reduce costs. I configured the spotBidPrice parameter in my pipeline settings, expecting the cluster to run on Spot Instances, but the pipeline still appears to be using On-Demand Instances.

Could you please help me understand how I can properly configure my DataPipeline to utilize Spot Instances? Here are my current pipeline settings:

coreInstanceCount: 40
coreInstanceType: r5.4xlarge
keyPair: #{myKeyPair}
masterInstanceType: r5.4xlarge
maximumRetries: 2
region: #{myRegion}
releaseLabel: emr-5.31.0
resourceRole: #{myResourceRole}
role: #{myRole}
subnetId: #{mySubnetId}
taskInstanceType: r5.4xlarge
terminateAfter: 240 Minutes
spotBidPrice: 10.00
useOnDemandOnLastAttempt: true

I appreciate any guidance or suggestions you can provide to help me successfully configure my AWS DataPipeline to use Spot Instances. Thank you!

asked a year ago, 311 views
2 Answers

Hi there!

Can you try using taskInstanceBidPrice instead of spotBidPrice?

I hope this helps.

AWS
EXPERT
answered a year ago
  • I set it as follows (using coreInstanceBidPrice and taskInstanceBidPrice), but it is still not working (the cluster still runs On-Demand):

      "coreInstanceCount": "40",
      "coreInstanceType": "r5.4xlarge",
      "coreInstanceBidPrice": "10.00",
      "keyPair": "#{myKeyPair}",
      "masterInstanceType": "r5.4xlarge",
      "maximumRetries": "2",
      "region": "#{myRegion}",
      "releaseLabel": "emr-5.31.0",
      "resourceRole": "#{myResourceRole}",
      "role": "#{myRole}",
      "subnetId": "#{mySubnetId}",
      "taskInstanceType": "r5.4xlarge",
      "taskInstanceBidPrice": "10.00",
      "terminateAfter": "240 Minutes",
      "useOnDemandOnLastAttempt": "true"
    },
    
Hi,

From your current pipeline settings, I can see that "useOnDemandOnLastAttempt" is set to "true", which is also its default value. To prevent On-Demand Instances from being used for the EMR cluster when Spot Instances are not available, set this parameter to "false". In addition, the maximum number of attempts for the EMR cluster resource defaults to "1", and you can raise "maximumRetries" for the EMR cluster above "1". You currently have "maximumRetries: 2"; you can increase it so that the pipeline has more attempts in which to obtain Spot Instances.
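
As a rough sketch of that change (not an official AWS example), the Python snippet below takes the EmrCluster fields posted in the follow-up comment and switches off the On-Demand fallback. The helper name apply_spot_only_settings and the retry value of 3 are assumptions made for illustration, and the #{...} parameter fields are left out for brevity.

```python
import json

# Fields copied from the EmrCluster settings posted in this thread.
# The #{...} parameter fields (keyPair, region, roles, subnetId) are omitted.
posted_fields = {
    "coreInstanceCount": "40",
    "coreInstanceType": "r5.4xlarge",
    "coreInstanceBidPrice": "10.00",
    "masterInstanceType": "r5.4xlarge",
    "maximumRetries": "2",
    "releaseLabel": "emr-5.31.0",
    "taskInstanceType": "r5.4xlarge",
    "taskInstanceBidPrice": "10.00",
    "terminateAfter": "240 Minutes",
    "useOnDemandOnLastAttempt": "true",
}


def apply_spot_only_settings(fields, max_retries):
    """Return a copy of the fields with the On-Demand fallback disabled.

    Hypothetical helper, written only for this illustration.
    """
    updated = dict(fields)  # copy so the posted settings stay untouched
    updated["useOnDemandOnLastAttempt"] = "false"
    updated["maximumRetries"] = str(max_retries)
    return updated


# max_retries=3 is an illustrative value, not a figure recommended in the thread.
spot_only_fields = apply_spot_only_settings(posted_fields, max_retries=3)
print(json.dumps(spot_only_fields, indent=2))
```

The printed fields can then be copied back into the EmrCluster object of the pipeline definition.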

Scenarios in which Spot Instances can fail to launch:

  1. The Spot bid price is lower than the minimum price required to fulfill the Spot request.
  2. A limit error such as "EXCEEDED_SPOT_INSTANCE_COUNT_LIMIT (USER_ERROR)".
AWS
SUPPORT ENGINEER
answered 10 months ago
