Questions tagged with AWS Batch


Can't create an AWS Batch JobDefinition JobRoleArn in CloudFormation using a !Ref

I'm trying to create a Batch setup in CloudFormation. In Resources I have an IAM role:

```yaml
SecretsAndS3AccessRole:
  Type: 'AWS::IAM::Role'
  Properties:
    AssumeRolePolicyDocument:
      Version: 2012-10-17
      Statement:
        - Effect: Allow
          Principal:
            Service: batch.amazonaws.com
          Action: 'sts:AssumeRole'
        - Effect: Allow
          Principal:
            Service: ec2.amazonaws.com
          Action: 'sts:AssumeRole'
        - Effect: Allow
          Principal:
            Service: ecs-tasks.amazonaws.com
          Action: 'sts:AssumeRole'
    ManagedPolicyArns:
      - 'arn:aws:iam::aws:policy/SecretsManagerReadWrite'
      - 'arn:aws:iam::aws:policy/AmazonS3FullAccess'
      - 'arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy'
```

Then in my JobDefinition I have:

```yaml
JobDefinition:
  Type: 'AWS::Batch::JobDefinition'
  Properties:
    Type: container
    ContainerProperties:
      Image: uri/to/my/image
      Vcpus: 2
      Memory: 2000
      Command:
        - /simple-test
      Privileged: true
      JobRoleArn: !Ref SecretsAndS3AccessRole
      ExecutionRoleArn: !Ref SecretsAndS3AccessRole
      Secrets:
        - Name: MY_SECRET
          ValueFrom: arn:aws:secretsmanager:us-east-1:123456789:secret:MYSECRET-abcdef
    RetryStrategy:
      Attempts: 1
```

When I try to build the stack, I get:

> An error occurred (ClientException) when calling the RegisterJobDefinition operation: Error executing request, Exception : executionRoleArn bothrefs-SecretsAndS3AccessRole-1INAOWFBH2SK2 is not an iam role arn

If I remove the `ExecutionRoleArn` line and the Secrets, the stack builds fine, which is to say that `JobRoleArn` is happy with a value of `!Ref SecretsAndS3AccessRole`. (But I need the secrets, and to use secrets you need an execution role.) And if I hardcode the ARN there, it works fine.

What is different about `ExecutionRoleArn` that it doesn't allow a `!Ref`? According to [the documentation for JobDefinition/ContainerProperties][1], `JobRoleArn` and `ExecutionRoleArn` seem to be the same sort of object.

If I instead use:

```yaml
ExecutionRoleArn: !GetAtt SecretsAndS3AccessRole.Arn
```

then it works fine! I tested removing `JobRoleArn` entirely - that makes my job fail. I tested changing it to `JobRoleArn: !GetAtt SecretsAndS3AccessRole.Arn` - that succeeds.

So the mystery is: `JobRoleArn` accepts its value in either `!Ref` or `!GetAtt` form, but `ExecutionRoleArn` requires the `!GetAtt` form. Why the difference?

[1]: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-batch-jobdefinition-containerproperties.html
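One likely explanation: `!Ref` on an `AWS::IAM::Role` returns the role *name*, while `!GetAtt SecretsAndS3AccessRole.Arn` returns the full ARN, and `ExecutionRoleArn` appears to be validated strictly as an ARN when the job definition is registered. A minimal sketch of the form that works for both properties, reusing the role defined above:

```yaml
# Sketch: pass the full ARN to both properties.
# Fn::GetAtt Role.Arn returns the ARN, whereas Ref on an AWS::IAM::Role returns only the role name.
JobDefinition:
  Type: 'AWS::Batch::JobDefinition'
  Properties:
    Type: container
    ContainerProperties:
      Image: uri/to/my/image
      JobRoleArn: !GetAtt SecretsAndS3AccessRole.Arn
      ExecutionRoleArn: !GetAtt SecretsAndS3AccessRole.Arn
```

Hardcoding the ARN works for the same reason: the property simply needs a full role ARN.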
1 answer | 0 votes | 16 views | asked 7 days ago

AWS Batch job stuck in RUNNABLE state

I'm trying to execute a Batch job using a custom AMI. The compute environment for the job needs certain software pre-installed, which is why I need the custom AMI. But whenever I reference the AMI in the compute environment, the job gets stuck in the RUNNABLE state forever. I have tried all the troubleshooting measures given in https://aws.amazon.com/premiumsupport/knowledge-center/batch-job-stuck-runnable-status/ but that did not help. I even tried different AMIs. The jobs get stuck whenever any custom AMI is used: as soon as the job is submitted, the instance is launched, but the job still fails to execute. Without the custom AMI, I have been able to execute the job successfully. I want to know the possible causes of this. Following are the details of my compute environment:

```json
{
  "computeEnvironmentName": "computeenvironmentname",
  "type": "MANAGED",
  "state": "ENABLED",
  "unmanagedvCpus": 0,
  "computeResources": {
    "type": "EC2",
    "allocationStrategy": "BEST_FIT_PROGRESSIVE",
    "minvCpus": 0,
    "maxvCpus": 256,
    "desiredvCpus": 0,
    "instanceTypes": ["p3.2xlarge"],
    "imageId": "imageid",
    "subnets": ["list of subnets"],
    "securityGroupIds": ["security groups"],
    "ec2KeyPair": "keypair",
    "instanceRole": "instancerole",
    "tags": { "KeyName": "" },
    "placementGroup": "",
    "bidPercentage": 0,
    "spotIamFleetRole": "",
    "launchTemplate": {
      "launchTemplateId": "",
      "launchTemplateName": "",
      "version": ""
    },
    "ec2Configuration": [
      {
        "imageType": "ECS_AL2_NVIDIA",
        "imageIdOverride": "imageid"
      }
    ]
  },
  "serviceRole": "servicerole",
  "tags": { "KeyName": "" }
}
```
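For reference, a hedged CloudFormation sketch of the same compute resources (the resource name `GpuComputeEnvironment` and all IDs are placeholders). A common cause of exactly this symptom is that, with `ImageType: ECS_AL2_NVIDIA` plus an `ImageIdOverride`, the override AMI must still ship the ECS agent, Docker and the NVIDIA runtime, and its subnets must give it a route to the ECS endpoints; otherwise the instance launches but never registers with the underlying ECS cluster, and the job stays RUNNABLE.

```yaml
# Sketch only: CloudFormation form of the compute resources above; all IDs are placeholders.
GpuComputeEnvironment:
  Type: 'AWS::Batch::ComputeEnvironment'
  Properties:
    Type: MANAGED
    State: ENABLED
    ComputeResources:
      Type: EC2
      AllocationStrategy: BEST_FIT_PROGRESSIVE
      MinvCpus: 0
      MaxvCpus: 256
      InstanceTypes:
        - p3.2xlarge
      Subnets:
        - subnet-0123456789abcdef0        # placeholder
      SecurityGroupIds:
        - sg-0123456789abcdef0            # placeholder
      InstanceRole: ecsInstanceRole       # placeholder instance profile
      Ec2Configuration:
        - ImageType: ECS_AL2_NVIDIA
          ImageIdOverride: ami-0123456789abcdef0  # custom AMI must still run the ECS agent
```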
0 answers | 0 votes | 21 views | asked a month ago

AWS Batch - jobs in RUNNABLE for several hours

I'm trying to understand what could cause AWS Batch jobs (using EC2 Spot instances) to be seemingly stuck in the RUNNABLE state for several hours at times, before finally getting picked up. This annoying behaviour seems to come and go over time. Some days, the very same jobs, configured in the very same way, using the same queues and compute environments, are picked up and processed almost immediately; other days they sit in RUNNABLE status for a long time (recently I experienced 3 to 6 hours).

The usual troubleshooting documents don't help, as they seem to only cover cases where the job *never* gets picked up (due to a configuration issue, or a mismatch of vCPU/memory between the job and the compute environment). What I observe when I hit these issues is that there doesn't seem to be any Spot request shown in the EC2 dashboard. The Spot instance pricing at that time, for the type of instance I need (32 vCPU, 64 GB memory), is not spiking (and I've set the limit to 100% of on-demand anyway). So one theory is that there is no Spot capacity available at all at that time, but 1) that seems unlikely (I use eu-west-1) and 2) I can't find any way to validate that theory. My account limits on M and R instance types (the ones typically used when the jobs are running) are very high (1000+), so as far as I can tell that's not the reason either.

Anyone with any theory or suggestion? For now, my solution is to change the queue to add a compute environment with on-demand instances, but that more than doubles the price...
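To make that workaround concrete, here is a minimal sketch of a job queue that tries a Spot compute environment first and falls back to an on-demand one; Batch tries the compute environments in the order listed. The names `SpotComputeEnvironment` and `OnDemandComputeEnvironment` are placeholders for already-defined compute environments.

```yaml
# Sketch of the on-demand fallback described above; compute environment names are placeholders.
JobQueue:
  Type: 'AWS::Batch::JobQueue'
  Properties:
    Priority: 1
    State: ENABLED
    ComputeEnvironmentOrder:
      - Order: 1
        ComputeEnvironment: !Ref SpotComputeEnvironment      # tried first
      - Order: 2
        ComputeEnvironment: !Ref OnDemandComputeEnvironment  # fallback when Spot capacity is short
```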
0 answers | 0 votes | 19 views | asked a month ago