Impossible to delete a batch compute environment

0

Hi, I managed to somehow create a Batch compute environment with a role like this: arn:aws:iam::294878777873:role/aws-service-role/batch.amazonaws.com/MyCustomConversionTaskRole

I'm now completely unable to delete this compute environment: CLIENT_ERROR - User: batch.amazonaws.com is not authorized to perform: sts:AssumeRole on resource

Attempting to delete shows a green banner at the top saying the environment will be deleted, but it just reverts to INVALID status. Following some online resources, I tried to recreate this role, but since it has the ARN of a Batch-created role, I'm unable to recreate it and give it the permissions required to delete the compute environment.

I need help. I want to delete this environment rather than have it linger in my account forever. Any help is appreciated; I'm really stuck.
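To see why the environment keeps reverting, it can help to look at the status fields that the Batch API reports for it. The following is a minimal Python sketch, not taken from the original post: the function name summarize_invalid_environments is hypothetical, and it assumes its input has the shape of a DescribeComputeEnvironments response (for example, from boto3's describe_compute_environments). It only reads the dictionary and makes no AWS calls itself.

```python
def summarize_invalid_environments(response: dict) -> list:
    """List INVALID compute environments with their service role and status reason.

    `response` is assumed to look like the output of the Batch
    DescribeComputeEnvironments operation.
    """
    summaries = []
    for ce in response.get("computeEnvironments", []):
        # Only environments that Batch has marked INVALID are of interest here.
        if ce.get("status") != "INVALID":
            continue
        summaries.append(
            "{name}: role={role} reason={reason}".format(
                name=ce.get("computeEnvironmentName"),
                role=ce.get("serviceRole"),
                reason=ce.get("statusReason"),
            )
        )
    return summaries
```

With boto3 installed and credentials configured, the input could come from boto3.client("batch").describe_compute_environments(); the statusReason field is where an error such as the sts:AssumeRole failure above would be expected to appear.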

asked 16 days ago, 59 views
4 Answers
1

Hello.

Can you create the standard service-linked role with the following AWS CLI command?
If the AWS Batch service role has already been created, the command will fail to execute.

aws iam create-service-linked-role --aws-service-name batch.amazonaws.com

Once the role "AWSServiceRoleForBatch" has been created, try changing the service role from the AWS Batch compute environment screen.

Or, you can deploy the following CloudFormation template to recreate "MyCustomConversionTaskRole".

AWSTemplateFormatVersion: 2010-09-09
Resources:
  AWSBatchServiceRole:
    Type: 'AWS::IAM::Role'
    Properties:
      RoleName: MyCustomConversionTaskRole
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - 'batch.amazonaws.com'
            Action:
              - 'sts:AssumeRole'
      Path: /service-role/batch.amazonaws.com/
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSBatchServiceRole

Deploying the above CloudFormation template will create "arn:aws:iam::294878777873:role/service-role/batch.amazonaws.com/MyCustomConversionTaskRole" (note that its path is "service-role", not the "aws-service-role" path in the question).

I would recommend opening a case with AWS Support under "Account and billing" and asking about this.
"Account and billing" inquiries are free of charge.
You can contact AWS Support at the following URL:
https://console.aws.amazon.com/support

EXPERT
answered 16 days ago
  • When I attempt the suggested method (using an equivalent Python boto3 script to update the CE to use the existing AWSServiceRoleForBatch service role via the API), I get the following error, so this does not work:

    "2025-05-03 22:16:52,489 - ERROR - Error updating compute environment 'edgebips-fargate-compute-env': An error occurred (ClientException) when calling the UpdateComputeEnvironment operation: A CE using a user-provided role cannot be updated to use Batch Service Linked Role."

    I asked Gemini to explain the error, and it did a wonderful job: https://g.co/gemini/share/e85161381f6c

    """
    Why this error occurs:

    AWS has designed the system such that you cannot switch the fundamental type of IAM role used by a Compute Environment after it has been created. Specifically, you cannot transition from a user-managed IAM role to the AWS-managed Service Linked Role via an update operation.

    The choice between using a user-provided role or the AWS Batch SLR is a foundational decision made during the creation of the Compute Environment. Modifying this setting later is not supported. This restriction likely exists to prevent potential permission conflicts or inconsistencies that could arise from changing the underlying trust and permission model of an active Compute Environment.
    """

    I believe this is something that can only be dealt with by someone with superuser/force privileges.

    I'm going to try one more thing: I'll try creating a service role like the AWSServiceRoleFor

  • That didn't work either: attempting to create the role programmatically is not allowed:

     ./create-like-service-role.py 
    2025-05-03 22:28:40,399 - INFO - Found credentials in shared credentials file: ~/.aws/credentials
    2025-05-03 22:28:40,441 - INFO - Checking if IAM role 'ORATSConversionTaskRole' exists...
    2025-05-03 22:28:40,573 - INFO - IAM role 'ORATSConversionTaskRole' not found. Creating...
    2025-05-03 22:28:40,601 - ERROR - Error creating or attaching policy to role 'ORATSConversionTaskRole': An error occurred (InvalidInput) when calling the CreateRole operation: Path prefix '/aws-service-role/' can only be used for AWS Service linked Roles
    
  • Have you tried deploying the CloudFormation template below? I have confirmed that I can create a service role in my AWS account by deploying the following CloudFormation template.

    AWSTemplateFormatVersion: 2010-09-09
    Resources:
      AWSBatchServiceRole:
        Type: 'AWS::IAM::Role'
        Properties:
          RoleName: MyCustomConversionTaskRole
          AssumeRolePolicyDocument:
            Statement:
              - Effect: Allow
                Principal:
                  Service:
                    - 'batch.amazonaws.com'
                Action:
                  - 'sts:AssumeRole'
          Path: /service-role/batch.amazonaws.com/
          ManagedPolicyArns:
            - arn:aws:iam::aws:policy/service-role/AWSBatchServiceRole
    
  • Forgive me if I'm wrong, but my sense given what I'm seeing around the APIs is that the AWS APIs are equivalent to each other, no? Calling Python should have the same effect (it's 99% likely the same code handling the endpoint). I can try running the update through CloudFormation to test this out but I have a strong prior it'll produce the same result.

    I think the problem is not that, but that the ARN it's set to uses "aws-service-role", not "service-role", in the path. Here's the full ARN for the role that's stuck on my compute env:

    arn:aws:iam::294878777873:role/aws-service-role/batch.amazonaws.com/ORATSConversionTaskRole

    I'm unable to create a role with the matching name because of that "aws-service-role" path. There seems to be a check in the API endpoint that prevents this. See error message above.

    Here's the Python code I used FWIW: https://gist.github.com/blais/dc4a4ece03c1e2ae6592397b3a2196cf

  • I confirm: attempting to create the role with a CloudFormation stack produces the same result as my Python-based attempt. The stack ends up in CREATE_FAILED with this error message:

    "Resource handler returned message: "Path prefix '/aws-service-role/' can only be used for AWS Service linked Roles (Service: Iam, Status Code: 400, Request ID: 76d9db1d-41bd-4a68-815c-f9ec24ecee6e) (SDK Attempt Count: 1)" (RequestToken: 55f4bb4e-fe13-54cf-8d88-642a9b97afbc, HandlerErrorCode: InvalidRequest)"

    Thanks for your help, but I think this one's beyond userspace. I don't know how I managed to screw up this CE but the only way to delete it will require someone with superuser privs.
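The path difference discussed in the comments above ("aws-service-role" versus "service-role") can be checked mechanically from the ARN string. This is a minimal Python sketch under stated assumptions: the function name classify_role_arn and its return labels are hypothetical, and it only inspects the string without calling AWS.

```python
def classify_role_arn(arn: str) -> str:
    """Classify an IAM role ARN by its path prefix (string inspection only)."""
    # IAM role ARN layout: arn:partition:iam::account-id:role/optional/path/RoleName
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[2] != "iam" or not parts[5].startswith("role/"):
        raise ValueError("not an IAM role ARN: %r" % arn)
    resource = parts[5][len("role/"):]
    if resource.startswith("aws-service-role/"):
        # Reserved prefix: IAM rejects CreateRole calls that use it,
        # as the InvalidInput error quoted above shows.
        return "service-linked"
    if resource.startswith("service-role/"):
        # Conventional prefix for service roles that users create themselves.
        return "service-role"
    return "other"


stuck = "arn:aws:iam::294878777873:role/aws-service-role/batch.amazonaws.com/ORATSConversionTaskRole"
print(classify_role_arn(stuck))
```

The role produced by the CloudFormation template in this thread would fall into the "service-role" category, which is why it cannot stand in for the role the compute environment refers to.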

0
  • I already tried updating the service role from the console; it just won't let me. It lets me select the arn:aws:iam::294878777873:role/aws-service-role/batch.amazonaws.com/AWSServiceRoleForBatch role but when I save it does not get updated.

  • Attempting to do this through the console also doesn't work: I get this error: An error occurred (ClientException) when calling the UpdateComputeEnvironment operation: A CE using a user-provided role cannot be updated to use Batch Service Linked Role.

Please help. I'm unable to resolve this on my own. I actually think this is a bug in the system: nothing I've tried has resolved it, including the steps above.
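For anyone reproducing the update attempt described above outside the console, here is a minimal Python sketch. It is not the script linked earlier in the thread: the function name try_set_service_role is hypothetical, and batch_client is assumed to behave like boto3.client("batch"), whose update_compute_environment call accepts computeEnvironment and serviceRole. The exception handling is deliberately generic so the sketch does not depend on botocore being importable.

```python
def try_set_service_role(batch_client, compute_environment: str, role_arn: str):
    """Try to point a compute environment at a different service role.

    Returns None if the call is accepted, otherwise the error text.
    """
    try:
        batch_client.update_compute_environment(
            computeEnvironment=compute_environment,
            serviceRole=role_arn,
        )
    except Exception as exc:  # with boto3 this would be botocore's ClientError
        return str(exc)
    return None
```

In the situation reported here, the returned text would be expected to be the ClientException quoted above, since Batch refuses to switch a user-provided role to the service-linked role.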

answered 16 days ago
0

FWIW, the environment is edgebips-fargate-compute-env

answered 16 days ago
-1

I understand your frustration with being unable to delete the AWS Batch compute environment. This is a known issue that can occur when there are problems with the associated IAM roles. Here are some steps you can try to resolve this:

  1. First, it's important to note that the role you mentioned (MyCustomConversionTaskRole) has an ARN under the /aws-service-role/ path, which is reserved for service-linked roles. Those roles are managed by AWS and cannot be manually created under that path. This explains why you're unable to recreate it.

  2. The error message you're seeing suggests that the Batch service doesn't have the necessary permissions to assume the role. This could be because the role has been deleted or its trust relationship has been modified.

  3. To resolve this, you should try updating the compute environment with a correct service role. You can do this through the AWS Batch console:

    • Go to the AWS Batch console
    • Select the invalid compute environment
    • Choose "Update" from the Actions menu
    • In the update form, specify a valid AWS Batch service role
  4. If you don't have a valid AWS Batch service role, you can create one with the necessary permissions. The role should trust the batch.amazonaws.com service and have the AWSBatchServiceRole managed policy attached.

  5. If updating the compute environment doesn't work, you may need to use the AWS Batch UpdateComputeEnvironment API to update the service role. When doing this, make sure to use the AWSServiceRoleForBatch service-linked role, as this is required for updating parameters that need an infrastructure update.

  6. If none of these steps work, you may need to contact AWS Support for further assistance. They have additional tools and permissions to help resolve issues with stuck resources.

Remember, it's crucial to use the supported AWS Batch APIs or console to manage compute environments and avoid manually modifying Batch-managed resources to prevent such issues in the future.
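As a rough illustration of step 4, the trust relationship for a Batch service role can be written out as a policy document. This Python sketch mirrors the AssumeRolePolicyDocument in the CloudFormation template earlier in the thread; the function name batch_trust_policy is hypothetical. The resulting JSON string is the kind of value that IAM's CreateRole operation takes as AssumeRolePolicyDocument. Note that, per the comments above, a role created this way cannot be placed under the /aws-service-role/ path, so it may not help with the stuck environment.

```python
import json


def batch_trust_policy() -> str:
    """Return a trust policy that lets AWS Batch assume the role."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "batch.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(batch_trust_policy())
```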
Sources
INVALID compute environment - AWS Batch
UpdateComputeEnvironment - AWS Batch
Cannot delete Computer Enviroments on Batch | AWS re:Post

answered 16 days ago
