
Every stack update tries to optimize gp3 volume


We have a stack which went through the following series of events:

  1. Created EBS volumes of type st1 and attached them to EC2 instances.
  2. Later decided to convert these to gp3 when it was announced. Changed the VolumeType in the stack and applied an update.
  3. The volumes started optimizing.
  4. ~7 hours after initiating the update, the update failed and the stack became stuck in UPDATE_ROLLBACK_FAILED. CloudFormation tried to roll back the change, but could not. The status message for each volume indicated:
    "Volume vol-... cannot be modified in modification state OPTIMIZING (Service: AmazonEC2; Status Code: 400; Error Code: IncorrectModificationState; Request ID: ...; Proxy: null)".
  5. A couple of days later the volume optimization finished. The EBS console shows gp3, and on the EC2 instances, we quite clearly see gp3 performance and not the previous st1. In CloudFormation the volumes show UPDATE_FAILED.
  6. Some time later we had to deploy another stack update to set throughput and IOPS on the gp3 volumes. We could not do so because the stack was stuck in UPDATE_ROLLBACK_FAILED.
  7. We rolled back the stack and excluded the volumes. Stack was now in UPDATE_ROLLBACK_COMPLETE and volumes were now in UPDATE_COMPLETE, and we deployed the updated stack.
  8. The volumes started optimizing again. It took over a day but eventually optimization finished.
  9. Once again, ~7 hours after starting the stack update, the update failed and the stack went to UPDATE_ROLLBACK_FAILED. Same messages for the volumes.
  10. After the volume optimization finished, the new throughput and IOPS were shown in the console, and CloudWatch metrics confirmed that volume usage reflected the new values.
  11. Today we had another update to the volumes. In this case we were only changing tags. All of the volumes except 1 started optimizing. The new tags were set, but the stack update failed due to a different reason, CloudFormation tried to roll back, and once again the volumes are now UPDATE_FAILED and the stack is UPDATE_ROLLBACK_FAILED, with the exact same message for the volumes. The volumes are still optimizing ~3 hours later.

We think the original problem was that we hit some kind of internal timeout in CloudFormation. We have no idea why it tries to optimize the volumes every time - shouldn't a tag-only update skip optimization entirely?
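For anyone hitting the same thing, the modification state CloudFormation complains about can be inspected directly from the CLI; this is a sketch, and the volume ID below is a placeholder:

```shell
# Show whether a volume modification is still in the OPTIMIZING phase.
# Requires configured AWS CLI credentials; replace the volume ID.
aws ec2 describe-volumes-modifications \
  --volume-ids vol-0123456789abcdef0 \
  --query 'VolumesModifications[].{Id:VolumeId,State:ModificationState,Progress:Progress}'
```

While `ModificationState` is `modifying` or `optimizing`, further `ModifyVolume` calls fail with `IncorrectModificationState`, which matches the error in the stack events.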

Is there anything we can adjust in the template, or during the update, to force CloudFormation to fully wait for volume optimization, or to bypass the optimization attempt on every update? I've considered creating a wait condition and manually resolving it (using cURL or whatever) once we see that the optimization completes, just to get it out of the way. I've also considered creating a stack policy to prevent updates to the EBS volumes, but that doesn't guarantee we won't run into this exact same problem if we need to remove the policy to update the volumes in the future.
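For reference, the stack-policy idea would look roughly like this; the logical resource ID pattern `DataVolume*` is hypothetical and would need to match the actual volume resources in the template:

```json
{
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "Update:*",
      "Principal": "*",
      "Resource": "*"
    },
    {
      "Effect": "Deny",
      "Action": "Update:Modify",
      "Principal": "*",
      "Resource": "LogicalResourceId/DataVolume*"
    }
  ]
}
```

As noted above, this only defers the problem: the policy has to be relaxed again the next time the volumes themselves need changing.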

asked 2 years ago
1 Answer

I got an answer from support. The issue was that the temporary user credentials used when applying the update from the console expire after ~7 hours, whereas an IAM service role has a much longer session timeout. Applying the update with a service role allowed CloudFormation to wait for the optimization/conversion process to complete.
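Concretely, the fix is to pass a CloudFormation service role to the update call instead of relying on console credentials; the stack name and role ARN below are placeholders:

```shell
# Update the stack under a CloudFormation service role, whose session
# outlives the ~7 hour console credential limit.
aws cloudformation update-stack \
  --stack-name my-volumes-stack \
  --use-previous-template \
  --role-arn arn:aws:iam::123456789012:role/cfn-service-role
```

Once set, CloudFormation also reuses the role for subsequent operations on the stack unless a different one is supplied.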

As for the re-optimization of the volumes, the AWS::EC2::Volume documentation states that a cooldown period is enforced when changing Iops, Size, or VolumeType; my reading is that this refers to the optimization process. I saw the optimization triggered when changing Iops even after successfully updating the stack to the gp3 version, so I guess that's just the behavior of CloudFormation and/or EBS.
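For context, the properties involved sit on the volume resource itself; this is an illustrative sketch (resource names and values are made up, and the `!GetAtt` assumes an `Instance` resource elsewhere in the template):

```yaml
# Illustrative AWS::EC2::Volume resource. Changing Iops, Size, or
# VolumeType starts a volume modification, which includes the
# long-running "optimizing" phase described in the question.
DataVolume:
  Type: AWS::EC2::Volume
  Properties:
    AvailabilityZone: !GetAtt Instance.AvailabilityZone
    Size: 500
    VolumeType: gp3
    Iops: 6000
    Throughput: 500
```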

answered a year ago
