
Custom engine versions for RDS Custom stuck in status "Creating..."

2

I tried to create a custom engine version (CEV) for RDS Custom and the creation failed. I'm not sure why, but the CEV is now stuck in "Creating..." status. In this status it can't be deleted from the console or with the AWS CLI; I get "You can't delete custom engine version..... currently being created."

From what I can remember, the error message said something along the lines of "Something has gone wrong, AWS will look into it", but I have no idea whether that is true or why the creation failed.

In summary, how can I clean up the failed CEV creation, given that neither the console nor the CLI seems to be an option?

I have seen another answer here on re:Post about asking for help through account and billing support; someone had a similar issue, but with a CEV stuck on delete.


asked 2 months ago, 86 views
2 Answers
5

In my experience with RDS Custom for Oracle and for SQL Server, the primary reason a Custom Engine Version (CEV) gets stuck in the "Creating..." status is a failure in the orchestration layer - specifically, the build instance's inability to communicate with the required AWS services. Frustratingly, whenever the process got stuck during troubleshooting, I basically had to re-create the RDS Custom setup. For example, if there was a network issue, I first had to fix the root cause in the VPC setup and then rebuild RDS Custom - at least that was my experience.

1. "Silent Failure" because of Networking

When you trigger a CEV creation, RDS Custom spins up a temporary EC2 instance in your VPC to build the engine. If this instance cannot communicate with the outside world or AWS service endpoints, the process doesn't just fail - it hangs indefinitely.

  • Internet Access Requirement: Both Oracle and SQL Server engines often need to fetch OS patches, dependencies, or installation scripts from external repositories during the build phase.
  • S3 Connectivity: The instance must pull the database binaries (ISOs or ZIPs) from your S3 bucket. If there is no S3 VPC Gateway Endpoint or NAT Gateway configured, the download never starts.
  • SSM (Systems Manager) Connectivity: This is the most common "gotcha." AWS uses SSM to send commands to the build instance. If the instance cannot reach the SSM endpoints, it remains "unmanaged," and RDS never receives the signal that the installation has started or failed.
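To check the SSM point above, here is a minimal sketch. It assumes you can find the build instance's ID in your account, and that `ssm_client` is a boto3 SSM client (for example `boto3.client("ssm")`); the function name is just illustrative. It asks Systems Manager whether it knows the instance at all.

```python
def ssm_ping_status(ssm_client, instance_id):
    """Return the SSM PingStatus for an instance, or None if SSM has no record of it.

    ssm_client is assumed to be a boto3 SSM client; the helper name is hypothetical.
    """
    resp = ssm_client.describe_instance_information(
        Filters=[{"Key": "InstanceIds", "Values": [instance_id]}]
    )
    infos = resp.get("InstanceInformationList", [])
    # An empty list means the instance never registered with SSM ("unmanaged").
    return infos[0]["PingStatus"] if infos else None


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   print(ssm_ping_status(boto3.client("ssm"), "i-0123456789abcdef0"))
```

A result of None, or anything other than "Online", would be consistent with the connectivity problem described here, though it does not prove it.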


2. Required Infrastructure "Checklist"

Before retrying, ensure your VPC subnets have the following in place:

  • Outbound Connectivity: Security Groups must allow HTTPS (443) outbound for the build instance.
  • S3 VPC Endpoint: Required for the instance to pull the installation media from your bucket without leaving the AWS network.
  • Ensure the VPC Endpoint Policy allows the Principal (the build instance's IAM role) to perform the necessary actions (e.g., s3:GetObject for the S3 endpoint or ssm:* for the SSM endpoints).
  • SSM Interface Endpoints: If your subnet is private (no NAT), you must have Interface Endpoints for ssm, ssmmessages, and ec2messages.
  • Routing Tables: Ensure the route table for your subnet actually points to a NAT Gateway or Internet Gateway.
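Part of this checklist can be scripted. The sketch below assumes `ec2_client` is a boto3 EC2 client and that the four service suffixes are the ones you need (taken from the list above). The helper name is my own, and it only reads the first page of results.

```python
REQUIRED_SUFFIXES = ("s3", "ssm", "ssmmessages", "ec2messages")


def missing_endpoints(ec2_client, vpc_id, region):
    """Return the required service suffixes that have no available endpoint in the VPC.

    ec2_client is assumed to be a boto3 EC2 client. Only the first page of
    describe_vpc_endpoints results is inspected in this sketch.
    """
    resp = ec2_client.describe_vpc_endpoints(
        Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
    )
    present = {
        ep["ServiceName"]
        for ep in resp.get("VpcEndpoints", [])
        if ep.get("State", "").lower() == "available"
    }
    return [
        suffix
        for suffix in REQUIRED_SUFFIXES
        if f"com.amazonaws.{region}.{suffix}" not in present
    ]


# Usage (assumes boto3 is installed; the VPC ID and Region are placeholders):
#   import boto3
#   print(missing_endpoints(boto3.client("ec2"), "vpc-0abc1234", "eu-west-1"))
```

An empty list only says the endpoints exist; endpoint policies, security groups and route tables still need to be checked separately. If the subnet routes through a NAT Gateway instead, missing endpoints are not necessarily a problem.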

3. IAM & Instance Profile Permissions

The temporary EC2 instance needs an IAM Instance Profile.

  • If this role lacks read access to your media bucket (for example, the AmazonS3ReadOnlyAccess managed policy or a policy scoped to that bucket) or the AmazonSSMManagedInstanceCore policy, the automation will fail.
  • Result: The RDS control plane loses "line-of-sight" to the build process, leading to the "Creating..." hang.
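A quick way to check the managed-policy part is sketched below. It assumes `iam_client` is a boto3 IAM client and that you know the role name behind the instance profile; the helper name is made up for illustration. It will not detect equivalent permissions granted through inline or custom policies.

```python
def has_managed_policy(iam_client, role_name, policy_name="AmazonSSMManagedInstanceCore"):
    """Return True if a managed policy with this name is attached to the role.

    iam_client is assumed to be a boto3 IAM client. Inline policies and
    custom policies with different names are not considered in this sketch.
    """
    resp = iam_client.list_attached_role_policies(RoleName=role_name)
    return any(
        policy["PolicyName"] == policy_name
        for policy in resp.get("AttachedPolicies", [])
    )


# Usage (assumes boto3 is installed; the role name is a placeholder):
#   import boto3
#   print(has_managed_policy(boto3.client("iam"), "my-rds-custom-instance-role"))
```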

4. The CloudFormation Factor

RDS Custom often uses AWS CloudFormation under the hood to orchestrate these resources.

  • If the networking fails, the CloudFormation stack might get stuck in CREATE_IN_PROGRESS or ROLLBACK_FAILED.
  • As long as the underlying stack exists, the RDS Console will show "Creating..." and will block any attempts to delete the CEV via CLI or Console.
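If you want to look for such a stack yourself, a rough sketch follows. It assumes `cfn_client` is a boto3 CloudFormation client and that the stack, if one exists, is visible in your own account and Region, which I can't guarantee. The helper name is illustrative.

```python
STUCK_STATUSES = ["CREATE_IN_PROGRESS", "ROLLBACK_FAILED"]


def stuck_stacks(cfn_client):
    """Return (name, status) pairs for stacks in the two statuses mentioned above.

    cfn_client is assumed to be a boto3 CloudFormation client. Only the
    first page of list_stacks results is inspected in this sketch.
    """
    resp = cfn_client.list_stacks(StackStatusFilter=STUCK_STATUSES)
    return [
        (stack["StackName"], stack["StackStatus"])
        for stack in resp.get("StackSummaries", [])
    ]


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   for name, status in stuck_stacks(boto3.client("cloudformation")):
#       print(name, status)
```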

5. Disk Space & Throughput (The "Storage" Trap)

As I have often seen, the temporary EC2 instance used for the build has a default root volume size. If your Oracle binaries or SQL Server ISOs plus the unzipped installation files exceed this disk space, the installation will crash midway.

  • Why it hangs: Similar to networking, if the disk is 100% full, the SSM agent on the instance might crash. Once the agent is dead, it can't send the "Failed" signal back to RDS, leaving the status at "Creating...".
  • Fix: Ensure that the instance type and storage configuration defined in your CEV request have enough "headroom" (usually at least 2-3x the size of your installation media).
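As a back-of-the-envelope illustration of that headroom rule, the sketch below turns a media size into a minimum volume size. The 3x factor is simply the upper end of the rule of thumb above, not an AWS requirement, and the function name is hypothetical.

```python
import math


def min_volume_gib(media_bytes, factor=3.0):
    """Return a suggested minimum volume size in GiB, rounded up.

    factor is the rule-of-thumb multiplier from the text above (2-3x),
    not a documented AWS value.
    """
    return math.ceil(media_bytes * factor / 2**30)


# Example: 10 GiB of installation media with the 3x rule of thumb.
print(min_volume_gib(10 * 2**30))
```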

6. Storage & Throughput Constraints

  • Disk Space: The build instance requires significant local storage to download, unzip, and install the database engine. If the root volume or the temporary EBS volumes run out of space, the SSM agent may hang or crash.
  • Result: Since the agent can no longer heartbeat back to the RDS service, the CEV remains stuck in "Creating..." because AWS is waiting for a completion signal that will never arrive.
  • KMS Key Throttling: If you are using a heavily used KMS key for encryption, ensure you aren't hitting service quotas. A failure to attach an encrypted EBS volume due to KMS throttling will also halt the process.


EXPERT
answered 2 months ago
0

When a Custom Engine Version (CEV) creation fails and becomes stuck in a "Creating..." status, this is a known issue that can occur with RDS Custom. Based on the information available, here's what's happening and how to resolve it:

Why this happens: CEV creation can fail for various reasons, and when it does, RDS Custom issues an event (RDS-EVENT-0198) with details about the failure. However, the CEV can sometimes remain in an intermediate state that prevents normal deletion operations through the console or CLI.
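To look for that event, something like the following sketch may help. It assumes `rds_client` is a boto3 RDS client; the helper name is illustrative. It lists recent events for the custom-engine-version source type, without trying to filter by event ID.

```python
def recent_cev_events(rds_client, hours=24):
    """Return (source identifier, message) pairs for recent CEV events.

    rds_client is assumed to be a boto3 RDS client. Duration is given to
    the API in minutes; only the first page of results is read here.
    """
    resp = rds_client.describe_events(
        SourceType="custom-engine-version",
        Duration=hours * 60,
    )
    return [
        (event.get("SourceIdentifier"), event["Message"])
        for event in resp.get("Events", [])
    ]


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   for source, message in recent_cev_events(boto3.client("rds"), hours=48):
#       print(source, message)
```

RDS keeps events only for a limited time, so an old failure may no longer show up.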

Common causes of CEV creation failures include:

  • The AMI and S3 bucket (if using Bring Your Own Media) not being in the same AWS Region
  • Missing permissions for the IAM user or role
  • Installation media files missing from the S3 bucket or having incorrect SHA-256 checksums
  • Unsupported SQL Server editions or versions
  • Issues with RDS Custom creating necessary S3 buckets for storing CEV artifacts
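For the checksum cause in particular, you can compute the SHA-256 of a local copy of the media before uploading and compare it with the value you expect. A minimal sketch using only the Python standard library (the function name and file name are illustrative):

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large ISO or ZIP files do not fill memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (the file name is a placeholder):
#   print(sha256_of_file("installation-media.zip"))
```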

How to resolve: Unfortunately, when a CEV is stuck in the "Creating..." state and cannot be deleted through normal channels (console or CLI), this requires AWS Support intervention. The standard deletion commands won't work because the system recognizes the CEV as still being in an active creation process.

You should contact AWS Support through your account's support channels (Support Center in the AWS Console or through the account and billing section as you mentioned seeing in another re:Post answer). Provide them with:

  • The CEV name and version
  • The AWS Region
  • Any error messages you received
  • The approximate time when the creation was initiated
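Some of these details can be pulled from the API rather than from memory. The sketch below assumes `rds_client` is a boto3 RDS client and that you know the engine name (for example a `custom-oracle-*` or `custom-sqlserver-*` engine) and the CEV version string; the helper name is hypothetical.

```python
def cev_summary(rds_client, engine, engine_version):
    """Return basic details for matching engine versions, including non-available ones.

    rds_client is assumed to be a boto3 RDS client. IncludeAll=True asks the
    API to also list versions that are not in the available state.
    """
    resp = rds_client.describe_db_engine_versions(
        Engine=engine,
        EngineVersion=engine_version,
        IncludeAll=True,
    )
    fields = ("Engine", "EngineVersion", "Status", "CreateTime", "DBEngineVersionArn")
    return [
        {field: version.get(field) for field in fields}
        for version in resp.get("DBEngineVersions", [])
    ]


# Usage (assumes boto3 is installed; engine and version are placeholders):
#   import boto3
#   print(cev_summary(boto3.client("rds"), "custom-oracle-ee", "19.my_cev1"))
```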

AWS Support will be able to manually clean up the stuck CEV from the backend, allowing you to retry the creation process after addressing any underlying issues that caused the initial failure.
Sources
Troubleshooting DB issues for Amazon RDS Custom for SQL Server - Amazon Relational Database Service
Troubleshooting DB issues for Amazon RDS Custom for Oracle - Amazon Relational Database Service
Creating a CEV for RDS Custom for SQL Server - Amazon Relational Database Service

answered 2 months ago
EXPERT
reviewed 2 months ago
