DMS Serverless - Internal server error and no further logs!

1

Hi,

We are testing AWS DMS Serverless to see if it is viable to migrate some DMS tasks we already run in production to Serverless mode (mostly full loads ...).

We get the following generic, non-descriptive error messages in the AWS DMS Serverless console:

"Status: Failed" "Failure messages: Replication failed due to an internal failure"

When I enable logging at the maximum level, we get a "Failed to fetch metadata" error, but we don't know how to fix the problem.

2023-06-12T14:01:38.041+02:00 {'replication_state':'fetching_metadata', 'message': 'Fetching metadata from your source endpoint to calculate workload capacity.'}
2023-06-12T14:01:39.730+02:00 {'replication_state':'failed', 'message': 'Failed to fetch metadata.', 'failure_message': 'Internal failure.'}
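For anyone scanning many of these log lines, here is a minimal Python sketch that pulls out the first failed state. It assumes every line has the shape shown above (a timestamp, a space, then a dict written with single quotes); the function names are hypothetical and nothing here is part of DMS itself.

```python
import ast

def parse_dms_log_line(line):
    """Split one log line into (timestamp string, payload dict)."""
    timestamp_text, payload_text = line.strip().split(" ", 1)
    # The payload uses single quotes, so it is a Python literal, not strict JSON.
    payload = ast.literal_eval(payload_text)
    return timestamp_text, payload

def first_failure(lines):
    """Return the payload of the first line whose state is 'failed', or None."""
    for line in lines:
        _, payload = parse_dms_log_line(line)
        if payload.get("replication_state") == "failed":
            return payload
    return None

# The two lines from the post above, used as sample input.
LOG_LINES = [
    "2023-06-12T14:01:38.041+02:00 {'replication_state':'fetching_metadata', "
    "'message': 'Fetching metadata from your source endpoint to calculate workload capacity.'}",
    "2023-06-12T14:01:39.730+02:00 {'replication_state':'failed', "
    "'message': 'Failed to fetch metadata.', 'failure_message': 'Internal failure.'}",
]

if __name__ == "__main__":
    failure = first_failure(LOG_LINES)
    print(failure["message"], failure["failure_message"])
```

This only confirms at which state the replication failed; it cannot reveal the underlying cause, which the logs do not contain.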

We are fairly sure that the endpoints (source and target) work, because we use those same endpoints in production (in the same account), and both engines (S3 and Oracle) are supported by DMS Serverless. The same goes for the security group, the VPC, the subnets, and the subnet group: they are the SAME as those used by other DMS tasks that work perfectly fine in the same account.

Finally, we also double-checked whether we had VPC endpoints for DMS Serverless: we only have S3 VPC gateway endpoints, since that is the only service we are really using.

Does anyone have any insights into how we can debug this problem further?

thank you.

Luis.

6 Answers
2

Hi Didier_AWS, do you know how to stop it? I am facing the same problem.

answered 2 years ago
1

We are also testing DMS Serverless and getting the same message as the OP, at the stage described in the logs as ...

Fetching metadata from your source endpoint to calculate workload capacity.

{'replication_state':'failed', 'message': 'Failed to fetch metadata.', 'failure_message': 'Internal failure.'}

Has anyone following this thread discovered any new relevant info regarding this issue?

answered 2 years ago
  • Hey @rhbecker! As I recall, the problem was fixed when we granted full permissions to the Oracle user configured for the endpoint. That user did not have enough privileges to read some metadata from Oracle, which DMS needs to properly calculate the workload capacity.

  • Thanks for responding, @Luis!

    Am I understanding correctly that you're referring to privileges at the database server level, and not to AWS IAM privileges?

    We thought it could be something along those lines, and tried adding the SQL Server login to the SQL Server sysadmin group. That didn't seem to help, but perhaps there's more we can poke at, in terms of database privileges.

  • I also have this "Internal failure" error with a serverless instance, one year later: https://repost.aws/questions/QUu1lpkXONTs6C15kdSibT2w/dms-serverless-replication-instance-can-t-be-started-reloaded-or-resumed Last month it simply worked after a few more attempts. This time I wasn't so lucky. However, after 5 fetch failures in a new instance, and after I created 3 new instances, it did move on to provisioning capacity, so I can only assume it either fetched the metadata or used a default. I'm used to random Secrets CURL code 28 errors; maybe it's the same cause.

    Did you happen to find any other reasons?

0

We are currently seeing the exact same issue. There is not enough information in the logs to know what the problem is. It would be great if DMS could output the actual problem in the logs so that we can address it. We have an open case at the moment; if we find anything out, I will post back here.

answered a year ago
0

Hi, DMS Serverless is brand new, so some glitches may still be present. Did you try to open a Support case from within the console of your AWS account? AWS service teams usually do not monitor sites like re:Post, so they won't see your issue, for which you seem to have solid and tangible evidence.

EXPERT
answered 2 years ago
0

[OP] After some weeks of waiting for the Support case to be handled by "the internal team", Amazon eventually recognized that they have a bug reading the metadata of the DB (while calculating the capacity units to use). The explanation: "Your DMS Serverless replication config "..." failed while provisioning, due to an issue with how your database is interacting with DMS performance optimization processes. We are making a fix for this issue which we expect to be available for you to use by Tuesday 07/16."

After they deployed this fix, I had no more errors, ran multiple migrations, and the DCU calculation worked fine (it seems it needed a lot of them, 256, which may be related). As for the poor logging during failures, I don't know whether they improved it (judging from the most recent answer here, it seems not).

answered a year ago
0

I recently resolved several challenges while setting up AWS DMS Serverless replication, particularly when the source database and the DMS replication instance resided in different VPCs. Issues ranged from initial connectivity failures to less descriptive "internal error failed" messages. Based on my experience, here are key configurations and considerations that proved critical:

**Establish VPC Peering for Cross-VPC Communication:**

Context: AWS DMS Serverless replication instances do not possess public IP addresses. They operate within the subnets you specify in your VPC.

Action: If your source or target database is in a different VPC from your DMS Serverless replication, you must establish a VPC Peering connection between the two VPCs.

Crucial Step: After establishing the peering connection, update the route tables associated with the relevant subnets in both VPCs. Add routes that direct traffic destined for the peered VPC's CIDR block through the VPC peering connection. Without these routes, the VPCs won't know how to communicate with each other.
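The route-table step could be scripted roughly as follows with boto3. This is only a sketch: the route table IDs, CIDR blocks, and peering connection ID are made-up placeholders, the helper names are hypothetical, and it assumes both VPCs live in the same account and region (otherwise each side needs its own client and credentials).

```python
def peering_route_params(route_table_id, peer_cidr, peering_connection_id):
    """Build the keyword arguments for one EC2 create_route call."""
    return {
        "RouteTableId": route_table_id,
        "DestinationCidrBlock": peer_cidr,  # CIDR block of the *other* VPC
        "VpcPeeringConnectionId": peering_connection_id,
    }

def add_peering_routes(routes, region_name):
    """Create every route in `routes`; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without it

    ec2 = boto3.client("ec2", region_name=region_name)
    for params in routes:
        ec2.create_route(**params)

# Placeholder values: one route per direction, each pointing at the peer's CIDR.
ROUTES = [
    peering_route_params("rtb-0aaaaaaaaaaaaaaaa", "10.1.0.0/16", "pcx-0123456789abcdef0"),
    peering_route_params("rtb-0bbbbbbbbbbbbbbbb", "10.0.0.0/16", "pcx-0123456789abcdef0"),
]

if __name__ == "__main__":
    add_peering_routes(ROUTES, region_name="eu-west-1")  # region is a placeholder
```

Remember that the security groups on the database side must also allow traffic from the DMS subnets; routes alone are not sufficient.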

**Enable DNS Resolution for the VPC Peering Connection:**

Context: If you use DNS hostnames (e.g., RDS endpoint names) to connect to your databases, these hostnames must resolve to private IP addresses across the peered VPCs.

Action: In the VPC Peering connection settings, ensure you enable DNS resolution. Specifically, select both "Allow accepter VPC to resolve DNS of hosts in requester VPC to private IP addresses" and "Allow requester VPC to resolve DNS of hosts in accepter VPC to private IP addresses."

Impact: This allows resources in one VPC to resolve private DNS hostnames in the peered VPC, which is essential for DMS to find your database endpoints.
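The same two console checkboxes can be set through the EC2 API. A minimal boto3 sketch, assuming both VPCs are in the same account and region (in a cross-account setup, each side can typically only change its own half of the options); the peering connection ID and helper names are placeholders.

```python
def dns_resolution_options(peering_connection_id):
    """Build the arguments that enable DNS resolution in both directions."""
    return {
        "VpcPeeringConnectionId": peering_connection_id,
        "AccepterPeeringConnectionOptions": {"AllowDnsResolutionFromRemoteVpc": True},
        "RequesterPeeringConnectionOptions": {"AllowDnsResolutionFromRemoteVpc": True},
    }

def enable_peering_dns(peering_connection_id, region_name):
    """Apply the options; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without it

    ec2 = boto3.client("ec2", region_name=region_name)
    return ec2.modify_vpc_peering_connection_options(
        **dns_resolution_options(peering_connection_id)
    )

if __name__ == "__main__":
    # Placeholder ID and region.
    enable_peering_dns("pcx-0123456789abcdef0", region_name="eu-west-1")
```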

**Configure VPC Endpoints for AWS Secrets Manager (if used):**

Context: If your DMS replication is configured in private subnets (without a NAT Gateway for internet access) and you are using AWS Secrets Manager to store database credentials, DMS needs a private path to access Secrets Manager.

Action: Create a VPC Interface Endpoint for Secrets Manager (service name: com.amazonaws.<region>.secretsmanager) within the VPC where your DMS Serverless replication is running.

Benefit: This allows DMS to retrieve credentials securely over the AWS private network without requiring internet access, enhancing security and reliability.
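As an illustration, the interface endpoint could be created with boto3 along these lines. All IDs and the region are placeholders and the helper names are hypothetical. The sketch also assumes the VPC has DNS support and DNS hostnames enabled (needed for private DNS) and that the endpoint's security group allows inbound HTTPS (port 443) from the DMS subnets.

```python
def secretsmanager_endpoint_params(region_name, vpc_id, subnet_ids, security_group_ids):
    """Build the arguments for an interface endpoint to Secrets Manager."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region_name}.secretsmanager",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        # Lets the default Secrets Manager hostname resolve to the endpoint.
        "PrivateDnsEnabled": True,
    }

def create_secretsmanager_endpoint(region_name, vpc_id, subnet_ids, security_group_ids):
    """Create the endpoint; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without it

    ec2 = boto3.client("ec2", region_name=region_name)
    params = secretsmanager_endpoint_params(
        region_name, vpc_id, subnet_ids, security_group_ids
    )
    return ec2.create_vpc_endpoint(**params)

if __name__ == "__main__":
    create_secretsmanager_endpoint(
        "eu-west-1",                   # placeholder region
        "vpc-0123456789abcdef0",       # placeholder VPC of the DMS replication
        ["subnet-0123456789abcdef0"],  # placeholder private subnet
        ["sg-0123456789abcdef0"],      # placeholder security group
    )
```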

**Set Appropriate DMS Capacity Units (DCUs):**

Context: Insufficiently provisioned DMS Capacity Units (DCUs) for your replication task can lead to obscure errors, making troubleshooting difficult.

Symptom: An undersized configuration might result in errors like:

Replication failed due to an internal failure. Replication failed in state 'REPLICATION_STARTING'. Setting state to FAILED

Action: Carefully consider your data volume, transactional load, and the complexity of transformations when setting the Minimum and Maximum DCUs. Under-provisioning is a common source of unexpected task failures.
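For reference, the minimum and maximum DCUs are part of the `ComputeConfig` passed when creating a replication config. The boto3 sketch below is only an illustration: the ARNs, identifiers, DCU numbers, and table mappings are placeholders, the helper names are hypothetical, and the sanity check is deliberately simple (it does not verify that the values are ones DMS actually accepts).

```python
import json

def compute_config(min_dcu, max_dcu, subnet_group_id, security_group_ids):
    """Build a ComputeConfig dict with explicit minimum and maximum DCUs."""
    if min_dcu < 1 or max_dcu < min_dcu:
        raise ValueError("expected 1 <= min_dcu <= max_dcu")
    return {
        "MinCapacityUnits": min_dcu,
        "MaxCapacityUnits": max_dcu,
        "ReplicationSubnetGroupId": subnet_group_id,
        "VpcSecurityGroupIds": list(security_group_ids),
    }

def create_full_load_config(identifier, source_arn, target_arn,
                            table_mappings, config, region_name):
    """Create the serverless replication config; needs boto3 and credentials."""
    import boto3  # imported lazily so the builder above works without it

    dms = boto3.client("dms", region_name=region_name)
    return dms.create_replication_config(
        ReplicationConfigIdentifier=identifier,
        SourceEndpointArn=source_arn,
        TargetEndpointArn=target_arn,
        ComputeConfig=config,
        ReplicationType="full-load",
        TableMappings=json.dumps(table_mappings),  # the API expects a JSON string
    )

if __name__ == "__main__":
    # Placeholder sizing; choose values based on your own workload.
    cfg = compute_config(4, 64, "my-subnet-group", ["sg-0123456789abcdef0"])
    print(cfg)
```

Raising the maximum leaves DMS room to scale up, while a minimum that is too low for the workload is the case this answer warns about.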

answered 6 months ago
