Elastic Disaster Recovery Post Launch Actions stalled

0

Hello, I am testing out using AWS DRS Post-Launch actions. I am able to successfully launch a recovery instance during a drill, however the post launch actions are stuck at "Running, 0/2 done". I can connect to my instance through ssh and both health checks are passing.

I am initiating the drill from a user with Administrator access, and I have the following policies on the instance role:

  • AmazonSSMManagedInstanceCore
  • AWSElasticDisasterRecoveryConsoleFullAccess_v2
  • AWSElasticDisasterRecoveryLaunchActionsPolicy
  • AWSElasticDisasterRecoveryRecoveryInstancePolicy

Does anybody have any troubleshooting tips on how to get the Post Launch actions to work?

  • can you navigate to the AWS Elastic Disaster Recovery (EDR) console and review the Event history for your recovery plan. Look for any error messages or warnings related to the post-launch actions. This may provide additional insights into what went wrong.

  • Hi Adeleke, the event history shows a successful launch of the recovery instance - no errors or warnings at all.

  • The issue ended up being with the DNS server settings on the new recovery instance. The new instance couldn't resolve the vpc endpoints' private IP address since it still had the on premise DNS server IP in /etc/resolv.conf instead of the AmazonProvidedDNS address of VPC CIDR +2. I thought that the conversion server would do something like this, but it does not.

1 Answer
0

Check that the source instances being recovered are not in a recovering or launching state. This is a common error if trying to initiate a drill for instances currently in another operation.

Ensure the IAM role has permission to pass the required role to EC2 instances during recovery. Post launch actions require assuming a role.

Try initiating a recovery for a single instance to isolate any issues. Also check CloudWatch logs for more details on where the drill initiation may be failing.

profile picture
EXPERT
answered a month ago
  • Hi Giovanni, the replication is happening from an on prem server, not another EC2 instance. There are no issues at all with the drill being initiated, as I'm able to connect to the recovery instance and the health checks are 2/2. The only thing that is not working are the post launch actions. Do I need any other IAM Policies on the instance role of the recovery instance besides the ones I have listed above? I am initiating the drill from a user with Admin privileges. Thanks for your response

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions