Why is my Application Migration Service or Elastic Disaster Recovery replication process stuck at 100% with the "Finalizing Initial Sync" message?


I use AWS Application Migration Service (AWS MGN) or AWS Elastic Disaster Recovery (AWS DRS). The replication process is stuck at 100%, and the console shows the "Finalizing Initial Sync" message.

Short description

When the replication process is stuck at 100% during the initial sync for Application Migration Service or Elastic Disaster Recovery, you see one of the following errors:

  • "Finalizing Initial Sync - Flushing Backlog"
  • "Finalizing Initial Sync - Creating First Launchable Snapshot"

Resolution

Troubleshooting the "Finalizing Initial Sync - Flushing Backlog" error

Wait for the backlog to finish flushing so that the initial sync can complete.

If the source machine is write intensive, then the backlog can increase in size. The machine might remain stuck in the Finalizing Initial Sync state on the Application Migration Service or Elastic Disaster Recovery console. If this occurs, then complete the following steps:

  1. Test the replication speed (on the CloudEndure website).
  2. Calculate the required bandwidth for all replicating source machines. Make sure that the network throughput of the replication instance is sufficient.
  3. Under Replication Settings, verify if Network bandwidth throttling is activated. If your configuration requires activating this option, then make sure that you set the value to at least the minimum required bandwidth. For more information, see the bandwidth throttling documentation for Application Migration Service or Elastic Disaster Recovery.
  4. Use Amazon CloudWatch metrics to check the network and disk utilization of the replication server. If a resource throttles the server, then use a dedicated replication server or a larger replication server type. Or, choose SSD-based storage. For more information, see Disk settings (Application Migration Service) or Disk settings (Elastic Disaster Recovery).
  5. To verify which replication server a specific source machine uses, run the netstat command on the source machine as shown in the following example.
    Note the remote IP address that the machine connects to over port 1500:

netstat command for Linux:

$ netstat -anp | grep ":1500"

netstat command for Windows:

netstat -ano | findstr ":1500"

Or, review the agent.log.0 file on the source machine to identify the exact replication server in use:

agent.log.0 for Linux:

$ sudo grep ":1500" /var/lib/aws-replication-agent/agent.log.0 | tail -n 1

agent.log.0 for Windows:

findstr /L ":1500" "C:\Program Files (x86)\AWS Replication Agent\agent.log.0"
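As a rough sketch of steps 4 and 5 on a Linux source machine, the following helpers extract the remote addresses on port 1500 from netstat output and pull recent CloudWatch datapoints for a replication server. The function names, the instance ID in the usage note, and the choice of the NetworkIn and EBSWriteBytes metrics are assumptions for illustration. The sketch also assumes a configured AWS CLI, GNU date, and the Linux netstat column layout.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of any AWS tool.

# Read `netstat -an` style output on stdin and print the unique remote
# addresses that the machine connects to on port 1500.
# Assumes the Linux layout, where the foreign address is column 5.
replication_server_ips() {
  awk '$5 ~ /:1500$/ { sub(/:1500$/, "", $5); print $5 }' | sort -u
}

# Print 5-minute Average and Maximum datapoints for one EC2 metric over
# the last hour. Assumes GNU date and configured AWS CLI credentials.
# The body runs in a subshell so that the variables stay local.
replication_server_metric() (
  instance_id="$1"
  metric="$2"
  end_time="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  start_time="$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)"
  aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name "$metric" \
    --dimensions "Name=InstanceId,Value=${instance_id}" \
    --start-time "$start_time" \
    --end-time "$end_time" \
    --period 300 \
    --statistics Average Maximum \
    --query 'sort_by(Datapoints, &Timestamp)[].[Timestamp, Average, Maximum]' \
    --output text
)
```

For example, run `netstat -an | replication_server_ips` on the source machine, find the replication server with that address in the staging area, and then run `replication_server_metric i-0123456789abcdef0 NetworkIn` (the instance ID is a placeholder). Sustained values near the instance type's limits suggest that the server is the bottleneck.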

Troubleshooting the "Finalizing Initial Sync - Creating First Launchable Snapshot" error

To troubleshoot this error, complete one or more of the following steps:

Verify that the Application Migration Service or Elastic Disaster Recovery user's AWS IAM policy has all permissions to run the required Amazon EC2 APIs

For the Application Migration Service or Elastic Disaster Recovery user’s policy, see the required AWS credentials for Application Migration Service or Elastic Disaster Recovery. Or, you can view the AWS CloudTrail Event history to confirm any API failures for the configured user.

Confirm that the replication server communicates with Amazon EC2 endpoints within the Region

  1. Launch a new Linux machine in the same subnet as your staging area.
  2. To test connectivity, log in to the new machine and run the following commands. In the following example commands, replace us-east-1 with your Region:
$ dig ec2.us-east-1.amazonaws.com  
$ telnet ec2.us-east-1.amazonaws.com 443  
$ wget https://ec2.us-east-1.amazonaws.com

If any of these commands fail, then network connectivity issues exist. Proceed to the following section.
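The checks above can also be wrapped in a small script, sketched here under a few assumptions: the function names are hypothetical, the endpoint pattern `ec2.<region>.amazonaws.com` applies to standard commercial Regions only, and `getent` and `curl` are installed on the test machine.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative.

# Build the Regional EC2 endpoint hostname.
# Assumes a standard commercial Region such as us-east-1.
ec2_endpoint() {
  printf 'ec2.%s.amazonaws.com\n' "$1"
}

# Check DNS resolution and then an HTTPS connection on TCP port 443.
# curl exits with 0 for any HTTP response, which is enough to show
# that the endpoint is reachable.
check_ec2_endpoint() (
  host="$(ec2_endpoint "$1")"
  if ! getent hosts "$host" >/dev/null; then
    echo "DNS resolution failed for $host"
    return 1
  fi
  if ! curl --silent --output /dev/null --connect-timeout 5 "https://$host"; then
    echo "Cannot reach $host on TCP port 443"
    return 1
  fi
  echo "OK: $host is reachable on TCP port 443"
)
```

Run `check_ec2_endpoint us-east-1` from the machine in the staging area subnet, and replace us-east-1 with your Region.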

Identify any network connectivity blockers

Verify that the virtual private cloud (VPC), subnet, security group, network access control list (network ACL), and route table settings align with the Replication Settings. A misconfiguration might block communication to Amazon EC2 endpoints from the replication servers.

If the replication server launches in a public subnet, then complete the following steps:

  1. Verify that the security group, network ACLs, and the route table allow communication with Amazon EC2 endpoints on TCP port 443.
  2. Verify that the enableDnsHostnames and enableDnsSupport attributes are set to true at the VPC level:
$ aws ec2 describe-vpc-attribute --vpc-id vpc-a01106c2 --attribute enableDnsHostnames
{
    "VpcId": "vpc-a01106c2",
    "EnableDnsHostnames": {
        "Value": true
    }
}
$ aws ec2 describe-vpc-attribute --vpc-id vpc-a01106c2 --attribute enableDnsSupport
{
    "VpcId": "vpc-a01106c2",
    "EnableDnsSupport": {
        "Value": true
    }
}

If the replication server launches in a private subnet, then complete the following steps:

  1. Verify that the security group, network ACLs, and route table allow communication with Amazon EC2 endpoints on TCP port 443.
  2. If you configured a NAT gateway or instance in the route table, then verify that outbound traffic to the EC2 endpoint on TCP port 443 works.
  3. Check if outbound traffic passes through a transit or virtual private gateway. In this case, make sure that the route table allows traffic to EC2 endpoints on TCP port 443.
  4. Check if the firewall blocks communication.
  5. If the VPC has interface VPC endpoints, then make sure that communication with Amazon EC2 endpoints on TCP port 443 occurs through the private network. To do this, complete the following steps:

Verify that the enableDnsHostnames and enableDnsSupport attributes are set to true at the VPC level. Verify that the PrivateDnsEnabled value is set to true on the VPC interface endpoints:

$ aws ec2 describe-vpc-attribute --vpc-id vpc-a01106c2 --attribute enableDnsHostnames --query 'EnableDnsHostnames'  
{   
 "Value": true  
} 
$ aws ec2 describe-vpc-attribute --vpc-id vpc-a01106c2 --attribute enableDnsSupport --query 'EnableDnsSupport'  
{   
 "Value": true  
} 
$ aws ec2 describe-vpc-endpoints --vpc-endpoint-ids vpce-088d25a4bbf4a7abc --query 'VpcEndpoints[0].PrivateDnsEnabled'  
true
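To check both VPC DNS attributes in one pass, a loop such as the following sketch might help. The function name is hypothetical, and the sketch assumes configured AWS CLI credentials with permission to call DescribeVpcAttribute.

```shell
#!/bin/sh
# Hypothetical helper; the name is illustrative.

# Print the value of both DNS attributes for a VPC and return a nonzero
# status if either attribute is not set to true.
check_vpc_dns() (
  vpc_id="$1"
  status=0
  for attr in enableDnsHostnames enableDnsSupport; do
    # The response key is the attribute name with an uppercase first
    # letter, for example EnableDnsHostnames.
    key="EnableDns${attr#enableDns}"
    value="$(aws ec2 describe-vpc-attribute \
      --vpc-id "$vpc_id" \
      --attribute "$attr" \
      --query "${key}.Value" \
      --output json)"
    echo "${attr}=${value}"
    [ "$value" = "true" ] || status=1
  done
  return "$status"
)
```

For example, `check_vpc_dns vpc-a01106c2 || echo "Turn on both DNS attributes for this VPC"`.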

Check for recent changes in Replication Settings

To track changes to Replication Settings, search the CloudTrail Event history for the UpdateReplicationConfiguration API call. Then, filter by Resource name to narrow the results to the source server. For example, check whether an invalid tag was added in the Replication resources tags field. For a list of allowed characters, see Tag restrictions.
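The same search can be sketched with the AWS CLI, together with a rough tag check. Both function names are hypothetical. The tag check is a simplification that accepts only ASCII letters, digits, spaces, and the characters + - = . _ : / @ up to 256 characters, whereas the Tag restrictions documentation also allows other UTF-8 letters and numbers.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative.

# List recent UpdateReplicationConfiguration calls from the CloudTrail
# Event history. Assumes configured AWS CLI credentials.
recent_replication_config_changes() (
  max_items="${1:-20}"
  aws cloudtrail lookup-events \
    --lookup-attributes AttributeKey=EventName,AttributeValue=UpdateReplicationConfiguration \
    --max-items "$max_items" \
    --query 'Events[].[EventTime, Username, EventId]' \
    --output text
)

# Return 0 if a tag value uses only the ASCII subset of the characters
# that are allowed across AWS services and is at most 256 characters.
is_portable_tag_value() (
  value="$1"
  [ "${#value}" -le 256 ] || return 1
  # Delete every allowed character; the trailing "x" is a sentinel that
  # protects leftover newlines from command substitution.
  leftover="$(printf '%s' "$value" | LC_ALL=C tr -d 'A-Za-z0-9 +=._:/@-'; printf x)"
  [ "$leftover" = "x" ]
)
```

For example, `is_portable_tag_value 'team#1' || echo "tag value contains a character that might be rejected"`.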

Verify that you're using the correct proxy settings

  1. If your replication servers use a proxy server, then make sure that the proxy settings allow communication with Regional EC2 endpoints on TCP port 443.
  2. Make sure that the allowed list for SSL interception and authentication includes mgn.<region>.amazonaws.com for Application Migration Service and drs.<region>.amazonaws.com for Elastic Disaster Recovery. For more information, see Can a proxy server be used between the source server and the Application Migration Service console? Also, see Can a proxy server be used between the source server and the Elastic Disaster Recovery Console?
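As a minimal sketch of both proxy checks, the following helpers print the hostnames to add to the proxy's allowed list and attempt a request to the Regional EC2 endpoint through the proxy. The function names and the proxy URL in the usage note are hypothetical, and the endpoint pattern assumes a standard commercial Region.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative.

# Print the service hostnames to exclude from SSL interception and
# authentication on the proxy for a given Region.
proxy_bypass_hosts() {
  printf 'mgn.%s.amazonaws.com\n' "$1"
  printf 'drs.%s.amazonaws.com\n' "$1"
}

# Try an HTTPS request to the Regional EC2 endpoint through the proxy.
# curl exits with a nonzero status if the proxy blocks or drops the
# connection on TCP port 443.
check_ec2_via_proxy() (
  proxy_url="$1"
  region="$2"
  curl --silent --show-error --output /dev/null --connect-timeout 5 \
    --proxy "$proxy_url" "https://ec2.${region}.amazonaws.com"
)
```

For example, `check_ec2_via_proxy http://proxy.example.internal:3128 us-east-1`, where the proxy URL is a placeholder for your own proxy.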

Confirm that the Replication Agent works correctly

Confirm that the AWS Replication Agent works correctly on the source machine. You can check the Replication Agent logs for possible errors to help pinpoint any problems. The Replication Agent logs are located in the following file locations:

Linux Replication Agent logs:

/var/lib/aws-replication-agent/agent.log.0

Windows Replication Agent logs:

C:\Program Files (x86)\AWS Replication Agent\agent.log.0
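On Linux, a quick way to scan the log is to filter for likely error lines, as in the following sketch. The function name is hypothetical, and the keyword list (error, exception, fail) is an assumption about how problems appear in the log, not a documented log format. Reading the default path might require sudo.

```shell
#!/bin/sh
# Hypothetical helper; the name is illustrative.

# Print the most recent log lines that contain a likely error keyword.
# Arguments: log file path (defaults to the Linux agent log) and the
# number of lines to show (defaults to 20).
recent_agent_errors() (
  log_file="${1:-/var/lib/aws-replication-agent/agent.log.0}"
  count="${2:-20}"
  grep -iE 'error|exception|fail' "$log_file" | tail -n "$count"
)
```

For example, `recent_agent_errors /var/lib/aws-replication-agent/agent.log.0 50` shows up to 50 matching lines.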

Check for Amazon EC2 service quota issues

Service quota issues or API throttling and rate limit issues might prevent Application Migration Service or Elastic Disaster Recovery from creating the first launchable recovery snapshot. Check the CloudTrail Event history to determine if a service quota or API throttling issue exists.
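One way to look for these failures from the command line is sketched below. The function names are hypothetical, `jq` is assumed to be installed, and the error code patterns (LimitExceeded, Throttl, RateExceeded) are a heuristic rather than a complete list of Amazon EC2 error codes.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative.

# Print recent failed Amazon EC2 API calls from the CloudTrail Event
# history as tab-separated lines: event time, event name, error code.
# Assumes configured AWS CLI credentials and jq.
failed_ec2_calls() (
  aws cloudtrail lookup-events \
    --lookup-attributes AttributeKey=EventSource,AttributeValue=ec2.amazonaws.com \
    --max-items "${1:-200}" \
    --query 'Events[].CloudTrailEvent' \
    --output json |
    jq -r '.[] | fromjson | select(.errorCode != null)
           | [.eventTime, .eventName, .errorCode] | @tsv'
)

# Keep only the lines whose error code (third field) looks like a quota
# or throttling failure. The patterns are a heuristic.
flag_quota_errors() {
  awk -F '\t' '$3 ~ /LimitExceeded|Throttl|RateExceeded/'
}
```

For example, `failed_ec2_calls | flag_quota_errors` lists the calls that were likely rejected because of a quota or rate limit.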

AWS OFFICIAL
Updated a year ago