How do I troubleshoot replication lag or a backlog on my Windows source server for Application Migration Service?
I see a lag or backlog in my Windows source server when I use AWS Application Migration Service to replicate data.
Short description
You experience lag and backlog when you replicate data for the following reasons:
- A slow network connection prevented the replication process from completing, or limited bandwidth restricted the amount of data that you could replicate.
- Large spikes in new disk data caused a backlog that the AWS Replication Agent must send with the initial sync.
- High read latency on the source server disks delayed disk replication.
- High CPU, memory, I/O wait, or other resource usage caused replication bottlenecks.
- You chose Amazon Elastic Block Store (Amazon EBS) staging volumes with low throughput or input/output operations per second (IOPS) and servers with limited network bandwidth. This causes latency and performance issues during replication.
Resolution
Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.
Check the source server
Verify the source server status
Make sure that the source server for the migration is booted and running.
Verify that AWS Replication Agent processes are running
To list the running AWS Replication Agent services, run the following command from PowerShell:
get-service | where-object name -like "*AWSR*"
In the output, verify that AWSReplicationService is Running.
Example output:
PS C:\Users\Administrator> get-service | where-object name -like "*AWSR*"

Status   Name               DisplayName
------   ----               -----------
Running  AwsReplicationD... AwsReplicationDriverLogger
Running  AwsReplicationL... AwsReplicationLogger
Stopped  AwsReplicationP... AwsReplicationPostConvertService
Running  AwsReplicationS... AwsReplicationService
Running  AwsReplicationV... AwsReplicationVolumeUpdaterService
Or, press Windows + R, and then enter services.msc. Press Enter, and then verify that AWSReplicationService is Running.
Verify active TCP connections
Verify that there are five active TCP connections established with the replication server on TCP port 1500.
To check TCP port 1500, run the following command from CMD as an administrator:
netstat -an | find "1500"
Check the command output for the active connections.
Example output:
TCP    172.31.82.135:50929    Replicator Instance IP:1500    ESTABLISHED
TCP    172.31.82.135:50930    Replicator Instance IP:1500    ESTABLISHED
TCP    172.31.82.135:50931    Replicator Instance IP:1500    ESTABLISHED
TCP    172.31.82.135:50933    Replicator Instance IP:1500    ESTABLISHED
TCP    172.31.82.135:50934    Replicator Instance IP:1500    ESTABLISHED
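If you prefer to count the connections programmatically, the following is a minimal Python sketch that parses saved netstat -an output. It assumes the standard four-column TCP layout (protocol, local address, foreign address, state) with a real IP address in the foreign address column. The function name count_established and the address 10.0.0.5 are hypothetical and used only for illustration.

```python
def count_established(netstat_output: str, port: int = 1500) -> int:
    """Count ESTABLISHED TCP connections whose remote endpoint uses `port`."""
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed layout: Proto, Local Address, Foreign Address, State
        if len(fields) < 4 or fields[0].upper() != "TCP":
            continue
        remote, state = fields[2], fields[3]
        # Compare only the port that follows the last colon of the remote address
        if remote.rsplit(":", 1)[-1] == str(port) and state == "ESTABLISHED":
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical sample; 10.0.0.5 stands in for the replicator instance IP
    sample = (
        "  TCP    172.31.82.135:50929    10.0.0.5:1500    ESTABLISHED\n"
        "  TCP    172.31.82.135:50930    10.0.0.5:1500    ESTABLISHED\n"
    )
    print(count_established(sample))
```

A result lower than five suggests that some of the agent's connections to the replication server aren't established.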
Use Windows Resource Monitor to check the performance on the source server
The AWS Replication Agent operates on one CPU core at a time. If CPU usage is high on the core where the AWS Replication Agent is running, then data replication slows. To check your CPU usage, complete the following steps:
- Open the Task Manager, and then choose the Performance tab. Then, choose Open Resource Monitor.
-or-
Open the Control Panel, and then choose Administrative Tools. Then, choose Resource Monitor.
-or-
Run resmon.exe from the command line or PowerShell.
-or-
Choose the Windows icon, and then enter resmon.exe.
- Check the CPU usage of the CPU core that the AWS Replication Agent is running on.
If the CPU usage is high on that core, then investigate the process that consumes most of the CPU. If the agent uses at least 5% of the CPU, then verify that there's enough CPU available for the agent to perform the data replication.
- Check disk performance on the source server. Under Disk Activity, check the Read (B/sec), Write (B/sec), and Response Time metrics.
If there's low read throughput on the source disk, then the agent reads and replicates less data. Note any increase in the disk read and disk write metrics.
Note: The required bandwidth to transfer replicated data over TCP port 1500 is based on the write speed of the participating source server. It's a best practice to have a bandwidth that's at least the sum of the average write speed of all replicated source machines.
- Check the source server for a spike in write operations. Under Disk Activity, check the Write (B/sec) metric.
As the workload changes, periodically check the disk performance to determine the I/O load. If the write throughput exceeds the provided amount of network throughput, then you experience replication lag.
- (Optional) Calculate the required bandwidth from the source server to the replication server.
Note: If your source server is write heavy and writes more than the replication speed, then the backlog continues to grow.
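The optional bandwidth calculation can be sketched in Python as follows. This is an illustrative estimate only, based on the best practice stated above that bandwidth should be at least the sum of the average write speeds of all replicated source machines. The function names and the sample write speeds are hypothetical, and the sketch assumes decimal units (1 MB/s = 8 Mbps, 1 GB = 1,000 MB).

```python
def required_bandwidth_mbps(avg_write_mb_per_sec):
    """Sum the average write speeds (MB/s) and convert to megabits per second."""
    return sum(avg_write_mb_per_sec) * 8


def backlog_growth_gb_per_hour(write_mb_per_sec, replication_mbps):
    """Estimate how fast the backlog grows when writes outpace replication."""
    # Convert the replication speed from Mbps to MB/s before comparing
    surplus_mb_per_sec = write_mb_per_sec - replication_mbps / 8
    # No growth when replication keeps up with the write rate
    return max(surplus_mb_per_sec, 0.0) * 3600 / 1000


if __name__ == "__main__":
    # Hypothetical average write speeds for three source servers, in MB/s
    print(required_bandwidth_mbps([2.0, 3.5, 0.5]))
    # Hypothetical server that writes 10 MB/s over a 40 Mbps replication link
    print(backlog_growth_gb_per_hour(10.0, 40.0))
```

Take the write speeds from the Write (B/sec) metric in Resource Monitor, averaged over a representative period of the workload.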
Check replication speed and available bandwidth from source server to the staging area subnet
For information about how to run a speed test, see How can I perform an SSL connectivity and bandwidth test?
Check for a source server that shut down ungracefully
If a source server shuts down ungracefully, then the AWS Replication Agent rescans all the disks after the server reboots. As the AWS Replication Agent rereads the disks, the lag continuously grows until the agent completes the scan. For more information, see Which Windows and Linux OSs support no-rescan upon reboot?
To check how the source machine shut down, complete the following steps:
- Press Windows + R, and then enter eventvwr.msc.
- Press Enter.
- In the navigation pane, double-click Windows Logs to expand the options.
- Open the context (right-click) menu for System.
- Choose Filter Current Log.
- Choose the Event sources down arrow, and then choose USER32.
- For All Event IDs, enter 1074, and then choose OK. Now, the Event Viewer shows you a list of power off (shutdown) and restart Shutdown Type events.
- To see the dates and times of all unexpected computer shutdowns, change the event source to EventLog, enter 6008 in the All Event IDs field, and then choose OK.
Verify that you didn't block outbound TCP port 1500 traffic
To confirm that outbound TCP port 1500 traffic from the source server to the replication server isn't blocked, run one of the following commands:
From CMD, run the following command:
telnet replication-subnet-IP-address 1500
From PowerShell, run the following command:
TNC replication-subnet-IP-address -port 1500
Note: Replace replication-subnet-IP-address with your replicator instance IP address.
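If neither telnet nor TNC is available, a comparable reachability check can be sketched in Python with the standard socket module. The function name can_connect and the placeholder IP address are hypothetical. The sketch only confirms that a TCP handshake completes on the port and doesn't validate the replication protocol itself.

```python
import socket


def can_connect(host: str, port: int = 1500, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        # create_connection performs the TCP handshake; the socket closes on exit
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


if __name__ == "__main__":
    # Placeholder address; replace it with your replicator instance IP address
    print(can_connect("10.0.0.5", 1500))
```

A False result means that the connection was refused, timed out, or was blocked somewhere along the path, so check the OS firewall, the corporate firewall, and the security group.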
Make sure that your local firewall allows connectivity from the source server to the replication server over TCP port 1500. To activate connectivity on the operating system (OS) firewall, complete the following steps:
- On the source server, open the Windows Firewall console.
- Choose Outbound Rules.
- In the Outbound Rules table, select the rule related to the remote port 1500 connection. Verify that the Enabled status is set to Yes.
- If the Enabled status of the rule is No, then open the context (right-click) menu for the rule. Then, select Enable Rule.
Make sure that your corporate firewall allows traffic over TCP port 1500.
Verify that bandwidth throttling is deactivated in the replication settings on the source server
Deactivate bandwidth throttling on the source server to keep enough bandwidth for data transfers from the source server to the staging area subnet. Bandwidth throttling can cause constant or stagnant lag growth because it limits the data replication from the source server to the replication server.
To check for bandwidth throttling, complete the following steps:
- Open the Application Migration Service console.
- Choose Settings.
- Under Data routing and throttling, select the replication template.
- Select Do not throttle bandwidth to allow replication to use the full available network capacity and reduce migration time.
Note: When you select Throttle bandwidth, Application Migration Service artificially caps data transfer speeds. This creates a bottleneck that slows the replication process. Select this option only if you need to limit network usage for cost control or to protect resources for other critical applications.
Check the staging area resources
Verify that inbound TCP Port 1500 traffic isn't blocked
To confirm that the replication server security groups don't block inbound TCP port 1500 traffic, complete the following steps:
- Open the Amazon Elastic Compute Cloud (Amazon EC2) console.
- In the navigation pane, choose Security groups, and then select the security group that's attached to the replicator instance.
- Verify that the security group allows inbound TCP port 1500 traffic.
Analyze your staging resources
Check the replication instance and staging disk configuration for performance bottlenecks.
Check the snapshot quota in the destination Region
Make sure that your AWS account didn't exceed the snapshot quota in the replication server's AWS Region.
To check your snapshot quota in the Region, run the following get-service-quota AWS CLI command:
aws service-quotas get-service-quota --service-code ebs --quota-code L-309BACF6 --region regionexample --query "Quota.Value"
Note: Replace regionexample with your Region.
Then, run the following describe-snapshots command to check the snapshots in the Region:
aws ec2 describe-snapshots --owner-ids self --region regionexample --query "length(Snapshots)"
Note: Replace regionexample with your Region.
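To compare the two values in one step, the following Python sketch wraps the same two AWS CLI commands and reports the remaining snapshot headroom. It assumes that the AWS CLI is installed and configured, and it adds the global --output text option so that each command prints a bare number. The function names aws_cli_number, snapshot_headroom, and check_region are hypothetical helpers.

```python
import subprocess


def aws_cli_number(args):
    """Run an AWS CLI command that prints a single number and return it as a float."""
    result = subprocess.run(
        ["aws", *args, "--output", "text"],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())


def snapshot_headroom(quota, in_use):
    """Return how many more snapshots fit under the quota (negative if exceeded)."""
    return int(quota) - int(in_use)


def check_region(region):
    """Query the snapshot quota and the current snapshot count for one Region."""
    quota = aws_cli_number([
        "service-quotas", "get-service-quota", "--service-code", "ebs",
        "--quota-code", "L-309BACF6", "--region", region,
        "--query", "Quota.Value",
    ])
    in_use = aws_cli_number([
        "ec2", "describe-snapshots", "--owner-ids", "self",
        "--region", region, "--query", "length(Snapshots)",
    ])
    return snapshot_headroom(quota, in_use)


if __name__ == "__main__":
    # Illustrative numbers only; call check_region with your Region for live values
    print(snapshot_headroom(100000.0, 99990))
```

A headroom at or near zero suggests that the snapshot quota might be limiting replication in that Region.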