Very Large ASYNC_NETWORK_IO on SQL Server when migrating data from SQL Server EC2 to Aurora Postgres using DMS

0

We have a database on a SQL Server Standard Edition version 2016 on an EC2 Instance and we have continuous replication for a small portion of the data over to an Aurora Postgres Database using DMS. Sometimes out of the blue, we notice that our data on Aurora has fallen behind, by as much as 90 mins, i.e. data was added in SQL Server but 90 mins later still has not been migrated over by DMS. This can happen times when there is not much a load on either database, i.e. there be only a few transactions to replicate over but nothing is happening. When looking at the logs of the DMS instance we do see this: "Reading from source is paused. Total storage used by swap files exceeded the limit 1048576000 bytes" but as I mentioned there is not much data to replicate and the DMS instance does not seem to be under much stress. Any ideas of what might be causing this would be most welcome. Thanks

2 Answers
1

Hi,

Based on description here seems like there is a CDC Latency issue here.

Basically "Reading from source is paused. Total storage used by swap files exceeded the limit 1048576000 bytes" This is indicating target can not keep up with the change for some reason thus in order to prevent storage full for replication instance, we have a hard code limit for swap file set to 1 GB. When reading pause from source, this can increase CDCLatencySource as well can eventually enter in a bad latency loop.

In such case, you will be able to see high CDCLatencyTarget: [+] https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Monitoring.html#CHAP_Monitoring.Metrics.Task

The gap, in seconds, between the first event timestamp waiting to commit on the target and the current timestamp of the AWS DMS instance. CDCLatencyTarget represents the latency between replication instance and target. When CDCLatencyTarget is high, it indicates the process of applying change events to the target is delayed. To identify latency in an ongoing replication, you can view this metric together with CDCLatencySource. If CDCLatencyTarget is high but CDCLatencySource isn’t high, investigate if:

    No primary keys or indexes are in the target

    Resource bottlenecks occur in the target or replication instance

    Network issues reside between replication instance and target

To resolve the issue I would suggest first go over [+] https://aws.amazon.com/premiumsupport/knowledge-center/dms-high-target-latency/ for general guidance.

Identify any performance bottle neck from target, if performance insight is enabled, that will be helpful in case to troubleshoot query causing wait time block or lock.

Once Target Latency goes down make sure no Source Latency by referring to [+] https://aws.amazon.com/premiumsupport/knowledge-center/dms-high-source-latency/

Also note: If ECA such as readBackupOnly is used for source endpoint, then it is likely to see some source latency. [+] https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.SQLServer.html#CHAP_Source.SQLServer.ConnectionAttrib

I hope above information can help resolving the issue.

AWS
SUPPORT ENGINEER
Kevin_Z
answered 2 years ago
0

If there's not a huge load into the source database, this could be a problem with the Replication Instance itself. Can you please open up an AWS support case and they can check up on that.

AWS
Abbas
answered 2 years ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions