Very Large ASYNC_NETWORK_IO on SQL Server when migrating data from SQL Server EC2 to Aurora Postgres using DMS

0

We have a database on a SQL Server Standard Edition version 2016 on an EC2 Instance and we have continuous replication for a small portion of the data over to an Aurora Postgres Database using DMS. Sometimes out of the blue, we notice that our data on Aurora has fallen behind, by as much as 90 mins, i.e. data was added in SQL Server but 90 mins later still has not been migrated over by DMS. This can happen times when there is not much a load on either database, i.e. there be only a few transactions to replicate over but nothing is happening. When looking at the logs of the DMS instance we do see this: "Reading from source is paused. Total storage used by swap files exceeded the limit 1048576000 bytes" but as I mentioned there is not much data to replicate and the DMS instance does not seem to be under much stress. Any ideas of what might be causing this would be most welcome. Thanks

2개 답변
1

Hi,

Based on description here seems like there is a CDC Latency issue here.

Basically "Reading from source is paused. Total storage used by swap files exceeded the limit 1048576000 bytes" This is indicating target can not keep up with the change for some reason thus in order to prevent storage full for replication instance, we have a hard code limit for swap file set to 1 GB. When reading pause from source, this can increase CDCLatencySource as well can eventually enter in a bad latency loop.

In such case, you will be able to see high CDCLatencyTarget: [+] https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Monitoring.html#CHAP_Monitoring.Metrics.Task

The gap, in seconds, between the first event timestamp waiting to commit on the target and the current timestamp of the AWS DMS instance. CDCLatencyTarget represents the latency between replication instance and target. When CDCLatencyTarget is high, it indicates the process of applying change events to the target is delayed. To identify latency in an ongoing replication, you can view this metric together with CDCLatencySource. If CDCLatencyTarget is high but CDCLatencySource isn’t high, investigate if:

    No primary keys or indexes are in the target

    Resource bottlenecks occur in the target or replication instance

    Network issues reside between replication instance and target

To resolve the issue I would suggest first go over [+] https://aws.amazon.com/premiumsupport/knowledge-center/dms-high-target-latency/ for general guidance.

Identify any performance bottle neck from target, if performance insight is enabled, that will be helpful in case to troubleshoot query causing wait time block or lock.

Once Target Latency goes down make sure no Source Latency by referring to [+] https://aws.amazon.com/premiumsupport/knowledge-center/dms-high-source-latency/

Also note: If ECA such as readBackupOnly is used for source endpoint, then it is likely to see some source latency. [+] https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.SQLServer.html#CHAP_Source.SQLServer.ConnectionAttrib

I hope above information can help resolving the issue.

AWS
지원 엔지니어
Kevin_Z
답변함 2년 전
0

If there's not a huge load into the source database, this could be a problem with the Replication Instance itself. Can you please open up an AWS support case and they can check up on that.

AWS
Abbas
답변함 2년 전

로그인하지 않았습니다. 로그인해야 답변을 게시할 수 있습니다.

좋은 답변은 질문에 명확하게 답하고 건설적인 피드백을 제공하며 질문자의 전문적인 성장을 장려합니다.

질문 답변하기에 대한 가이드라인