In addition to the re:Post Agent answer, I'd like to add three specific points that might be the 'smoking gun' in your SQL Server / EC2 setup:
1. Avoid Task 'Failure': You mentioned the tasks 'eventually fail' when the DB shuts down. This is suboptimal. When a task crashes, DMS might lose its volatile memory state and has to perform a more expensive 'Log Scan' upon restart to find the exact last LSN. Try to gracefully stop the DMS tasks via automation before the DB goes offline.
2. The 1-Hour Gap: Waiting an hour after the DB starts actually works against you. During that hour, SQL Server might already be processing internal maintenance or startup jobs, filling the Transaction Log. DMS then has to fight through an even larger backlog. Ideally, start the DMS task immediately after the DB engine is 'Available'.
3. T-Log Truncation & Backups: Check if you have automated log backups running shortly after the DB starts. If a backup truncates the active log before DMS has read the changes, DMS might have to fetch them from the backup files (S3 or local disk), which is significantly slower than reading from the active log. Also, check the PollingInterval in your MS-SQL source endpoint settings. If it's set too high, DMS 'sleeps' too long between read cycles, which compounds the startup delay.
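For point 1, the graceful stop and the later resume could be automated roughly as follows. This is a minimal sketch, assuming a boto3 DMS client is passed in and that the function names `stop_task_and_wait` and `resume_task` (hypothetical, not part of any AWS tooling) are called by whatever scheduler shuts down and starts the database.

```python
def stop_task_and_wait(dms, task_arn):
    """Ask DMS to stop a running task and block until it reports 'stopped'.

    `dms` is expected to behave like boto3.client("dms").
    """
    task_filter = [{"Name": "replication-task-arn", "Values": [task_arn]}]
    tasks = dms.describe_replication_tasks(Filters=task_filter)["ReplicationTasks"]
    status = tasks[0]["Status"]
    if status != "running":
        # Nothing to stop; report the current status to the caller.
        return status
    dms.stop_replication_task(ReplicationTaskArn=task_arn)
    # The waiter polls describe_replication_tasks until the task is stopped.
    dms.get_waiter("replication_task_stopped").wait(Filters=task_filter)
    return "stopped"


def resume_task(dms, task_arn):
    """Resume CDC from the position saved when the task was stopped."""
    response = dms.start_replication_task(
        ReplicationTaskArn=task_arn,
        StartReplicationTaskType="resume-processing",
    )
    return response["ReplicationTask"]["Status"]
```

In practice you would create the client with `boto3.client("dms")`, call `stop_task_and_wait` shortly before the scheduled database shutdown, and call `resume_task` once the SQL Server engine accepts connections again.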
Thank you for the answer. Since it is a dev environment, I don't foresee many transactions during the first hour, but I will try suspending the tasks before the shutdown and resuming them shortly after the restart to see if this helps. Thank you.
The delay you're experiencing is likely due to CDC source latency that occurs when resuming tasks after an extended period. When you stop a DMS task, it saves the position of the last transaction log that was read from the source. When you resume the task, DMS attempts to continue reading from the same transaction log position. If the task has been stopped for several hours (like overnight in your case), DMS needs to consume the entire transaction backlog that accumulated during the downtime, which causes CDC source latency to increase until it finishes processing all those changes.
To address this issue, here are some things you can check and optimize:
- Monitor your replication instance resources: Ensure your replication instance has adequate CPU, memory, and I/O capacity to process the backlog quickly. Check CloudWatch metrics such as FreeMemory and CPUUtilization to identify resource bottlenecks.
- Review network bandwidth: Verify that the network connection between the source SQL Server and the replication instance can handle the volume of changes. Check the NetworkReceiveThroughput and NetworkTransmitThroughput metrics for network traffic; ReadThroughput and WriteThroughput report disk I/O on the replication instance.
- Optimize task settings: Consider enabling BatchApplyEnabled for better performance on supported endpoints. Also review the endpoint settings and turn off resource-intensive features you don't need.
- Check swap file usage: If swap files on the replication instance exceed 1 GB, source reading may be paused. You can configure the MemoryLimitTotal, MemoryKeepTime, and StatementCacheSize settings to reduce swap file usage.
- Consider your restart timing: You might need to adjust when you restart the DMS tasks relative to when the database comes back online, or evaluate whether there is a way to minimize the transaction backlog that accumulates overnight.
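The task settings named in the list above could be assembled and applied along these lines. This is only a sketch: the helper names `build_tuning_settings` and `apply_tuning_settings` are hypothetical, the numeric values are placeholders rather than recommendations, and it assumes that settings omitted from the JSON keep their current values, which is worth verifying against the DMS documentation before relying on it.

```python
import json


def build_tuning_settings(batch_apply=True, memory_limit_total_mb=1024,
                          memory_keep_time_s=60, statement_cache_size=50):
    """Build a partial task-settings document with the tuning knobs above.

    The default values here are illustrative placeholders only.
    """
    return {
        "TargetMetadata": {"BatchApplyEnabled": batch_apply},
        "ChangeProcessingTuning": {
            "MemoryLimitTotal": memory_limit_total_mb,
            "MemoryKeepTime": memory_keep_time_s,
            "StatementCacheSize": statement_cache_size,
        },
    }


def apply_tuning_settings(dms, task_arn, settings):
    """Send the settings to DMS; `dms` should behave like boto3.client("dms").

    A task generally has to be stopped before it can be modified.
    """
    return dms.modify_replication_task(
        ReplicationTaskArn=task_arn,
        ReplicationTaskSettings=json.dumps(settings),
    )
```

Keeping the settings in a plain dictionary makes it easy to review the change, or to write it out as JSON, before anything is sent to the task.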
The key issue is that DMS must process all accumulated changes sequentially from the saved position, which takes time proportional to the volume of changes that occurred while the system was down.
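To see how long the backlog actually takes to drain after a resume, you could poll the task's CDCLatencySource metric in CloudWatch. The sketch below assumes a boto3 CloudWatch client is passed in; the function names and the 60-second threshold are hypothetical choices for illustration, and the value for the ReplicationTaskIdentifier dimension should be checked against what CloudWatch shows for your task (it may be the task's resource ID rather than its display name).

```python
from datetime import datetime, timedelta, timezone


def latest_cdc_source_latency(cloudwatch, instance_id, task_id, window_minutes=15):
    """Return the most recent average CDCLatencySource in seconds, or None.

    `cloudwatch` is expected to behave like boto3.client("cloudwatch").
    """
    end = datetime.now(timezone.utc)
    start = end - timedelta(minutes=window_minutes)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/DMS",
        MetricName="CDCLatencySource",
        Dimensions=[
            {"Name": "ReplicationInstanceIdentifier", "Value": instance_id},
            {"Name": "ReplicationTaskIdentifier", "Value": task_id},
        ],
        StartTime=start,
        EndTime=end,
        Period=60,
        Statistics=["Average"],
    )
    # Datapoints are not guaranteed to arrive in time order, so sort first.
    datapoints = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return datapoints[-1]["Average"] if datapoints else None


def has_caught_up(latency_seconds, threshold_seconds=60):
    """Treat the task as caught up once source latency is under the threshold."""
    return latency_seconds is not None and latency_seconds <= threshold_seconds
```

Logging this value every few minutes after the morning restart should show whether the catch-up time shrinks once the tasks are stopped gracefully and resumed sooner.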
Sources
Troubleshooting latency issues in AWS Database Migration Service - AWS Database Migration Service
Resolve CDC failures, performance issues, and sequence errors in AWS DMS | AWS re:Post
If my answer helped solve your problem, I would appreciate it if you clicked “accepted answer”.