How can I troubleshoot T-Log full issues on my RDS for SQL Server source when CDC is turned on for my AWS DMS task?

5 minute read
0

I have an AWS Database Migration Service (AWS DMS) task that has change data capture (CDC) turned on. My task uses Amazon Relational Database Service (Amazon RDS) for SQL Server as a source. I see "SQL Server T-Log full" issues on my AWS DMS task.

Short description

If you're using SQL Server as the source for AWS DMS tasks, then activate MS change data capture (CDC) on your instances to replicate changes. You can use MS-CDC for Amazon RDS SQL Server, on-premises SQL Server, and SQL Server on Amazon Elastic Compute Cloud (Amazon EC2) instances.

After you turn on CDC, it uses the fn_dblog() function to track and read ongoing DML changes from the active log file. For every source table, CDC creates a corresponding CDC change table.

CDC creates two jobs to track changes:

  • Capture job - Scans T-logs, and then uses the sp_replcmds store procedure to capture and record changes in the corresponding change table.
  • Cleanup job - Cleans change tracking based on retention policy.

Run the following stored procedure to view the value set for these settings:

EXEC sys.sp_cdc_help_jobs
  • Maxtrans - The maximum number of transactions to process in each scan cycle.
  • Maxscans - The maximum number of scan cycles to run to extract all rows from the log.
  • Pollinginterval - The number of seconds between log scan cycles.

On Amazon RDS for SQL Server, transaction log backups are performed every 5 minutes by design. During transaction log backup, SQL Server truncates the inactive portion of the transaction log. This portion contains Log Sequence Numbers (LSNs) that AWS DMS didn't read. As a result, AWS DMS might not locate the required LSNs for replicating the desired data changes.

To avoid this issue, AWS DMS has awsdms_truncation_safeguard and pollingInterval options.

awsdms_truncation_safeguard: AWS DMS creates a table called awsdms_truncation_safeguard in the source database. This table prevents the truncation of the transaction log by mimicking a transaction in the database

But, if AWS DMS delays reading the logs for more than 10 minutes, then the task fails again. To avoid this issue, use the pollingInterval parameter. For more information, see Using a Microsoft SQL Server database as a source for AWS DMS.

pollingInterval: The default value is set to 5 seconds for capture jobs. This means that pollingInterval runs every 5 seconds, scans the T-log to read changes, and then marks these logs as replicated. Then, it truncates the logs.

It's a best practice to set the pollingInterval value to >=3599. This prevents capture jobs from running too frequently. It also configures the T-logs so that they aren't truncated for a specified amount of time.

A highly transactional database can result in a T-Log full state. The task then fails with an error similar to the following one:

E: RetCode: SQL_ERROR SqlState: 42000 NativeError: 9002 Message: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]The transaction
log for database 'yourdatabase' is full due to 'REPLICATION'

Resolution

Note: If you receive errors when running AWS Command Line Interface (AWS CLI) commands, make sure that you’re using the most recent AWS CLI version.

Troubleshoot and resolve SQL Server T-Log full issues when CDC is turned on

Complete the following steps to troubleshoot and resolve the SQL Server T-Log full issues when CDC is turned on.

1.    Check the size of your transaction log files:

DBCC SQLPERF(logspace)

2.    If the transaction log is full, then run the following command to see log's contents:

select name, log_reuse_wait_desc from sys.databases where name = ' db_name '
  • Replication: The replication stopped working or isn't reading active transactions.
  • Active_transaction: There's an open transaction.

If there's an active transaction, then check list of open transactions:

select * from sys.sysprocesses where open_tran=1

Note: When AWS DMS is in the Running state, you can't shrink the T-Log of a database that has CDC turned on. Instead, stop the task, and then wait for log backup to truncate the transaction logs. Then, shrink the T-Log of the database.

Shrink the T-log files and RESTART the task

After you clean up the logs, shrink the T-log files, and then restart the task:

1.    Stop the task.

2.    Check the status of CDC, and then turn off CDC.

Check the status of CDC:

select b.name,a.run_date,a.run_status from msdb.dbo.sysjobhistory as a
join msdb.dbo.sysjobs as b on a.job_id=b.job_id
where b.name like '%cdc%capture%';

Turn off CDC at the database level:

exec msdb.dbo.rds_cdc_disable_db '<db_name>'

Find tables that are part of CDC:

USE db_name
GO
SELECT s.name AS Schema_Name, tb.name AS Table_Name
, tb.object_id, tb.type, tb.type_desc, tb.is_tracked_by_cdc
FROM sys.tables tb
INNER JOIN sys.schemas s on s.schema_id = tb.schema_id
WHERE tb.is_tracked_by_cdc = 1

Turn off CDC at table level:

use <db_name>
EXEC sys.sp_cdc_disable_table
@source_schema = N'<schema>',
@source_name = N'<table>',
@capture_instance = N'<schema_table>'

3.    Verify that CDC is turned off:

select is_cdc_enabled, * from sys.databases where name = 'dbname' Value 1- enabled and 0 -Disabled

4.    After you confirm that CDS is turned off, wait for at least 10 minutes, and then shrink your T-logs:

use <dbname>
SELECT file_id, name
FROM sys.database_files;
GO
USE <dbname>
GO
DBCC SHRINKFILE (N'dbname_log' , 0, TRUNCATEONLY)
GO

5.    Confirm the size of the logs:

DBCC SQLPERF(logspace)

6.    Turn on CDC again, and then restart the task. Resuming the task might cause missing transactions. Amazon RDS for SQL Server doesn't support accessing the archived T-log.

Related information

Prerequisites for using ongoing replication (CDC) from a SQL Server source

Recommended settings when using Amazon RDS for SQL Server as a source for AWS DMS

AWS OFFICIAL
AWS OFFICIALUpdated a year ago