DMS Random Termination

I have set up a CDC replication task using AWS Database Migration Service (DMS) that captures changes from a Postgres database and writes them to Kinesis. This generally works fine, but the DMS task seems to terminate randomly after some time (mainly during the night) and is then unable to resume. An example log of such a "random" termination:

2023-07-30T22:01:43 [SOURCE_CAPTURE  ]I:  Heartbeat was signaled successfully  (postgres_endpoint_util.c:1785)
2023-07-30T22:06:43 [SOURCE_CAPTURE  ]I:  Heartbeat was signaled successfully  (postgres_endpoint_util.c:1785)
2023-07-30T22:24:34 [AT_GLOBAL       ]I:  Task Server Log - ais2-pdp-cdc-task  (V3.4.7.R905 ip-172-23-0-99 Linux 4.14.320-242.534.amzn2.x86_64 #1 SMP Wed Jul 12 19:43:51 UTC 2023 x86_64 64-bit, PID: 1311159) started at Sun Jul 30 22:24:34 2023  (at_logger.c:3051)
2023-07-30T22:24:34 [DATA_STRUCTURE  ]I:  SQLite version is 3.31.1  (at_sqlite.c:174)
2023-07-30T22:24:34 [VALIDATOR       ]I:  validation_util_class_initialize  (validation_util.c:71)
2023-07-30T22:24:34 [VALIDATOR       ]I:  Creating Table Def Mutex  (validation_util.c:75)

Note that between 22:06:43 and 22:24:34 nothing at all seems to be running, not even the heartbeat that should run every 5 minutes.
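To find such silent periods across a longer log, a small script can compare consecutive timestamps. This is only a sketch: it assumes every relevant line starts with a timestamp in the format shown above, and the helper name `find_gaps` and the 5.5-minute threshold are my own choices, not anything DMS provides.

```python
from datetime import datetime, timedelta

def find_gaps(lines, max_gap=timedelta(minutes=5, seconds=30)):
    """Return (previous, current, delta) for consecutive log timestamps
    that are further apart than max_gap."""
    stamps = []
    for line in lines:
        token = line.split(" ", 1)[0]
        try:
            stamps.append(datetime.strptime(token, "%Y-%m-%dT%H:%M:%S"))
        except ValueError:
            continue  # skip lines that do not start with a timestamp
    gaps = []
    for prev, cur in zip(stamps, stamps[1:]):
        if cur - prev > max_gap:
            gaps.append((prev, cur, cur - prev))
    return gaps

# Three lines taken from the log excerpt above.
log = [
    "2023-07-30T22:01:43 [SOURCE_CAPTURE  ]I:  Heartbeat was signaled successfully  (postgres_endpoint_util.c:1785)",
    "2023-07-30T22:06:43 [SOURCE_CAPTURE  ]I:  Heartbeat was signaled successfully  (postgres_endpoint_util.c:1785)",
    "2023-07-30T22:24:34 [DATA_STRUCTURE  ]I:  SQLite version is 3.31.1  (at_sqlite.c:174)",
]

for prev, cur, delta in find_gaps(log):
    print(f"no log output for {delta} between {prev} and {cur}")
```

The threshold is slightly above the 5-minute heartbeat interval so that a normally spaced pair of heartbeats is not reported.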

Further down, DMS seems to try to resume the task but fails because the replication slot is still held by the earlier task, which did not terminate cleanly:

2023-07-30T22:24:36 [SOURCE_CAPTURE  ]I:  Slot has plugin 'test_decoding'  (postgres_test_decoding.c:237)
2023-07-30T22:24:36 [SOURCE_CAPTURE  ]E:  Slot 'ais2_pdp_cdc_tas_00025600_ed91b868_07df_42c6_941d_2ebb04d30481' state found as 'already active' while expected as 'inactive'. [1020461]  (postgres_endpoint_capture.c:355)
2023-07-30T22:24:36 [TASK_MANAGER    ]I:  Task - ais2-pdp-cdc-task is in ERROR state, updating starting status to AR_NOT_APPLICABLE  (repository.c:5102)
2023-07-30T22:24:36 [SOURCE_CAPTURE  ]E:  Error executing source loop [1020461]  (streamcomponent.c:1873)
2023-07-30T22:24:36 [TASK_MANAGER    ]E:  Stream component failed at subtask 0, component st_0_VPO5NKIVXTDIZSNUG75H5RV2POSZ7O3FGQ4VQFY [1020461]  (subtask.c:1414)
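The 'already active' state can be checked on the Postgres side through the `pg_replication_slots` view, and the backend holding the slot can be ended with `pg_terminate_backend`. The following is a rough sketch, not a recommended fix: the function names `describe_slot` and `release_slot` are made up for illustration, the `%s` placeholders assume a psycopg2-style DB-API cursor, and terminating a backend requires sufficient privileges on the source database.

```python
SLOT_QUERY = """
SELECT slot_name, plugin, active, active_pid
FROM pg_replication_slots
WHERE slot_name = %s
"""

def describe_slot(cursor, slot_name):
    """Return the slot's state as a dict, or None if the slot does not exist."""
    cursor.execute(SLOT_QUERY, (slot_name,))
    row = cursor.fetchone()
    if row is None:
        return None
    name, plugin, active, active_pid = row
    return {
        "slot_name": name,
        "plugin": plugin,
        "active": active,
        "active_pid": active_pid,
    }

def release_slot(cursor, slot_name):
    """Terminate the backend that holds the slot.

    Returns True if a terminate request was issued, False if the slot
    is missing or already inactive.
    """
    info = describe_slot(cursor, slot_name)
    if info is None or not info["active"]:
        return False
    cursor.execute("SELECT pg_terminate_backend(%s)", (info["active_pid"],))
    return True
```

With an open connection to the source database, `release_slot(conn.cursor(), "<slot name from the log>")` would issue the terminate request. Whether this is safe to do while DMS may still hold the connection is something I cannot confirm.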

I could not find anyone else with a similar problem. Is this a known issue? Has anyone used DMS successfully for change data capture from Postgres to Kinesis?

Below I am including some DMS task configuration details that might be relevant to this issue:

    "StreamBufferSettings": {
        "StreamBufferCount": 3,
        "CtrlStreamBufferSizeInMB": 5,
        "StreamBufferSizeInMB": 8
    },
    "ErrorBehavior": {
        "FailOnNoTablesCaptured": true,
        "ApplyErrorUpdatePolicy": "LOG_ERROR",
        "FailOnTransactionConsistencyBreached": false,
        "RecoverableErrorThrottlingMax": 1800,
        "DataErrorEscalationPolicy": "SUSPEND_TABLE",
        "ApplyErrorEscalationCount": 0,
        "RecoverableErrorStopRetryAfterThrottlingMax": false,
        "RecoverableErrorThrottling": true,
        "ApplyErrorFailOnTruncationDdl": false,
        "DataTruncationErrorPolicy": "LOG_ERROR",
        "ApplyErrorInsertPolicy": "LOG_ERROR",
        "EventErrorPolicy": "IGNORE",
        "ApplyErrorEscalationPolicy": "LOG_ERROR",
        "RecoverableErrorCount": -1,
        "DataErrorEscalationCount": 0,
        "TableErrorEscalationPolicy": "STOP_TASK",
        "RecoverableErrorInterval": 5,
        "ApplyErrorDeletePolicy": "IGNORE_RECORD",
        "TableErrorEscalationCount": 0,
        "FullLoadIgnoreConflicts": true,
        "DataErrorPolicy": "LOG_ERROR",
        "TableErrorPolicy": "SUSPEND_TABLE"
    },
    "TTSettings": {
        "TTS3Settings": null,
        "TTRecordSettings": null,
        "EnableTT": false
    },
    "FullLoadSettings": {
        "CommitRate": 10000,
        "StopTaskCachedChangesApplied": false,
        "StopTaskCachedChangesNotApplied": false,
        "MaxFullLoadSubTasks": 8,
        "TransactionConsistencyTimeout": 600,
        "CreatePkAfterFullLoad": false,
        "TargetTablePrepMode": "DROP_AND_CREATE"
    },
    "TargetMetadata": {
        "ParallelApplyBufferSize": 100,
        "ParallelApplyQueuesPerThread": 1,
        "ParallelApplyThreads": 0,
        "TargetSchema": "",
        "InlineLobMaxSize": 0,
        "ParallelLoadQueuesPerThread": 1,
        "SupportLobs": true,
        "LobChunkSize": 64,
        "TaskRecoveryTableEnabled": false,
        "ParallelLoadThreads": 0,
        "LobMaxSize": 200,
        "BatchApplyEnabled": false,
        "FullLobMode": false,
        "LimitedSizeLobMode": true,
        "LoadMaxFileSize": 0,
        "ParallelLoadBufferSize": 0
    },
    "BeforeImageSettings": {
        "EnableBeforeImage": true,
        "ColumnFilter": "all",
        "FieldName": "before-image"
    },
    "ControlTablesSettings": {
        "historyTimeslotInMinutes": 5,
        "HistoryTimeslotInMinutes": 5,
        "StatusTableEnabled": false,
        "SuspendedTablesTableEnabled": false,
        "HistoryTableEnabled": false,
        "ControlSchema": "",
        "FullLoadExceptionTableEnabled": false
    },
    "LoopbackPreventionSettings": null,
    "CharacterSetSettings": null,
    "FailTaskWhenCleanTaskResourceFailed": false,
    "ChangeProcessingTuning": {
        "StatementCacheSize": 50,
        "CommitTimeout": 1,
        "BatchApplyPreserveTransaction": true,
        "BatchApplyTimeoutMin": 1,
        "BatchSplitSize": 0,
        "BatchApplyTimeoutMax": 30,
        "MinTransactionSize": 1000,
        "MemoryKeepTime": 60,
        "BatchApplyMemoryLimit": 500,
        "MemoryLimitTotal": 1024
    },
    "ChangeProcessingDdlHandlingPolicy": {
        "HandleSourceTableDropped": true,
        "HandleSourceTableTruncated": true,
        "HandleSourceTableAltered": true
    },
    "PostProcessingRules": null
Yannick
asked 9 months ago, viewed 131 times