DMS Random Termination

I have set up a CDC replication task using AWS Database Migration Service (DMS) that captures changes from a Postgres database and writes them to Kinesis. This generally works fine, but the DMS task seems to terminate randomly after some time (mainly during the night) and is then unable to resume. Here is an example log of such a "random" termination:

2023-07-30T22:01:43 [SOURCE_CAPTURE  ]I:  Heartbeat was signaled successfully  (postgres_endpoint_util.c:1785)
2023-07-30T22:06:43 [SOURCE_CAPTURE  ]I:  Heartbeat was signaled successfully  (postgres_endpoint_util.c:1785)
2023-07-30T22:24:34 [AT_GLOBAL       ]I:  Task Server Log - ais2-pdp-cdc-task  (V3.4.7.R905 ip-172-23-0-99 Linux 4.14.320-242.534.amzn2.x86_64 #1 SMP Wed Jul 12 19:43:51 UTC 2023 x86_64 64-bit, PID: 1311159) started at Sun Jul 30 22:24:34 2023  (at_logger.c:3051)
2023-07-30T22:24:34 [DATA_STRUCTURE  ]I:  SQLite version is 3.31.1  (at_sqlite.c:174)
2023-07-30T22:24:34 [VALIDATOR       ]I:  validation_util_class_initialize  (validation_util.c:71)
2023-07-30T22:24:34 [VALIDATOR       ]I:  Creating Table Def Mutex  (validation_util.c:75)

Note that between 22:06:43 and 22:24:34 nothing at all seems to be running, not even the heartbeat that should run every 5 minutes.
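For anyone who wants to look for the same kind of silence in their own task logs, here is a rough sketch of how I spot the gaps. It assumes every log line starts with a timestamp in the format shown above; the function name `find_log_gaps` and the 5.5 minute threshold are my own choices, not anything DMS provides.

```python
from datetime import datetime, timedelta

def find_log_gaps(lines, max_gap=timedelta(minutes=5, seconds=30)):
    """Return (previous, current) timestamp pairs that are further apart than max_gap."""
    gaps = []
    previous = None
    for line in lines:
        try:
            # The first 19 characters hold the timestamp, e.g. 2023-07-30T22:06:43
            current = datetime.strptime(line[:19], "%Y-%m-%dT%H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if previous is not None and current - previous > max_gap:
            gaps.append((previous, current))
        previous = current
    return gaps

# Lines taken from the log excerpt above.
log = [
    "2023-07-30T22:01:43 [SOURCE_CAPTURE  ]I:  Heartbeat was signaled successfully  (postgres_endpoint_util.c:1785)",
    "2023-07-30T22:06:43 [SOURCE_CAPTURE  ]I:  Heartbeat was signaled successfully  (postgres_endpoint_util.c:1785)",
    "2023-07-30T22:24:34 [DATA_STRUCTURE  ]I:  SQLite version is 3.31.1  (at_sqlite.c:174)",
]

for start, end in find_log_gaps(log):
    print(f"silent from {start} to {end} ({end - start})")
```

On the excerpt above this reports a single gap, from 22:06:43 to 22:24:34.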

Further down, DMS seems to try to resume the task but fails because the replication slot is still held by the earlier task, which did not terminate cleanly:

2023-07-30T22:24:36 [SOURCE_CAPTURE  ]I:  Slot has plugin 'test_decoding'  (postgres_test_decoding.c:237)
2023-07-30T22:24:36 [SOURCE_CAPTURE  ]E:  Slot 'ais2_pdp_cdc_tas_00025600_ed91b868_07df_42c6_941d_2ebb04d30481' state found as 'already active' while expected as 'inactive'. [1020461]  (postgres_endpoint_capture.c:355)
2023-07-30T22:24:36 [TASK_MANAGER    ]I:  Task - ais2-pdp-cdc-task is in ERROR state, updating starting status to AR_NOT_APPLICABLE  (repository.c:5102)
2023-07-30T22:24:36 [SOURCE_CAPTURE  ]E:  Error executing source loop [1020461]  (streamcomponent.c:1873)
2023-07-30T22:24:36 [TASK_MANAGER    ]E:  Stream component failed at subtask 0, component st_0_VPO5NKIVXTDIZSNUG75H5RV2POSZ7O3FGQ4VQFY [1020461]  (subtask.c:1414)
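To see which backend is still holding the slot, I query the `pg_replication_slots` view on the source database. The sketch below assumes a DB-API connection to Postgres (for example from psycopg2, which I have not included here); the helper names `stuck_slots` and `terminate_statement` are mine, and the demo uses an in-memory SQLite table with a made-up PID purely as a stand-in so the snippet runs without a database. It only prints the `pg_terminate_backend` statement instead of running it, since I am not sure terminating the old walsender is always safe.

```python
import sqlite3  # only used for the stand-in demo below

# pg_replication_slots is a standard Postgres view; active_pid is the backend holding the slot.
SLOT_QUERY = "SELECT slot_name, plugin, active, active_pid FROM pg_replication_slots"

def stuck_slots(conn, slot_prefix):
    """Return (slot_name, active_pid) for active slots whose name starts with slot_prefix."""
    cur = conn.cursor()
    cur.execute(SLOT_QUERY)
    return [
        (name, pid)
        for name, _plugin, active, pid in cur.fetchall()
        if active and name.startswith(slot_prefix)
    ]

def terminate_statement(pid):
    """Build (but do not run) the statement that would end the backend holding a slot."""
    return f"SELECT pg_terminate_backend({int(pid)});"

# Stand-in for the real source database: a fake table with the same columns.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE pg_replication_slots (slot_name TEXT, plugin TEXT, active INTEGER, active_pid INTEGER)"
)
demo.execute(
    "INSERT INTO pg_replication_slots VALUES (?, ?, ?, ?)",
    ("ais2_pdp_cdc_tas_00025600_ed91b868_07df_42c6_941d_2ebb04d30481", "test_decoding", 1, 12345),  # PID is made up
)

for name, pid in stuck_slots(demo, "ais2_pdp_cdc_tas"):
    print(name, "held by PID", pid)
    print("to release it, review and run:", terminate_statement(pid))
```

Against the real database, `conn` would be the Postgres connection instead of `demo`, and the printed statement would have to be run by a role that is allowed to terminate that backend.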

I could not find anyone else with a similar problem. Is this a known issue? Has anyone used DMS successfully for change data capture from Postgres to Kinesis?

Below I am including some DMS task configuration details that might be relevant to this issue:

    "StreamBufferSettings": {
        "StreamBufferCount": 3,
        "CtrlStreamBufferSizeInMB": 5,
        "StreamBufferSizeInMB": 8
    },
    "ErrorBehavior": {
        "FailOnNoTablesCaptured": true,
        "ApplyErrorUpdatePolicy": "LOG_ERROR",
        "FailOnTransactionConsistencyBreached": false,
        "RecoverableErrorThrottlingMax": 1800,
        "DataErrorEscalationPolicy": "SUSPEND_TABLE",
        "ApplyErrorEscalationCount": 0,
        "RecoverableErrorStopRetryAfterThrottlingMax": false,
        "RecoverableErrorThrottling": true,
        "ApplyErrorFailOnTruncationDdl": false,
        "DataTruncationErrorPolicy": "LOG_ERROR",
        "ApplyErrorInsertPolicy": "LOG_ERROR",
        "EventErrorPolicy": "IGNORE",
        "ApplyErrorEscalationPolicy": "LOG_ERROR",
        "RecoverableErrorCount": -1,
        "DataErrorEscalationCount": 0,
        "TableErrorEscalationPolicy": "STOP_TASK",
        "RecoverableErrorInterval": 5,
        "ApplyErrorDeletePolicy": "IGNORE_RECORD",
        "TableErrorEscalationCount": 0,
        "FullLoadIgnoreConflicts": true,
        "DataErrorPolicy": "LOG_ERROR",
        "TableErrorPolicy": "SUSPEND_TABLE"
    },
    "TTSettings": {
        "TTS3Settings": null,
        "TTRecordSettings": null,
        "EnableTT": false
    },
    "FullLoadSettings": {
        "CommitRate": 10000,
        "StopTaskCachedChangesApplied": false,
        "StopTaskCachedChangesNotApplied": false,
        "MaxFullLoadSubTasks": 8,
        "TransactionConsistencyTimeout": 600,
        "CreatePkAfterFullLoad": false,
        "TargetTablePrepMode": "DROP_AND_CREATE"
    },
    "TargetMetadata": {
        "ParallelApplyBufferSize": 100,
        "ParallelApplyQueuesPerThread": 1,
        "ParallelApplyThreads": 0,
        "TargetSchema": "",
        "InlineLobMaxSize": 0,
        "ParallelLoadQueuesPerThread": 1,
        "SupportLobs": true,
        "LobChunkSize": 64,
        "TaskRecoveryTableEnabled": false,
        "ParallelLoadThreads": 0,
        "LobMaxSize": 200,
        "BatchApplyEnabled": false,
        "FullLobMode": false,
        "LimitedSizeLobMode": true,
        "LoadMaxFileSize": 0,
        "ParallelLoadBufferSize": 0
    },
    "BeforeImageSettings": {
        "EnableBeforeImage": true,
        "ColumnFilter": "all",
        "FieldName": "before-image"
    },
    "ControlTablesSettings": {
        "historyTimeslotInMinutes": 5,
        "HistoryTimeslotInMinutes": 5,
        "StatusTableEnabled": false,
        "SuspendedTablesTableEnabled": false,
        "HistoryTableEnabled": false,
        "ControlSchema": "",
        "FullLoadExceptionTableEnabled": false
    },
    "LoopbackPreventionSettings": null,
    "CharacterSetSettings": null,
    "FailTaskWhenCleanTaskResourceFailed": false,
    "ChangeProcessingTuning": {
        "StatementCacheSize": 50,
        "CommitTimeout": 1,
        "BatchApplyPreserveTransaction": true,
        "BatchApplyTimeoutMin": 1,
        "BatchSplitSize": 0,
        "BatchApplyTimeoutMax": 30,
        "MinTransactionSize": 1000,
        "MemoryKeepTime": 60,
        "BatchApplyMemoryLimit": 500,
        "MemoryLimitTotal": 1024
    },
    "ChangeProcessingDdlHandlingPolicy": {
        "HandleSourceTableDropped": true,
        "HandleSourceTableTruncated": true,
        "HandleSourceTableAltered": true
    },
    "PostProcessingRules": null