How do I optimize AWS DMS memory usage for migration?

I have an AWS Database Migration Service (AWS DMS) task that uses more or less memory than expected. I want to optimize the memory usage of my replication instance.

Short description

An AWS DMS replication instance uses memory to run the replication engine. This engine runs SELECT statements on the source engine during the full load phase. Also, the replication engine reads from the source engine's transaction log during the change data capture (CDC) phase. AWS DMS migrates these records to the target, and then compares them against the corresponding records on the target database during the validation process.

AWS DMS also uses memory for task configuration and for the flow of data from source to target.

Resolution

Optimize memory for tasks with limited LOB settings

When you use an AWS DMS task with limited large object (LOB) settings to migrate data, AWS DMS allocates memory based on the LobMaxSize value for each LOB column. If you set this value too high, then your task might fail with an "Out of Memory (OOM)" error. Whether the error occurs depends on the number of records that you're migrating and the CommitRate.

If you configure your task with high values, then check your task settings to make sure that the AWS DMS instance has enough memory.

Example JSON task settings:

{
  "TargetMetadata": {
    "SupportLobs": true,
    "FullLobMode": false,
    "LobChunkSize": 0,
    "LimitedSizeLobMode": true,
    "LobMaxSize": 63,
    "InlineLobMaxSize": 0
  }
}

For more information, see Setting LOB support for source databases in an AWS DMS task.
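
To reason about how LobMaxSize, the number of LOB columns, and CommitRate interact, a rough estimate can help. The following Python sketch assumes the worst case, where every row in a commit batch reserves the full LobMaxSize (in kilobytes) for each LOB column. The function name and the formula are illustrative assumptions, not the way that AWS DMS itself calculates or reports memory.

```python
def estimate_lob_buffer_mb(lob_max_size_kb: int, lob_columns: int, commit_rate: int) -> float:
    """Return a rough worst-case LOB buffer size in MB.

    Assumption: each row in a batch of `commit_rate` rows reserves
    `lob_max_size_kb` kilobytes for every LOB column.
    """
    if min(lob_max_size_kb, lob_columns, commit_rate) < 0:
        raise ValueError("inputs must not be negative")
    total_kb = lob_max_size_kb * lob_columns * commit_rate
    return total_kb / 1024


if __name__ == "__main__":
    # LobMaxSize of 63 KB, two LOB columns, CommitRate of 10000 rows.
    print(f"{estimate_lob_buffer_mb(63, 2, 10000):.1f} MB")
```

If the estimate approaches the memory of the replication instance, consider a lower LobMaxSize or CommitRate, or a larger instance.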

Optimize memory for tasks with validation enabled

If you use an AWS DMS task that has EnableValidation set to true during migration, then additional memory usage might occur. This happens because AWS DMS retrieves ThreadCount * PartitionSize records from both the source and target databases and then compares the corresponding data on the replication instance. As a result, you see additional memory usage on the replication instance, source database, and target database during migration.

To limit the amount of memory in use, use SkipLobColumns to ignore LOB columns. You can also run validation separately from the migration task, on a separate replication instance or in a separate AWS DMS task. To do this, use the ValidationOnly setting:

{
  "ValidationSettings": {
    "EnableValidation": true,
    "ThreadCount": 5,
    "PartitionSize": 10000,
    "ValidationOnly": false,
    "SkipLobColumns": false
  }
}

For more information, see AWS DMS data validation.
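
To get a feel for how ThreadCount and PartitionSize affect memory, the following Python sketch estimates the data that validation might hold at one time. It assumes that each of the ThreadCount validation threads holds one partition of PartitionSize rows from both the source and the target. The average row size is a value that you must measure yourself, and the function name is hypothetical.

```python
def estimate_validation_memory_mb(thread_count: int, partition_size: int, avg_row_bytes: int) -> float:
    """Return a rough estimate (MB) of row data held during validation.

    Assumption: every thread holds one partition of rows, fetched
    once from the source and once from the target (hence the factor 2).
    """
    if min(thread_count, partition_size, avg_row_bytes) < 0:
        raise ValueError("inputs must not be negative")
    rows_in_flight = thread_count * partition_size
    return rows_in_flight * avg_row_bytes * 2 / (1024 * 1024)


if __name__ == "__main__":
    # ThreadCount 5, PartitionSize 10000, and an assumed 1 KiB average row.
    print(f"{estimate_validation_memory_mb(5, 10000, 1024):.1f} MB")
```

Wide rows raise this estimate quickly, which is one reason to skip LOB columns during validation.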

Optimize memory for tasks with parallel threads in full load and CDC phases

When you use a non-relational (non-RDBMS) target, ParallelLoadThreads and ParallelLoadBufferSize determine the number of threads and the size of data transfer to the target during the full load phase. ParallelApplyThreads and ParallelApplyBufferSize determine the number of threads and the size of data transfer during the CDC phase. AWS DMS holds the data that it pulls from the source in queues, and ParallelLoadQueuesPerThread and ParallelApplyQueuesPerThread set how many queues each thread reads. When you tune these settings, make sure that the AWS DMS instance and target have the capacity to handle the workload.

Example settings:

{
  "TargetMetadata": {
    "ParallelLoadThreads": 0,
    "ParallelLoadBufferSize": 0,
    "ParallelLoadQueuesPerThread": 0,
    "ParallelApplyThreads": 0,
    "ParallelApplyBufferSize": 0,
    "ParallelApplyQueuesPerThread": 0
  }
}

For more information on these settings, see Target metadata task settings.
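
Before you raise the parallel settings, it can help to estimate how much data the queues might hold. The following Python sketch multiplies threads, queues per thread, and buffer size (in records) by an average record size. This product is an assumed worst case for illustration only, because AWS DMS doesn't document an exact memory formula for these settings. The function name and the average record size are hypothetical inputs.

```python
def estimate_parallel_buffer_mb(threads: int, queues_per_thread: int,
                                buffer_size_records: int, avg_record_bytes: int) -> float:
    """Return an assumed worst-case size (MB) of buffered records.

    Assumption: every queue of every thread can fill up to
    `buffer_size_records` records at the same time.
    """
    values = (threads, queues_per_thread, buffer_size_records, avg_record_bytes)
    if any(v <= 0 for v in values):
        # A value of 0 in the task settings means "use the engine default",
        # so pass the effective value instead of 0.
        raise ValueError("all inputs must be positive")
    records = threads * queues_per_thread * buffer_size_records
    return records * avg_record_bytes / (1024 * 1024)


if __name__ == "__main__":
    # 8 threads, 4 queues per thread, 500 records per buffer, 2 KiB records.
    print(f"{estimate_parallel_buffer_mb(8, 4, 500, 2048):.2f} MB")
```

The same function applies to the ParallelApply* settings for the CDC phase.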

Optimize memory for tasks with batch apply settings

When you use an AWS DMS task with batch apply settings, the default batch configuration can handle a normal workload. During batch processing, the BatchApplyTimeoutMin, BatchApplyTimeoutMax, and BatchApplyMemoryLimit settings work together to determine the size of each batch and how often AWS DMS applies a batch to the target. If you need to tune these settings because of a heavy workload on the source, then make sure that the AWS DMS instance has enough memory. Otherwise, an OOM error might occur.

Don't set BatchApplyMemoryLimit higher than the memory of the replication instance, or an OOM error might occur. When you set BatchApplyMemoryLimit, also account for the other tasks that run at the same time as the AWS DMS task that you're using for migration.

If you set BatchApplyPreserveTransaction to true, then AWS DMS retains long-running transactions in memory across multiple batches. This can also cause OOM errors, based on the memory settings.

To set the number of changes to include in every batch and to limit memory consumption, use the BatchSplitSize setting.

Example settings:

{
  "TargetMetadata": {
    "BatchApplyEnabled": false
  },
  "ChangeProcessingTuning": {
    "BatchApplyPreserveTransaction": true,
    "BatchApplyTimeoutMin": 1,
    "BatchApplyTimeoutMax": 30,
    "BatchApplyMemoryLimit": 500,
    "BatchSplitSize": 0
  }
}
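
To check BatchApplyMemoryLimit against the instance memory when several tasks share a replication instance, a simple budget check can help. The following Python sketch adds up the BatchApplyMemoryLimit values (in MB) of all tasks and compares the total with the instance memory minus a reserve. The 25 percent reserve and the function name are assumptions for illustration, not AWS recommendations.

```python
def batch_apply_limits_fit(task_limits_mb: list[int], instance_memory_mb: int,
                           reserve_fraction: float = 0.25) -> bool:
    """Return True if the combined batch apply limits fit in the budget.

    `reserve_fraction` is an assumed share of instance memory that is
    kept free for the replication engine and other task overhead.
    """
    if not 0 <= reserve_fraction < 1:
        raise ValueError("reserve_fraction must be in [0, 1)")
    budget_mb = instance_memory_mb * (1 - reserve_fraction)
    return sum(task_limits_mb) <= budget_mb


if __name__ == "__main__":
    # Three tasks on an instance with 4096 MB of memory.
    print(batch_apply_limits_fit([500, 500, 1000], 4096))
```

The batch apply settings in ChangeProcessingTuning take effect only when BatchApplyEnabled is true.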

Use memory-related task settings

During the CDC phase, MinTransactionSize sets the minimum number of changes that AWS DMS includes in each transaction. MemoryLimitTotal sets the maximum size that all transactions can occupy in memory on the replication instance before AWS DMS writes them to disk. When you run multiple CDC tasks that need a lot of memory, set these values based on each task's transactional workload.

To limit the memory that long-running transactions on the source consume, use the MemoryKeepTime setting. If large batches of INSERT or UPDATE statements run on the source, then increase this time so that AWS DMS retains the changes in memory and processes them in the net changes table.

To control the number of prepared statements that AWS DMS stores on the replication instance, use the StatementCacheSize setting.

If your AWS DMS replication instance contains a large volume of free memory, then tune the settings similar to the following example:

{
  "ChangeProcessingTuning": {
    "MinTransactionSize": 1000,
    "CommitTimeout": 1,
    "MemoryLimitTotal": 1024,
    "MemoryKeepTime": 60,
    "StatementCacheSize": 50
  }
}

With these settings, AWS DMS can hold more of the workload in memory, so it flushes changes to storage less often.

For more information on these settings, see Change processing tuning settings.
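
When you adjust only a few tuning values, it's safer to merge them into the existing task settings than to rewrite the whole document. The following Python sketch merges overrides into the ChangeProcessingTuning section and returns a JSON string. The helper name is hypothetical, and the sketch assumes that you already have the current task settings as a dictionary.

```python
import json


def merge_change_processing_tuning(settings: dict, overrides: dict) -> str:
    """Return task settings as JSON with tuning overrides applied.

    The input dictionary is not modified. Keys in `overrides` replace
    keys of the same name under "ChangeProcessingTuning".
    """
    merged = dict(settings)
    tuning = dict(merged.get("ChangeProcessingTuning", {}))
    tuning.update(overrides)
    merged["ChangeProcessingTuning"] = tuning
    return json.dumps(merged, indent=2)


if __name__ == "__main__":
    current = {"ChangeProcessingTuning": {"CommitTimeout": 1, "MemoryLimitTotal": 512}}
    print(merge_change_processing_tuning(current, {"MemoryLimitTotal": 1024, "MemoryKeepTime": 60}))
```

You can then pass the resulting string as the ReplicationTaskSettings parameter, for example to the boto3 modify_replication_task call, while the task is stopped.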

Monitor the memory usage of your replication instance

To monitor the memory usage of your replication instance, sort your tasks by MemoryUsage to isolate the single task that consumes the most memory. To learn why the task is holding memory, compare CDCChangesMemorySource and CDCChangesMemoryTarget. Then, troubleshoot the respective endpoint.

The replication instance uses minimal memory to run the replication engine. To check if additional AWS DMS tasks can run on the replication instance, review the AvailableMemory metric in Amazon CloudWatch. Then, size the new task to fit within the FreeMemory that's available. When you run the AWS DMS task, monitor FreeMemory and SwapUsage to see if resource contention is a concern. For more information, see Replication instance metrics.
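
To review FreeMemory over time, you can pull the metric with a script. The following Python sketch uses the boto3 CloudWatch get_metric_statistics call with the AWS/DMS namespace and the ReplicationInstanceIdentifier dimension, and then flags datapoints below a threshold. The function names, the one-hour window, and the threshold are assumptions. The sketch also assumes that boto3 is installed and that credentials are configured.

```python
from datetime import datetime, timedelta, timezone


def fetch_free_memory(instance_id: str, hours: int = 1) -> list[dict]:
    """Fetch 5-minute minimum FreeMemory datapoints for one instance."""
    import boto3  # imported here so the filter below works without boto3

    cloudwatch = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/DMS",
        MetricName="FreeMemory",
        Dimensions=[{"Name": "ReplicationInstanceIdentifier", "Value": instance_id}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=300,
        Statistics=["Minimum"],
    )
    return response["Datapoints"]


def low_memory_points(datapoints: list[dict], threshold_bytes: float) -> list[dict]:
    """Return datapoints whose Minimum is below the threshold, oldest first."""
    low = [p for p in datapoints if p["Minimum"] < threshold_bytes]
    return sorted(low, key=lambda p: p["Timestamp"])


if __name__ == "__main__":
    # "my-replication-instance" is a placeholder identifier.
    points = fetch_free_memory("my-replication-instance")
    for point in low_memory_points(points, 512 * 1024 * 1024):
        print(point["Timestamp"], point["Minimum"])
```

Repeated low datapoints together with rising SwapUsage suggest that the instance is too small for the tasks that it runs.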

Test memory usage

To gauge how much memory your AWS DMS task uses, test an instance with the same configuration in a development or staging environment.

Perform a proof of concept migration before you migrate production data.

Related information

Choosing the right AWS DMS replication instance for your migration

Selecting the best size for a replication instance

AWS OFFICIAL | Updated 9 months ago