Amazon DMS table mapping tranformation

0

Details:

Source: Postgres Target: S3 Format: parquet

Hi Guys! Currently, in default mode, the DMS task is reading a boolean column in the source and automatically converting to a char format in the target.

Theoretically Amazon DMS provides a step to transfor data in the mapping rules task.

https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.Selections.html

However after some attempts the data was not correctly converted in the target. The follow mapping transformations was tried.

{
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "object-locator": {
                "schema-name": "test_schema",
                "table-name": "%"
            },
            "rule-action": "include"
        },
        {
            "rule-type": "transformation",
            "rule-id": "2",
            "rule-name": "2",
            "rule-action": "change-data-type",
            "rule-target": "column",
            "object-locator": {
                "schema-name": "test_schema",
                "table-name": "table_test",
                "column-name": "column1"
            },
            "data-type": {
                "type": "int8"
            }
        },
        {
            "rule-type": "transformation",
            "rule-id": "3",
            "rule-name": "3",
            "rule-action": "change-data-type",
            "rule-target": "column",
            "object-locator": {
                "schema-name": "test_schema",
                "table-name": "table_test",
                "column-name": "doc"
            },
            "data-type": {
                "type": "boolean"
            }
        }
    ]
}

I undertood that DMS automatically convert the from Boolean type to the char type, following the doc below.

https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.PostgreSQL.html

Ok, great...so the "doc" column is currently a "char" type.

But, what's the reason the "doc" column has not converted to boolean type in the target when used the transformation above?

Would it be for that reason?

https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.Transformations.html

"AWS DMS supports column data type transformations for the following DMS data types: "bytes", "date", "time", "datetime", "int1", "int2", "int4", "int8", "numeric", "real4", "real8", "string", "uint1", "uint2", "uint4", "uint8", "wstring", "blob", "nclob", "clob", "boolean", "set", "list" "map", "tuple""

asked 2 years ago2598 views
2 Answers
1

Hi,

From the mapping rule mentioned, it seems like you want to achieve

  1. Source PG (boolean) --> Target S3 (int 8)
  2. Source PG (boolean) --> Target S3 (boolean)

Correct me if I am wrong here. Based on documentation: "AWS DMS supports column data type transformations for the following DMS data types"

So the key point here is DMS data types.

Using 1 as example, source pg having boolean, this will convert to CHAR (5) as (DMS data type) https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.PostgreSQL.html#CHAP_Source-PostgreSQL-DataTypes. To get to the target as int 8 then s3 (DMS datatype) should be "INT 1" https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.S3.html#CHAP_Target.S3.DataTypes

{
  "rule-type": "transformation",
  "rule-id": "1",
  "rule-name": "1",
  "rule-action": "change-data-type",
  "rule-target": "column",
  "object-locator": {
    "schema-name": "my_db",
    "table-name": "my_table",                
    "column-name": "col",    
    "data-type": "char5"       // DMS Internal data type for pg, boolean -> char5
  },
  "data-type": {
    "type": "int1"               // s3 int8, that is, int1 -> int8
  }
}

However as you pointed correctly, from the list of the supported DMS datatypes: "AWS DMS supports column data type transformations for the following DMS data types: "bytes", "date", "time", "datetime", "int1", "int2", "int4", "int8", "numeric", "real4", "real8", "string", "uint1", "uint2", "uint4", "uint8", "wstring", "blob", "nclob", "clob", "boolean", "set", "list" "map", "tuple"

"char5" is not in the list thus I am suspecting this might cause some issue.

From the release note DMS 3.3.3, it mentioned that: Fixed an issue where boolean columns were transformed to incorrect types with Apache Parquet output.

https://docs.aws.amazon.com/dms/latest/userguide/CHAP_ReleaseNotes.html#CHAP_ReleaseNotes.DMS333

To investigate further, Please confirm:

  1. What is the DMS version
  2. May you also share a sample Source PG DDL
  3. DMS task settings that being used (JSON format)
  4. Source and target endpoint settings/ECAs that are being used
  5. complete mapping if above is not the complete version.

If you have support plan, please raise a support case and let me know the support case number in case I need to review the resources. If not I will try to address the issue through repost.

Found a similar post from: https://forums.aws.amazon.com/thread.jspa?threadID=233738&tstart=0

AWS
SUPPORT ENGINEER
Kevin_Z
answered 2 years ago
0

Hi, thank you for analysis. I am enriching with asked information. I changed some names for security, but I believe that it will not impact your analysis.

Just a point, in the mapping rule just "doc" column is a "boolean" datatype in the source and should be a "boolean" in the s3 too. The column "column1" is a "decimal" type in the source and should be a "int8" in the s3 target. However transformation is not working in both.

***That what I would like to achieve.


Source PG - Column "column1" (decimal) --> Target S3 (int 8)
Source PG - Column "doc" (boolean) --> Target S3 (boolean)

*1 - Amazon DMS Engine Version: 3.4.6 * *2 - May you also share a sample Source PG DDL *

CREATE TABLE test_schema.test_table (
	test_id int4 NOT NULL,
	test2 bool NULL,
	doc bool NULL,  # Column in the mapping
	test3 bool NULL,
	test4 bool NULL,
	test5 varchar NULL,
	test6 bool NULL,
	test7 timestamp NOT NULL DEFAULT '2018-11-13 20:24:11.077775'::timestamp without time zone,
	test8 timestamp NULL,
	test9 bool NOT NULL DEFAULT false,
	column1 numeric NULL,  # Column in the mapping
	CONSTRAINT test_table_id_key UNIQUE (test_id)
);

*3 - Amazon DMS Task Settings *

{
    "TargetMetadata": {
        "TargetSchema": "",
        "SupportLobs": true,
        "FullLobMode": false,
        "LobChunkSize": 250,
        "LimitedSizeLobMode": true,
        "LobMaxSize": 8000,
        "InlineLobMaxSize": 0,
        "LoadMaxFileSize": 0,
        "ParallelLoadThreads": 0,
        "ParallelLoadBufferSize": 0,
        "BatchApplyEnabled": false,
        "TaskRecoveryTableEnabled": false,
        "ParallelLoadQueuesPerThread": 0,
        "ParallelApplyThreads": 0,
        "ParallelApplyBufferSize": 0,
        "ParallelApplyQueuesPerThread": 0
    },
    "FullLoadSettings": {
        "TargetTablePrepMode": "DROP_AND_CREATE",
        "CreatePkAfterFullLoad": false,
        "StopTaskCachedChangesApplied": false,
        "StopTaskCachedChangesNotApplied": false,
        "MaxFullLoadSubTasks": 16,
        "TransactionConsistencyTimeout": 1800,
        "CommitRate": 10000
    },
    "Logging": {
        "EnableLogging": true,
        "LogComponents": [
            {
                "Id": "TRANSFORMATION",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "SOURCE_UNLOAD",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "IO",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "TARGET_LOAD",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "PERFORMANCE",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "SOURCE_CAPTURE",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "SORTER",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "REST_SERVER",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "VALIDATOR_EXT",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "TARGET_APPLY",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "TASK_MANAGER",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "TABLES_MANAGER",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "METADATA_MANAGER",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "FILE_FACTORY",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "COMMON",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "ADDONS",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "DATA_STRUCTURE",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "COMMUNICATION",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "FILE_TRANSFER",
                "Severity": "LOGGER_SEVERITY_ERROR"
            }
        ],
        "CloudWatchLogGroup": "dms-tasks-dev-use1-rep-instance",
        "CloudWatchLogStream": "dms-task-5MVMA7VE4IBVH2O2QQJBL76BMVFDAJ4D6PJ42UI"
    },
    "ControlTablesSettings": {
        "historyTimeslotInMinutes": 5,
        "ControlSchema": "dms",
        "HistoryTimeslotInMinutes": 5,
        "HistoryTableEnabled": false,
        "SuspendedTablesTableEnabled": false,
        "StatusTableEnabled": false,
        "FullLoadExceptionTableEnabled": false
    },
    "StreamBufferSettings": {
        "StreamBufferCount": 3,
        "StreamBufferSizeInMB": 8,
        "CtrlStreamBufferSizeInMB": 5
    },
    "ChangeProcessingDdlHandlingPolicy": {
        "HandleSourceTableDropped": true,
        "HandleSourceTableTruncated": true,
        "HandleSourceTableAltered": true
    },
    "ErrorBehavior": {
        "DataErrorPolicy": "LOG_ERROR",
        "DataTruncationErrorPolicy": "LOG_ERROR",
        "DataErrorEscalationPolicy": "SUSPEND_TABLE",
        "DataErrorEscalationCount": 0,
        "TableErrorPolicy": "SUSPEND_TABLE",
        "TableErrorEscalationPolicy": "LOG_ERROR",
        "TableErrorEscalationCount": 10,
        "RecoverableErrorCount": -1,
        "RecoverableErrorInterval": 5,
        "RecoverableErrorThrottling": true,
        "RecoverableErrorThrottlingMax": 1800,
        "RecoverableErrorStopRetryAfterThrottlingMax": false,
        "ApplyErrorDeletePolicy": "IGNORE_RECORD",
        "ApplyErrorInsertPolicy": "LOG_ERROR",
        "ApplyErrorUpdatePolicy": "LOG_ERROR",
        "ApplyErrorEscalationPolicy": "LOG_ERROR",
        "ApplyErrorEscalationCount": 0,
        "ApplyErrorFailOnTruncationDdl": false,
        "FullLoadIgnoreConflicts": true,
        "FailOnTransactionConsistencyBreached": false,
        "FailOnNoTablesCaptured": false
    },
    "ChangeProcessingTuning": {
        "BatchApplyPreserveTransaction": true,
        "BatchApplyTimeoutMin": 1,
        "BatchApplyTimeoutMax": 30,
        "BatchApplyMemoryLimit": 500,
        "BatchSplitSize": 0,
        "MinTransactionSize": 1000,
        "CommitTimeout": 1,
        "MemoryLimitTotal": 1024,
        "MemoryKeepTime": 60,
        "StatementCacheSize": 50
    },
    "PostProcessingRules": null,
    "CharacterSetSettings": null,
    "LoopbackPreventionSettings": null,
    "BeforeImageSettings": null,
    "FailTaskWhenCleanTaskResourceFailed": false,
    "TTSettings": null
}

*4 - Source and target endpoint settings/ECAs that are being used *

  • Source Endpoint
{
    "DatabaseName": "data_samples",
    "Port": 999999,
    "ServerName": "test-test2-postgresql.dwghr5cbjksyu.us-east-1.rds.amazonaws.com",
    "Username": "test_dms"
}
  • Target Endpoint
{
    "ServiceAccessRoleArn": "arn:aws:iam::98998898989898:role/dev-test-testdms-target-data",
    "ExternalTableDefinition": "",
    "CsvRowDelimiter": "\\n",
    "CsvDelimiter": ",",
    "BucketFolder": "test/test/data-samples",
    "BucketName": "dev-test-data-test",
    "CompressionType": "GZIP",
    "EnableStatistics": true,
    "DatePartitionEnabled": false
}

*5 - complete mapping if above is not the complete version. * The complete mapping is in the above Comment.

answered 2 years ago
  • Thanks for your input here. This is really helpful in troubleshooting it further. Currently, service team is aware of the issue and it has been raised as a feature request that DMS need to support (PG boolean to boolean migration).

    No work around at the moment as change-data-type not supporting char5

    You may keep track of the new release note through: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_ReleaseNotes.html

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions