Amazon DMS table mapping tranformation

0

Details:

Source: Postgres Target: S3 Format: parquet

Hi Guys! Currently, in default mode, the DMS task is reading a boolean column in the source and automatically converting to a char format in the target.

Theoretically Amazon DMS provides a step to transfor data in the mapping rules task.

https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.Selections.html

However after some attempts the data was not correctly converted in the target. The follow mapping transformations was tried.

{
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "object-locator": {
                "schema-name": "test_schema",
                "table-name": "%"
            },
            "rule-action": "include"
        },
        {
            "rule-type": "transformation",
            "rule-id": "2",
            "rule-name": "2",
            "rule-action": "change-data-type",
            "rule-target": "column",
            "object-locator": {
                "schema-name": "test_schema",
                "table-name": "table_test",
                "column-name": "column1"
            },
            "data-type": {
                "type": "int8"
            }
        },
        {
            "rule-type": "transformation",
            "rule-id": "3",
            "rule-name": "3",
            "rule-action": "change-data-type",
            "rule-target": "column",
            "object-locator": {
                "schema-name": "test_schema",
                "table-name": "table_test",
                "column-name": "doc"
            },
            "data-type": {
                "type": "boolean"
            }
        }
    ]
}

I undertood that DMS automatically convert the from Boolean type to the char type, following the doc below.

https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.PostgreSQL.html

Ok, great...so the "doc" column is currently a "char" type.

But, what's the reason the "doc" column has not converted to boolean type in the target when used the transformation above?

Would it be for that reason?

https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.Transformations.html

"AWS DMS supports column data type transformations for the following DMS data types: "bytes", "date", "time", "datetime", "int1", "int2", "int4", "int8", "numeric", "real4", "real8", "string", "uint1", "uint2", "uint4", "uint8", "wstring", "blob", "nclob", "clob", "boolean", "set", "list" "map", "tuple""

gefragt vor 2 Jahren2666 Aufrufe
2 Antworten
1

Hi,

From the mapping rule mentioned, it seems like you want to achieve

  1. Source PG (boolean) --> Target S3 (int 8)
  2. Source PG (boolean) --> Target S3 (boolean)

Correct me if I am wrong here. Based on documentation: "AWS DMS supports column data type transformations for the following DMS data types"

So the key point here is DMS data types.

Using 1 as example, source pg having boolean, this will convert to CHAR (5) as (DMS data type) https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.PostgreSQL.html#CHAP_Source-PostgreSQL-DataTypes. To get to the target as int 8 then s3 (DMS datatype) should be "INT 1" https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.S3.html#CHAP_Target.S3.DataTypes

{
  "rule-type": "transformation",
  "rule-id": "1",
  "rule-name": "1",
  "rule-action": "change-data-type",
  "rule-target": "column",
  "object-locator": {
    "schema-name": "my_db",
    "table-name": "my_table",                
    "column-name": "col",    
    "data-type": "char5"       // DMS Internal data type for pg, boolean -> char5
  },
  "data-type": {
    "type": "int1"               // s3 int8, that is, int1 -> int8
  }
}

However as you pointed correctly, from the list of the supported DMS datatypes: "AWS DMS supports column data type transformations for the following DMS data types: "bytes", "date", "time", "datetime", "int1", "int2", "int4", "int8", "numeric", "real4", "real8", "string", "uint1", "uint2", "uint4", "uint8", "wstring", "blob", "nclob", "clob", "boolean", "set", "list" "map", "tuple"

"char5" is not in the list thus I am suspecting this might cause some issue.

From the release note DMS 3.3.3, it mentioned that: Fixed an issue where boolean columns were transformed to incorrect types with Apache Parquet output.

https://docs.aws.amazon.com/dms/latest/userguide/CHAP_ReleaseNotes.html#CHAP_ReleaseNotes.DMS333

To investigate further, Please confirm:

  1. What is the DMS version
  2. May you also share a sample Source PG DDL
  3. DMS task settings that being used (JSON format)
  4. Source and target endpoint settings/ECAs that are being used
  5. complete mapping if above is not the complete version.

If you have support plan, please raise a support case and let me know the support case number in case I need to review the resources. If not I will try to address the issue through repost.

Found a similar post from: https://forums.aws.amazon.com/thread.jspa?threadID=233738&tstart=0

AWS
SUPPORT-TECHNIKER
Kevin_Z
beantwortet vor 2 Jahren
0

Hi, thank you for analysis. I am enriching with asked information. I changed some names for security, but I believe that it will not impact your analysis.

Just a point, in the mapping rule just "doc" column is a "boolean" datatype in the source and should be a "boolean" in the s3 too. The column "column1" is a "decimal" type in the source and should be a "int8" in the s3 target. However transformation is not working in both.

***That what I would like to achieve.


Source PG - Column "column1" (decimal) --> Target S3 (int 8)
Source PG - Column "doc" (boolean) --> Target S3 (boolean)

*1 - Amazon DMS Engine Version: 3.4.6 * *2 - May you also share a sample Source PG DDL *

CREATE TABLE test_schema.test_table (
	test_id int4 NOT NULL,
	test2 bool NULL,
	doc bool NULL,  # Column in the mapping
	test3 bool NULL,
	test4 bool NULL,
	test5 varchar NULL,
	test6 bool NULL,
	test7 timestamp NOT NULL DEFAULT '2018-11-13 20:24:11.077775'::timestamp without time zone,
	test8 timestamp NULL,
	test9 bool NOT NULL DEFAULT false,
	column1 numeric NULL,  # Column in the mapping
	CONSTRAINT test_table_id_key UNIQUE (test_id)
);

*3 - Amazon DMS Task Settings *

{
    "TargetMetadata": {
        "TargetSchema": "",
        "SupportLobs": true,
        "FullLobMode": false,
        "LobChunkSize": 250,
        "LimitedSizeLobMode": true,
        "LobMaxSize": 8000,
        "InlineLobMaxSize": 0,
        "LoadMaxFileSize": 0,
        "ParallelLoadThreads": 0,
        "ParallelLoadBufferSize": 0,
        "BatchApplyEnabled": false,
        "TaskRecoveryTableEnabled": false,
        "ParallelLoadQueuesPerThread": 0,
        "ParallelApplyThreads": 0,
        "ParallelApplyBufferSize": 0,
        "ParallelApplyQueuesPerThread": 0
    },
    "FullLoadSettings": {
        "TargetTablePrepMode": "DROP_AND_CREATE",
        "CreatePkAfterFullLoad": false,
        "StopTaskCachedChangesApplied": false,
        "StopTaskCachedChangesNotApplied": false,
        "MaxFullLoadSubTasks": 16,
        "TransactionConsistencyTimeout": 1800,
        "CommitRate": 10000
    },
    "Logging": {
        "EnableLogging": true,
        "LogComponents": [
            {
                "Id": "TRANSFORMATION",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "SOURCE_UNLOAD",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "IO",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "TARGET_LOAD",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "PERFORMANCE",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "SOURCE_CAPTURE",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "SORTER",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "REST_SERVER",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "VALIDATOR_EXT",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "TARGET_APPLY",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "TASK_MANAGER",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "TABLES_MANAGER",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "METADATA_MANAGER",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "FILE_FACTORY",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "COMMON",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "ADDONS",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "DATA_STRUCTURE",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "COMMUNICATION",
                "Severity": "LOGGER_SEVERITY_ERROR"
            },
            {
                "Id": "FILE_TRANSFER",
                "Severity": "LOGGER_SEVERITY_ERROR"
            }
        ],
        "CloudWatchLogGroup": "dms-tasks-dev-use1-rep-instance",
        "CloudWatchLogStream": "dms-task-5MVMA7VE4IBVH2O2QQJBL76BMVFDAJ4D6PJ42UI"
    },
    "ControlTablesSettings": {
        "historyTimeslotInMinutes": 5,
        "ControlSchema": "dms",
        "HistoryTimeslotInMinutes": 5,
        "HistoryTableEnabled": false,
        "SuspendedTablesTableEnabled": false,
        "StatusTableEnabled": false,
        "FullLoadExceptionTableEnabled": false
    },
    "StreamBufferSettings": {
        "StreamBufferCount": 3,
        "StreamBufferSizeInMB": 8,
        "CtrlStreamBufferSizeInMB": 5
    },
    "ChangeProcessingDdlHandlingPolicy": {
        "HandleSourceTableDropped": true,
        "HandleSourceTableTruncated": true,
        "HandleSourceTableAltered": true
    },
    "ErrorBehavior": {
        "DataErrorPolicy": "LOG_ERROR",
        "DataTruncationErrorPolicy": "LOG_ERROR",
        "DataErrorEscalationPolicy": "SUSPEND_TABLE",
        "DataErrorEscalationCount": 0,
        "TableErrorPolicy": "SUSPEND_TABLE",
        "TableErrorEscalationPolicy": "LOG_ERROR",
        "TableErrorEscalationCount": 10,
        "RecoverableErrorCount": -1,
        "RecoverableErrorInterval": 5,
        "RecoverableErrorThrottling": true,
        "RecoverableErrorThrottlingMax": 1800,
        "RecoverableErrorStopRetryAfterThrottlingMax": false,
        "ApplyErrorDeletePolicy": "IGNORE_RECORD",
        "ApplyErrorInsertPolicy": "LOG_ERROR",
        "ApplyErrorUpdatePolicy": "LOG_ERROR",
        "ApplyErrorEscalationPolicy": "LOG_ERROR",
        "ApplyErrorEscalationCount": 0,
        "ApplyErrorFailOnTruncationDdl": false,
        "FullLoadIgnoreConflicts": true,
        "FailOnTransactionConsistencyBreached": false,
        "FailOnNoTablesCaptured": false
    },
    "ChangeProcessingTuning": {
        "BatchApplyPreserveTransaction": true,
        "BatchApplyTimeoutMin": 1,
        "BatchApplyTimeoutMax": 30,
        "BatchApplyMemoryLimit": 500,
        "BatchSplitSize": 0,
        "MinTransactionSize": 1000,
        "CommitTimeout": 1,
        "MemoryLimitTotal": 1024,
        "MemoryKeepTime": 60,
        "StatementCacheSize": 50
    },
    "PostProcessingRules": null,
    "CharacterSetSettings": null,
    "LoopbackPreventionSettings": null,
    "BeforeImageSettings": null,
    "FailTaskWhenCleanTaskResourceFailed": false,
    "TTSettings": null
}

*4 - Source and target endpoint settings/ECAs that are being used *

  • Source Endpoint
{
    "DatabaseName": "data_samples",
    "Port": 999999,
    "ServerName": "test-test2-postgresql.dwghr5cbjksyu.us-east-1.rds.amazonaws.com",
    "Username": "test_dms"
}
  • Target Endpoint
{
    "ServiceAccessRoleArn": "arn:aws:iam::98998898989898:role/dev-test-testdms-target-data",
    "ExternalTableDefinition": "",
    "CsvRowDelimiter": "\\n",
    "CsvDelimiter": ",",
    "BucketFolder": "test/test/data-samples",
    "BucketName": "dev-test-data-test",
    "CompressionType": "GZIP",
    "EnableStatistics": true,
    "DatePartitionEnabled": false
}

*5 - complete mapping if above is not the complete version. * The complete mapping is in the above Comment.

beantwortet vor 2 Jahren
  • Thanks for your input here. This is really helpful in troubleshooting it further. Currently, service team is aware of the issue and it has been raised as a feature request that DMS need to support (PG boolean to boolean migration).

    No work around at the moment as change-data-type not supporting char5

    You may keep track of the new release note through: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_ReleaseNotes.html

Du bist nicht angemeldet. Anmelden um eine Antwort zu veröffentlichen.

Eine gute Antwort beantwortet die Frage klar, gibt konstruktives Feedback und fördert die berufliche Weiterentwicklung des Fragenstellers.

Richtlinien für die Beantwortung von Fragen