AWD DMS Replace Character

0

Hi !

I configured a AWS DMS containing an Oracle as Source database and a S3 Bucket as target.

Problem Statement: My Oracle database contains Horizontal Tab and Line feed characters on the row content. I would like to replace/remove those characters.

I tried to add CharacterSetSettings to Migration Task Settings but it's not supported for S3 as target. Also the rfc4180 property add quotes to the value but it doesn't solve the problem when the consumers try to load the row in a redshift.

Is there another way that I can remove those characters without setting an extra resources/infrastructure?

Source endpoint settings:

{
    "AddSupplementalLogging": true,
    "ExtraArchivedLogDestIds": [],
    "AllowSelectNestedTables": true,
    "DatabaseName": "MyDatabase",
    "UseBFile": true,
    "UseLogminerReader": false,
    "SecretsManagerAccessRoleArn": "arn:aws:iam::0000011111:role/SynchroReplicationInfrastructureSecretsManagerReadWriteRole",
    "SecretsManagerSecretId": "arn:aws:secretsmanager:us-east-1:0000011111:secret:beta/xyzedf"
}

Target endpoint settings:

{
    "CsvRowDelimiter": "\\n",
    "CsvDelimiter": "\\t",
    "BucketFolder": "MyFolder",
    "BucketName": "MyBucketName",
    "CompressionType": "NONE",
    "DataFormat": "csv",
    "EnableStatistics": true,
    "IncludeOpForFullLoad": true,
    "TimestampColumnName": "transactionCommitTimestamp",
    "DatePartitionEnabled": false,
    "AddColumnName": false,
    "MaxFileSize": 2000
}
Marcus
posta 2 anni fa501 visualizzazioni
2 Risposte
0

Have you tried using replace function to modify the contents as per https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.Expressions.html#:~:text=%3A%2050%0A%7D%0A%7D-,Using%20SQLite%20functions%20to%20build%20expressions,-You%20use%20table

replace(x,y,z)

The replace(x,y,z) function returns a string formed by substituting string z for every occurrence of string y in string x.

I hope it helps.

AWS
con risposta 2 anni fa
  • What is the right syntax for the replace expression? I tried this, but didn't work:

        {
          "rule-type": "transformation",
          "rule-id": 344,
          "rule-name": "344",
          "rule-target": "column",
          "object-locator": {
            "schema-name": "SCHEMA_NAME",
            "table-name": "TABLE_NAME",
            "column-name": "COL_NAME"
          },
          "expression": "replace($COL_NAME, '\n','')",
          "rule-action": "include-column",
          "value": null,
          "old-value": null
        }
    
  • "rule-action": "add-column" needed

    which could look like this. It will select table, remove COL_NAME and lastly add new column UPDATED_COL_NAME with expression replace($COL_NAME, '\n','') this should work unless DMS is not supporting escape character \ in that case you can try updating the source table data to remove the \n

    Or create a view which is replacing the \n in column but in that case only full load is supported.

    {
      "rules": [
        {
          "rule-type": "selection",
          "rule-id": "1",
          "rule-name": "1",
          "object-locator": {
            "schema-name": "SCHEMA_NAME",
            "table-name": "TABLE_NAME"
          },
          "rule-action": "include"
        },
        {
          "rule-type": "transformation",
          "rule-id": "2",
          "rule-name": "2",
          "rule-action": "remove-column",
          "rule-target": "column",
          "object-locator": {
            "schema-name": "SCHEMA_NAME",
            "table-name": "TABLE_NAME",
            "column-name": "COL_NAME"
          }
        },
        {
          "rule-type": "transformation",
          "rule-id": 344,
          "rule-name": "344",
          "rule-action": "add-column",
          "rule-target": "column",
          "object-locator": {
            "schema-name": "SCHEMA_NAME",
            "table-name": "TABLE_NAME",
            "column-name": "COL_NAME"
          },
          "expression": "replace($COL_NAME, '\n','')",
          "value": "UPDATED_COL_NAME",
          "data-type": {
            "type": "string",
            "length": 50
          }
        }
      ]
    }                   
    
  • It worked partially, I couldn't make it work for more than one character replacement. I tried "expression": "replace(replace($COL_NAME, '\n',''), '\t', ' ')" and replace($COL_NAME, '/[\t\n]/g,''). Do you know how can I replace multiple characters in the same string?

0

To solve this issue I added lambda to remove those characters.

I used to have the following flow:

Oracle DB -> AWS DMS -> S3 Bucket

Current:

Oracle DB -> AWS DMS -> S3 Bucket1 -> Lambda -> S3 Bucket2

Marcus
con risposta 2 anni fa

Accesso non effettuato. Accedi per postare una risposta.

Una buona risposta soddisfa chiaramente la domanda, fornisce un feedback costruttivo e incoraggia la crescita professionale del richiedente.

Linee guida per rispondere alle domande