Data Pipeline error when using RegEx data format

Question

I'm working on a sample that outputs an Aurora query to a fixed-width file (each column is padded to a specific width in the file, regardless of the length of the column data), but every attempt to use the RegEx data format results in a `This format cannot be written to` error. **Using the preset CSV or TSV formats is successful.** I'm currently outputting the Aurora query to CSV first (stored in S3), then pulling that file and attempting the conversion via RegEx. I'm following the example at https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-regex.html but for some reason it's not working for me.

**Does anyone have any thoughts on what I'm missing or other things I could try to get this working?**

## Aurora
For this sample, it's just a simple 3 column table: INT, VARCHAR, VARCHAR

## CSV
Again, really simple:

> 1,xxxxxxxxxxx,yyyyyyyy
> 
> 2,xxxxxxxxxxx,yyyyyyyyy
> 
> 3,xxxxxxxxx,yyyyyyyy

## RegEx
* Input Reg Ex: `(.)`
    * Above is just the most minimal grouping I could come up with after trying multiple others (including escaping the `\` and `,` in the example below)
    * The real regex I would expect to use is `([^,]*)\,([^,]*)\,([^,]*)`
* Output Format: `%1$s`
    * Ideally, I'd expect to see something like `%1$15s %2$15s %3$15s`
* Columns: tried every combination both with and without columns, which made no difference
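
For reference, the conversion described above can be sketched outside the pipeline to check what the fixed-width output should look like. This is only an illustration, not how Data Pipeline applies the format: `to_fixed_width` and `WIDTH` are hypothetical names, and because Python's `%` operator does not accept the positional `%1$15s` syntax, `str.rjust` is swapped in to get the same right-aligned padding.

```python
import re

# Regex from the question (the comma does not need escaping): three comma-separated fields.
INPUT_REGEX = re.compile(r"([^,]*),([^,]*),([^,]*)")

# Hypothetical width, standing in for the 15 in `%1$15s %2$15s %3$15s`.
WIDTH = 15


def to_fixed_width(line, width=WIDTH):
    """Right-align each captured group in a field of `width` characters."""
    match = INPUT_REGEX.fullmatch(line)
    if match is None:
        raise ValueError(f"line does not match the expected pattern: {line!r}")
    # rjust pads on the left only; values longer than `width` are not truncated.
    return " ".join(group.rjust(width) for group in match.groups())


if __name__ == "__main__":
    for row in ["1,xxxxxxxxxxx,yyyyyyyy", "2,xxxxxxxxxxx,yyyyyyyyy"]:
        print(to_fixed_width(row))
```

Each sample row comes out as three 15-character fields separated by single spaces, which is the layout the `%1$15s %2$15s %3$15s` output format is expected to produce.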

## Pipeline Definition

    {
      "objects": [
        {
          "s3EncryptionType": "NONE",
          "maximumRetries": "3",
          "dataFormat": {
            "ref": "DataFormatId_A0LHb"
          },
          "filePath": "s3://xxxx, 'YYYY-MM-dd-HH-mm-ss')}",
          "name": "Fixed Width File",
          "id": "DataNodeId_b4YBg",
          "type": "S3DataNode"
        },
        {
          "s3EncryptionType": "NONE",
          "dataFormat": {
            "ref": "DataFormatId_fIUdS"
          },
          "filePath": "s3://xxxxx, 'YYYY-MM-dd-HH-mm-ss')}",
          "name": "S3Bucket",
          "id": "DataNodeId_0vkYO",
          "type": "S3DataNode"
        },
        {
          "output": {
            "ref": "DataNodeId_0vkYO"
          },
          "input": {
            "ref": "DataNodeId_axAKH"
          },
          "name": "CopyActivity",
          "id": "CopyActivityId_4fSv7",
          "runsOn": {
            "ref": "ResourceId_B2kdU"
          },
          "type": "CopyActivity"
        },
        {
          "inputRegEx": "(.)",
          "name": "Fixed Width Format",
          "id": "DataFormatId_A0LHb",
          "type": "RegEx",
          "outputFormat": "%1$s"
        },
        {
          "subnetId": "subnet-xxxx",
          "resourceRole": "DataPipelineDefaultResourceRole",
          "role": "DataPipelineDefaultRole",
          "securityGroupIds": "sg-xxx",
          "instanceType": "t1.micro",
          "actionOnTaskFailure": "terminate",
          "name": "EC2",
          "id": "ResourceId_B2kdU",
          "type": "Ec2Resource",
          "terminateAfter": "5 Minutes"
        },
        {
          "name": "CSV Format",
          "id": "DataFormatId_fIUdS",
          "type": "CSV"
        },
        {
          "output": {
            "ref": "DataNodeId_b4YBg"
          },
          "input": {
            "ref": "DataNodeId_0vkYO"
          },
          "dependsOn": {
            "ref": "CopyActivityId_4fSv7"
          },
          "maximumRetries": "3",
          "name": "Change Format",
          "runsOn": {
            "ref": "ResourceId_B2kdU"
          },
          "id": "CopyActivityId_YytI2",
          "type": "CopyActivity"
        },
        {
          "failureAndRerunMode": "CASCADE",
          "resourceRole": "DataPipelineDefaultResourceRole",
          "role": "DataPipelineDefaultRole",
          "pipelineLogUri": "s3://xxxxx/",
          "scheduleType": "ONDEMAND",
          "name": "Default",
          "id": "Default"
        },
        {
          "connectionString": "jdbc:mysql://xxxxxxx",
          "*password": "xxxx",
          "name": "Aurora",
          "id": "DatabaseId_HL9uz",
          "type": "JdbcDatabase",
          "jdbcDriverClass": "com.mysql.jdbc.Driver",
          "username": "xxxx"
        },
        {
          "database": {
            "ref": "DatabaseId_HL9uz"
          },
          "name": "Aurora Table",
          "id": "DataNodeId_axAKH",
          "type": "SqlDataNode",
          "selectQuery": "select * from policy",
          "table": "policy"
        }
      ],
      "parameters": []
    }
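
To make the wiring in this definition easier to follow, here is a small sketch that resolves the `ref` links and reports which data format each activity reads from or writes to. The embedded JSON is a trimmed copy of the definition (only ids, types and refs), and `format_usage` is a hypothetical helper name. It shows that the RegEx format is attached to the output node of the second `CopyActivity`. The error text suggests the format may only be usable on the input side, but that is an inference from the message and is not confirmed anywhere in this thread.

```python
import json

# Trimmed copy of the pipeline definition: only the fields needed to follow the refs.
PIPELINE = json.loads("""
{
  "objects": [
    {"id": "DataNodeId_b4YBg", "type": "S3DataNode", "dataFormat": {"ref": "DataFormatId_A0LHb"}},
    {"id": "DataNodeId_0vkYO", "type": "S3DataNode", "dataFormat": {"ref": "DataFormatId_fIUdS"}},
    {"id": "DataNodeId_axAKH", "type": "SqlDataNode"},
    {"id": "DataFormatId_A0LHb", "type": "RegEx"},
    {"id": "DataFormatId_fIUdS", "type": "CSV"},
    {"id": "CopyActivityId_4fSv7", "type": "CopyActivity",
     "input": {"ref": "DataNodeId_axAKH"}, "output": {"ref": "DataNodeId_0vkYO"}},
    {"id": "CopyActivityId_YytI2", "type": "CopyActivity",
     "input": {"ref": "DataNodeId_0vkYO"}, "output": {"ref": "DataNodeId_b4YBg"}}
  ]
}
""")


def format_usage(pipeline):
    """Return (activity id, 'input' or 'output', format type) for nodes that have a data format."""
    by_id = {obj["id"]: obj for obj in pipeline["objects"]}
    usage = []
    for obj in pipeline["objects"]:
        for side in ("input", "output"):
            if side not in obj:
                continue  # not an activity, or this side is not set
            node = by_id[obj[side]["ref"]]
            if "dataFormat" in node:
                fmt = by_id[node["dataFormat"]["ref"]]
                usage.append((obj["id"], side, fmt["type"]))
    return usage


if __name__ == "__main__":
    for activity_id, side, fmt_type in format_usage(PIPELINE):
        print(f"{activity_id}: {side} uses {fmt_type}")
```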

Answer

Thank you for bringing this up. Could you please raise a support ticket with Data Pipeline Premium Support so that we can help you with this issue?
