Data Pipeline error when using RegEx data format


I'm working on a sample that outputs an Aurora query to a fixed-width file (each column is padded to a specific width in the file, regardless of the column's data length), but every attempt to use the RegEx data format results in a "This format cannot be written to" error. The preset CSV and TSV formats work fine. My current approach is to first output the Aurora query to CSV (stored in S3), then pull that file and attempt the conversion via RegEx. I'm following the example at https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-regex.html, but for some reason it's not working for me.

Does anyone have any thoughts on what I'm missing or other things I could try to get this working?

Aurora

For this sample, it's just a simple 3-column table: INT, VARCHAR, VARCHAR

CSV

Again, really simple:

1,xxxxxxxxxxx,yyyyyyyy
2,xxxxxxxxxxx,yyyyyyyyy
3,xxxxxxxxx,yyyyyyyy

RegEx

  • Input Reg Ex: (.)
    • Above is just the most minimal grouping I could come up with after trying several others (including escaping the \ and , in the example below).
    • The real regex I would expect to use is ([^,]*)\,([^,]*)\,([^,]*)\n
  • Output Format: %1$s
    • Ideally, I'd expect to use something like %1$15s %2$15s %3$15s (see the sketch after this list).
  • Columns: tried every combination, both with and without columns defined, which made no difference.
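For reference, here is a minimal, self-contained Java sketch of what I expect the regex and output format to do (per the docs, the RegEx data format uses Java regular expressions for inputRegEx and java.util.Formatter syntax for outputFormat; the class name and hard-coded rows are just for illustration):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FixedWidthCheck {
    public static void main(String[] args) {
        // The "real" regex from above, without the trailing \n since
        // rows are handled one line at a time here.
        Pattern rowPattern = Pattern.compile("([^,]*),([^,]*),([^,]*)");
        // Java Formatter syntax: right-align each captured group in a
        // 15-character field, which is the fixed-width layout I'm after.
        String outputFormat = "%1$15s %2$15s %3$15s";

        String[] csvRows = {
            "1,xxxxxxxxxxx,yyyyyyyy",
            "2,xxxxxxxxxxx,yyyyyyyyy",
            "3,xxxxxxxxx,yyyyyyyy"
        };

        for (String row : csvRows) {
            Matcher m = rowPattern.matcher(row);
            if (m.matches()) {
                System.out.println(String.format(
                        outputFormat, m.group(1), m.group(2), m.group(3)));
            }
        }
    }
}

As far as I can tell, the pattern and format string behave correctly in isolation, which makes me suspect the error is about the RegEx format being used on the write side of the copy rather than about the expressions themselves.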

Pipeline Definition

{
  "objects": [
    {
      "s3EncryptionType": "NONE",
      "maximumRetries": "3",
      "dataFormat": {
        "ref": "DataFormatId_A0LHb"
      },
      "filePath": "s3://xxxx, 'YYYY-MM-dd-HH-mm-ss')}",
      "name": "Fixed Width File",
      "id": "DataNodeId_b4YBg",
      "type": "S3DataNode"
    },
    {
      "s3EncryptionType": "NONE",
      "dataFormat": {
        "ref": "DataFormatId_fIUdS"
      },
      "filePath": "s3://xxxxx, 'YYYY-MM-dd-HH-mm-ss')}",
      "name": "S3Bucket",
      "id": "DataNodeId_0vkYO",
      "type": "S3DataNode"
    },
    {
      "output": {
        "ref": "DataNodeId_0vkYO"
      },
      "input": {
        "ref": "DataNodeId_axAKH"
      },
      "name": "CopyActivity",
      "id": "CopyActivityId_4fSv7",
      "runsOn": {
        "ref": "ResourceId_B2kdU"
      },
      "type": "CopyActivity"
    },
    {
      "inputRegEx": "(.)",
      "name": "Fixed Width Format",
      "id": "DataFormatId_A0LHb",
      "type": "RegEx",
      "outputFormat": "%1$s"
    },
    {
      "subnetId": "subnet-xxxx",
      "resourceRole": "DataPipelineDefaultResourceRole",
      "role": "DataPipelineDefaultRole",
      "securityGroupIds": "sg-xxx",
      "instanceType": "t1.micro",
      "actionOnTaskFailure": "terminate",
      "name": "EC2",
      "id": "ResourceId_B2kdU",
      "type": "Ec2Resource",
      "terminateAfter": "5 Minutes"
    },
    {
      "name": "CSV Format",
      "id": "DataFormatId_fIUdS",
      "type": "CSV"
    },
    {
      "output": {
        "ref": "DataNodeId_b4YBg"
      },
      "input": {
        "ref": "DataNodeId_0vkYO"
      },
      "dependsOn": {
        "ref": "CopyActivityId_4fSv7"
      },
      "maximumRetries": "3",
      "name": "Change Format",
      "runsOn": {
        "ref": "ResourceId_B2kdU"
      },
      "id": "CopyActivityId_YytI2",
      "type": "CopyActivity"
    },
    {
      "failureAndRerunMode": "CASCADE",
      "resourceRole": "DataPipelineDefaultResourceRole",
      "role": "DataPipelineDefaultRole",
      "pipelineLogUri": "s3://xxxxx/",
      "scheduleType": "ONDEMAND",
      "name": "Default",
      "id": "Default"
    },
    {
      "connectionString": "jdbc:mysql://xxxxxxx",
      "*password": "xxxx",
      "name": "Aurora",
      "id": "DatabaseId_HL9uz",
      "type": "JdbcDatabase",
      "jdbcDriverClass": "com.mysql.jdbc.Driver",
      "username": "xxxx"
    },
    {
      "database": {
        "ref": "DatabaseId_HL9uz"
      },
      "name": "Aurora Table",
      "id": "DataNodeId_axAKH",
      "type": "SqlDataNode",
      "selectQuery": "select * from policy",
      "table": "policy"
    }
  ],
  "parameters": []
}
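In case it helps with reproducing, the definition can also be checked without activating the pipeline via the AWS CLI (df-EXAMPLE and pipeline.json are placeholders for the real pipeline ID and the definition above saved to a file):

aws datapipeline validate-pipeline-definition --pipeline-id df-EXAMPLE --pipeline-definition file://pipeline.json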
Asked 2 years ago · 281 views
1 Answer

Thank you for bringing this up. Could you please raise a support ticket with Data Pipeline Premium Support so that we can help you with this issue?

AWS
answered 2 years ago
