I have an issue with a DataBrew job pushing its result file to the incorrect location.
The use case:
I am working on a Step Function which masks data using DataBrew. I have a Parquet input file in S3 at the path folder1/file.parquet in Account A. My DataBrew job is configured to mask the data in the file and write it to S3 in Account B under the same path as the original file.
The config is:
"Outputs": [
  {
    "Format": "PARQUET",
    "MaxOutputFiles": 1,
    "Overwrite": "TRUE",
    "Location": {
      "Bucket": "BucketInAccountB",
      "BucketOwner": "AccountBOwnerId",
      "Key.$": "States.Format('{}/{}/{}/{}/{}', States.ArrayGetItem(States.StringSplit($.detail.object.key, '/'), 0), States.ArrayGetItem(States.StringSplit($.detail.object.key, '/'), 1), States.ArrayGetItem(States.StringSplit($.detail.object.key, '/'), 2), States.ArrayGetItem(States.StringSplit($.detail.object.key, '/'), 3), $.Dataset.filename)"
    }
  }
],
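For clarity, here is a rough simulation of what I expect the `Key.$` expression to produce. The ASL intrinsics `States.StringSplit`, `States.ArrayGetItem`, and `States.Format` are mimicked with plain Python, and the sample object key below is a made-up placeholder with enough path segments for the four indexed lookups:

```python
def build_output_key(object_key: str, filename: str) -> str:
    """Mimic the ASL expression: split the key on '/', take the first
    four segments, and format them together with the dataset filename."""
    parts = object_key.split("/")  # States.StringSplit($.detail.object.key, '/')
    # States.Format('{}/{}/{}/{}/{}', parts[0], parts[1], parts[2], parts[3], filename)
    return "/".join([parts[0], parts[1], parts[2], parts[3], filename])

# Hypothetical input key with four folder levels (placeholder values):
key = build_output_key("folder1/sub1/sub2/sub3/file.parquet", "file.parquet")
print(key)  # folder1/sub1/sub2/sub3/file.parquet
```

So the expression itself resolves to the full path of the original file, ending in the filename, which is what I pass as the output Key.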
The actual behaviour is that the output file ends up at folder1/file.parquet/NameOfTheRecipeJob_time_part000.parquet — i.e. the Key I supply is treated as a folder, and DataBrew appends its own generated file name underneath it.
The strange part is that when I check the job in the DataBrew console, under Job run history, and open the Output, it actually shows the expected path: https://us-east-1.console.aws.amazon.com/s3/object/BucketInAccountB?region=ap-southeast-2&prefix=folder1/file.parquet