
Questions tagged with AWS Data Pipeline



3 answers · 0 votes · 20 views — Alexander, asked a month ago

Data Pipeline stops processing files in S3 bucket

I have a Data Pipeline that reads CSV files from an S3 bucket and copies the data into an RDS database. I specify the bucket/folder name, and the pipeline processes each CSV file in that bucket/folder in turn. When it is done, a ShellCommandActivity moves the files to another 'folder' in the S3 bucket. That's how it works in testing.

With the real data it just stops after a few files. The last line in the logs is:

`07 Dec 2021 09:57:55,755 [INFO] (TaskRunnerService-resource:df-1234xxx1_@Ec2Instance_2021-12-07T09:53:00-0) df-1234xxx1 amazonaws.datapipeline.connector.s3.RetryableS3Reader: Reopening connection and advancing 0`

The logs show that the pipeline usually downloads a CSV file, writes the 'Reopening connection and advancing 0' line, deletes a temp file, then moves on to the next file. But on the seventh file it just stops at 'Reopening connection and advancing 0'. It isn't the next file that is the problem, as it processes fine on its own.

I've already tried making the files smaller: originally the pipeline was stopping on the second file, but now that the file sizes are about 1.7 MB it gets through six of them before it stops. The status of each task (both DataLoadActivity and ShellCommandActivity) shows 'CANCELLED' after one attempt (3 attempts are allowed), and there is no error message. I'm guessing this is some sort of timeout. How can I make the pipeline reliable so that it processes all of the files?
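If a timeout really is cancelling the attempts, one thing worth checking is the `attemptTimeout` field that Data Pipeline activities support, along with `maximumRetries` and `retryDelay`. Below is a minimal sketch of the relevant fragment of a pipeline definition; the object id and the time-period values are illustrative, not taken from the asker's actual pipeline:

```json
{
  "objects": [
    {
      "id": "DataLoadActivity",
      "type": "CopyActivity",
      "input": { "ref": "S3InputLocation" },
      "output": { "ref": "RdsDatabaseTable" },
      "attemptTimeout": "2 hours",
      "maximumRetries": "3",
      "retryDelay": "10 minutes"
    }
  ]
}
```

Raising `attemptTimeout` on the activity (and on the ShellCommandActivity, if it also times out) gives each attempt longer to finish before Data Pipeline cancels it; comparing the cancellation timestamps in the logs against the configured timeout should confirm whether this is the cause.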
2 answers · 0 votes · 8 views — erc_aws, asked a month ago