How can I use AWS Glue to split a file by number of lines?

0

I have a file that contains about 1000 lines currently stored in an S3 bucket, and I want to split this file into smaller files (about 200 - 500 lines/file). I have searched for internet and only found solution to merge files into a larger file only. Can I use Glue to custom output file by lines? Or I should use any other method? I would very thankful if you can guide me the procedure.

Thank you so much!

질문됨 2년 전1247회 조회
1개 답변
1
수락된 답변

Hello,

As you mentioned you have only 1000 lines file which is quite small and for this purpose instead of using Glue ETL Spark job(It is recommended to process large amount of data) I would suggest to use below shell command which you can execute on Ec2 instance.

aws s3 cp s3://sourcebucket/csv/nycflights13.csv - | split -d -l 200 --filter "aws s3 cp - \"s3://destbucket/csv/bigdata_\$FILE.csv\""

Also you can use Glue python shell job and execute above shell command. [1]

Reference:

[1] Execute shell command using Python: https://www.codingninjas.com/blog/2021/06/25/how-to-execute-shell-commands-with-python/#:~:text=The%20naive%20approach%20to%20run,function%20that%20executes%20shell%20commands.

AWS
답변함 2년 전

로그인하지 않았습니다. 로그인해야 답변을 게시할 수 있습니다.

좋은 답변은 질문에 명확하게 답하고 건설적인 피드백을 제공하며 질문자의 전문적인 성장을 장려합니다.

질문 답변하기에 대한 가이드라인

관련 콘텐츠