Redirect spark-submit output to s3
I don't seem to be able to redirect stderr from spark-submit to a file on S3...
1. As a basic test, this works:
sh-4.2$ cat ./testit.sh
#!/bin/sh
echo hello
sh-4.2$ ./testit.sh | aws s3 cp - s3://mybucket/test.out
2. This creates a file in S3 upon submission, but the file is empty when the command completes:
sudo spark-submit \
  --class ${PACKAGE}.${CLASS} \
  --deploy-mode client \
  --master yarn \
  --jars $JARS \
  --executor-cores 2 \
  --conf spark.ui.enabled=false \
  --conf spark.network.timeout=1200000 \
  $JAR \
  $ARGS \
  | aws s3 cp - s3://mybucket/test.out
3. This does not even create a file at all:
sudo spark-submit \
  --class ${PACKAGE}.${CLASS} \
  --deploy-mode client \
  --master yarn \
  --jars $JARS \
  --executor-cores 2 \
  --conf spark.ui.enabled=false \
  --conf spark.network.timeout=1200000 \
  $JAR \
  $ARGS \
  2> aws s3 cp - s3://a206739-etls3-dsdc-ci-use1/test.out
4. This creates a file locally. I could then copy it to S3, but that's not what I want to do:
sudo spark-submit \
  --class ${PACKAGE}.${CLASS} \
  --deploy-mode client \
  --master yarn \
  --jars $JARS \
  --executor-cores 2 \
  --conf spark.ui.enabled=false \
  --conf spark.network.timeout=1200000 \
  $JAR \
  $ARGS \
  2> test.out
I don't understand why #2 creates the file, but does not send output to it. Is there a way to make this work like that?
Hello,
Thank you for contacting AWS. We were able to test and confirm the behavior as mentioned in your question.
Unfortunately, there is no workaround for this behavior, and this is the only option on the Spark side for routing the content to S3.
Please feel free to write back to me if you have any further queries. I would be more than happy to help.
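For readers comparing #2 and #4 in the question, one possible explanation for the empty file is that a plain pipe forwards only stdout, while #4 captures stderr. This is an assumption, not something confirmed in this thread. The sketch below illustrates that shell behavior with a hypothetical stand-in function, emit, in place of spark-submit, and with cat in place of aws s3 cp - s3://mybucket/test.out.

```shell
#!/bin/sh
# Hypothetical stand-in for a command that prints results on stdout
# and log messages on stderr (not spark-submit itself).
emit() {
    echo "result line"
    echo "log line" >&2
}

# A plain pipe carries stdout only; stderr bypasses the pipe and
# goes to the terminal. cat stands in for the downstream consumer.
stdout_only=$(emit | cat)

# Writing 2>&1 before the pipe merges stderr into stdout,
# so both streams reach the downstream consumer.
merged=$(emit 2>&1 | cat)

echo "pipe only: $stdout_only"
echo "with 2>&1: $merged"
```

If the job's console output does go to stderr, the corresponding change to #2 would be to put 2>&1 before the pipe. That has not been verified against spark-submit here.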