Redirect spark-submit output to S3


I don't seem to be able to redirect stderr from spark-submit to a file on S3...

1. As a basic test, this works:

sh-4.2$ cat ./testit.sh
#!/bin/sh
echo hello

sh-4.2$ ./testit.sh | aws s3 cp - s3://mybucket/test.out
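(Side note: aws s3 cp - <s3-url> uploads whatever arrives on stdin, and a plain pipe carries stdout only. A quick way to confirm that, reusing the same bucket as an assumed target:

sh-4.2$ sh -c 'echo to-stdout; echo to-stderr 1>&2' | aws s3 cp - s3://mybucket/test.out

test.out ends up containing only to-stdout; the to-stderr line still goes to the terminal.)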

2. This creates a file in S3 upon submission, but the file is empty when the command completes (see the note after the command):

sudo spark-submit \
--class ${PACKAGE}.${CLASS} \
--deploy-mode client \
--master yarn \
--jars $JARS \
--executor-cores 2 \
--conf spark.ui.enabled=false \
--conf spark.network.timeout=1200000 \
$JAR \
$ARGS \
| aws s3 cp - s3://mybucket/test.out
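A likely explanation for the empty file (my assumption: spark-submit in client mode writes its log output to stderr, not stdout): the pipe above carries only stdout, so aws s3 cp creates the object but receives nothing to write into it. If that is the cause, merging stderr into stdout before the pipe should capture the logs. A minimal sketch, with <options> standing in for the same options as above:

sudo spark-submit <options> $JAR $ARGS 2>&1 | aws s3 cp - s3://mybucket/test.out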

3. This does not even create a file at all (see the note after the command):

sudo spark-submit \
--class ${PACKAGE}.${CLASS} \
--deploy-mode client \
--master yarn \
--jars $JARS \
--executor-cores 2 \
--conf spark.ui.enabled=false \
--conf spark.network.timeout=1200000 \
$JAR \
$ARGS \
2> aws s3 cp - s3://a206739-etls3-dsdc-ci-use1/test.out
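Why this behaves that way: 2> expects a filename, not a command, so the shell redirects stderr to a local file literally named aws and passes s3, cp, -, and the URL to spark-submit as extra arguments. To feed stderr into a command, bash process substitution could be used instead (this assumes the script runs under bash; plain sh does not support it):

sudo spark-submit <options> $JAR $ARGS 2> >(aws s3 cp - s3://mybucket/test.out)

Here <options> again stands in for the same options as above.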

4. This creates a file locally. I could then copy it to S3 (sketched after the command), but that's not what I want to do:

sudo spark-submit \
--class ${PACKAGE}.${CLASS} \
--deploy-mode client \
--master yarn \
--jars $JARS \
--executor-cores 2 \
--conf spark.ui.enabled=false \
--conf spark.network.timeout=1200000 \
$JAR \
$ARGS \
2> test.out
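For completeness, the two-step fallback mentioned above would just be a separate upload once the job finishes:

aws s3 cp test.out s3://mybucket/test.out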

I don't understand why #2 creates the file but does not send any output to it. Is there a way to make this work?

1 Answer

Hello,

Thank you for contacting AWS. We were able to test and confirm the behavior described in your question.

Unfortunately, there is no workaround for this behavior, and this is the only option on the Spark side for routing the content to S3.

Please feel free to write back if you have any further queries. I would be more than happy to help.

AWS
SUPPORT ENGINEER
answered 2 years ago
