Redirect spark-submit output to S3


I don't seem to be able to redirect stderr from spark-submit to a file on S3...

1. As a basic test, this works:

sh-4.2$ cat ./testit.sh
#!/bin/sh
echo hello

sh-4.2$ ./testit.sh | aws s3 cp - s3://mybucket/test.out
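
One detail worth noting here: the pipe in this test only carries stdout. A quick check with the same script shows which stream the text is on:

sh-4.2$ ./testit.sh > /dev/null     # prints nothing: "hello" is on stdout
sh-4.2$ ./testit.sh 2> /dev/null    # still prints "hello": stderr is empty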

2. This creates a file in S3 upon submission, but the file is empty when the command completes:

sudo spark-submit \
  --class ${PACKAGE}.${CLASS} \
  --deploy-mode client \
  --master yarn \
  --jars $JARS \
  --executor-cores 2 \
  --conf spark.ui.enabled=false \
  --conf spark.network.timeout=1200000 \
  $JAR \
  $ARGS \
  | aws s3 cp - s3://mybucket/test.out

3. This does not create a file at all:

sudo spark-submit \
  --class ${PACKAGE}.${CLASS} \
  --deploy-mode client \
  --master yarn \
  --jars $JARS \
  --executor-cores 2 \
  --conf spark.ui.enabled=false \
  --conf spark.network.timeout=1200000 \
  $JAR \
  $ARGS \
  2> aws s3 cp - s3://a206739-etls3-dsdc-ci-use1/test.out
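
For what it's worth, the shell parses "2> aws" above as "redirect stderr to a local file named aws"; the remaining words (s3, cp, -, and the URI) are then handed to spark-submit as extra application arguments, which is why nothing appears in S3. A redirection must name a file, not a command. To feed stderr to a command, bash process substitution is one option (a sketch, assuming bash is the shell; bucket name as in the earlier examples):

sudo spark-submit --class ${PACKAGE}.${CLASS} --deploy-mode client --master yarn \
  --jars $JARS --executor-cores 2 --conf spark.ui.enabled=false \
  --conf spark.network.timeout=1200000 $JAR $ARGS \
  2> >(aws s3 cp - s3://mybucket/test.out)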

4. This creates a file locally. I could then copy it to S3, but that's not what I want to do:

sudo spark-submit \
  --class ${PACKAGE}.${CLASS} \
  --deploy-mode client \
  --master yarn \
  --jars $JARS \
  --executor-cores 2 \
  --conf spark.ui.enabled=false \
  --conf spark.network.timeout=1200000 \
  $JAR \
  $ARGS \
  2> test.out
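
If keeping a local copy is acceptable, tee can write stderr to the local file and stream it to S3 in the same run (again a bash sketch along the same lines, not a confirmed fix):

sudo spark-submit --class ${PACKAGE}.${CLASS} --deploy-mode client --master yarn \
  --jars $JARS --executor-cores 2 --conf spark.ui.enabled=false \
  --conf spark.network.timeout=1200000 $JAR $ARGS \
  2> >(tee test.out | aws s3 cp - s3://mybucket/test.out)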

I don't understand why #2 creates the file but sends no output to it. Is there a way to make this work?
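
A likely explanation for #2: spark-submit writes its log output to stderr rather than stdout (Spark's default log4j console appender targets System.err), so the pipe receives an empty stdout stream while "aws s3 cp -" still creates the object. If that is what is happening, folding stderr into stdout ahead of the pipe should fill the file (a sketch, untested on this setup):

sudo spark-submit --class ${PACKAGE}.${CLASS} --deploy-mode client --master yarn \
  --jars $JARS --executor-cores 2 --conf spark.ui.enabled=false \
  --conf spark.network.timeout=1200000 $JAR $ARGS \
  2>&1 | aws s3 cp - s3://mybucket/test.out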

1 Answer

Hello,

Thank you for contacting AWS. We were able to test and confirm the behavior as mentioned in your question.

Unfortunately, there is no workaround for this behavior; this is the only option on the Spark side for routing the content to S3.

Please feel free to write back to me if you have any further queries. I would be more than happy to help.

AWS
Support Engineer
Answered 2 years ago
