Redirect spark-submit output to S3

0

I don't seem to be able to redirect stderr from spark-submit to a file on S3...

1. As a basic test, this works:

sh-4.2$ cat ./testit.sh
#!/bin/sh
echo hello

sh-4.2$ ./testit.sh | aws s3 cp - s3://mybucket/test.out
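For reference, `aws s3 cp -` reads the object body from stdin, so this pipeline delivers whatever the left-hand command writes to stdout. A minimal sketch of the same plumbing, with a hypothetical `fake_s3_cp` function standing in for the AWS CLI:

```shell
#!/bin/sh
# Stand-in for "aws s3 cp - s3://mybucket/test.out": copies stdin to a file
fake_s3_cp() { cat > test.out; }

# echo writes to stdout, so the pipe hands "hello" to the consumer's stdin
echo hello | fake_s3_cp
```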

2. This creates a file in S3 upon submission, but the file is empty when the command completes:

sudo spark-submit \
  --class ${PACKAGE}.${CLASS} \
  --deploy-mode client \
  --master yarn \
  --jars $JARS \
  --executor-cores 2 \
  --conf spark.ui.enabled=false \
  --conf spark.network.timeout=1200000 \
  $JAR \
  $ARGS \
  | aws s3 cp - s3://mybucket/test.out
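A likely explanation for the empty object (an assumption; the exact behavior depends on your log4j configuration): in client mode, spark-submit writes its driver logs to stderr, not stdout. The pipe carries stdout only, so `aws s3 cp -` reads nothing and uploads an empty object. A sketch of the effect, with a hypothetical `fake_spark_submit` in place of spark-submit and `cat` in place of the AWS CLI:

```shell
#!/bin/sh
# Stand-in for spark-submit in client mode: logs go to stderr, not stdout
fake_spark_submit() { echo "driver log line" 1>&2; }

# As in command #2: the pipe carries stdout only, so the consumer reads
# nothing and the resulting file is empty
fake_spark_submit 2>/dev/null | cat > piped.out

# Merging stderr into stdout before the pipe lets the consumer see the logs
fake_spark_submit 2>&1 | cat > merged.out
```

If this is the cause, adding `2>&1` before the pipe in command #2 should populate the S3 object.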

3. This does not create a file at all:

sudo spark-submit \
  --class ${PACKAGE}.${CLASS} \
  --deploy-mode client \
  --master yarn \
  --jars $JARS \
  --executor-cores 2 \
  --conf spark.ui.enabled=false \
  --conf spark.network.timeout=1200000 \
  $JAR \
  $ARGS \
  2> aws s3 cp - s3://a206739-etls3-dsdc-ci-use1/test.out
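In #3 the shell parses `2> aws` as "redirect stderr to a local file named aws"; the remaining words `s3 cp - s3://…/test.out` are then passed to spark-submit as extra arguments, so nothing reaches S3. To hand stderr to a command instead of a file, bash process substitution can be used (a sketch, assuming bash; `fake_spark_submit` is a hypothetical stand-in for spark-submit and `cat` stands in for the AWS CLI):

```shell
#!/bin/bash
# Stand-in for spark-submit's stderr-only log output
fake_spark_submit() { echo "driver log line" 1>&2; }

# "2> word" names a FILE: this creates a local file literally called "aws"
fake_spark_submit 2> aws

# Process substitution connects stderr to another command's stdin;
# "cat > captured.out" stands in for "aws s3 cp - s3://mybucket/test.out"
fake_spark_submit 2> >(cat > captured.out)
sleep 1  # the substituted process finishes asynchronously
```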

4. This creates a file locally. I could then copy it to S3, but that's not what I want to do:

sudo spark-submit \
  --class ${PACKAGE}.${CLASS} \
  --deploy-mode client \
  --master yarn \
  --jars $JARS \
  --executor-cores 2 \
  --conf spark.ui.enabled=false \
  --conf spark.network.timeout=1200000 \
  $JAR \
  $ARGS \
  2> test.out

I don't understand why #2 creates the file but does not send any output to it. Is there a way to make this work?

1 Answer
0

Hello,

Thank you for contacting AWS. We were able to test and confirm the behavior mentioned in your question.

Unfortunately, there is no direct workaround for this behavior; on the Spark side, this is the only option for routing the content to S3.

Please feel free to write back to me if you have any further queries. I would be more than happy to help.

AWS
Support Engineer
answered 2 years ago
