Redirect spark-submit output to S3


I don't seem to be able to redirect stderr from spark-submit to a file on S3...

1. As a basic test, this works:

sh-4.2$ cat ./testit.sh
#!/bin/sh
echo hello

sh-4.2$ ./testit.sh | aws s3 cp - s3://mybucket/test.out
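
For what it's worth, a pipe only carries stdout; anything written to stderr bypasses it and stays on the terminal. A quick check (reusing the same placeholder bucket):

sh-4.2$ ( echo to-stdout; echo to-stderr >&2 ) | aws s3 cp - s3://mybucket/test.out
to-stderr

"to-stdout" ends up in the S3 object, while "to-stderr" is printed locally.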

2. This creates a file in S3 upon submission, but the file is empty when the command completes:

sudo spark-submit \
--class ${PACKAGE}.${CLASS} \
--deploy-mode client \
--master yarn \
--jars $JARS \
--executor-cores 2 \
--conf spark.ui.enabled=false \
--conf spark.network.timeout=1200000 \
$JAR \
$ARGS \
| aws s3 cp - s3://mybucket/test.out
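
Since spark-submit's log4j output normally goes to stderr rather than stdout, my guess is that the pipe above receives nothing, which would explain the empty object. A variant that merges stderr into stdout before the pipe might behave differently (a sketch, untested, same placeholder variables as above):

sudo spark-submit \
--class ${PACKAGE}.${CLASS} \
--deploy-mode client \
--master yarn \
--jars $JARS \
--executor-cores 2 \
--conf spark.ui.enabled=false \
--conf spark.network.timeout=1200000 \
$JAR \
$ARGS \
2>&1 | aws s3 cp - s3://mybucket/test.out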

3. This does not create a file in S3 at all:

sudo spark-submit \
--class ${PACKAGE}.${CLASS} \
--deploy-mode client \
--master yarn \
--jars $JARS \
--executor-cores 2 \
--conf spark.ui.enabled=false \
--conf spark.network.timeout=1200000 \
$JAR \
$ARGS \
2> aws s3 cp - s3://a206739-etls3-dsdc-ci-use1/test.out
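
(Note: 2> expects a filename, so the shell parses this as "redirect stderr to a local file named aws" and passes the remaining words through to spark-submit as extra arguments. In bash, though not in plain sh, stderr can be pointed at a command using process substitution; a minimal sketch with a stand-in command and the same placeholder bucket:

sh-4.2$ ( echo to-stderr >&2 ) 2> >(aws s3 cp - s3://mybucket/test.out)

The same 2> >(aws s3 cp - ...) tail should slot onto the spark-submit line in place of the 2> aws s3 cp - ... above.)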

4. This creates a file locally. I could then copy it to S3, but that's not what I want to do:

sudo spark-submit \
--class ${PACKAGE}.${CLASS} \
--deploy-mode client \
--master yarn \
--jars $JARS \
--executor-cores 2 \
--conf spark.ui.enabled=false \
--conf spark.network.timeout=1200000 \
$JAR \
$ARGS \
2> test.out
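
For completeness, following #4 with an explicit copy does work; it is just the extra local file and copy step I am trying to avoid:

aws s3 cp test.out s3://mybucket/test.out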

I don't understand why #2 creates the file but never writes any output to it. Is there a way to make this approach work?

1 Answer

Hello,

Thank you for contacting AWS. We were able to test and confirm the behavior described in your question.

Unfortunately, there is no workaround for this behavior; writing the output to a local file and then copying it to S3, as in your option #4, is the only way to route the content to S3 from the Spark side.

Please feel free to write back to me if you have any further queries. I would be more than happy to help.

AWS
Support Engineer
answered 2 years ago
