Redirect spark-submit output to S3


I don't seem to be able to redirect stderr from spark-submit to a file on S3...

1. As a basic test, this works:

sh-4.2$ cat ./testit.sh
#!/bin/sh
echo hello

sh-4.2$ ./testit.sh | aws s3 cp - s3://mybucket/test.out

2. This creates a file in S3 upon submission, but the file is empty when the command completes:

sudo spark-submit \
--class ${PACKAGE}.${CLASS} \
--deploy-mode client \
--master yarn \
--jars $JARS \
--executor-cores 2 \
--conf spark.ui.enabled=false \
--conf spark.network.timeout=1200000 \
$JAR \
$ARGS \
| aws s3 cp - s3://mybucket/test.out

3. This does not create a file at all:

sudo spark-submit \
--class ${PACKAGE}.${CLASS} \
--deploy-mode client \
--master yarn \
--jars $JARS \
--executor-cores 2 \
--conf spark.ui.enabled=false \
--conf spark.network.timeout=1200000 \
$JAR \
$ARGS \
2> aws s3 cp - s3://a206739-etls3-dsdc-ci-use1/test.out

4. This creates a file locally. I could then copy it to S3, but that's not what I want to do:

sudo spark-submit \
--class ${PACKAGE}.${CLASS} \
--deploy-mode client \
--master yarn \
--jars $JARS \
--executor-cores 2 \
--conf spark.ui.enabled=false \
--conf spark.network.timeout=1200000 \
$JAR \
$ARGS \
2> test.out

I don't understand why #2 creates the file but never writes any output to it. Is there a way to make this approach work?

1 Answer

Hello,

Thank you for contacting AWS. We were able to test and confirm the behavior described in your question.

Unfortunately, there is no workaround for this behavior. In #2, the pipe carries only stdout, and spark-submit writes its log output to stderr, so aws s3 cp - reads an empty stream and uploads an empty object. In #3, the shell interprets 2> aws as "redirect stderr to a local file named aws" and passes the remaining words to spark-submit as arguments, so the AWS CLI never runs. Writing the output to a local file and then copying it to S3, as in your option #4, is the only option from the Spark side to route this content to S3.
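
For reference, here is a minimal sketch of that write-locally-then-copy approach; the variables and bucket name are taken from your own commands, and the final aws s3 cp call is the only addition:

# Capture stderr to a local file while the job runs (your option #4).
sudo spark-submit \
--class ${PACKAGE}.${CLASS} \
--deploy-mode client \
--master yarn \
--jars $JARS \
--executor-cores 2 \
--conf spark.ui.enabled=false \
--conf spark.network.timeout=1200000 \
$JAR \
$ARGS \
2> test.out

# Upload the captured stderr after spark-submit exits; the object is
# written in one piece, so it will not be empty or truncated.
aws s3 cp test.out s3://mybucket/test.out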

Please feel free to write back to me if you have any further queries. I would be more than happy to help.

AWS
TECHNICAL SUPPORT ENGINEER
answered 2 years ago
