How to redirect entire output of spark-submit to s3

0

using spark submit to command how can we send complete application logs which we are going to see when submited spark submit command.

anudeep
질문됨 일 년 전369회 조회
1개 답변
0

First, make sure you have the proper AWS credentials for the worker running the spark-submit job. This will depend on what you're using to run the task (for example, Glue Job execution role, Fargate execution role, EC2 instance profile, etc.). Once you have this you can set the Amazon S3 bucket you want to save to as the output path parameter. You can use Spark's 'save' method to write the results to this output path. For example:

val outputDataFrame: data = // your data
outputDataFrame.write.parquet("s3://yourbucket/output")

Depending on where you run your job, you can gather your application logs in CloudWatch. EC2, Glue, Fargate, EKS, ECS all integrate with Amazon CloudWatch so you can enable the execution role to write job to CloudWatch. You can find your application logs there. It's then up to you if you want to send those logs to other storage destinations like S3, Splunk, DataDog, etc.

profile pictureAWS
전문가
pechung
답변함 일 년 전
AWS
지원 엔지니어
검토됨 한 달 전

로그인하지 않았습니다. 로그인해야 답변을 게시할 수 있습니다.

좋은 답변은 질문에 명확하게 답하고 건설적인 피드백을 제공하며 질문자의 전문적인 성장을 장려합니다.

질문 답변하기에 대한 가이드라인