Spark artifacts location not found in EMR


Hi,

One of my dev team members is asking for the S3 location of the EMR Spark artifacts so they can build a Java application. I referred to this doc https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-artifact-repository.html to find the Maven repository location. I can see Hadoop and Hive artifacts there, but Spark is missing. Can anyone tell me whether the path I referred to is valid, or share the correct one?

Thanks in advance

Vaas
Asked 4 months ago · 157 views
1 answer
Accepted answer

Hello,

Please note that, unfortunately, the Spark JARs are currently not publicly available in the Amazon EMR Maven repository on S3; only Apache Hadoop and Apache Hive libraries and their dependencies are published there. I recommend reaching out to your TAM with a business justification and raising a feature request for Spark artifacts to be made available.

On the other hand, the Apache Spark project publishes its libraries and dependencies to Maven Central, and these are mostly compatible with the corresponding EMR Spark versions (the Spark Packages site also exists, but it indexes third-party Spark packages rather than the core modules). Search Maven Central for the `org.apache.spark` group, filter by the Spark version you need, and cross-check it against the Amazon EMR release guide, which lists the major components and versions included in each EMR release.
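As a minimal sketch of the Maven Central route, the hypothetical Java helper below renders `<dependency>` entries for Spark modules. It assumes the standard `org.apache.spark` group ID and the `_<scala-binary-version>` artifact suffix used on Maven Central; the versions in `main` (Spark `3.4.1`, Scala `2.12`) are placeholders, not a statement about any particular EMR release, so substitute the values from the release guide.

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical helper (not an AWS or Apache tool): builds Maven dependency
 * snippets for Apache Spark modules published on Maven Central. It knows nothing
 * about the EMR-to-Spark version mapping; the caller supplies the versions.
 */
public class SparkCoordinates {

    // Spark module artifact IDs carry the Scala binary version as a suffix,
    // e.g. spark-core_2.12 or spark-sql_2.13.
    static String artifactId(String module, String scalaBinaryVersion) {
        return module + "_" + scalaBinaryVersion;
    }

    // Renders one <dependency> element with "provided" scope, on the assumption
    // that the cluster already supplies Spark at runtime.
    static String dependency(String module, String scalaBinaryVersion, String sparkVersion) {
        return String.join("\n",
            "<dependency>",
            "  <groupId>org.apache.spark</groupId>",
            "  <artifactId>" + artifactId(module, scalaBinaryVersion) + "</artifactId>",
            "  <version>" + sparkVersion + "</version>",
            "  <scope>provided</scope>",
            "</dependency>");
    }

    // Renders one snippet per module, separated by newlines.
    static String dependencies(List<String> modules, String scalaBinaryVersion, String sparkVersion) {
        return modules.stream()
            .map(m -> dependency(m, scalaBinaryVersion, sparkVersion))
            .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        // Placeholder values only: use the Spark and Scala versions listed for your EMR release.
        System.out.println(dependencies(List.of("spark-core", "spark-sql"), "2.12", "3.4.1"));
    }
}
```

Marking the dependency `provided` is a common choice for jobs launched with `spark-submit`, because the cluster puts Spark on the classpath at runtime; drop the scope if you need Spark bundled into your JAR.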

Alternatively, you can launch an EMR cluster and inspect the libraries installed there. You can download the Spark JARs from the EMR primary node (under /usr/lib/spark/) and build your custom application against them; EMR ships public Spark releases with additional customizations, so these versions should be obtainable. Please make sure you check the licensing terms if you reuse or redistribute them.
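If you go the cluster route, a small inventory of the JARs makes it easier to see which artifact versions EMR ships. The sketch below is a hypothetical utility, not an AWS tool. It assumes the JARs sit in `/usr/lib/spark/jars` (pass a different directory if your release differs) and that file names follow the usual `name-version.jar` pattern, where the version may carry an EMR-specific suffix such as `-amzn-0` (illustrative).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class EmrSparkJars {

    // Splits "name-version.jar": the name is the shortest prefix followed by
    // "-<digit>", so "spark-core_2.12-3.4.1-amzn-0.jar" gives
    // name "spark-core_2.12" and version "3.4.1-amzn-0".
    private static final Pattern JAR = Pattern.compile("^(.+?)-(\\d.*)\\.jar$");

    /** Returns artifact name -> version for every file in dir matching the pattern. */
    static SortedMap<String, String> listJars(Path dir) throws IOException {
        SortedMap<String, String> result = new TreeMap<>();
        try (Stream<Path> files = Files.list(dir)) {
            for (Path p : (Iterable<Path>) files::iterator) {
                Matcher m = JAR.matcher(p.getFileName().toString());
                if (m.matches()) {
                    result.put(m.group(1), m.group(2));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location of the Spark JARs on the EMR primary node.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/usr/lib/spark/jars");
        listJars(dir).forEach((name, version) -> System.out.println(name + "\t" + version));
    }
}
```

On the primary node you could run it as `java EmrSparkJars.java /usr/lib/spark/jars` (single-file source launch, Java 11+) and compare the printed names and versions against Maven Central.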

AWS
Support Engineer
Answered 4 months ago
