How to fix the issue "Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME"?


I tried to create a cluster for running Spark, but after connecting pyspark, I have the following message:

Python 3.7.9 (default, Feb 18 2021, 03:10:35) [GCC 7.3.1 20180712 (Red Hat 7.3.1-12)] on linux Type "help", "copyright", "credits" or "license" for more information. Setting default log level to "WARN".

To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

22/04/24 03:10:03 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.

Welcome to Spark version 3.1.1-amzn-0

Using Python version 3.7.9 (default, Feb 18 2021 03:10:35)

Spark context Web UI available at http://ip-172-31-8-242.us-east-2.compute.internal:4040 Spark context available as 'sc' (master = yarn, app id = application_1650768456809_0004). SparkSession available as 'spark'.

Because of the warning "Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME", I can't load a local file using sc.textFile("file:///home/hadoop/filename").

How can I remove this warning message?

asked 2 years ago · 3150 views
2 Answers

There are two places you can set the Spark log level: in the session, or in the log4j file, which takes effect for all invocations.

  1. At the session level, continue to use sc.setLogLevel(newLevel) with a level such as OFF, and see if that helps.
  2. If the session setting does not work, turn the logging off in the log4j file: edit $SPARK_HOME/conf/log4j.properties (or copy $SPARK_HOME/conf/log4j.properties.template to $SPARK_HOME/conf/log4j.properties) and set log4j.rootCategory to OFF (the default was DEBUG; change it to OFF). For more details, see: https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-logging.html and https://community.cloudera.com/t5/Support-Questions/Config-log4j-in-Spark/td-p/34968
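Concretely, the cluster-wide change in step 2 comes down to one line in $SPARK_HOME/conf/log4j.properties. A sketch, assuming the default appender lines from the shipped log4j.properties.template:

```properties
# $SPARK_HOME/conf/log4j.properties
# (copied from log4j.properties.template)

# Root logger level: change this to OFF (or ERROR, to keep errors visible)
log4j.rootCategory=OFF, console

# Console appender as defined in the default template
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
```

Note that OFF silences everything, including errors; ERROR or WARN is usually a safer choice if you only want to hide INFO-level chatter like this one.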
AWS
NishAWS
answered 2 years ago
  • Thank you for your answers. How can we make these changes from the AWS EMR console?


To edit log4j on an existing, running EMR cluster, log into the EMR master node (the EC2 machine) over SSH and edit log4j.properties with an editor such as vi. If you are spinning up a new EMR cluster and want to configure log4j without logging into the master node, use an EMR configuration; the configuration classification for this is "spark-log4j". This page describes how to use EMR configurations: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html. Please also refer to the "Configuration classifications" section in https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-650-release.html (you can browse the configuration classifications for the EMR version you are using).
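For the console/CLI route, the spark-log4j classification takes the same log4j keys as the properties file. A sketch of the JSON you would pass when creating the cluster (the OFF level mirrors the first answer; on newer EMR releases that ship log4j2, the classification is spark-log4j2 with log4j2-style keys instead):

```json
[
  {
    "Classification": "spark-log4j",
    "Properties": {
      "log4j.rootCategory": "OFF, console"
    }
  }
]
```

In the EMR console, paste this into the "Software settings" configuration box when creating the cluster; with the AWS CLI, pass it via the --configurations option of aws emr create-cluster.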

AWS
NishAWS
answered 2 years ago
