1 Answer
Hello Harish, the behavior you observed is expected. I tried the steps below, and one workaround is:
#!/usr/bin/python
import sys
import os
# Make the PySpark and Py4J archives bundled with EMR's Spark install
# importable from the system Python, and point SPARK_HOME at that install.
sys.path.append('/usr/lib/spark/python/lib/pyspark.zip')
sys.path.append('/usr/lib/spark/python/lib/py4j-src.zip')
os.environ['SPARK_HOME'] = '/usr/lib/spark'
import pyspark.sql.types as spark_type
import pyspark.sql.functions as spark_func
from pyspark.sql import Row
from pyspark.sql import SparkSession
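The same path-injection pattern can be sketched as a small helper. The archive locations below are the EMR defaults from the snippet above; on other clusters they may differ, so treat the paths as assumptions:

```python
import os
import sys

# Default locations on an EMR master node (assumption: adjust if your
# Spark installation lives elsewhere).
SPARK_HOME = '/usr/lib/spark'
PYSPARK_ARCHIVES = [
    os.path.join(SPARK_HOME, 'python/lib/pyspark.zip'),
    os.path.join(SPARK_HOME, 'python/lib/py4j-src.zip'),
]

def add_pyspark_to_path(spark_home=SPARK_HOME, archives=PYSPARK_ARCHIVES):
    """Append the bundled PySpark archives to sys.path and export SPARK_HOME."""
    os.environ['SPARK_HOME'] = spark_home
    for archive in archives:
        if archive not in sys.path:
            sys.path.append(archive)

add_pyspark_to_path()
```

After this runs, `import pyspark` resolves against the cluster's bundled archives instead of requiring a pip-installed copy.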
My tests:
- On the EMR master node, created a script test.py:
[hadoop@ip-172-31-41-141 ~]$ cat test.py
#!/usr/bin/python
import sys
import os
sys.path.append('/usr/lib/spark/python/lib/pyspark.zip')
sys.path.append('/usr/lib/spark/python/lib/py4j-src.zip')
os.environ['SPARK_HOME'] = '/usr/lib/spark'
import pyspark.sql.types as spark_type
import pyspark.sql.functions as spark_func
from pyspark.sql import Row
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .master('yarn') \
    .appName('pythonSpark') \
    .enableHiveSupport() \
    .getOrCreate()
data = [("Java", "20000"), ("Python", "100000"), ("Scala", "3000")]
df = spark.createDataFrame(data)
df.show()
- From the notebook
- From the YARN ResourceManager UI
The reason is that the notebook runs on JupyterEnterpriseGateway (JEG), and the EMR cluster is accessed via Livy.
In many cases, %run is used to execute a different notebook (see here) rather than calling the Python file directly.
Generally, with EMR it is recommended to use %execute_notebook to run .ipynb files.
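For illustration, invoking the magic from a notebook cell looks roughly like this; the notebook path is a placeholder, not from the original answer:

```
%execute_notebook ./child_notebook.ipynb
```

This runs the child .ipynb through the gateway rather than importing a plain Python file on the driver.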
answered 7 days ago