Loading geopandas library in Glue job

I am trying to load the geopandas library in a Glue job. I have tried the approaches below.

  1. Creating a geopandas Python wheel and adding it under Job details --> Libraries --> Python library path --> ${S3_Path_to_geopandas_wheel}
  2. Creating a geopandas library zip and loading it in the script as below
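import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions

# Resolve the job arguments passed in by Glue
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)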

#glueContext.add_file('s3://python-glue-lib/geopandas.zip')

# Specify the path to the ZIP archive containing geopandas
geopandas_path = 's3://python-glue-lib/geopandas.zip'

# Add the custom library to the Python path
sc.addPyFile(geopandas_path)
logger = glueContext.get_logger()
logger.warn(str(sys.path))
# Now you should be able to import geopandas in your script
import geopandas as gpd
However, when I run the job I still get the error "No Module Found" for geopandas.
Ajit
asked 5 months ago, 161 views
2 Answers

You don't need sc.addPyFile(geopandas_path), and I doubt it will work.
The best way to install it is to add the job parameter --additional-python-modules=geopandas (or point it to the wheel if you don't want to install from PyPI); that way Glue also installs any dependencies it needs. (I just tested it, and it installs correctly this way.)
The Python library path box is intended for your own modules (it should work if you package the zip correctly AND the package doesn't require any native bindings).
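
As a rough illustration of the job-parameter approach above, the sketch below builds the arguments and starts a run of an existing job through boto3. The helper names and the job name are hypothetical, and it assumes boto3 is installed and AWS credentials are configured; the same key and value can also be entered under the job parameters in the console.

```python
def build_glue_arguments(modules, extra=None):
    """Return Glue job arguments asking Glue to pip-install `modules`."""
    args = dict(extra or {})
    # Glue takes one comma-separated string for this parameter.
    args["--additional-python-modules"] = ",".join(modules)
    return args


def start_job_with_modules(job_name, modules):
    """Start a run of an existing Glue job with the extra modules installed."""
    import boto3  # imported lazily so build_glue_arguments works without boto3

    glue = boto3.client("glue")
    response = glue.start_job_run(
        JobName=job_name,
        Arguments=build_glue_arguments(modules),
    )
    return response["JobRunId"]
```

For example, start_job_with_modules("my-glue-job", ["geopandas"]) would start a run with geopandas installed, where "my-glue-job" is a placeholder for your own job name.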

AWS EXPERT
answered 5 months ago

I have tested sc.addPyFile(geopandas_path), and it indeed does not work. The best way is to use the --additional-python-modules job parameter, as specified by Gonzalo.
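
If it helps when debugging, a small check like the following could go near the top of the job script to confirm whether the module is visible before importing it. This is only a sketch; module_available is a hypothetical helper name, not part of Glue.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


if not module_available("geopandas"):
    # Print the search path to help see where Python is looking.
    print("geopandas is not importable; sys.path is:", sys.path)
```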

AWS SUPPORT ENGINEER
Chaitu
answered 5 months ago