Loading geopandas library in Glue job

0

I am trying to load geopandas library in Glue job , I have tried below approaches.

  1. Creating geopandas python wheel and adding it Job Details -->Libraries-->Python library path-->${S3_Path_to_geopandas_wheel}
  2. Creating geopandas library zip and loading it on script as below
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

#glueContext.add_file('s3://python-glue-lib/geopandas.zip')

# Specify the path to the ZIP archive containing geopandas
geopandas_path = 's3://python-glue-lib/geopandas.zip'

# Add the custom library to the Python path
sc.addPyFile(geopandas_path)
logger = glueContext.get_logger()
logger.warn(str(sys.path))
# Now you should be able to import geopandas in your script
import geopandas as gpd
  1. But when running the job still I am getting error as "No Module Found" geopandas
Ajit
已提问 5 个月前161 查看次数
2 回答
0

You don't have to do sc.addPyFile(geopandas_path) and I doubt that will work.
The best way to install it is adding a parameter --additional-python-modules=geopandas (or point to the wheel if you don't want to install from Pypi), that way it makes sure it install any dependencies needed. (I just tested it and installs correctly this way).
The Python library box is intended for your own modules (should work if you package the zip correctly AND doesn't require any native bindings)

profile pictureAWS
专家
已回答 5 个月前
0

I have tested with the sc.addPyFile(geopandas_path) and indeed does not work. The best way would be to use --additional-python-modules job parameter as specified by Gonzalo.

profile pictureAWS
支持工程师
Chaitu
已回答 5 个月前

您未登录。 登录 发布回答。

一个好的回答可以清楚地解答问题和提供建设性反馈,并能促进提问者的职业发展。

回答问题的准则