You can upgrade the boto3 version with the steps below.
- Upload the boto3 wheel file to your S3 bucket. The wheel file is available on pypi.org (https://pypi.org/project/boto3/#files).
- Configure your Glue Python shell job by specifying the wheel file's S3 path in 'Python library path' in the job configuration.
- Insert the code below at the beginning of your Python script. (The print statements can be omitted.)
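import sys
# Search /glue/lib/installation first so the uploaded wheel takes precedence.
sys.path.insert(0, '/glue/lib/installation')
# Drop the already-loaded boto modules so the next import reloads them.
keys = [k for k in sys.modules.keys() if 'boto' in k]
for k in keys:
    del sys.modules[k]
import boto3
print('boto3 version')
print(boto3.__version__)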
- You can then import boto3 and script against the newer version.
For example, you can use the Athena ListDataCatalogs API, which is not yet available in the default boto3.
athena = boto3.client("athena")
res = athena.list_data_catalogs()
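The module-purging step above can also be wrapped in a small helper. This is a minimal sketch, not part of the original answer: the name `purge_modules` is hypothetical, and it matches on the top-level package name rather than on any module name containing 'boto', so unrelated modules are left alone.

```python
import sys


def purge_modules(*packages):
    """Remove cached modules that belong to the given top-level packages.

    Returns the sorted names that were removed from sys.modules.
    `purge_modules` is a hypothetical helper name, not a boto3 or Glue API.
    """
    removed = [
        name for name in list(sys.modules)
        if name.split(".")[0] in packages
    ]
    for name in removed:
        del sys.modules[name]
    return sorted(removed)


# Intended use in a Glue Python shell script (assumes the newer wheel ends up
# under /glue/lib/installation, as in the steps above):
#   sys.path.insert(0, "/glue/lib/installation")
#   purge_modules("boto3", "botocore")
#   import boto3
```

Calling it before anything else has imported boto3 simply returns an empty list, so it is safe to leave in place.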
Edited by: NoritakaS-AWS on Oct 22, 2020 9:59 PM
Hi,
We got AWS Glue Python Shell working with all dependencies as follows. Glue depends on awscli as well as boto3.
AWS Glue Python Shell with Internet
Add the awscli and boto3 whl files to the Python library path so that they are installed during Glue job execution. This option is slow because the job has to download and install the dependencies.
- Download the following whl files:
  - awscli-1.18.183-py2.py3-none-any.whl https://pypi.org/project/awscli/#files
  - boto3-1.16.23-py2.py3-none-any.whl https://pypi.org/project/boto3/#files
- Upload the files to the S3 bucket location you use as your Python library path.
- Add the S3 paths of the whl files to the Python library path. Give the full S3 path of each whl file, separated by commas.
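The comma-separated value for the last step can be assembled as sketched below. This is only an illustration: `python_library_path` is a hypothetical helper, and the bucket and prefix are placeholder names, not real resources. As far as I know the console's 'Python library path' field corresponds to the `--extra-py-files` job parameter, but treat that mapping as an assumption and check it against your job definition.

```python
def python_library_path(bucket, prefix, wheel_names):
    """Build the comma-separated list of S3 paths for 'Python library path'."""
    prefix = prefix.strip("/")
    base = f"s3://{bucket}/{prefix}" if prefix else f"s3://{bucket}"
    return ",".join(f"{base}/{name}" for name in wheel_names)


wheels = [
    "awscli-1.18.183-py2.py3-none-any.whl",
    "boto3-1.16.23-py2.py3-none-any.whl",
]
# "my-glue-assets" and "python-libs" are placeholders for your own bucket and prefix.
print(python_library_path("my-glue-assets", "python-libs", wheels))
```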
AWS Glue Python Shell without Internet connectivity
Reference: AWS Wrangler Glue dependency build https://github.com/corvuslee/public/blob/master/awswrangler_glue.md
- We followed the steps above for the awscli and boto3 whl files.
- Below is the latest requirements.txt, compiled for the newest versions:
colorama==0.4.3
docutils==0.15.2
rsa==4.5.0
s3transfer==0.3.3
PyYAML==5.3.1
botocore==1.19.23
pyasn1==0.4.8
jmespath==0.10.0
urllib3==1.26.2
python_dateutil==2.8.1
six==1.15.0
- Download the dependencies to the libs folder:
pip download -r requirements.txt -d libs
- Move the original main whl files to the libs directory as well:
  - awscli-1.18.183-py2.py3-none-any.whl https://pypi.org/project/awscli/#files
  - boto3-1.16.23-py2.py3-none-any.whl https://pypi.org/project/boto3/#files
- Package as a zip file:
cd libs
zip ../boto3-depends.zip *
- Upload boto3-depends.zip to S3 and add its path to the Glue job's Referenced files path.
Note: It is the Referenced files path, not the Python library path.
- Placeholder code to install the latest awscli and boto3 and load them into the AWS Glue Python Shell:
import os.path
import subprocess
import sys

# borrowed from https://stackoverflow.com/questions/48596627/how-to-import-referenced-files-in-etl-scripts
def get_referenced_filepath(file_name, matchFunc=os.path.isfile):
    for dir_name in sys.path:
        candidate = os.path.join(dir_name, file_name)
        if matchFunc(candidate):
            return candidate
    raise Exception("Can't find file: {}".format(file_name))

# The name must match the zip uploaded to the Referenced files path
zip_file = get_referenced_filepath("boto3-depends.zip")
subprocess.run(["unzip", zip_file])
# Can't install --user, or without "-t ." because of permissions issues on the filesystem
subprocess.run("pip3 install --no-index --find-links=. -t . *.whl", shell=True)
# Additional code as part of AWS Thread https://forums.aws.amazon.com/thread.jspa?messageID=954344
sys.path.insert(0, '/glue/lib/installation')
keys = [k for k in sys.modules.keys() if 'boto' in k]
for k in keys:
    del sys.modules[k]
import boto3
print('boto3 version')
print(boto3.__version__)
- Check that the code works with the latest AWS CLI API.
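For anyone building the archive on a machine without the `zip` command, the packaging step above can be approximated in Python. This is a sketch under assumptions, not the method used in the post: `zip_flat` is a hypothetical name, and it only mirrors `zip ../boto3-depends.zip *` roughly (top-level, non-hidden files stored without a directory prefix).

```python
import zipfile
from pathlib import Path


def zip_flat(src_dir, zip_path):
    """Zip the non-hidden regular files directly inside src_dir, with no directory prefix.

    Returns the archived file names in sorted order. Subdirectories are skipped.
    """
    files = sorted(
        p for p in Path(src_dir).iterdir()
        if p.is_file() and not p.name.startswith(".")
    )
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            # arcname drops the "libs/" prefix so the wheels sit at the zip root
            archive.write(path, arcname=path.name)
    return [p.name for p in files]


# Intended use, after `pip download -r requirements.txt -d libs` has been run
# and the awscli and boto3 wheels have been copied into libs:
#   zip_flat("libs", "boto3-depends.zip")
```

Keeping the wheels at the root of the archive matters because the placeholder code unzips into the working directory and then installs `*.whl` from there.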
Thanks,
Sarath
Edited by: SarathMohan on Nov 23, 2020 10:32 AM
Edited by: SarathMohan on Nov 23, 2020 10:35 AM
Edited by: SarathMohan on Nov 23, 2020 10:41 AM
In case anyone else ends up on this thread, Glue v2.0 simplified this process drastically with the addition of the --additional-python-modules parameter. See
https://docs.aws.amazon.com/glue/latest/dg/reduced-start-times-spark-etl-jobs.html
The following job parameter made the Athena client functions in boto3 v1.17.12 available without any extra code to modify the Python path.
"--additional-python-modules", "botocore>=1.20.12,boto3>=1.17.12"
This doesn't work for some reason...
After importing the newly updated boto3 library, and checking the version:
print(boto3.__version__)
"1.17.9" is printed
But when I try to access the list_data_catalogs() method:
res = athena.list_data_catalogs()
I receive the following error: "AttributeError: module 'boto3' has no attribute 'list_data_catalogs'"
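Two details in the report above may be worth checking; this is a reader's sketch, not a confirmed diagnosis. The printed version, 1.17.9, is lower than the 1.17.12 mentioned in the earlier reply, and the error message names the module 'boto3' rather than an Athena client object, which suggests the call may have been made on the module itself. The hypothetical helpers below compare dotted version strings numerically, since a plain string comparison gives the wrong answer here.

```python
def version_tuple(version):
    """Turn a dotted version string such as '1.17.9' into (1, 17, 9)."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed, required):
    """Return True if the installed version is at least the required one."""
    return version_tuple(installed) >= version_tuple(required)


print(meets_minimum("1.17.9", "1.17.12"))  # False: 9 < 12 when compared as integers
print("1.17.9" >= "1.17.12")               # True: string comparison is misleading
```

In a job script the same check could be written as `meets_minimum(boto3.__version__, "1.17.12")`, assuming the version string contains only numeric parts.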
Edited by: g-crmf on Feb 23, 2021 11:43 AM
The last version of boto3 that can run on Python 3.6 is 1.23.10: https://pypi.org/project/boto3/1.23.10/#files