Questions tagged with AWS Glue

Content language: English

Sort by most recent

Browse through the questions and answers listed below or filter and sort to narrow down your results.

Bug: Glue 4.0 | Python 3.10 | pandas library

I have been experimenting using Glue 4 which supports Python 3.10 and pandas. I am adding pandas as a zipped library through the `--extra-py-files` functionality for a `gluetl` job. When running my job, it fails importing pandas (version 1.4.3) (`import pandas as pd`) with the following which I copy-pasted from the cloudwatch logs: ``` 2022-12-06 16:49:09,450 ERROR [main] glue.ProcessLauncher (Logging.scala:logError(77)): Error from Python:Traceback (most recent call last): File "/tmp/database_monitoring.py", line 2, in <module> import pandas as pd File "/home/spark/.local/lib/python3.10/site-packages/pandas/__init__.py", line 48, in <module> from pandas.core.api import ( File "/home/spark/.local/lib/python3.10/site-packages/pandas/core/api.py", line 47, in <module> from pandas.core.groupby import ( File "/home/spark/.local/lib/python3.10/site-packages/pandas/core/groupby/__init__.py", line 1, in <module> from pandas.core.groupby.generic import ( File "/home/spark/.local/lib/python3.10/site-packages/pandas/core/groupby/generic.py", line 76, in <module> from pandas.core.frame import DataFrame File "/home/spark/.local/lib/python3.10/site-packages/pandas/core/frame.py", line 170, in <module> from pandas.core.generic import NDFrame File "/home/spark/.local/lib/python3.10/site-packages/pandas/core/generic.py", line 147, in <module> from pandas.core.describe import describe_ndframe File "/home/spark/.local/lib/python3.10/site-packages/pandas/core/describe.py", line 45, in <module> from pandas.io.formats.format import format_percentiles File "/home/spark/.local/lib/python3.10/site-packages/pandas/io/formats/format.py", line 105, in <module> from pandas.io.common import ( File "/home/spark/.local/lib/python3.10/site-packages/pandas/io/common.py", line 8, in <module> import bz2 File "/usr/local/lib/python3.10/bz2.py", line 17, in <module> from _bz2 import BZ2Compressor, BZ2Decompressor ModuleNotFoundError: No module named '_bz2' 2022-12-06 16:49:09,450 ERROR [main] glue.ProcessLauncher (Logging.scala:logError(77)): Error from Python:Traceback (most recent call last): File "/tmp/database_monitoring.py", line 2, in <module> import pandas as pd File "/home/spark/.local/lib/python3.10/site-packages/pandas/__init__.py", line 48, in <module> from pandas.core.api import ( File "/home/spark/.local/lib/python3.10/site-packages/pandas/core/api.py", line 47, in <module> from pandas.core.groupby import ( File "/home/spark/.local/lib/python3.10/site-packages/pandas/core/groupby/__init__.py", line 1, in <module> from pandas.core.groupby.generic import ( File "/home/spark/.local/lib/python3.10/site-packages/pandas/core/groupby/generic.py", line 76, in <module> from pandas.core.frame import DataFrame File "/home/spark/.local/lib/python3.10/site-packages/pandas/core/frame.py", line 170, in <module> from pandas.core.generic import NDFrame File "/home/spark/.local/lib/python3.10/site-packages/pandas/core/generic.py", line 147, in <module> from pandas.core.describe import describe_ndframe File "/home/spark/.local/lib/python3.10/site-packages/pandas/core/describe.py", line 45, in <module> from pandas.io.formats.format import format_percentiles File "/home/spark/.local/lib/python3.10/site-packages/pandas/io/formats/format.py", line 105, in <module> from pandas.io.common import ( File "/home/spark/.local/lib/python3.10/site-packages/pandas/io/common.py", line 8, in <module> import bz2 File "/usr/local/lib/python3.10/bz2.py", line 17, in <module> from _bz2 import BZ2Compressor, BZ2Decompressor ModuleNotFoundError: No module named '_bz2'``` I believe this is a bug in AWS Glue 4.0 as opposed to a user issue. Is anyone able to advise or confirm? And if so, is there a bug fix planned for this?
0
answers
0
votes
7
views
JDay
asked 7 hours ago