Issues with Deploying a Python AWS Lambda Function that Uses Pandas

0

I am currently working on an AWS Lambda function that requires the use of the Pandas library for processing CSV files. I understand that due to the large size of Pandas, it is not included in the standard AWS Lambda Python environment. Therefore, I have been attempting to create a deployment package with the necessary libraries.

Despite multiple attempts, I am encountering a consistent error message stating "Unable to import module 'lambda_function': No module named 'pandas'". This suggests that the Pandas module is not being included properly in the deployment package.

During the creation of the deployment package, I installed the required libraries in a virtual environment, packaged the libraries along with my Lambda function code, and then uploaded this as a .zip file to AWS Lambda. However, I am still facing the above-mentioned issue.

I am developing my function in Python 3.11.3 locally, and I understand that AWS Lambda now supports up to Python 3.10. Could the issue be arising due to the version mismatch between my local development environment and AWS Lambda's runtime?

I have also come across AWS Glue for Ray and wonder if it could assist with this situation. However, I am unclear on how it could be used to resolve the current issue.

Could you please provide guidance on how to successfully deploy an AWS Lambda function that uses Pandas? Any help regarding the proper structuring of the deployment package or insights on how to use AWS Glue for Ray in this context would be greatly appreciated.

  • Hi rePost-User-6379718, If this answer helped, please accept the answer for better community experience. Thank you.

asked 9 months ago, 2117 views
3 Answers
2

The typical way to build the layer is as follows:

mkdir myproject
cd myproject
virtualenv v-env
source ./v-env/bin/activate
pip install pandas
deactivate

# Now create the layer. The directory must be named "python" and nothing else.
mkdir python
cd python

# Check where the virtual environment keeps its packages (site-packages or dist-packages, under lib or lib64) and adjust the path in the next command to match.
cp -r ../v-env/lib64/python3.10/site-packages/* .
cd ..
zip -r pandas_layer.zip python

# Create the layer through the CLI or the console. Set the compatible runtime to the Python version used for the build (python3.10 here) and use python3.10 as the Lambda runtime, because the compiled parts of pandas and numpy only load on the Python version they were built for.
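Before uploading, it can help to confirm that the zip has the layout Lambda expects. The following is only a sketch using the standard library: the function name `check_layer_zip` and its defaults are made up for illustration, and it assumes a python3.10 runtime and the usual `.cpython-<version>-<platform>.so` naming of compiled modules.

```python
import re
import zipfile

# Matches compiled CPython extension modules,
# e.g. "python/pandas/_libs/lib.cpython-310-x86_64-linux-gnu.so".
_EXT_TAG = re.compile(r"\.cpython-(\d+)-([\w-]+)\.so$")


def check_layer_zip(zip_file, package="pandas", python_tag="310"):
    """Return a list of human-readable problems found in a Lambda layer zip.

    Hypothetical helper: `package` and `python_tag` are assumptions that
    should be changed to match the actual layer and runtime.
    """
    problems = []
    with zipfile.ZipFile(zip_file) as zf:
        names = zf.namelist()

    # Lambda extracts layers to /opt and puts /opt/python on sys.path,
    # so the package has to sit under "python/" at the root of the zip.
    if not any(n.startswith(f"python/{package}/") for n in names):
        problems.append(f"'{package}' not found under python/ at the zip root")

    # Compiled modules must match the runtime's Python version and be Linux builds.
    for n in names:
        m = _EXT_TAG.search(n)
        if m is None:
            continue
        tag, platform = m.groups()
        if tag != python_tag:
            problems.append(f"{n}: built for cpython-{tag}, expected cpython-{python_tag}")
        if "linux" not in platform:
            problems.append(f"{n}: built for {platform}, not Linux")
    return problems
```

Calling `check_layer_zip("pandas_layer.zip")` on the archive built above should return an empty list if the layout is right. A non-empty list points at either a wrong directory structure or a build made on the wrong Python version or operating system.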

A few pointers that may help in this case:

First: If you are building the pandas package on a Mac, build it on Ubuntu or another Linux distribution instead (this helps most of the time), because pip on macOS installs macOS builds of pandas and numpy, which the Linux-based Lambda runtime cannot load.

Second: Build the function and its package locally with Python 3.10, so that it matches the Lambda runtime, and then upload it to AWS.

Third: If you are familiar with CI/CD, the best option is to deploy with CodePipeline, where AWS creates a Docker environment and installs pandas along with its dependencies (pytz, numpy, tzdata) from a requirements.txt file in which you pin the version, for example pandas==1.2.3.

Lastly (best if you are not using CI/CD): Follow this post step by step: https://repost.aws/knowledge-center/lambda-python-function-layer. Deploy the Lambda function on its own, without pandas zipped into it, and then attach the layer created by following that post.
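To tell a version mismatch apart from a missing package, one option is to deploy a small diagnostic handler temporarily. This is a sketch rather than a recommended production handler: it imports pandas inside the function so that a failed import is reported in the response instead of stopping the module from loading.

```python
import importlib
import sys


def lambda_handler(event, context):
    """Report the runtime's Python version and whether pandas can be imported."""
    report = {
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        # Layers are extracted under /opt, so these entries show whether
        # the runtime can see any layer content at all.
        "layer_paths": [p for p in sys.path if p.startswith("/opt")],
    }
    try:
        pandas = importlib.import_module("pandas")
        report["pandas_version"] = pandas.__version__
    except ImportError as exc:
        # Return the error text instead of failing at module load time.
        report["pandas_error"] = str(exc)
    return report
```

If the response contains `pandas_error` with "No module named 'pandas'", the package is not on the path at all, which points to the zip or layer structure. An error that mentions numpy or C extensions instead suggests the package was built for a different Python version or operating system.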

AWS
EXPERT
answered 9 months ago
  • I was looking around for the best way of doing this, as I also faced some issues using pandas with Lambda functions. I followed the last option of using the AWS Serverless Application Repository Console and it worked absolutely fine. I wasn’t aware of this, thanks for sharing.

  • @alok, glad it helped.

  • @rePost-User-6379718, I am curious whether any of the options suggested here worked for you. If not, what issue are you facing?

  • @rePost-User-6379718, if it didn't work for you, let me know what the challenges are.

0

Hi,

Besides building the layer yourself, there are other ways to add Pandas to a Lambda function:

  1. Use the ARN of the prebuilt Pandas layer listed in this repository directly: https://github.com/keithrozario/Klayers/tree/master/deployments/python3.10
  2. Use the AWS-managed Lambda layer "AWSDataWrangler" (the library is now named AWS SDK for pandas)
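As a rough illustration of the first option, a layer ARN can be attached with boto3. The helper names `parse_layer_arn` and `attach_layer` are hypothetical, the ARN pattern only covers the standard `aws` partition, and the real ARN has to be copied from the repository above for your region.

```python
import re

# Layer version ARN: arn:aws:lambda:<region>:<account-id>:layer:<name>:<version>
_LAYER_ARN = re.compile(
    r"^arn:aws:lambda:(?P<region>[a-z0-9-]+):(?P<account>\d{12}):"
    r"layer:(?P<name>[A-Za-z0-9_-]+):(?P<version>\d+)$"
)


def parse_layer_arn(arn):
    """Split a layer version ARN into its parts, or raise ValueError."""
    match = _LAYER_ARN.match(arn)
    if match is None:
        raise ValueError(f"not a layer version ARN: {arn!r}")
    return match.groupdict()


def attach_layer(function_name, layer_arn):
    """Attach a single layer to an existing function (requires boto3 and AWS credentials)."""
    import boto3  # imported here so parse_layer_arn works without boto3 installed

    # A layer can only be used by functions in the same region.
    region = parse_layer_arn(layer_arn)["region"]
    client = boto3.client("lambda", region_name=region)
    # Layers replaces the function's whole layer list, so include any
    # layers that are already attached and should stay.
    return client.update_function_configuration(
        FunctionName=function_name,
        Layers=[layer_arn],
    )
```

A common mistake is to pass the ARN without the trailing version number. `parse_layer_arn` rejects that form before any AWS call is made.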

Thanks!

Himal
answered 9 months ago
0

Take a look at using a Lambda layer for pandas to see if it fits your use case. https://aws-sdk-pandas.readthedocs.io/en/stable/layers.html has a good summary of layers available.

AWS
Bit
answered 9 months ago
