How can I handle a long-running process (such as processing a 40 GB video file) using a Lambda function?

0

Hi There :-),

Context:

  1. Two files will be uploaded to S3:
     1.1. an AVI video file of around 40 GB
     1.2. a TXT file of around 100 MB
  2. Whenever an upload happens, I want to pick up both files and apply some transformation logic (Python code that takes these files as inputs, which I already have).
  3. So I created a Lambda function with the Python 3.10 runtime.

Explanation: Assume the upload is a 10 KB AVI file and a 7 MB TXT file. It triggers the Lambda function, and these are the steps I am following:

  1. Download the AVI file from S3 into the Lambda environment.
  2. Apply the transformation logic.
  3. Store the result in another path in the Lambda execution environment.
  4. The video stored in step 3 is picked up by a second transformation step, which produces a CSV file.

Now, when I upload a 1 GB AVI video file, the first transformation step takes more than 15 minutes, so the Lambda execution times out.

FYI: To store the large files (downloaded from S3 as described in step 1), I connected the Lambda function to EFS. That EFS file system is also mounted on an EC2 instance.
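
For readers following along, the download step described above might look roughly like the sketch below. It is only an illustration, not the asker's actual code: the mount path `/mnt/efs`, the `incoming` subdirectory, and the helper names are assumptions.

```python
import os
from urllib.parse import unquote_plus

EFS_MOUNT = "/mnt/efs"  # assumed mount path configured on the Lambda function


def s3_objects_from_event(event):
    """Return (bucket, key) pairs from an S3 event notification."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 events (a space becomes '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def efs_path_for(key, mount=EFS_MOUNT):
    """Map an S3 key to a file path under the EFS mount (hypothetical layout)."""
    return os.path.join(mount, "incoming", os.path.basename(key))


def handler(event, context):
    import boto3  # imported lazily so the helpers above work without boto3

    s3 = boto3.client("s3")
    downloaded = []
    for bucket, key in s3_objects_from_event(event):
        dest = efs_path_for(key)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        # download_file streams the object to disk instead of holding it in memory.
        s3.download_file(bucket, key, dest)
        downloaded.append(dest)
    return downloaded
```

Even with the files on EFS, the transformation itself still has to finish within the Lambda timeout.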

Questions:

  1. How do I run a long-running process using Lambda?
  2. I would like to increase the Lambda function timeout to 5 hours. Can the AWS team increase the timeout to 5 hours for me?
  3. Or are there any other ways to achieve this?

Thanks.

2 Answers
1

So, How do I get the long running process running using lambda?

Lambda cannot be used for processes that run for more than 15 minutes.
So I think we need to consider other AWS services.
Examples include AWS Batch and ECS.
We could also consider AWS Glue, which is suited for ETL and other processes.
https://docs.aws.amazon.com/batch/latest/userguide/what-is-batch.html
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/Welcome.html
https://docs.aws.amazon.com/glue/latest/dg/what-is-glue.html
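
As a rough illustration of the AWS Batch route, the Lambda function could stay as a thin trigger that only submits a job and returns immediately, while the long transformation runs in a container. This is a minimal sketch, not a definitive setup: the queue name, job definition name, and environment variable names are hypothetical and would have to exist in your account.

```python
import re
from urllib.parse import unquote_plus

JOB_QUEUE = "video-processing-queue"    # hypothetical Batch job queue
JOB_DEFINITION = "video-transform-job"  # hypothetical Batch job definition


def build_submit_job_args(bucket, key):
    """Build keyword arguments for the Batch submit_job call."""
    # Batch job names may contain only letters, numbers, hyphens, and underscores.
    safe = re.sub(r"[^A-Za-z0-9_-]", "-", key)[:100]
    return {
        "jobName": f"transform-{safe}",
        "jobQueue": JOB_QUEUE,
        "jobDefinition": JOB_DEFINITION,
        "containerOverrides": {
            # The container reads these to find its input (assumed convention).
            "environment": [
                {"name": "INPUT_BUCKET", "value": bucket},
                {"name": "INPUT_KEY", "value": key},
            ]
        },
    }


def handler(event, context):
    import boto3  # imported lazily so build_submit_job_args works without boto3

    batch = boto3.client("batch")
    job_ids = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        response = batch.submit_job(**build_submit_job_args(bucket, key))
        job_ids.append(response["jobId"])
    return job_ids
```

With this split, the 15-minute limit applies only to the trigger, which finishes in seconds.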

I would like to increase the timeout of lambda function to 5Hrs. Can I get help from AWS team and get the timeout increased to 5Hrs?

Unfortunately, no. The 15-minute maximum is a hard limit on Lambda, and contacting AWS Support will not get it raised.

Or is there any other ways to achieve this?

Please consider services other than Lambda here as explained above.
In this case, you may want to use AWS Glue since you already have the Python code.
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python.html

Expert
Answered 1 year ago
  • Hi Riku Kobayashi, Thank you for your input.

    I need two clarifications about using custom libraries in Glue.

    Context: I am currently exploring Glue and how well it would fit my requirement. For the data transformation logic I mentioned earlier, I need four Python packages:

    1. ffmpeg-python
    2. pandas
    3. opencv-python
    4. numpy

    Issue with the pandas library:

    1. In the Glue job details -> Advanced properties -> "Python library path" section, click "Info" next to the title.
    2. That info section clearly states: "Only pure Python libraries can be used. Libraries that rely on C extensions, such as pandas (Python data analysis) library, are not yet supported."
    3. However, I need to use pandas in the data transformation process.

    Question: Could you please advise how I can handle this scenario?

    Issue with the ffmpeg-python library: I am unable to import ffmpeg-python in the Glue code, even after linking the package from S3.

    Steps I took to link the package to Glue:

    1. As suggested in online articles, I installed ffmpeg-python into a local directory using the "pip3 install --target . ffmpeg-python" command.
    2. Then I zipped this directory and uploaded it to S3.
    3. I pasted the S3 URI of the zipped library into the Python library section of the Glue configuration.
    4. Then I added "import ffmpeg" to the Python code.

    But I get "ModuleNotFoundError: No module named 'ffmpeg'" when I run the Glue job.

    Question: Are there any steps I missed?

  • I believe AWS Glue 4.0 can use pandas. https://aws.amazon.com/about-aws/whats-new/2022/11/introducing-aws-glue-4-0/?nc1=h_ls
    The import error is odd, because you appear to have followed the procedure in this document. https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-libraries.html#aws-glue-programming-python-libraries-zipping
    By the way, what command did you use to zip it?
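
One thing worth checking for the zipping question is the archive layout. Python can only import a package from a zip if the package directory sits at the root of the archive, so zipping the parent folder (producing `somedir/ffmpeg/...`) is a plausible cause of the `ModuleNotFoundError`. The sketch below is a local diagnostic only; the helper names are made up for illustration, and it is an assumption that this is what went wrong here.

```python
import zipfile


def top_level_entries(zip_path):
    """Return the sorted top-level names inside the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return sorted({name.split("/", 1)[0] for name in zf.namelist() if name})


def module_at_root(zip_path, module="ffmpeg"):
    """True if `module` is importable from the zip root as a package or a file."""
    with zipfile.ZipFile(zip_path) as zf:
        names = set(zf.namelist())
    return f"{module}/__init__.py" in names or f"{module}.py" in names
```

If `module_at_root("libs.zip")` returns False while `top_level_entries` shows a single wrapper folder, re-zipping from inside that folder should put `ffmpeg/` at the root. Note also that ffmpeg-python is a wrapper around the `ffmpeg` executable, so the binary itself must be available in the job environment even once the import works.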

0

As stated, Lambda functions can only run for up to 15 minutes. There is no way to extend that.

Your options are:

  1. Use a different service such as ECS Fargate.
  2. Break the processing into multiple smaller chunks (not always possible). For example, split the large file into a few smaller files, and have a Lambda function process each chunk. You can use Step Functions to orchestrate the entire process.
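
To make the chunking option concrete, here is a small sketch of how the split could be planned by time rather than by bytes. The function names and the segment length are assumptions; the resulting list is the kind of input a Step Functions Map state could fan out over, one Lambda invocation per segment.

```python
def plan_segments(duration_s, segment_s):
    """Split a video of duration_s seconds into segments of at most segment_s."""
    if duration_s <= 0 or segment_s <= 0:
        raise ValueError("duration_s and segment_s must be positive")
    segments = []
    start = 0
    index = 0
    while start < duration_s:
        # The last segment may be shorter than segment_s.
        length = min(segment_s, duration_s - start)
        segments.append({"index": index, "start": start, "length": length})
        start += segment_s
        index += 1
    return segments


def ffmpeg_split_command(src, dst, start, length):
    """Build an ffmpeg command that copies one segment without re-encoding."""
    return [
        "ffmpeg",
        "-ss", str(start),   # seek to the segment start
        "-t", str(length),   # limit the segment duration
        "-i", src,
        "-c", "copy",        # stream copy, so cuts land on keyframes
        dst,
    ]
```

For example, `plan_segments(25, 10)` yields three segments starting at 0, 10, and 20 seconds. Whether the transformation logic gives the same result on independent segments depends on the logic itself, which is why this approach is not always possible.
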
Uri (AWS)
Expert
Answered 1 year ago
