requests.get() is not working for a few specific websites in AWS Lambda (Python)

0

I am trying to call requests.get() for a few URLs. The code runs perfectly on my local system (PyCharm, Jupyter Notebook) and in Google Colab. Only when I run it in an AWS Lambda function does it work for some URLs and throw a timeout exception for others. The URLs are:

'https://www.google.com/', 'https://stackoverflow.com/', 'https://www.nseindia.com/', 'https://www.makemytrip.com/'

The code I've written is:

import json
import requests

def lambda_handler(event, context):
    url1 = 'https://www.google.com/'
    url2 = 'https://stackoverflow.com/'
    url3 = 'https://www.nseindia.com/'
    url4 = 'https://www.makemytrip.com/'
    header = {
        'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36'
    }
    # Fetch the page without following redirects and print each cookie it sets.
    response = requests.get(url=url3, headers=header, allow_redirects=False, timeout=60)
    for cookie in response.cookies:
        print(cookie.name + "==>>", cookie.value)
    return

The code works for url1 and url2 but not for url3 and url4, and only in AWS Lambda. All four URLs work in Google Colab and Jupyter Notebook.

asked 7 months ago, 324 views
2 Answers
1

Hi,

The timeout you got could indicate that the site you are trying to reach has a firewall configured. There are many ways to block traffic, such as by IP address or network, by user agent (I see you are supplying a custom one), by country, and many others. You mentioned that the code works from other environments, but none of them run on AWS.
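
One way to narrow this down is to check whether the connection fails before any HTTP headers (including the User-Agent) are sent. The sketch below attempts a bare TCP connection with a short timeout using only the standard library. The helper name `probe_tcp` and its labels are hypothetical, chosen for illustration. If it reports "timed out" from Lambda but "connected" elsewhere, the block is likely at the IP or network level. If it reports "connected" but the HTTP request still hangs, the filtering probably happens after the connection is made.

```python
import socket


def probe_tcp(host, port=443, timeout=5.0):
    """Return a short label describing how a TCP connection attempt ended.

    Hypothetical helper: it only opens and closes a socket, so no HTTP
    request (and no User-Agent header) is ever sent.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.timeout:
        # Packets were silently dropped, which is typical of a firewall.
        return "timed out"
    except ConnectionRefusedError:
        # The host answered, but nothing accepts connections on this port.
        return "refused"
    except OSError as exc:
        # DNS failures and other network errors end up here.
        return f"error: {exc}"


# Example call inside lambda_handler (host taken from the question):
# print(probe_tcp("www.nseindia.com"))
```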

EXPERT
answered 7 months ago
  • That's correct. But if I want to extract cookies from such websites using an AWS cloud solution, what are the possible options?

0

If I recall correctly, the default timeout on new Lambda functions is 3 seconds. Since the request works for the first two URLs, try raising your function's timeout to something higher, such as 5 minutes, and rerun it.

https://docs.aws.amazon.com/lambda/latest/dg/configuration-function-common.html#configuration-timeout-console
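
A related point: if the requests timeout (60 seconds in the question) is longer than the function timeout, Lambda stops the function before requests can raise its own exception. A minimal sketch of one way to avoid that is to derive the request timeout from the time the invocation has left, using the `get_remaining_time_in_millis()` method of the Lambda context object. The helper name `request_timeout_for` and the default margin and cap are assumptions made for illustration.

```python
def request_timeout_for(context, safety_margin_s=2.0, max_timeout_s=60.0):
    """Pick an HTTP timeout that expires before the Lambda invocation does.

    `context` is the object Lambda passes to the handler. The margin and
    cap are illustrative defaults, not recommended values.
    """
    remaining_s = context.get_remaining_time_in_millis() / 1000.0
    # Leave a margin so the handler can still log the failure and return.
    return max(0.1, min(max_timeout_s, remaining_s - safety_margin_s))


# Example use inside lambda_handler:
# timeout = request_timeout_for(context)
# response = requests.get(url3, headers=header, timeout=timeout)
```

With this in place, a blocked URL surfaces as a requests timeout exception in your own code rather than as the function being terminated.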

answered 7 months ago
  • I set it to 15 minutes, but there is still no response for url3 and url4. On my local machine (Jupyter Notebook) or in Google Colab, it usually takes 1 or 2 seconds to respond.

  • Try setting the max retries to 5. If that does not work, you'll have to dig into the CloudWatch logs. Perhaps there is more information there about why the function is timing out.

  • The site is blocking AWS IP addresses. Running curl https://www.makemytrip.com/ from an EC2 instance does not work either.
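
The retry suggestion in the comments above can be sketched as a small wrapper that retries a call and prints each failure, so the reason shows up in the function's CloudWatch logs. The helper `call_with_retries` and its parameters are hypothetical and written for illustration only. Note that if the site is dropping traffic from AWS IP ranges, as the last comment concludes, retries will not make the request succeed. They only make the failure easier to see.

```python
import time


def call_with_retries(func, attempts=5, base_delay_s=0.5, retry_on=(Exception,)):
    """Call func() up to `attempts` times with exponential backoff.

    Hypothetical helper: returns the first successful result, or re-raises
    the last exception once every attempt has failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_exc = None
    for attempt in range(attempts):
        try:
            return func()
        except retry_on as exc:
            last_exc = exc
            # Printed output from a Lambda function goes to CloudWatch Logs.
            print(f"attempt {attempt + 1}/{attempts} failed: {exc!r}")
            if attempt < attempts - 1:
                time.sleep(base_delay_s * 2 ** attempt)
    raise last_exc


# Example use with the code from the question (a short per-request timeout
# keeps five attempts within the function's time limit):
# response = call_with_retries(
#     lambda: requests.get(url3, headers=header, timeout=10),
#     retry_on=(requests.exceptions.RequestException,),
# )
```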
