
Selenium TimeoutException when using Docker on EC2 (instance type: t3.small)


I am scraping a website using Selenium. The problem I'm facing is that the Python script works correctly on my local machine, but when I dockerize it and run it on my EC2 instance, it raises a TimeoutException. The timeout occurs specifically here:

WebDriverWait(driver, 20).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "tr.group.text-sm.text-gray-300"))
)

The page loads only partially, and the part that never loads is the one containing this element. I verified this by taking screenshots of the page. I have also added a sleep timer of up to 200 seconds, but the issue still persists. I am using headless Chrome.
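For context, WebDriverWait is essentially a polling loop, so raising its timeout (even to 200 s) cannot help if the element is never rendered at all. A minimal stdlib sketch of the same behaviour (`poll_until` is a hypothetical name, not Selenium's API):

```python
import time

def poll_until(condition, timeout=20.0, poll=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` elapses.

    This mirrors what WebDriverWait does internally: the TimeoutException
    only fires when the condition never becomes truthy, so a longer timeout
    cannot fix an element the page never renders.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)
```

In other words, the question to answer is why the row never appears in the headless/container environment, not how long to wait for it.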

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager

chrome_options = Options()
chrome_options.add_argument("--headless=new")
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--lang=en-US")
chrome_options.add_argument("--window-size=1200,842")
# Note: this pinned Chrome/79 user-agent is far older than the installed
# Chrome; some sites serve reduced or no content to mismatched user-agents.
chrome_options.add_argument(
    "--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36")
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument("--disable-dev-shm-usage")
driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()), options=chrome_options)
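Class names like text-gray-300 suggest the rows are rendered client-side by JavaScript, so it is worth checking whether the row ever reaches the DOM at all. A hedged diagnostic sketch (`target_row_present` is a hypothetical helper; `goog:loggingPrefs` is the standard ChromeDriver logging capability):

```python
# Hypothetical diagnostic helper: scan driver.page_source for the row's
# class tokens instead of waiting blind on the CSS selector.
def target_row_present(html: str) -> bool:
    # tr.group.text-sm.text-gray-300 matches a <tr> whose class attribute
    # carries all three tokens; a substring check is enough for a quick look.
    return all(token in html for token in ("group", "text-sm", "text-gray-300"))

# Usage sketch against the driver created above (not executed here):
#   chrome_options.set_capability("goog:loggingPrefs", {"browser": "ALL"})
#   driver.get(url)
#   print(target_row_present(driver.page_source))
#   for entry in driver.get_log("browser"):      # JS console errors often
#       print(entry["level"], entry["message"])  # explain a half-rendered page
```

If the tokens never show up in page_source, the browser console log is usually the fastest way to see which script or network request is failing inside the container.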

Dockerfile

FROM python:3.10.4
WORKDIR /app
COPY . /app
RUN pip install --trusted-host pypi.python.org -r requirements.txt
RUN apt-get update && apt-get install -y wget unzip && \
    wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb && \
    apt-get install -y ./google-chrome-stable_current_amd64.deb && \
    rm google-chrome-stable_current_amd64.deb && \
    apt-get clean

CMD ["python", "gecko_scraper.py"]
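One thing worth ruling out on a small instance: --disable-dev-shm-usage makes Chrome write to /tmp instead of the container's 64 MB /dev/shm, but an alternative is to enlarge the shared-memory mount at run time (image name here is a placeholder):

```shell
# Give the container a larger /dev/shm so headless Chrome has room to render.
docker run --shm-size=2g gecko-scraper
```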

Error:

 WebDriverWait(driver, 200).until(
  File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/support/wait.py", line 95, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: 
Stacktrace:
#0 0x55f6851a8e3a <unknown>
#1 0x55f684e9245c <unknown>
#2 0x55f684ede5b5 <unknown>
#3 0x55f684ede671 <unknown>
#4 0x55f684f22f14 <unknown>
#5 0x55f684f014dd <unknown>
#6 0x55f684f202cc <unknown>
#7 0x55f684f01253 <unknown>
#8 0x55f684ed11c7 <unknown>
#9 0x55f684ed1b3e <unknown>
......

P.S. I have checked other questions but couldn't find the answer, hence asking.

  • What happens if you run the docker version on your local machine?

  • I'm using an M2. The Selenium requirements for Docker on my machine differ from the amd64 arch (e.g., google-chrome-stable, etc.). For this reason, the Dockerfile doesn't build successfully on my laptop, since I have specified amd64 packages in the file, which are meant to run only on amd64 machines. I have run the script standalone, and that works perfectly fine.

  • On the M2 it's Docker version 24.0.5, build ced0996600; on x64 Windows it's Docker version 26.1.1, build 4cf5afa.
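For what it's worth, an amd64 image can usually be built and run on Apple Silicon via QEMU emulation by pinning the platform, which would let the container be tested locally before deploying to EC2 (image tag is a placeholder):

```shell
# Build and run the amd64 image on an arm64 host through emulation.
docker build --platform linux/amd64 -t gecko-scraper .
docker run --platform linux/amd64 gecko-scraper
```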

asked 2 years ago · 975 views
1 Answer
AWS
EXPERT
answered 2 years ago
  • The issue still persists! The problem is not the process freezing; the problem is that the page only partially loads, and the part that doesn't load contains the element that needs to be scraped.
