I am scraping a website using Selenium. The Python script works correctly on my local machine, but when I dockerize it and run it on my EC2 instance, it raises a TimeoutException. The timeout occurs specifically here:
WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.CSS_SELECTOR, "tr.group.text-sm.text-gray-300")))
The page loads only partially, and this is the part that doesn't load at all. I confirmed this by taking screenshots of the page.
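To compare what the container actually receives with the local run, the screenshot check can be extended to also save the DOM and a few browser properties at the moment of the timeout. This is a minimal sketch, assuming `driver` is the Selenium WebDriver from this script; `dump_page_state`, `out_dir` and `label` are hypothetical names and not part of Selenium.

```python
from pathlib import Path


def dump_page_state(driver, out_dir="debug", label="timeout"):
    """Save what the browser currently shows so two environments can be compared."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    html_path = out / f"{label}.html"
    png_path = out / f"{label}.png"

    # page_source is the current DOM as serialized by the browser.
    html_path.write_text(driver.page_source, encoding="utf-8")
    # save_screenshot writes a PNG of the current viewport.
    driver.save_screenshot(str(png_path))

    return {
        # A redirect or an unexpected title can hint at a block or challenge page.
        "url": driver.current_url,
        "title": driver.title,
        # The user agent as the page's JavaScript sees it.
        "user_agent": driver.execute_script("return navigator.userAgent"),
        "html": str(html_path),
        "screenshot": str(png_path),
    }
```

It could be called from an `except TimeoutException:` block wrapped around the `WebDriverWait(...).until(...)` call, and the saved HTML from EC2 can then be diffed against one captured locally.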
I have also added a sleep timer of up to 200 seconds, but the issue persists. I am using headless Chrome:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager

chrome_options = Options()
chrome_options.add_argument("--headless=new")
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--lang=en_US")
chrome_options.add_argument("--window-size=1200,842")
chrome_options.add_argument(
    "--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36")
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument("--disable-dev-shm-usage")
# webdriver_manager downloads a chromedriver build for the installed Chrome
driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()), options=chrome_options)
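One detail in these options that may be worth ruling out: the `--user-agent` value claims Chrome 79, while the Dockerfile installs the current stable Chrome. Whether the site reacts to that mismatch is an assumption and not something confirmed here. The sketch below only measures the gap; `chrome_major` and `ua_version_gap` are hypothetical helper names, and the version string in the usage note is an illustrative value, not taken from the actual container.

```python
import re


def chrome_major(text):
    """Return the major version following 'Chrome/' in text, or None if absent."""
    # Also matches 'HeadlessChrome/<version>' because the search is unanchored.
    match = re.search(r"Chrome/(\d+)\.", text)
    return int(match.group(1)) if match else None


def ua_version_gap(spoofed_user_agent, browser_version):
    """Real browser major version minus the major version the user agent claims."""
    claimed = chrome_major(spoofed_user_agent)
    actual = int(browser_version.split(".")[0])
    return None if claimed is None else actual - claimed
```

With Selenium 4 the real version should be available as `driver.capabilities["browserVersion"]`, so `ua_version_gap(ua_string, driver.capabilities["browserVersion"])` gives the difference. For an illustrative browser version of `"124.0.6367.91"` and the user agent above, the gap would be 45 major versions.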
Dockerfile
FROM python:3.10.4
WORKDIR /app
COPY . /app
RUN pip install --trusted-host pypi.python.org -r requirements.txt
RUN apt-get update && apt-get install -y wget unzip && \
    wget http://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb && \
    apt install -y ./google-chrome-stable_current_amd64.deb && \
    rm google-chrome-stable_current_amd64.deb && \
    apt-get clean
CMD ["python", "gecko_scraper.py"]
Error -
WebDriverWait(driver, 200).until(
File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/support/wait.py", line 95, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
Stacktrace:
#0 0x55f6851a8e3a <unknown>
#1 0x55f684e9245c <unknown>
#2 0x55f684ede5b5 <unknown>
#3 0x55f684ede671 <unknown>
#4 0x55f684f22f14 <unknown>
#5 0x55f684f014dd <unknown>
#6 0x55f684f202cc <unknown>
#7 0x55f684f01253 <unknown>
#8 0x55f684ed11c7 <unknown>
#9 0x55f684ed1b3e <unknown>
......
P.S. I have checked other questions but couldn't find an answer, hence this question.
What happens if you run the docker version on your local machine?
I'm using an M2 Mac. The Selenium requirements for Docker on my machine differ from those for the amd64 architecture (e.g., google-chrome-stable). Because the Dockerfile specifies amd64 packages, which are meant to run only on amd64 machines, it doesn't run successfully on my laptop. Running the script standalone (outside Docker) works perfectly fine.
On the M2 it's Docker version 24.0.5, build ced0996600; on x64 Windows it's Docker version 26.1.1, build 4cf5afa.
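On the architecture point: Docker's `build` and `run` commands accept a `--platform` flag, so an amd64 image can in principle be built and run under emulation on an ARM host such as the M2. This is a sketch under assumptions: the image tag `gecko-scraper` and the helper name `docker_commands` are hypothetical, and emulation is slower, so Chrome may not behave exactly as it does on a native amd64 machine.

```python
import platform


def docker_commands(tag="gecko-scraper", target_platform="linux/amd64", machine=None):
    """Build docker argv lists, pinning the image platform on non-amd64 hosts."""
    machine = (machine or platform.machine()).lower()
    # x86_64 / amd64 hosts already match the image; other hosts need the flag.
    needs_flag = machine not in ("x86_64", "amd64")
    platform_args = ["--platform", target_platform] if needs_flag else []

    build = ["docker", "build", *platform_args, "-t", tag, "."]
    run = ["docker", "run", "--rm", *platform_args, tag]
    return build, run
```

Each list can be passed to `subprocess.run(cmd, check=True)`. If the timeout reproduces locally in the emulated container, the cause is more likely in the image or the Chrome flags than in the EC2 network.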