Airflow webserver not installing python requirements

2

The Airflow 2.2.2 webserver on MWAA is not installing the packages in requirements.txt. The packages install just fine on the workers and scheduler, but not on the webserver, where pip fails with a connection timeout error.

Here's the output from the CloudWatch requirements_install_* log stream for the webserver:

WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7ff05a174050>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/apache-airflow-providers-mysql/
ERROR: Could not find a version that satisfies the requirement apache-airflow-providers-mysql==2.1.1 (from versions: none)

I didn't run into this issue when I set up MWAA in my dev VPC, so I'm not sure what's going on here. I've run the MWAA verify environment support tool, but it didn't flag any configuration issues, and now I'm at my wit's end.

Any insight would be greatly appreciated!

asked 2 years ago · 4295 views
5 Answers
1

Outbound internet access was removed from the private web server option.

If you wish to continue to install requirements from public repositories such as http://pypi.org on the webserver that is private, you may do so by downloading and packaging Python WHL files in plugins.zip, or changing your webserver to public.
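The wheel-packaging route mentioned above could be sketched roughly as follows. This is a minimal sketch, not an official procedure: the helper names (download_wheels, build_plugins_zip, offline_requirements) are hypothetical, and the /usr/local/airflow/plugins path and the --find-links / --no-index lines reflect my understanding of how MWAA unpacks plugins.zip, so check them against the current AWS documentation. The wheels also have to match the Python version and platform of the MWAA environment, which a plain pip download on a different machine does not guarantee.

```python
from __future__ import annotations

import subprocess
import sys
import zipfile
from pathlib import Path


def download_wheels(requirements: Path, wheel_dir: Path) -> None:
    """Download distributions for every requirement (run where PyPI is reachable)."""
    wheel_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        [sys.executable, "-m", "pip", "download",
         "-r", str(requirements), "-d", str(wheel_dir)],
        check=True,
    )


def build_plugins_zip(wheel_dir: Path, plugins_zip: Path) -> list[str]:
    """Store every .whl file at the top level of plugins.zip; return the names."""
    wheels = sorted(wheel_dir.glob("*.whl"))
    with zipfile.ZipFile(plugins_zip, "w", zipfile.ZIP_DEFLATED) as archive:
        for wheel in wheels:
            archive.write(wheel, arcname=wheel.name)
    return [wheel.name for wheel in wheels]


def offline_requirements(
    pinned: list[str],
    plugins_path: str = "/usr/local/airflow/plugins",  # assumed extraction path
) -> str:
    """Build requirements.txt content that resolves only from the bundled wheels."""
    lines = [f"--find-links {plugins_path}", "--no-index", *pinned]
    return "\n".join(lines) + "\n"
```

A typical run would call download_wheels(Path("requirements.txt"), Path("wheels")), then build_plugins_zip(Path("wheels"), Path("plugins.zip")), upload plugins.zip to the environment's S3 bucket, and replace requirements.txt with the text returned by offline_requirements(["apache-airflow-providers-mysql==2.1.1"]).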

answered 2 years ago
  • Is there any way to enable this? It seems like a regression to me, because in many disconnected environments folks host their own PyPI repositories. We already have a cordoned-off MWAA instance behind an AWS network firewall and are explicitly allowing outbound connections to our private PyPI repository.

  • Was this mentioned in a changelog somewhere? And if so, was it a security bug that our previous deployment, which had the Private network option enabled, was able to install remote dependencies over the Internet?

0

It turns out the issue may have been due to the webserver being in private mode. After switching it to public mode, the package installation succeeded. I think that when private network mode is selected for the Airflow webserver, the AWS-managed service VPC in which it runs does not allow outbound Internet access, hence the connection timeouts.
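To confirm which access mode an environment is actually using, something like the sketch below may help. It assumes boto3 is installed and AWS credentials are configured; the environment name "my-environment" is a placeholder, and describe_access_mode is a hypothetical helper that only interprets the WebserverAccessMode field returned by the MWAA get_environment call.

```python
from __future__ import annotations


def describe_access_mode(environment: dict) -> str:
    """Summarise the webserver access mode from a get_environment response."""
    mode = environment.get("WebserverAccessMode", "UNKNOWN")
    if mode == "PRIVATE_ONLY":
        return "private: expect pip on the webserver to time out reaching pypi.org"
    if mode == "PUBLIC_ONLY":
        return "public: the webserver should be able to reach pypi.org"
    return f"unrecognised access mode: {mode}"


def fetch_environment(name: str) -> dict:
    """Fetch the environment description (needs boto3 and AWS credentials)."""
    import boto3  # third-party; imported here so the helper above stays stdlib-only

    return boto3.client("mwaa").get_environment(Name=name)["Environment"]


if __name__ == "__main__":
    # "my-environment" is a placeholder name, not taken from this thread.
    print(describe_access_mode(fetch_environment("my-environment")))
```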

answered 2 years ago
  • Hi, we are trying to set up MWAA in private mode, and we are seeing the same issue with Python dependency installation. pypi.org is accessible from other EC2 instances on the same subnet, so there should not be any network or firewall issues. Does anyone have any idea what might be wrong, or what other configuration may be needed to make this work? Happy to provide additional details if required.
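For reference, a reachability probe like the one described in this comment could look like the sketch below. can_connect is a hypothetical helper, and the 15-second default simply mirrors the connect timeout in the pip log. Keep in mind that, per the accepted explanation, a private webserver has no outbound Internet access, so a successful probe from an EC2 instance in your own subnet does not necessarily show that the webserver itself can reach pypi.org.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 15.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False


if __name__ == "__main__":
    print("pypi.org reachable:", can_connect("pypi.org"))
```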

0

I have the same problem: MWAA wouldn't fully update after I updated the requirements.txt file. My webserver access is set to public, but it still doesn't work!

Keri
answered 9 months ago
0

Perhaps we don't need the packages installed on the Airflow web server, only on the workers and scheduler, in which case the pip install timeouts on the web server could probably be ignored unless we need some Airflow UI plugins installed. Has anyone verified this? Does the web server also need all the packages in the requirements.txt file installed? Surely the scheduler and workers do, but the web server?

Dan
answered 6 months ago
0

Faced the same issue with version 2.2.2. After investigating further, I found that the webserver uses python -m pip install -r requirements.txt, while the rest (worker and scheduler) use pip install -r requirements.txt.

Basudev
answered 6 months ago
