Airflow webserver not installing python requirements


The Airflow 2.2.2 webserver on MWAA is not installing the packages in requirements.txt. The packages install fine on the workers and scheduler, but not on the webserver, where pip fails with a connection timeout error.

Here's the output from the CloudWatch requirements_install_* log stream for the webserver:

WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7ff05a174050>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/apache-airflow-providers-mysql/
ERROR: Could not find a version that satisfies the requirement apache-airflow-providers-mysql==2.1.1 (from versions: none)

I didn't run into this issue when I set up MWAA in my dev VPC, so I'm not sure what's going on here. I've run the MWAA verify environment support tool, but it didn't flag any configuration issues, and now I'm at my wit's end.

Any insight would be greatly appreciated!

asked 2 years ago · 4366 views
5 Answers

Outbound internet access was removed from the private web server option.

If you wish to keep installing requirements from public repositories such as http://pypi.org on a private webserver, you can do so by downloading the Python wheel (.whl) files and packaging them in plugins.zip, or by changing your webserver access mode to public.
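A minimal sketch of the plugins.zip route, assuming you have already downloaded the wheels on a machine with internet access (for example with pip download -r requirements.txt -d wheels/, passing --platform and --python-version values that match your MWAA environment). The extraction path /usr/local/airflow/plugins and the helper name build_offline_bundle are assumptions for illustration; check the MWAA documentation for your Airflow version before relying on them.

```python
from __future__ import annotations

import zipfile
from pathlib import Path

# Assumption: MWAA extracts plugins.zip into this directory; verify for your version.
PLUGINS_DIR_ON_MWAA = "/usr/local/airflow/plugins"


def build_offline_bundle(wheel_dir: str, requirements_text: str, out_dir: str) -> tuple[Path, Path]:
    """Hypothetical helper: zip pre-downloaded wheels into plugins.zip and write a
    requirements.txt that makes pip install only from those wheels."""
    wheels = sorted(Path(wheel_dir).glob("*.whl"))
    if not wheels:
        raise ValueError(f"no .whl files found in {wheel_dir}")

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    zip_path = out / "plugins.zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for whl in wheels:
            # Store each wheel at the zip root so it lands directly in the plugins directory.
            zf.write(whl, arcname=whl.name)

    # Keep the original requirement lines, but point pip at the local wheels
    # and forbid it from contacting any package index (e.g. pypi.org).
    original = [line for line in requirements_text.splitlines() if line.strip()]
    lines = [f"--find-links {PLUGINS_DIR_ON_MWAA}", "--no-index", *original]

    req_path = out / "requirements.txt"
    req_path.write_text("\n".join(lines) + "\n")
    return zip_path, req_path
```

Upload the resulting plugins.zip and requirements.txt to the environment's S3 bucket as usual. With --no-index, pip should not try to reach pypi.org at all, so the private webserver's lack of outbound access should no longer matter.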

answered 2 years ago
  • Is there any way to enable this? It seems like a regression to me because in many disconnected environments, folks host their own PyPI repositories. We already have a cordoned off MWAA instance behind an AWS network firewall and are explicitly allowing outbound connection to our private PyPI repository.

  • Was this mentioned in a changelog somewhere? And if so, was it a security bug that our previous deployment, which had the private network option enabled, was still able to install remote dependencies over the Internet?


It turns out the issue was likely due to the webserver being in private mode. After switching it to public mode, the package installation succeeded. I think that when private network mode is selected for the Airflow webserver, the AWS-managed service VPC in which it runs does not allow outbound Internet access, hence the connection timeouts.
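To check whether a given host can open the outbound connection pip needs, here is a minimal sketch (the name can_reach is hypothetical) that attempts a TCP connection to pypi.org on port 443 with the same 15-second connect timeout shown in the pip log. It only tests the network path of the machine it runs on: if the private webserver runs in an AWS-managed VPC, as suggested above, a successful check from an EC2 instance in your own subnet does not prove the webserver can reach pypi.org.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 15.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    The 15-second default mirrors pip's "connect timeout=15" seen in the log.
    """
    try:
        # create_connection resolves the name and tries each returned address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Timeouts, DNS failures (socket.gaierror) and refused connections
        # are all subclasses of OSError.
        return False
```

For example, can_reach("pypi.org") should return True on a host with outbound HTTPS access, and False after up to 15 seconds on a host where, as in the log above, the connection times out.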

answered 2 years ago
  • Hi, we are trying to set up MWAA in private mode, and the same issue is happening for us with Python dependency installation. pypi.org is accessible from other EC2 instances on the same subnet, so there should not be any network/firewall issues. Does anyone have any idea what might be wrong, or what other configuration may be needed to make this work? Happy to provide additional details if required.


I have the same problem: MWAA wouldn't fully update after I updated the requirements.txt file. My webserver access is set to public, but it still doesn't work!

Keri
answered 10 months ago

Perhaps we don't need the packages installed on the Airflow webserver at all, only on the workers and schedulers, in which case the pip install timeouts on the webserver could probably be ignored, unless we need some Airflow UI plugins installed. Has anyone verified this? Does the webserver also need all the packages in the requirements.txt file installed? The scheduler and workers surely do, but the webserver?

Dan
answered 6 months ago

I faced the same issue with version 2.2.2. After investigating further, I found that the webserver uses python -m pip install -r requirements.txt, while the other components (worker and scheduler) use pip install -r requirements.txt.
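The two invocations are not always equivalent: a bare pip runs whichever pip script comes first on PATH, which may belong to a different interpreter than the one python -m pip uses. A minimal sketch (all helper names are hypothetical) that reports which pip version, install location and Python version each form resolves to on a given machine. Whether this difference explains the MWAA failure is not established in this thread, since the log above points to a network timeout.

```python
from __future__ import annotations

import re
import subprocess
import sys
from typing import Optional

# `pip --version` prints e.g. "pip 21.3.1 from /usr/lib/python3.7/site-packages/pip (python 3.7)"
_PIP_VERSION_RE = re.compile(r"^pip (\S+) from (.+) \(python ([\d.]+)\)$")


def parse_pip_version(output: str) -> Optional[dict[str, str]]:
    """Extract pip version, install location and Python version from `pip --version`."""
    match = _PIP_VERSION_RE.match(output.strip())
    if match is None:
        return None
    return {"pip": match.group(1), "location": match.group(2), "python": match.group(3)}


def pip_identity(command: list[str]) -> Optional[dict[str, str]]:
    """Run `<command> --version`; return the parsed result, or None if it fails."""
    try:
        result = subprocess.run(
            [*command, "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # command missing, or pip not installed for that interpreter
    return parse_pip_version(result.stdout)


def compare_pip_invocations() -> tuple[Optional[dict[str, str]], Optional[dict[str, str]]]:
    """Return the identities of (current interpreter's -m pip, bare pip on PATH)."""
    via_module = pip_identity([sys.executable, "-m", "pip"])
    via_path = pip_identity(["pip"])
    return via_module, via_path
```

If the two results report different location or python values, the bare pip is installing into a different environment than python -m pip.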

Basudev
answered 6 months ago
