Installing a CodeArtifact library on MWAA PythonVirtualenvOperator

0

Hi, I'm creating the PythonVirtualenvOperator like this:

virtualenv_task = PythonVirtualenvOperator(
    task_id="virtualenv_python",
    dag=dag,
    op_args=redshift_con,
    python_callable=callable_virtualenv,
    requirements=["pandas==1.4.4", "numpy==1.23.5","my-library==1.0"],
    system_site_packages=False,
)

How would you configure CodeArtifact inside MWAA so it can be used by both regular operators and the PythonVirtualenvOperator? Is there a recommended practice? It's a lot of code to put in every DAG, so maybe I should install the library into MWAA directly? And how could I do that in a scalable way (so the package can easily be updated to new versions)? Thanks!

3 Answers
2

Hi anupamk36

Take a look at this solution; it should help with your question.

  • Setting up CodeArtifact: First, you need to set up a CodeArtifact repository in your AWS account. This repository will act as a central location for storing and managing your Python packages.
  • Configure MWAA to use CodeArtifact: MWAA supports customizing the Python environment using a requirements.txt file. You can specify the CodeArtifact repository URL as a source for Python packages in the requirements.txt file.
  • Here's an example requirements.txt file:
--index-url=https://aws.my-codeartifact-domain.com/pypi/my-repository/simple
pandas==1.4.4
numpy==1.23.5
my-library==1.0

Replace https://aws.my-codeartifact-domain.com/pypi/my-repository/simple with the actual CodeArtifact repository URL.
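For reference, CodeArtifact PyPI endpoints generally take the form https://aws:<token>@<domain>-<owner-account-id>.d.codeartifact.<region>.amazonaws.com/pypi/<repository>/simple/. Below is a minimal sketch of assembling that URL and the matching requirements.txt text. The function names, domain, account ID, and token are illustrative assumptions, not values from this thread.

```python
from urllib.parse import quote


def codeartifact_index_url(domain, owner, region, repository, token):
    """Build a pip index URL for a CodeArtifact PyPI repository."""
    # The token sits in the password slot of the URL; percent-encode it defensively.
    safe_token = quote(token, safe="")
    host = f"{domain}-{owner}.d.codeartifact.{region}.amazonaws.com"
    return f"https://aws:{safe_token}@{host}/pypi/{repository}/simple/"


def render_requirements(index_url, packages):
    """Return requirements.txt content that points pip at the given index."""
    lines = [f"--index-url {index_url}"]
    lines.extend(packages)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All values below are placeholders (assumptions for illustration).
    url = codeartifact_index_url(
        "my-domain", "111122223333", "eu-west-1", "my-repository", "PLACEHOLDER_TOKEN"
    )
    print(render_requirements(url, ["pandas==1.4.4", "numpy==1.23.5", "my-library==1.0"]))
```

Generating the file this way keeps the package pins in one place, so a version bump is a one-line change.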

  • Use the requirements.txt file with MWAA: Upload the requirements.txt file to an S3 bucket accessible by MWAA. Configure MWAA to use this requirements.txt file for setting up the Python environment. You can do this through the MWAA console or AWS CLI.
  • Update the PythonVirtualenvOperator: Modify your PythonVirtualenvOperator to reference the requirements.txt file:
virtualenv_task = PythonVirtualenvOperator(
    task_id="virtualenv_python",
    dag=dag,
    op_args=redshift_con,
    python_callable=callable_virtualenv,
    requirements=["s3://path/to/requirements.txt"],
    system_site_packages=False,
)

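Replace "s3://path/to/requirements.txt" with the S3 path where you uploaded your requirements.txt file. Note that the operator's requirements argument is documented as taking pip requirement specifiers, so confirm that your Airflow version can resolve an S3 path here before relying on it; if it cannot, list the packages directly as in the original question.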

By configuring MWAA to use CodeArtifact and referencing the requirements.txt file in your PythonVirtualenvOperator, you centralize package management and ensure scalability and ease of updating packages. When you need to update packages to new versions, simply update the requirements.txt file and redeploy your DAGs. MWAA will automatically fetch the updated packages from CodeArtifact during environment setup.
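To avoid repeating the same settings in every DAG, one option is a small shared helper that returns the common operator arguments. This is only a sketch: the helper name is hypothetical, and it assumes your Airflow release supports the pip_install_options argument on PythonVirtualenvOperator (recent Airflow 2 releases do; check the version your MWAA environment runs).

```python
def virtualenv_operator_kwargs(index_url, requirements):
    """Return keyword arguments shared by every PythonVirtualenvOperator.

    `index_url` is the CodeArtifact pip index URL (assumed to be built elsewhere).
    """
    return {
        "requirements": list(requirements),
        "system_site_packages": False,
        # Assumption: `pip_install_options` exists in the installed Airflow version.
        # Each list element is passed to `pip install` as a separate argument.
        "pip_install_options": ["--index-url", index_url],
    }


# Hypothetical usage inside a DAG file (requires Airflow, so shown as a comment):
#
# virtualenv_task = PythonVirtualenvOperator(
#     task_id="virtualenv_python",
#     dag=dag,
#     python_callable=callable_virtualenv,
#     **virtualenv_operator_kwargs(index_url, ["pandas==1.4.4", "my-library==1.0"]),
# )
```

Keeping the helper in a shared module means a package version bump or index change happens in one file rather than in each DAG.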

answered 9 days ago
0

Hello,

Thank you for reaching out to AWS.

I understand that you would like to use CodeArtifact with the MWAA PythonVirtualenvOperator.

Proceeding further, please find some of the reference articles below.

Creating a custom plugin for Apache Airflow PythonVirtualenvOperator - https://docs.aws.amazon.com/mwaa/latest/userguide/samples-virtualenv.html

Further, in the requirements.txt you can specify the CodeArtifact repository URL as a source for Python packages:

// YOUR_S3_BUCKET/dags/codeartifact.txt
--index-url https://aws:123abc@mwaa-12345678910.d.codeartifact.eu-west-1.amazonaws.com/pypi/mwaa_repo/simple/

(modify as per your artifact URL)

Please refer to this link for more details: https://aws.amazon.com/blogs/opensource/amazon-mwaa-with-aws-codeartifact-for-python-dependencies/

Lastly, for IAM permissions, please ensure that the permissions described in the documentation below are allowed.

Domain policies - Enable cross-account access to a domain - https://docs.aws.amazon.com/codeartifact/latest/ug/domain-policies.html#enabling-cross-acount-access-to-a-domain
Repository policies - Create a resource policy to grant read access - https://docs.aws.amazon.com/codeartifact/latest/ug/repo-policies.html#creating-a-resource-policy-to-grant-read-access

Also, for refreshing the CodeArtifact token, please refer to this link: https://docs.aws.amazon.com/mwaa/latest/userguide/samples-code-artifact.html
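As a rough sketch of what such a refresh step might look like, the snippet below requests a short-lived token with boto3's get_authorization_token and renders the --index-url line for codeartifact.txt. The function names and sample values are assumptions for illustration, uploading the result to S3 is left out, and the linked AWS sample remains the authoritative version. The domainOwner argument is the account ID that owns the domain, which is what you would set when the domain lives in another account.

```python
def fetch_token(domain, domain_owner, region, client=None, duration_seconds=43200):
    """Request a temporary CodeArtifact authorization token.

    `client` can be injected for testing; otherwise a boto3 client is created.
    """
    if client is None:
        import boto3  # imported lazily so the helper can be exercised with a stub

        client = boto3.client("codeartifact", region_name=region)
    response = client.get_authorization_token(
        domain=domain,
        domainOwner=domain_owner,  # account ID that owns the CodeArtifact domain
        durationSeconds=duration_seconds,  # 43200 s = 12 h, the maximum lifetime
    )
    return response["authorizationToken"]


def index_url_line(domain, domain_owner, region, repository, token):
    """Render the single `--index-url` line stored in codeartifact.txt."""
    host = f"{domain}-{domain_owner}.d.codeartifact.{region}.amazonaws.com"
    return f"--index-url https://aws:{token}@{host}/pypi/{repository}/simple/\n"


if __name__ == "__main__":
    # Placeholder values (assumptions); running this needs valid AWS credentials.
    token = fetch_token("mwaa", "111122223333", "eu-west-1")
    print(index_url_line("mwaa", "111122223333", "eu-west-1", "mwaa_repo", token))
```

Because the token expires, the file has to be regenerated on a schedule shorter than the token lifetime.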

Hoping that the above helps.

AWS
SUPPORT ENGINEER
answered 8 days ago
  • Are you suggesting that I put plain-text credentials in a .txt file on S3?

0

In my actual case, CodeArtifact is in another AWS account (sorry, I forgot to mention that). How would you recommend authenticating? Thanks!

answered 9 days ago
