
Questions tagged with Containers



SKLearn Processing Container - Error: "WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager."

Hey all, I am trying to run the script below in the writefile titled "vw_aws_a_bijlageprofile.py". This code has worked for me using other data sources, but now I am getting the following error message in the CloudWatch Logs: "***2022-08-24T20:09:19.708-05:00 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv***" Any idea how I can get around this error? Full code below. Thank you in advance!!!!

```
%%writefile vw_aws_a_bijlageprofile.py

import os
import sys
import subprocess

def install(package):
    subprocess.check_call([sys.executable, "-q", "-m", "pip", "install", package])

install('awswrangler')
install('tqdm')
install('pandas')
install('botocore')
install('ruamel.yaml')
install('pandas-profiling')

import awswrangler as wr
import pandas as pd
import numpy as np
import datetime as dt
from dateutil.relativedelta import relativedelta
from string import Template
import gc
import boto3
from pandas_profiling import ProfileReport

client = boto3.client('s3')
session = boto3.Session(region_name="eu-west-2")

def run_profile():

    query = """
    SELECT * FROM "intl-euro-archmcc-database"."vw_aws_a_bijlage"
    ;
    """
    #switch table name above
    tableforprofile = wr.athena.read_sql_query(query,
                                               database="intl-euro-archmcc-database",
                                               boto3_session=session,
                                               ctas_approach=False,
                                               workgroup='DataScientists')
    print("read in the table queried above")

    print("got rid of missing and added a new index")

    profile_tblforprofile = ProfileReport(tableforprofile,
                                          title="Pandas Profiling Report",
                                          minimal=True)
    print("Generated table profile")

    return profile_tblforprofile

if __name__ == '__main__':

    profile_tblforprofile = run_profile()
    print("Generated outputs")

    output_path_tblforprofile = ('/opt/ml/processing/output/profile_vw_aws_a_bijlage.html')
    #switch profile name above
    print(output_path_tblforprofile)

    profile_tblforprofile.to_file(output_path_tblforprofile)
```

```
import sagemaker
from sagemaker.processing import ProcessingInput, ProcessingOutput

session = boto3.Session(region_name="eu-west-2")
bucket = 'intl-euro-uk-datascientist-prod'
prefix = 'Mark'
sm_session = sagemaker.Session(boto_session=session, default_bucket=bucket)
sm_session.upload_data(path='vw_aws_a_bijlageprofile.py',
                       bucket=bucket,
                       key_prefix=f'{prefix}/source')
```

```
import boto3
#import sagemaker
from sagemaker import get_execution_role
from sagemaker.sklearn.processing import SKLearnProcessor

region = boto3.session.Session().region_name
S3_ROOT_PATH = "s3://{}/{}".format(bucket, prefix)
role = get_execution_role()
sklearn_processor = SKLearnProcessor(framework_version='0.20.0',
                                     role=role,
                                     sagemaker_session=sm_session,
                                     instance_type='ml.m5.24xlarge',
                                     instance_count=1)
```

```
sklearn_processor.run(code='s3://{}/{}/source/vw_aws_a_bijlageprofile.py'.format(bucket, prefix),
                      inputs=[],
                      outputs=[ProcessingOutput(output_name='output',
                                                source='/opt/ml/processing/output',
                                                destination='s3://intl-euro-uk-datascientist-prod/Mark/IODataProfiles/')])
```
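For reference, the quoted line is a pip warning rather than a hard error, so the job's actual failure (if there is one) usually appears further down in the same CloudWatch stream. If the goal is simply to silence it, below is a minimal sketch of the install helper, assuming the processing image ships pip 22.1 or newer (where the `--root-user-action` option exists):

```
import subprocess
import sys

def install(package):
    # Run pip through the current interpreter. --quiet trims the log noise and
    # --root-user-action=ignore suppresses the "Running pip as the 'root' user"
    # warning; the flag requires pip >= 22.1, which is an assumption about the
    # image this processing job runs on.
    subprocess.check_call([
        sys.executable, "-m", "pip", "install",
        "--quiet", "--root-user-action=ignore", package,
    ])

install('awswrangler')
```

On older pip versions the flag is not recognized and the install fails, in which case it is safer to treat the warning as informational and look for the real stack trace elsewhere in the log.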
1 answer · 0 votes · 34 views · asked a month ago

How to save a .html file to S3 that is created in a Sagemaker processing container

**Error message:** "FileNotFoundError: [Errno 2] No such file or directory: '/opt/ml/processing/output/profile_case.html'"

**Background:** I am working in SageMaker using Python, trying to profile a dataframe that is saved in an S3 bucket with pandas profiling. The data is very large, so instead of spinning up a large EC2 instance, I am using an SKLearn processor. Everything runs fine, but when the job finishes it does not save the pandas profile (a .html file) to an S3 bucket or back to the instance SageMaker is running in. When I try to export the .html file that is created from the pandas profile, I keep getting errors saying that the file cannot be found. Does anyone know of a way to export the .html file out of the temporary 24xl instance that the SKLearn processor is running in to S3? Below is the exact code I am using:

```
import os
import sys
import subprocess

def install(package):
    subprocess.check_call([sys.executable, "-q", "-m", "pip", "install", package])

install('awswrangler')
install('tqdm')
install('pandas')
install('botocore==1.19.4')
install('ruamel.yaml')
install('pandas-profiling==2.13.0')

import awswrangler as wr
import pandas as pd
import numpy as np
import datetime as dt
from dateutil.relativedelta import relativedelta
from string import Template
import gc
import boto3
from pandas_profiling import ProfileReport

client = boto3.client('s3')
session = boto3.Session(region_name="eu-west-2")
```

```
%%writefile casetableprofile.py

import os
import sys
import subprocess

def install(package):
    subprocess.check_call([sys.executable, "-q", "-m", "pip", "install", package])

install('awswrangler')
install('tqdm')
install('pandas')
install('botocore')
install('ruamel.yaml')
install('pandas-profiling')

import awswrangler as wr
import pandas as pd
import numpy as np
import datetime as dt
from dateutil.relativedelta import relativedelta
from string import Template
import gc
import boto3
from pandas_profiling import ProfileReport

client = boto3.client('s3')
session = boto3.Session(region_name="eu-west-2")

def run_profile():

    query = """
    SELECT * FROM "healthcloud-refined"."case"
    ;
    """
    tableforprofile = wr.athena.read_sql_query(query,
                                               database="healthcloud-refined",
                                               boto3_session=session,
                                               ctas_approach=False,
                                               workgroup='DataScientists')
    print("read in the table queried above")

    print("got rid of missing and added a new index")

    profile_tblforprofile = ProfileReport(tableforprofile,
                                          title="Pandas Profiling Report",
                                          minimal=True)
    print("Generated carerequest profile")

    return profile_tblforprofile

if __name__ == '__main__':

    profile_tblforprofile = run_profile()
    print("Generated outputs")

    output_path_tblforprofile = ('profile_case.html')
    print(output_path_tblforprofile)

    profile_tblforprofile.to_file(output_path_tblforprofile)

    #Below is the only part where I am getting errors
    import boto3
    import os
    s3 = boto3.resource('s3')
    s3.meta.client.upload_file('/opt/ml/processing/output/profile_case.html',
                               'intl-euro-uk-datascientist-prod',
                               'Mark/healthclouddataprofiles/{}'.format(output_path_tblforprofile))
```

```
import sagemaker
from sagemaker.processing import ProcessingInput, ProcessingOutput

session = boto3.Session(region_name="eu-west-2")
bucket = 'intl-euro-uk-datascientist-prod'
prefix = 'Mark'
sm_session = sagemaker.Session(boto_session=session, default_bucket=bucket)
sm_session.upload_data(path='./casetableprofile.py',
                       bucket=bucket,
                       key_prefix=f'{prefix}/source')
```

```
import boto3
#import sagemaker
from sagemaker import get_execution_role
from sagemaker.sklearn.processing import SKLearnProcessor

region = boto3.session.Session().region_name
S3_ROOT_PATH = "s3://{}/{}".format(bucket, prefix)
role = get_execution_role()
sklearn_processor = SKLearnProcessor(framework_version='0.20.0',
                                     role=role,
                                     sagemaker_session=sm_session,
                                     instance_type='ml.m5.24xlarge',
                                     instance_count=1)
```

```
sklearn_processor.run(code='s3://{}/{}/source/casetableprofile.py'.format(bucket, prefix),
                      inputs=[],
                      outputs=[ProcessingOutput(output_name='output',
                                                source='/opt/ml/processing/output',
                                                destination='s3://intl-euro-uk-datascientist-prod/Mark/')])
```

Thank you in advance!!!
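A likely cause of the FileNotFoundError above is that the report is written to the relative path `profile_case.html` (the container's working directory) while the upload reads from `/opt/ml/processing/output/profile_case.html`. Below is a minimal sketch of the end of the script, assuming the `ProcessingOutput` with `source='/opt/ml/processing/output'` shown above stays in place, so the processor uploads that directory automatically when the job completes:

```
import os

if __name__ == '__main__':

    profile_tblforprofile = run_profile()

    # Write the report into the directory that the ProcessingOutput uploads
    # (source='/opt/ml/processing/output'); create it first in case it does
    # not already exist inside the container.
    output_dir = '/opt/ml/processing/output'
    os.makedirs(output_dir, exist_ok=True)

    output_path_tblforprofile = os.path.join(output_dir, 'profile_case.html')
    profile_tblforprofile.to_file(output_path_tblforprofile)
    print("Wrote report to", output_path_tblforprofile)
```

With the file under `/opt/ml/processing/output`, the manual `boto3` `upload_file` call becomes unnecessary, because the processing job copies everything in that directory to the S3 destination given in the `ProcessingOutput` when it finishes.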
1 answer · 0 votes · 47 views · asked 2 months ago

Multi-arch Docker image deployment using CDK Pipelines

I'd like to build a multi-architecture Docker image, push it to the default CDK ECR repo, and then push it to different deployment stages (stacks in separate accounts) using CDK Pipelines. I create the image using something like the following:

```
IMAGE_TAG=${AWS_ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com/cdk-hnb659fds-container-assets-${AWS_ACCOUNT}-${REGION}:myTag

docker buildx build --progress=plain \
    --platform linux/amd64,linux/arm64 --push \
    --tag ${IMAGE_TAG} \
    myDir/
```

This results in three things pushed to ECR: two images and an image index (manifest). I'm then attempting to use [cdk-ecr-deployment](https://github.com/cdklabs/cdk-ecr-deployment) to copy the image to a specific stack, for example:

```
cdk_ecr_deployment.ECRDeployment(
    self,
    "MultiArchImage",
    src=cdk_ecr_deployment.DockerImageName(f"{cdk_registry}:myTag"),
    dest=cdk_ecr_deployment.DockerImageName(f"{stack_registry}:myTag"),
)
```

However, this ends up copying only the image corresponding to the platform running the CDK deployment instead of the 2 images plus manifest. There's a [feature request](https://github.com/cdklabs/cdk-ecr-deployment/issues/192) open on `cdk-ecr-deployment` to support multi-arch images. I'm hoping someone might be able to suggest a modification to the above or some alternative that achieves the same goal, which is to deploy the image to multiple environments using CDK Pipelines.

I also tried building the images + manifest into a tarball locally and then using the `aws_ecr_assets.TarballImageAsset` construct, but I encountered this [open issue](https://github.com/aws/aws-cdk/issues/18044) when attempting the deployment locally. I'm not sure if `TarballImageAsset` supports a multi-arch image, as it seems like `DockerImageAsset` doesn't. Any ideas?
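Until `cdk-ecr-deployment` handles manifest lists, one workaround is to copy the image with `docker buildx imagetools create`, which recreates the image index plus both architecture images in the target repository, and to run that command from a `CodeBuildStep` attached to each stage. The sketch below is a rough outline only, assuming CDK v2 Python, a CodeBuild image that ships Docker with buildx, and placeholder repository URIs and tag; the step's role also needs push/pull access on both repositories (including any cross-account repository policy):

```
from aws_cdk import pipelines, aws_codebuild as codebuild, aws_iam as iam

# Placeholder image URIs; substitute the CDK assets repo and the stage repo.
src_image = "SRC_ACCOUNT.dkr.ecr.REGION.amazonaws.com/cdk-hnb659fds-container-assets-SRC_ACCOUNT-REGION:myTag"
dest_image = "STAGE_ACCOUNT.dkr.ecr.REGION.amazonaws.com/my-repo:myTag"

copy_image_step = pipelines.CodeBuildStep(
    "CopyMultiArchImage",
    commands=[
        # Log in to both registries, then copy the full manifest list
        # (both platform images plus the image index) to the stage repo.
        "aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $SRC_REGISTRY",
        "aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $DEST_REGISTRY",
        "docker buildx imagetools create --tag $DEST_IMAGE $SRC_IMAGE",
    ],
    env={
        "SRC_REGISTRY": src_image.split("/")[0],
        "DEST_REGISTRY": dest_image.split("/")[0],
        "SRC_IMAGE": src_image,
        "DEST_IMAGE": dest_image,
    },
    # Privileged mode is needed because the commands talk to the Docker daemon.
    build_environment=codebuild.BuildEnvironment(privileged=True),
    role_policy_statements=[
        iam.PolicyStatement(actions=["ecr:GetAuthorizationToken"], resources=["*"]),
        iam.PolicyStatement(
            actions=[
                "ecr:BatchGetImage",
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchCheckLayerAvailability",
                "ecr:PutImage",
                "ecr:InitiateLayerUpload",
                "ecr:UploadLayerPart",
                "ecr:CompleteLayerUpload",
            ],
            resources=["*"],  # scope down to the two repositories in practice
        ),
    ],
)

# Attach it to the relevant stage, e.g.:
# pipeline.add_stage(my_stage, post=[copy_image_step])
```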
1 answer · 0 votes · 45 views · asked 2 months ago