
Questions tagged with Amazon SageMaker


How to save a .html file to S3 that is created in a SageMaker processing container

**Error message:** "FileNotFoundError: [Errno 2] No such file or directory: '/opt/ml/processing/output/profile_case.html'"

**Background:** I am working in SageMaker using Python, trying to profile a dataframe that is saved in an S3 bucket with pandas profiling. The data is very large, so instead of spinning up a large EC2 instance I am using an SKLearn processor. Everything runs fine, but when the job finishes it does not save the pandas profile (a .html file) to an S3 bucket or back to the instance SageMaker is running in. When I try to export the .html file created by the pandas profile, I keep getting errors saying that the file cannot be found. Does anyone know of a way to export the .html file out of the temporary 24xl instance that the SKLearn processor runs in to S3? Below is the exact code I am using:

```
import os
import sys
import subprocess

def install(package):
    subprocess.check_call([sys.executable, "-q", "-m", "pip", "install", package])

install('awswrangler')
install('tqdm')
install('pandas')
install('botocore==1.19.4')
install('ruamel.yaml')
install('pandas-profiling==2.13.0')

import awswrangler as wr
import pandas as pd
import numpy as np
import datetime as dt
from dateutil.relativedelta import relativedelta
from string import Template
import gc
import boto3
from pandas_profiling import ProfileReport

client = boto3.client('s3')
session = boto3.Session(region_name="eu-west-2")
```

```
%%writefile casetableprofile.py

import os
import sys
import subprocess

def install(package):
    subprocess.check_call([sys.executable, "-q", "-m", "pip", "install", package])

install('awswrangler')
install('tqdm')
install('pandas')
install('botocore')
install('ruamel.yaml')
install('pandas-profiling')

import awswrangler as wr
import pandas as pd
import numpy as np
import datetime as dt
from dateutil.relativedelta import relativedelta
from string import Template
import gc
import boto3
from pandas_profiling import ProfileReport

client = boto3.client('s3')
session = boto3.Session(region_name="eu-west-2")

def run_profile():
    query = """
    SELECT * FROM "healthcloud-refined"."case";
    """
    tableforprofile = wr.athena.read_sql_query(query,
                                               database="healthcloud-refined",
                                               boto3_session=session,
                                               ctas_approach=False,
                                               workgroup='DataScientists')
    print("read in the table queried above")
    print("got rid of missing and added a new index")

    profile_tblforprofile = ProfileReport(tableforprofile,
                                          title="Pandas Profiling Report",
                                          minimal=True)
    print("Generated carerequest profile")
    return profile_tblforprofile

if __name__ == '__main__':
    profile_tblforprofile = run_profile()
    print("Generated outputs")

    output_path_tblforprofile = ('profile_case.html')
    print(output_path_tblforprofile)
    profile_tblforprofile.to_file(output_path_tblforprofile)

    # Below is the only part where I am getting errors
    import boto3
    import os
    s3 = boto3.resource('s3')
    s3.meta.client.upload_file('/opt/ml/processing/output/profile_case.html',
                               'intl-euro-uk-datascientist-prod',
                               'Mark/healthclouddataprofiles/{}'.format(output_path_tblforprofile))
```

```
import sagemaker
from sagemaker.processing import ProcessingInput, ProcessingOutput

session = boto3.Session(region_name="eu-west-2")
bucket = 'intl-euro-uk-datascientist-prod'
prefix = 'Mark'
sm_session = sagemaker.Session(boto_session=session, default_bucket=bucket)
sm_session.upload_data(path='./casetableprofile.py',
                       bucket=bucket,
                       key_prefix=f'{prefix}/source')
```

```
import boto3
#import sagemaker
from sagemaker import get_execution_role
from sagemaker.sklearn.processing import SKLearnProcessor

region = boto3.session.Session().region_name
S3_ROOT_PATH = "s3://{}/{}".format(bucket, prefix)
role = get_execution_role()
sklearn_processor = SKLearnProcessor(framework_version='0.20.0',
                                     role=role,
                                     sagemaker_session=sm_session,
                                     instance_type='ml.m5.24xlarge',
                                     instance_count=1)
```

```
sklearn_processor.run(code='s3://{}/{}/source/casetableprofile.py'.format(bucket, prefix),
                      inputs=[],
                      outputs=[ProcessingOutput(output_name='output',
                                                source='/opt/ml/processing/output',
                                                destination='s3://intl-euro-uk-datascientist-prod/Mark/')])
```

Thank you in advance!!!
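The likely mismatch here: `to_file('profile_case.html')` writes to the container's current working directory, while both the manual upload and the job's `ProcessingOutput` look under `/opt/ml/processing/output`. A minimal sketch of one possible fix for the `__main__` block, assuming the `ProcessingOutput` above stays as-is (SageMaker Processing uploads everything in its `source` directory to the S3 `destination` when the job ends):

```
# Sketch: write the report into the directory the ProcessingOutput watches,
# then let SageMaker handle the S3 upload when the job completes.
import os

output_dir = '/opt/ml/processing/output'
os.makedirs(output_dir, exist_ok=True)  # the directory may not exist yet

output_path_tblforprofile = os.path.join(output_dir, 'profile_case.html')
profile_tblforprofile.to_file(output_path_tblforprofile)

# The manual s3.meta.client.upload_file(...) call can then be removed:
# everything under /opt/ml/processing/output is copied to
# s3://intl-euro-uk-datascientist-prod/Mark/ by the ProcessingOutput.
```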
1 answer · 0 votes · 50 views · asked 2 months ago

SageMaker Debugger: cannot load training information of estimator

I am using a SageMaker notebook for training an ML model. When I created and trained the estimator successfully with the following script, I could load the debugging information (s3_output_path) as expected:

```
from sagemaker.debugger import Rule, DebuggerHookConfig, CollectionConfig, rule_configs

rules = [
    Rule.sagemaker(rule_configs.loss_not_decreasing()),
    Rule.sagemaker(rule_configs.vanishing_gradient()),
    Rule.sagemaker(rule_configs.overfit()),
    Rule.sagemaker(rule_configs.overtraining()),
    Rule.sagemaker(rule_configs.poor_weight_initialization())]

collection_configs = [
    CollectionConfig(name="CrossEntropyLoss_output_0",
                     parameters={"include_regex": "CrossEntropyLoss_output_0",
                                 "train.save_interval": "100",
                                 "eval.save_interval": "10"})]

debugger_config = DebuggerHookConfig(collection_configs=collection_configs)

estimator = PyTorch(
    role=sagemaker.get_execution_role(),
    instance_count=1,
    instance_type="ml.m5.xlarge",
    #instance_type="ml.g4dn.2xlarge",
    entry_point="train.py",
    framework_version="1.8",
    py_version="py36",
    hyperparameters=hyperparameters,
    debugger_hook_config=debugger_config,
    rules=rules,
)
estimator.fit({"training": inputs})

s3_output_path = estimator.latest_job_debugger_artifacts_path()
```

After the kernel died, I attached the estimator and tried to access the debugging information of the training:

```
estimator = sagemaker.estimator.Estimator.attach('pytorch-training-2022-06-07-11-07-09-804')
s3_output_path = estimator.latest_job_debugger_artifacts_path()
rules_path = estimator.debugger_rules
```

Both calls returned None. Could this be a problem with the attach function? And how can I access the training information of the debugger after the kernel was shut down?
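A hedged workaround sketch (not a confirmed fix for `attach` itself): the hook configuration is stored on the training job, so the debugger output location can be read back from `DescribeTrainingJob` even after the kernel is gone:

```
# Sketch: recover the debugger S3 path from the training job description
# rather than from the re-attached estimator.
import boto3

sm = boto3.client("sagemaker")
desc = sm.describe_training_job(
    TrainingJobName="pytorch-training-2022-06-07-11-07-09-804")

# The hook config used for the job, including where tensors were written;
# artifacts typically live under <S3OutputPath>/<job-name>/debug-output.
s3_output_path = desc.get("DebugHookConfig", {}).get("S3OutputPath")
rule_configs_used = desc.get("DebugRuleConfigurations", [])
print(s3_output_path)
print(rule_configs_used)
```

That path can then be handed to `smdebug.trials.create_trial` to load the saved tensors.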
1 answer · 0 votes · 62 views · asked 2 months ago

SageMaker Data Capture does not write files

I want to enable data capture for a specific endpoint (so far, only via the console). The endpoint works fine and also logs & returns the desired results. However, no files are written to the specified S3 location.

### Endpoint Configuration ###

The endpoint is based on a training job with a scikit-learn classifier. It has only one variant, which is a `ml.m4.xlarge` instance type. Data capture is enabled with a sampling percentage of 100%. As data capture storage locations I tried `s3://<bucket-name>` as well as `s3://<bucket-name>/<some-other-path>`. With the "Capture content type" I tried leaving everything blank, setting `text/csv` in "CSV/Text", and `application/json` in "JSON".

### Endpoint Invocation ###

The endpoint is invoked in a Lambda function with a client. Here's the call:

```
sagemaker_body_source = {
    "segments": segments,
    "language": language
}
payload = json.dumps(sagemaker_body_source).encode()
response = self.client.invoke_endpoint(EndpointName=endpoint_name,
                                       Body=payload,
                                       ContentType='application/json',
                                       Accept='application/json')
result = json.loads(response['Body'].read().decode())
return result["predictions"]
```

Internally, the endpoint uses a Flask API with an `/invocation` path that returns the result.

### Logs ###

The endpoint itself works fine and the Flask API is logging input and output:

```
INFO:api:body: {'segments': [<strings...>], 'language': 'de'}
```

```
INFO:api:output: {'predictions': [{'text': 'some text', 'label': 'some_label'}, ....]}
```
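Two things worth ruling out (hedged suggestions, not a confirmed diagnosis): data capture only takes effect once the endpoint has been updated to an endpoint config that has it enabled, and capture files are written asynchronously under `<destination>/<endpoint-name>/<variant-name>/YYYY/MM/DD/HH/`, sometimes with a delay of several minutes. Enabling capture from the SDK makes the active config explicit; a sketch with placeholder names:

```
# Sketch: enable data capture through the SDK so the endpoint is updated to
# a config that definitely has capture on. "<endpoint-name>" and the bucket
# path are placeholders.
from sagemaker.model_monitor import DataCaptureConfig
from sagemaker.predictor import Predictor

predictor = Predictor(endpoint_name="<endpoint-name>")

capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    destination_s3_uri="s3://<bucket-name>/datacapture",
    capture_options=["REQUEST", "RESPONSE"],
)

# Updates the endpoint; capture files (JSON Lines) then appear under
# s3://<bucket-name>/datacapture/<endpoint-name>/<variant-name>/YYYY/MM/DD/HH/
predictor.update_data_capture_config(data_capture_config=capture_config)
```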
1 answer · 0 votes · 43 views · asked 2 months ago

not authorized to perform: sagemaker:CreateModel on resource

I have been given AmazonSageMakerFullAccess by my company's AWS admin. No one at our company can figure out why I can't get this code to run to launch the model.

***** CODE PRODUCING ERROR *****

```
lang_id = sagemaker.Model(
    image_uri=container, model_data=model_location, role=role, sagemaker_session=sess
)
lang_id.deploy(initial_instance_count=1, instance_type="ml.t2.medium")
```

***** ERROR MESSAGE *****

```
---------------------------------------------------------------------------
ClientError                               Traceback (most recent call last)
<ipython-input-5-4c80ec284a4b> in <module>
      2     image_uri=container, model_data=model_location, role=role, sagemaker_session=sess
      3 )
----> 4 lang_id.deploy(initial_instance_count=1, instance_type="ml.t2.medium")
      5
      6 from sagemaker.deserializers import JSONDeserializer

~/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/sagemaker/model.py in deploy(self, initial_instance_count, instance_type, serializer, deserializer, accelerator_type, endpoint_name, tags, kms_key, wait, data_capture_config, async_inference_config, serverless_inference_config, **kwargs)
   1132
   1133         self._create_sagemaker_model(
-> 1134             instance_type, accelerator_type, tags, serverless_inference_config
   1135         )
   1136

~/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/sagemaker/model.py in _create_sagemaker_model(self, instance_type, accelerator_type, tags, serverless_inference_config)
    671             tags=tags,
    672         )
--> 673         self.sagemaker_session.create_model(**create_model_args)
    674
    675     def _ensure_base_name_if_needed(self, image_uri, script_uri, model_uri):

~/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/sagemaker/session.py in create_model(self, name, role, container_defs, vpc_config, enable_network_isolation, primary_container, tags)
   2715                 raise
   2716
-> 2717         self._intercept_create_request(create_model_request, submit, self.create_model.__name__)
   2718         return name
   2719

~/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/sagemaker/session.py in _intercept_create_request(self, request, create, func_name)
   4294             func_name (str): the name of the function needed intercepting
   4295         """
-> 4296         return create(request)
   4297
   4298

~/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/sagemaker/session.py in submit(request)
   2703             LOGGER.debug("CreateModel request: %s", json.dumps(request, indent=4))
   2704             try:
-> 2705                 self.sagemaker_client.create_model(**request)
   2706             except ClientError as e:
   2707                 error_code = e.response["Error"]["Code"]

~/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    506             )
    507             # The "self" in this scope is referring to the BaseClient.
--> 508             return self._make_api_call(operation_name, kwargs)
    509
    510         _api_call.__name__ = str(py_operation_name)

~/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    909             error_code = parsed_response.get("Error", {}).get("Code")
    910             error_class = self.exceptions.from_code(error_code)
--> 911             raise error_class(parsed_response, operation_name)
    912         else:
    913             return parsed_response

ClientError: An error occurred (AccessDeniedException) when calling the CreateModel operation: User: arn:aws:sts::XXXXXXXXXX:assumed-role/sagemakeraccesstoservices/SageMaker is not authorized to perform: sagemaker:CreateModel on resource: arn:aws:sagemaker:us-east-2:XXXXXXXXXX:model/blazingtext-2022-08-09-13-58-21-739 because no identity-based policy allows the sagemaker:CreateModel action
```
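A hedged reading of the error: the denied principal is the notebook's execution role (`assumed-role/sagemakeraccesstoservices/SageMaker`), not the console user, so a policy granted to the user's own IAM identity would not help here. One sketch of a fix an IAM admin could apply, with the role name taken from the error message:

```
# Sketch: attach the managed policy to the *execution role* named in the
# AccessDeniedException. Must be run by a principal with IAM permissions.
import boto3

iam = boto3.client("iam")
iam.attach_role_policy(
    RoleName="sagemakeraccesstoservices",
    PolicyArn="arn:aws:iam::aws:policy/AmazonSageMakerFullAccess",
)
```

If the policy is already attached to that role, a permissions boundary or a service control policy could still be denying the action.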
1 answer · 0 votes · 50 views · asked 2 months ago

How can I terminate Amazon SageMaker RunInstance?

Hello. I had an unexpected bill, and that was because a SageMaker RunInstance was still running (especially Data Wrangler; see the screenshot below). I didn't know how to terminate it, so I contacted the AWS Support Center. I followed all the instructions they gave, which means I deleted endpoints / models / notebook instances / S3 buckets / CloudWatch log groups. But after 24 hours of monitoring, AWS Support said that the SageMaker RunInstance was still running. They gave me the same instructions plus one additional one: stop the training jobs (https://docs.aws.amazon.com/sagemaker/latest/dg/studio-tasks-stop-training-job.html).

First, I checked endpoints / models / notebook instances / S3 buckets / CloudWatch log groups again, and found nothing in those tabs. Second, I tried to stop the jobs, but I had already deleted the domain, since I tried to delete everything - I thought it would be better (maybe that was a mistake) - so I created the domain again with the quick start option (to get access to Studio) and got into Studio to terminate jobs. But all of the jobs were already completed. There was nothing with an active 'stop training job' button. So I followed the instructions, but there was nothing I could do.

I really want to STOP this SageMaker RunInstance, but I don't know why it's still running. What should I do..? Please help me. ![Enter image description here](https://repost.aws/media/postImages/original/IMCxdk5Ab-QYuhAWNl1kt6jw)
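A hedged guess at what keeps billing here: Data Wrangler runs as a Studio KernelGateway app on its own instance, and that app bills until it is deleted; it does not appear under endpoints, models, or notebook instances. A sketch that lists and deletes any running Studio apps in the account:

```
# Sketch: find and delete running SageMaker Studio apps (Data Wrangler runs
# as a KernelGateway app). Apps already in "Deleted" status are skipped.
import boto3

sm = boto3.client("sagemaker")
for page in sm.get_paginator("list_apps").paginate():
    for app in page["Apps"]:
        if app["Status"] == "InService":
            print("Deleting:", app["AppType"], app["AppName"])
            sm.delete_app(
                DomainId=app["DomainId"],
                UserProfileName=app["UserProfileName"],
                AppType=app["AppType"],
                AppName=app["AppName"],
            )
```

The same list is visible in the console under the domain's user profile ("Apps" tab), where each running app has a delete action.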
1 answer · 0 votes · 95 views · asked 2 months ago