Questions tagged with Machine Learning & AI

SageMaker Pipelines - Is it possible to use a TransformStep with the CatBoost Estimator?

Hi! I am trying to implement a SageMaker Pipeline that includes the following steps (among others):

* **ProcessingStep**: a processing script (PySparkProcessor) generating train, validation, and test datasets (CSV)
* **TrainingStep**: model training with the CatBoost Estimator (https://docs.aws.amazon.com/sagemaker/latest/dg/catboost.html)
* **TransformStep**: batch inference on the test dataset (CSV) using the trained model

The TransformStep fails with the following error:

**python3: can't open file 'serve': [Errno 2] No such file or directory**

I wonder whether I'm using TransformStep incorrectly, or whether the use of TransformStep with the CatBoost model simply hasn't been implemented yet. Code:

```
[...]

pyspark_processor = PySparkProcessor(
    base_job_name="sm-spark",
    framework_version="3.1",
    role=role_arn,
    instance_type="ml.m5.xlarge",
    instance_count=12,
    sagemaker_session=pipeline_session,
    max_runtime_in_seconds=2400,
)

step_process_args = pyspark_processor.run(
    # Hack to fix cache hit
    submit_app=os.path.join(s3_preprocess_script_dir, "preprocess.py"),
    submit_py_files=[
        os.path.join(s3_preprocess_script_dir, "preprocess_utils.py"),
        os.path.join(s3_preprocess_script_dir, "spark_utils.py"),
    ],
    outputs=[
        ProcessingOutput(
            output_name="datasets",
            source="/opt/ml/processing/output",
            destination=s3_preprocess_output_path,
        )
    ],
    arguments=[
        "--aws_account", AWS_ACCOUNT,
        "--aws_env", AWS_ENV,
        "--project_name", PROJECT_NAME,
        "--mode", "training",
    ],
)

step_process = ProcessingStep(
    name="PySparkPreprocessing",
    step_args=step_process_args,
    cache_config=cache_config,
)

train_model_id = "catboost-classification-model"
train_model_version = "*"
train_scope = "training"
training_instance_type = "ml.m5.xlarge"

# Retrieve the docker image
train_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    model_id=train_model_id,
    model_version=train_model_version,
    image_scope=train_scope,
    instance_type=training_instance_type,
)

# Retrieve the training script
train_source_uri = script_uris.retrieve(
    model_id=train_model_id,
    model_version=train_model_version,
    script_scope=train_scope,
)

# Retrieve the pre-trained model tarball to further fine-tune
train_model_uri = model_uris.retrieve(
    model_id=train_model_id,
    model_version=train_model_version,
    model_scope=train_scope,
)

training_job_name = name_from_base(f"jumpstart-{train_model_id}-training")

# Create SageMaker Estimator instance
tabular_estimator = Estimator(
    role=role_arn,
    image_uri=train_image_uri,
    source_dir=train_source_uri,
    model_uri=train_model_uri,
    entry_point="transfer_learning.py",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    max_run=360000,
    hyperparameters=hyperparameters,
    sagemaker_session=pipeline_session,
    output_path=s3_training_output_path,
    # The default profiler rule includes a timestamp which will change each time
    # the pipeline is upserted, causing cache misses. If profiling is not needed,
    # set disable_profiler to True on the estimator.
    disable_profiler=True,
)

# Launch a SageMaker Training job by passing the S3 path of the training data
step_train_args = tabular_estimator.fit(
    {
        "training": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                "datasets"
            ].S3Output.S3Uri
        )
    },
    logs=True,
    job_name=training_job_name,
)

step_train = TrainingStep(
    name="CatBoostTraining",
    step_args=step_train_args,
    cache_config=cache_config,
)

script_eval = ScriptProcessor(
    image_uri=[MASKED],
    command=["python3"],
    instance_type="ml.m5.xlarge",
    instance_count=1,
    base_job_name="script-evaluation",
    role=role_arn,
    sagemaker_session=pipeline_session,
)

eval_args = script_eval.run(
    inputs=[
        ProcessingInput(
            source=step_train.properties.ModelArtifacts.S3ModelArtifacts,
            destination="/opt/ml/processing/model",
        ),
        ProcessingInput(
            source=step_process.properties.ProcessingOutputConfig.Outputs[
                "datasets"
            ].S3Output.S3Uri,
            destination="/opt/ml/processing/input",
        ),
    ],
    outputs=[
        ProcessingOutput(
            output_name="evaluation",
            source="/opt/ml/processing/evaluation",
            destination=s3_evaluation_output_path,
        ),
    ],
    code="common/evaluation.py",
)

evaluation_report = PropertyFile(
    name="EvaluationReport",
    output_name="evaluation",
    path="evaluation.json",
)

step_eval = ProcessingStep(
    name="Evaluation",
    step_args=eval_args,
    property_files=[evaluation_report],
    cache_config=cache_config,
)

model = Model(
    image_uri="467855596088.dkr.ecr.eu-west-3.amazonaws.com/sagemaker-catboost-image:latest",
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    sagemaker_session=pipeline_session,
    role=role_arn,
)

evaluation_s3_uri = "{}/evaluation.json".format(
    step_eval.arguments["ProcessingOutputConfig"]["Outputs"][0]["S3Output"]["S3Uri"]
)

model_step_args = model.create(
    instance_type="ml.m5.large",
)

create_model = ModelStep(name="CatBoostModel", step_args=model_step_args)

step_fail = FailStep(
    name="FailBranch",
    error_message=Join(on=" ", values=["Execution failed due to F1-score <", 0.8]),
)

cond_gte = ConditionGreaterThanOrEqualTo(
    left=JsonGet(
        step_name=step_eval.name,
        property_file=evaluation_report,
        json_path="classification_metrics.f1-score.value",
    ),
    right=f1_threshold,
)

step_cond = ConditionStep(
    name="F1ScoreCondition",
    conditions=[cond_gte],
    if_steps=[create_model],
    else_steps=[step_fail],
)

# Transform job
s3_test_transform_input = os.path.join(
    step_process.arguments["ProcessingOutputConfig"]["Outputs"][0]["S3Output"]["S3Uri"],
    "test",
)

transformer = Transformer(
    model_name=create_model.properties.ModelName,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    assemble_with="Line",
    accept="text/csv",
    output_path=s3_test_transform_output_path,
    sagemaker_session=pipeline_session,
)

transform_step_args = transformer.transform(
    data=s3_test_transform_input,
    content_type="text/csv",
    split_type="Line",
)

step_transform = TransformStep(
    name="InferenceTransform",
    step_args=transform_step_args,
)

# Create and execute the pipeline
step_transform.add_depends_on([step_process, create_model])

pipeline = Pipeline(
    name=pipeline_name,
    steps=[step_process, step_train, step_eval, step_cond, step_transform],
    sagemaker_session=pipeline_session,
)

pipeline.upsert(role_arn=role_arn, description=[MASKED])
execution = pipeline.start()
execution.wait(delay=60, max_attempts=120)
```
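For context: a `can't open file 'serve'` error commonly means the container used for the transform job has no serving stack, which can happen when a training-scope image is reused for inference. JumpStart publishes separate inference-scope artifacts, which can be retrieved with the same `image_uris`/`script_uris` helpers used above. A minimal sketch, not a verified fix; `inference.py` as the entry point is the usual JumpStart convention and is an assumption here:

```
from sagemaker import image_uris, script_uris
from sagemaker.model import Model

# Retrieve inference-scope artifacts for the same JumpStart model ID
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    model_id="catboost-classification-model",
    model_version="*",
    image_scope="inference",
    instance_type="ml.m5.xlarge",
)
deploy_source_uri = script_uris.retrieve(
    model_id="catboost-classification-model",
    model_version="*",
    script_scope="inference",
)

# Build the Model from the inference image and script instead of a training image
model = Model(
    image_uri=deploy_image_uri,
    source_dir=deploy_source_uri,
    entry_point="inference.py",  # assumption: the usual JumpStart entry point name
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    sagemaker_session=pipeline_session,
    role=role_arn,
)
```

If this works, the hard-coded ECR image passed to `Model` in the pipeline above would no longer be needed.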
2 answers · 0 votes · 47 views · HaPo · asked 17 days ago

Help with Inference Script for Amazon SageMaker Neo Compiled Models

Hello everyone. I was trying to execute the example from the docs: https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_neo_compilation_jobs/pytorch_torchvision/pytorch_torchvision_neo.html. I was able to run this example successfully, but as soon as I changed the target_device to `jetson_tx2` and re-ran the entire script, keeping the rest of the code as-is, the model stopped working. I get no inferences from the deployed model, and it always errors out with the message:

```
An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from <users-sagemaker-endpoint> with message "Your invocation timed out while waiting for a response from container model. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again."
```

According to the troubleshooting docs (https://docs.aws.amazon.com/sagemaker/latest/dg/neo-troubleshooting-inference.html), this appears to be an issue with the **model_fn()** function. The inference script used by this example (https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker_neo_compilation_jobs/pytorch_torchvision/code/resnet18.py) does not itself contain a model_fn() definition, yet it still worked for the target device `ml_c5`. Could anyone please help me with the following questions:

1. What changes does SageMaker Neo make to the model depending on the `target_device` type? It seems the same model is loaded differently for different target devices.
2. Is there a way to determine how the model is expected to be loaded for a given target_device, so that I could define the **model_fn()** function myself in the same inference script mentioned above?
3. Lastly, can anyone help with an inference script for this same model, as in the docs above, that also works for the `jetson_tx2` device?

Any suggestions or links on how to resolve this issue would be really helpful.
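For context on edge targets: a model compiled for a device such as `jetson_tx2` is normally run on the device itself with the DLR runtime rather than behind a SageMaker-hosted endpoint. A minimal sketch, assuming the compiled `model.tar.gz` has been extracted to a local directory on the Jetson and that the model expects a 1x3x224x224 float32 input (both assumptions; the path is hypothetical):

```
import numpy as np
import dlr  # Neo's open-source runtime for compiled models (the "dlr" pip package)

# Load the extracted Neo compilation output from a local directory (hypothetical path)
model = dlr.DLRModel("/home/nvidia/compiled_model", "gpu")

# Dummy input matching the shape the model was compiled for (assumption)
x = np.random.rand(1, 3, 224, 224).astype("float32")
outputs = model.run(x)
print(outputs[0].shape)
```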
1 answer · 0 votes · 33 views · Rupesh · asked 20 days ago