Questions tagged with Amazon SageMaker

SageMaker batch transform not loading CSV correctly

I am running a batch transform job that loads data from a CSV. The CSV is formatted like this:

```
"joe annes rifle accesories discount"
"cute puppies for sale"
"Two dudes talk about sports"
"Smith & Wesson M&P 500 review"
"Glock vs 1911 handgun"
```

My code for creating the batch transform is below:

```
elec_model = PyTorchModel(
    model_data='s3://some_path/binary-models/tar_models/14_10_2022__19_54_23_arms_ammunition.tar.gz',
    role=role,
    entry_point='torchserve_.py',
    source_dir='source_dir',
    framework_version='1.12.0',
    py_version='py38')

nl_detector = elec_model.transformer(
    instance_count=1,
    instance_type='ml.g4dn.xlarge',
    strategy="MultiRecord",
    assemble_with="Line",
    output_path="s3://some_path/trash_output")

nl_detector.transform(
    "s3://brand-safety-training-data/trash",
    content_type="text/csv",
    split_type="Line")
```

When I run this code, instead of the batch job breaking the CSV up at each line, which is what `split_type="Line"` tells it to do, it ingests all of the sentences in the CSV as one record and outputs a single probability. However, if I run the same code but switch `strategy="MultiRecord"` to `strategy="SingleRecord"`, so that the transformer block looks like this:

```
nl_detector = elec_model.transformer(
    instance_count=1,
    instance_type='ml.g4dn.xlarge',
    strategy="SingleRecord",
    assemble_with="Line",
    output_path="s3://some_path/trash_output")
```

then it works correctly and performs inference on every sentence in the CSV. Any idea why this is happening?

EDIT 1: When I print the input payload it looks like this:

```
"joe annes rifle accesories discount"
2022-10-26T21:03:04,265 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - "cute puppies for sale"
2022-10-26T21:03:04,265 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - "Two dudes talk about sports"
2022-10-26T21:03:04,265 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle
```

Each sentence is an inference example, separated by the `W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle` log prefix, so it seems like SageMaker is separating the inference examples. But when I pass these sentences into a Hugging Face tokenizer, it tokenizes them as if they were one inference example, when they should be three distinct inference examples.
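
The exact fix depends on the custom serving code in torchserve_.py, but here is a minimal sketch, assuming the SageMaker PyTorch inference toolkit's `input_fn` hook is used, of splitting a MultiRecord `text/csv` payload back into one sentence per line before tokenizing:

```python
def input_fn(request_body, content_type="text/csv"):
    """Split a MultiRecord text/csv payload into one sentence per line."""
    if isinstance(request_body, (bytes, bytearray)):
        request_body = request_body.decode("utf-8")
    # With strategy="MultiRecord" and split_type="Line", several CSV lines
    # arrive in a single request body; split them so the tokenizer sees a
    # list of sentences instead of one long string.
    return [line.strip().strip('"') for line in request_body.splitlines() if line.strip()]
```

The returned list can then be passed to the Hugging Face tokenizer in a single call, e.g. `tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")`, which produces one tokenized row per sentence rather than one concatenated example.
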
1 answer · 0 votes · 93 views · asked a month ago

How to add an autoscaling policy to a SageMaker endpoint via Terraform?

Based on the documentation here, https://github.com/aws/amazon-sagemaker-examples/blob/main/async-inference/Async-Inference-Walkthrough.ipynb, an autoscaling policy can be added to a SageMaker async endpoint. I want to set this up via Terraform and have the sample below so far, but I'm not sure how to set the `Dimensions` attribute of `TargetTrackingScalingPolicyConfiguration` in Terraform.

```
TargetTrackingScalingPolicyConfiguration={
    "TargetValue": 5.0,  # SageMakerVariantInvocationsPerInstance
    "CustomizedMetricSpecification": {
        "MetricName": "ApproximateBacklogSizePerInstance",
        "Namespace": "AWS/SageMaker",
        "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
        "Statistic": "Average",
    },
}
```

```
resource "aws_appautoscaling_target" "sagemaker_target" {
  max_capacity       = 3
  min_capacity       = 1
  resource_id        = "myendpoint"
  scalable_dimension = "sagemaker:variant:DesiredInstanceCount"
  service_namespace  = "sagemaker"
}

resource "aws_appautoscaling_policy" "sagemaker_policy" {
  name               = "somepolicy"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.sagemaker_target.resource_id
  scalable_dimension = aws_appautoscaling_target.sagemaker_target.scalable_dimension
  service_namespace  = aws_appautoscaling_target.sagemaker_target.service_namespace

  target_tracking_scaling_policy_configuration {
    customized_metric_specification {
      metric_name = "ApproximateBacklogSizePerInstance"
      namespace   = "AWS/SageMaker"
      Dimensions  = ....?????
      statistic   = "Average"
    }
    target_value       = 3
    scale_in_cooldown  = 300
    scale_out_cooldown = 600
  }
}
```
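
For comparison, here is a sketch of the boto3 call the linked walkthrough uses (the endpoint name, variant name, and policy name below are placeholders): `Dimensions` is simply a list of Name/Value pairs that scope the custom metric to the endpoint, which is the same information the Terraform `customized_metric_specification` block has to carry.

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Placeholders: adjust the endpoint and variant names to your deployment.
endpoint_name = "myendpoint"
resource_id = f"endpoint/{endpoint_name}/variant/AllTraffic"

autoscaling.put_scaling_policy(
    PolicyName="backlog-target-tracking",  # placeholder policy name
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 5.0,
        "CustomizedMetricSpecification": {
            "MetricName": "ApproximateBacklogSizePerInstance",
            "Namespace": "AWS/SageMaker",
            # Dimensions scope the metric to this specific endpoint.
            "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
            "Statistic": "Average",
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 600,
    },
)
```
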
1 answer · 0 votes · 26 views · asked a month ago

I want to use Model Monitor in a SageMaker pipeline to get statistics.json and constraints.json results

### Summary

I would like to know what CSV format is required to get AWS SageMaker model quality metrics (regression metrics).

### Details

I am using Model Monitor in SageMaker to evaluate my own models via batch processing. My understanding is that, using the AWS mechanism, I can get statistics.json and constraints.json as results. In the Model Monitor configuration there are three values for `problem_type`: "Regression", "BinaryClassification", and "MulticlassClassification". For the classification cases ("BinaryClassification" and "MulticlassClassification") I get results by creating an appropriate CSV, but I don't know what CSV format to send for "Regression", and I'm not sure whether AWS will handle it appropriately.

The CSV for the classification case is created as below:

```py
probability, prediction, label
[0.99,0.88], 0, 0
[0.34,0.77], 1, 0
...
```

I don't know what kind of CSV content I should send to get the "Regression" results. Here are some excerpts from pipeline.py and the CSV content we are currently testing.

Excerpts from pipeline.py:

```py
model_quality_check_config = ModelQualityCheckConfig(
    baseline_dataset=step_transform.properties.TransformOutput.S3OutputPath,
    dataset_format=DatasetFormat.csv(header=False),
    output_s3_uri=f"sagemaker/monitor/",
    problem_type='Regression',
    inference_attribute="_c1",
    ground_truth_attribute="_c0"
)

model_quality_check_step = QualityCheckStep(
    name="ModelQualityCheckStep",
    depends_on=["lambdacsvtransform"],
    skip_check=skip_check_model_quality,
    register_new_baseline=register_new_baseline_model_quality,
    quality_check_config=model_quality_check_config,
    check_job_config=check_job_config,
    supplied_baseline_statistics=supplied_baseline_statistics_model_quality,
    supplied_baseline_constraints=supplied_baseline_constraints_model_quality,
    model_package_group_name="group"
)
```

CSV (Regression):

```py
"_c0", "_c0"
0.88, 0.99
0.66, 0.87
...
```
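
A possible layout for the regression baseline CSV, shown only as a sketch: with `DatasetFormat.csv(header=False)`, `_c0` refers to the first column and `_c1` to the second, so the ground-truth value and the prediction would go in two separate columns rather than sharing a name (the values below are made up).

```python
import pandas as pd

# Sketch under the assumption that, with header=False, "_c0" addresses the
# first column and "_c1" the second. Values are placeholders.
baseline = pd.DataFrame({
    "ground_truth": [0.88, 0.66, 1.02],  # becomes _c0 (ground_truth_attribute)
    "prediction":   [0.99, 0.87, 0.95],  # becomes _c1 (inference_attribute)
})
# Write without header or index so the monitoring job sees only the two columns.
baseline.to_csv("regression_baseline.csv", header=False, index=False)
```
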
1 answer · 0 votes · 34 views · asked a month ago