How to create a custom inference file using the SageMaker SDK that allows me to call a custom predict function (a combination of a rule-based method and ML prediction) after model training


I am currently working on a k-means clustering algorithm for my dataset. What I have done so far is create a preprocess.py that preprocesses my data and stores it in an S3 bucket, plus a training step invoked via the Estimator SDK.

import os

import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.processing import ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.parameters import ParameterString
from sagemaker.workflow.steps import ProcessingStep, TrainingStep

input_data = ParameterString(
    name="InputDataUrl",
    default_value="s3://ml-pipeline-jobs/input_files/mydata.csv",
)

# processing step for feature engineering
sklearn_processor = SKLearnProcessor(
    framework_version="0.23-1",
    instance_type=processing_instance_type,
    instance_count=processing_instance_count,
    base_job_name=f"{base_job_prefix}/sklearn-billofwork-preprocess",
    sagemaker_session=pipeline_session,
    role=role,
)
step_args = sklearn_processor.run(
    outputs=[
        ProcessingOutput(output_name="train_preprocessed", source="/opt/ml/processing/train/data_final"),
    ],
    code=os.path.join(BASE_DIR, "preprocess.py"),
    arguments=["--input-data", input_data],
)
step_process = ProcessingStep(
    name="PreprocessBillOfWorkData",
    step_args=step_args,
)

image_uri = sagemaker.image_uris.retrieve(
    framework="kmeans",
    region=region,
    py_version="py3",
    instance_type=training_instance_type,
)

kmeans = Estimator(
    image_uri=image_uri,
    sagemaker_session=pipeline_session,
    role=role,
    instance_type=training_instance_type,
    instance_count=1,
)

kmeans.set_hyperparameters(k=40, feature_dim=27295)

# Note: the output name must match the one declared in the ProcessingStep
# ("train_preprocessed"), not "preprocessed_data".
step_args_preprocess = TrainingInput(
    s3_data=step_process.properties.ProcessingOutputConfig.Outputs["train_preprocessed"].S3Output.S3Uri,
    content_type="text/csv",
)

step_train = TrainingStep(
    name="TrainBowModel",
    estimator=kmeans,
    inputs={
        "train": step_args_preprocess,
    },
)

Now what I would like to do is create a step that accepts part of the input data from step_process, and also accepts a .py file that can take the new data, do some additional preprocessing, and then call .predict.

I was able to get as far as the training step using the AWS SDKs, but I am not sure how to proceed after this. I studied how inference is done with the AWS SDK and it seems there are four different options (real-time, serverless, asynchronous, and batch inference), but I am clueless about which one exactly suits my type of problem.

Kindly guide me. Thanks.

1 Answer
Accepted Answer

To implement custom prediction behavior, you can use "Script Mode".

You can specify the "entry_point" argument on the Model object, and use it in your pipeline via a ModelStep.
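A minimal sketch of what that could look like. Note that the built-in k-means image does not execute custom entry points, so this assumes you serve the artifact from a framework container (here SKLearnModel); the script name "inference.py", the "code" directory, and the instance type are hypothetical:

from sagemaker.sklearn.model import SKLearnModel
from sagemaker.workflow.model_step import ModelStep

# Wrap the trained artifact with a custom inference script ("Script Mode").
model = SKLearnModel(
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    role=role,
    entry_point="inference.py",                 # hypothetical script name
    source_dir=os.path.join(BASE_DIR, "code"),  # hypothetical directory
    framework_version="0.23-1",
    sagemaker_session=pipeline_session,
)

step_create_model = ModelStep(
    name="CreateBowModel",
    step_args=model.create(instance_type="ml.m5.large"),
)

Inside inference.py, the SKLearn container looks for handler functions such as model_fn and predict_fn. A sketch of how a rule-based check could wrap the ML prediction follows; the rule itself is a placeholder, and the joblib load assumes a scikit-learn model rather than the built-in k-means artifact (which is MXNet-based and would need converting):

# inference.py
import os
import joblib
import numpy as np

def model_fn(model_dir):
    # Load the trained model from the unpacked model.tar.gz.
    return joblib.load(os.path.join(model_dir, "model.joblib"))

def predict_fn(input_data, model):
    # Combine a rule-based method with the ML prediction.
    features = np.asarray(input_data)
    if rule_based_override(features):      # hypothetical business rule
        return np.full(len(features), -1)  # e.g. a sentinel cluster id
    return model.predict(features)

def rule_based_override(features):
    # Placeholder: replace with your own rule-based logic.
    return False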

AWS
answered 2 months ago
  • Thanks for sharing the information @tomonori Shimomura. You answered part of my question. Yes, indeed, I used the entry_point method to run my custom file. The second part, where I would like to use the inputs from step_process, is achieved by writing the data to an S3 bucket and reading it from the inference file using a boto3 client, as sketched below.
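For reference, a sketch of reading those preprocessed inputs from S3 inside the inference script with boto3; the bucket and key below are hypothetical placeholders for wherever step_process actually writes its output:

# Inside inference.py (or an input_fn): fetch data written by step_process.
import boto3

s3 = boto3.client("s3")
obj = s3.get_object(
    Bucket="ml-pipeline-jobs",                        # hypothetical bucket
    Key="preprocessed/train_preprocessed/extra.csv",  # hypothetical key
)
extra_data = obj["Body"].read().decode("utf-8")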
