Integrate Sklearn Processing Step in Inference Pipeline

Hello,

I am having trouble integrating a fitted sklearn processor/estimator into my SageMaker pipeline. I define each step in its own function, as follows:

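from sagemaker.network import NetworkConfig
from sagemaker.processing import FrameworkProcessor, ProcessingOutput
from sagemaker.sklearn.estimator import SKLearn
from sagemaker.workflow.parameters import ParameterInteger
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.steps import ProcessingStep


def _get_step_preprocess(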
    pipeline_session: PipelineSession,
    processing_instance_count: ParameterInteger,
    role: str,
    # input_data_uri: ParameterString,
    subnet_id: str,
    security_group_id: str,
) -> ProcessingStep:
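    """
    Step 1: preprocess the data as the first step of the pipeline.

    Args:
        pipeline_session (PipelineSession): Session used to define the step
        processing_instance_count (ParameterInteger): Number of instances
        role (str): SageMaker execution role
        subnet_id (str): Subnet used for the processing job
        security_group_id (str): Security group used for the processing job

    Returns:
        ProcessingStep: The defined preprocessing step
    """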

    network_config = NetworkConfig(
        enable_network_isolation=False,
        security_group_ids=[security_group_id],
        subnets=[subnet_id],
        encrypt_inter_container_traffic=True,
    )

    sklearn_processor = FrameworkProcessor(
        estimator_cls=SKLearn,
        framework_version="1.0-1",
        instance_count=processing_instance_count,
        instance_type="ml.m5.xlarge",
        sagemaker_session=pipeline_session,
        base_job_name="name",
        role=role,
        network_config=network_config,
    )

    processor_args = sklearn_processor.run(
        inputs=[],
        outputs=[
            ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
            ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
            ProcessingOutput(output_name="test", source="/opt/ml/processing/test"),
            ProcessingOutput(output_name="encoder", source="/opt/ml/processing/encoder"),
        ],
        code="main.py",
        source_dir="../sagemaker/step_preprocess",
    )

    step_preprocess = ProcessingStep(name="BankingSecondaryRejectionPreprocess", step_args=processor_args)

    return step_preprocess

I have seen in several examples that, besides executing a script as in the step above, it is also possible to fit a sklearn preprocessor that can be included in the final pipeline model and therefore in the inference endpoint. One example I came across is this: https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-python-sdk/scikit_learn_inference_pipeline/Inference%20Pipeline%20with%20Scikit-learn%20and%20Linear%20Learner.html

However, I have not managed to integrate the sklearn estimator from that example into the preprocessing step defined above. What is the right way to do this? Is it possible at all? The ProcessingStep does not seem to accept a fitted estimator as an argument.

Thanks in advance
