Hello,
I am not able to integrate a fitted scikit-learn processor/estimator into my SageMaker pipeline. I define the different steps in separate functions, as follows:
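from sagemaker.network import NetworkConfig
from sagemaker.processing import FrameworkProcessor, ProcessingOutput
from sagemaker.sklearn.estimator import SKLearn
from sagemaker.workflow.parameters import ParameterInteger
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.steps import ProcessingStep


def _get_step_preprocess(
    pipeline_session: PipelineSession,
    processing_instance_count: ParameterInteger,
    role: str,
    # input_data_uri: ParameterString,
    subnet_id: str,
    security_group_id: str,
) -> ProcessingStep:
    """
    Step 1
    This step preprocesses the data as the first step of the pipeline.
    Args:
        pipeline_session (PipelineSession): Session that defers job execution to the pipeline
        processing_instance_count (ParameterInteger): Number of instances
        role (str): SageMaker execution role
        subnet_id (str): Subnet the processing job runs in
        security_group_id (str): Security group attached to the processing job
    Returns:
        ProcessingStep: Defined preprocessing step
    """
    # Run the processing job inside the given VPC subnet and security group.
    network_config = NetworkConfig(
        enable_network_isolation=False,
        security_group_ids=[security_group_id],
        subnets=[subnet_id],
        encrypt_inter_container_traffic=True,
    )
    # Scikit-learn framework container that runs main.py from source_dir.
    sklearn_processor = FrameworkProcessor(
        estimator_cls=SKLearn,
        framework_version="1.0-1",
        instance_count=processing_instance_count,
        instance_type="ml.m5.xlarge",
        sagemaker_session=pipeline_session,
        base_job_name="name",
        role=role,
        network_config=network_config,
    )
    # With a PipelineSession, run() returns step arguments instead of starting a job.
    processor_args = sklearn_processor.run(
        inputs=[],
        outputs=[
            ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
            ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
            ProcessingOutput(output_name="test", source="/opt/ml/processing/test"),
            ProcessingOutput(output_name="encoder", source="/opt/ml/processing/encoder"),
        ],
        code="main.py",
        source_dir="../sagemaker/step_preprocess",
    )
    step_preprocess = ProcessingStep(name="BankingSecondaryRejectionPreprocess", step_args=processor_args)
    return step_preprocess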
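The `encoder` output in the step above is one place where a fitted preprocessor could be written. Below is a minimal sketch of what `main.py` might do for that output. It assumes a scikit-learn `OneHotEncoder` stands in for the real preprocessor, and that a downstream SageMaker model expects its artifact as a `model.tar.gz` with the pickled object at the archive root. The helper name `fit_and_package_encoder` and the file name `model.joblib` are hypothetical choices, not something the original script defines.

```python
"""Hypothetical sketch for step_preprocess/main.py: fit an encoder and package it."""
import os
import tarfile
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import OneHotEncoder


def fit_and_package_encoder(features: np.ndarray, encoder_dir: str) -> str:
    """Fit a OneHotEncoder and write it to encoder_dir as model.tar.gz.

    Assumption: the serving side loads "model.joblib" from the archive root.
    """
    encoder = OneHotEncoder(handle_unknown="ignore")
    encoder.fit(features)

    os.makedirs(encoder_dir, exist_ok=True)
    tar_path = os.path.join(encoder_dir, "model.tar.gz")
    with tempfile.TemporaryDirectory() as tmp:
        joblib_path = os.path.join(tmp, "model.joblib")
        joblib.dump(encoder, joblib_path)
        with tarfile.open(tar_path, "w:gz") as tar:
            # arcname places model.joblib at the root of the archive.
            tar.add(joblib_path, arcname="model.joblib")
    return tar_path
```

Inside the processing container this would be called with the `source` path of the `encoder` output, e.g. `fit_and_package_encoder(X_train, "/opt/ml/processing/encoder")`. The packaging step matters because, unlike the training job in the linked example, a processing job does not create a `model.tar.gz` for you.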
I have seen in various examples that I can not only execute a script, as in the step above, but also fit a scikit-learn preprocessor that can then be integrated into my final pipeline model, and therefore into the whole inference endpoint. One example I came across is this:
https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-python-sdk/scikit_learn_inference_pipeline/Inference%20Pipeline%20with%20Scikit-learn%20and%20Linear%20Learner.html
However, I am not able to integrate the scikit-learn estimator from that example into the preprocessing step defined above. What is the right way to do this? Is it even possible? `ProcessingStep` does not seem to accept a fitted estimator as an argument.
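For the endpoint side of the question, a fitted preprocessor is typically served by a scikit-learn container whose inference script implements the `model_fn` / `input_fn` / `predict_fn` hooks, as the linked example does. A minimal sketch follows, assuming the artifact contains a single `model.joblib` holding a `OneHotEncoder` and that requests arrive as plain CSV rows of raw categorical values. The column layout is an assumption, and the sketch relies on the container's default output handling to serialize the returned NumPy array.

```python
"""Hypothetical inference script serving a fitted encoder in a scikit-learn container."""
import os

import joblib
import numpy as np


def model_fn(model_dir):
    """Load the encoder extracted from model.tar.gz into model_dir."""
    return joblib.load(os.path.join(model_dir, "model.joblib"))


def input_fn(request_body, request_content_type):
    """Parse CSV rows of raw categorical features into a 2-D object array."""
    if request_content_type != "text/csv":
        raise ValueError(f"Unsupported content type: {request_content_type}")
    if isinstance(request_body, bytes):
        request_body = request_body.decode("utf-8")
    rows = [line.split(",") for line in request_body.strip().splitlines()]
    return np.array(rows, dtype=object)


def predict_fn(input_data, model):
    """Transform raw features into the dense matrix the next container receives."""
    return model.transform(input_data).toarray()
```

As far as I know, the wiring would then be an `SKLearnModel` with this file as `entry_point` and `model_data` set to `step_preprocess.properties.ProcessingOutputConfig.Outputs["encoder"].S3Output.S3Uri`, combined with the trained model in a `PipelineModel`; treat that wiring as an untested assumption.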
Thanks in advance