SageMaker: disable_output_compression=True does not work for SKLearn estimator


Hi,

I am instantiating an SKLearn estimator like this:

from sagemaker.sklearn.estimator import SKLearn

estimator = SKLearn(
    entry_point='steps/training.py',
    role=sm_execution_role,
    image_uri=image_uri,
    instance_type=instance_type,
    sagemaker_session=pipeline_session,
    keep_alive_period_in_seconds=600,
    disable_output_compression=True,  # expecting uncompressed output in S3
)

Then I instantiate a HyperparameterTuner: tuner = HyperparameterTuner(estimator=estimator, ...)
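
For completeness, the tuner is set up along these lines (the objective metric, metric regex, and hyperparameter range below are illustrative placeholders, not my actual configuration):

from sagemaker.tuner import HyperparameterTuner, ContinuousParameter

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name='validation:accuracy',  # placeholder metric name
    metric_definitions=[{'Name': 'validation:accuracy',
                         'Regex': 'validation accuracy: ([0-9\\.]+)'}],  # placeholder regex
    hyperparameter_ranges={
        'alpha': ContinuousParameter(0.001, 0.1),  # placeholder range
    },
    max_jobs=10,
    max_parallel_jobs=2,
)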

The above are used in a SageMaker model building pipeline. Despite setting disable_output_compression=True on the estimator, the model and output artifacts produced by the training jobs are still in tar.gz format. Does this setting not work when the estimator is used inside a HyperparameterTuner?

asked 7 months ago
1 Answer

The issue you're seeing with the disable_output_compression parameter is most likely down to how it is handled inside a HyperparameterTuner.

When an estimator is used inside a HyperparameterTuner, some of its configuration may be overridden or simply not applied to the individual training jobs the tuner launches. The tuner creates multiple training jobs with different hyperparameter configurations, and it may not propagate every estimator setting to each of those jobs.
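
One way to confirm this is to inspect the output configuration of the tuner's child training jobs; a minimal sketch using boto3, where the tuning job name is a placeholder:

import boto3

sm = boto3.client('sagemaker')

# List the child training jobs launched by the tuner (name is a placeholder)
children = sm.list_training_jobs_for_hyper_parameter_tuning_job(
    HyperParameterTuningJobName='my-tuning-job',
)

for summary in children['TrainingJobSummaries']:
    job = sm.describe_training_job(TrainingJobName=summary['TrainingJobName'])
    # 'GZIP' (the default) means the setting was not propagated; 'NONE' means it was
    print(summary['TrainingJobName'],
          job['OutputDataConfig'].get('CompressionType', 'GZIP'))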

To address this, you have a couple of options:

  1. Instead of setting disable_output_compression on the estimator, you can try to set it as a static hyperparameter in the HyperparameterTuner configuration. This way, it will be applied to all training jobs launched by the tuner.

  2. If you're using a SageMaker pipeline, you might need to set this parameter at the pipeline step level rather than on the estimator itself.

If these approaches don't work, you may need to decompress the output artifacts yourself after the training jobs complete. Alternatively, you could modify your training script to save the model in an uncompressed format directly.
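
If you go the post-processing route, here is a minimal sketch of downloading and extracting one compressed artifact (the bucket, key, and local paths are placeholders):

import tarfile

import boto3

s3 = boto3.client('s3')

# Placeholders: point these at the training job's actual output location
bucket = 'my-bucket'
key = 'pipeline-output/my-training-job/output/model.tar.gz'

s3.download_file(bucket, key, 'model.tar.gz')
with tarfile.open('model.tar.gz', 'r:gz') as tar:
    tar.extractall(path='model')  # uncompressed artifacts land here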

Remember that the exact behavior might depend on the specific version of SageMaker you're using and how the HyperparameterTuner interacts with the underlying training jobs. If the issue persists, you may want to consult the AWS documentation for any updates or known limitations regarding this parameter in the context of hyperparameter tuning jobs.
Sources
Configure and Launch a Hyperparameter Tuning Job - Amazon SageMaker
How to build, train, and deploy a machine learning model with Amazon SageMaker

answered 7 months ago
