Can I get class probabilities from an XGBoost model, using batch transform?

AutoML gave me an XGBoost model, which I would like to now use to run inference on a large CSV file.

After "deploying" the model, I figured out how to do this using a "batch transform" job. But the output appears to only be 0.0 or 1.0, class labels.

Is there a way to get class probabilities out from the model?

apullin
asked a year ago, 625 views
1 Answer
Hi Apullin,

May I ask how you are currently making the predictions? Here is how I managed to get the probability of a class instead of only 0 or 1.

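import sagemaker

# Deploy the trained estimator (`xgb`) to a real-time endpoint.
# 'any' is a placeholder; substitute a real instance type.
xgb_predictor = xgb.deploy(initial_instance_count=1,
                           instance_type='any')

# Send the input to the endpoint as CSV
xgb_predictor.serializer = sagemaker.serializers.CSVSerializer()

# Make the prediction; the response is bytes, so decode it to text
xgb_predictor.predict(array).decode('utf-8')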

You may try integrating this method into how you are performing 'batch transform' currently and see if it works.

AWS EXPERT
ljunkai
answered a year ago
  • Also, do check that you did not apply np.round() or anything similar to the response, which would convert the results to whole numbers.
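
A quick way to run that check is to parse the decoded response and see whether every value is a whole number. The following is only a minimal sketch: `summarize_predictions` is a hypothetical helper name, and it assumes the response is a plain string of numbers separated by newlines or commas.

```python
def summarize_predictions(resp: str) -> dict:
    """Summarize a decoded prediction string (hypothetical helper).

    Assumes values are separated by newlines and/or commas.
    """
    # Normalize commas to newlines, then drop empty tokens
    tokens = [t.strip() for t in resp.replace(",", "\n").split("\n") if t.strip()]
    if not tokens:
        raise ValueError("response contains no predictions")
    values = [float(t) for t in tokens]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        # True when every value is a whole number such as 0.0 or 1.0,
        # which suggests hard labels rather than probabilities
        "only_whole_numbers": all(v.is_integer() for v in values),
    }
```

If `only_whole_numbers` comes back True on the raw, unrounded response, the rounding is not happening in your own code and the model container itself is returning labels.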

  • Ah, I am using "batch prediction" through the web dashboard itself, via a CSV uploaded to S3. I am not at all familiar with the boto3 or the sagemaker python APIs ... but maybe I'll have to try it out.

    Is there a way to do it from just the web interface?

    For getting it to work in Python, I am up against a slightly different Sagemaker issue: The model (and many others) appear to exist in the output from Autopilot in Studio, but I don't know how to reference it outside of that context. I can "deploy" it to an endpoint, which is why I did the web batch transform.

  • Hi Apullin,

    Noted that you are using 'Create batch transform job' from the SageMaker console. I tested this scenario with an XGBoost model for binary classification and I am getting probabilities. May I ask what configuration you set when creating the job?

  • OK, after some tinkering, I was able to run this in a Sagemaker Studio notebook:

    import sagemaker as sm

    # Attach to the already-deployed endpoint by name
    automl_predictor1 = sm.predictor.Predictor( "TestEndpoint1" )
    automl_predictor1.serializer = sm.serializers.CSVSerializer()
    resp = automl_predictor1.predict( df.values[:100000] ).decode('utf-8')
    resp[:300]
    

    But that still just gives back 0.0\n0.0\n1.0\n0.0\n....

    So, no idea. I suppose I'll go back to trying to do CV parameter search on XGBoost locally ... so much for AutoML.

  • One more note: While the "Model Details" view does give me a name for the model, that model does not seem to be actually accessible anywhere. That is, back in the web dashboard view, under "Inference >> Models", the long list of models tried by AutoML does not appear. It DOES give the path for an S3 bucket with a model.tar.gz in it, though.

    Similarly, I am not able to reference that model via the SageMaker Python SDK, and ListModels only shows me the same entries as on the web dashboard.
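
A possible explanation for the 0.0/1.0 output, not confirmed anywhere in this thread: Autopilot inference pipelines return only the predicted label by default, and as far as I know the response content is controlled by the `SAGEMAKER_INFERENCE_OUTPUT` and `SAGEMAKER_INFERENCE_INPUT` environment variables on the inference containers. The sketch below assumes the usual three-container classification pipeline (feature transform, algorithm, label inverse transform) and a container list shaped like the `InferenceContainers` entry returned for the best candidate. `request_probabilities` is a hypothetical helper name, and the container layout and comma-joined value format are assumptions to verify against your own job.

```python
import copy

OUTPUT_KEY = "SAGEMAKER_INFERENCE_OUTPUT"
INPUT_KEY = "SAGEMAKER_INFERENCE_INPUT"


def request_probabilities(containers, keys=("predicted_label", "probability")):
    """Return a modified copy of Autopilot container definitions (hypothetical helper).

    Assumes a three-container pipeline: [feature transform, algorithm,
    label inverse transform]. The input list is left unchanged.
    """
    if len(containers) != 3:
        raise ValueError("expected 3 containers, got %d" % len(containers))
    value = ",".join(keys)
    updated = copy.deepcopy(containers)
    # The algorithm container should emit both the label and the probability
    updated[1].setdefault("Environment", {})[OUTPUT_KEY] = value
    # The last container should accept both fields and pass them through
    last_env = updated[2].setdefault("Environment", {})
    last_env[INPUT_KEY] = value
    last_env[OUTPUT_KEY] = value
    return updated
```

The updated list could then be passed as `Containers` to the `CreateModel` API, and the resulting model used for the batch transform job. If you are on the SageMaker Python SDK, I believe `AutoML.create_model` accepts an `inference_response_keys` argument that does the same thing, but please check the SDK documentation for your version.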
