Can I get class probabilities from an XGBoost model, using batch transform?


AutoML gave me an XGBoost model, which I would like to now use to run inference on a large CSV file.

After "deploying" the model, I figured out how to do this using a "batch transform" job. But the output appears to only be 0.0 or 1.0, class labels.

Is there a way to get class probabilities out from the model?

apullin
Asked 1 year ago, viewed 638 times
1 Answer

Hi Apullin,

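May I ask how you are currently making the predictions? Here is how I manage to get the probability of a class instead of only 0 or 1.

import sagemaker

# Deploy the trained estimator (xgb) to a real-time endpoint.
# 'any' is a placeholder: replace it with an actual instance type, e.g. 'ml.m5.xlarge'.
xgb_predictor = xgb.deploy(initial_instance_count=1,
                           instance_type='any')

# Send the request payload as CSV
xgb_predictor.serializer = sagemaker.serializers.CSVSerializer()

# Make the prediction; the response is bytes, so decode it to a string
xgb_predictor.predict(array).decode('utf-8')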

You could try adapting this approach to your current batch transform setup and see whether it works.
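
To tell quickly whether a decoded response holds probabilities or only class labels, something like the following rough sketch may help. The helper names `parse_predictions` and `looks_like_labels` are made up for illustration and are not part of the SageMaker SDK; the sketch assumes the response is plain text with values separated by newlines or commas.

```python
from typing import List


def parse_predictions(decoded: str) -> List[float]:
    """Split a decoded CSV-style response into floats.

    Accepts values separated by newlines and/or commas (an assumption
    about the response format, not a documented guarantee).
    """
    tokens = decoded.replace(",", "\n").split()
    return [float(token) for token in tokens]


def looks_like_labels(values: List[float]) -> bool:
    """Return True when every value is exactly 0.0 or 1.0."""
    return bool(values) and all(v in (0.0, 1.0) for v in values)


# Example strings standing in for xgb_predictor.predict(array).decode('utf-8')
labels_only = parse_predictions("0.0\n0.0\n1.0\n0.0\n")
scores = parse_predictions("0.12,0.87,0.5")

print(looks_like_labels(labels_only))  # every value is 0.0 or 1.0
print(looks_like_labels(scores))       # contains fractional values
```

If the check reports labels only, the rounding most likely happens inside the model or its containers rather than in your own code.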

ljunkai (AWS, Expert)
Answered 1 year ago
  • Also, check that you are not applying np.round() or anything similar to the response, which would convert the results to whole numbers.

  • Ah, I am using "batch prediction" through the web dashboard itself, via a CSV uploaded to S3. I am not at all familiar with boto3 or the SageMaker Python APIs ... but maybe I'll have to try them out.

    Is there a way to do it from just the web interface?

    For getting it to work in Python, I am up against a slightly different SageMaker issue: the model (and many others) appears to exist in the output from Autopilot in Studio, but I don't know how to reference it outside of that context. I can "deploy" it to an endpoint, which is why I did the web batch transform.

  • Hi Apullin,

    Noted that you are using 'Create batch transform job' from the SageMaker console. I have tested this scenario with an XGBoost model for binary classification and I am getting the probabilities. May I ask which configurations you set when creating the job?

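  • OK, after some tinkering, I was able to run this in a SageMaker Studio notebook:

    import sagemaker as sm

    # Attach to the already-deployed endpoint and send rows as CSV
    automl_predictor1 = sm.predictor.Predictor("TestEndpoint1")
    automl_predictor1.serializer = sm.serializers.CSVSerializer()
    # Predict on the first 100,000 rows and decode the bytes response
    resp = automl_predictor1.predict(df.values[:100000]).decode('utf-8')
    resp[:300]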
    

    But that still just gives back 0.0\n0.0\n1.0\n0.0\n....

    So, no idea. I suppose I'll go back to trying to do CV parameter search on XGBoost locally ... so much for AutoML.

  • One more note: while the "Model Details" view does give me a name for the model, that model does not seem to be actually accessible anywhere. That is, back in the web dashboard view, under "Inference >> Models", the long list of models tried by AutoML does not appear. It DOES give the path to an S3 bucket with a model.tar.gz in it, though.

    Similarly, I am not able to reference that model via the SageMaker Python SDK, and ListModels only shows me the same entries as the web dashboard.
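
One possible way to address both points raised in the thread (referencing an Autopilot model from Python, and getting probabilities instead of labels) is sketched below. It assumes the SageMaker Python SDK's `AutoML.attach` and `AutoML.create_model`, and that the Autopilot-generated containers honor the `inference_response_keys` argument for this job; that has not been verified against the model discussed here. The function name `create_probability_model` and every job, model, and bucket name in the comments are hypothetical.

```python
def create_probability_model(automl_job, model_name,
                             response_keys=("predicted_label", "probability")):
    """Create a deployable model from an Autopilot job's best candidate.

    automl_job is assumed to be a sagemaker.automl.automl.AutoML object
    attached to a finished Autopilot job. response_keys asks the inference
    containers to return these fields for each input row.
    """
    return automl_job.create_model(
        name=model_name,
        inference_response_keys=list(response_keys),
    )


# Hypothetical usage (needs AWS credentials and a finished Autopilot job):
#
# from sagemaker.automl.automl import AutoML
# automl_job = AutoML.attach(auto_ml_job_name="my-autopilot-job")
# model = create_probability_model(automl_job, "my-autopilot-model")
# transformer = model.transformer(
#     instance_count=1,
#     instance_type="ml.m5.xlarge",
#     output_path="s3://my-bucket/batch-output/",
# )
# transformer.transform(
#     "s3://my-bucket/input.csv",
#     content_type="text/csv",
#     split_type="Line",
# )
```

A model created this way is registered in the account, so it should then be listed under "Inference >> Models" and can be selected when creating a batch transform job from the console.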
