Passing a NumPy array to predict_fn when running inference with an XGBoost model

I have a model that was trained locally and deployed to SageMaker, where I invoke the endpoint for inference. When I try to make predictions, I get the following exception:

raise ValueError('Input numpy.ndarray must be 2 dimensional')
ValueError: Input numpy.ndarray must be 2 dimensional
    

My model is an XGBoost model with some preprocessing (variable encoding) and hyperparameter tuning. Here's what the model object looks like:

XGBRegressor(colsample_bytree=xxx, gamma=xxx,
             learning_rate=xxx, max_depth=x, n_estimators=xxx,
             subsample=xxx)

My test data is a string of float values, which I convert to an array because the data must be passed as a NumPy array.

testdata = [........., 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2000, 200, 85, 412412, 123, 41, 552, 50000, 512, 0.1, 10.0, 2.0, 0.05]
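If the payload really does arrive as one comma-separated string, a minimal sketch of turning it into a one-row 2-D array could look like the following. The payload format, the short `payload` value, and the `parse_row` helper are assumptions for illustration only; the real record is elided above and has many more values.

```python
import numpy as np

def parse_row(payload: str) -> np.ndarray:
    """Parse one comma-separated record into a (1, n_features) float array."""
    values = [float(v) for v in payload.split(",") if v.strip()]
    return np.array(values, dtype=float).reshape(1, -1)

# Hypothetical short payload standing in for the real record.
payload = "0.0, 1.0, 2000, 0.1, 0.05"
row = parse_row(payload)
print(row.shape)  # (1, 5)
```

Building the array with an explicit `dtype=float` avoids ending up with an array of strings, which a model cannot consume.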

I have tried reshaping the NumPy array from 1-D to 2-D, but that doesn't work: the number of features in the test data then no longer matches the trained model.

My question: how do I pass a NumPy array whose length equals the number of features in the trained model? Locally, I am able to make predictions by passing the test data as a list.
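One possible cause of the feature mismatch (an assumption, since the reshape call itself is not shown) is the orientation of the reshape. The sketch below uses a hypothetical 5-value row to show the difference between the two common forms.

```python
import numpy as np

# Hypothetical 5-feature record standing in for the real test data.
arr = np.array([0.0, 1.0, 2000, 0.1, 0.05], dtype=float)

as_column = arr.reshape(-1, 1)  # 5 samples with 1 feature each
as_row = arr.reshape(1, -1)     # 1 sample with 5 features

print(arr.shape, as_column.shape, as_row.shape)  # (5,) (5, 1) (1, 5)
```

A model trained on 5 columns would accept only `as_row`. If the row form still reports a mismatch, the vector itself has a different length than the training feature count, for example because the encoding step produced a different set of columns.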

More info on the inference script here: https://github.com/aws-samples/amazon-sagemaker-local-mode/blob/main/xgboost_script_mode_local_training_and_serving/code/inference.py

Traceback (most recent call last):
  File "/miniconda3/lib/python3.6/site-packages/sagemaker_containers/_functions.py", line 93, in wrapper
    return fn(*args, **kwargs)
  File "/opt/ml/code/inference.py", line 75, in predict_fn
    prediction = model.predict(input_data)
  File "/miniconda3/lib/python3.6/site-packages/xgboost/sklearn.py", line 448, in predict
    test_dmatrix = DMatrix(data, missing=self.missing, nthread=self.n_jobs)
  File "/miniconda3/lib/python3.6/site-packages/xgboost/core.py", line 404, in __init__
    self._init_from_npy2d(data, missing, nthread)
  File "/miniconda3/lib/python3.6/site-packages/xgboost/core.py", line 474, in _init_from_npy2d
    raise ValueError('Input numpy.ndarray must be 2 dimensional')
ValueError: Input numpy.ndarray must be 2 dimensional
  • You say reshaping "doesn't work as the number of features between test data and trained model do not match", but I don't quite understand this. Your input dimensionality for inference should be the same as for training, right? And at inference time, a leading 'batch' dimension would be expected so that multiple samples can be processed at once.
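The batch-dimension point in the comment above can be sketched as a `predict_fn` that normalizes the input shape before calling the model. This is a sketch under stated assumptions, not the referenced script: `N_FEATURES`, `as_batch`, and `StubModel` are hypothetical names, and the stub only stands in for the trained regressor so the example runs without XGBoost.

```python
import numpy as np

# Assumption: number of columns the model was trained on (hypothetical value).
N_FEATURES = 3

def as_batch(input_data, n_features):
    """Return the input as a (n_samples, n_features) float array, or raise."""
    arr = np.asarray(input_data, dtype=float)
    if arr.ndim == 1:
        arr = arr.reshape(1, -1)  # one sample: add the leading batch axis
    if arr.ndim != 2 or arr.shape[1] != n_features:
        raise ValueError(
            f"expected shape (n_samples, {n_features}), got {arr.shape}"
        )
    return arr

def predict_fn(input_data, model):
    """Sketch of a predict_fn that fixes the input shape first."""
    return model.predict(as_batch(input_data, N_FEATURES))

class StubModel:
    """Stand-in for the trained regressor; returns the sum of each row."""
    def predict(self, X):
        return X.sum(axis=1)

single = predict_fn([1.0, 2.0, 3.0], StubModel())
batch = predict_fn([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], StubModel())
print(single.shape, batch.shape)  # (1,) (2,)
```

Checking the column count up front turns a feature mismatch into an explicit error message that names both shapes, which is easier to debug from endpoint logs than an error raised deep inside the library.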

Asked 2 years ago, 1774 views
2 Answers

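Try converting your list to a 2-D NumPy array like so:

import numpy as np
a = np.array([[1, 2, 3]])  # note the double brackets: shape (1, 3), one sample with three features

and replace [1, 2, 3] with your list. A single pair of brackets would produce a 1-D array and raise the same error.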

AWS (Expert)
Answered 2 years ago

Like scikit-learn, XGBoost expects X as a 2-D array of shape (n_samples, n_features). To predict a single sample, reshape your list or feature vector into a 2-D array with one row.

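import numpy as np

lst = [1, 2, 3]  # one sample with three features
# reshape((1, -1)) keeps a single row and infers the column count: shape (1, 3)
lst_reshaped = np.array(lst).reshape((1, -1))
# clf is your trained model, e.g. the XGBRegressor shown in the question
clf.predict(lst_reshaped)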
AWS
Zmnako
Answered 2 years ago
