How do I specify the target feature in SageMaker XGBoost?


I am considering migrating a data science project from DataRobot to SageMaker. I am familiar with writing Python and have been working through one of the tutorial Jupyter notebooks to see how to explore the data and how to build and deploy an estimator. However, I cannot see how to specify the target feature. My data is entirely numerical and stored in a CSV file. One of the fields in that file is the intended target for estimation; the rest are the features from which the estimate is to be made.

How do I specify the column that is to be estimated? This is the code where I would expect to do it:

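import sagemaker as sm
from sagemaker.inputs import TrainingInput

session = sm.Session()
role = sm.get_execution_role()  # IAM role attached to the notebook

# Built-in XGBoost container image, version 1.5-1, for the session's region
container = sm.image_uris.retrieve("xgboost", session.boto_region_name, "1.5-1")

xgb = sm.estimator.Estimator(
    container,
    role,
    instance_count=1,
    instance_type="ml.m4.xlarge",
    output_path="s3://xxxxxx001/",
    sagemaker_session=session,
)

xgb.set_hyperparameters(
    max_depth=5,
    eta=0.2,
    gamma=4,
    min_child_weight=6,
    subsample=0.8,
    verbosity=0,
    num_round=100,
)

# Nothing here names the target column: only the S3 location and content type
s3_input_train = TrainingInput(
    s3_data="s3://xxxxxx001/data.csv", content_type="csv"
)
xgb.fit({"train": s3_input_train})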
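Before calling fit, a quick local sanity check can catch a header row or a non-numeric field in the file. This is a minimal sketch, assuming the layout the AWS documentation describes for CSV training input to the built-in XGBoost algorithm (no header row, target in the first column). check_xgboost_csv is a hypothetical helper, not part of the SageMaker SDK, and it is deliberately conservative: it rejects empty fields as well.

```python
import csv
import io


def check_xgboost_csv(text: str) -> tuple[list[float], int]:
    """Sanity-check CSV text against the layout assumed for built-in
    SageMaker XGBoost CSV training input: no header row, numeric fields,
    target in the first column.

    Returns (target values from the first column, number of feature columns).
    Raises ValueError on a non-numeric or empty field (e.g. a header row)
    or on rows with differing field counts.
    """
    labels = []
    n_fields = None
    reader = csv.reader(io.StringIO(text))
    for row_no, row in enumerate(reader, start=1):
        if not row:
            continue  # ignore blank lines
        try:
            values = [float(v) for v in row]
        except ValueError:
            raise ValueError(
                f"row {row_no}: non-numeric or empty field (header row?): {row}"
            ) from None
        if n_fields is None:
            n_fields = len(values)
        elif len(values) != n_fields:
            raise ValueError(
                f"row {row_no}: {len(values)} fields, expected {n_fields}"
            )
        labels.append(values[0])  # first column is treated as the target
    if n_fields is None:
        raise ValueError("no data rows found")
    return labels, n_fields - 1
```

For a local copy of the file, pass open(path).read(); a ValueError on row 1 usually means the header row is still present.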

asked 4 months ago, 27 views
1 Answer
Accepted Answer

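On a badly formatted page of the AWS documentation, I found the statement that, for CSV training input, the file must have no header row and the target must be the first column. So, apparently, the built-in XGBoost algorithm has no option for naming the target column: you have to reorder the data yourself so the target comes first, and drop the header, before uploading it to S3. So primitive, yeah?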
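The reordering can be done before upload. This is a minimal sketch using only the standard library, assuming the original file has a header row that names the target column; to_xgboost_csv and the column name "price" in the usage example are hypothetical. It moves the target column to the front, keeps the remaining columns in their original order, and writes no header.

```python
import csv
import io


def to_xgboost_csv(text: str, target: str) -> str:
    """Rewrite CSV text that has a header row into the layout assumed for
    built-in SageMaker XGBoost CSV training input: the `target` column
    first, the other columns in their original order, and no header row.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    t = header.index(target)  # raises ValueError if the column name is absent
    # New column order: target first, then every other column unchanged
    order = [t] + [i for i in range(len(header)) if i != t]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in data:
        if row:  # skip blank lines
            writer.writerow([row[i] for i in order])
    return out.getvalue()


# Usage with a hypothetical target column named "price"
print(to_xgboost_csv("a,b,price\n1,2,3.5\n4,5,6.0\n", "price"), end="")
```

If the data is already in a pandas DataFrame df, the equivalent is df[[target] + [c for c in df.columns if c != target]].to_csv(path, header=False, index=False).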

answered 4 months ago
