How to specify target feature in Sagemaker XGBoost?

0

I am considering migrating a data science project from Datarobot to Sagemaker. I am familiar with writing Python and have been going through one of the tutorial Jupyter notebooks to see how to explore the data and to build and deploy and estimator. But, I cannot see how to specify the target feature. I have entirely numerical data in a csv file. One of the fields in that file is the intended target for estimation, the rest are information from which the estimate is to be made.

How do I specify the column that is to be estimated? The code I expect should have this is ...

container = sm.image_uris.retrieve("xgboost", session.boto_region_name, "1.5-1")

xgb = sm.estimator.Estimator(
    container,
    role,
    instance_count=1,
    instance_type="ml.m4.xlarge",
    output_path="s3://xxxxxx001/",
    sagemaker_session=session,
)

xgb.set_hyperparameters(
    max_depth=5,
    eta=0.2,
    gamma=4,
    min_child_weight=6,
    subsample=0.8,
    verbosity=0,
    num_round=100,
)
s3_input_train = TrainingInput(
    s3_data="s3://xxxxxx001/data.csv", content_type="csv"
)
xgb.fit({"train": s3_input_train})

已提问 2 年前228 查看次数
1 回答
0
已接受的回答

On a badly formatted page on the AWS documentation, I found a statement that - the CSV file must have no headers and the target field must be the first field. So, apparently, it is not possible to specify the target. So primitive, yeah?

已回答 2 年前
profile picture
专家
已审核 15 天前

您未登录。 登录 发布回答。

一个好的回答可以清楚地解答问题和提供建设性反馈,并能促进提问者的职业发展。

回答问题的准则