I am considering migrating a data science project from DataRobot to SageMaker. I am familiar with Python and have been working through one of the tutorial Jupyter notebooks to see how to explore the data and how to build and deploy an estimator. However, I cannot see how to specify the target feature. My data is entirely numerical and stored in a CSV file. One of the fields in that file is the intended target for estimation; the rest are the inputs from which the estimate is to be made.
How do I specify the column that is to be estimated?
This is the code where I expected to specify it:
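import sagemaker as sm
from sagemaker.inputs import TrainingInput

# `session` and `role` are defined earlier in the notebook,
# e.g. session = sm.Session() and role = sm.get_execution_role()

# Look up the built-in XGBoost container image for the session's region
container = sm.image_uris.retrieve("xgboost", session.boto_region_name, "1.5-1")

# Configure the training job
xgb = sm.estimator.Estimator(
    container,
    role,
    instance_count=1,
    instance_type="ml.m4.xlarge",
    output_path="s3://xxxxxx001/",
    sagemaker_session=session,
)
xgb.set_hyperparameters(
    max_depth=5,
    eta=0.2,
    gamma=4,
    min_child_weight=6,
    subsample=0.8,
    verbosity=0,
    num_round=100,
)

# Point the "train" channel at the CSV file in S3 and start training
s3_input_train = TrainingInput(
    s3_data="s3://xxxxxx001/data.csv", content_type="csv"
)
xgb.fit({"train": s3_input_train})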
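
Note on the target column: as far as I can tell from the SageMaker documentation for the built-in XGBoost algorithm, there is no parameter that names the target for CSV training data. Instead, the algorithm treats the first column as the label and expects the file to have no header row. If that is right, the file has to be rearranged before it is uploaded. Below is a minimal sketch using only the Python standard library. It assumes the original file has a header row; the function name `move_target_first`, the file names, and the column name `target` are hypothetical placeholders, not part of SageMaker.

```python
import csv


def move_target_first(src_path, dst_path, target):
    """Write a headerless copy of a CSV with the `target` column moved to the front.

    Assumes the source file has a header row containing `target`.
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)          # consume the header; it is not written out
        idx = header.index(target)     # raises ValueError if the column is missing
        for row in reader:
            # Label first, then the remaining columns in their original order
            writer.writerow([row[idx]] + row[:idx] + row[idx + 1:])


# Hypothetical usage: file and column names are placeholders
# move_target_first("data.csv", "train.csv", "target")
```

The rearranged file would then be uploaded to S3 and its location passed as `s3_data` to `TrainingInput`, in place of the original `data.csv`.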