How to specify target feature in Sagemaker XGBoost?

0

I am considering migrating a data science project from Datarobot to Sagemaker. I am familiar with writing Python and have been going through one of the tutorial Jupyter notebooks to see how to explore the data and to build and deploy and estimator. But, I cannot see how to specify the target feature. I have entirely numerical data in a csv file. One of the fields in that file is the intended target for estimation, the rest are information from which the estimate is to be made.

How do I specify the column that is to be estimated? The code I expect should have this is ...

container = sm.image_uris.retrieve("xgboost", session.boto_region_name, "1.5-1")

xgb = sm.estimator.Estimator(
    container,
    role,
    instance_count=1,
    instance_type="ml.m4.xlarge",
    output_path="s3://xxxxxx001/",
    sagemaker_session=session,
)

xgb.set_hyperparameters(
    max_depth=5,
    eta=0.2,
    gamma=4,
    min_child_weight=6,
    subsample=0.8,
    verbosity=0,
    num_round=100,
)
s3_input_train = TrainingInput(
    s3_data="s3://xxxxxx001/data.csv", content_type="csv"
)
xgb.fit({"train": s3_input_train})

feita há 2 anos230 visualizações
1 Resposta
0
Resposta aceita

On a badly formatted page on the AWS documentation, I found a statement that - the CSV file must have no headers and the target field must be the first field. So, apparently, it is not possible to specify the target. So primitive, yeah?

respondido há 2 anos
profile picture
ESPECIALISTA
avaliado há 21 dias

Você não está conectado. Fazer login para postar uma resposta.

Uma boa resposta responde claramente à pergunta, dá feedback construtivo e incentiva o crescimento profissional de quem perguntou.

Diretrizes para responder a perguntas