Embedding categorical feature indices for the CatBoost algorithm in AWS


According to the notebook at https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/lightgbm_catboost_tabular/Amazon_Tabular_Classification_LightGBM_CatBoost.ipynb, if the predictors include categorical feature(s), a JSON-format file has to be provided. I found preparing this JSON file confusing. Here is the relevant information from the notebook: "If the predictors include categorical feature(s), a json-format file named 'categorical_index.json' should be included in the input directory to indicate the column index(es) of the categorical features. Within the json-format file, it should have a python dictionary where the key is a string of 'cat_index_list' and the value is a list of unique integer(s). Each integer in the list indicates the column index of categorical features in the 'data.csv'. The range of each integer should be more than 0 (index 0 indicates the target) and less than the total number of columns."

Could you please advise me on how to create the JSON-format file for the categorical feature indices?
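A minimal sketch of creating the file described in the quoted text, assuming data.csv has no header row and its columns are written in a known order with the target first. The column names and the `build_cat_index` / `write_cat_index` helpers are hypothetical, for illustration only; the indices are simply the 0-based positions of the categorical columns in data.csv.

```python
import json
import os
import tempfile

# Hypothetical column order used when writing data.csv (no header row):
# the target must be column 0, followed by the predictors.
COLUMNS = ["label", "color", "size", "weight", "region"]
# Hypothetical names of the categorical predictors.
CATEGORICAL = ["color", "region"]


def build_cat_index(columns, categorical):
    """Return {"cat_index_list": [...]} with the 0-based CSV positions of the categorical columns."""
    # columns.index raises ValueError if a name is not in the column list;
    # the set removes duplicates so the indices are unique.
    indices = sorted({columns.index(name) for name in categorical})
    if indices and indices[0] == 0:
        raise ValueError("column 0 is the target and cannot be listed as categorical")
    return {"cat_index_list": indices}


def write_cat_index(payload, directory):
    """Write categorical_index.json into the directory that holds data.csv."""
    path = os.path.join(directory, "categorical_index.json")
    with open(path, "w") as f:
        json.dump(payload, f)
    return path


payload = build_cat_index(COLUMNS, CATEGORICAL)

# A temporary directory keeps this sketch free of side effects; in practice
# this would be the local folder you upload together with data.csv.
with tempfile.TemporaryDirectory() as tmp:
    path = write_cat_index(payload, tmp)
    with open(path) as f:
        print(f.read())  # {"cat_index_list": [1, 4]}
```

Because data.csv is headerless, the indices stay correct only if the column order used here matches the order in which the CSV was actually written.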

1 Answer

According to the documentation:

Categorical features input format: If your predictors include categorical features, you can provide a JSON file named categorical_index.json in the same location as your data directories. This file should contain a Python dictionary where the key is the string "cat_index_list" and the value is a list of unique integers. Each integer in the value list should indicate the column index of the corresponding categorical features in your training data CSV file. Each value should be a positive integer (greater than zero because zero represents the target value), less than the Int32.MaxValue (2147483647), and less than the total number of columns. There should only be one categorical index JSON file.

For example:

{"cat_index_list": [1, 2, 3]}

Here 1, 2, and 3 are the indices of the categorical columns.
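A small checker for the rules listed in the documentation can catch mistakes before a training job is launched. This is a hypothetical helper, not part of SageMaker; `n_columns` is assumed to be the total number of columns in data.csv, including the target in column 0.

```python
import json

INT32_MAX = 2147483647  # upper bound quoted in the documentation


def validate_cat_index(text, n_columns):
    """Check a categorical_index.json payload against the documented rules.

    n_columns is the total number of columns in data.csv, target included.
    Returns a list of problems; an empty list means every rule passed.
    """
    data = json.loads(text)
    if not isinstance(data, dict) or "cat_index_list" not in data:
        return ['expected a JSON object with the key "cat_index_list"']
    values = data["cat_index_list"]
    if not isinstance(values, list):
        return ['"cat_index_list" must be a list']

    problems = []
    ints = []
    for v in values:
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(v, int) or isinstance(v, bool):
            problems.append(f"{v!r} is not an integer")
            continue
        ints.append(v)
        if v <= 0:
            problems.append(f"{v} must be greater than 0 (column 0 is the target)")
        elif v >= INT32_MAX:
            problems.append(f"{v} must be less than {INT32_MAX}")
        elif v >= n_columns:
            problems.append(f"{v} must be less than the number of columns ({n_columns})")
    if len(ints) != len(set(ints)):
        problems.append("indices must be unique")
    return problems


print(validate_cat_index('{"cat_index_list": [1, 2, 3]}', n_columns=5))  # []
# Reports the 0 (target column), the out-of-range 7 and the duplicated 2.
print(validate_cat_index('{"cat_index_list": [0, 2, 2, 7]}', n_columns=5))
```

With five columns in data.csv (target plus four predictors), the example {"cat_index_list": [1, 2, 3]} from the answer passes every check.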

AWS
EXPERT
answered 2 years ago
