Using the SageMaker Python SDK, I would like to specify the interaction_constraints hyperparameter on an XGBoost Estimator.
The SageMaker documentation specifies that these should be given as nested lists of integers (according to the xgboost documentation, xgboost itself accepts both feature names and feature indices).
However, I can't seem to get past the validation logic in sagemaker-xgboost-container. Given the following minimal example, executed in SageMaker Studio:
import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
role = get_execution_role()
image = sagemaker.image_uris.retrieve(
    framework="xgboost",
    region="eu-west-1",
    version="1.7-1",
)
xgb_estimator = Estimator(
    image_uri=image,
    instance_type="ml.c5.2xlarge",
    instance_count=1,
    output_path="s3://my-bucket-name/",
    role=role,
    hyperparameters={
        "num_round": "233",
        "interaction_constraints": "[[2, 3]]",
        "tree_method": "hist",  # required for interaction_constraints, although the default of "auto" should be equivalent to "hist"
    },
)
xgb_estimator.fit({"train": TrainingInput(s3_data="s3://my-bucket-name/train.csv", content_type="text/csv")})
and a CSV like the following:
point_estimate,year,month,day,minute_of_day,weekday
9.0,2016,12,30,480,4
10.0,2016,12,30,495,4
18.0,2016,12,30,510,4
17.0,2016,12,30,525,4
18.0,2016,12,30,540,4
26.0,2016,12,30,555,4
14.0,2016,12,30,570,4
13.0,2016,12,30,585,4
17.0,2016,12,30,600,4
I get the following traceback:
Traceback (most recent call last):
  File "/miniconda3/lib/python3.8/site-packages/xgboost/core.py", line 1617, in _transform_interaction_constraints
    [feature_idx_mapping[feature_name] for feature_name in constraint]
  File "/miniconda3/lib/python3.8/site-packages/xgboost/core.py", line 1617, in <listcomp>
    [feature_idx_mapping[feature_name] for feature_name in constraint]
KeyError: 2

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_xgboost_container/algorithm_mode/train.py", line 319, in train_job
    bst = xgb.train(
  File "/miniconda3/lib/python3.8/site-packages/xgboost/core.py", line 620, in inner_f
    return func(**kwargs)
  File "/miniconda3/lib/python3.8/site-packages/xgboost/training.py", line 160, in train
    bst = Booster(params, [dtrain] + [d[0] for d in evals], model_file=xgb_model)
  File "/miniconda3/lib/python3.8/site-packages/xgboost/core.py", line 1579, in __init__
    params_processed = self._configure_constraints(params_processed)
  File "/miniconda3/lib/python3.8/site-packages/xgboost/core.py", line 1637, in _configure_constraints
    ] = self._transform_interaction_constraints(value)
  File "/miniconda3/lib/python3.8/site-packages/xgboost/core.py", line 1621, in _transform_interaction_constraints
    raise ValueError(
ValueError: Constrained features are not a subset of training data feature names

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_containers/_trainer.py", line 84, in train
    entrypoint()
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_xgboost_container/training.py", line 102, in main
    train(framework.training_env())
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_xgboost_container/training.py", line 98, in train
    run_algorithm_mode()
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_xgboost_container/training.py", line 64, in run_algorithm_mode
    sagemaker_train(
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_xgboost_container/algorithm_mode/train.py", line 250, in sagemaker_train
    train_job(**train_args)
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_xgboost_container/algorithm_mode/train.py", line 415, in train_job
    raise exc.AlgorithmError(f"{exception_prefix}:\n {str(e)}")
sagemaker_algorithm_toolkit.exceptions.AlgorithmError: XGB train call failed with exception:
Constrained features are not a subset of training data feature names
XGB train call failed with exception:
Constrained features are not a subset of training data feature names
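The first traceback suggests that xgboost received the constraints as an already-parsed list and looked up each element in a name-to-index mapping. Below is a simplified, hypothetical re-implementation of that lookup, based only on the frames visible in the traceback. The function signature and the assumption that the mapping is built from the training data's feature names are mine, not taken from xgboost's source.

```python
from typing import Dict, List, Optional, Sequence


def transform_interaction_constraints(
    constraints: Sequence[Sequence[object]],
    feature_names: Optional[List[str]],
) -> List[List[int]]:
    """Hypothetical, simplified mimic of the lookup seen in the traceback."""
    # Assumption: the mapping is keyed by the training data's feature names.
    feature_idx_mapping: Dict[str, int] = {
        name: idx for idx, name in enumerate(feature_names or [])
    }
    try:
        return [
            [feature_idx_mapping[feature_name] for feature_name in constraint]
            for constraint in constraints
        ]
    except KeyError as e:
        # "from e" matches "The above exception was the direct cause ..."
        raise ValueError(
            "Constrained features are not a subset of training data feature names"
        ) from e


names = ["year", "month", "day", "minute_of_day", "weekday"]
print(transform_interaction_constraints([["month", "day"]], names))  # [[1, 2]]
for feature_names in (names, None):
    try:
        transform_interaction_constraints([[2, 3]], feature_names)
    except ValueError as e:
        print(f"{e} (caused by {type(e.__cause__).__name__}: {e.__cause__})")
```

Under this assumption, integer entries can only match feature names equal to those integers (a string header such as `"2"` would not match the integer `2`), and when there are no feature names at all, every lookup fails.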
When I try feature names instead, with "interaction_constraints": "[['month', 'day']]", I get the following:
Traceback (most recent call last):
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_algorithm_toolkit/hyperparameter_validation.py", line 287, in validate
    self.hyperparameters[hp].validate_range(value)
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_algorithm_toolkit/hyperparameter_validation.py", line 198, in validate_range
    if any([element not in self.range for outer in value for element in outer]):
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_algorithm_toolkit/hyperparameter_validation.py", line 198, in <listcomp>
    if any([element not in self.range for outer in value for element in outer]):
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_algorithm_toolkit/hyperparameter_validation.py", line 368, in __contains__
    or (self.min_closed is not None and value < self.min_closed)
TypeError: '<' not supported between instances of 'str' and 'int'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_containers/_trainer.py", line 84, in train
    entrypoint()
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_xgboost_container/training.py", line 102, in main
    train(framework.training_env())
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_xgboost_container/training.py", line 98, in train
    run_algorithm_mode()
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_xgboost_container/training.py", line 64, in run_algorithm_mode
    sagemaker_train(
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_xgboost_container/algorithm_mode/train.py", line 124, in sagemaker_train
    validated_train_config = hyperparameters.validate(train_config)
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_algorithm_toolkit/hyperparameter_validation.py", line 291, in validate
    raise exc.AlgorithmError(
sagemaker_algorithm_toolkit.exceptions.AlgorithmError: Hyperparameter interaction_constraints: unexpected failure when validating [['month', 'day']] (caused by TypeError)
Hyperparameter interaction_constraints: unexpected failure when validating [['month', 'day']] (caused by TypeError)
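The second failure looks like a plain Python type mismatch: the range check compares each constraint element against an integer lower bound. Below is a minimal sketch with a hypothetical `Interval` class that models only the `min_closed` comparison visible in the traceback. It is not the toolkit's real class.

```python
from typing import Any, Optional, Sequence


class Interval:
    """Hypothetical stand-in for the range object in the traceback.

    Only the lower-bound check visible in the traceback is modeled.
    """

    def __init__(self, min_closed: Optional[int] = None) -> None:
        self.min_closed = min_closed

    def __contains__(self, value: Any) -> bool:
        # For a str value and an int bound, '<' raises TypeError in Python 3.
        return not (self.min_closed is not None and value < self.min_closed)


def validate_range(value: Sequence[Sequence[Any]], rng: Interval) -> None:
    # Same nested iteration as the validate_range frame in the traceback.
    if any([element not in rng for outer in value for element in outer]):
        raise ValueError(f"{value} contains out-of-range elements")


validate_range([[2, 3]], Interval(min_closed=0))  # integers pass the check
try:
    validate_range([["month", "day"]], Interval(min_closed=0))
except TypeError as e:
    print(f"TypeError: {e}")  # TypeError: '<' not supported between instances of 'str' and 'int'
```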
So it seems that sagemaker_xgboost_container does not let me specify feature names in the first place. Indices do make it past the validation, but they are passed to xgboost in a way that gets them interpreted as feature names. I also tried variations such as
"interaction_constraints": "[['2', '3']]"
"interaction_constraints": [['2', '3']]
"interaction_constraints": [[2, 3]]
with the same results as above. I also tried renaming the CSV column headers to numeric indices, to no avail.
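One possible reason the list and string variants behave identically is that the SageMaker training job API takes hyperparameters as a string-to-string map, so a Python list has to be stringified before the container sees it. The sketch below assumes that the conversion is equivalent to `str()` and that the container parses the value as a Python literal (here via `ast.literal_eval`). Both points are assumptions, not confirmed behaviour of the SDK or the container, and both helper names are hypothetical.

```python
import ast
from typing import Any


def to_hyperparameter_string(value: Any) -> str:
    # Assumption: non-string values are converted with str() before submission.
    return value if isinstance(value, str) else str(value)


def parse_constraints(raw: str) -> Any:
    # Assumption: the container parses the string as a Python literal.
    return ast.literal_eval(raw)


variants = ["[[2, 3]]", "[['2', '3']]", [[2, 3]], [["2", "3"]]]
for v in variants:
    raw = to_hyperparameter_string(v)
    parsed = parse_constraints(raw)
    print(f"{v!r:16} -> {raw!r:16} -> element type {type(parsed[0][0]).__name__}")
```

If these assumptions hold, `[[2, 3]]` and `"[[2, 3]]"` reach the container as the same string, as do `[['2', '3']]` and `"[['2', '3']]"`.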
In what format are interaction constraints expected to be specified? Any insights are welcome.
Thanks in advance!
Edit: Thanks for the fast response, but unfortunately, even without headers, I get the same traceback, ending in:
Constrained features are not a subset of training data feature names
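For reference, SageMaker's built-in XGBoost algorithm expects CSV training data with no header record and with the target in the first column. The sketch below uses a hypothetical helper, `strip_header`, which drops the header with the standard `csv` module. It also records the zero-based index each feature gets once the label column is removed, which makes it easier to check that indices such as `[[2, 3]]` point at the intended columns. It assumes xgboost numbers features after the label column has been split off.

```python
import csv
import io
from typing import Dict, Tuple


def strip_header(csv_text: str) -> Tuple[str, Dict[str, int]]:
    """Return (headerless CSV, feature name -> zero-based feature index).

    Assumes the first column is the label, as SageMaker's built-in
    XGBoost expects for CSV input.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    # Features are numbered from 0 once the label column (index 0) is dropped.
    feature_index = {name: i for i, name in enumerate(header[1:])}
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(data)
    return out.getvalue(), feature_index


sample = (
    "point_estimate,year,month,day,minute_of_day,weekday\n"
    "9.0,2016,12,30,480,4\n"
    "10.0,2016,12,30,495,4\n"
)
body, index = strip_header(sample)
print(body, end="")
print(index)
```

With the sample header, `month` is feature 1 and `day` is feature 2 under this numbering.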