Can't generate XGBoost training report in SageMaker, only profiler_report

I am trying to generate the XGBoost training report to see feature importances; however, the following code only generates the profiler report.

import boto3, re, sys, math, json, os, sagemaker, urllib.request
from sagemaker import get_execution_role
import numpy as np
import pandas as pd
from sagemaker.predictor import csv_serializer
from sagemaker.debugger import Rule, rule_configs

# Debugger rule that should produce the XGBoost training report
rules=[
    Rule.sagemaker(rule_configs.create_xgboost_report())
]
# Define IAM role
role = get_execution_role()
prefix = 'sagemaker/models'
my_region = boto3.session.Session().region_name 

# this line automatically looks for the XGBoost image URI and builds an XGBoost container.
xgboost_container = sagemaker.image_uris.retrieve("xgboost", my_region, "latest")



bucket_name = 'binary-base' 
s3 = boto3.resource('s3')
try:
    if my_region == 'us-east-1':
        s3.create_bucket(Bucket=bucket_name)
    else:
        s3.create_bucket(Bucket=bucket_name, CreateBucketConfiguration={'LocationConstraint': my_region})
    print('S3 bucket created successfully')
except Exception as e:
    print('S3 error: ', e)

boto3.Session().resource('s3').Bucket(bucket_name).Object(os.path.join(prefix, 'train/train.csv')).upload_file('../Data/Base_Model_Data_No_Labels/train.csv')
boto3.Session().resource('s3').Bucket(bucket_name).Object(os.path.join(prefix, 'validation/val.csv')).upload_file('../Data/Base_Model_Data_No_Labels/val.csv')
boto3.Session().resource('s3').Bucket(bucket_name).Object(os.path.join(prefix, 'test/test.csv')).upload_file('../Data/Base_Model_Data/test.csv')


sess = sagemaker.Session()
xgb = sagemaker.estimator.Estimator(xgboost_container,
                                    role, 
                                    volume_size=5,
                                    instance_count=1, 
                                    instance_type='ml.m4.xlarge',
                                    output_path='s3://{}/{}/output'.format(bucket_name, prefix),
                                    sagemaker_session=sess, 
                                    rules=rules)

xgb.set_hyperparameters(objective='binary:logistic',
                        num_round=100, 
                        scale_pos_weight=8.5)

xgb.fit({'train': s3_input_train, "validation": s3_input_val}, wait=True)
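The snippet above does not show how s3_input_train and s3_input_val are defined. A minimal sketch of one plausible definition, assuming they are CSV channels pointing at the train/ and validation/ prefixes uploaded earlier; the helper names channel_uri and build_inputs are hypothetical and not part of the original code:

```python
def channel_uri(bucket_name, prefix, channel):
    """Build the S3 URI for one input channel, e.g. 'train' or 'validation'."""
    return "s3://{}/{}/{}".format(bucket_name, prefix, channel)


def build_inputs(bucket_name, prefix):
    """Return TrainingInput objects for the train and validation channels.

    Assumption: the data is CSV, matching the files uploaded above.
    """
    # Imported lazily so channel_uri can be used without the SageMaker SDK installed.
    from sagemaker.inputs import TrainingInput

    return {
        "train": TrainingInput(
            s3_data=channel_uri(bucket_name, prefix, "train"),
            content_type="csv",
        ),
        "validation": TrainingInput(
            s3_data=channel_uri(bucket_name, prefix, "validation"),
            content_type="csv",
        ),
    }


print(channel_uri("binary-base", "sagemaker/models", "train"))
# prints: s3://binary-base/sagemaker/models/train
```

Under that assumption, s3_input_train and s3_input_val would be build_inputs(bucket_name, prefix)["train"] and build_inputs(bucket_name, prefix)["validation"].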

I check the output path with:

rule_output_path = xgb.output_path + "/" + xgb.latest_training_job.job_name + "/rule-output"
! aws s3 ls {rule_output_path} --recursive

This outputs:

2022-07-07 18:40:27     329715 sagemaker/models/output/xgboost-2022-07-07-18-35-55-436/rule-output/ProfilerReport-1657218955/profiler-output/profiler-report.html
2022-07-07 18:40:26     171087 sagemaker/models/output/xgboost-2022-07-07-18-35-55-436/rule-output/ProfilerReport-1657218955/profiler-output/profiler-report.ipynb
2022-07-07 18:40:23        191 sagemaker/models/output/xgboost-2022-07-07-18-35-55-436/rule-output/ProfilerReport-1657218955/profiler-output/profiler-reports/BatchSize.json
2022-07-07 18:40:23        199 sagemaker/models/output/xgboost-2022-07-07-18-35-55-436/rule-output/ProfilerReport-1657218955/profiler-output/profiler-reports/CPUBottleneck.json
2022-07-07 18:40:23        126 sagemaker/models/output/xgboost-2022-07-07-18-35-55-436/rule-output/ProfilerReport-1657218955/profiler-output/profiler-reports/Dataloader.json
2022-07-07 18:40:23        127 sagemaker/models/output/xgboost-2022-07-07-18-35-55-436/rule-output/ProfilerReport-1657218955/profiler-output/profiler-reports/GPUMemoryIncrease.json
2022-07-07 18:40:23        198 sagemaker/models/output/xgboost-2022-07-07-18-35-55-436/rule-output/ProfilerReport-1657218955/profiler-output/profiler-reports/IOBottleneck.json
2022-07-07 18:40:23        119 sagemaker/models/output/xgboost-2022-07-07-18-35-55-436/rule-output/ProfilerReport-1657218955/profiler-output/profiler-reports/LoadBalancing.json
2022-07-07 18:40:23        151 sagemaker/models/output/xgboost-2022-07-07-18-35-55-436/rule-output/ProfilerReport-1657218955/profiler-output/profiler-reports/LowGPUUtilization.json
2022-07-07 18:40:23        179 sagemaker/models/output/xgboost-2022-07-07-18-35-55-436/rule-output/ProfilerReport-1657218955/profiler-output/profiler-reports/MaxInitializationTime.json
2022-07-07 18:40:23        133 sagemaker/models/output/xgboost-2022-07-07-18-35-55-436/rule-output/ProfilerReport-1657218955/profiler-output/profiler-reports/OverallFrameworkMetrics.json
2022-07-07 18:40:23        465 sagemaker/models/output/xgboost-2022-07-07-18-35-55-436/rule-output/ProfilerReport-1657218955/profiler-output/profiler-reports/OverallSystemUsage.json
2022-07-07 18:40:23        156 sagemaker/models/output/xgboost-2022-07-07-18-35-55-436/rule-output/ProfilerReport-1657218955/profiler-output/profiler-reports/StepOutlier.json

As you can see, only the profiler report is created, which does not interest me. Why isn't a CreateXGBoostReport folder generated with the training report? How do I generate it, and what am I missing?
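To make the check repeatable, here is a small sketch that parses lines in the `aws s3 ls --recursive` format shown above and reports which rule folders exist under rule-output/. The function name rule_folders is hypothetical, and the sketch assumes the report folder name would start with CreateXGBoostReport:

```python
def rule_folders(listing_lines):
    """Return the set of folder names found directly under rule-output/."""
    marker = "/rule-output/"
    folders = set()
    for line in listing_lines:
        # Each line is: date, time, size, key
        parts = line.strip().split(None, 3)
        if len(parts) < 4:
            continue
        key = parts[3]
        idx = key.find(marker)
        if idx == -1:
            continue
        # The first path component after rule-output/ is the rule's folder.
        rest = key[idx + len(marker):]
        folders.add(rest.split("/", 1)[0])
    return folders


# Two lines copied from the listing above.
listing = [
    "2022-07-07 18:40:27     329715 sagemaker/models/output/xgboost-2022-07-07-18-35-55-436/rule-output/ProfilerReport-1657218955/profiler-output/profiler-report.html",
    "2022-07-07 18:40:23        191 sagemaker/models/output/xgboost-2022-07-07-18-35-55-436/rule-output/ProfilerReport-1657218955/profiler-output/profiler-reports/BatchSize.json",
]

found = rule_folders(listing)
has_xgb_report = any(name.startswith("CreateXGBoostReport") for name in found)
print(found, has_xgb_report)
```

For the listing above, the only folder found is ProfilerReport-1657218955, so has_xgb_report is False.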

No Answers
