SageMaker Debugger: cannot load training information of estimator

I am using a SageMaker notebook to train an ML model. After creating and training the estimator with the following script, I could load the debugging information (s3_output_path) as expected:

import sagemaker
from sagemaker.pytorch import PyTorch
from sagemaker.debugger import Rule, DebuggerHookConfig, CollectionConfig, rule_configs

# Built-in Debugger rules to evaluate during training
rules = [
    Rule.sagemaker(rule_configs.loss_not_decreasing()),
    Rule.sagemaker(rule_configs.vanishing_gradient()),
    Rule.sagemaker(rule_configs.overfit()),
    Rule.sagemaker(rule_configs.overtraining()),
    Rule.sagemaker(rule_configs.poor_weight_initialization()),
]

# Save the loss tensor every 100 training steps and every 10 evaluation steps
collection_configs = [
    CollectionConfig(
        name="CrossEntropyLoss_output_0",
        parameters={
            "include_regex": "CrossEntropyLoss_output_0",
            "train.save_interval": "100",
            "eval.save_interval": "10",
        },
    )
]

debugger_config = DebuggerHookConfig(collection_configs=collection_configs)

# hyperparameters and inputs are defined elsewhere in the notebook
estimator = PyTorch(
    role=sagemaker.get_execution_role(),
    instance_count=1,
    instance_type="ml.m5.xlarge",
    # instance_type="ml.g4dn.2xlarge",
    entry_point="train.py",
    framework_version="1.8",
    py_version="py36",
    hyperparameters=hyperparameters,
    debugger_hook_config=debugger_config,
    rules=rules,
)

estimator.fit({"training": inputs})

s3_output_path = estimator.latest_job_debugger_artifacts_path()

After the kernel died, I attached the estimator and tried to access the debugging information of the training:

estimator = sagemaker.estimator.Estimator.attach('pytorch-training-2022-06-07-11-07-09-804')

s3_output_path = estimator.latest_job_debugger_artifacts_path()
rules_path = estimator.debugger_rules

Both s3_output_path and rules_path were None. Could this be a problem with the attach function? And how can I access the Debugger's training information after the kernel has been shut down?

1 Answer

Hello,

It is difficult to determine why both values were None without more information. Please open a support case with AWS PS, and one of our SageMaker engineers should be able to assist you.
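
In the meantime, one thing that may be worth trying is reading the Debugger settings from the training job description rather than from the attached estimator. The sketch below is not a confirmed fix. It assumes that the response of boto3's describe_training_job contains a DebugHookConfig.S3OutputPath entry and a DebugRuleEvaluationStatuses list for this job, and that the artifacts are stored under <S3OutputPath>/<job name>/debug-output. The helper names (debugger_artifacts_path, rule_statuses, describe_job) are hypothetical and used only for illustration.

```python
from typing import Optional


def debugger_artifacts_path(description: dict) -> Optional[str]:
    """Build the Debugger output path from a DescribeTrainingJob response.

    Assumption: artifacts live under <S3OutputPath>/<job name>/debug-output.
    """
    hook_config = description.get("DebugHookConfig")
    if not hook_config:
        return None  # the job was run without a Debugger hook
    base = hook_config["S3OutputPath"].rstrip("/")
    return f"{base}/{description['TrainingJobName']}/debug-output"


def rule_statuses(description: dict) -> dict:
    """Map each rule configuration name to its evaluation status."""
    return {
        status["RuleConfigurationName"]: status["RuleEvaluationStatus"]
        for status in description.get("DebugRuleEvaluationStatuses", [])
    }


def describe_job(job_name: str) -> dict:
    """Fetch the training job description (requires AWS credentials)."""
    import boto3  # imported lazily so the helpers above work offline

    client = boto3.client("sagemaker")
    return client.describe_training_job(TrainingJobName=job_name)
```

With valid credentials, calling describe_job("pytorch-training-2022-06-07-11-07-09-804") and passing the result to the two helpers should give the artifacts path and the rule statuses, provided the assumptions above hold for your job.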

Thank you.

SUPPORT ENGINEER
answered 3 months ago
