How do I save the logs of a SageMaker training job, with accept_eula=True?

I would like to save the logs from a SageMaker training job, following something similar to the code snippet below.

estimator = JumpStartEstimator(
    model_id = "...",
    environment = { "accept_eula": "true" }
)
attached_estimator = JumpStartEstimator.attach(job_name)
attached_estimator.logs()

However, there are two issues with the code:

  1. attached_estimator.logs() prints the logs. I would need a text string that contains the logs, to save this to a file.
  2. Where should I set the { "accept_eula": "true" }? The first line was added, based on the documentation here: https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models-choose.html. However, it does not seem necessary for this code to work. The code works with just the last two lines.
1 Answer

To save the logs of a SageMaker training job, you'll need to use CloudWatch instead of the logs() method on the estimator. Here's how you can modify your approach:

  1. First, ensure the identity running this code has permission to read CloudWatch Logs, in particular logs:DescribeLogStreams and logs:GetLogEvents. (The write-side actions logs:CreateLogGroup, logs:CreateLogStream, and logs:PutLogEvents are needed by the training job's execution role so that the logs are produced in the first place.)

  2. To save the logs, you'll need to access them through CloudWatch after the training job is complete. Here's a general approach:

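import boto3

# Assumes attached_estimator = JumpStartEstimator.attach(job_name) has already run
job_name = attached_estimator.latest_training_job.job_name

# Create a CloudWatch Logs client
logs_client = boto3.client('logs')

# Training job logs live in this log group. Each stream name starts with the
# job name followed by a suffix, e.g. "{job_name}/algo-1-...", so list the
# streams by prefix instead of using the job name as the stream name.
log_group = "/aws/sagemaker/TrainingJobs"
streams = logs_client.describe_log_streams(
    logGroupName=log_group, logStreamNamePrefix=job_name + "/"
)['logStreams']

# Collect the events of every matching stream, following the pagination token
lines = []
for stream in streams:
    kwargs = {
        'logGroupName': log_group,
        'logStreamName': stream['logStreamName'],
        'startFromHead': True,
    }
    while True:
        response = logs_client.get_log_events(**kwargs)
        lines.extend(event['message'] for event in response['events'])
        # The token stops changing once the end of the stream is reached
        if response['nextForwardToken'] == kwargs.get('nextToken'):
            break
        kwargs['nextToken'] = response['nextForwardToken']

# Save to a file
with open('training_logs.txt', 'w') as f:
    f.write("\n".join(lines) + "\n")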

This approach will save the logs to a file named 'training_logs.txt'.
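
If the text that logs() prints is all you need, another option is to capture standard output while the call runs. The sketch below is only an illustration and assumes that logs() writes through Python's sys.stdout, which I have not verified for every SDK version. The names capture_printed and fake_logs are hypothetical; fake_logs stands in for attached_estimator.logs.

```python
import contextlib
import io


def capture_printed(func, *args, **kwargs):
    """Run func and return everything it printed to stdout as one string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue()


def fake_logs():
    # Hypothetical stand-in for attached_estimator.logs(), which prints its output
    print("epoch 1 loss 0.9")
    print("epoch 2 loss 0.7")


captured_text = capture_printed(fake_logs)
```

With a real estimator the call would be capture_printed(attached_estimator.logs), and the returned string can then be written to a file in the usual way.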

Regarding your second point about accept_eula, it's not necessary for accessing or saving logs. The accept_eula setting applies to models that require accepting a license agreement, and it takes effect when the training job (or endpoint) is created, for example through the environment parameter of JumpStartEstimator, which sets environment variables for the training job. JumpStartEstimator.attach(job_name) is a class method that builds a new estimator from an existing job, so the estimator constructed in your first statement is never used. That is why your code works with just the last two lines.
Sources
Access CloudWatch logs for Debugger rules and training jobs - Amazon SageMaker AI
Logging and monitoring - SageMaker Studio Administration Best Practices

answered 10 months ago
  • The log stream name starts with the job name, but additional text follows it. For example, the log stream name might have the format "{job_name}/algo-1-..."
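
Building on the comment above, a small sketch of picking out the streams that belong to one job from a list of stream names. It assumes the "{job_name}/..." naming described in the comment; streams_for_job and the sample names are made up for illustration. Matching on the job name plus the trailing slash avoids also picking up a different job whose name merely starts with the same characters.

```python
def streams_for_job(stream_names, job_name):
    """Return the stream names that belong to job_name.

    Assumes the "{job_name}/..." naming convention (an assumption taken
    from the comment above, not from an API guarantee).
    """
    prefix = job_name + "/"
    return [name for name in stream_names if name.startswith(prefix)]


# Made-up stream names for illustration only
names = ["train-1/algo-1-111", "train-10/algo-1-222", "train-1/algo-2-333"]
matching = streams_for_job(names, "train-1")
print(matching)
```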
