
Accessing and Identifying SageMaker Canvas Prediction Outputs in Studio


In SageMaker Canvas > My Model > Prediction, when I run a prediction, the result is saved as a CSV file in S3, right? I’d like to use this saved file later in SageMaker Studio > Code Editor.

But since the file is created automatically, it seems I can't set the path or file name myself. So I have the following two questions:

  1. Is my understanding correct? Or is there actually a way to set the path or filename manually?
  2. Is there a reliable way to identify the prediction result CSV file? For example, could I detect the most recently created CSV in the inference folder of the relevant S3 bucket, or use something similar?
1 answer

Good morning,

Let me address your questions about SageMaker Canvas predictions:

  1. Regarding file paths and names:
  • Your understanding is correct: when you run predictions in SageMaker Canvas, the results are saved to S3 automatically, and you can't set the path or file name manually
  • The files are saved in your default SageMaker Canvas bucket with a structure similar to s3://{canvas-bucket-name}/inference/{model-id}/{timestamp}_batch_prediction.csv; the exact layout may differ in your setup, so confirm the prefix in the S3 console before relying on it
  2. For identifying and accessing prediction files:
  • Yes. Because you can't choose the file name, the practical approach is to query S3 for it:
    • List the objects under the model's inference prefix with boto3
    • Keep only the keys that end with "_batch_prediction.csv"
    • Pick the object with the most recent LastModified timestamp
  • If several predictions for the same model finish close together, check that the newest file is the one you expect
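The get_latest_prediction example in this answer takes a model ID. If you aren't sure which ID Canvas used, a minimal sketch like the one below lists the "folders" directly under inference/. It assumes the same inference/{model-id}/ layout described above, and list_model_prefixes is a hypothetical helper name, not part of any AWS API.

```python
def list_model_prefixes(bucket_name, s3_client=None):
    """Return the 'inference/<model-id>/' prefixes directly under 'inference/'.

    Assumes the inference/{model-id}/ layout described above; change the
    Prefix argument if your Canvas bucket is organized differently.
    """
    if s3_client is None:
        import boto3  # imported lazily so a stub client can be passed in instead

        s3_client = boto3.client('s3')

    paginator = s3_client.get_paginator('list_objects_v2')
    prefixes = []
    # With Delimiter='/', S3 groups keys one level below Prefix into
    # CommonPrefixes ("folders") instead of listing every object.
    for page in paginator.paginate(Bucket=bucket_name, Prefix='inference/', Delimiter='/'):
        for common_prefix in page.get('CommonPrefixes', []):
            prefixes.append(common_prefix['Prefix'])
    return prefixes
```

The model ID is the path segment between inference/ and the trailing slash, so prefix.split('/')[1] extracts it from each returned prefix.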

Here's a simple example of how you could find the most recent prediction file using Python in SageMaker Studio:

```python
import boto3
from operator import itemgetter

s3_client = boto3.client('s3')

def get_latest_prediction(bucket_name, model_id):
    """Return the key of the newest *_batch_prediction.csv under inference/{model_id}/, or None."""
    prefix = f'inference/{model_id}/'

    # list_objects_v2 returns at most 1,000 keys per call, so use a paginator
    # to walk every object under the prefix.
    paginator = s3_client.get_paginator('list_objects_v2')

    files = []
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # 'Contents' is absent when a page (or the whole prefix) has no objects
        for obj in page.get('Contents', []):
            if obj['Key'].endswith('_batch_prediction.csv'):
                files.append({'Key': obj['Key'], 'LastModified': obj['LastModified']})

    if not files:
        return None

    # Pick the object with the most recent LastModified timestamp
    latest_file = max(files, key=itemgetter('LastModified'))
    return latest_file['Key']
```

This function returns the S3 key of the most recent prediction file for a specific model in your SageMaker Canvas bucket, or None if no matching file exists.
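Once you have the key, you still need to read the file in Code Editor. A minimal sketch, assuming the prediction file is a UTF-8 CSV with a header row; load_prediction_rows is a hypothetical helper name, and it uses only the standard csv module plus boto3's get_object:

```python
import csv
import io


def load_prediction_rows(bucket_name, key, s3_client=None):
    """Download a prediction CSV from S3 and return its rows as a list of dicts."""
    if s3_client is None:
        import boto3  # imported lazily so a stub client can be passed in instead

        s3_client = boto3.client('s3')

    # get_object returns the object's contents as a streaming body
    response = s3_client.get_object(Bucket=bucket_name, Key=key)
    # 'utf-8-sig' decodes plain UTF-8 and also strips a leading byte-order mark
    text = response['Body'].read().decode('utf-8-sig')
    # Each row becomes a dict keyed by the CSV header columns
    return list(csv.DictReader(io.StringIO(text, newline='')))
```

Pass the key returned by get_latest_prediction as key. If you prefer a DataFrame, pandas.read_csv(io.BytesIO(data)) works on the same downloaded bytes.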

AWS
Answered 4 months ago

