
Accessing and Identifying SageMaker Canvas Prediction Outputs in Studio


In SageMaker Canvas > My Model > Prediction, when I run a prediction, the result is saved as a CSV file in S3, right? I’d like to use this saved file later in SageMaker Studio > Code Editor.

But since the file is created automatically, it seems I can't set the path or filename myself. This leads to the following two questions:

  1. Is my understanding correct? Or is there actually a way to set the path or filename manually?
  2. Is there a reliable way to identify the prediction result CSV file? For example, by detecting the most recently created CSV inside the inference folder of the relevant S3 bucket, or something similar?
Asked 4 months ago · Viewed 24 times
1 Answer

Good morning,

Let me address your questions about SageMaker Canvas predictions:

  1. Regarding file paths and names:
  • Your understanding is correct - when you make predictions in SageMaker Canvas, the results are automatically saved to S3 without allowing manual path/filename configuration
  • The files are saved in your default SageMaker Canvas bucket with a structure like: s3://{canvas-bucket-name}/inference/{model-id}/{timestamp}_batch_prediction.csv
  2. For identifying and accessing prediction files:
  • Yes, there are several reliable ways to locate the prediction files:
    • Using boto3 to list objects in the inference folder and sort by timestamp
    • Using the S3 API's LastModified attribute to find the most recent file
    • Filtering objects by the "_batch_prediction.csv" suffix
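The last two bullets can be combined with a cutoff time. This is often more reliable than simply taking "the most recent file", because an older prediction will never be picked up by mistake. Record the time just before you start the prediction in Canvas, then accept only files modified after it. The following is a minimal sketch, assuming the input is the list of `Contents` entries returned by boto3's `list_objects_v2` (dicts with `Key` and a timezone-aware `LastModified`) and that the `_batch_prediction.csv` suffix from the layout above applies. The function name `latest_prediction_since` is hypothetical.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional


def latest_prediction_since(
    objects: Iterable[dict],
    since: datetime,
    suffix: str = "_batch_prediction.csv",
) -> Optional[str]:
    """Return the key of the newest object ending in `suffix` whose
    LastModified is strictly later than `since`, or None if none match.

    Hypothetical helper: `objects` is assumed to have the shape of the
    'Contents' entries from list_objects_v2 ('Key' and 'LastModified').
    """
    # Keep only prediction CSVs written after the cutoff time
    candidates = [
        obj
        for obj in objects
        if obj["Key"].endswith(suffix) and obj["LastModified"] > since
    ]
    if not candidates:
        return None
    # Among the remaining files, pick the one S3 reports as newest
    return max(candidates, key=lambda obj: obj["LastModified"])["Key"]
```

A typical pattern is to capture `started = datetime.now(timezone.utc)` before launching the prediction, then pass the listed objects and `started` to the helper. `LastModified` is set by S3's clock rather than yours, so you may want to subtract a small margin from `started`.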

Here's a simple example of how you could find the most recent prediction file using Python in SageMaker Studio:

```python
import boto3

s3_client = boto3.client('s3')


def get_latest_prediction(bucket_name, model_id):
    """Return the key of the most recent batch prediction CSV for a model, or None."""
    prefix = f'inference/{model_id}/'

    # list_objects_v2 returns at most 1,000 keys per call, so use a paginator
    # to walk every page of results under the prefix
    paginator = s3_client.get_paginator('list_objects_v2')

    latest = None
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # 'Contents' is missing from a page when no objects match the prefix
        for obj in page.get('Contents', []):
            # Only consider batch prediction CSV files
            if not obj['Key'].endswith('_batch_prediction.csv'):
                continue
            # Keep the object with the newest LastModified timestamp
            if latest is None or obj['LastModified'] > latest['LastModified']:
                latest = obj

    return latest['Key'] if latest else None
```

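This code returns the S3 key of the most recent prediction file for a specific model in your SageMaker Canvas bucket, or `None` if no matching file exists. It assumes the `inference/{model-id}/` layout described above, so confirm the actual prefix in your bucket before relying on it.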
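Once you have the key, you can load the file in Code Editor. The following is a minimal sketch that uses only the standard library `csv` module. It assumes the prediction file is UTF-8 text with a header row. The helper name `read_prediction_rows` is hypothetical, and it takes the S3 client as a parameter so that it can be exercised without AWS access.

```python
import csv
import io
from typing import List


def read_prediction_rows(s3_client, bucket_name: str, key: str) -> List[dict]:
    """Download a prediction CSV from S3 and parse it into row dicts.

    `s3_client` is assumed to be a boto3 S3 client (or any object with a
    compatible get_object method). Column names come from the CSV header.
    """
    response = s3_client.get_object(Bucket=bucket_name, Key=key)
    # 'utf-8-sig' also strips a leading byte-order mark if one is present
    text = response["Body"].read().decode("utf-8-sig")
    return list(csv.DictReader(io.StringIO(text)))
```

For example, create a client with `s3 = boto3.client('s3')`, get the key from `get_latest_prediction(bucket_name, model_id)`, and then call `read_prediction_rows(s3, bucket_name, key)`. If you prefer pandas, you can instead pass `io.BytesIO(response['Body'].read())` to `pandas.read_csv`.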

AWS
Answered 4 months ago
