Accessing and Identifying SageMaker Canvas Prediction Outputs in Studio

In SageMaker Canvas > My Model > Prediction, when I run a prediction, the result is saved as a CSV file in S3, right? I’d like to use this saved file later in SageMaker Studio > Code Editor.

But since the file is created automatically, it seems I can't set the path or filename myself. Given that, I have the following two questions.

  1. Is my understanding correct? Or is there actually a way to set the path or filename manually?
  2. Is there a reliable way to identify the prediction result CSV file? For example, by detecting the most recently created CSV inside the inference folder of the relevant S3 bucket, or something similar?
1 Answer

Good morning,

Let me address your questions about SageMaker Canvas predictions:

  1. Regarding file paths and names:
  • Your understanding is correct - when you make predictions in SageMaker Canvas, the results are automatically saved to S3 without allowing manual path/filename configuration
  • The files are saved in your default SageMaker Canvas bucket with a structure like: s3://{canvas-bucket-name}/inference/{model-id}/{timestamp}_batch_prediction.csv
  2. For identifying and accessing prediction files:
  • Yes, there are several reliable ways to locate the prediction files:
    • Using boto3 to list objects in the inference folder and sort by timestamp
    • Using the S3 API's LastModified attribute to find the most recent file
    • Filtering objects by the "_batch_prediction.csv" suffix

Here's a simple example of how you could find the most recent prediction file using Python in SageMaker Studio:

import boto3

s3_client = boto3.client('s3')

def get_latest_prediction(bucket_name, model_id):
    """Return the key of the most recently modified prediction CSV, or None."""
    prefix = f'inference/{model_id}/'

    # list_objects_v2 returns at most 1,000 keys per call, so use a paginator
    # to make sure every object under the prefix is inspected
    paginator = s3_client.get_paginator('list_objects_v2')

    latest = None
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # 'Contents' is absent when a page has no matching objects
        for obj in page.get('Contents', []):
            # Keep only the prediction CSV files
            if not obj['Key'].endswith('_batch_prediction.csv'):
                continue
            # Track the object with the newest LastModified timestamp
            if latest is None or obj['LastModified'] > latest['LastModified']:
                latest = obj

    return latest['Key'] if latest else None

This code will help you locate the most recent prediction file for a specific model in your SageMaker Canvas bucket.
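
Once you have the key, you still need to read the file from the Code Editor. The following is a minimal sketch of one way to do that, not the only one. The helper name load_prediction_rows is hypothetical, and it assumes the CSV is UTF-8 encoded with a header row, which you should confirm against one of your own output files. The S3 client is passed in as an argument so the helper can be exercised without AWS credentials.

```python
import csv
import io


def load_prediction_rows(s3_client, bucket_name, key):
    """Download one CSV object from S3 and return its rows as a list of dicts.

    Assumptions (not confirmed by the answer above): the file is UTF-8
    encoded and its first row holds the column names.
    """
    # get_object returns a dict whose 'Body' is a readable stream of bytes
    response = s3_client.get_object(Bucket=bucket_name, Key=key)
    text = response['Body'].read().decode('utf-8')

    # DictReader uses the first CSV row as the column names
    return list(csv.DictReader(io.StringIO(text)))
```

In Studio this could be called as load_prediction_rows(s3_client, bucket_name, get_latest_prediction(bucket_name, model_id)), with bucket_name and model_id set to your own values. Check for a None key first, since get_latest_prediction returns None when no prediction file is found.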

AWS
answered 4 months ago
