
Accessing and Identifying SageMaker Canvas Prediction Outputs in Studio


In SageMaker Canvas > My Model > Prediction, when I run a prediction, the result is saved as a CSV file in S3, right? I’d like to use this saved file later in SageMaker Studio > Code Editor.

But since the file is created automatically, it seems I can't set the path or filename myself. I have the following two questions.

  1. Is my understanding correct? Or is there actually a way to set the path or filename manually?
  2. Is there a reliable way to identify the prediction result CSV file? For example, by detecting the most recently created CSV inside the inference folder of the relevant S3 bucket, or something similar?
asked 4 months ago · 24 views
1 Answer

Good morning,

Let me address your questions about SageMaker Canvas predictions:

  1. Regarding file paths and names:
  • Your understanding is correct - when you make predictions in SageMaker Canvas, the results are automatically saved to S3 without allowing manual path/filename configuration
  • The files are saved in your default SageMaker Canvas bucket with a structure like: s3://{canvas-bucket-name}/inference/{model-id}/{timestamp}_batch_prediction.csv
  2. For identifying and accessing prediction files:
  • Yes, there are several reliable ways to locate the prediction files:
    • Using boto3 to list objects in the inference folder and sort by timestamp
    • Using the S3 API's LastModified attribute to find the most recent file
    • Filtering objects by the "_batch_prediction.csv" suffix

Here's a simple example of how you could find the most recent prediction file using Python in SageMaker Studio:

import boto3

s3_client = boto3.client('s3')

def get_latest_prediction(bucket_name, model_id):
    """Return the key of the newest prediction CSV for a model, or None."""
    prefix = f'inference/{model_id}/'

    # list_objects_v2 returns at most 1,000 keys per call, so use a
    # paginator to walk every page under the prefix
    paginator = s3_client.get_paginator('list_objects_v2')

    latest = None
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # 'Contents' is absent when a page has no matching objects
        for obj in page.get('Contents', []):
            # Keep only prediction CSVs
            if not obj['Key'].endswith('_batch_prediction.csv'):
                continue
            # Track the object with the newest LastModified timestamp
            if latest is None or obj['LastModified'] > latest['LastModified']:
                latest = obj

    return latest['Key'] if latest else None

The function returns the S3 key of the most recent prediction file for the given model in your SageMaker Canvas bucket, or None if no matching file exists.
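
Once you have the key, you will probably want to open the file from the Code Editor. The following is a minimal sketch of one way to do that, not an official Canvas API: the helper name read_prediction_rows and its max_rows parameter are made up for illustration, and it assumes the CSV is UTF-8 encoded with a header row. It takes the S3 client as an argument so it can be reused with the client created above.

```python
import csv
import io


def read_prediction_rows(s3_client, bucket_name, key, max_rows=5):
    """Fetch a CSV object from S3 and return its first rows as dicts.

    Hypothetical helper: assumes a UTF-8 CSV with a header row.
    """
    response = s3_client.get_object(Bucket=bucket_name, Key=key)
    # 'Body' is a stream; read() returns the raw bytes of the object
    text = response["Body"].read().decode("utf-8")

    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for index, row in enumerate(reader):
        if index >= max_rows:
            break
        rows.append(dict(row))
    return rows


# Example usage (bucket name and model ID are placeholders):
#   key = get_latest_prediction("my-canvas-bucket", "my-model-id")
#   if key is not None:
#       for row in read_prediction_rows(s3_client, "my-canvas-bucket", key):
#           print(row)
```

This reads the whole object into memory, which is fine for a quick look at a modest file. For large prediction outputs, downloading the object to disk first with the client's download_file method may be a better fit.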

AWS
answered 4 months ago
