
Accessing and Identifying SageMaker Canvas Prediction Outputs in Studio


In SageMaker Canvas > My Model > Prediction, when I run a prediction, the result is saved as a CSV file in S3, right? I’d like to use this saved file later in SageMaker Studio > Code Editor.

But since the file is created automatically, it seems I can't set the path or filename myself. I have the following two questions.

  1. Is my understanding correct? Or is there actually a way to set the path or filename manually?
  2. Is there a reliable way to identify the prediction result CSV file? For example, by detecting the most recently created CSV inside the inference folder of the relevant S3 bucket, or something similar?
asked 4 months ago · 24 views
1 Answer

Good morning,

Let me address your questions about SageMaker Canvas predictions:

  1. Regarding file paths and names:
  • Your understanding is correct - when you make predictions in SageMaker Canvas, the results are automatically saved to S3 without allowing manual path/filename configuration
  • The files are saved in your default SageMaker Canvas bucket with a structure like: s3://{canvas-bucket-name}/inference/{model-id}/{timestamp}_batch_prediction.csv (confirm the exact prefix in the S3 console, since the layout can differ between setups)
  2. For identifying and accessing prediction files:
  • Yes, there are several reliable ways to locate the prediction files:
    • Using boto3 to list objects in the inference folder and sort by timestamp
    • Using the S3 API's LastModified attribute to find the most recent file
    • Filtering objects by the "_batch_prediction.csv" suffix

Here's a simple example of how you could find the most recent prediction file using Python in SageMaker Studio:

import boto3
from operator import itemgetter

s3_client = boto3.client('s3')

def get_latest_prediction(bucket_name, model_id):
    prefix = f'inference/{model_id}/'

    # Paginate, because a single list_objects_v2 call returns at most 1,000 keys
    paginator = s3_client.get_paginator('list_objects_v2')
    pages = paginator.paginate(Bucket=bucket_name, Prefix=prefix)

    # Keep only the prediction CSV files under this model's prefix
    files = [
        {'Key': obj['Key'], 'LastModified': obj['LastModified']}
        for page in pages
        for obj in page.get('Contents', [])
        if obj['Key'].endswith('_batch_prediction.csv')
    ]

    if not files:
        return None

    # Return the key of the most recently modified file
    latest_file = max(files, key=itemgetter('LastModified'))
    return latest_file['Key']

This function returns the S3 key of the most recent prediction file for a specific model in your SageMaker Canvas bucket, or None if no matching file is found.
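
Once you have the key, you still need to read the file in Code Editor. Below is a minimal sketch of one way to do that, assuming the CSV is UTF-8 encoded and has a header row. The function name load_prediction_rows is hypothetical, and the client is passed in as a parameter so the function can be tried without touching S3.

```python
import csv
import io


def load_prediction_rows(s3_client, bucket_name, key):
    """Download one CSV object from S3 and return its rows as a list of dicts.

    Assumes UTF-8 text with a header row; adjust the decoding if your file differs.
    """
    response = s3_client.get_object(Bucket=bucket_name, Key=key)
    # 'Body' is a streaming object; read() returns the raw bytes of the file
    text = response['Body'].read().decode('utf-8')
    # DictReader uses the first line as column names
    return list(csv.DictReader(io.StringIO(text)))
```

With the s3_client and get_latest_prediction defined above, you could call load_prediction_rows(s3_client, bucket_name, key) after checking that the key returned by get_latest_prediction is not None. Every value comes back as a string, so convert numeric columns yourself if you need them as numbers.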

AWS
answered 3 months ago
