SageMaker Endpoint Keeps on Returning Error Code 500


Hello Everyone,

I am currently struggling with getting a PyTorch ML model running as a SageMaker endpoint. Project Introduction:

  • Deploy the sam-hq model and run inference through SageMaker instead of on a local machine.
  • sam-hq has its own module that is distributed via git clone.
  • Since running git clone on the SageMaker instance is not straightforward, the module is first downloaded to a local machine, uploaded to S3, and then downloaded onto the SageMaker instance with boto3.
  • The module is then imported in inference.py with: from sam_hq.segment_anything import sam_model_registry, SamAutomaticMaskGenerator, SamPredictor.

Current workflow (that is not working):

  • One of the lightweight sam-hq models has, presumably, been successfully deployed as an endpoint on SageMaker. The folder structure compressed into the .tar.gz is as follows (model and code are folders; see the packaging sketch after this list):
    model
    ├── model.pth
    └── code
        └── inference.py
  • What is shown in the sam-hq demo works in SageMaker Studio on the same instance type.
  • The same workflow, with the inference part moved to the endpoint, does not work.
  • The input format has been confirmed to match what the model requires: returning the input back to Studio (right before invoking the endpoint) and running inference there works fine.
  • The input data is encoded for transmission to the endpoint. I have also tried using a serializer, but the result is the same.
  • The endpoint keeps returning the generic error code 500 - ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from primary and could not load the entire response body.
  • I have referred to this BERT sample for most of my implementation.
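
For reference, this is roughly how I package and upload the archive (a sketch; the local paths and the S3 prefix are placeholders for my actual ones):

import tarfile
import sagemaker

# Local layout (placeholders):
#   model/model.pth          <- sam-hq vit_tiny checkpoint
#   model/code/inference.py
with tarfile.open('model.tar.gz', 'w:gz') as tar:
    tar.add('model', arcname='model')  # keep the top-level model/ folder in the archive

# Upload to the default SageMaker bucket so model_data can point at it
model_data = sagemaker.Session().upload_data(path='model.tar.gz', key_prefix='sam-hq')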

How I deployed the model:

#1. Create a model object for deployment

import sagemaker
from sagemaker.pytorch import PyTorchModel

sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()

model_data = (path to model in S3 bucket)  # Path to the saved PyTorch model in S3
entry_point = 'inference.py'  # Filename of the inference script in S3 -> why does it refer to the one here in Studio???

pytorch_model = PyTorchModel(model_data=model_data, role=role, entry_point=entry_point,
                             framework_version='1.10.0', py_version='py38') # Need to specify the PyTorch version and Python3 version respectively

#2. Create a new predictor object, along with deploying the model(endpoint)

predictor = pytorch_model.deploy(instance_type='ml.g4dn.xlarge', initial_instance_count=1)

My inference.py code:

import pip
pip.main(['install', 'opencv-python']) #needs libGL
pip.main(['install', 'timm', '--no-cache-dir'])
pip.main(['install', 'sagemaker'])
#OK; perhaps move to requirements.txt later!!


import torch
import torchvision.transforms as transforms
from PIL import Image
import json
from io import BytesIO
import cv2
import numpy as np
#OK

# To import the sam-hq package
import boto3
import zipfile
import io

s3 = boto3.client('s3')
bucket_name = (bucket name)
zip_file_key = 'sam_hq.zip'

# Download the ZIP file from S3
response = s3.get_object(Bucket=bucket_name, Key=zip_file_key)
zip_data = response['Body'].read()

# Extract the contents of the ZIP file
with zipfile.ZipFile(io.BytesIO(zip_data)) as zip_ref:
    zip_ref.extractall('')  # Specify your desired extraction directory

#import sam_hq
from sam_hq.segment_anything import sam_model_registry, SamAutomaticMaskGenerator, SamPredictor

import os

# Load the model
"""
model_fn: This function is a required function in Amazon SageMaker that loads the model for inference. It takes the model_dir parameter, which specifies the directory where the model artifacts are stored.
model_path: This variable holds the path to the model file within the model_dir. In this case, the model file is located at 'model/model.pth' relative to the model_dir.
sam: This variable represents the loaded sam model.
return sam: The loaded model, sam, is returned by the model_fn function.
"""
def model_fn(model_dir):

    #sam = model
    #['handler_service.py', 'code', 'model', 'MAR-INF', '__pycache__', 'sam_hq'] = model_dir/
    #['._.DS_Store', '.DS_Store', 'model.pth', '._model.pth', 'code'] = model_dir/model
    
    model_path = os.path.join(model_dir, 'model/model.pth') #only uploaded vit_tiny for now
    
    sam = sam_model_registry["vit_tiny"](checkpoint=model_path)
    
    print("model_fn() done")
    
    return sam

# Perform inference
"""
predict_fn: This function applies the loaded model, sam, to the incoming request represented by input_image.
mask_generator: This variable represents an instance of the SamAutomaticMaskGenerator class.
masks = mask_generator.generate(input_image): This line applies the mask_generator to the input_image to generate masks.

return masks: returns the mask list for annotation on the client side.

Copied from sam-hq demo.
"""
def predict_fn(input_image, model):
    """
    Apply the model to the incoming request
    """

    sam = model  # the sam model object returned by model_fn
    device = torch.device("cuda")
    sam.to(device=device)  # gpu, cuda, cpu
    sam.eval()
    
    points_per_side = 48
    pred_iou_thresh = 0.65              # ↑ fewer false positives, fewer masks; breaks objects up more, so more area is covered by masks
    stability_score_thresh = 0.65       # ↑ more stable masks, fewer masks being generated
    crop_n_layers = 1                   # ↑ smaller cropped regions around the object, improved accuracy, fewer masks generated overall
    crop_n_points_downscale_factor = 2  # ↑ larger cropped regions, decreased accuracy, more masks generated overall
    min_mask_region_area = 100          # ↑ fewer false positives, fewer masks

    mask_generator = SamAutomaticMaskGenerator(
        model=sam,
        points_per_side=points_per_side,
        points_per_batch=48,
        pred_iou_thresh=pred_iou_thresh,
        stability_score_thresh=stability_score_thresh,
        crop_n_layers=crop_n_layers,
        crop_n_points_downscale_factor=crop_n_points_downscale_factor,
        min_mask_region_area=min_mask_region_area,
        #output_mode="binary_mask",
        # uncompressed_rle
    )
    
    #OK till here
    
    masks = mask_generator.generate(input_image) # not working, stuck here; error code 500
    
    print("predict_fn() done")
    return masks

"""
input_fn: This function is an input function used in Amazon SageMaker to process the request body and content type.
request_body: The parameter represents the body of the incoming request, which is in bytes format (always encoded in transmission).
request_content_type: This parameter specifies the content type of the incoming request.
image_array = np.frombuffer(request_body, dtype=np.uint8): This line converts the request body from bytes to a NumPy array using np.frombuffer().
image = cv2.imdecode(image_array, cv2.IMREAD_COLOR): This line decodes the image array using cv2.imdecode(), in color format (cv2.IMREAD_COLOR).

return image: returns the decoded image.
"""
def input_fn(request_body, request_content_type):
    if request_content_type == 'image/jpeg':
        # Convert the request body from bytes to numpy array
        image_array = np.frombuffer(request_body, dtype=np.uint8) #request_body == image_bytes == image_response_data
        # Decode the image using cv2.imdecode()
        image = cv2.imdecode(image_array, cv2.IMREAD_COLOR)
        # Return the decoded image
        
        print("input_fn() done")
        return image
    else:
        raise ValueError(f'Unsupported content type: {request_content_type}')


"""
output_fn: This function is an output function used in Amazon SageMaker to prepare the prediction output for serialization and response to the client.
prediction: The parameter represents the prediction output generated by the model.
response_content_type: This parameter specifies the content type for the response.
response: This variable holds the prediction output that will be returned to the client.

return response: returns the response variable as the prepared output for the response to the client.
"""
def output_fn(prediction, response_content_type):
    """
    Serialize and prepare the prediction output
    """

    if response_content_type == "application/json":
        response = (prediction.tobytes())
    else:
        response = (prediction.tobytes())
        
    print("output_fn() done")    
    return response
  • As mentioned above, I have made sure that input_image is in the right format by returning it to Studio and running inference against the model there.
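
On the output side, the JSON-based serialization I have been considering instead of prediction.tobytes() in output_fn looks roughly like this (a sketch; it assumes the usual SAM mask dict keys such as segmentation, area, bbox, predicted_iou and stability_score):

import json
import numpy as np

def masks_to_json(prediction):
    # The mask dicts hold numpy arrays and numpy scalars, which json cannot
    # serialize directly, so convert everything to plain Python types first.
    serializable = []
    for m in prediction:
        serializable.append({
            'segmentation': np.asarray(m['segmentation']).astype(np.uint8).tolist(),
            'area': int(m['area']),
            'bbox': [int(v) for v in m['bbox']],
            'predicted_iou': float(m['predicted_iou']),
            'stability_score': float(m['stability_score']),
        })
    return json.dumps(serializable)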

This is how I infer the endpoint from Studio:

# 4. Infer to endpoint

import boto3
import cv2
import io
import sagemaker

sm = sagemaker.Session().sagemaker_runtime_client

# Set the endpoint name
endpoint_name = predictor.endpoint_name

# Convert the image to JPEG format
_, encoded_image = cv2.imencode(".jpeg", image)
image_bytes = encoded_image.tobytes() # Need to encode for transmission (both ways)

# Create a Boto3 client for Amazon SageMaker Runtime
sagemaker_client = boto3.client("sagemaker-runtime")

# Set the content type header
content_type = "image/jpeg"

print("Waiting for a response from the endpoint: " + endpoint_name)

# Send the request to the endpoint
response = sm.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType=content_type,
    Body=image_bytes
)

# Check the response status code
if response["ResponseMetadata"]["HTTPStatusCode"] == 200:
    # Successful request
    # Process the response content as needed
    mask = response["Body"].read()
else:
    # Failed request
    print("Request failed with status code:", response["ResponseMetadata"]["HTTPStatusCode"])

print("Result returned and held by variable")

# using a more powerful instance also does not work
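
For completeness, the serializer-based variant mentioned above looks roughly like this (a sketch; it assumes predictor is the object returned by pytorch_model.deploy() and that output_fn returns JSON):

from sagemaker.serializers import IdentitySerializer
from sagemaker.deserializers import JSONDeserializer

# Send the raw JPEG bytes as-is and expect a JSON response from output_fn
predictor.serializer = IdentitySerializer(content_type='image/jpeg')
predictor.deserializer = JSONDeserializer()

result = predictor.predict(image_bytes)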

Looking forward to your assistance and responses!
Cheers.

Andrew
asked 6 months ago · 431 views
1 Answer

Hi,

You should follow this guidance for troubleshooting SageMaker internal errors (code 500): https://repost.aws/knowledge-center/sagemaker-http-500-internal-server-error

Best,

Didier

answered 6 months ago
  • Hi Didier,

    Appreciate your response.

    Missing permissions do not seem to be the main issue, as I have full access to EC2, SageMaker, and CloudWatch. Also, when I reviewed the CloudWatch logs, no messages had been output, including the print statements in my inference.py code. I was only able to find these two messages in the log events:

    • WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
    • Model server started.

    There's also no data available in the operational and invocation metrics pages in my endpoint.
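
    For reference, this is roughly how I pulled the log events (a sketch; it assumes the default /aws/sagemaker/Endpoints/<endpoint-name> log group):

    import boto3

    logs = boto3.client('logs')
    log_group = '/aws/sagemaker/Endpoints/' + endpoint_name  # same endpoint_name as above

    # Print every message from every log stream of the endpoint
    for stream in logs.describe_log_streams(logGroupName=log_group)['logStreams']:
        events = logs.get_log_events(logGroupName=log_group,
                                     logStreamName=stream['logStreamName'])
        for event in events['events']:
            print(event['message'])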

    Many thanks, Andrew
