SageMaker Endpoint Keeps on Returning Error Code 500
Hello Everyone,
I am currently struggling to get a PyTorch ML model running as a SageMaker endpoint.
Project introduction:
- To deploy the sam-hq model and infer through SageMaker instead of running on a local machine.
- sam-hq has its own module that is distributed via git clone.
- Since running git clone on SageMaker is impractical, the module is first downloaded to a local system, then uploaded to S3 and downloaded onto the SageMaker instance with boto3.
- The module is then imported by:
from sam_hq.segment_anything import sam_model_registry, SamAutomaticMaskGenerator, SamPredictor
in inference.py.
Current workflow (that is not working):
- One of the lightweight sam-hq models has presumably been deployed successfully as an endpoint on SageMaker. The folder compressed into the .tar.gz has the following structure:
model
|—model.pth
___|—code
______|—inference.py
where model and code are folders.
- What is shown in the sam-hq demo works in SageMaker Studio on the same instance type.
- The same workflow, with the inference part on the endpoint does not seem to be working.
- The input format has been confirmed to match what the model requires: the input was returned to Studio right before the inference call on the endpoint and used to run inference there, which works.
- The input data is encoded for transmission to the endpoint. I also tried using a serializer, but the result is the same.
- The endpoint has been returning the generic error code 500 - ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from primary and could not load the entire response body.
- I have referred to this BERT sample for most of my implementation
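For reference, the folder structure described above can be packaged with Python's tarfile module. This is only a sketch under assumptions: the helper names build_model_archive and list_archive are hypothetical, and it follows the layout documented for the SageMaker PyTorch container (model file at the archive root, scripts under code/) rather than the nested model folder shown in the list.

```python
import tarfile
from pathlib import Path


def build_model_archive(model_file, inference_script, archive_path="model.tar.gz"):
    """Pack a checkpoint and an inference script into a .tar.gz archive.

    Resulting layout (file names depend on the inputs):
        model.pth
        code/inference.py
    """
    model_file = Path(model_file)
    inference_script = Path(inference_script)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname controls the path stored inside the archive
        tar.add(model_file, arcname=model_file.name)
        tar.add(inference_script, arcname=f"code/{inference_script.name}")
    return archive_path


def list_archive(archive_path):
    """Return the member names stored in the archive, for a quick sanity check."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()
```

Listing the archive before uploading it also makes stray files (for example macOS .DS_Store entries) easy to spot.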
How I deployed the model:
#1. Create a model object for deployment
import sagemaker
from sagemaker.pytorch import PyTorchModel
sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()
model_data = (path to model in S3 bucket) # Path to the saved PyTorch model in S3
entry_point = 'inference.py' # Filename of the inference script in S3 -> why refers to the one here in the Studio???
pytorch_model = PyTorchModel(model_data=model_data, role=role, entry_point=entry_point,
                             framework_version='1.10.0', py_version='py38') # Need to specify the PyTorch version and Python3 version respectively
#2. Create a new predictor object, along with deploying the model(endpoint)
predictor = pytorch_model.deploy(instance_type='ml.g4dn.xlarge', initial_instance_count=1)
My inference.py code:
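import subprocess
import sys
# Install through the interpreter's own pip; calling pip.main() directly is not a supported API
subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'opencv-python']) #needs libGL
subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'timm', '--no-cache-dir'])
subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'sagemaker'])
#OK; perhaps move to requirements.txt later!!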
import torch
import torchvision.transforms as transforms
from PIL import Image
import json
from io import BytesIO
import cv2
import numpy as np
#OK
# To import the sam-hq package
import boto3
import zipfile
import io
s3 = boto3.client('s3')
bucket_name = (bucket name)
zip_file_key = 'sam_hq.zip'
# Download the ZIP file from S3
response = s3.get_object(Bucket=bucket_name, Key=zip_file_key)
zip_data = response['Body'].read()
# Extract the contents of the ZIP file
with zipfile.ZipFile(io.BytesIO(zip_data)) as zip_ref:
    zip_ref.extractall('') # Specify your desired extraction directory
#import sam_hq
from sam_hq.segment_anything import sam_model_registry, SamAutomaticMaskGenerator, SamPredictor
import os
# Load the model
"""
model_fn: This function is a required function in Amazon SageMaker that loads the model for inference. It takes the model_dir parameter, which specifies the directory where the model artifacts are stored.
model_path: This variable holds the path to the model file within the model_dir. In this case, the model file is located at 'model/model.pth' relative to the model_dir.
sam: This variable represents the loaded sam model.
return sam: The loaded model, sam, is returned by the model_fn function.
"""
def model_fn(model_dir):
    #sam = model
    #['handler_service.py', 'code', 'model', 'MAR-INF', '__pycache__', 'sam_hq'] = model_dir/
    #['._.DS_Store', '.DS_Store', 'model.pth', '._model.pth', 'code'] = model_dir/model
    model_path = os.path.join(model_dir, 'model/model.pth') #only uploaded vit_tiny for now
    sam = sam_model_registry["vit_tiny"](checkpoint=model_path)
    print("model_fn() done")
    return sam
# Perform inference
"""
predict_fn: This function applies the loaded model, sam, to the incoming request represented by input_image.
mask_generator: This variable represents an instance of the SamAutomaticMaskGenerator class.
masks = mask_generator.generate(input_image): This line applies the mask_generator to the input_image to generate masks.
return masks: returns the mask list for annotation on the client side.
Copied from sam-hq demo.
"""
def predict_fn(input_image, model):
    """
    Apply model to the incoming request
    """
    device = torch.device("cuda")
    sam.to(device = device) #gpu, cuda, cpu
    sam.eval()
    points_per_side = 48
    pred_iou_thresh = 0.65 # ↑ fewer false positives, fewer masks # ↑ break more, more area have mask
    stability_score_thresh = 0.65 # ↑ more stable masks, fewer masks being generated.
    crop_n_layers = 1 # ↑ smaller cropped region around the object, improve accuracy ,fewer masks being generated overall
    crop_n_points_downscale_factor = 2 # ↑ larger cropped regions, decrease accuracy, more masks being generated overall.
    min_mask_region_area = 100 # ↑ fewer false positives, fewer masks
    mask_generator = SamAutomaticMaskGenerator(
        model=sam,
        points_per_side=points_per_side,
        points_per_batch=48,
        pred_iou_thresh=pred_iou_thresh,
        stability_score_thresh=stability_score_thresh,
        crop_n_layers=crop_n_layers,
        crop_n_points_downscale_factor=crop_n_points_downscale_factor,
        min_mask_region_area=min_mask_region_area,
        #output_mode="binary_mask",
        # uncompressed_rle
    )
    #OK till here
    masks = mask_generator.generate(input_image) # not working, stuck here; error code 500
    print("predict_fn() done")
    return masks
"""
input_fn: This function is an input function used in Amazon SageMaker to process the request body and content type.
request_body: The parameter represents the body of the incoming request, which is in bytes format (always encoded in transmission).
request_content_type: This parameter specifies the content type of the incoming request.
image_array = np.frombuffer(request_body, dtype=np.uint8): This line converts the request body from bytes to a NumPy array using np.frombuffer().
image = cv2.imdecode(image_array, cv2.IMREAD_COLOR): This line decodes the image array using cv2.imdecode(), in color format (cv2.IMREAD_COLOR).
return image: returns the decoded image.
"""
def input_fn(request_body, request_content_type):
    if request_content_type == 'image/jpeg':
        # Convert the request body from bytes to numpy array
        image_array = np.frombuffer(request_body, dtype=np.uint8) #request_body == image_bytes == image_response_data
        # Decode the image using cv2.imdecode()
        image = cv2.imdecode(image_array, cv2.IMREAD_COLOR)
        # Return the decoded image
        print("input_fn() done")
        return image
    else:
        raise ValueError(f'Unsupported content type: {request_content_type}')
"""
output_fn: This function is an output function used in Amazon SageMaker to prepare the prediction output for serialization and response to the client.
prediction: The parameter represents the prediction output generated by the model.
response_content_type: This parameter specifies the content type for the response.
response: This variable holds prediction output to the response variable.
return response: returns the response variable as the prepared output for the response to the client.
"""
def output_fn(prediction, response_content_type):
    """
    Serialize and prepare the prediction output
    """
    if response_content_type == "application/json":
        response = (prediction.tobytes())
    else:
        response = (prediction.tobytes())
    print("output_fn() done")
    return response
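A note on output_fn: SamAutomaticMaskGenerator.generate returns a list of dictionaries (one record per mask), not a single array, so calling .tobytes() on that list would likely raise an AttributeError. Below is a minimal sketch of a JSON-based alternative, assuming the response should be application/json and that array-like values expose a .tolist() method, as NumPy arrays do. The helper name _to_jsonable is hypothetical.

```python
import json


def _to_jsonable(value):
    """Fallback for json.dumps: convert array-like values to plain lists."""
    if hasattr(value, "tolist"):
        return value.tolist()
    raise TypeError(f"Cannot serialize value of type {type(value).__name__}")


def output_fn(prediction, response_content_type):
    """Serialize a list of mask records to a JSON string."""
    if response_content_type != "application/json":
        raise ValueError(f"Unsupported response content type: {response_content_type}")
    # default= is only called for objects json cannot encode natively
    return json.dumps(prediction, default=_to_jsonable)
```

Full-resolution binary masks become large once expanded into nested lists, so a more compact output mode (such as the RLE option commented out in the mask generator call) may be worth trying if the response grows too big.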
- I have made sure that input_image is in the right format, as mentioned before, by returning it to Studio and running inference on the model there.
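One detail in predict_fn is worth checking: the loaded model arrives in the parameter named model, but the body refers to sam, which is not defined in that scope unless a module-level sam exists. The sketch below reproduces the pattern with a stand-in object. FakeModel and both function names are hypothetical and only illustrate the name lookup, not the real sam-hq calls.

```python
class FakeModel:
    """Stand-in for the object returned by model_fn (hypothetical)."""

    def eval(self):
        return self

    def generate(self, image):
        return [{"area": len(image)}]


def predict_fn_with_undefined_name(input_image, model):
    # Mirrors the listing: `sam` is never assigned in this scope,
    # so this line raises NameError when no global `sam` exists.
    sam.eval()
    return sam.generate(input_image)


def predict_fn_using_parameter(input_image, model):
    # Use the object that is passed in from model_fn.
    model.eval()
    return model.generate(input_image)
```

An uncaught exception like this inside predict_fn would plausibly surface to the client as a generic 500, which matches the symptom described.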
This is how I invoke the endpoint from Studio:
# 4. Infer to endpoint
import boto3
import cv2
import io
import sagemaker
sm = sagemaker.Session().sagemaker_runtime_client
# Set the endpoint name and region
endpoint_name = predictor.endpoint_name
# Convert the image to JPEG format
_, encoded_image = cv2.imencode(".jpeg", image)
image_bytes = encoded_image.tobytes() # Need to encode for transmission (both ways)
# Create a Boto3 client for Amazon SageMaker Runtime
sagemaker_client = boto3.client("sagemaker-runtime")
# Set the content type header
content_type = "image/jpeg"
print("Waiting for a response from the endpoint: " + endpoint_name)
# Send the request to the endpoint
response = sm.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType=content_type,
    Body=image_bytes
)
# Check the response status code
if response["ResponseMetadata"]["HTTPStatusCode"] == 200:
    # Successful request
    # Process the response content as needed
    mask = response["Body"].read()
else:
    # Failed request
    print("Request failed with status code:", response["ResponseMetadata"]["HTTPStatusCode"])
print("Result returned and held by variable")
# using a more powerful instance also does not work
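Note that boto3 raises an exception for a 500 response instead of returning it, so the else branch in the status check is never reached. Here is a sketch of pulling the details out of that exception, assuming it is a botocore ClientError whose response dictionary carries the OriginalStatusCode, OriginalMessage and LogStreamArn fields documented for ModelError. The helper name describe_invoke_failure is hypothetical.

```python
def describe_invoke_failure(error):
    """Collect the diagnostic fields attached to a failed InvokeEndpoint call.

    `error` is expected to be a botocore ClientError; missing fields come back as None.
    """
    response = getattr(error, "response", None) or {}
    return {
        "code": response.get("Error", {}).get("Code"),
        "original_status": response.get("OriginalStatusCode"),
        "original_message": response.get("OriginalMessage"),
        "log_stream_arn": response.get("LogStreamArn"),
    }


# Usage sketch (needs boto3 and AWS credentials, so it is left as a comment):
#
# from botocore.exceptions import ClientError
# try:
#     response = sm.invoke_endpoint(EndpointName=endpoint_name,
#                                   ContentType="image/jpeg", Body=image_bytes)
# except ClientError as error:
#     print(describe_invoke_failure(error))
```

When present, the log stream ARN points at the CloudWatch stream that should hold the container's traceback.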
Looking forward to your assistance and responses!
Cheers.
asked 6 months ago, 431 views
1 Answer
Hi,
You should follow this guidance for troubleshooting SageMaker internal errors (code 500): https://repost.aws/knowledge-center/sagemaker-http-500-internal-server-error
Best,
Didier
Hi Didier,
Appreciate your response.
Missing permissions do not seem to be the main issue as I have full access to EC2, SageMaker and CloudWatch. Also, when I went to review CloudWatch logs, no messages seem to be output, including the print statements in my inference.py code. I was only able to find these two messages in the log events:
There's also no data available in the operational and invocation metrics pages in my endpoint.
Many thanks, Andrew
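For reference, endpoint container output normally lands in the CloudWatch log group /aws/sagemaker/Endpoints/<endpoint-name>. If the console view looks empty, the events can also be read programmatically. Below is a sketch using the filter_log_events call of boto3's CloudWatch Logs client. The helper name endpoint_log_messages is hypothetical, and the client is passed in so the function can be tried against a stub.

```python
def endpoint_log_messages(logs_client, endpoint_name, limit=50):
    """Return up to `limit` log messages from the first page of an endpoint's log group.

    `logs_client` is expected to behave like boto3.client("logs").
    """
    log_group = f"/aws/sagemaker/Endpoints/{endpoint_name}"
    response = logs_client.filter_log_events(logGroupName=log_group, limit=limit)
    return [event["message"] for event in response.get("events", [])]


# Usage sketch (needs boto3 and AWS credentials, so it is left as a comment):
#
# import boto3
# for line in endpoint_log_messages(boto3.client("logs"), predictor.endpoint_name):
#     print(line)
```

Only the first page of results is read here. A longer history would need pagination with the nextToken field of the response.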