An error occurred (ValidationException) when calling the CreateTransformJob operation: Requested instance type cannot work with all containers of the model


Hello, I'd appreciate your help with this error:

botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the CreateTransformJob operation: Requested instance type cannot work with all containers of the model: ml.m5.4xlarge
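For context, a ValidationException like this can be caught up front by asking the SageMaker SDK which instance types the JumpStart model actually supports. A minimal sketch, assuming the same jumpstart_model_name and region_name variables used in the code below:

# Sketch: list the instance types this JumpStart model supports for inference,
# so an unsupported type (e.g. ml.m5.4xlarge here) is caught before the job starts.
from sagemaker import instance_types

supported = instance_types.retrieve(
    model_id=jumpstart_model_name,
    model_version="*",        # latest model version
    region=region_name,
    scope="inference",
)
print(f"supported inference instance types: {supported}")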

# Define the JumpStart model with the specified region
from sagemaker.jumpstart.model import JumpStartModel
from sagemaker.s3 import S3Uploader

pretrained_model = JumpStartModel(
    model_id=jumpstart_model_name,
    role=my_role,
    region=region_name,
    sagemaker_session=Sagemaker_Session
)

#==============================================================================

# We will use a default S3 bucket for the batch transform input and output paths
# s3_bucket_name = Sagemaker_Session.default_bucket()

# Create the full S3 paths
s3_input_data_path = f"s3://{s3_bucket_name}/{s3_bucket_folder}/batch_input"
s3_output_data_path = f"s3://{s3_bucket_name}/{s3_bucket_folder}/batch_output"

# Upload the data
# s3 = boto3.client("s3")
# s3.upload_file(batchtransform_data_file_path, s3_bucket_name, f"{s3_bucket_folder}/batch_input/{batchtransform_data_file_name}")
S3Uploader.upload(batchtransform_data_file_path, s3_input_data_path)
print(f"input data S3 location: {s3_input_data_path}")
#==============================================================================
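For reference, the transform below reads application/jsonlines input split by line. A sketch of how such a file could be produced; the "inputs" key is an assumption based on common text-generation containers, so check your model's expected request schema:

# Sketch: write a JSON Lines input file, one request object per line.
import json

sample_records = [
    {"inputs": "Summarize: SageMaker batch transform runs offline inference jobs."},
    {"inputs": "Translate to French: Good morning."},
]
with open(batchtransform_data_file_path, "w") as f:
    for record in sample_records:
        f.write(json.dumps(record) + "\n")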

# Deploy the model to an endpoint
# pretrained_model_predictor = pretrained_model.deploy(
#     instance_type=InstanceType,
#     initial_instance_count=1,
#     serializer=sagemaker.serializers.JSONSerializer(),
#     deserializer=sagemaker.deserializers.JSONDeserializer()
# )

# Create the batch transformer object. If you have a large dataset you can
# divide it into smaller chunks and use more instances for faster inference.
batch_transformer = pretrained_model.transformer(
    instance_count=Instance_Count,
    instance_type=InstanceType,
    output_path=s3_output_data_path,
    assemble_with="Line",
    accept="text/csv",  # or "application/jsonlines"
    max_payload=Max_Payload,
    env=hyper_params_dict,
)
# note: transformer() takes no sagemaker_session argument; the session comes from the model

batch_transformer.env = hyper_params_dict

# Make predictions on the input data
batch_transformer.transform(
    s3_input_data_path,
    content_type="application/jsonlines",
    split_type="Line"
)

batch_transformer.wait()

reza
asked 6 months ago · 201 views
1 Answer

I realized this issue can be resolved by switching to the Hugging Face batch transform in SageMaker with the following code:

# Retrieve the LLM image URI
import json

from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

llm_image = get_huggingface_llm_image_uri(
    "huggingface",  # huggingface or lmi
    version=llm_image_uri_ver,
    session=Sagemaker_Session,
    region=region_name
)

# Print the ECR image URI
print(f"llm image uri: {llm_image}")

# Define model and endpoint configuration parameters
config = {
    'HF_MODEL_ID': HF_model_name,                      # model_id from hf.co/models
    'SM_NUM_GPUS': json.dumps(number_of_gpu),          # number of GPUs used per replica
    'MAX_INPUT_LENGTH': json.dumps(MAX_INPUT_LENGTH),  # max length of input text
    'MAX_TOTAL_TOKENS': json.dumps(MAX_TOTAL_TOKENS),  # max length of the generation (including input text)
    'MAX_BATCH_TOTAL_TOKENS': json.dumps(MAX_BATCH_TOTAL_TOKENS),  # limits the number of tokens processed in parallel during generation
    'HUGGING_FACE_HUB_TOKEN': HUGGING_FACE_HUB_TOKEN,
    'HF_MODEL_QUANTIZE': "bitsandbytes",               # optional: quantize the model
}
# HF_MODEL_QUANTIZE (optional): enables model quantization to reduce model size and
# potentially improve performance, especially inference speed. However, lower weight
# precision can affect the quality of the output for some models. Typical choices:
# "bitsandbytes" or similar, depending on the quantization method supported.

# Check if the token is set
# assert config['HUGGING_FACE_HUB_TOKEN'] != HUGGING_FACE_HUB_TOKEN, "Please set your Hugging Face Hub token"
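Note that the commented-out assert compares the config value against the same variable it was set from, so it can never pass. A working guard, assuming you start from a placeholder string as in the Hugging Face sample notebooks:

# Sketch: fail fast if the token was never replaced. The placeholder string is
# an assumption; use whatever default your notebook starts with.
assert HUGGING_FACE_HUB_TOKEN != "<REPLACE WITH YOUR TOKEN>", "Please set your Hugging Face Hub token"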

# Create a HuggingFaceModel with the image URI
llm_model = HuggingFaceModel(
    role=my_role,
    image_uri=llm_image,
    env=config
)

# Specify the batch job hyperparameters here. If you want each example to use
# different hyperparameters, pass hyper_params_dict as None.
hyper_params = {
    "max_new_tokens": str(Max_New_Tokens),
    "truncate": str(Input_Truncation),
    "return_full_text": str(False)
}
# hyper_params = {"batch_size": str(Batch_Size), "max_new_tokens": str(Max_New_Tokens),
#                 "truncate": str(Input_Truncation), "return_full_text": str(False)}
# hyper_params_dict = {"HYPER_PARAMS": str(hyper_params)}
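As the comment above suggests, per-example settings can also travel in the records themselves rather than in env vars. A sketch assuming the TGI request schema, where each JSON Lines record carries its own "parameters" object:

# Sketch: one input record with its own generation parameters (TGI-style schema).
import json

record = {
    "inputs": "Write a haiku about batch inference.",
    "parameters": {
        "max_new_tokens": Max_New_Tokens,  # per-record generation settings
        "return_full_text": False,
    },
}
print(json.dumps(record))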

# Create a transformer to run a batch job
batch_job = llm_model.transformer(
    instance_count=Instance_Count,
    instance_type=InstanceType,
    # strategy="MultiRecord",  # 'MultiRecord' sends multiple records in a single batch
    strategy='SingleRecord',   # how records are batched into each prediction request;
                               # 'MultiRecord' may be faster, but some use cases require 'SingleRecord'
    assemble_with="Line",
    output_path=s3_output_data_path,  # S3 path where the output is saved with the input
    env=hyper_params,
    accept='application/json',
    # max_concurrent_transforms=MaxConcurrentTransforms,  # (int) max number of HTTP requests made to each transform container at one time
    max_payload=MaxPayloadInMB,  # (int) maximum payload size of a single HTTP request to the container, in MB
)

# Start the batch transform job using the S3 data as input
batch_job.transform(
    data=f"{s3_input_data_path}/{batchtransform_data_file_name}",
    content_type='application/json',
    split_type='Line',
    input_filter="$",
    # output_filter="$['id','SageMakerOutput']",  # select "id" from the input and the model output
    join_source='Input',  # 'Input' includes the input data in the output; None omits it
    wait=True,
)
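Once the job finishes, the results land under s3_output_data_path. A sketch for inspecting them, assuming the "<input file>.out" naming convention batch transform uses and that join_source='Input' adds a SageMakerOutput field to each JSON line:

# Sketch: download and read the joined batch output.
import json
import boto3

s3 = boto3.client("s3")
output_key = f"{s3_bucket_folder}/batch_output/{batchtransform_data_file_name}.out"
s3.download_file(s3_bucket_name, output_key, "batch_output.jsonl.out")

with open("batch_output.jsonl.out") as f:
    for line in f:
        record = json.loads(line)
        # each line holds the original input plus the model's "SageMakerOutput" field
        print(record.get("SageMakerOutput"))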

reza
answered 5 months ago
