
Questions tagged with Amazon Elastic Inference


AWS EKS - EIA attached on node not reachable by Pod

I'm using a standard **AWS EKS** cluster, fully cloud-based (K8s 1.22), with multiple node groups, one of which uses a Launch Template that attaches an Elastic Inference Accelerator (eia2.medium) to the instances in order to serve a TensorFlow model. I've been struggling to get our deep learning model to work at all when deployed. I have a Pod in a Deployment, with a Service Account and an **EKS IRSA** policy attached, based on the AWS Deep Learning Container for inference serving with TensorFlow 1.15.0. The image used is `763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-inference-eia:1.15.0-cpu`, and when the model is deployed in the cluster, with node affinity to the EIA-enabled node, it simply doesn't work when called through the /invocations endpoint:

```
Using Amazon Elastic Inference Client Library Version: 1.6.3
Number of Elastic Inference Accelerators Available: 1
Elastic Inference Accelerator ID: eia-<id>
Elastic Inference Accelerator Type: eia2.medium
Elastic Inference Accelerator Ordinal: 0

2022-05-11 13:47:17.799145: F external/org_tensorflow/tensorflow/contrib/ei/session/eia_session.cc:1221] Non-OK-status: SwapExStateWithEI(tmp_inputs, tmp_outputs, tmp_freeze) status: Internal: Failed to get the initial operator <redacted>list from server.
WARNING:__main__:unexpected tensorflow serving exit (status: 134). restarting.
```

For reference, when using the CPU-only image available at `763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-inference:1.15.0-cpu`, the model serves perfectly in any environment (locally too), of course with much longer computation times. Likewise, if I deploy a single EC2 instance with the EIA attached and serve the container with a plain Docker command, the EIA works fine and is reached correctly by the container.

Each EKS node and the Pod itself (via IRSA) has the following policy attached, as per the AWS documentation:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elastic-inference:Connect",
        "iam:List*",
        "iam:Get*",
        "ec2:Describe*",
        "ec2:Get*",
        "ec2:ModifyInstanceAttribute"
      ],
      "Resource": "*"
    }
  ]
}
```

I have also created a **VPC Endpoint for Elastic Inference** as described by AWS and bound it to the private subnets used by the EKS nodes, along with a properly configured **Security Group** that allows **SSH**, **HTTPS**, and **8500/8501 TCP** from any worker node in the VPC CIDR. Both the **AWS Reachability Analyzer** and the **IAM Policy Simulator** report nothing wrong, so networking and permissions seem fine, and the *EISetupValidator.py* script provided by AWS says the same.

Any clue as to what's actually happening here? Am I missing some permission or networking setup?
0 answers · 0 votes · 9 views · asked 6 days ago
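For anyone debugging a similar setup, a quick connectivity check from inside the pod can at least separate networking problems from permission problems. The sketch below is illustrative only: the Elastic Inference runtime DNS name for eu-west-1 and the relevance of IMDSv2 reachability are assumptions, not facts confirmed by the question.

```python
# Diagnostic sketch (hypothetical): run inside the pod to see whether the
# Elastic Inference runtime endpoint and the instance metadata service are
# reachable from the pod's network namespace.
import socket
import urllib.request

def check_tcp(host, port, timeout=3):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            print(f"{host}:{port} reachable")
            return True
    except OSError as exc:
        print(f"{host}:{port} unreachable: {exc}")
        return False

# Assumed regional DNS name for the EI runtime behind the VPC endpoint.
check_tcp("api.elastic-inference.eu-west-1.amazonaws.com", 443)

# IMDSv2 check: pods cannot reach the metadata service when the instance's
# metadata hop limit is 1, which can break client libraries that read
# instance identity or credentials from IMDS.
token_req = urllib.request.Request(
    "http://169.254.169.254/latest/api/token",
    method="PUT",
    headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
)
try:
    with urllib.request.urlopen(token_req, timeout=2) as resp:
        print("IMDSv2 reachable, token length:", len(resp.read()))
except OSError as exc:
    print("IMDS unreachable from pod:", exc)
```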

Unsupported pytorch version 1.10.0 with SM Elastic Inference Accelerators

Hi Team, greetings!

We are not able to deploy to a real-time endpoint with Elastic Inference accelerators. Could you please have a look?

SageMaker version: 2.76.0

Code:

```python
from sagemaker.pytorch import PyTorchModel
from sagemaker import get_execution_role

endpoint_name = 'ner-bert'

model = PyTorchModel(entry_point='deploy_ei.py',
                     source_dir='code',
                     model_data=model_data,
                     role=get_execution_role(),
                     framework_version='1.10.0',
                     py_version='py38')

predictor = model.deploy(initial_instance_count=1,
                         instance_type='ml.m5.xlarge',
                         accelerator_type='ml.eia2.medium',
                         endpoint_name=endpoint_name)
```

Error details:

```
Unsupported pytorch version: 1.10.0. You may need to upgrade your SDK version (pip install -U sagemaker) for newer pytorch versions. Supported pytorch version(s): 1.3.1, 1.5.1, 1.3, 1.5.
```

Note: we are able to deploy the above code without an Elastic Inference accelerator, and we want to use Python 3.8 because some of our dependency libraries only support Python 3.8. I looked at the "Available DL containers" list at https://github.com/aws/deep-learning-containers/blob/master/available_images.md: the "SageMaker Framework Containers (SM support only)" section shows SageMaker supporting PyTorch 1.10.0 with Python 3.8. However, we would like to deploy with Elastic Inference, and the "Elastic Inference Containers" section on the same page shows EI containers supporting only PyTorch 1.5.1 with Python 3.6.

Why are these containers so outdated? What could be the solution? Can we specify the latest version of Python in the requirements.txt file and have it installed?

Thanks, Vinayak
0 answers · 0 votes · 4 views · asked 3 months ago
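As a point of comparison, the sketch below pins the model to one of the PyTorch releases the error message lists as supported for Elastic Inference (1.5.1 with a Python 3 image). It is untested, the `model_data` value is a placeholder, and it does not address the Python 3.8 dependency constraint raised in the question.

```python
# Sketch (untested): deploy with an EI-supported PyTorch version.
from sagemaker.pytorch import PyTorchModel
from sagemaker import get_execution_role

model = PyTorchModel(
    entry_point="deploy_ei.py",
    source_dir="code",
    model_data="s3://my-bucket/model.tar.gz",  # placeholder
    role=get_execution_role(),
    framework_version="1.5.1",  # highest version listed as supported for EI
    py_version="py3",           # EI PyTorch images are Python 3.6, not 3.8
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    accelerator_type="ml.eia2.medium",
    endpoint_name="ner-bert",
)
```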

Host a fine-tuned BERT Multilingual model on SageMaker with Serverless inference

Hi all, good day!

The key point to note here: we have a pre-processing script for the text document (deserialization is required before prediction) and a post-processing script for generating NER entities. I went through the SageMaker material and decided to try the following options:

1. Option 1: bring our own model, write an inference script, and deploy it on a SageMaker real-time endpoint using the PyTorch container. I went through Suman's video (https://www.youtube.com/watch?v=D9Qo5OpG4p8), which is really good; I need to try it with our pre-processing and post-processing scripts and see whether it works.
2. Option 2: bring our own model, write an inference script, and deploy it on a SageMaker real-time endpoint using the Hugging Face container. I went through the Hugging Face docs (https://huggingface.co/docs/sagemaker/inference#deploy-a-%F0%9F%A4%97-transformers-model-trained-in-sagemaker), but there is no reference for how to use our own pre- and post-processing scripts to set up the inference pipeline. If you know of any examples that use custom pre- and post-processing scripts with the Hugging Face container, please share them.
3. Option 3: bring our own model, write an inference script, and deploy it on SageMaker Serverless Inference using the Hugging Face container. I went through Julien's video (https://www.youtube.com/watch?v=cUhDLoBH80o&list=PLJgojBtbsuc0E1JcQheqgHUUThahGXLJT&index=35), which is excellent, but he does not show how to use custom pre- and post-processing scripts with the Hugging Face container. Please share any examples you know of.

Could you please help?

Thanks, Vinayak
1 answer · 0 votes · 15 views · asked 4 months ago
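For options 1 and 2, the SageMaker PyTorch and Hugging Face serving containers both look for handler functions in an `inference.py` shipped with the model, which is where custom pre- and post-processing can live. The sketch below is illustrative: the tokenizer/model loading and the entity mapping are placeholders, not the asker's actual scripts.

```python
# inference.py sketch: handler names follow the SageMaker inference toolkit
# convention (model_fn / input_fn / predict_fn / output_fn).
import json
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

def model_fn(model_dir):
    # Load the fine-tuned BERT model and its tokenizer from the model archive.
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForTokenClassification.from_pretrained(model_dir)
    model.eval()
    return {"model": model, "tokenizer": tokenizer}

def input_fn(request_body, content_type="application/json"):
    # Custom pre-processing: deserialize the incoming document text.
    payload = json.loads(request_body)
    return payload["text"]

def predict_fn(text, artifacts):
    tokenizer, model = artifacts["tokenizer"], artifacts["model"]
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return {
        "tokens": tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]),
        "label_ids": logits.argmax(dim=-1)[0].tolist(),
    }

def output_fn(prediction, accept="application/json"):
    # Custom post-processing: map label ids to NER entities here (placeholder).
    return json.dumps(prediction)
```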

Inference endpoint not responding when invoked by lambda

Hi fellow AWS users,

I am working on an inference pipeline on AWS. Simply put, I have trained a PyTorch model and deployed it (creating an inference endpoint) on SageMaker from a notebook. On the other side, I have a Lambda that is triggered whenever a new audio file is uploaded to my S3 bucket and passes the name of that file to the endpoint. The endpoint downloads the audio, performs some very quick pre-processing, and returns predictions. The Lambda then sends these predictions by email. Audio files are uploaded to the S3 bucket on an irregular basis, around 10 per day.

This morning I tried manually uploading a test audio file to the bucket to check whether the pipeline was working. It turns out the endpoint is correctly invoked by my Lambda, but looking at the endpoint logs nothing happens (and I don't get any email). I tried a couple of times without any more success; the Lambda just times out after 300 ms (what I set). However, invoking the endpoint from my SageMaker notebook worked perfectly fine on the first try and seemed to unblock the endpoint: after that, the endpoint was responsive to the Lambda invocation. Whether that was because the endpoint was no longer "cold", or just a coincidence, I couldn't tell.

My questions are:

- Are there any differences in endpoint invocation between the two scenarios (from the Lambda versus from the SageMaker notebook)?
- How can we see how long after an invocation the endpoint becomes "cold" again? Please correct me if I'm using the term "cold" wrongly here; I know it applies to Lambdas as well. As I understand it, the endpoint is basically calling my inference script in an ECR container.
- Given my use case (number of inferences per day, light pre-processing, ...), what would be the best option for my endpoint (async, batch, ...)?
- My Lambda seems to try the invocation twice in total (invoke 1, timeout 1, invoke 2, timeout 2). Can that be configured differently?
- Should I increase my Lambda's timeout and let it retry until the container is "warm", or is there a setting that can be changed on the endpoint side?

Thank you so much in advance for your support.

Cheers,
Antoine
1 answer · 0 votes · 10 views · asked 4 months ago
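On the Lambda side, a 300 ms function timeout is far below typical real-time inference latency, and S3 event notifications invoke Lambda asynchronously, where failed invocations are retried by default, which would account for the two attempts observed. The handler sketch below is hypothetical: the endpoint name, payload shape, and timeout values are assumptions, not the asker's actual code.

```python
# Lambda handler sketch: invoke the SageMaker endpoint with a boto3 read
# timeout longer than the model's worst-case latency; the Lambda function
# timeout (configured on the function itself) should exceed it as well.
import json
import boto3
from botocore.config import Config

runtime = boto3.client(
    "sagemaker-runtime",
    config=Config(read_timeout=70, retries={"max_attempts": 0}),
)

def handler(event, context):
    # S3 event notification: pass the uploaded object's key to the endpoint.
    key = event["Records"][0]["s3"]["object"]["key"]
    response = runtime.invoke_endpoint(
        EndpointName="audio-classifier",   # placeholder name
        ContentType="application/json",
        Body=json.dumps({"audio_key": key}),
    )
    return json.loads(response["Body"].read())
```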