Amazon SageMaker - Training Job / Data Wrangler

I have a customer who is interested in testing Amazon SageMaker and would like to ask the following questions:

Q1. If a training job submitted to Amazon SageMaker fails because of insufficient capacity, is there an automatic retry mechanism? If so, how is it set up?
Q2. Is the underlying SQL / MySQL infrastructure in Data Wrangler an AWS serverless database backend?
Q3. What backend database supports SageMaker / SageMaker Data Wrangler?

Use case: Vision ML - Object detection (self-built algorithm)
Framework: TensorFlow 2.4.4

AWS
Asked 2 years ago · 390 views
2 Answers

Hi,

Q1/ This is not a built-in feature of the training job API. You would need to implement it on your side with try/catch-style retry logic. If you instead use SageMaker Pipelines to start the jobs, that service does have this functionality (see: Retry mechanism).
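As a rough illustration of the client-side retry described above, here is a minimal sketch. It assumes that a capacity problem shows up as a job whose `TrainingJobStatus` is `Failed` and whose `FailureReason` contains the text `CapacityError`; verify that against the failures you actually observe. The function name `run_training_with_retry` and its parameters are hypothetical, and `client` is expected to be a boto3 SageMaker client.

```python
import time
import uuid

TERMINAL_STATES = {"Completed", "Failed", "Stopped"}


def run_training_with_retry(client, base_request, max_attempts=3,
                            retry_wait_s=300, poll_s=60, sleep=time.sleep):
    """Submit a training job and resubmit it if it fails with a capacity error.

    client       -- a boto3 SageMaker client, e.g. boto3.client("sagemaker")
    base_request -- keyword arguments for create_training_job
    Returns the final describe_training_job response.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    base_name = base_request["TrainingJobName"]
    for attempt in range(1, max_attempts + 1):
        # Training job names must be unique, so each attempt gets its own suffix.
        name = f"{base_name}-{attempt}-{uuid.uuid4().hex[:6]}"
        client.create_training_job(**{**base_request, "TrainingJobName": name})

        # Poll until the job reaches a terminal state.
        while True:
            desc = client.describe_training_job(TrainingJobName=name)
            if desc["TrainingJobStatus"] in TERMINAL_STATES:
                break
            sleep(poll_s)

        # Assumption: capacity failures are reported in FailureReason.
        capacity_failure = (
            desc["TrainingJobStatus"] == "Failed"
            and "CapacityError" in desc.get("FailureReason", "")
        )
        if not capacity_failure or attempt == max_attempts:
            return desc

        # Wait before resubmitting so capacity has a chance to free up.
        sleep(retry_wait_s)
```

You would call it with `boto3.client("sagemaker")` and the same request dictionary you normally pass to `create_training_job`. Any failure that is not a capacity error is returned immediately rather than retried.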

Q2/Q3/ SageMaker Data Wrangler does NOT implement a database. It does offer the option to connect to a number of data sources, including databases. Is this what you meant? Can you elaborate a bit more on what you are looking for with these two points?

Thank you, G

AWS
Answered 2 years ago

Q1: To retry a training job, you can integrate SageMaker Pipelines with EventBridge. For details, see https://docs.aws.amazon.com/sagemaker/latest/dg/pipeline-eventbridge.html
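One possible starting point for the EventBridge side is a rule that matches failed training jobs. The sketch below only builds the event pattern; the helper name `failed_training_job_pattern` is hypothetical, and the pattern matches every failed training job, so deciding whether a failure was capacity related is left to whatever the rule triggers.

```python
import json


def failed_training_job_pattern():
    """Event pattern matching SageMaker training jobs that ended in Failed."""
    return {
        "source": ["aws.sagemaker"],
        "detail-type": ["SageMaker Training Job State Change"],
        "detail": {"TrainingJobStatus": ["Failed"]},
    }


# EventBridge expects the pattern as a JSON string.
pattern_json = json.dumps(failed_training_job_pattern())
```

The resulting string could be passed as `EventPattern` to `put_rule` on a boto3 `events` client, with the pipeline described in the linked page configured as the rule's target.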

Answered 1 year ago
