Amazon SageMaker - Training Job / Data Wrangler

I have a customer who is interested in testing Amazon SageMaker and has the following questions:

Q1. If a training job submitted to Amazon SageMaker fails because of insufficient capacity, is there an automatic retry mechanism, and how is it set up?
Q2. Is the underlying SQL / MySQL infrastructure in Data Wrangler backed by an AWS serverless database?
Q3. What backend database supports SageMaker / SageMaker Data Wrangler?

Use case: Vision ML - Object detection (self-built algorithm)
Framework: TensorFlow 2.4.4

AWS
asked 2 years ago, 390 views
2 Answers

Hi,

Q1/ This is not a built-in feature of the training job API. You would need to implement it on your side with a try/except mechanism that resubmits the job. If you instead use SageMaker Pipelines to start the jobs, that service does offer this functionality (see: Retry mechanism).
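
A client-side retry could look roughly like the sketch below. It is only an illustration under a few assumptions: the helper names (`run_with_retry`, `describe_until_done`, `is_capacity_error`) are hypothetical, the capacity failure is assumed to show up as a `Failed` job whose `FailureReason` contains `CapacityError`, and the backoff values are arbitrary. The caller supplies `start_job`, which would wrap `create_training_job` with a unique job name per attempt.

```python
import time


def is_capacity_error(failure_reason):
    # Assumption: capacity failures carry "CapacityError" in FailureReason.
    return "CapacityError" in (failure_reason or "")


def describe_until_done(job_name, poll_seconds=30):
    """Poll DescribeTrainingJob until the job reaches a terminal status."""
    import boto3  # imported lazily so the retry logic can run without AWS access

    sm = boto3.client("sagemaker")
    while True:
        desc = sm.describe_training_job(TrainingJobName=job_name)
        status = desc["TrainingJobStatus"]
        if status in ("Completed", "Failed", "Stopped"):
            return status, desc.get("FailureReason")
        time.sleep(poll_seconds)


def run_with_retry(start_job, get_result=describe_until_done,
                   max_attempts=3, base_delay=60.0, sleep=time.sleep):
    """Resubmit a training job when it fails for lack of capacity.

    start_job(attempt) must submit the job and return its name.
    get_result(job_name) must return (status, failure_reason).
    """
    for attempt in range(1, max_attempts + 1):
        job_name = start_job(attempt)
        status, reason = get_result(job_name)
        if status == "Completed":
            return job_name
        # Give up on non-capacity failures or when attempts are exhausted.
        if not is_capacity_error(reason) or attempt == max_attempts:
            raise RuntimeError(f"{job_name} ended as {status}: {reason}")
        # Exponential backoff before the next submission.
        sleep(base_delay * 2 ** (attempt - 1))
```

Each attempt needs its own job name because training job names must be unique within an account and region, which is why `start_job` receives the attempt number.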

Q2/Q3/ SageMaker Data Wrangler does NOT implement a database. It does offer the option to connect to a number of data sources, including databases. Is this what you meant? Can you elaborate a bit more on what you are looking for with these two points?

Thank you, G

AWS
answered 2 years ago

Q1: To retry a training job, you can use EventBridge together with SageMaker Pipelines. For details, see: https://docs.aws.amazon.com/sagemaker/latest/dg/pipeline-eventbridge.html
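
One way this could be wired up, as a rough sketch rather than a documented recipe: an EventBridge rule matches failed training jobs and invokes a Lambda function that restarts the pipeline. The pipeline name `my-training-pipeline`, the `should_retry` helper, and the `CapacityError` substring check are all assumptions made for illustration, and the event shape is assumed to mirror the DescribeTrainingJob fields.

```python
# Event pattern for the EventBridge rule (assumed shape of SageMaker training events).
FAILED_TRAINING_PATTERN = {
    "source": ["aws.sagemaker"],
    "detail-type": ["SageMaker Training Job State Change"],
    "detail": {"TrainingJobStatus": ["Failed"]},
}


def should_retry(event):
    """Return True only for failed jobs whose reason looks like a capacity error."""
    detail = event.get("detail", {})
    return (
        detail.get("TrainingJobStatus") == "Failed"
        and "CapacityError" in (detail.get("FailureReason") or "")
    )


def handler(event, context, sagemaker_client=None,
            pipeline_name="my-training-pipeline"):  # hypothetical pipeline name
    """Lambda entry point: restart the pipeline after a capacity failure."""
    if not should_retry(event):
        return {"retried": False}
    if sagemaker_client is None:
        import boto3  # imported lazily so the logic can be tested without AWS

        sagemaker_client = boto3.client("sagemaker")
    response = sagemaker_client.start_pipeline_execution(PipelineName=pipeline_name)
    return {"retried": True, "execution": response["PipelineExecutionArn"]}
```

This sketch has no cap on the number of restarts, so a real setup would need one (for example a counter passed as a pipeline parameter) to avoid looping while capacity stays unavailable.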

answered a year ago
