I'm not 100% sure that I understand the question, but in the blog post you are referencing, data is being pushed to DVC from within the SageMaker processing job and training job. You could do the same thing in the processing and training jobs you want to include in the SageMaker Pipeline.
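The "push to DVC from within the job" approach above can be sketched as a small helper your processing or training script calls after writing its artifacts. The path, remote, and commit message are illustrative, and `dry_run` is included only so the command sequence can be inspected without DVC installed:

```python
import subprocess

def dvc_track_and_push(path: str, message: str, dry_run: bool = False):
    """Track an artifact with DVC and push it to the remote, then record
    the pointer file in Git. Runs inside the SageMaker job after the
    artifact is written. Path and message are caller-supplied examples."""
    commands = [
        ["dvc", "add", path],                       # create/update the .dvc pointer
        ["git", "add", f"{path}.dvc", ".gitignore"],  # stage the pointer file
        ["git", "commit", "-m", message],            # version the pointer
        ["dvc", "push"],                             # upload the data to the DVC remote
    ]
    if dry_run:
        return commands  # let callers inspect the plan without executing it
    for cmd in commands:
        subprocess.run(cmd, check=True)
    return commands

# Example: track the processed dataset the job just produced.
cmds = dvc_track_and_push("data/processed", "track processed data", dry_run=True)
```

The same helper works in both the processing and training containers, as long as the job's Git working copy and DVC remote credentials are set up beforehand.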
If there is another reason why you need to run DVC commands in a custom step after the training and processing steps have completed, then I would look at using a Lambda step first. If you expect your code to run for more than 15 minutes, or if there is another reason why Lambda is not a suitable choice for compute, you could use a Callback step. This step sends a message to an Amazon SQS queue, and you can trigger any process you want when you receive this message. When your process has finished running, you use an API call to inform SageMaker that the step has finished running.
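The Callback-step flow described above can be sketched as the consumer side: read the SQS message, pull out the callback token, do your work, then report success via the `SendPipelineExecutionStepSuccess` API (`send_pipeline_execution_step_success` in boto3). The message field names follow the documented Callback-step message shape but treat them as assumptions; the response is built as a plain dict so the sketch stays self-contained and does not need AWS credentials:

```python
import json

def parse_callback_message(sqs_body: str) -> dict:
    """Extract the callback token and input arguments from the SQS
    message a SageMaker Callback step delivers."""
    msg = json.loads(sqs_body)
    return {
        "token": msg["token"],
        "arguments": msg.get("arguments", {}),
    }

def build_success_response(token: str, outputs: dict) -> dict:
    """Build the keyword arguments for
    sagemaker_client.send_pipeline_execution_step_success(**params),
    which tells SageMaker the step has finished."""
    return {
        "CallbackToken": token,
        "OutputParameters": [
            {"Name": k, "Value": str(v)} for k, v in outputs.items()
        ],
    }

# Illustrative message as the Callback step would deliver it:
body = json.dumps({"token": "abc123", "arguments": {"input_s3_uri": "s3://bucket/data"}})
parsed = parse_callback_message(body)
params = build_success_response(parsed["token"], {"status": "done"})
```

In a real consumer you would run your DVC/Git commands between parsing the message and calling `send_pipeline_execution_step_success` (or `send_pipeline_execution_step_failure` on error) on a boto3 `sagemaker` client.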
@S_Moose - thanks. The link I posted is just an example. For my implementation, I want to run training/preprocessing via SageMaker Pipeline steps, whereas in the example everything is done in the notebook (all the DVC and Git commands). My understanding is that once you create the pipeline and run it, you can come back and rerun it from the SageMaker Studio UI; that is why I need to do this via code. So when a processing step finishes, I want to check the data it generates into DVC. Also, once training is finished the output is dumped to an S3 bucket, and I want to track that via DVC too, so I will need to run those dvc add and git commit commands. Off topic, but can one configure the output bucket beforehand, so that once training is done the model output is dumped to the S3 URI that I want?
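On the output-bucket question: the SageMaker Python SDK lets you set the destination up front via the estimator's `output_path` parameter, and the training step then writes `model.tar.gz` under that prefix. A minimal sketch; the image URI, role ARN, and bucket are placeholders, not real values:

```python
from sagemaker.estimator import Estimator

# output_path controls where SageMaker uploads the model artifact
# (model.tar.gz) after training completes. All values below are
# placeholders to be replaced with your own.
estimator = Estimator(
    image_uri="<training-image-uri>",
    role="<execution-role-arn>",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/models/",  # artifacts land under this prefix
)
```

The estimator configured this way is the one you pass to the pipeline's training step, so the S3 URI is fixed before the pipeline ever runs.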