Track lineage in MLOps

Hi,

I have an MLOps pipeline whose code is stored in CodeCommit. A commit to CodeCommit triggers a CodePipeline run, which in turn triggers a SageMaker pipeline. Each execution of the SageMaker pipeline creates a new model, which is registered in SageMaker Model Registry. For each model version in the registry, can we track its entire lineage, so that we can see which CodeCommit commit ID, which version of the data in S3, and which training job produced that particular model version?

1 Answer
Hey Manohar,

Yes, there is a way to track the entire lineage and see the specific details you mentioned. In your CodePipeline, pass the commit ID as a parameter to the SageMaker pipeline. The CodeCommit source action exposes the commit ID as an output variable, which you can reference as "SourceVariables.CommitId" (assuming SourceVariables is the variable namespace set on your source action). See the commit action output variables: https://docs.aws.amazon.com/codepipeline/latest/userguide/reference-variables.html
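As a rough sketch of that hand-off, the step in CodePipeline that starts the SageMaker pipeline (for example a CodeBuild or Lambda action that has received the commit ID) could forward it like this. The parameter names "CommitId" and "InputDataUri" are assumptions for illustration and must match parameters declared in your own pipeline definition; `sm_client` is expected to be a boto3 SageMaker client.

```python
from typing import Any, Dict, List


def build_pipeline_parameters(commit_id: str, data_uri: str) -> List[Dict[str, str]]:
    """Build the PipelineParameters list for StartPipelineExecution.

    "CommitId" and "InputDataUri" are hypothetical parameter names; they
    must match parameters declared in your own SageMaker pipeline.
    """
    return [
        {"Name": "CommitId", "Value": commit_id},
        {"Name": "InputDataUri", "Value": data_uri},
    ]


def start_execution(sm_client: Any, pipeline_name: str,
                    commit_id: str, data_uri: str) -> str:
    """Start one pipeline execution tagged with the triggering commit."""
    response = sm_client.start_pipeline_execution(
        PipelineName=pipeline_name,
        # A display name containing the commit makes executions easy to spot.
        PipelineExecutionDisplayName=f"commit-{commit_id[:12]}",
        PipelineParameters=build_pipeline_parameters(commit_id, data_uri),
    )
    return response["PipelineExecutionArn"]
```

You would call it with something like `start_execution(boto3.client("sagemaker"), "my-pipeline", commit_id, data_uri)`, where the pipeline name is a placeholder.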

For the S3 data, make sure you capture the specific version ID or URI of the dataset used for training, and pass it to your SageMaker pipeline as a parameter as well. The pipeline can then use it as the input to your training estimator.
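One possible way to capture the data version, assuming versioning is enabled on the bucket, is to look up the object's version ID just before starting the pipeline. This is only a sketch: `s3_client` is expected to be a boto3 S3 client, and the returned field names are made up for this example.

```python
from typing import Any, Dict


def resolve_data_version(s3_client: Any, bucket: str, key: str) -> Dict[str, str]:
    """Return identifiers that pin down the exact training object in S3."""
    head = s3_client.head_object(Bucket=bucket, Key=key)
    return {
        "data_uri": f"s3://{bucket}/{key}",
        # VersionId is only returned when bucket versioning is enabled.
        "data_version_id": head.get("VersionId", "unversioned"),
        # The ETag comes back wrapped in double quotes; strip them.
        "data_etag": head["ETag"].strip('"'),
    }
```

If the training data is a prefix with many objects rather than a single file, a versioned manifest file or a dated prefix in the URI may be a simpler thing to record.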

When you register the model in SageMaker Model Registry, pass the commit ID, the data version, and the training job name as metadata in the register step. Each registered model version will then carry the details you mentioned.
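A minimal sketch of that metadata, assuming you attach it as customer metadata properties on the model package (the SageMaker Python SDK's register call takes `customer_metadata_properties`, and the boto3 `create_model_package` call takes `CustomerMetadataProperties`). The key names below are arbitrary choices for this example, and wiring the values in from pipeline parameters and the training step is not shown.

```python
from typing import Any, Dict, Optional


def build_lineage_metadata(commit_id: str,
                           data_uri: str,
                           data_version_id: Optional[str],
                           training_job_name: str) -> Dict[str, str]:
    """Collect lineage details as a flat string-to-string mapping."""
    candidate = {
        "commit_id": commit_id,
        "data_uri": data_uri,
        "data_version_id": data_version_id,
        "training_job_name": training_job_name,
    }
    # Metadata values must be strings, so drop anything missing.
    return {k: str(v) for k, v in candidate.items() if v}


def read_lineage(describe_response: Dict[str, Any]) -> Dict[str, str]:
    """Pull the lineage metadata back out of a DescribeModelPackage response."""
    return dict(describe_response.get("CustomerMetadataProperties", {}))
```

To look up the lineage later, you could pass the result of `describe_model_package(ModelPackageName=<model package ARN>)` to `read_lineage`.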

Hope this helps

AWS
answered 2 years ago
