Do Amazon SageMaker manifest files enable dataset versioning?

0

Some Amazon SageMaker algorithms can train with a manifest JSON file that stores the mapping between images and their Amazon S3 ARNs and metadata, such as labels. This is a great option, because the manifest file is much smaller than the dataset itself. Because the manifest files are small, they can be used easily in versioning tools or saved as part of the model artifact. This appears to be the best construct enabling exact dataset versioning within SageMaker. i.e., if we exclude the creation of a unique training set hard copy per training job that can't be scaled to large datasets. Is my understanding accurate?

AWS
EXPERT
asked 3 years ago327 views
1 Answer
0
Accepted Answer

If you create the conditions for immutability of the assets the manifest points to, then manifest enables exact dataset versioning with SageMaker. You can have a data store in Amazon S3 with all versions of the data assets and use the manifest files for creating and versioning datasets for specific usage.

If you don't guarantee immutability for the assets that the manifest points to, then your manifest becomes invalid.

answered 3 years ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions