Do Amazon SageMaker manifest files enable dataset versioning?

0

Some Amazon SageMaker algorithms can train with a manifest JSON file that stores the mapping between images and their Amazon S3 ARNs and metadata, such as labels. This is a great option, because the manifest file is much smaller than the dataset itself. Because the manifest files are small, they can be used easily in versioning tools or saved as part of the model artifact. This appears to be the best construct enabling exact dataset versioning within SageMaker. i.e., if we exclude the creation of a unique training set hard copy per training job that can't be scaled to large datasets. Is my understanding accurate?

AWS
专家
已提问 3 年前328 查看次数
1 回答
0
已接受的回答

If you create the conditions for immutability of the assets the manifest points to, then manifest enables exact dataset versioning with SageMaker. You can have a data store in Amazon S3 with all versions of the data assets and use the manifest files for creating and versioning datasets for specific usage.

If you don't guarantee immutability for the assets that the manifest points to, then your manifest becomes invalid.

已回答 3 年前

您未登录。 登录 发布回答。

一个好的回答可以清楚地解答问题和提供建设性反馈,并能促进提问者的职业发展。

回答问题的准则