Do Amazon SageMaker manifest files enable dataset versioning?

0

Some Amazon SageMaker algorithms can train with a manifest JSON file that stores the mapping between images and their Amazon S3 ARNs and metadata, such as labels. This is a great option, because the manifest file is much smaller than the dataset itself. Because the manifest files are small, they can be used easily in versioning tools or saved as part of the model artifact. This appears to be the best construct enabling exact dataset versioning within SageMaker. i.e., if we exclude the creation of a unique training set hard copy per training job that can't be scaled to large datasets. Is my understanding accurate?

AWS
專家
已提問 3 年前檢視次數 328 次
1 個回答
0
已接受的答案

If you create the conditions for immutability of the assets the manifest points to, then manifest enables exact dataset versioning with SageMaker. You can have a data store in Amazon S3 with all versions of the data assets and use the manifest files for creating and versioning datasets for specific usage.

If you don't guarantee immutability for the assets that the manifest points to, then your manifest becomes invalid.

已回答 3 年前

您尚未登入。 登入 去張貼答案。

一個好的回答可以清楚地回答問題並提供建設性的意見回饋,同時有助於提問者的專業成長。

回答問題指南