Do Amazon SageMaker manifest files enable dataset versioning?

0

Some Amazon SageMaker algorithms can train with a manifest JSON file that stores the mapping between images and their Amazon S3 ARNs and metadata, such as labels. This is a great option, because the manifest file is much smaller than the dataset itself. Because the manifest files are small, they can be used easily in versioning tools or saved as part of the model artifact. This appears to be the best construct enabling exact dataset versioning within SageMaker. i.e., if we exclude the creation of a unique training set hard copy per training job that can't be scaled to large datasets. Is my understanding accurate?

AWS
전문가
질문됨 3년 전328회 조회
1개 답변
0
수락된 답변

If you create the conditions for immutability of the assets the manifest points to, then manifest enables exact dataset versioning with SageMaker. You can have a data store in Amazon S3 with all versions of the data assets and use the manifest files for creating and versioning datasets for specific usage.

If you don't guarantee immutability for the assets that the manifest points to, then your manifest becomes invalid.

답변함 3년 전

로그인하지 않았습니다. 로그인해야 답변을 게시할 수 있습니다.

좋은 답변은 질문에 명확하게 답하고 건설적인 피드백을 제공하며 질문자의 전문적인 성장을 장려합니다.

질문 답변하기에 대한 가이드라인