SageMaker training job with FastFile input mode still downloads the data


I'm trying to train a custom model using the FastFile input mode because my input data is quite large (about 27 GB). As described in the SageMaker SDK documentation, I set the input_mode parameter of my Estimator object to FastFile and created a small pipeline to expose my hyperparameters (I prefer to run the pipeline from SageMaker Studio). Running the pipeline creates a training job that uses my custom image and code to train the model. I usually train on an ml.g4dn.xlarge instance to speed up the process. After the initialization stage, however, the training job starts downloading the data from the "folder" (prefix) in the S3 bucket where my 27 GB of data is stored. Since I explicitly requested FastFile mode, I did not expect the data to be downloaded from my bucket.

So the question is: why is the training job still downloading data from the bucket even though I've enabled FastFile mode?

  • Do you have a large number of files?

  • Yes. Those 27 GB consist of 266 folders with 4 NumPy files each, and each folder holds about 100 MB of data.

Asked 10 months ago, 511 views
1 Answer

SageMaker FastFile mode streams data directly from S3 at the moment your code accesses a file. From a usability perspective, you still access the files as if they were on local disk, and SageMaker streams each file from S3 when it is read. For your use case, File mode, which makes a full copy up front instead of streaming, will be the better approach, because the initial copy is much faster for datasets smaller than 100 GB. Please refer to the blog post below to determine the right option for your training job.

https://aws.amazon.com/blogs/machine-learning/choose-the-best-data-source-for-your-amazon-sagemaker-training-job/
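To make the "access the files as if they are on disk" point concrete, here is a minimal sketch of a training-side reader that touches one file at a time, so that in FastFile mode bytes would only be fetched from S3 when a file is actually opened. It rests on assumptions that are not stated in the thread: the channel is named "training", the files end in .npy, and the helper names channel_dir and iter_samples are made up for illustration. The SM_CHANNEL_TRAINING variable may not be set in a custom image, so the sketch falls back to the conventional /opt/ml/input/data/<channel> path.

```python
import os
from pathlib import Path
from typing import Iterator, Tuple


def channel_dir(channel: str = "training") -> Path:
    """Return the local directory of an input channel.

    Assumption: the channel is called "training". Inside a training
    container each channel appears under /opt/ml/input/data/<channel>;
    some containers also export the path as SM_CHANNEL_<NAME>.
    """
    default = f"/opt/ml/input/data/{channel}"
    return Path(os.environ.get(f"SM_CHANNEL_{channel.upper()}", default))


def iter_samples(root: Path) -> Iterator[Tuple[str, bytes]]:
    """Yield (relative path, raw bytes) for each .npy file, one at a time.

    Nothing is read until the generator reaches a file, so only the
    files the training loop actually consumes are opened.
    """
    for path in sorted(root.rglob("*.npy")):
        with path.open("rb") as f:
            yield str(path.relative_to(root)), f.read()
```

The same reader works unchanged in File mode, where the files are already on local disk. The difference is only when the bytes arrive.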

AWS
Answered 10 months ago
  • In the short term, I can work with the default File mode. In the long term, however, I may need FastFile mode (I haven't reached 100 GB of data yet). I expected it to work with a small example of nearly 30 GB, which is why I don't understand why it isn't working, especially since I can switch from File to FastFile without changing the code.
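As a rough illustration of why switching between File and FastFile needs no change to the training code, the sketch below builds a single input channel in the shape used by the low-level CreateTrainingJob request, where the mode is just one field. The helper name build_channel, the channel name "training", and the bucket URI are hypothetical. In the SageMaker Python SDK, the corresponding setting is the input_mode parameter mentioned in the question.

```python
VALID_MODES = {"File", "FastFile", "Pipe"}


def build_channel(s3_uri: str, input_mode: str = "FastFile",
                  name: str = "training") -> dict:
    """Build one InputDataConfig entry for a training job request.

    Assumption: the data sits under a single S3 prefix and is fully
    replicated to every training instance.
    """
    if input_mode not in VALID_MODES:
        raise ValueError(f"input_mode must be one of {sorted(VALID_MODES)}")
    return {
        "ChannelName": name,
        "InputMode": input_mode,  # the only field that differs between modes
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical bucket and prefix, for illustration only.
    fast = build_channel("s3://example-bucket/dataset/")
    full = build_channel("s3://example-bucket/dataset/", input_mode="File")
    # Same data source either way; only the mode differs.
    print(fast["InputMode"], full["InputMode"], fast["DataSource"] == full["DataSource"])
```

Comparing this field with the input configuration recorded for the submitted training job is one way to check whether the requested mode actually reached the job.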
