Questions tagged with Amazon SageMaker Autopilot

How to use SageMaker with input mode FastFile when file names contain Chinese characters?

This post is both a bug report and a question. We are training a model with SageMaker, and the setup is fairly standard. Because we have a large number of images, downloading them takes a very long time unless we set `input_mode` to `FastFile`. After that change, however, I could not load the images inside the container.

Many samples in my dataset have Chinese characters in their file names. While debugging, I found that when SageMaker mounts the data from S3, it does not handle the encoding correctly. Here is an image name and the corresponding image path inside the training container:

`七年级上_第10章分式_七年级上_第10章分式_1077759_title_0-0_4_mathjax`

`/opt/ml/input/data/validation/\u4E03\u5E74\u7EA7\u4E0A_\u7B2C10\u7AE0\u5206\u5F0F_\u4E03\u5E74\u7EA7\u4E0A_\u7B2C10\u7AE0\u5206\u5F0F_1077759_title_0-0_4_mathjax.png`

This is not pretty, but I can still get the right path in the container. The problem is that I cannot read the file even though the path exists: `os.path.exists('/opt/ml/input/data/validation/\u4E03\u5E74\u7EA7\u4E0A_\u7B2C10\u7AE0\u5206\u5F0F_\u4E03\u5E74\u7EA7\u4E0A_\u7B2C10\u7AE0\u5206\u5F0F_1077759_title_0-0_4_mathjax.png')` returns `True`, but `cv2.imread('/opt/ml/input/data/validation/\u4E03\u5E74\u7EA7\u4E0A_\u7B2C10\u7AE0\u5206\u5F0F_\u4E03\u5E74\u7EA7\u4E0A_\u7B2C10\u7AE0\u5206\u5F0F_1077759_title_0-0_4_mathjax.png')` returns `None`.

I then tried to open the file directly, which at least raises an error. The code is:

`with open('/opt/ml/input/data/validation/\u4E03\u5E74\u7EA7\u4E0A_\u7B2C10\u7AE0\u5206\u5F0F_\u4E03\u5E74\u7EA7\u4E0A_\u7B2C10\u7AE0\u5206\u5F0F_1077759_title_0-0_4_mathjax.png', 'rb') as f: a = f.read()`

and it fails with:

`OSError: [Errno 107] Transport endpoint is not connected`

Loading a file in the same folder whose name contains no Chinese characters works fine, so I'm confident the Chinese characters in the file names are causing the problem.
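To confirm the pattern across the whole channel rather than one file, a small probe can try to read the first bytes of every listed file and record which ones fail and whether their names are pure ASCII. (In a Python string literal, `\u4E03\u5E74\u7EA7\u4E0A` is just an escape for `七年级上`, so the `os.path.exists` call above is checking the real Chinese name.) This is a minimal diagnostic sketch; `probe_channel` and `summarize` are hypothetical helper names, and `errno.ENOTCONN` is the error number (107 on Linux) reported above.

```python
import errno
import os


def probe_channel(channel_dir, sample_bytes=16):
    """Find files that are listed under channel_dir but cannot be read.

    Returns a list of (path, name_is_ascii, errno_value) tuples, one per
    file whose first `sample_bytes` bytes could not be read.
    """
    failures = []
    for root, _dirs, files in os.walk(channel_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                with open(path, "rb") as f:
                    f.read(sample_bytes)
            except OSError as exc:
                failures.append((path, name.isascii(), exc.errno))
    return failures


def summarize(failures):
    """Count unreadable files, how many have non-ASCII names, and how many hit ENOTCONN."""
    non_ascii = sum(1 for _path, is_ascii, _err in failures if not is_ascii)
    not_connected = sum(1 for _path, _is_ascii, err in failures if err == errno.ENOTCONN)
    return {"unreadable": len(failures), "non_ascii_names": non_ascii, "enotconn": not_connected}


if __name__ == "__main__":
    # Channel path taken from the question; change it for other channels.
    print(summarize(probe_channel("/opt/ml/input/data/validation")))
```

If `non_ascii_names` equals `unreadable` and every failure is `ENOTCONN`, that supports the idea that only non-ASCII names are affected.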
Is there a quick workaround so that I don't have to rename roughly 80% of the data in S3?
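One possible stopgap, assuming the failure is confined to reading through the FastFile mount, is to catch the `OSError` and fetch the affected objects from S3 directly with boto3, so nothing has to be renamed. This is a sketch, not a confirmed fix: `make_reader` and `boto3_fetcher` are hypothetical helpers (not part of the SageMaker SDK), and it assumes the channel's S3 prefix mirrors the directory layout under the mount, that boto3 is installed in the container, and that the job's role can call `s3:GetObject` on the data.

```python
import os
from typing import Callable


def make_reader(mount_dir: str, s3_prefix: str,
                fetch: Callable[[str], bytes]) -> Callable[[str], bytes]:
    """Build read(path) -> bytes that prefers the FastFile mount.

    Any OSError from the mounted file (e.g. Errno 107) triggers a fallback:
    the path relative to mount_dir is appended to s3_prefix to form an S3
    key, and fetch(key) returns the object's bytes instead.
    """
    prefix = s3_prefix.strip("/")

    def read(path: str) -> bytes:
        try:
            with open(path, "rb") as f:
                return f.read()
        except OSError:
            # Assumption: the relative path under the mount equals the key suffix.
            rel = os.path.relpath(path, mount_dir).replace(os.sep, "/")
            key = f"{prefix}/{rel}" if prefix else rel
            return fetch(key)

    return read


def boto3_fetcher(bucket: str) -> Callable[[str], bytes]:
    """Return fetch(key) that downloads one object with boto3's get_object."""
    import boto3  # imported lazily; assumed to be available in the container

    s3 = boto3.client("s3")

    def fetch(key: str) -> bytes:
        return s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    return fetch
```

For example, `read = make_reader("/opt/ml/input/data/validation", "my-prefix/validation", boto3_fetcher("my-bucket"))`, where the bucket and prefix are placeholders for the S3 URI the `validation` channel points at. Because the reader returns bytes, images would be decoded with `cv2.imdecode(np.frombuffer(data, np.uint8), cv2.IMREAD_COLOR)` rather than `cv2.imread` on the path. Note that any `OSError`, including a genuinely missing file, falls through to S3.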
0 answers, 0 votes, 46 views, asked 4 months ago
3 answers, 0 votes, 21 views, asked 6 months ago