Download a file from the internet to S3, then unzip/untar the file on S3 from a Jupyter Notebook

I would like to download a file from the internet directly to S3: http://hog.ee.columbia.edu/craffel/lmd/lmd_matched_h5.tar.gz. Then I would like to unzip/untar the file and extract its contents (and folder structure) in S3. Note: there is a folder structure within the tar. I am not planning to do this on multiple tar.gz files; it is a one-time operation as part of a demo in a Jupyter Notebook. What is the simplest, most direct, and most efficient way to accomplish this task?

awsuser
asked 10 months ago, viewed 671 times
2 Answers

Hi, some possible steps:

  1. Read the archive from S3 using the Boto3 S3 resource Object.
  2. Open the object using a module which supports working with tar or zip.
  3. Iterate over each file in the archive using the module's member-listing method.
  4. Write each file back to another bucket in S3.
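
The steps above can be sketched in Python for the tar.gz case from the question. This is a minimal sketch, not a definitive implementation: the function names `download_to_tempfile` and `upload_tar_members`, the bucket name `my-demo-bucket`, and the `lmd` key prefix are hypothetical, and it assumes the notebook host has enough local disk to hold the downloaded archive. It uses the Boto3 S3 client's `upload_fileobj` rather than the resource Object, and each member's path inside the tar becomes its S3 key, which is what preserves the folder structure.

```python
import posixpath
import shutil
import tarfile
import tempfile
import urllib.request


def download_to_tempfile(url):
    """Stream a URL into an anonymous local temp file and rewind it."""
    tmp = tempfile.TemporaryFile()
    with urllib.request.urlopen(url) as response:
        shutil.copyfileobj(response, tmp)
    tmp.seek(0)
    return tmp


def upload_tar_members(tar_fileobj, s3_client, bucket, prefix=""):
    """Upload every regular file in a .tar.gz, using its path in the tar as the key.

    `s3_client` is expected to behave like boto3.client("s3"); only
    upload_fileobj(fileobj, bucket, key) is called on it.
    """
    uploaded = []
    with tarfile.open(fileobj=tar_fileobj, mode="r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # directories need no object: the "/" in each key implies them
            key = posixpath.normpath(posixpath.join(prefix, member.name)).lstrip("/")
            with archive.extractfile(member) as body:
                s3_client.upload_fileobj(body, bucket, key)
            uploaded.append(key)
    return uploaded


# Usage in the notebook (not run here; needs boto3, AWS credentials, and a real bucket):
#   import boto3
#   url = "http://hog.ee.columbia.edu/craffel/lmd/lmd_matched_h5.tar.gz"
#   with download_to_tempfile(url) as archive:
#       upload_tar_members(archive, boto3.client("s3"), "my-demo-bucket", "lmd")
```

This makes one upload call per file, which keeps the code simple but may be slow for an archive with many members.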
AWS
Expert
answered 10 months ago

The suggestion by @Nitin above would certainly work. If preserving the directory tree within the archive is important, you may want to look at mounting the S3 bucket onto the Linux host itself.

The officially supported way would be S3 File Gateway (https://aws.amazon.com/blogs/storage/mounting-amazon-s3-to-an-amazon-ec2-instance-using-a-private-connection-to-s3-file-gateway/), but that's expensive and probably not worth it for a one-off demonstration.

There is also s3fs (https://github.com/s3fs-fuse/s3fs-fuse), which will do much the same. I find it rather slow, but if it's just for a one-off demonstration you can probably live with it. The README.md of that GitHub project shows where it's available from and how to install it.
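
Once a bucket is mounted, extraction becomes ordinary file I/O against the mount directory. The following is a rough sketch under stated assumptions: the function name `extract_into_mount` and the mount path `/mnt/s3-demo` are hypothetical, the archive has already been downloaded locally, and the mount accepts directory creation and sequential writes of new files. Which file operations a given mount tool supports varies, so check its documentation first.

```python
import os
import shutil
import tarfile


def extract_into_mount(archive_path, mount_dir):
    """Copy each regular file from a .tar.gz into mount_dir, keeping relative paths."""
    written = []
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # parent directories are created on demand below
            target = os.path.join(mount_dir, *member.name.split("/"))
            os.makedirs(os.path.dirname(target), exist_ok=True)
            # Plain sequential copy; no chmod or timestamp calls on the mount.
            with archive.extractfile(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written


# Usage (hypothetical paths, not run here):
#   extract_into_mount("lmd_matched_h5.tar.gz", "/mnt/s3-demo")
```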

There's also a very new offering called Mountpoint for S3 (https://aws.amazon.com/blogs/storage/the-inside-story-on-mountpoint-for-amazon-s3-a-high-performance-open-source-file-client/), which I've not used myself yet, but on a quick reading of that blog it may also achieve what you want.

Steve_M
Expert
answered 10 months ago
