Download a file from the internet to S3, then unzip/untar it in S3 from a Jupyter Notebook

I would like to download a file from the internet directly to S3: http://hog.ee.columbia.edu/craffel/lmd/lmd_matched_h5.tar.gz Then I would like to unzip/untar the file and extract its contents (including the folder structure) in S3. Note that there is a folder structure within the tar. I am not planning to do this for multiple tar.gz files; it is a one-time operation as part of a demo in a Jupyter Notebook. What is the simplest, most direct, and most efficient way to accomplish this task?

awsuser
asked 10 months ago · 672 views
2 Answers

Hi, some possible steps:

  1. Read the archive (tar.gz or zip) from S3 using the Boto3 S3 resource Object.
  2. Open the object using a module that supports working with tar or zip files.
  3. Iterate over each file in the archive using the module's member-listing method.
  4. Write each file back to S3 (for example, to another bucket), using the file's path inside the archive as the object key.
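
The steps above can be sketched roughly as follows. This is a minimal sketch, not a tested production recipe: it uses Python's `tarfile` module and the Boto3 S3 client (`get_object` / `put_object`) rather than the resource Object, and the function names, bucket names, and the `prefix` parameter are hypothetical. Each member is read fully into memory before upload, which is assumed to be acceptable for an archive made of many small files.

```python
import tarfile


def extract_tar_to_s3(fileobj, s3_client, dest_bucket, prefix=""):
    """Read a tar archive from a file-like object and upload each file to S3.

    The member path inside the archive becomes the object key, so the
    folder structure is preserved as key prefixes.
    """
    uploaded = []
    # "r|*" reads the archive as a forward-only stream and detects the
    # compression (gzip here) automatically, so no local copy is needed.
    with tarfile.open(fileobj=fileobj, mode="r|*") as archive:
        for member in archive:
            if not member.isfile():
                continue  # S3 has no real directories; skip non-file entries
            data = archive.extractfile(member).read()
            key = f"{prefix}{member.name}"
            s3_client.put_object(Bucket=dest_bucket, Key=key, Body=data)
            uploaded.append(key)
    return uploaded


def extract_s3_archive(src_bucket, src_key, dest_bucket, prefix=""):
    """Steps 1-4 for an archive already stored in S3 (needs boto3 and credentials)."""
    import boto3  # imported lazily so the helper above can be used without AWS access

    s3 = boto3.client("s3")
    body = s3.get_object(Bucket=src_bucket, Key=src_key)["Body"]
    return extract_tar_to_s3(body, s3, dest_bucket, prefix)
```

Because `extract_tar_to_s3` only needs an object with a `read()` method, it should also accept the HTTP response returned by `urllib.request.urlopen(url)`, which would cover the "directly from the internet" part of the question without storing the tar.gz in S3 first.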
AWS
EXPERT
answered 10 months ago

The suggestion by @Nitin above would certainly work. If preserving the directory tree within the archive is important, you may want to look at mounting the S3 bucket on the Linux host itself.

The officially supported way would be S3 File Gateway https://aws.amazon.com/blogs/storage/mounting-amazon-s3-to-an-amazon-ec2-instance-using-a-private-connection-to-s3-file-gateway/ but that's expensive and probably not worth it for a one-off demonstration.

There is also s3fs https://github.com/s3fs-fuse/s3fs-fuse which will do much the same. I find it rather slow, but if it's just for a one-off demonstration you can probably live with it. The README.md of that GitHub project shows where it's available from and how to install it.

There's also a very new offering called Mountpoint for S3 https://aws.amazon.com/blogs/storage/the-inside-story-on-mountpoint-for-amazon-s3-a-high-performance-open-source-file-client/ which I've not used myself yet, but on a quick reading of that blog it may also achieve what you want.
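
Once the bucket is mounted by any of these tools, extraction becomes an ordinary filesystem operation. The following is only a sketch under assumptions: the mount directory is a hypothetical path, the function name is made up for illustration, and it has not been tried against any of the mounts above, some of which may restrict operations that `tarfile` performs (such as setting permissions or timestamps).

```python
import tarfile
from pathlib import Path


def extract_to_mount(archive_path, mount_dir):
    """Extract a tar archive into a directory, assumed to be an S3 mount.

    Returns the extracted files as paths relative to mount_dir.
    """
    target = Path(mount_dir)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:*") as archive:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: refuse unsafe members (absolute paths, "..")
            archive.extractall(target, filter="data")
        else:
            archive.extractall(target)
    return sorted(
        p.relative_to(target).as_posix() for p in target.rglob("*") if p.is_file()
    )
```

For example, `extract_to_mount("lmd_matched_h5.tar.gz", "/mnt/s3-demo")` would recreate the archive's folder tree under the hypothetical mount point `/mnt/s3-demo`.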

EXPERT
Steve_M
answered 10 months ago
