Read a tar file from s3 and uncompress it

Hi,

I want to read a tar file from S3, uncompress it, and load its contents into another S3 bucket using a Glue job. However, I am getting the error "fileobj must implement read".

obj = s3.getObject(bucketname, key)
objbuffer = io.BytesIO(obj["Body"].read())
tarf = tarfile.open(fileobj=objbuffer)
files = tarf.getnames()
for file in files:
    with open(file, 'rb') as f:
        s3.upload_fileobj(f, tgt_bucket, filepath, Config=config)

Note: I am using upload_fileobj to handle multipart uploads, and Config holds the TransferConfig details.

asked 2 years ago, 634 views
1 Answer
Hi,

Are you sure that tarf.getnames() returns only "real" (regular) files? It can also return symlinks, directories, and other member types.

Look at https://docs.python.org/3/library/tarfile.html#tarfile.TarInfo

TarInfo.type
File type. type is usually one of these constants: REGTYPE, AREGTYPE, LNKTYPE, 
SYMTYPE, DIRTYPE, FIFOTYPE, CONTTYPE, CHRTYPE, BLKTYPE, GNUTYPE_SPARSE. 
To determine the type of a TarInfo object more conveniently, use the is*() methods below.

So, you may want to check the type of the tar member before uploading it.
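As a rough illustration of that check, here is a minimal sketch that skips anything that is not a regular file and reads each member straight from the archive with extractfile() instead of opening it from local disk. The helper name upload_regular_members and its prefix parameter are made up for this example, and the uploader is passed in as a plain callable so the sketch does not depend on boto3. Note that TarFile.extractfile() returns None for members such as directories, and passing None on to an uploader that expects a readable object could plausibly produce an error like the one in the question.

```python
import io
import tarfile


def upload_regular_members(tar_bytes, upload_fileobj, tgt_bucket, prefix=""):
    """Upload every regular file in a tar archive and return the keys used.

    upload_fileobj is any callable taking (fileobj, bucket, key); this helper
    and its parameters are hypothetical names for the sketch.
    """
    uploaded = []
    # Mode "r:*" lets tarfile detect gzip/bz2/xz compression on its own.
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tarf:
        for member in tarf.getmembers():
            # Skip directories, symlinks, devices, and other non-file members.
            if not member.isfile():
                continue
            fileobj = tarf.extractfile(member)
            if fileobj is None:
                # Defensive: nothing readable for this member.
                continue
            key = prefix + member.name
            upload_fileobj(fileobj, tgt_bucket, key)
            uploaded.append(key)
    return uploaded
```

With a boto3 S3 client (assumed here, as in the question), the callable could be something like lambda f, b, k: s3.upload_fileobj(f, b, k, Config=config), and the archive bytes could come from s3.get_object(Bucket=bucketname, Key=key)["Body"].read(). For very large archives, holding the whole tar in memory may not be practical, so treat this as a starting point only.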

Best,

Didier

answered 2 years ago (EXPERT)
reviewed 2 years ago (AWS EXPERT)
