Avoid metadata from Athena with Boto 3


I'm trying to schedule a data transformation with Athena using Python and Boto3 (via Glue). Once the query is launched, the data should be stored in an S3 sub-bucket (a prefix within the bucket).

I need the sub-bucket to contain only the data, but the query also creates a metadata file, and I haven't found a way to prevent that. I'm using start_query_execution from Boto3 to run the query:

queryStart = client.start_query_execution(
    QueryString = query,
    QueryExecutionContext = {
        'Database': database
    }, 
    ResultConfiguration = { 'OutputLocation': 's3://' + bucket + '/' + subbucketpath}
)

I tried the code below to remove the metadata file:

s3 = session.resource('s3')
my_bucket = s3.Bucket(bucket)
for item in my_bucket.objects.filter(Prefix=subbucketpath):
      if item.endswith('.csv.metadata'):
            item.delete()

but it gives an error: AttributeError: 's3.ObjectSummary' object has no attribute 'endswith'.

Is there any other way to launch the Athena query from Glue or to remove the '.csv.metadata' files?

Asked 2 years ago · 314 views
1 Answer

Hi,

Athena automatically writes a metadata file next to the results of every query started with start_query_execution. To delete the .csv.metadata files, you can use the logic below. The AttributeError occurs because the loop yields s3.ObjectSummary objects rather than strings, so call endswith on item.key, which holds the object's name as a string. The try statement catches any other error raised while checking or deleting an object and prints it, so a single failure does not stop the loop.

import boto3

session = boto3.session.Session()
s3 = session.resource('s3')
my_bucket = s3.Bucket(bucket)  # bucket: your bucket name, as in the question
for item in my_bucket.objects.filter(Prefix=subbucketpath):
    try:
        # item is an s3.ObjectSummary; item.key is the object's name (a str)
        if item.key.endswith('.csv.metadata'):
            item.delete()
    except Exception as e:
        print("The following error occurred: {}".format(e))
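
If the prefix holds many result files, the same cleanup can be sent as batched delete requests instead of one call per object. This is only a minimal sketch under a few assumptions: the helper names select_metadata_keys and delete_metadata_files are hypothetical (they are not part of Boto3), and the bucket argument is assumed to be a Boto3 s3.Bucket resource such as my_bucket above.

```python
def select_metadata_keys(keys, suffix=".csv.metadata"):
    """Return only the keys that end with the Athena metadata suffix."""
    return [key for key in keys if key.endswith(suffix)]


def delete_metadata_files(bucket, prefix, batch_size=1000):
    """Delete metadata objects under `prefix` and return the deleted keys.

    `bucket` is assumed to be a boto3 s3.Bucket resource (hypothetical helper).
    """
    keys = select_metadata_keys(
        obj.key for obj in bucket.objects.filter(Prefix=prefix)
    )
    # The S3 DeleteObjects operation accepts at most 1000 keys per request.
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        bucket.delete_objects(Delete={"Objects": [{"Key": key} for key in batch]})
    return keys
```

With the variables from the question, it could be called as delete_metadata_files(my_bucket, subbucketpath). Separating the key selection from the delete call also makes the filter easy to check without touching S3.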

Reference: https://docs.aws.amazon.com/athena/latest/ug/querying.html
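
One more point worth checking: start_query_execution returns as soon as the query is submitted, so the cleanup should run only after the query has finished, otherwise the metadata file may not exist yet. Below is a minimal sketch of polling with get_query_execution. It assumes client is a Boto3 Athena client; the helper name wait_for_query and its poll_seconds and timeout_seconds parameters are hypothetical and chosen only for illustration.

```python
import time


def wait_for_query(client, query_execution_id, poll_seconds=2.0, timeout_seconds=600.0):
    """Poll Athena until the query reaches a terminal state and return that state."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = client.get_query_execution(QueryExecutionId=query_execution_id)
        state = response["QueryExecution"]["Status"]["State"]
        # SUCCEEDED, FAILED and CANCELLED are the terminal states.
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Query {query_execution_id} still {state} after {timeout_seconds}s"
            )
        time.sleep(poll_seconds)
```

With the names from the question, this might be used as state = wait_for_query(client, queryStart['QueryExecutionId']), deleting the .csv.metadata files only when state is 'SUCCEEDED'.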

AWS
Answered 9 months ago
