Avoid metadata from Athena with Boto 3


I'm trying to schedule a data transformation with Athena using Python and Boto 3 (via Glue). Once the query is launched, the data should be stored in an S3 sub-bucket.

I need the sub-bucket to contain just the data, but the query also creates a metadata file, and I haven't found a way to prevent that. I'm using start_query_execution from Boto 3 to run the query:

queryStart = client.start_query_execution(
    QueryString = query,
    QueryExecutionContext = {
        'Database': database
    }, 
    ResultConfiguration = { 'OutputLocation': 's3://' + bucket + '/' + subbucketpath}
)

I tried the function below to remove the metadata file:

s3 = session.resource('s3')
my_bucket = s3.Bucket(bucket)
for item in my_bucket.objects.filter(Prefix=subbucketpath):
      if item.endswith('.csv.metadata'):
            item.delete()

but it gives an error: AttributeError: 's3.ObjectSummary' object has no attribute 'endswith'.

Is there any other way to launch the Athena query from Glue or to remove the '.csv.metadata' files?

Asked 2 years ago, 313 views
1 Answer

Hi,

Athena automatically writes a metadata file next to the results of every query started with start_query_execution. To delete the .csv.metadata files, you can use the logic below. The AttributeError in your code occurs because endswith is called on the s3.ObjectSummary object itself; call it on item.key, which holds the object's name as a string. The try statement only reports any error raised while checking or deleting an object, so the loop continues with the remaining objects.

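import boto3

session = boto3.session.Session()
s3 = session.resource('s3')
# bucket and subbucketpath are the same variables used in the question
my_bucket = s3.Bucket(bucket)
for item in my_bucket.objects.filter(Prefix=subbucketpath):
    try:
        # item is an s3.ObjectSummary; item.key is the object name as a string
        if item.key.endswith('.csv.metadata'):
            item.delete()
    except Exception as e:
        print("The following error occurred: {}".format(e))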
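
If the prefix holds many result files, the same cleanup can be sketched as a reusable function that deletes in batches instead of one request per object. This is only a sketch: the function names is_athena_metadata and delete_metadata_files are hypothetical, and the bucket argument is assumed to behave like a Boto 3 s3.Bucket resource (objects.filter and delete_objects).

```python
def is_athena_metadata(key):
    """Return True for keys that look like Athena result metadata files."""
    return key.endswith(".csv.metadata")


def delete_metadata_files(bucket, prefix, batch_size=1000):
    """Delete metadata objects under prefix and return the deleted keys.

    bucket is assumed to behave like a boto3 s3.Bucket resource.
    """
    keys = [
        obj.key
        for obj in bucket.objects.filter(Prefix=prefix)
        if is_athena_metadata(obj.key)
    ]
    # The S3 DeleteObjects API accepts at most 1000 keys per request.
    for start in range(0, len(keys), batch_size):
        chunk = keys[start:start + batch_size]
        bucket.delete_objects(
            Delete={"Objects": [{"Key": k} for k in chunk], "Quiet": True}
        )
    return keys
```

With the variables from the question, it could be called as delete_metadata_files(s3.Bucket(bucket), subbucketpath).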

Reference: https://docs.aws.amazon.com/athena/latest/ug/querying.html
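
One practical note: start_query_execution returns as soon as the query is submitted, so a cleanup that runs immediately afterwards may find nothing to delete yet. A minimal sketch of waiting for the query to finish first is shown here; wait_for_query is a hypothetical helper name, the polling interval and limit are arbitrary, and client is assumed to behave like boto3.client('athena').

```python
import time


def wait_for_query(client, query_execution_id, poll_seconds=2.0, max_polls=150):
    """Poll Athena until the query leaves the QUEUED/RUNNING states.

    client is assumed to behave like boto3.client('athena').
    Returns the final state string.
    """
    for _ in range(max_polls):
        response = client.get_query_execution(QueryExecutionId=query_execution_id)
        state = response["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return state
        time.sleep(poll_seconds)
    raise TimeoutError("Query {} did not finish in time".format(query_execution_id))
```

With the code from the question, this could be used as wait_for_query(client, queryStart['QueryExecutionId']), running the metadata cleanup only when the returned state is SUCCEEDED.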

AWS
Answered 9 months ago
