Avoid metadata from Athena with Boto 3


I'm trying to schedule a data transformation with Athena using Python and Boto 3 (via Glue). Once the query is launched, the data should be stored in an S3 sub-bucket (a prefix inside the bucket).

I need the sub-bucket to contain just the data, but the query also creates a metadata file, and I didn't find a way to prevent that. I'm using start_query_execution from Boto 3 to run the query:

queryStart = client.start_query_execution(
    QueryString=query,
    QueryExecutionContext={
        'Database': database
    },
    ResultConfiguration={'OutputLocation': 's3://' + bucket + '/' + subbucketpath}
)

I tried the function below to remove the metadata file:

s3 = session.resource('s3')
my_bucket = s3.Bucket(bucket)
for item in my_bucket.objects.filter(Prefix=subbucketpath):
    if item.endswith('.csv.metadata'):
        item.delete()

but it gives an error: AttributeError: 's3.ObjectSummary' object has no attribute 'endswith'.

Is there any other way to launch the Athena query from Glue or to remove the '.csv.metadata' files?

Asked 2 years ago, 314 views
1 Answer


Athena automatically writes a metadata file next to the results of every query started with start_query_execution. To delete the .csv.metadata files, you can use the logic below. The AttributeError occurs because item is an s3.ObjectSummary, not a string; the object name is in item.key, so call endswith on that instead. The try/except reports any error raised while processing an object and then moves on to the next one.

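import boto3

session = boto3.session.Session()
s3 = session.resource('s3')
my_bucket = s3.Bucket(bucket)  # bucket and subbucketpath as defined in the question
for item in my_bucket.objects.filter(Prefix=subbucketpath):
    try:
        # item is an s3.ObjectSummary; item.key is the object name as a string
        if item.key.endswith('.csv.metadata'):
            item.delete()
    except Exception as e:
        print("The following error occurred: {}".format(e))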
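
Because start_query_execution returns before the query has finished, a cleanup loop that runs straight away may find nothing to delete. The sketch below is one possible way around that: it polls get_query_execution until the query reaches a final state, then deletes only that query's metadata file. The function names (wait_for_query, metadata_location, delete_query_metadata) and the polling defaults are hypothetical. It also assumes the metadata file is stored at the result file's path with ".metadata" appended, which matches the .csv.metadata naming seen in the question.

```python
import time
from urllib.parse import urlparse


def wait_for_query(athena_client, query_execution_id, poll_seconds=2.0, max_polls=150):
    """Poll Athena until the query reaches a final state; return its QueryExecution dict."""
    for _ in range(max_polls):
        execution = athena_client.get_query_execution(
            QueryExecutionId=query_execution_id
        )["QueryExecution"]
        if execution["Status"]["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return execution
        time.sleep(poll_seconds)
    raise TimeoutError("Query {} did not finish in time".format(query_execution_id))


def metadata_location(result_location):
    """Turn 's3://bucket/path/<id>.csv' into (bucket, 'path/<id>.csv.metadata').

    Assumption: the metadata object is the result path plus '.metadata'.
    """
    parsed = urlparse(result_location)
    return parsed.netloc, parsed.path.lstrip("/") + ".metadata"


def delete_query_metadata(athena_client, s3_client, query_execution_id, poll_seconds=2.0):
    """Wait for one query, then delete only that query's metadata file.

    Returns the deleted key, or None if the query did not succeed.
    """
    execution = wait_for_query(athena_client, query_execution_id, poll_seconds)
    if execution["Status"]["State"] != "SUCCEEDED":
        return None
    # After the query has run, OutputLocation is the full path of the result file.
    bucket, key = metadata_location(execution["ResultConfiguration"]["OutputLocation"])
    s3_client.delete_object(Bucket=bucket, Key=key)
    return key
```

With the names from the question, this could be called as delete_query_metadata(client, session.client('s3'), queryStart['QueryExecutionId']). Deleting by exact key leaves any other objects under the same prefix untouched.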

Reference: https://docs.aws.amazon.com/athena/latest/ug/querying.html

Answered 9 months ago
