Athena UNLOAD not bringing over headers?

0

Hello. I have an Athena query utilizing UNLOAD to bring data over to my S3 buckets. The query works quite well. However, I do not get the associated header information (column names) in the transferred files. I do not see an explicit parameter that I might be able to use to ensure the header attachment to the compressed (.gz) CSV files. Any help would be appreciated. Thanks.

UNLOAD (SELECT * FROM dataplace.datatable WHERE file_date = date '2022-07-01') 
TO 's3://my/super/bucket' 
WITH (format='TEXTFILE', field_delimiter = ',')
質問済み 1年前960ビュー
2回答
2

There is no option to add header in TEXTFILE using UNLOAD option in Athena. Please check https://docs.aws.amazon.com/athena/latest/ug/unload.html

If your goal is to use csv files, every Athena query execution stores the results as a csv file in the S3 location that you have set up. You can check that under "Settings" > "Query location" The query results are available based on query execution ID and you can download these files with the first column as column names.

If you are doing this programmatically, I can provide an example using Python boto3

profile pictureAWS
回答済み 1年前
  • Awesome! Would love to see your programatic solution with Python boto3. Thanks!

1

If you use the location variable, that should have your query result location. It is named query ID.csv - so, you could also construct the file name using that logic as another option.

import boto3,time
client = boto3.client('athena')
config_dict = {'query':'','bucket':''}
## This function executes the query and returns the query execution ID
response_query_execution_id = client.start_query_execution(
    QueryString = config_dict['query'],
    QueryExecutionContext = {
        'Database' : "default"
    },
    ResultConfiguration = {
        'OutputLocation': 's3://' + config_dict['bucket'] + '/queryoutput/' + 
    }
)

response_get_query_details = client.get_query_execution(
    QueryExecutionId = response_query_execution_id['QueryExecutionId']
)
status = 'RUNNING'
iterations = 360 # 30 mins

while (iterations > 0):
    iterations = iterations - 1
    response_get_query_details = client.get_query_execution(
    QueryExecutionId = response_query_execution_id['QueryExecutionId']
    )
    status = response_get_query_details['QueryExecution']['Status']['State']
    
    if (status == 'FAILED') or (status == 'CANCELLED') :
        failure_reason = response_get_query_details['QueryExecution']['Status']['StateChangeReason']
        print(failure_reason)

    elif status == 'SUCCEEDED':
        location = response_get_query_details['QueryExecution']['ResultConfiguration']['OutputLocation']

else:
        time.sleep(10)
profile pictureAWS
回答済み 1年前

ログインしていません。 ログイン 回答を投稿する。

優れた回答とは、質問に明確に答え、建設的なフィードバックを提供し、質問者の専門分野におけるスキルの向上を促すものです。

質問に答えるためのガイドライン

関連するコンテンツ