Athena UNLOAD not bringing over headers?

Hello. I have an Athena query that uses UNLOAD to export data to my S3 bucket. The query works well. However, the exported files do not include the header row (column names). I do not see a parameter that would add the header to the compressed (.gz) CSV files. Any help would be appreciated. Thanks.

UNLOAD (SELECT * FROM dataplace.datatable WHERE file_date = date '2022-07-01') 
TO 's3://my/super/bucket' 
WITH (format='TEXTFILE', field_delimiter = ',')
Asked 1 year ago, viewed 960 times
2 answers

There is no option to add a header row to TEXTFILE output when using UNLOAD in Athena. Please check https://docs.aws.amazon.com/athena/latest/ug/unload.html
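Since UNLOAD writes headerless files, one workaround is to add the header yourself after the export. The sketch below is a hypothetical helper, not an Athena feature: it assumes you already know the column names (for example from the table definition) and have the bytes of one `.gz` file that UNLOAD produced. It relies on the fact that a gzip stream may consist of several concatenated members, so compressing the header line on its own and placing it in front of the existing data gives a valid `.gz` file whose first decompressed line is the header. Python's `gzip` module and the `gzip`/`zcat` tools read such multi-member files as one stream; check that whatever consumes the files downstream does too.

```python
import csv
import gzip
import io
from typing import Sequence


def prepend_header_to_gzip_csv(gz_data: bytes, columns: Sequence[str], delimiter: str = ",") -> bytes:
    """Return gzip bytes whose first decompressed line is a header row.

    Hypothetical helper: `columns` must be supplied by the caller, because
    UNLOAD does not write them. The header is compressed as its own gzip
    member and concatenated in front of the original data.
    """
    buf = io.StringIO()
    # csv.writer quotes column names that contain the delimiter;
    # lineterminator="\n" matches newline-terminated rows in the export
    csv.writer(buf, delimiter=delimiter, lineterminator="\n").writerow(columns)
    header_member = gzip.compress(buf.getvalue().encode("utf-8"))
    return header_member + gz_data


# In-memory data standing in for one UNLOAD output file (made-up rows and
# hypothetical column names, for illustration only)
original = gzip.compress(b"1,alice,2022-07-01\n2,bob,2022-07-01\n")
with_header = prepend_header_to_gzip_csv(original, ["id", "name", "file_date"])
print(gzip.decompress(with_header).decode("utf-8"))
```

In practice you would download each object, run it through the helper, and upload the result back to S3.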

If your goal is simply to get CSV files, note that every Athena query execution already stores its results as a CSV file in the S3 location you have configured. You can check that location under "Settings" > "Query location". The result files are named by query execution ID, and their first row contains the column names, so the files you download already include the header.
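Because the first row of that result file holds the column names, a standard CSV reader picks them up directly. A minimal sketch with Python's `csv.DictReader`, assuming the result file has already been downloaded; the inline sample is made up and mimics the double-quoted fields Athena typically writes in result files. The function name `read_athena_result_csv` is hypothetical.

```python
import csv
import io


def read_athena_result_csv(f):
    """Return (column_names, rows) from an open text file of query results.

    The first row is treated as the header, so each row becomes a dict
    keyed by column name. All values are returned as strings.
    """
    reader = csv.DictReader(f)
    rows = list(reader)
    return reader.fieldnames, rows


# Made-up sample standing in for a downloaded <QueryExecutionId>.csv file
sample = '"id","name","file_date"\n"1","alice","2022-07-01"\n"2","bob","2022-07-01"\n'
columns, rows = read_athena_result_csv(io.StringIO(sample, newline=""))
print(columns)
print(rows[0]["name"])
```

For a real file, pass `open(path, newline="", encoding="utf-8")` instead of the `StringIO` object.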

If you are doing this programmatically, I can provide an example using Python boto3.

AWS
Answered 1 year ago
  • Awesome! Would love to see your programmatic solution with Python boto3. Thanks!


After the query succeeds, the location variable holds the S3 URI of the query result file. The file is named <QueryExecutionId>.csv, so as another option you could also construct the file name from the query execution ID yourself.

import time

import boto3

client = boto3.client('athena')

# Placeholders: fill in your query text and the bucket for the results
config_dict = {'query': '', 'bucket': ''}

# Start the query; the response contains the query execution ID
response_query_execution_id = client.start_query_execution(
    QueryString=config_dict['query'],
    QueryExecutionContext={
        'Database': 'default'
    },
    ResultConfiguration={
        'OutputLocation': 's3://' + config_dict['bucket'] + '/queryoutput/'
    }
)
query_execution_id = response_query_execution_id['QueryExecutionId']

location = None
iterations = 180  # 180 polls x 10 seconds = 30 minutes

# Poll until the query reaches a final state or the time budget runs out
while iterations > 0:
    iterations = iterations - 1
    response_get_query_details = client.get_query_execution(
        QueryExecutionId=query_execution_id
    )
    status = response_get_query_details['QueryExecution']['Status']['State']

    if status in ('FAILED', 'CANCELLED'):
        # StateChangeReason may be missing, so use .get()
        failure_reason = response_get_query_details['QueryExecution']['Status'].get('StateChangeReason')
        print(failure_reason)
        break
    elif status == 'SUCCEEDED':
        # S3 URI of the result file: .../queryoutput/<QueryExecutionId>.csv
        location = response_get_query_details['QueryExecution']['ResultConfiguration']['OutputLocation']
        break
    else:
        # QUEUED or RUNNING: wait before polling again
        time.sleep(10)

print(location)
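To download the result file you need its bucket and key separately. The sketch below shows two small helpers, both hypothetical names: `result_csv_uri` builds the expected URI from the output prefix and the query execution ID (the naming logic described above), and `split_s3_uri` splits an `s3://bucket/key` URI, such as the `location` value, into its parts. The bucket name and query ID are made up for illustration.

```python
from typing import Tuple


def result_csv_uri(output_prefix: str, query_execution_id: str) -> str:
    """Build the S3 URI of a query's result file: <prefix>/<QueryExecutionId>.csv."""
    return output_prefix.rstrip("/") + "/" + query_execution_id + ".csv"


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError("not an S3 URI: " + uri)
    # Everything before the first '/' is the bucket; the rest is the key
    bucket, _, key = uri[len(prefix):].partition("/")
    return bucket, key


# Hypothetical bucket and query execution ID, for illustration only
uri = result_csv_uri("s3://my-bucket/queryoutput/", "1234abcd-0000-0000-0000-000000000000")
bucket, key = split_s3_uri(uri)
print(bucket, key)
```

With the bucket and key in hand, the file can be fetched with the S3 client, for example `boto3.client('s3').download_file(bucket, key, 'result.csv')`.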
AWS
Answered 1 year ago
