Athena UNLOAD not bringing over headers?


Hello. I have an Athena query that uses UNLOAD to write data to my S3 bucket. The query works well, but the transferred files do not include the header row (column names). I do not see an explicit parameter that would attach headers to the compressed (.gz) CSV files. Any help would be appreciated. Thanks.

UNLOAD (SELECT * FROM dataplace.datatable WHERE file_date = date '2022-07-01') 
TO 's3://my/super/bucket' 
WITH (format='TEXTFILE', field_delimiter = ',')
asked a year ago · 940 views
2 Answers

There is no option to add a header row to TEXTFILE output when using UNLOAD in Athena. See https://docs.aws.amazon.com/athena/latest/ug/unload.html
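If you need the header inside the UNLOAD output itself, one commonly suggested workaround (not an official feature) is to emit the column names as a data row, casting every column to varchar so the types line up. Be aware that UNLOAD can write several files in parallel, so the header row is only guaranteed to come first when the result is written as a single file; also note that `format = 'PARQUET'` preserves column names in the file metadata if Parquet is an option for you. Column names below are made-up examples:

```sql
UNLOAD (
    SELECT col_a, col_b
    FROM (
        -- Header row first, then the data, all cast to varchar
        SELECT 0 AS sort_key, 'col_a' AS col_a, 'col_b' AS col_b
        UNION ALL
        SELECT 1, CAST(col_a AS varchar), CAST(col_b AS varchar)
        FROM dataplace.datatable
        WHERE file_date = date '2022-07-01'
    )
    ORDER BY sort_key
)
TO 's3://my/super/bucket'
WITH (format = 'TEXTFILE', field_delimiter = ',')
```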

If your goal is to get CSV files, note that every Athena query execution already stores its results as a CSV file in the S3 location configured under "Settings" > "Query result location". The result files are named by query execution ID, and when you download them the first row contains the column names.
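As a sketch of that approach: the result object's key can be derived from the execution ID. The helper and the bucket/prefix below are made-up examples, not part of boto3:

```python
def result_csv_location(result_prefix: str, query_execution_id: str) -> str:
    """Athena stores each result set as <result prefix>/<QueryExecutionId>.csv."""
    return f"{result_prefix.rstrip('/')}/{query_execution_id}.csv"

# Example with a hypothetical bucket and execution ID:
location = result_csv_location(
    "s3://my-results-bucket/athena-output",
    "abcd1234-5678-90ab-cdef-1234567890ab",
)
# location == "s3://my-results-bucket/athena-output/abcd1234-5678-90ab-cdef-1234567890ab.csv"
```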

If you are doing this programmatically, I can provide an example using Python and boto3.

AWS
answered a year ago
  • Awesome! Would love to see your programmatic solution with Python boto3. Thanks!


The location variable in the code below holds the query result location. The result file is named <query execution ID>.csv, so as another option you could construct the file name yourself using that logic.

import boto3
import time

client = boto3.client('athena')
config_dict = {'query': '', 'bucket': ''}  # fill in your query and bucket name

## Start the query; the response contains the query execution ID
response_query_execution_id = client.start_query_execution(
    QueryString=config_dict['query'],
    QueryExecutionContext={
        'Database': 'default'
    },
    ResultConfiguration={
        'OutputLocation': 's3://' + config_dict['bucket'] + '/queryoutput/'
    }
)

## Poll every 10 seconds until the query reaches a terminal state
status = 'RUNNING'
location = None
iterations = 360  # 360 polls x 10 s = 60 minutes

while iterations > 0:
    iterations -= 1
    response_get_query_details = client.get_query_execution(
        QueryExecutionId=response_query_execution_id['QueryExecutionId']
    )
    status = response_get_query_details['QueryExecution']['Status']['State']

    if status in ('FAILED', 'CANCELLED'):
        failure_reason = response_get_query_details['QueryExecution']['Status'].get('StateChangeReason', '')
        print(failure_reason)
        break
    elif status == 'SUCCEEDED':
        location = response_get_query_details['QueryExecution']['ResultConfiguration']['OutputLocation']
        break
    else:
        time.sleep(10)
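Once the loop above sets location, it can be used to fetch the CSV (which includes the header row). A minimal sketch; the split_s3_uri helper and the bucket/key below are made-up examples, not part of boto3:

```python
from urllib.parse import urlparse

def split_s3_uri(uri: str):
    """Split an s3:// URI, such as the query's OutputLocation, into (bucket, key)."""
    parsed = urlparse(uri)
    return parsed.netloc, parsed.path.lstrip("/")

# Example with a hypothetical result location:
bucket, key = split_s3_uri("s3://my-bucket/queryoutput/abcd1234-5678.csv")

# Download with the S3 client (requires AWS credentials; not run here):
# import boto3
# boto3.client("s3").download_file(bucket, key, "results.csv")
```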
AWS
answered a year ago

