Athena UNLOAD not bringing over headers?

0

Hello. I have an Athena query utilizing UNLOAD to bring data over to my S3 buckets. The query works quite well. However, I do not get the associated header information (column names) in the transferred files. I do not see an explicit parameter that I might be able to use to ensure the header attachment to the compressed (.gz) CSV files. Any help would be appreciated. Thanks.

UNLOAD (SELECT * FROM dataplace.datatable WHERE file_date = date '2022-07-01') 
TO 's3://my/super/bucket' 
WITH (format='TEXTFILE', field_delimiter = ',')
已提问 1 年前960 查看次数
2 回答
2

There is no option to add header in TEXTFILE using UNLOAD option in Athena. Please check https://docs.aws.amazon.com/athena/latest/ug/unload.html

If your goal is to use csv files, every Athena query execution stores the results as a csv file in the S3 location that you have set up. You can check that under "Settings" > "Query location" The query results are available based on query execution ID and you can download these files with the first column as column names.

If you are doing this programmatically, I can provide an example using Python boto3

profile pictureAWS
已回答 1 年前
  • Awesome! Would love to see your programatic solution with Python boto3. Thanks!

1

If you use the location variable, that should have your query result location. It is named query ID.csv - so, you could also construct the file name using that logic as another option.

import boto3,time
client = boto3.client('athena')
config_dict = {'query':'','bucket':''}
## This function executes the query and returns the query execution ID
response_query_execution_id = client.start_query_execution(
    QueryString = config_dict['query'],
    QueryExecutionContext = {
        'Database' : "default"
    },
    ResultConfiguration = {
        'OutputLocation': 's3://' + config_dict['bucket'] + '/queryoutput/' + 
    }
)

response_get_query_details = client.get_query_execution(
    QueryExecutionId = response_query_execution_id['QueryExecutionId']
)
status = 'RUNNING'
iterations = 360 # 30 mins

while (iterations > 0):
    iterations = iterations - 1
    response_get_query_details = client.get_query_execution(
    QueryExecutionId = response_query_execution_id['QueryExecutionId']
    )
    status = response_get_query_details['QueryExecution']['Status']['State']
    
    if (status == 'FAILED') or (status == 'CANCELLED') :
        failure_reason = response_get_query_details['QueryExecution']['Status']['StateChangeReason']
        print(failure_reason)

    elif status == 'SUCCEEDED':
        location = response_get_query_details['QueryExecution']['ResultConfiguration']['OutputLocation']

else:
        time.sleep(10)
profile pictureAWS
已回答 1 年前

您未登录。 登录 发布回答。

一个好的回答可以清楚地解答问题和提供建设性反馈,并能促进提问者的职业发展。

回答问题的准则