Athena UNLOAD not bringing over headers?

0

Hello. I have an Athena query utilizing UNLOAD to bring data over to my S3 buckets. The query works quite well. However, I do not get the associated header information (column names) in the transferred files. I do not see an explicit parameter that I might be able to use to ensure the header attachment to the compressed (.gz) CSV files. Any help would be appreciated. Thanks.

UNLOAD (SELECT * FROM dataplace.datatable WHERE file_date = date '2022-07-01') 
TO 's3://my/super/bucket' 
WITH (format='TEXTFILE', field_delimiter = ',')
질문됨 일 년 전961회 조회
2개 답변
2

There is no option to add header in TEXTFILE using UNLOAD option in Athena. Please check https://docs.aws.amazon.com/athena/latest/ug/unload.html

If your goal is to use csv files, every Athena query execution stores the results as a csv file in the S3 location that you have set up. You can check that under "Settings" > "Query location" The query results are available based on query execution ID and you can download these files with the first column as column names.

If you are doing this programmatically, I can provide an example using Python boto3

profile pictureAWS
답변함 일 년 전
  • Awesome! Would love to see your programatic solution with Python boto3. Thanks!

1

If you use the location variable, that should have your query result location. It is named query ID.csv - so, you could also construct the file name using that logic as another option.

import boto3,time
client = boto3.client('athena')
config_dict = {'query':'','bucket':''}
## This function executes the query and returns the query execution ID
response_query_execution_id = client.start_query_execution(
    QueryString = config_dict['query'],
    QueryExecutionContext = {
        'Database' : "default"
    },
    ResultConfiguration = {
        'OutputLocation': 's3://' + config_dict['bucket'] + '/queryoutput/' + 
    }
)

response_get_query_details = client.get_query_execution(
    QueryExecutionId = response_query_execution_id['QueryExecutionId']
)
status = 'RUNNING'
iterations = 360 # 30 mins

while (iterations > 0):
    iterations = iterations - 1
    response_get_query_details = client.get_query_execution(
    QueryExecutionId = response_query_execution_id['QueryExecutionId']
    )
    status = response_get_query_details['QueryExecution']['Status']['State']
    
    if (status == 'FAILED') or (status == 'CANCELLED') :
        failure_reason = response_get_query_details['QueryExecution']['Status']['StateChangeReason']
        print(failure_reason)

    elif status == 'SUCCEEDED':
        location = response_get_query_details['QueryExecution']['ResultConfiguration']['OutputLocation']

else:
        time.sleep(10)
profile pictureAWS
답변함 일 년 전

로그인하지 않았습니다. 로그인해야 답변을 게시할 수 있습니다.

좋은 답변은 질문에 명확하게 답하고 건설적인 피드백을 제공하며 질문자의 전문적인 성장을 장려합니다.

질문 답변하기에 대한 가이드라인

관련 콘텐츠