How can I schedule an Amazon Athena query?

3 minuto de leitura
1

I want to schedule queries in Amazon Athena.

Short description

Scheduling queries is useful in many scenarios, such as running periodic reporting queries or loading new partitions on a regular interval. Here are some of the ways that you can schedule queries in Athena:

Resolution

Follow these steps to schedule an Athena query using a Lambda function and an EventBridge rule:

1.    Create an AWS Identity and Access Management (IAM) service role for Lambda. Then, attach a policy that allows access to Athena, Amazon Simple Storage Service (Amazon S3), and Amazon CloudWatch Logs. For example, you can add AmazonAthenaFullAccess and CloudWatchLogsFullAccess to the role. AmazonAthenaFullAccess allows full access to Athena and includes basic permissions for Amazon S3. CloudWatchLogsFullAccess allows full access to CloudWatch Logs.

2.    Open the Lambda console.

3.    Choose Create function.

4.    Be sure that Author from scratch is selected, and then configure the following options:

For Name, enter a name for your function.
For Runtime, choose one of the Python options.
For Role, choose Use an existing role, and then choose the IAM role that you created in step 1.

5.    Choose Create function.

6.    Paste your code in the Function code section. The following example uses Python 3.7. Replace the following values in the example:

default: the Athena database name
SELECT * FROM default.tb: the query that you want to schedule
s3://AWSDOC-EXAMPLE-BUCKET/: the S3 bucket for the query output

import boto3

# Query string to execute
query = 'SELECT * FROM database.tb'

# Database to execute the query against
DATABASE = 'database'

# Output location for query results
output='s3://OUTPUTBUCKET/'

def lambda_handler(event, context):
    # Initiate the Boto3 Client
    client = boto3.client('athena')

    # Start the query execution
    response = client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={
            'Database': DATABASE
        },
        ResultConfiguration={
            'OutputLocation': output
        }
    )

    # Return response after starting the query execution
    return response

7.    Choose Deploy.

8.    Open the Amazon EventBridge console.

9.    In the navigation pane, choose Rules, and then choose Create rule.

10.    Enter a name and description for the rule.

11.    For Define pattern, select Schedule.

12.    Select Cron expression, and then enter a cron expression.

13.    For Select event bus, select AWS default event bus.

14.    In the Select Targets section, do the following:

For Target, select Lambda function from the dropdown list. For Function, select the name of your Lambda function from the dropdown list.

15.    Choose Create.

If you're scheduling multiple queries, note that there are quotas for the number of calls to the Athena API per account. For more information, see Per account API call quotas.


Related information

Tutorial: Schedule AWS Lambda functions using EventBridge

Creating an Amazon EventBridge rule that runs on a schedule

AWS OFICIAL
AWS OFICIALAtualizada há um ano