How can I schedule an Amazon Athena query?
I want to schedule queries in Amazon Athena.
Short description
Scheduling queries is useful in many scenarios, such as running periodic reporting queries or loading new partitions on a regular interval. Here are some of the ways that you can schedule queries in Athena:
- Create an AWS Lambda function, using the SDK of your choice, to schedule the query. For more information about the programming languages that Lambda supports, see AWS Lambda FAQs. Then, create an Amazon EventBridge rule to schedule the Lambda function. This is the method explained in the Resolution.
- If you're using Athena in an ETL pipeline, use AWS Step Functions to create the pipeline and schedule the query.
- On a Linux machine, use crontab to schedule the query.
- Use an AWS Glue Python shell job to run the Athena query using the Athena boto3 API. Then, define a schedule for the AWS Glue job.
Resolution
Follow these steps to schedule an Athena query using a Lambda function and an EventBridge rule:
1. Create an AWS Identity and Access Management (IAM) service role for Lambda. Then, attach a policy that allows access to Athena, Amazon Simple Storage Service (Amazon S3), and Amazon CloudWatch Logs. For example, you can add AmazonAthenaFullAccess and CloudWatchLogsFullAccess to the role. AmazonAthenaFullAccess allows full access to Athena and includes basic permissions for Amazon S3. CloudWatchLogsFullAccess allows full access to CloudWatch Logs.
2. Open the Lambda console.
3. Choose Create function.
4. Be sure that Author from scratch is selected, and then configure the following options:
For Name, enter a name for your function.
For Runtime, choose one of the Python options.
For Role, choose Use an existing role, and then choose the IAM role that you created in step 1.
5. Choose Create function.
6. Paste your code in the Function code section. The following example uses Python 3.7. Replace the following values in the example:
default: the Athena database name
SELECT * FROM default.tb: the query that you want to schedule
s3://AWSDOC-EXAMPLE-BUCKET/: the S3 bucket for the query output
import boto3 # Query string to execute query = 'SELECT * FROM database.tb' # Database to execute the query against DATABASE = 'database' # Output location for query results output='s3://OUTPUTBUCKET/' def lambda_handler(event, context): # Initiate the Boto3 Client client = boto3.client('athena') # Start the query execution response = client.start_query_execution( QueryString=query, QueryExecutionContext={ 'Database': DATABASE }, ResultConfiguration={ 'OutputLocation': output } ) # Return response after starting the query execution return response
7. Choose Deploy.
8. Open the Amazon EventBridge console.
9. In the navigation pane, choose Rules, and then choose Create rule.
10. Enter a name and description for the rule.
11. For Define pattern, select Schedule.
12. Select Cron expression, and then enter a cron expression.
13. For Select event bus, select AWS default event bus.
14. In the Select Targets section, do the following:
For Target, select Lambda function from the dropdown list. For Function, select the name of your Lambda function from the dropdown list.
15. Choose Create.
If you're scheduling multiple queries, note that there are quotas for the number of calls to the Athena API per account. For more information, see Per account API call quotas.
Related information

Relevanter Inhalt
- AWS OFFICIALAktualisiert vor 8 Monaten
- AWS OFFICIALAktualisiert vor 7 Monaten
- AWS OFFICIALAktualisiert vor 3 Monaten
- AWS OFFICIALAktualisiert vor 8 Monaten