How do I create an AWS Lambda function that runs queries in Amazon Redshift?

3 minute read
0

I want to create an AWS Lambda function that runs queries on Amazon Redshift.

Resolution

Prerequisites

Before you create a Lambda function, you must set up the following Amazon Virtual Private Cloud (Amazon VPC) endpoints:

1.    Create a VPC with a private subnet.

2.    Create a subnet group. Add the VPC and subnet that you just created.

3.    Create a private Amazon Redshift cluster. Select the VPC and subnet group that you just created.

4.    Create a new secret for Amazon Redshift with AWS Secrets Manager. Name your secret redshift.

Create your Lambda function

To create a Lambda function that queries your Amazon Redshift cluster, follow these steps:

1.    Open the Lambda console.

2.    Choose Create function.

3.    Choose the Author from Scratch option.

4.    Update the following fields:
Function name: Enter a custom name.
Runtime: Enter your code environment. (The examples in this resolution are compatible with Python 3.9.)
Architecture: Enter your system architecture. (The examples in this resolution are compatible with x86_64.)
Permissions: Choose Create a new role with basic Lambda permissions.

5.    Choose Create function.

Set the correct permissions for your Lambda function

1.    In the Lambda console, choose Configuration.

2.    Choose Permissions.

3.    Choose the role created for your Lambda function.

4.    Choose Add Permissions.

5.    Choose Attach policies.

6.    Add the AmazonRedshiftDataFullAccess and SecretsManagerReadWrite policies to your Lambda execution role.

Note: It's a best practice to grant least privilege for only the permissions required to perform a task. For more information, see Apply least-privilege permissions.

Add Python code to your Lambda function

1.    In the Lambda console, choose Code.

2.    Paste the following code into the Code box:

Important:

  • Replace "dev" with the name of your database.
  • Add an environment variable in the Lambda function's configuration section for the key SecretId and secret_name.
import os
import json
import boto3
import botocore 
import botocore.session as bc
from botocore.client import Config
 
print('Loading function')

secret_name=os.environ['SecretId'] # getting SecretId from Environment varibales
session = boto3.session.Session()
region = session.region_name

# Initializing Secret Manager's client    
client = session.client(
    service_name='secretsmanager',
        region_name=region
    )

get_secret_value_response = client.get_secret_value(
        SecretId=secret_name
    )
secret_arn=get_secret_value_response['ARN']

secret = get_secret_value_response['SecretString']

secret_json = json.loads(secret)

cluster_id=secret_json['dbClusterIdentifier']

# Initializing Botocore client
bc_session = bc.get_session()

session = boto3.Session(
        botocore_session=bc_session,
        region_name=region
    )

# Initializing Redshift's client   
config = Config(connect_timeout=5, read_timeout=5)
client_redshift = session.client("redshift-data", config = config)

def lambda_handler(event, context):
    print("Entered lambda_handler")

    query_str = "create table public.lambda_func (id int);"
    try:
        result = client_redshift.execute_statement(Database= 'dev', SecretArn= secret_arn, Sql= query_str, ClusterIdentifier= cluster_id)
        print("API successfully executed")
        
    except botocore.exceptions.ConnectionError as e:
        client_redshift_1 = session.client("redshift-data", config = config)
        result = client_redshift_1.execute_statement(Database= 'dev', SecretArn= secret_arn, Sql= query_str, ClusterIdentifier= cluster_id)
        print("API executed after reestablishing the connection")
        return str(result)
        
    except Exception as e:
        raise Exception(e)
        
    return str(result)

In this example, Lambda connects to the Amazon Redshift database and creates a lambda_func table in the public schema.

AWS OFFICIAL
AWS OFFICIALUpdated a year ago
6 Comments

This code is missing the important part of making describe_statement request to actually see the results of your statement execution.

replied a year ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

profile pictureAWS
MODERATOR
replied a year ago

I am able to run the lambda against a serverless redshift cluster. The execute statement command works, but I am not able to see the returned result. result = client_redshift.execute_statement(Database= 'dev', SecretArn= secret_arn, Sql= query_str, ClusterIdentifier= cluster_id)

I am running Boto3 version 1.24.65

Logging the results end up blank. With this statement "logger.info('response', response)" I see this in CloudWatch:

[INFO] 2023-05-10T03:51:56.094Z a01a4549-5dc6-4ead-8060-c004b331f6ed response

The response object is never logged. And when trying to convert "result" to string with str(result) gives the following error:

[ERROR] TypeError: not all arguments converted during string formatting

But when I check the cluster via the AWS console, I see the new table was created. What needs to be done to be able to see the result of the query execution?

replied 10 months ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

profile pictureAWS
MODERATOR
replied 10 months ago

As mentioned you'd need something to get the result of the execute command. I have a snippet below that I did, but you also have to import the time library as the time it takes the command to run can vary:

    statement_id = result['Id']

    describe_statement = client_redshift.describe_statement (
        Id = statement_id
    )
    while True:
        describe_statement = client_redshift.describe_statement (Id = statement_id)
        if (describe_statement["Status"] == "FINISHED"):
            print("Query Status - " + describe_statement["Status"])
            break;
        else:
            print("Query Status - " + describe_statement["Status"])
            time.sleep(2)
    statement_result = client_redshift.get_statement_result(
        Id = statement_id
    )

You can then take statement_result and pull out information you may need. For instance if you were running a SELECT query, you could see that by calling/printing statement_result["Records"]

replied 9 months ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

profile pictureAWS
MODERATOR
replied 9 months ago