
Set Up SageMaker Notebooks to Access FinSpace with Managed kdb Insights

Content level: Advanced

This guide will help you configure Amazon SageMaker managed notebooks for use as q clients to Amazon FinSpace with Managed kdb Insights clusters.

This guide assumes you are familiar with AWS tools such as the console and CLI, and that you have sufficient permissions to perform the tasks in the guide. It also assumes you have already configured a FinSpace environment by running the FinSpace Foundations Workshop; the SageMaker notebook you create here will connect to the cluster created in that workshop.

Stage KX License File onto S3

First we need to put the KX license file (kc.lic) on S3 so that the lifecycle configuration script can copy it onto the notebook instance we will create. Note the full S3 path of the license file; you will need it when you create the lifecycle script.

# Copy local license file to S3
aws s3 cp kc.lic s3://<PATH TO MY LICENSE FILE>/

Create a SageMaker Lifecycle Configuration

A lifecycle configuration is needed to set up the notebook instance by installing PyKX and the KX license. We will now create the lifecycle configuration for notebook instances.

In the AWS Console, on the Amazon SageMaker page, under “Admin configurations” select “Lifecycle configurations”, then under the “Notebook Instance” tab select “Create configuration”.


In the “Create lifecycle configuration” wizard, give the script a name (such as “install-pykx”) and, for the script body, copy/paste the code below into the textbox under the “Start notebook” tab. Be sure to edit the script with the actual location of the kc.lic file in your S3 bucket.


Be sure to edit this line in the script to reflect the location of your license file (kc.lic).

aws s3 cp s3://<PATH TO MY LICENSE FILE>/kc.lic /home/ec2-user

Lifecycle Script

#!/bin/bash

set -e

# OVERVIEW
# This script installs packages in all SageMaker conda environments, apart from the JupyterSystemEnv
# which is a system environment reserved for Jupyter.

# NOTE: if the total runtime of this script exceeds 5 minutes, the Notebook Instance will fail to start up.  If you
# would like to run this script in the background, then replace "sudo" with "nohup sudo -b".  This will allow the
# Notebook Instance to start up while the installation happens in the background.

# set environment variables
# ------------------------------------------------
touch /etc/profile.d/jupyter-env.sh
 
# Set SSL_VERIFY_SERVER environment variable
echo "export SSL_VERIFY_SERVER=0" >> /etc/profile.d/jupyter-env.sh

# Set the QLIC environment variable, the location of the KDB License file
echo "export QLIC=\"/home/ec2-user\"" >> /etc/profile.d/jupyter-env.sh

# ------------------------------------------------

# TCP KeepAlive settings
sudo sysctl -w net.ipv4.tcp_keepalive_time=300
sudo sysctl -w net.ipv4.tcp_keepalive_intvl=60
sudo sysctl -w net.ipv4.tcp_keepalive_probes=25

sudo -u ec2-user -i <<'EOF'

# copy the kc.lic file from S3
aws s3 cp s3://<PATH TO MY LICENSE FILE>/kc.lic /home/ec2-user

# Set SSL_VERIFY_SERVER environment variable
echo "export SSL_VERIFY_SERVER=0" >> /home/ec2-user/.bashrc

# Set the QLIC environment variable, the location of the KDB License file
echo "export QLIC=\"/home/ec2-user\"" >> /home/ec2-user/.bashrc

# Note that "base" is special environment name, include it there as well.
for env in base /home/ec2-user/anaconda3/envs/*; do
    conda activate $(basename "$env")

    if [ $env = 'JupyterSystemEnv' ]; then
        continue
    fi

    pip install pykx pandas numexpr awswrangler

    conda deactivate
done

EOF

# the restart command depends on the Amazon Linux / JupyterLab version in use
CURR_VERSION=$(cat /etc/os-release)
if [[ $CURR_VERSION == *"http://aws.amazon.com/amazon-linux-ami/"* ]]; then
    sudo initctl restart jupyter-server --no-wait
else
    sudo systemctl --no-block restart jupyter-server.service
fi

Confirm Lifecycle Configuration Creation

With the lifecycle configuration created, you should now see its name “install-pykx” in the list of Notebook instance Lifecycle configurations.
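
If you prefer the command line, you can also confirm the configuration exists (a quick sketch; assumes your CLI credentials can call SageMaker):

# List notebook instance lifecycle configuration names; "install-pykx" should appear
aws sagemaker list-notebook-instance-lifecycle-configs \
    --query "NotebookInstanceLifecycleConfigs[].NotebookInstanceLifecycleConfigName"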


Create a SageMaker Notebook Instance

In the AWS Console from the Amazon SageMaker page, under “Applications and IDEs” select “Notebooks”, then select “Create notebook instance” to start the notebook creation wizard.


Give the notebook a name (such as “pykx-notebook”), select an instance type, and select the most recent platform (Amazon Linux 2 with JupyterLab 4).

Select the created lifecycle configuration:

  • Open the “Additional configuration” section
    • Select the lifecycle configuration you created above (install-pykx)

A role will need to be created, and we will ensure that role has both SageMaker permissions and permissions for FinSpace with Managed kdb Insights.

Under the “Permissions and encryption” section, in the IAM role dropdown (“Choose an IAM role”), select “Create new role”.


In the “Create an IAM role” dialog, select the access you want to grant (we chose any S3 bucket) and create the role.


This will create a new role (ours was named AmazonSageMaker-ExecutionRole-20241121T164422).

We need to further modify the role for FinSpace, so select the role name link; this will open a new tab in your browser where you can edit the role.


Modify the IAM role to add the kdb policy related to your environment (from the workshop this was called kdb-all) and modify the trust relationship to include FinSpace as well; a sample policy and trust relationship are included at the end of this guide.

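
If you manage IAM from the CLI, the same changes can be sketched as follows (this assumes kdb-all exists as a customer managed policy in your account, and that trust.json holds a trust policy like the sample at the end of this guide):

# Attach the kdb policy to the notebook execution role (role name from our example)
aws iam attach-role-policy \
    --role-name AmazonSageMaker-ExecutionRole-20241121T164422 \
    --policy-arn arn:aws:iam::YOUR_ACCOUNT_ID:policy/kdb-all

# Replace the role's trust relationship with one that also trusts FinSpace
aws iam update-assume-role-policy \
    --role-name AmazonSageMaker-ExecutionRole-20241121T164422 \
    --policy-document file://trust.json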

Now return to SageMaker to continue creation of the notebook instance.

To ensure connectivity to FinSpace clusters, configure the instance’s network section to match the FinSpace clusters you will be accessing. You can get the cluster’s network details from the console or by calling the GetKxCluster API. Confirm that the VPC, subnet, and security group match those of the running cluster.
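
For example, from the CLI (a sketch; cluster_welcomedb is the cluster from the workshop and <ENV_ID> is your environment ID):

# Show the cluster's network configuration (VPC, subnets, security groups)
aws finspace get-kx-cluster \
    --environment-id <ENV_ID> \
    --cluster-name cluster_welcomedb \
    --query "vpcConfiguration"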


With everything entered, select “Create notebook instance” at the bottom of the page.


It will take a few minutes to create the instance; you can monitor the creation from the Notebook instances page.


Once created, you can open the instance in Jupyter or JupyterLab.


Create FinSpace User for SageMaker Notebooks

To connect a q client to the running cluster, you will need the connection string (a signed URL) for the cluster, which can be generated using the FinSpace API GetKxConnectionString. To call the service API successfully, you will need to add a user in FinSpace and associate that user with the role used by the SageMaker notebook.

Earlier you created an IAM role for running SageMaker notebooks (ours was AmazonSageMaker-ExecutionRole-20241121T164422). We will use that role when creating the FinSpace user.

In the console, go to your FinSpace kdb environment page, select the “Users” tab, and select “Add User”.


On the Add user page, enter a username (we entered sagemaker), select the IAM role used by your SageMaker notebooks (ours was AmazonSageMaker-ExecutionRole-20241121T164422), and select “Add user” to create the user.


You have now created a FinSpace user (sagemaker) that is related to the role used by your notebooks.
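
You can confirm the mapping from the CLI as well (a sketch, using the sagemaker user and the <ENV_ID> placeholder from this guide):

# Returns the user's ARN and the IAM role it is associated with
aws finspace get-kx-user \
    --environment-id <ENV_ID> \
    --user-name sagemaker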

Confirm SageMaker Notebook Instance Setup

PyKX and KX License

Launch Jupyter (in this case we selected JupyterLab) from the SageMaker Notebook instances list.


Create a new notebook (menu: File → New → Notebook) and select a kernel such as “conda_pytorch_p310” (the lifecycle script installed PyKX into all conda environments on the notebook).

In a notebook cell, enter and run this code:

import pykx as kx

# Validate the correct installation of your license
kx.q.til(10)

This should produce this output:

pykx.LongVector(pykx.q('0 1 2 3 4 5 6 7 8 9'))

Congratulations, PyKX is configured properly!

Permissions

Next we will confirm permissions. Copy/paste the function definitions below into a new cell:

def get_kx_connection_string(client, clusterName:str, userName:str, environmentId:str):

    resp=client.get_kx_user(environmentId=environmentId, userName=userName)

    userArn = resp.get("userArn")

    resp=client.get_kx_connection_string(environmentId=environmentId, userArn=userArn, clusterName=clusterName)

    return resp.get("signedConnectionString", None)


def parse_connection_string(conn_str: str):
    conn_parts = conn_str.split(":")

    host=conn_parts[2].strip("/")
    port = int(conn_parts[3])
    username=conn_parts[4]
    password=conn_parts[5]

    return host, port, username, password


def get_pykx_connection(client, clusterName: str, userName: str, environmentId: str):
    conn_str = get_kx_connection_string(client, environmentId=environmentId, clusterName=clusterName, userName=userName)

    host, port, username, password = parse_connection_string(conn_str)

    return kx.SyncQConnection(host=host, port=port, username=username, password=password)

Next, using the AWS boto3 Python library, create a client to the FinSpace service:

import boto3

# create finspace client
session = boto3.Session()
client = session.client(service_name='finspace')

Assign your environment ID to the Python variable ENV_ID. You can find your environment ID on the FinSpace console page.

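Alternatively, you can look up the environment ID with the CLI (a sketch; assumes your credentials can call FinSpace):

# List kdb environments with their IDs and names
aws finspace list-kx-environments \
    --query "environments[].{id:environmentId,name:name}"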

With ENV_ID defined, you can connect to the cluster with PyKX. Below we connect to the cluster created in the FinSpace Foundations Workshop (cluster_welcomedb) using the FinSpace user (sagemaker) we created earlier.

ENV_ID="**YOUR ENVIRONMENT ID HERE**"

# Connect to cluster with PyKX
hdb = get_pykx_connection(client=client, clusterName="cluster_welcomedb", userName="sagemaker", environmentId=ENV_ID )

# list tables on the cluster
hdb("tables[]").py()

This produces the output below:

['example']

Congratulations, Permissions are configured properly!

q Magic

PyKX installs a Jupyter q magic command, which you can try out as well. First get the connection string, and this time parse it into host, port, username, and password with the functions we provided.

conn_str=get_kx_connection_string(client=client, clusterName="cluster_welcomedb", userName="sagemaker", environmentId=ENV_ID)

host, port, username, password = parse_connection_string(conn_str)

Then in another cell, use the q magic to pass the cell’s contents to the FinSpace cluster and display the response from the server, like this:

%%q --host $host --port $port --user $username --pass $password
tables[]

This outputs the same information for tables[] as before, but this time as text from the server; it is the same response you would see in the q console.

,`example

Troubleshooting

PyKX license not found when importing pykx

The kc.lic file is copied to the notebook instance by the lifecycle script. Did you put the S3 path for your copy of the license file into the script?
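
To check, open a terminal on the notebook instance (in Jupyter: File → New → Terminal) and confirm the license landed where QLIC points (paths assume the lifecycle script above was used unmodified):

# QLIC should point at the directory holding kc.lic
echo $QLIC
ls -l /home/ec2-user/kc.lic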

“Host Not Reachable” when opening connection from notebook to cluster

This is a networking issue, likely related to the VPC, subnet, and security group used when creating the notebook and the cluster. Confirm that the notebook instance can reach the network location where the cluster is running.
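
One way to compare the two sides (a sketch; substitute your notebook name, environment ID, and cluster name):

# Network settings of the notebook instance
aws sagemaker describe-notebook-instance \
    --notebook-instance-name pykx-notebook \
    --query "{subnet:SubnetId,securityGroups:SecurityGroups}"

# Network settings of the FinSpace cluster
aws finspace get-kx-cluster \
    --environment-id <ENV_ID> \
    --cluster-name cluster_welcomedb \
    --query "vpcConfiguration"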

ValidationException: An error occurred (ValidationException) when calling the GetKxUser operation: 1 validation error detected: Value at 'environmentId' failed to satisfy constraint: Member must satisfy regular expression pattern: ^[a-zA-Z0-9]{1,26}$

Did you put your FinSpace environment ID in place of “YOUR ENVIRONMENT ID HERE” in the sample code provided?

References

KX: PyKX installation guide
Configure PyKX
Amazon SageMaker Notebook Instances
Customization of a SageMaker notebook instance using an LCC script
Amazon SageMaker Notebook Instance Lifecycle Config Samples (github)
Sample Installing Python Packages (github)

Sample Policy: kdb-all

Substitute YOUR_ACCOUNT_ID with your AWS account ID.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "finspace:*",
            "Resource": "arn:aws:finspace:us-east-1:**YOUR_ACCOUNT_ID**:kxEnvironment/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "sts:AssumeRole"
            ],
            "Resource": "arn:aws:iam::**YOUR_ACCOUNT_ID**:role/kdb-all-user"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectTagging",
                "s3:ListBucket"
            ],
            "Resource": [
                "*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}

Sample Trust Relationship for Role

Substitute YOUR_ACCOUNT_ID with your AWS account ID.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "sagemaker.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        },
        {
            "Sid": "Statement1",
            "Effect": "Allow",
            "Principal": {
                "Service": "prod.finspacekx.aws.internal",
                "AWS": "arn:aws:iam::**YOUR_ACCOUNT_ID**:root"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

Running PyKX in Unlicensed Mode

What if you don't have a kdb license to run PyKX in licensed mode but still want to use Python as a q client? You can run PyKX in what is called unlicensed mode. The lifecycle script example below shows how to configure PyKX to always run in unlicensed mode; see PyKX's Modes of operation for more details.

#!/bin/bash

set -e

# OVERVIEW
# This script installs packages in all SageMaker conda environments, apart from the JupyterSystemEnv
# which is a system environment reserved for Jupyter.

# NOTE: if the total runtime of this script exceeds 5 minutes, the Notebook Instance will fail to start up.  If you
# would like to run this script in the background, then replace "sudo" with "nohup sudo -b".  This will allow the
# Notebook Instance to start up while the installation happens in the background.

# set environment variables
# ------------------------------------------------
touch /etc/profile.d/jupyter-env.sh
 
# Set SSL_VERIFY_SERVER environment variable
echo "export SSL_VERIFY_SERVER=0" >> /etc/profile.d/jupyter-env.sh

# Set the PYKX_UNLICENSED environment variable to run always unlicensed 
echo "export PYKX_UNLICENSED=1" >> /etc/profile.d/jupyter-env.sh

# ------------------------------------------------

# TCP KeepAlive settings
sudo sysctl -w net.ipv4.tcp_keepalive_time=300
sudo sysctl -w net.ipv4.tcp_keepalive_intvl=60
sudo sysctl -w net.ipv4.tcp_keepalive_probes=25

sudo -u ec2-user -i <<'EOF'

# Set SSL_VERIFY_SERVER environment variable
echo "export SSL_VERIFY_SERVER=0" >> /home/ec2-user/.bashrc

# Set the PYKX_UNLICENSED environment variable to run always unlicensed 
echo "export PYKX_UNLICENSED=1" >> /home/ec2-user/.bashrc

# Note that "base" is special environment name, include it there as well.
for env in base /home/ec2-user/anaconda3/envs/*; do
    conda activate $(basename "$env")

    if [ $env = 'JupyterSystemEnv' ]; then
        continue
    fi

    pip install pykx pandas numexpr awswrangler

    conda deactivate
done

EOF

# the restart command depends on the Amazon Linux / JupyterLab version in use
CURR_VERSION=$(cat /etc/os-release)
if [[ $CURR_VERSION == *"http://aws.amazon.com/amazon-linux-ami/"* ]]; then
    sudo initctl restart jupyter-server --no-wait
else
    sudo systemctl --no-block restart jupyter-server.service
fi
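
To verify the mode after the notebook starts, a quick check from a terminal (kx.licensed reflects whether PyKX started licensed; in unlicensed mode local q evaluation such as kx.q.til is unavailable, while IPC connections to a cluster still work):

# Prints False when PyKX is running in unlicensed mode
python -c "import pykx as kx; print(kx.licensed)"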