Set Up SageMaker Notebooks to Access FinSpace with Managed kdb Insights
This guide shows how to configure Amazon SageMaker managed notebooks as q clients for Amazon FinSpace with Managed kdb Insights clusters.
This guide assumes you are familiar with AWS tools such as the console and the CLI, and that you have sufficient permissions to perform the tasks described. It also assumes you have already configured a FinSpace environment by running the FinSpace Foundations Workshop; the SageMaker notebook created here will connect to the cluster created in that workshop.
Stage KX License File onto S3
First, put the KX license file (kc.lic) on S3 so that the lifecycle configuration script can copy it onto the notebook instance we will create. Note the full S3 path of the license file; you will need it when you create the lifecycle script.
# Copy local license file to S3
aws s3 cp kc.lic s3://<PATH TO MY LICENSE FILE>/
Create a SageMaker Lifecycle Configuration
A lifecycle configuration is needed to set up the notebook instance by installing PyKX and the KX license. We will now create the lifecycle configuration for notebook instances.
In the AWS Console, from the Amazon SageMaker page, select “Lifecycle configurations” under “Admin configurations”, then under the “Notebook Instance” tab select “Create configuration”.
In the “Create lifecycle configuration” wizard, give the script a name (such as “install-pykx”) and, for the script body, copy/paste the code below into the textbox under the “Start notebook” tab.
Be sure to edit this line in the script to reflect the actual location of your license file (kc.lic) in your S3 bucket:
aws s3 cp s3://<PATH TO MY LICENSE FILE>/kc.lic /home/ec2-user
Lifecycle Script
#!/bin/bash

set -e

# OVERVIEW
# This script installs packages in all SageMaker conda environments, apart from the JupyterSystemEnv
# which is a system environment reserved for Jupyter.
# NOTE: if the total runtime of this script exceeds 5 minutes, the Notebook Instance will fail to start up. If you
# would like to run this script in the background, then replace "sudo" with "nohup sudo -b". This will allow the
# Notebook Instance to start up while the installation happens in the background.

# set environment variables
# ------------------------------------------------
touch /etc/profile.d/jupyter-env.sh

# Set SSL_VERIFY_SERVER environment variable
echo "export SSL_VERIFY_SERVER=0" >> /etc/profile.d/jupyter-env.sh

# Set the QLIC environment variable, the location of the KDB License file
echo "export QLIC=\"/home/ec2-user\"" >> /etc/profile.d/jupyter-env.sh
# ------------------------------------------------

# TCP KeepAlive settings
sudo sysctl -w net.ipv4.tcp_keepalive_time=300
sudo sysctl -w net.ipv4.tcp_keepalive_intvl=60
sudo sysctl -w net.ipv4.tcp_keepalive_probes=25

sudo -u ec2-user -i <<'EOF'

# copy the kc.lic file from S3
aws s3 cp s3://<PATH TO MY LICENSE FILE>/kc.lic /home/ec2-user

# Set SSL_VERIFY_SERVER environment variable
echo "export SSL_VERIFY_SERVER=0" >> /home/ec2-user/.bashrc

# Set the QLIC environment variable, the location of the KDB License file
echo "export QLIC=\"/home/ec2-user\"" >> /home/ec2-user/.bashrc

# Note that "base" is special environment name, include it there as well.
for env in base /home/ec2-user/anaconda3/envs/*; do
    conda activate $(basename "$env")

    if [ $env = 'JupyterSystemEnv' ]; then
        continue
    fi

    pip install pykx pandas numexpr awswrangler

    conda deactivate
done
EOF

# restart command is dependent on current running Amazon Linux and JupyterLab
CURR_VERSION=$(cat /etc/os-release)
if [[ $CURR_VERSION == *$"http://aws.amazon.com/amazon-linux-ami/"* ]]; then
    sudo initctl restart jupyter-server --no-wait
else
    sudo systemctl --no-block restart jupyter-server.service
fi
Confirm Lifecycle configuration creation
With the lifecycle configuration created, you should now see its name “install-pykx” in the list of Notebook instance Lifecycle configurations.
Create a SageMaker Notebook Instance
In the AWS Console from the Amazon SageMaker page, under “Applications and IDEs” select “Notebooks”, then select “Create notebook instance” to start the notebook creation wizard.
Give the notebook a name (we used “pykx-notebook”), select an instance type, and select the most recent platform identifier (Amazon Linux 2, Jupyter Lab 4).
Select the created lifecycle configuration:
- Open the “Additional configuration” section
- Select the lifecycle configuration you created above (install-pykx)
An IAM role will need to be created, and we will ensure that role has both SageMaker permissions and permissions for FinSpace with Managed kdb Insights.
Under the “Permissions and encryption” section, in the “IAM role” dropdown (“Choose an IAM role”), select “Create a new role”.
In the “Create an IAM role” dialog, select the access you want to grant (we chose “Any S3 bucket”) and create the role.
This creates a new role (ours was named AmazonSageMaker-ExecutionRole-20241121T164422).
We need to further modify the role for FinSpace, so select the role name link to edit the IAM role; this opens a new tab in your browser for editing the role.
Modify the IAM role to add the kdb policy related to your environment (from the workshop this was called kdb-all) and modify the trust relationship to include FinSpace as well (a sample policy and trust relationship are provided at the end of this guide).
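If you prefer to script these role changes, below is a minimal boto3 sketch; the role name, policy ARN, and trust policy contents are assumptions based on the workshop naming and the samples at the end of this guide, so substitute your own values.

import boto3

iam = boto3.client("iam")

# Assumed names from the workshop; replace with your own
ROLE_NAME = "AmazonSageMaker-ExecutionRole-20241121T164422"
POLICY_ARN = "arn:aws:iam::YOUR_ACCOUNT_ID:policy/kdb-all"

# Attach the kdb policy to the notebook execution role
iam.attach_role_policy(RoleName=ROLE_NAME, PolicyArn=POLICY_ARN)

# Update the trust relationship so FinSpace can also assume the role
# (see the sample trust relationship at the end of this guide)
trust_policy = """{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow",
     "Principal": {"Service": "sagemaker.amazonaws.com"},
     "Action": "sts:AssumeRole"},
    {"Effect": "Allow",
     "Principal": {"Service": "prod.finspacekx.aws.internal"},
     "Action": "sts:AssumeRole"}
  ]
}"""
iam.update_assume_role_policy(RoleName=ROLE_NAME, PolicyDocument=trust_policy)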
Now return to SageMaker to continue creation of the notebook instance.
To ensure connectivity to FinSpace clusters, configure the instance’s network section to match the FinSpace clusters you will be accessing. You can get the cluster’s network details from the console or by calling the GetKxCluster API. Confirm that the VPC, subnet, and security group match the running cluster.
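For example, a minimal boto3 sketch of retrieving a cluster’s network details (the environment ID and cluster name below are placeholders):

import boto3

client = boto3.Session().client(service_name="finspace")

# Placeholders: substitute your environment ID and cluster name
resp = client.get_kx_cluster(
    environmentId="YOUR ENVIRONMENT ID HERE",
    clusterName="cluster_welcomedb",
)

# vpcConfiguration holds the VPC, subnet, and security group IDs to match
print(resp.get("vpcConfiguration"))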
With everything entered, select “Create notebook instance” at the bottom of the page
It will take a few minutes to create the instance; you can monitor the creation from the Notebook instances page.
Once created, you can open the instance in Jupyter or Jupyter Lab.
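You can also check the instance status programmatically; a minimal boto3 sketch, assuming your notebook instance is named pykx-notebook:

import boto3

sm = boto3.client("sagemaker")

# Placeholder: substitute your notebook instance name
resp = sm.describe_notebook_instance(NotebookInstanceName="pykx-notebook")

# Status will be "InService" once the instance is ready
print(resp["NotebookInstanceStatus"])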
Create FinSpace User for SageMaker Notebooks
To connect a q client to the running cluster, you will need the connection string (a signed URL) for the cluster, which can be generated using the FinSpace GetKxConnectionString API. To call this API successfully, you will need to add a user in FinSpace and associate that user with the IAM role used by the SageMaker notebook.
Earlier you created an IAM role for running SageMaker notebooks (ours was AmazonSageMaker-ExecutionRole-20241121T164422). We will use that role when creating the FinSpace user.
In the console, go to your FinSpace kdb environment page, select the “Users” tab, and select “Add User”.
In the Add user page, enter a username (we entered sagemaker), select the IAM role used by your SageMaker notebooks (ours was AmazonSageMaker-ExecutionRole-20241121T164422), and select “Add user” to create the user.
You have now created a FinSpace user (sagemaker) that is related to the role used by your notebooks.
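The same user can also be created programmatically; a minimal boto3 sketch, with placeholder values for the environment ID and role ARN:

import boto3

client = boto3.Session().client(service_name="finspace")

# Placeholders: substitute your environment ID and notebook role ARN
client.create_kx_user(
    environmentId="YOUR ENVIRONMENT ID HERE",
    userName="sagemaker",
    iamRole="arn:aws:iam::YOUR_ACCOUNT_ID:role/AmazonSageMaker-ExecutionRole-20241121T164422",
)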
Confirm SageMaker Notebook Instance Setup
PyKX and KX License
Launch Jupyter (in this case we selected Jupyter Lab) from the SageMaker Notebook instances list.
Create a new notebook (menu: File → New → Notebook) and select a kernel such as “conda_pytorch_p310” (the lifecycle script added PyKX to all kernels of the notebook instance).
In a notebook cell, enter and run this code:
import pykx as kx

# Validate the correct installation of your license
kx.q.til(10)
This should produce this output:
pykx.LongVector(pykx.q('0 1 2 3 4 5 6 7 8 9'))
Congratulations, PyKX is configured properly!
Permissions
Next we will confirm permissions. Copy/paste the function definitions below into a new cell:
def get_kx_connection_string(client, clusterName: str, userName: str, environmentId: str):
    resp = client.get_kx_user(environmentId=environmentId, userName=userName)
    userArn = resp.get("userArn")

    resp = client.get_kx_connection_string(environmentId=environmentId, userArn=userArn, clusterName=clusterName)
    return resp.get("signedConnectionString", None)


def parse_connection_string(conn_str: str):
    conn_parts = conn_str.split(":")

    host = conn_parts[2].strip("/")
    port = int(conn_parts[3])
    username = conn_parts[4]
    password = conn_parts[5]

    return host, port, username, password


def get_pykx_connection(client, clusterName: str, userName: str, environmentId: str):
    conn_str = get_kx_connection_string(client, environmentId=environmentId, clusterName=clusterName, userName=userName)
    host, port, username, password = parse_connection_string(conn_str)

    return kx.SyncQConnection(host=host, port=port, username=username, password=password)
Next, using the AWS boto3 Python library, create a client for the FinSpace service:
import boto3

# create finspace client
session = boto3.Session()
client = session.client(service_name='finspace')
Assign your environment ID to the Python variable ENV_ID; you can find your environment ID on the FinSpace console page.
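You can also look up the environment ID programmatically; a minimal boto3 sketch using the client created above:

# List kdb environments and their IDs
resp = client.list_kx_environments()

for env in resp.get("environments", []):
    print(env.get("name"), env.get("environmentId"))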
With ENV_ID defined, you can connect to the cluster with PyKX. Below we connect to the cluster created in the FinSpace Foundations Workshop (“cluster_welcomedb”) using the FinSpace user (“sagemaker”) we created earlier.
ENV_ID="**YOUR ENVIRONMENT ID HERE**" # Connect to cluster with PyKX hdb = get_pykx_connection(client=client, clusterName="cluster_welcomedb", userName="sagemaker", environmentId=ENV_ID ) # list tables on the cluster hdb("tables[]").py()
This produces the output below:
['example']
Congratulations, Permissions are configured properly!
q Magic
PyKX also installs a Jupyter q magic command, which you can try out as well. First get the connection string, then parse it into host, port, username, and password with the function provided earlier:
conn_str = get_kx_connection_string(client=client,
    clusterName="cluster_welcomedb",
    userName="sagemaker",
    environmentId=ENV_ID)

host, port, username, password = parse_connection_string(conn_str)
Then in another cell, use the q magic to pass the cell’s contents to the FinSpace cluster and display the response from the server, like this:
%%q --host $host --port $port --user $username --pass $password
tables[]
This outputs the same information for tables[] as before, but this time as text from the server; it is the same response you would see in the q console.
,`example
Troubleshooting
PyKX license not found when importing pykx
The kc.lic file is copied to the notebook instance by the lifecycle script. Did you put the S3 path of your copy of the license file into the script?
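To check from a notebook cell, here is a minimal sketch that verifies the QLIC environment variable and the presence of the license file:

import os

# QLIC should point at the directory holding kc.lic (/home/ec2-user in this guide)
qlic = os.environ.get("QLIC", "")
print("QLIC:", qlic)
print("kc.lic present:", os.path.exists(os.path.join(qlic, "kc.lic")))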
“Host Not Reachable” when opening connection from notebook to cluster
This is a networking issue and is likely related to the VPC, subnet, and security group used when creating the notebook and the cluster. Confirm that the network in which the cluster runs is reachable from the network in which the notebook instance runs.
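For a quick reachability check from a notebook cell, a minimal sketch assuming host and port were already parsed from the connection string:

import socket

# Attempt a TCP connection to the cluster endpoint; raises an error on failure
with socket.create_connection((host, port), timeout=5):
    print("cluster endpoint is reachable")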
ValidationException: An error occurred (ValidationException) when calling the GetKxUser operation: 1 validation error detected: Value at 'environmentId' failed to satisfy constraint: Member must satisfy regular expression pattern: ^[a-zA-Z0-9]{1,26}$
Did you put your FinSpace environment ID in place of “YOUR ENVIRONMENT ID HERE” in the sample code provided?
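You can validate the value against the constraint shown in the error message; a minimal sketch:

import re

# The environment ID must be 1-26 alphanumeric characters
print(bool(re.fullmatch(r"[a-zA-Z0-9]{1,26}", ENV_ID)))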
References
KX: PyKX installation guide
Configure PyKX
Amazon SageMaker Notebook Instances
Customization of a SageMaker notebook instance using an LCC script
Amazon SageMaker Notebook Instance Lifecycle Config Samples (github)
Sample Installing Python Packages (github)
Sample Policy: kdb-all
Replace YOUR_ACCOUNT_ID with your AWS account ID.
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": "finspace:*", "Resource": "arn:aws:finspace:us-east-1:**YOUR_ACCOUNT_ID**:kxEnvironment/*" }, { "Effect": "Allow", "Action": [ "sts:AssumeRole" ], "Resource": "arn:aws:iam::**YOUR_ACCOUNT_ID**:role/kdb-all-user" }, { "Effect": "Allow", "Action": [ "s3:GetObject", "s3:GetObjectTagging", "s3:ListBucket" ], "Resource": [ "*" ] }, { "Effect": "Allow", "Action": [ "kms:Decrypt" ], "Resource": [ "*" ] } ] }
Sample Trust Relationship for Role
Replace YOUR_ACCOUNT_ID with your AWS account ID.
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "sagemaker.amazonaws.com" }, "Action": "sts:AssumeRole" }, { "Sid": "Statement1", "Effect": "Allow", "Principal": { "Service": "prod.finspacekx.aws.internal", "AWS": "arn:aws:iam::**YOUR_ACCOUNT_ID**:root" }, "Action": "sts:AssumeRole" } ] }
Running PyKX in Unlicensed Mode
What if you don't have a kdb license to run PyKX in licensed mode, but still want to use Python as a q client? You can run PyKX in what is called unlicensed mode. The lifecycle script example below shows how to configure PyKX to always run in unlicensed mode. See PyKX's Modes of operation for more details.
#!/bin/bash

set -e

# OVERVIEW
# This script installs packages in all SageMaker conda environments, apart from the JupyterSystemEnv
# which is a system environment reserved for Jupyter.
# NOTE: if the total runtime of this script exceeds 5 minutes, the Notebook Instance will fail to start up. If you
# would like to run this script in the background, then replace "sudo" with "nohup sudo -b". This will allow the
# Notebook Instance to start up while the installation happens in the background.

# set environment variables
# ------------------------------------------------
touch /etc/profile.d/jupyter-env.sh

# Set SSL_VERIFY_SERVER environment variable
echo "export SSL_VERIFY_SERVER=0" >> /etc/profile.d/jupyter-env.sh

# Set the PYKX_UNLICENSED environment variable to run always unlicensed
echo "export PYKX_UNLICENSED=1" >> /etc/profile.d/jupyter-env.sh
# ------------------------------------------------

# TCP KeepAlive settings
sudo sysctl -w net.ipv4.tcp_keepalive_time=300
sudo sysctl -w net.ipv4.tcp_keepalive_intvl=60
sudo sysctl -w net.ipv4.tcp_keepalive_probes=25

sudo -u ec2-user -i <<'EOF'

# Set SSL_VERIFY_SERVER environment variable
echo "export SSL_VERIFY_SERVER=0" >> /home/ec2-user/.bashrc

# Set the PYKX_UNLICENSED environment variable to run always unlicensed
echo "export PYKX_UNLICENSED=1" >> /home/ec2-user/.bashrc

# Note that "base" is special environment name, include it there as well.
for env in base /home/ec2-user/anaconda3/envs/*; do
    conda activate $(basename "$env")

    if [ $env = 'JupyterSystemEnv' ]; then
        continue
    fi

    pip install pykx pandas numexpr awswrangler

    conda deactivate
done
EOF

# restart command is dependent on current running Amazon Linux and JupyterLab
CURR_VERSION=$(cat /etc/os-release)
if [[ $CURR_VERSION == *$"http://aws.amazon.com/amazon-linux-ami/"* ]]; then
    sudo initctl restart jupyter-server --no-wait
else
    sudo systemctl --no-block restart jupyter-server.service
fi