How to import PostgreSQL or NoSQL datasets into Amazon SageMaker?

Hi, I am new to Amazon SageMaker. I noticed that the only available options for importing datasets are local files, S3, Redshift, Snowflake and Athena. Is there a way to import datasets from PostgreSQL, MySQL, or a NoSQL database?

3 Answers

Hi! SageMaker notebooks are based on JupyterLab, so you are free to install and use whatever libraries you need. For instance, to load data from an SQLite database, you could write code similar to this:

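import sqlite3
import pandas as pd

# Open a connection to the SQLite database file
db_client = sqlite3.connect("path_to_database.db")
# "my_table" is a placeholder ("table" itself is a reserved word in SQLite)
query = "SELECT * FROM my_table;"
# Load the query result into a DataFrame, then release the connection
df = pd.read_sql_query(query, db_client)
db_client.close()
print(df.shape)
df.head()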

Note that in this case, the sqlite3 module is already part of the Python standard library. The approach is similar for MySQL, PostgreSQL, DynamoDB (using the boto3 API), etc., but you may need to install additional libraries in your JupyterLab environment first (you can run !pip install lib from a JupyterLab cell to install a library in the current kernel).
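To illustrate the "similar approach" point, here is a minimal sketch of a helper that works with any DB-API 2.0 connection. The helper name fetch_rows and the customers table are made up for the example, and the demo runs against an in-memory SQLite database so it needs no server. The commented-out lines assume you have installed psycopg2 or PyMySQL and have your own host and credentials.

```python
import sqlite3


def fetch_rows(conn, query, params=None):
    """Run a query on a DB-API 2.0 connection and return rows as dicts."""
    cur = conn.cursor()
    try:
        if params is None:
            cur.execute(query)
        else:
            cur.execute(query, params)
        # cursor.description holds one entry per column; item 0 is the name
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()


# Demo with an in-memory SQLite database (hypothetical table and data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
rows = fetch_rows(conn, "SELECT id, name FROM customers ORDER BY id")
print(rows)
conn.close()

# The same helper should work with other DB-API drivers (assumptions:
# the driver is installed and the placeholder values are replaced):
# import psycopg2
# conn = psycopg2.connect(host="...", dbname="...", user="...", password="...")
# import pymysql
# conn = pymysql.connect(host="...", user="...", password="...", database="...")
```

If you prefer pandas, pd.DataFrame(rows) turns the result into a DataFrame. Keep in mind that the parameter placeholder differs between drivers (? for sqlite3, %s for psycopg2 and PyMySQL).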

If you want to use SageMaker Data Wrangler, you may have to go through this step first and then upload your data to one of the natively supported data sources that you mentioned (Amazon S3, Redshift...).
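As a rough sketch of that upload step, the snippet below serializes rows to CSV in memory and sends them to S3 with the S3 client's put_object call. The function names, bucket name and key are hypothetical. The S3 client is passed in as a parameter, so the CSV part runs without AWS access, and the actual upload is left as a commented usage example that assumes boto3 is installed and your role is allowed to write to the bucket.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of dicts (one per row) to CSV text with a header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def upload_rows(s3_client, rows, bucket, key):
    """Upload rows as one CSV object; s3_client is e.g. boto3.client("s3")."""
    body = rows_to_csv(rows).encode("utf-8")
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return f"s3://{bucket}/{key}"


# Hypothetical sample data, standing in for rows read from your database
sample = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
csv_text = rows_to_csv(sample)
print(csv_text)

# Usage in a notebook (bucket and key are placeholders):
# import boto3
# uri = upload_rows(boto3.client("s3"), sample,
#                   "my-example-bucket", "imports/customers.csv")
```

Once the object is in S3, it can be picked up through the S3 import option. For large tables you would probably want to export in chunks rather than build the whole CSV in memory.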

Hope this helps!

answered 6 months ago

Depending on the SageMaker service you are using, there could be different options.

If you are looking at one of the low-code options, and exporting the data to your data lake (Amazon S3) is not an option, you could consider Athena Federated Query.

This will allow you to import the other sources through the Athena connection. Please review the considerations and limitations of Athena Federated Query in its documentation.

For more information on Athena Federated Query, you can also read this blog post.
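If you want to try a federated query from a notebook rather than from a low-code tool, a minimal sketch with the boto3 Athena client could look like the following. It assumes a data source connector is already deployed and registered as an Athena catalog. The catalog, database, table and bucket names are placeholders, and run_athena_query is a made-up helper name.

```python
import time


def run_athena_query(athena, sql, catalog, database, output_location,
                     poll_seconds=1.0, max_polls=60):
    """Start an Athena query and poll until it reaches a terminal state."""
    started = athena.start_query_execution(
        QueryString=sql,
        # Catalog is the name under which the federated source is registered
        QueryExecutionContext={"Catalog": catalog, "Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    for _ in range(max_polls):
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)
    raise TimeoutError(f"Query {query_id} did not finish after {max_polls} polls")


# Usage in a notebook (all names below are placeholders):
# import boto3
# query_id, state = run_athena_query(
#     boto3.client("athena"),
#     "SELECT * FROM customers LIMIT 10",
#     catalog="my_postgres_catalog",
#     database="public",
#     output_location="s3://my-example-bucket/athena-results/",
# )
```

When the query succeeds, Athena writes the result as a CSV file under the output location, so it ends up in S3, which is one of the natively supported sources.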

EXPERT
answered 6 months ago

Another option would be to not import the data at all and instead work with it in place via the Amazon Aurora PostgreSQL integration with SageMaker. Here is a blog that may help.

EXPERT
answered 6 months ago
