SageMaker and Data in Databases


A customer has a question about data sources:

“most of our data is stored in SQL databases, while the SageMaker docs say that I have to put it all in S3. It’s not obvious what the best way to do this is. I can think for example of splitting my analysis code in two; one pre-processing step to go from SQL queries to tabular data, and e.g. store that as Parquet files. For high-dimensional tensor data it’s even less obvious.”

Can someone comment on that?

3 Answers
Accepted Answer

We have an example notebook for interacting with Redshift data from a SageMaker managed notebook, which I believe is suitable for an Exploratory Data Analysis (EDA) use case: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/working_with_redshift_data/working_with_redshift_data.ipynb
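The linked notebook targets Redshift specifically. As a minimal sketch of the same EDA pattern, the code below pulls a query result into plain Python columns through any DB-API 2.0 connection and computes a small numeric summary. An in-memory SQLite database stands in for Redshift so the example runs anywhere; `query_to_columns` and `numeric_summary` are hypothetical helpers, not part of the notebook or any AWS SDK. With a real cluster you would pass a connection from a Redshift DB-API driver such as `redshift_connector`, keeping in mind that drivers differ in their parameter placeholder style (`paramstyle`).

```python
import sqlite3
from statistics import mean


def query_to_columns(conn, sql, params=()):
    """Run a query on any DB-API 2.0 connection and return {column: [values]}."""
    cur = conn.cursor()
    cur.execute(sql, params)
    names = [d[0] for d in cur.description]  # first field of each entry is the column name
    columns = {name: [] for name in names}
    for row in cur.fetchall():
        for name, value in zip(names, row):
            columns[name].append(value)
    cur.close()
    return columns


def numeric_summary(values):
    """Tiny EDA summary of a numeric column; NULLs (None) are counted, not averaged."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "nulls": len(values) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "mean": mean(present) if present else None,
    }


if __name__ == "__main__":
    # Assumption: SQLite stands in for Redshift purely so the sketch is runnable.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)",
                     [("eu", 10.0), ("us", 30.0), ("us", None)])
    cols = query_to_columns(conn, "SELECT region, amount FROM sales ORDER BY rowid")
    print(numeric_summary(cols["amount"]))
    # {'count': 2, 'nulls': 1, 'min': 10.0, 'max': 30.0, 'mean': 20.0}
```

In a notebook, `pandas.read_sql` is the more common one-liner for loading a query into a DataFrame; the plain-Python version only keeps this sketch dependency-free.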

For production purposes, the customer should consider separating the work into two stages: first extract data from the relational databases to S3 (building out a data lake), then use that data for downstream processing and machine learning (SageMaker, EMR, Athena, Redshift Spectrum, etc.). Customers can build extraction pipelines from popular relational databases using AWS Glue, EMR, or their preferred ETL engines, such as those available on AWS Marketplace.
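The extraction stage can be sketched as follows, assuming a DB-API 2.0 connection to the source database. The hypothetical helper `export_query_to_csv_parts` streams a query with `fetchmany` into numbered CSV part files, which is the shape a data-lake prefix on S3 typically takes; SQLite again stands in for the production database. This is an illustration of the idea, not a replacement for AWS Glue or EMR, which also handle scheduling, retries, and parallel reads.

```python
import csv
import os
import sqlite3
import tempfile


def export_query_to_csv_parts(conn, sql, out_dir, prefix="part", chunk_size=10_000):
    """Stream a query result into numbered CSV part files under out_dir.

    Rows are fetched in chunks with fetchmany, so the full result never has to
    fit in memory. Each part file gets its own header row. Returns the paths.
    """
    os.makedirs(out_dir, exist_ok=True)
    cur = conn.cursor()
    cur.execute(sql)
    header = [d[0] for d in cur.description]
    paths = []
    while True:
        rows = cur.fetchmany(chunk_size)
        if not rows:  # empty list means the cursor is exhausted
            break
        path = os.path.join(out_dir, f"{prefix}-{len(paths):05d}.csv")
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(rows)
        paths.append(path)
    cur.close()
    return paths


if __name__ == "__main__":
    # Assumption: an in-memory SQLite table plays the role of the source database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)",
                     [(i, i * 1.5) for i in range(25)])
    with tempfile.TemporaryDirectory() as tmp:
        parts = export_query_to_csv_parts(
            conn, "SELECT id, amount FROM sales ORDER BY id", tmp, chunk_size=10)
        print([os.path.basename(p) for p in parts])
        # ['part-00000.csv', 'part-00001.csv', 'part-00002.csv']
```

To land the files in S3, each part could then be uploaded with boto3, for example `boto3.client("s3").upload_file(path, "my-bucket", "raw/sales/" + os.path.basename(path))`, where the bucket and prefix are placeholders. For the Parquet output the customer suggested, pandas `DataFrame.to_parquet` (which needs `pyarrow` or `fastparquet`) would replace the `csv` writer; Parquet preserves column types and compresses well, which suits Athena and Redshift Spectrum queries.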

AWS
answered 6 years ago
Expert
reviewed 4 months ago

I'd recommend using SageMaker Data Wrangler to connect the dots between the different SageMaker services: https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-import.html

AWS
answered 2 years ago

For custom models, you can import data into Canvas directly from a database. Canvas can connect to data in Amazon S3, Amazon Athena, or Amazon RDS. For Amazon RDS, if the AmazonSageMakerCanvasFullAccess policy is attached to your user's role, you can import data from your Amazon RDS databases into Canvas.

https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-importing-data.html

AWS
answered 7 months ago
