Data sources, integrations and exporting


Hello there,

so I've been using SageMaker Canvas for quite a few machine learning models now (most of them are simple two-category classification models, some are time-series forecasting models). Because I am fairly new to machine learning and only discovered SageMaker Canvas a few months ago, I am not yet able to implement more serious machine learning algorithms through SageMaker Studio. Canvas is convenient and very simple to use, but I am looking for a way to implement a solution for my use case: an automated machine learning model that trains on the latest dataset from an existing MySQL database.

One idea, for example, is automatically importing CSV files from an existing S3 bucket holding the latest datasets. Some API would, at a specific time interval, export the data from the existing database in CSV format and upload it to S3, so that SageMaker Canvas could retrieve those files automatically and model training would be automated as well. However, I could not find any existing implementations of a similar solution, and I believe the current version of Canvas does not support this functionality.
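Just to illustrate what I mean, the periodic export step could look roughly like the sketch below (the bucket name, key, and column names are placeholders I made up; the actual MySQL query is omitted):

```python
import csv
import io


def rows_to_csv(header, rows):
    """Serialize query results to a CSV string ready for S3 upload."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Example rows, as they might come back from a MySQL query
header = ["customer_id", "churned"]
rows = [(1, 0), (2, 1)]
body = rows_to_csv(header, rows)

# The upload itself would use boto3 (bucket and key are hypothetical):
# import boto3
# boto3.client("s3").put_object(
#     Bucket="my-training-data",      # placeholder bucket
#     Key="latest/dataset.csv",
#     Body=body.encode("utf-8"),
# )
```

The open question for me is the last step: getting Canvas to pick up that file and retrain without manual clicks.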

It would also be convenient if I could target SageMaker Canvas through the AWS SDK for JavaScript v3 client for importing datasets and exporting predictions, but I could not find any documentation for that either. I have found a re:Post question regarding SageMaker Canvas integration possibilities, but the answer provided about the SageMaker Autopilot API seems unsuitable for my use case. Are there any possibilities of targeting Canvas via an API? Also, are there any plans to provide new data sources for importing datasets (the ones QuickSight supports; a MySQL data source in particular would be much appreciated)?

Thanks in advance for any information. Best regards

asked 5 months ago · 31 views
1 Answer

Hello @MK, thanks for re:posting this question, and glad to see that you're starting your ML journey with SageMaker Canvas! :D

As you've had the chance to test and discover so far, Canvas is meant to be a UI-driven ML workflow enabler for business analysts and developers with little to no knowledge of ML. Therefore, at this stage, we do not provide an API or an SDK to connect to and use Canvas capabilities. Sorry about that! This might change in the future, so stay tuned.

Since that's the situation, I suggest you keep using Canvas for your exploratory data analysis and model experimentation via Preview Models and Quick builds. Here is how you could automate some of these capabilities externally from SageMaker Canvas, using two services:

  1. Create and automate your ETL workflow with AWS Glue DataBrew: AWS Glue DataBrew provides a UI for tabular data analysis and transformation, and it supports a plethora of sources, including a JDBC connector to MySQL servers! Once you have defined your transformations (and maybe even data quality rules), you can create a Recipe job that can be scheduled to your preference, so that your ETL process is automated. Learn how to automate/schedule Glue DataBrew in this blog post.
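As a rough sketch of that scheduling step with boto3 (the job, dataset, recipe, bucket, and role names below are placeholders of mine, not anything Canvas creates for you):

```python
def recipe_job_params(job_name, dataset_name, recipe_name, output_bucket):
    """Build the request payload for databrew.create_recipe_job."""
    return {
        "Name": job_name,
        "DatasetName": dataset_name,
        "RecipeReference": {"Name": recipe_name, "RecipeVersion": "LATEST_PUBLISHED"},
        "Outputs": [{
            "Format": "CSV",
            "Location": {"Bucket": output_bucket, "Key": "databrew-output/"},
        }],
        # Placeholder role; needs DataBrew + S3 permissions
        "RoleArn": "arn:aws:iam::123456789012:role/DataBrewRole",
    }


def schedule_params(schedule_name, job_name, cron):
    """Build the request payload for databrew.create_schedule."""
    return {"Name": schedule_name, "JobNames": [job_name], "CronExpression": cron}


# Run the recipe job nightly at 02:00 UTC (DataBrew cron syntax)
schedule = schedule_params("nightly-etl", "mysql-to-csv-job", "Cron(0 2 * * ? *)")

# With AWS credentials configured, the actual calls would be:
# import boto3
# databrew = boto3.client("databrew")
# databrew.create_recipe_job(**recipe_job_params(
#     "mysql-to-csv-job", "mysql-dataset", "my-recipe", "my-training-data"))
# databrew.create_schedule(**schedule)
```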

  2. Train (and deploy) your model with SageMaker AutoPilot: SageMaker AutoPilot uses the same AutoML technology as SageMaker Canvas. This means that all you need to do is point to a specific S3 bucket where your CSV file is stored, then call an API to get a model trained! You can set up a Lambda function / Step Functions workload that triggers on a new file landing in the S3 output bucket of the recipe job above, which in turn starts an AutoPilot job. The output will be available in SageMaker Studio, or you can describe the AutoML job to get the best candidate and deploy it using the information in InferenceContainers, calling, in order:
     - create_model()
     - create_endpoint_config() - a suggestion: use a Serverless endpoint to save money!
     - create_endpoint()
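A minimal sketch of that train-and-deploy flow, assuming boto3 and placeholder names/URIs/roles throughout (the helper functions below just assemble the request payloads):

```python
def automl_job_params(job_name, input_s3_uri, output_s3_uri, target, role_arn):
    """Build the request payload for sagemaker.create_auto_ml_job."""
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [{
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": input_s3_uri,
            }},
            "TargetAttributeName": target,   # the column you want to predict
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "RoleArn": role_arn,
    }


def serverless_endpoint_config(config_name, model_name):
    """Endpoint config using a serverless endpoint to save on idle costs."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "ServerlessConfig": {"MemorySizeInMB": 2048, "MaxConcurrency": 5},
        }],
    }


# With AWS credentials configured, the flow would be roughly:
# import boto3
# sm = boto3.client("sagemaker")
# sm.create_auto_ml_job(**automl_job_params(
#     "canvas-like-job",
#     "s3://my-training-data/databrew-output/",   # placeholder input
#     "s3://my-training-data/automl-output/",     # placeholder output
#     "churned",
#     "arn:aws:iam::123456789012:role/SageMakerRole"))
# best = sm.describe_auto_ml_job(AutoMLJobName="canvas-like-job")["BestCandidate"]
# sm.create_model(ModelName="best-model",
#                 ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerRole",
#                 Containers=best["InferenceContainers"])
# sm.create_endpoint_config(**serverless_endpoint_config("best-config", "best-model"))
# sm.create_endpoint(EndpointName="best-endpoint", EndpointConfigName="best-config")
```

AutoML jobs take a while, so in practice you would wait for the job status to reach Completed (e.g. via EventBridge or polling describe_auto_ml_job) before the deploy steps.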

Hope this helps!

answered 4 months ago
