How can I use categorical data without converting it?

I am trying to develop a model in SageMaker, but my data contains categorical features, and I would prefer not to convert them. In the old Amazon Machine Learning service, I could train a model on the same data without any issue. However, when I tried a built-in algorithm, I got an error message telling me to convert the data. Is there any way to do the same as in the old Amazon Machine Learning without converting? Thank you.

4 Answers

Have you tried SageMaker Autopilot? https://aws.amazon.com/sagemaker/autopilot/
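For reference, launching an Autopilot job from Python might look roughly like the sketch below. It only builds the keyword arguments for the boto3 `create_auto_ml_job` call; the job name, S3 URIs, target column, and role ARN are hypothetical placeholders. It assumes the raw CSV is already in S3 with its categorical string columns left unconverted and with a header row, so the target can be referred to by column name.

```python
def build_autopilot_job_request(job_name, s3_input_uri, s3_output_uri,
                                target_column, role_arn):
    """Build keyword arguments for the SageMaker CreateAutoMLJob API.

    The input CSV can keep its categorical columns as raw strings;
    Autopilot performs the encoding during its feature-engineering step.
    All argument values passed in are caller-supplied (hypothetical here).
    """
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [
            {
                "DataSource": {
                    "S3DataSource": {
                        # Treat the URI as a prefix: every object under it is input.
                        "S3DataType": "S3Prefix",
                        "S3Uri": s3_input_uri,
                    }
                },
                # Name of the header column Autopilot should learn to predict.
                "TargetAttributeName": target_column,
            }
        ],
        "OutputDataConfig": {"S3OutputPath": s3_output_uri},
        "RoleArn": role_arn,
    }


# Usage (needs boto3 plus AWS credentials; all names are hypothetical):
# import boto3
# request = build_autopilot_job_request(
#     "categorical-demo", "s3://my-bucket/raw/", "s3://my-bucket/autopilot-output/",
#     "label", "arn:aws:iam::123456789012:role/MySageMakerRole")
# boto3.client("sagemaker").create_auto_ml_job(**request)
```

No problem type is set here, so Autopilot infers it from the target column.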

EXPERT
answered 10 months ago

What kind of ML problem are you working with? If it's regression or classification, you could try SageMaker Autopilot to avoid performing the feature engineering steps yourself. If you're training models in Jupyter notebooks on SageMaker Studio, adding a preprocessing step that performs categorical encoding isn't too difficult. We have plenty of examples on GitHub.
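As a minimal sketch of such a preprocessing step, the function below one-hot encodes categorical columns using only the Python standard library and writes a header-less CSV with the label in the first column. That layout is an assumption based on what the built-in XGBoost algorithm documents for CSV training data; check the input format of the specific algorithm you use. The column names and sample rows are hypothetical. In practice you would more likely reach for pandas `get_dummies` or scikit-learn's `OneHotEncoder`.

```python
import csv
import io


def one_hot_encode_to_csv(rows, label_column, categorical_columns):
    """One-hot encode categorical columns and return a header-less CSV string.

    rows: list of dicts mapping column name -> value.
    The label is written first (assumed layout: label first, no header).
    """
    # Sorted set of categories seen in these rows, per categorical column.
    categories = {
        col: sorted({str(row[col]) for row in rows}) for col in categorical_columns
    }
    # Remaining (non-label, non-categorical) columns, in a stable order.
    other_columns = sorted(
        col for col in rows[0] if col != label_column and col not in categorical_columns
    )

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        features = [row[col] for col in other_columns]
        for col in categorical_columns:
            # One indicator column per category: 1 if the row has it, else 0.
            features.extend(1 if str(row[col]) == cat else 0 for cat in categories[col])
        writer.writerow([row[label_column], *features])
    return out.getvalue()


rows = [
    {"label": 1, "age": 34, "color": "red"},
    {"label": 0, "age": 51, "color": "blue"},
    {"label": 1, "age": 27, "color": "red"},
]
print(one_hot_encode_to_csv(rows, "label", ["color"]), end="")
# 1,34,0,1
# 0,51,1,0
# 1,27,0,1
```

Because the category list is derived from the rows passed in, the same list has to be reused when encoding data at inference time, otherwise the indicator columns will not line up.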

Let us know in more detail what kind of algorithm you are dealing with and how you're doing ML on AWS today, and I can give you some pointers.

answered 10 months ago

In addition to SageMaker Autopilot, which others have mentioned, I'd suggest trying SageMaker Canvas.

SageMaker Canvas offers an experience similar to, yet more powerful than, the previous-generation Amazon Machine Learning service.

answered 10 months ago

There are a couple of approaches that might be suitable; feel free to pick the one that best fits your use case.

  • SageMaker Canvas lets you do machine learning on datasets in a no-code fashion and performs this preprocessing for you. As described on that page, Canvas is suited for business analysts who may not have any coding experience.

  • SageMaker Autopilot provides AutoML: it takes care of feature engineering and hyperparameter optimization for you. Autopilot is suited for developers who want to use AutoML without doing the preprocessing themselves.

  • You can preprocess the input before using the prebuilt SageMaker algorithms; a basic example is shown here in the section "Data_Transformation". This approach is suited for data scientists who want fine-grained control over the transformation steps.
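To illustrate the kind of fine-grained control the third option gives, here is a small hypothetical encoder (not an AWS or scikit-learn API) that learns integer codes from training data, reserves code 0 for categories never seen during fitting, and can be saved to JSON so the identical mapping is applied at inference time. Integer codes imply an ordering, which is generally tolerable for tree-based models such as XGBoost but can mislead linear models, where one-hot encoding is usually the safer choice.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class CategoryEncoder:
    """Hypothetical encoder mapping category strings to integer codes.

    Codes 1..N are learned by fit(); code 0 is reserved for unseen values.
    """
    codes: Dict[str, int] = field(default_factory=dict)

    def fit(self, values: Iterable[str]) -> "CategoryEncoder":
        # Sorted order makes the mapping deterministic across runs.
        self.codes = {v: i for i, v in enumerate(sorted(set(values)), start=1)}
        return self

    def transform(self, values: Iterable[str]) -> List[int]:
        # Unknown categories fall back to 0 instead of raising an error.
        return [self.codes.get(v, 0) for v in values]

    def to_json(self) -> str:
        return json.dumps(self.codes, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "CategoryEncoder":
        return cls(codes=json.loads(text))


train_colors = ["red", "blue", "green", "blue"]
encoder = CategoryEncoder().fit(train_colors)
print(encoder.transform(train_colors))  # [3, 1, 2, 1]

# Persist alongside the model, then reload when preparing inference data.
restored = CategoryEncoder.from_json(encoder.to_json())
print(restored.transform(["green", "purple"]))  # [2, 0]
```

Fitting only on the training split and reusing the saved mapping keeps training and inference encodings consistent.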

In your context, it seems to me that Canvas or Autopilot might be most appropriate.

EXPERT
answered 10 months ago
