Splitting data into test and train


Please point me to a tutorial on splitting data into training and test sets. AWS is using SageMaker for machine learning for new customers. I am a beginner in machine learning.

Please shed some light on this.

2 Answers
answered a month ago

There are quite a few resources you can start with. One I recommend is the set of built-in examples in the SageMaker notebook, which you can access via https://[yournotebook].notebook.[yourregion].sagemaker.aws/tree#examples or the fourth tab of the notebook. You can also find good examples at https://github.com/aws/amazon-sagemaker-examples or https://sagemaker-examples.readthedocs.io/en/latest/

Generally speaking, you split the dataset into training and test sets with a standard ML library, for example train_test_split from sklearn.model_selection, as in https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-python-sdk/scikit_learn_randomforest/Sklearn_on_SageMaker_end2end.html#Prepare-data. Most of the examples require some Python pandas skills to clean and prepare your training data.
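To make the train_test_split suggestion concrete, here is a minimal sketch. The data is synthetic and made up purely for illustration, and the 80/20 ratio, the stratify option, and the random_state value are my own assumptions rather than anything SageMaker requires. The same call also accepts pandas DataFrames.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only: 100 rows, 3 features, binary label.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0, 1] * 50)

# Hold out 20% of the rows as the test set (an assumed ratio).
# stratify=y keeps the class ratio the same in both splits;
# random_state makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
```

You would train on X_train / y_train only and keep X_test / y_test aside until the final evaluation.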

Another option I can think of is Amazon SageMaker Data Wrangler, as described in https://aws.amazon.com/blogs/machine-learning/create-train-test-and-validation-splits-on-your-data-for-machine-learning-with-amazon-sagemaker-data-wrangler/
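If you would rather do a train, validation, and test split in code than in a visual tool, one common approach is to call scikit-learn's train_test_split twice. This is only a sketch on synthetic data; the 60/20/20 ratios are an assumption for illustration, and it is not how Data Wrangler itself implements its split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only: 100 rows, 2 features.
data = np.arange(200).reshape(100, 2)

# Step 1: set aside 20% of all rows as the test set.
remaining, test = train_test_split(data, test_size=0.2, random_state=0)

# Step 2: take 25% of the remaining 80 rows as the validation set,
# which is 20% of the original data, leaving 60% for training.
train, validation = train_test_split(remaining, test_size=0.25, random_state=0)

print(len(train), len(validation), len(test))
```

The validation set is used for tuning while training, and the test set is touched only once at the end.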

Hope that helps.

answered a month ago
