From what you have described, Amazon SageMaker is your best bet. It is a managed service (meaning you don't have to worry about managing the platform) and has built-in support for feature engineering (data processing) in PySpark through SageMaker Processing. You can find an example here. It natively integrates with S3, which lets you fetch data from S3 at runtime, and it saves model artefacts to an S3 bucket (you can specify the bucket and prefix). Commonly used frameworks and models such as XGBoost and scikit-learn are supported, and you can run your model training tasks with SageMaker managed training. Finally, to make the code production ready, you can use SageMaker Pipelines, SageMaker's MLOps tool for chaining these steps into a repeatable workflow and moving that workflow into your production environment.
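To make the managed-training and S3 points above more concrete, here is a minimal sketch of a script-mode training entry point. It assumes the SageMaker training toolkit conventions: inside the training container, `SM_MODEL_DIR` points at `/opt/ml/model` and `SM_CHANNEL_TRAIN` points at the local copy of whatever S3 data you passed under a channel named `train`; anything the script writes into the model directory is packaged as `model.tar.gz` and uploaded under the estimator's `output_path`. The least-squares fit is only a stand-in for XGBoost or scikit-learn, and the function names and local fallback paths are hypothetical.

```python
import argparse
import csv
import os
import pickle
from pathlib import Path


def load_rows(train_dir):
    """Read every CSV file in the training channel directory as (x, y) float pairs."""
    rows = []
    for path in sorted(Path(train_dir).glob("*.csv")):
        with open(path, newline="") as f:
            for x, y in csv.reader(f):
                rows.append((float(x), float(y)))
    return rows


def fit_line(rows):
    """Ordinary least squares for y = slope * x + intercept (stand-in for a real model)."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def train(train_dir, model_dir):
    """Fit the model and pickle it into model_dir; SageMaker uploads that directory to S3."""
    model = fit_line(load_rows(train_dir))
    os.makedirs(model_dir, exist_ok=True)
    model_path = os.path.join(model_dir, "model.pkl")
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    return model_path


def main():
    parser = argparse.ArgumentParser()
    # The training toolkit sets these variables; the fallbacks are hypothetical local paths.
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "model"))
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "data/train"))
    # Ignore any hyperparameters passed on the command line that this sketch does not use.
    args, _ = parser.parse_known_args()
    train(args.train, args.model_dir)


if __name__ == "__main__" and "SM_MODEL_DIR" in os.environ:
    # Only train automatically when launched inside a SageMaker training container.
    main()
```

Written this way, the same `train()` function can be unit-tested locally against a temporary directory before it runs as a managed training job.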
Thank you, I will investigate SageMaker Pipelines. Why would you not recommend Amazon EMR? Is it overkill for this use case?
If your intent is to train a machine learning model, SageMaker is a better fit because it provides tooling for every step of the model development lifecycle. You have access to features like SageMaker Model Monitor (to monitor your model in production), SageMaker Debugger, and several managed model inference options.
I'm starting to dig into SageMaker Pipelines and watch tutorial videos. Before I get too far, I want to ask: is it possible to create a highly custom modeling pipeline? For example, I have my own custom cross-validation class, in which hundreds of models are trained and tested, and my own functions that stop training when very specific custom criteria are met. I'd like to save this object with pickle, load it from S3 in production, and feed it the same set of features.
Is SageMaker Pipelines still the best route for me? Any suggestions would be appreciated, including if I should rethink how I'm doing things.
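A custom class like the one described can generally run inside a training script like any other Python code. The sketch below is a hypothetical, pure-Python stand-in: `CustomCrossValidator`, `stop_below`, and the ridge model are illustrative names, not part of any SageMaker API. It shows a k-fold search over candidates with a pluggable stopping rule, stores the feature names it was trained on so production input can be checked, and does a pickle round trip. One caveat: pickle stores classes and functions by reference (module plus name), so the production environment must be able to import the same class and stop-rule definitions, and a lambda used as a stop rule will fail to pickle. Only unpickle objects from sources you trust.

```python
import pickle


def ridge_through_origin(rows, alpha):
    """Closed-form fit of y ~ w * x with an L2 penalty alpha; returns the weight w."""
    sxy = sum(x * y for x, y in rows)
    sxx = sum(x * x for x, _ in rows)
    return sxy / (sxx + alpha)


def stop_below(history, threshold=0.05):
    """Hypothetical custom rule: stop once the latest candidate's CV error is below threshold."""
    return history[-1][1] < threshold


class CustomCrossValidator:
    """Hypothetical custom CV class: k-fold search over alphas with a pluggable stop rule."""

    def __init__(self, feature_names, alphas, n_folds=3, stop_rule=stop_below):
        self.feature_names = list(feature_names)  # feature contract production must honor
        self.alphas = alphas
        self.n_folds = n_folds
        self.stop_rule = stop_rule  # module-level function, so it pickles by reference
        self.history = []  # (alpha, mean validation error) for each candidate tried
        self.weight = None

    def _xy(self, record):
        (name,) = self.feature_names  # this sketch supports exactly one feature
        return record[name], record["target"]

    def fit(self, records):
        rows = [self._xy(r) for r in records]
        folds = [rows[i::self.n_folds] for i in range(self.n_folds)]
        for alpha in self.alphas:
            errors = []
            for k, valid in enumerate(folds):
                train = [row for j, fold in enumerate(folds) if j != k for row in fold]
                w = ridge_through_origin(train, alpha)
                errors.append(sum(abs(w * x - y) for x, y in valid) / len(valid))
            self.history.append((alpha, sum(errors) / len(errors)))
            if self.stop_rule(self.history):
                break  # custom early stopping of the candidate search
        best_alpha = min(self.history, key=lambda h: h[1])[0]
        self.weight = ridge_through_origin(rows, best_alpha)  # refit on all rows
        return self

    def predict(self, record):
        missing = set(self.feature_names) - set(record)
        if missing:
            raise KeyError(f"missing features: {sorted(missing)}")
        return self.weight * record[self.feature_names[0]]


if __name__ == "__main__":
    records = [{"x": float(i), "target": 2.0 * i} for i in range(1, 10)]
    cv = CustomCrossValidator(["x"], alphas=[0.0, 1.0, 10.0]).fit(records)
    blob = pickle.dumps(cv)  # in practice these bytes would be stored in S3
    restored = pickle.loads(blob)  # in production, bytes read back from S3
    print(restored.history, restored.predict({"x": 5.0}))
```

Keeping the stop rule as a named, importable function (rather than a lambda or closure) is what lets the fitted object survive the trip to production intact.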