How to read Delta Lake format data for SageMaker training

I want to read the Delta Lake data that Databricks writes out and format it for SageMaker training.

1 Answer

You have a couple of options:

  1. Amazon SageMaker Data Wrangler: You can use Databricks as a data source in SageMaker Data Wrangler. This lets you query the data stored in Databricks interactively using SQL and preview it before importing. Once the data is imported, you can cleanse it, engineer features, and prepare it for training. See this blog post for more information: https://aws.amazon.com/blogs/machine-learning/prepare-data-from-databricks-for-machine-learning-using-amazon-sagemaker-data-wrangler/
  2. AWS Glue: Assuming the Delta Lake tables are stored in S3, you can build a Glue job to read, transform, and prepare the data for SageMaker training. More info can be found here: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-delta-lake.html
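
For the Glue option, here is a rough sketch of what the read-and-format step might look like in PySpark. It is not taken from the linked docs. It assumes a SparkSession that already has Delta Lake support enabled (in Glue this is typically done with the `--datalake-formats` job parameter set to `delta`), and it assumes you are training with a built-in algorithm such as XGBoost that expects headerless CSV with the label in the first column. The function names, column names, and paths are all hypothetical.

```python
import csv
import io


def load_delta_table(spark, table_path):
    """Read a Delta Lake table into a Spark DataFrame.

    Assumption: `spark` is a SparkSession with Delta Lake support
    already enabled (for example, inside a Glue job configured for Delta).
    """
    return spark.read.format("delta").load(table_path)


def write_training_csv(df, label_column, feature_columns, output_path):
    """Write a Spark DataFrame as headerless CSV with the label first.

    Assumption: the training algorithm wants the target in column 0
    and no header row.
    """
    ordered = df.select([label_column] + list(feature_columns))
    ordered.write.mode("overwrite").csv(output_path, header=False)


def to_sagemaker_csv(rows, label_column, feature_columns):
    """Produce the same layout for small in-memory data (rows are dicts)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow([row[label_column]] + [row[c] for c in feature_columns])
    return buffer.getvalue()
```

In a job you would call `load_delta_table(spark, ...)` with the S3 path of your Delta table, apply whatever transformations you need, and then call `write_training_csv(...)` with an S3 output prefix that you pass to the SageMaker training job as its input channel. `to_sagemaker_csv` is only there to show the expected row layout on a handful of records without needing Spark.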
AWS
NZ
answered a year ago
