ETL using AWS Glue

0

I want to extract data from my PostgreSQL database on RDS using AWS Glue, transform the data, and export it to an S3 bucket. How do I do that? I need an AWS tutorial on this.

2 Answers
0

Here is a tutorial on how to extract data from PostgreSQL on RDS using AWS Glue, transform it, and export it to an S3 bucket, in 50 detailed steps (generated with ChatGPT):

  1. Log in to your AWS Management Console.
  2. Navigate to the AWS Glue service.
  3. Click on the "Crawlers" tab in the left sidebar.
  4. Click on the "Add crawler" button.
  5. Give your crawler a name.
  6. Select the "Data stores" option as your data source.
  7. Choose "JDBC" as your connection type.
  8. Enter the connection details for your PostgreSQL RDS instance.
  9. Click "Next."
  10. Select "Choose an existing IAM role" and pick the role that has permissions to access your RDS instance.
  11. Click "Next."
  12. Choose "Specified path in my account" as the crawler output.
  13. Enter a path in your S3 bucket where you want the crawler output to be stored.
  14. Click "Next."
  15. Select "Add database" and enter a name for your database.
  16. Click "Create an IAM role" and enter a name for your role.
  17. Click "Next."
  18. Select "No" for the option to add another data store.
  19. Click "Next."
  20. Select "Run on demand" as the frequency for your crawler.
  21. Click "Next."
  22. Review the details of your crawler and click "Finish."
  23. Wait for the crawler to finish running.
  24. Navigate to the "Jobs" tab in the left sidebar.
  25. Click "Add job."
  26. Give your job a name.
  27. Select the IAM role that has permissions to access your S3 bucket.
  28. Choose "Spark" as your ETL language.
  29. Select the output S3 bucket and folder where you want the transformed data to be stored.
  30. Click "Next."
  31. Choose "Use for transform" as your data source.
  32. Select the database and table that you want to extract data from.
  33. Click "Next."
  34. Choose "Change schema" as your transform type.
  35. Use the schema editor to modify the schema of your data as needed.
  36. Click "Next."
  37. Select the "Glue generated script" option for your script options.
  38. Click "Next."
  39. Review your job settings and click "Finish."
  40. Wait for your job to finish running.
  41. Navigate to the S3 bucket where your transformed data was stored.
  42. Verify that the data is in the correct format.
  43. Navigate back to the AWS Glue service.
  44. Click on the "Triggers" tab in the left sidebar.
  45. Click "Add trigger."
  46. Choose "On demand" as your trigger type.
  47. Give your trigger a name.
  48. Choose your job as the action.
  49. Click "Create trigger."
  50. Run your trigger to extract, transform, and export your data to the S3 bucket.

I hope this tutorial helps you extract data from PostgreSQL on RDS using AWS Glue, transform it, and export it to an S3 bucket. If you have any questions or need further assistance, please let me know.
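For reference, the "Glue generated script" option above produces a PySpark script. A minimal hand-written sketch of the same extract–transform–load flow might look like the following; every database, table, column, and bucket name here is a placeholder, not something from the walkthrough:

```python
import sys

# Column mappings for the "Change schema" step:
# (source column, source type, target column, target type).
# These column names are hypothetical examples.
FIELD_MAPPINGS = [
    ("id", "int", "id", "int"),
    ("customer_name", "string", "customer_name", "string"),
    ("created_at", "timestamp", "created_date", "date"),
]

def main():
    # awsglue and pyspark are only available inside the Glue job runtime.
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Extract: read the table the crawler added to the Data Catalog.
    source = glue_context.create_dynamic_frame.from_catalog(
        database="my_postgres_db",   # placeholder catalog database
        table_name="public_orders",  # placeholder catalog table
    )

    # Transform: rename/cast columns ("Change schema").
    mapped = ApplyMapping.apply(frame=source, mappings=FIELD_MAPPINGS)

    # Load: write the result to S3 as Parquet (placeholder path).
    glue_context.write_dynamic_frame.from_options(
        frame=mapped,
        connection_type="s3",
        connection_options={"path": "s3://my-bucket/transformed/"},
        format="parquet",
    )
    job.commit()

try:
    main()
except ImportError:
    # Running outside the Glue runtime; nothing to do locally.
    pass
```

In a real job you would paste or upload a script like this in the job's script editor; Glue supplies the `--JOB_NAME` argument at run time.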

For a more detailed walkthrough of these steps, including screenshots and code samples, you can follow this AWS tutorial: https://docs.aws.amazon.com/glue/latest/dg/getting-started.html

ZV
answered a year ago
0

Hello, it is not difficult, but it takes a few steps with AWS Glue:

  1. Create a Glue crawler to crawl the RDS database
  2. Create an ETL job to transform the data and write the transformed data to S3
  3. Set up an RDS connection
  4. Set up an IAM role for the crawler and the ETL job so they can write data to S3

I have a tutorial on this, written in CDK, with the code on GitHub.

hai
answered a year ago
