Need inputs on Glue to move data from Aurora MySQL to Redshift

0

We are planning for a POC with AWS Glue between two different environments. In our case, Aurora MySQL (Read Instance) is Source and Redshift is Target.

Note: This involves only extract and load operation, it doesn't involve any transformation.

We have following clarification/details needed:

  1. How to create source connection for MySQL read replica?
  2. What are the configuration/services(other than AWS Glue) required in AWS, for reading from MySQL read replica?
  3. Our read replica instance is already used by one of the microservice. Will it have any impact if same instance is used by AWS Glue?
  4. Will AWS Glue expects to create any tables or log files in Source ?
  5. Any configuration changes needed on Source for Glue to Read MySQL and Load to Redshift?
  6. Is it possible to connect source read replica using JDBC URL?
  7. Does AWS Glue detect incremental changes from Source (MySQL)?
  8. Does AWS Glue lock the table in Source MySQL Aurora while reading incremental changes?

Thanks in advance

1 Answer
0

Hello,

If you are not performing any ETL actions then you can alternatively use the below to move your data

  • AWS DMS with source as Aurora and target as Redshift
  • AWS DataPipeline RDS to Redshift table copy template shown here

1. How to create source connection for MySQL read replica?

AWS Glue console -> connections -> Add connection -> Choose RDS -> Choose Amazon aurora -> Click next -> Under "Instance" you can choose your read replica instance

2. What are the configuration/services(other than AWS Glue) required in AWS, for reading from MySQL read replica?

If you are using AWS Glue to read from the Aurora replica. All you need is a Glue connection. Please ensure your networking is setup as shown here or here

3. Our read replica instance is already used by one of the microservice. Will it have any impact if same instance is used by AWS Glue?

This entirely depends on your Glue job which is reading from the read replica. If the job pulls lot of data which can increase the traffic, then your database instance must be properly sized. Please refer here and here for more info on Aurora mysql best practises

4. Will AWS Glue expects to create any tables or log files in Source ?

When reading data from your Aurora database using AWS Glue, it simply uses the JDBC utility to connect and read the data. This should not create any log files or tables in your source. You can refer this doc for more info on the logging options available in AWS Glue ETL jobs

5. Any configuration changes needed on Source for Glue to Read MySQL and Load to Redshift?

There is no additional configuration required at your source

6. Is it possible to connect source read replica using JDBC URL?

Yes.

AWS Glue console -> connections -> Add connection -> Choose JDBC as connection type -> Click next and provide your jdbc url in the below format

jdbc:mysql://<YOUR READ REPLICA ENDPOINT>:3306/<DBNAME>

7. Does AWS Glue detect incremental changes from Source (MySQL)?

You can enable "Job bookmarks" on your AWS Glue ETL job. This feature has the ability to track the data which was already processed by the job. Please refer to this doc for more info

8. Does AWS Glue lock the table in Source MySQL Aurora while reading incremental changes?

AWS Glue uses the JDBC driver to connect and issues select queries on the table to read the data. So, it will hold a read lock on the table.

AWS
SUPPORT ENGINEER
answered 2 years ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions