I want to set up an AWS Glue job that moves data between two JDBC data stores. However, the data stores are in different AWS accounts.
To allow data movement between AWS Glue data stores in different AWS accounts, you must set up a cross-account AWS Glue connection.
To set up a cross-account AWS Glue connection, use one of the following methods:
Using VPC peering
If you use an Amazon Relational Database Service (Amazon RDS) DB instance in a private setup, then use VPC peering to set up the cross-account connection.
Note: If you specify the hostname of the cross-account JDBC database in the JDBC URL, then turn on DNS resolution for the VPC peering connection. If you don't turn on this option, then the AWS Glue connection fails because it can't resolve the hostname. If you pass only the private IP address, then the connection doesn't depend on DNS resolution and doesn't fail for this reason. For more information, see Turn on DNS resolution for a VPC peering connection.
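To illustrate the note above, the following sketch reports whether the host in a JDBC URL is a literal IPv4 address or a hostname that needs DNS resolution over the peering connection. The function names `jdbc_host` and `jdbc_host_kind` are hypothetical helpers, not part of any AWS tool. The check is deliberately rough and assumes a simple URL of the form jdbc:engine://host:port/database.

```shell
# Print the host portion of a simple JDBC URL (jdbc:<engine>://host[:port]/db).
jdbc_host() {
  rest=${1#*://}                # drop everything up to and including "://"
  rest=${rest%%/*}              # drop the path (database name)
  printf '%s\n' "${rest%%:*}"   # drop the port, if any
}

# Print "ip" for a dotted IPv4-style literal, otherwise "hostname".
# Rough check only: it doesn't validate octet ranges.
jdbc_host_kind() {
  host=$(jdbc_host "$1")
  case $host in
    *[!0-9.]*) echo hostname ;;   # contains something other than digits and dots
    *.*.*.*)   echo ip ;;
    *)         echo hostname ;;
  esac
}
```

For example, `jdbc_host_kind 'jdbc:mysql://10.0.1.25:3306/sales'` prints `ip`. A URL that uses a database endpoint name prints `hostname`, which means that DNS resolution must be turned on for the peering connection.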
1. Create VPC peering between account A and account B.
- Prepare the VPCs - The VPCs can belong to the same AWS account or to different AWS accounts, and can be in the same AWS Region or in different Regions. The VPCs must have IPv4 CIDR blocks that don't overlap.
- Request a VPC peering connection - Open the VPC dashboard from the VPC in account B that you want to peer with the VPC in account A. Choose Peering Connections, and then choose Create VPC Peering Connection. Choose account A's VPC, and then configure the VPC peering connection.
- Accept the peering request - The owner of account A receives a peering request email notification. To accept the request, the owner of account A must log in to the account, navigate to the VPC dashboard, and then accept the request under Peering Connections.
- Add a route to the peer VPC - After establishing the peering connection, add a route to the peer VPC in the route tables for the subnets in your VPC. This route specifies the IP address range of the peer VPC.
- Test the connection - To test the connection, launch an instance in each VPC. Verify that the instances can communicate with each other using their private IP addresses.
Note: To secure the VPC peering connection, use network access control lists (ACLs) or security groups to restrict traffic between VPCs. Also, the VPC peering connection doesn't allow instances in different VPCs to use public IP addresses to communicate with each other.
2. Create your AWS Glue connection. In AWS Glue Studio, choose Create Connection. Add all of the required Connection properties and Connection access details, and then choose Create connection.
3. In the Amazon RDS security group, add an inbound rule that allows traffic on the database port from the IPv4 CIDR of your AWS Glue subnet.
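If you prefer the AWS CLI to the console, the peering and security group steps above can be sketched as follows. This is a minimal sketch, not the only way to do it. The `run` wrapper, the function names, and every resource ID that you pass in are hypothetical. By default, the functions only print the command so that you can review it. Set DRY_RUN=0 to run the command with the credentials of the account that owns the resource.

```shell
# Print each AWS CLI command instead of running it, unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "$*"; fi
}

# Requester account: ask to peer its VPC with the VPC in the other account.
request_peering() {    # args: requester-vpc-id accepter-vpc-id accepter-account-id
  run aws ec2 create-vpc-peering-connection \
    --vpc-id "$1" --peer-vpc-id "$2" --peer-owner-id "$3"
}

# Accepter account: accept the pending request.
accept_peering() {     # args: peering-connection-id
  run aws ec2 accept-vpc-peering-connection --vpc-peering-connection-id "$1"
}

# Each account: route the peer VPC's CIDR through the peering connection.
add_peer_route() {     # args: route-table-id peer-vpc-cidr peering-connection-id
  run aws ec2 create-route --route-table-id "$1" \
    --destination-cidr-block "$2" --vpc-peering-connection-id "$3"
}

# Each account sets DNS resolution for its own side: "requester" or "accepter".
enable_peer_dns() {    # args: peering-connection-id side
  case $2 in
    requester|accepter) ;;
    *) echo "side must be requester or accepter" >&2; return 1 ;;
  esac
  run aws ec2 modify-vpc-peering-connection-options \
    --vpc-peering-connection-id "$1" \
    "--$2-peering-connection-options" AllowDnsResolutionFromRemoteVpc=true
}

# Database account: allow the AWS Glue subnet CIDR on the database port.
allow_glue_subnet() {  # args: rds-security-group-id port glue-subnet-cidr
  run aws ec2 authorize-security-group-ingress --group-id "$1" \
    --protocol tcp --port "$2" --cidr "$3"
}
```

For example, `allow_glue_subnet sg-0123 3306 10.0.1.0/24` prints the authorize-security-group-ingress command for a placeholder security group and subnet range.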
Using a NAT gateway
Use this method to connect to and read from a JDBC resource that's publicly accessible and has a public IPv4 address. To do this, create the JDBC connection in an AWS Glue private subnet in account A that routes its traffic through a NAT gateway.
Note: For traffic from the VPC to reach the data source, the NAT gateway must be in a public subnet whose route table sends internet-bound traffic to an internet gateway.
1. Create your AWS Glue connection. In AWS Glue Studio, choose Create Connection. Add all the required Connection properties and Connection access details, and then choose Create connection.
2. In the inbound rules of the database security group, allow the Elastic IP address of account A's NAT gateway on the database's port. For Amazon RDS for MySQL, use 3306. For Amazon Redshift, use 5439. For Amazon RDS for PostgreSQL, use 5432.
3. After this setup is complete for account A and account B, test the connection. If the connection is successful, then run your extract, transform, load (ETL) jobs in AWS Glue.
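The inbound rule from step 2 can be sketched as a small helper that looks up the port and builds the AWS CLI command. The function names and the engine labels (`mysql` for the 3306 port that the steps list for Amazon RDS, `redshift`, and `postgresql`) are hypothetical. The security group ID and Elastic IP address are placeholders that you supply. The /32 suffix limits the rule to the single NAT gateway address.

```shell
# Ports named in the steps above, keyed by an illustrative engine label.
default_port() {
  case $1 in
    mysql)      echo 3306 ;;
    redshift)   echo 5439 ;;
    postgresql) echo 5432 ;;
    *) echo "unknown engine: $1" >&2; return 1 ;;
  esac
}

# Print the command that admits only the NAT gateway's Elastic IP address.
nat_ingress_command() {  # args: db-security-group-id engine nat-elastic-ip
  port=$(default_port "$2") || return 1
  echo "aws ec2 authorize-security-group-ingress --group-id $1 --protocol tcp --port $port --cidr $3/32"
}
```

Review the printed command, and then run it in the account that owns the database security group. If your database listens on a custom port, then use that port instead of the value from the lookup.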
Check if you can access the JDBC data source
Check if you can access the JDBC data source from the AWS Glue connection's subnet. Launch an Amazon Elastic Compute Cloud (Amazon EC2) instance with SSH access in the same subnet and with the same security groups that you use in your connection. Then, connect to the instance using SSH, and run the following commands to test connectivity. Replace hostname and port with the endpoint and port of your database.
$ dig hostname
$ nc -zv hostname port
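To avoid typing the endpoint by hand, you can derive both commands from the JDBC URL that your AWS Glue connection uses. The following sketch only prints the commands. The function name `connectivity_checks` is hypothetical, and the sketch assumes a URL of the form jdbc:engine://host:port/database that includes the port.

```shell
# Print the dig and nc checks for a JDBC URL that includes host and port.
connectivity_checks() {
  endpoint=${1#*://}          # host:port/db
  endpoint=${endpoint%%/*}    # host:port
  case $endpoint in
    *:*) ;;
    *) echo "no port in URL: $1" >&2; return 1 ;;
  esac
  echo "dig ${endpoint%%:*}"
  echo "nc -zv ${endpoint%%:*} ${endpoint##*:}"
}
```

On the EC2 instance, review the output, and then pipe it to sh to run both checks in order.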
Connection types and options for ETL in AWS Glue
Connecting to a JDBC data store in a VPC