AWS JDBC Driver for Aurora MySQL, internal Reader instance selection

0

Hi, interested in learning more about JDBC Aurora MySQL driver as documented here: https://aws.amazon.com/blogs/database/improve-application-availability-with-the-aws-jdbc-driver-for-amazon-aurora-mysql/ https://github.com/awslabs/aws-mysql-jdbc#readme

Had a couple quick qs to confirm (which didn't see details anywhere):

  1. Is there similar support/comparable drivers for other programming languages, other than Java? E.g. C++, etc.
  2. For Read requests, how does it use/weighs factors to pick a vaible read replica to route to, e.g. any use of:
  • number of connections
  • internal latencies or replication lag measurements
  • current health/connection-state of instances
  • any others?
  1. When calling the driver's getConnection(), can we pass in a list of potential Reader instance hosts rather than a single one (at a glance it looks like in github's java code, the getConnection(String url...) ends up internally calling getConnectionUrlInstance(url..) which looks like it indeed supports multiple hosts?).
  2. I also noticed this 'aws advanced jdbc wrapper' in awslab's official depots, is this related: https://github.com/awslabs/aws-advanced-jdbc-wrapper Thanks!
asked a year ago1027 views
1 Answer
0

Hello There,

I understand that you are interested in learning more about JDBC Aurora MySQL driver and looks like you have few questions regarding the same.

Let me shed some light on the questions/concerns one-by-one below:

  1. Is there similar support/comparable drivers for other programming languages, other than Java? E.g. C++, etc.
  • From the mentioned documentation by you, we already got to know that the AWS Database Drivers are designed to reduce application downtime and avoid DNS propagation issues. However, while going through the below article[1] AWS database drivers for MySQL and PostgreSQL uses only the open source JDBC Driver for MySQL and PostgreSQL respectively which is built for java applications.

Hence, this would mean that currently there is no similar/comparable drivers for other programming languages. However, in the same below article it is mentioned that If your clients don’t use the JDBC interface, you can still use the driver’s failover handling logic as inspiration for building similar functionality into your application.

  • The replication status tables for learning the cluster topology.
  • The innodb_read_only variable for determining instance role (writer or reader).
  • Health check queries with a timeout that’s short enough to detect issues quickly, but not short enough to cause false negative outcomes.

You can check out the below documentation/article for more information: [1] : https://aws.amazon.com/blogs/database/improve-application-availability-on-amazon-aurora/

  1. For Read requests, how does it use/weighs factors to pick a viable read replica to route to, e.g. any use of:
  • Here, I would like to inform you that by using the reader endpoint of the cluster. This roles out queries to each reader instance in a round robin fashion, it does not balance by resource intensity.

For load balancing of your read traffic, it is recommended to leverage some kind of client side load balancing solutions. I have linked some reference implementation for the same below-

[+] : How can I distribute read requests across multiple Amazon RDS read replicas? - https://repost.aws/knowledge-center/requests-rds-read-replicas

  1. When calling the driver's getConnection(), can we pass in a list of potential Reader instance hosts rather than a single one.
  • When you use JDBC driver for creating a connection via the getConnection() in your code you might also be aware that JDBC uses a connection string in the following format.

JDBC format - jdbc:driver://hostname:port/dbName?user=userName&password=password

And from the above format we can see that we will only be able to pass a single DB Name which is associated with its respective user name and password and might run into errors if it is given with multiple DB names.

[+] : More about JDBC driver in Amazon RDS - https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/java-rds.html#java-rds-drivers

  1. I also noticed this 'aws advanced jdbc wrapper' in awslab's official depots - https://github.com/awslabs/aws-advanced-jdbc-wrapper
  • Here, I was able to go through the above Github repository and I could see that it is actually from official verified AWS labs and it is quite related to Amazon Web Services JDBC Driver.

  • As mentioned in the repo itself, The Amazon Web Services JDBC Driver has been redesigned as an advanced JDBC wrapper. And This wrapper is complementary to and extends the functionality of an existing JDBC driver to help an application take advantage of the features of clustered databases such as Amazon Aurora.

You can refer the same repository for more information. However note that MySQL support is experimental at the moment, the aws-advanced-jdbc-wrapper is compatible with MySQL 5.7 and MySQL 8.0 as per the Community MySQL Connector/J 8.0 Driver.

I hope that the above information was quite helpful. Have a great day ahead!

AWS
answered a year ago
  • Thanks for the reply and info! This is very useful so far.

    A quick follow up quick confirmation, so when calling the standard C++ sockets' getaddrinfo("exampleUrl-myAuroraReaderEndpointEtc.region.rds.amazonaws.com".. ) I notice that sometimes (typically after a few seconds) I may get a different IP address returned. (... Similarly if I ping that endpoint URL over several seconds, every now and then the IP address seen changes).

    Is this IP address, returned by getaddrinfo() the IP address to actual Reader instance/node itself (e.g. reader1, 2, 3..), or this IP address from getaddrinfo() an IP address of the clusters single Reader endpoint (and somehow the DNS is internally changing to different IPs for the cluster Reader endpoint)?

    Offhand it does appear the number of these IP addresses that this can change to, does lines up with the number of direct individual Reader instance/hosts I've configured for the Aurora cluster. Thanks

  • Hello There,

    I am glad to hear that the above information was helpful for you. And I can see that you have a quick follow up question as well.

    Coming to your concern, in which you are calling the standard C++ socket via the function “getaddrinfo("exampleUrl-myAuroraReaderEndpointEtc.region.rds.amazonaws.com".. )” you have noticed that it is returned with different IP address typically after few seconds.

    Here, as correctly stated by you the IP address which is returned by the above function is indeed the IP address of individual read replicas present in the RDS/Aurora cluster. Moreover, to confirm the same you can run the following command against your individual reader endpoints which would provide you the IP address of your corresponding read replica.

    $ dig <RDS-reader-endpoint>

    Once you have the IP address for your individual Read Replica(s) you can cross-check the same from the output returned from the above function getaddrinfo("exampleUrl-myAuroraReaderEndpointEtc.region.rds.amazonaws.com".. ) if the IP address indeed belong to one of your Aurora read replica.

    That being said, as mentioned previously the Aurora cluster roles out queries/requests to each reader instance in a round robin fashion, it does not balance by resource intensity which explains the behaviour you are observing where different IP address is being fetched after every few seconds.

    I hope the above information was resourceful. Have a great day ahead!

  • A couple follow up questions. I saw this doc also: https://docs.aws.amazon.com/whitepapers/latest/amazon-aurora-mysql-db-admin-handbook/using-smart-drivers.html In the JDBC driver's source, I see it makes calls to the 'information_schema.replica_host_status' table which Aurora saves info to, which I see described in that doc.

    • Question 5: Is there any data on how up-to-date the info in this table is, if we query the table, compared to slower DNS propagation. Is this the best way to get the most up to date info on the cluster's instances (sounds like it may be)
    • Question 6: Given this information_schema.replica_host_status is Aurora specific, if we also based our similar customized smart driver on querying this table, is there assurance the format/existence of this table or its current contents won't be removed/renamed, going forward? (additions are ok of course)
  • Question 7: Inside the JDBC open source driver code, I see that it uses the information_schema.replica_host_status table to get the cluster endpoints' host name prefixes, e.g. the SQL query SELECT * from information_schema.replica_host_status may return SQL result response like:

    0: SERVER_ID(0)='myClusterIdEtc-0' SESSION_ID(1)='MASTER_SESSION_ID'  ...
    
    1: SERVER_ID(0)='myClusterIdEtc-1' SESSION_ID(1)='abc1..' ..
    
    2: SERVER_ID(0)='myClusterIdEtc-2' SESSION_ID(1)='abc2..' .. 
    
    3: SERVER_ID(0)='myClusterIdEtc-3' SESSION_ID(1)='abc3..' ..
    

    (NOTE: no .com suffix) Based on the driver's impln, readme (see https://github.com/awslabs/aws-mysql-jdbc/blob/main/README.md) & above docs, just to confirm (when making our own similar driver): Q: we can tell the "myClusterIdEtc-0" above is the writer instance, by the MASTER_SESSION_ID, correct?

    Q: The reader/replica instance DNS addresses would have the SAME suffix as the cluster endpoint, starting from the 'XYZ' correct, i.e.: like given String CONNECTION_STRING = "jdbc:mysql:aws://db-identifier.cluster-XYZ.us-east-2.rds.amazonaws.com:3306/employees"; then:

    Writer: myClusterIdEtc-0.XYZ.us-east-2.rds.amazonaws.com

    Reader1: myClusterIdEtc-1.XYZ.us-east-2.rds.amazonaws.com

    Reader2: myClusterIdEtc-1.XYZ.us-east-2.rds.amazonaws.com

    Reader3: myClusterIdEtc-1.XYZ.us-east-2.rds.amazonaws.com Correct? (q: By the way is there an official term for the 'XYZ' substring within these DNS names generated by AWS?) Thanks

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions