Forwarded connection on Writer terminated; try restarting transaction

0

Aws Aurora reader instance with write forwarding randomly throws an exception with message "Forwarded connection on Writer terminated; try restarting transaction" What does it mean, and how I should handle it?

gefragt vor 7 Monaten321 Aufrufe
2 Antworten
0

I have verified documentation and error “Forwarded connection on Writer terminated; try restarting transaction” occurs when "Session on master/primary gets timed out or terminated (e.g. killed)”

[+] https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-global-database-write-forwarding.html#aurora-global-database-write-forwarding-isolation

Please check your application transactions workload/concurrency (or) try changing the values of the aurora_fbd_master_idle_timeout (or) aurora_fwd_writer_idle_timeout parameter to make it a little larger, for example: from 60 to 600.

[+] https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-global-database-write-forwarding.html#aurora-global-database-write-forwarding-params

Please test these parameters in the Cluster Parameter Group of your testing environment before modifying it on production. Since it is a dynamic parameter, there is no need to restart the DB instance.

I was able to reproduce the issue in my lab environment well.

++++++++++
On Replica:
++++++++++

mysql> use testnew;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed

mysql> set aurora_replica_read_consistency = 'eventual';
Query OK, 0 rows affected (0.00 sec)

mysql> begin; insert into t1 values(1); select sleep(20);

====> Now, on master, kill the corresponding forwarded session. 

Query OK, 0 rows affected (0.01 sec)
Query OK, 1 row affected (0.02 sec)
+-----------+
| sleep(20) |
+-----------+
|         0 |
+-----------+
1 row in set (20.00 sec)

mysql> insert into t1 values(1);
ERROR 63982 (HY000): Forwarded connection on Writer terminated; try restarting transaction
mysql> 

++++++++++
On Master:
++++++++++

mysql> show full processlist;
+------+-----------------+--------------------+------+---------+-------+------------------------+-----------------------+
| Id   | User            | Host               | db   | Command | Time  | State                  | Info                  |
+------+-----------------+--------------------+------+---------+-------+------------------------+-----------------------+
|    5 | event_scheduler | localhost          | NULL | Daemon  | 15908 | Waiting on empty queue | NULL                  |
|   11 | rdsadmin        | localhost          | NULL | Sleep   |     0 |                        | NULL                  |
|   12 | rdsadmin        | localhost          | NULL | Sleep   |     0 |                        | NULL                  |
|   19 | rdsadmin        | localhost          | NULL | Sleep   |     0 |                        | NULL                  |
|   24 | rdsadmin        | localhost          | NULL | Sleep   |   182 |                        | NULL                  |
| 1387 | rdsadmin        | localhost          | NULL | Sleep   |     6 |                        | NULL                  |
| 1399 | admin           | 172.oo.oo.45:3oo36 | NULL | Query   |     0 | init                   | show full processlist |
+------+-----------------+--------------------+------+---------+-------+------------------------+-----------------------+
7 rows in set (0.00 sec)

mysql> show full processlist;
+------+-----------------+---------------------+---------+---------+-------+------------------------+-----------------------+
| Id   | User            | Host                | db      | Command | Time  | State                  | Info                  |
+------+-----------------+---------------------+---------+---------+-------+------------------------+-----------------------+
|    5 | event_scheduler | localhost           | NULL    | Daemon  | 15923 | Waiting on empty queue | NULL                  |
|   11 | rdsadmin        | localhost           | NULL    | Sleep   |     1 |                        | NULL                  |
|   12 | rdsadmin        | localhost           | NULL    | Sleep   |     0 |                        | NULL                  |
|   19 | rdsadmin        | localhost           | NULL    | Sleep   |     5 |                        | NULL                  |
|   24 | rdsadmin        | localhost           | NULL    | Sleep   |   197 |                        | NULL                  |
| 1387 | rdsadmin        | localhost           | NULL    | Sleep   |    12 |                        | NULL                  |
| 1399 | admin           | 172.oo.oo.45:3oo36  | NULL    | Query   |     0 | init                   | show full processlist |
| 1407 | admin           | 172.oo.oo.183:5oo70 | testnew | Sleep   |     4 | NULL                   | NULL                  |
+------+-----------------+---------------------+---------+---------+-------+------------------------+-----------------------+
8 rows in set (0.00 sec)

mysql> show full processlist;
+------+-----------------+---------------------+---------+---------+-------+------------------------+-----------------------+
| Id   | User            | Host                | db      | Command | Time  | State                  | Info                  |
+------+-----------------+---------------------+---------+---------+-------+------------------------+-----------------------+
|    5 | event_scheduler | localhost           | NULL    | Daemon  | 15927 | Waiting on empty queue | NULL                  |
|   11 | rdsadmin        | localhost           | NULL    | Sleep   |     1 |                        | NULL                  |
|   12 | rdsadmin        | localhost           | NULL    | Sleep   |     0 |                        | NULL                  |
|   19 | rdsadmin        | localhost           | NULL    | Sleep   |     9 |                        | NULL                  |
|   24 | rdsadmin        | localhost           | NULL    | Sleep   |   201 |                        | NULL                  |
| 1387 | rdsadmin        | localhost           | NULL    | Sleep   |     1 |                        | NULL                  |
| 1399 | admin           | 172.oo.oo.45:3oo36  | NULL    | Query   |     0 | init                   | show full processlist |
| 1407 | admin           | 172.oo.oo.183:5oo70 | testnew | Sleep   |     8 | NULL                   | NULL                  |
+------+-----------------+---------------------+---------+---------+-------+------------------------+-----------------------+
8 rows in set (0.00 sec)

mysql> show full processlist;
+------+-----------------+---------------------+---------+---------+-------+------------------------+-----------------------+
| Id   | User            | Host                | db      | Command | Time  | State                  | Info                  |
+------+-----------------+---------------------+---------+---------+-------+------------------------+-----------------------+
|    5 | event_scheduler | localhost           | NULL    | Daemon  | 15929 | Waiting on empty queue | NULL                  |
|   11 | rdsadmin        | localhost           | NULL    | Sleep   |     1 |                        | NULL                  |
|   12 | rdsadmin        | localhost           | NULL    | Sleep   |     0 |                        | NULL                  |
|   19 | rdsadmin        | localhost           | NULL    | Sleep   |     1 |                        | NULL                  |
|   24 | rdsadmin        | localhost           | NULL    | Sleep   |   203 |                        | NULL                  |
| 1387 | rdsadmin        | localhost           | NULL    | Sleep   |     3 |                        | NULL                  |
| 1399 | admin           | 172.oo.oo.45:3oo36  | NULL    | Query   |     0 | init                   | show full processlist |
| 1407 | admin           | 172.oo.oo.183:5oo70 | testnew | Sleep   |    10 | NULL                   | NULL                  |
+------+-----------------+---------------------+---------+---------+-------+------------------------+-----------------------+
8 rows in set (0.00 sec)

mysql> CALL mysql.rds_kill(1407);
Query OK, 0 rows affected (0.00 sec)

mysql> 
AWS
SUPPORT-TECHNIKER
beantwortet vor 7 Monaten
0

My team experienced this error recently and I hope that this may help others facing the same issue. We have a web app deployed in 2 AWS regions, each connected to its own DB instance of Aurora Global database with the first being the primary read/write database and the second being the secondary read-only database with write forwarding enabled to the primary instance.

In the web application, SQL connection pooling is enabled. This is common for web apps written in managed or scripted languages such as C#, Java, or Python when connecting to a SQL backend.

We observed that write-forwarded SQL connections from the secondary web app to the secondary database instance were failing often with the error message "Forwarded connection on Writer terminated; try restarting transaction". After some troubleshooting and investigation, we found that this was related to the SQL connection pool recycle/timeout value in the web app and the aurora_fwd_writer_idle_timeout value referenced above.

When the SQL connection pool recycle/timeout value is greater than the aurora_fwd_writer_idle_timeout value, then the SQL connection from the secondary web app to the secondary database instance may remain open even though its associated write-forwarded connection from the secondary database instance to the primary database instance has been closed due to the forwarded writer idle timeout. The problem will look like this:

  1. A client makes an API call to the secondary web server
  2. The web server creates a new SQL connection to the secondary database instance with write-forwarding enabled
  3. The secondary database instance creates an associated SQL connection to the primary database instance to forward SQL writes
  4. After the web server is done using the SQL connection to handle this request, it's returned to the SQL connection pool for re-use
  5. Some time greater than 'aurora_fwd_writer_idle_timeout' passes, and the primary database instance closes the write-forwarded connection from the secondary database instance
  6. The secondary web server receives a new API call, and uses a connection from the SQL connection pool to connect to the secondary database instance
  7. At this point, the SQL statement fails because the associated write-forwarded SQL connection was already closed on the secondary database instance

To solve this issue, you simply need to configure the aurora_fwd_writer_idle_timeout value on the primary database instance to be greater than your SQL connection pool recycle/timeout value. After we updated our configuration in this manner the issue stopped occurring and everything worked as expected.

beantwortet vor 8 Tagen

Du bist nicht angemeldet. Anmelden um eine Antwort zu veröffentlichen.

Eine gute Antwort beantwortet die Frage klar, gibt konstruktives Feedback und fördert die berufliche Weiterentwicklung des Fragenstellers.

Richtlinien für die Beantwortung von Fragen