URGENT: UUID is not being set for connected slave; replication fails

0

This is causing two of my slaves to have a blank UUID field and as such they are considered a duplicate. You can see one of the blanks in the 'show slave hosts' output here.

mysql> show slave hosts;
+------------+------+------+-----------+--------------------------------------+
| Server_id  | Host | Port | Master_id | Slave_UUID                           |
+------------+------+------+-----------+--------------------------------------+
|  651168699 |      | 3306 | 706684217 | 85bddaad-2f97-11e7-9b3c-02b50d934997 |
| 1110976164 |      | 3306 | 706684217 | 172fc2bb-7322-11e7-bc2a-067f0dd1e340 |
|  491358223 |      | 3306 | 706684217 |                                      |
+------------+------+------+-----------+--------------------------------------+
3 rows in set (0.03 sec)

There is a 4th slave that gets an error because, when it connects, it also gets a blank UUID and is therefore considered a duplicate host.

Here is the list of server_uuid/server_id values from my group.

name     server_uuid                            server_id
master   9cd972d1-2f78-11e7-8f84-069be90e42fb   706684217
shard1   dc911309-2f96-11e7-b0d6-06be9eee759d   491358223
shard2   85bddaad-2f97-11e7-9b3c-02b50d934997   651168699
shard3   87f7b13a-52d8-11e7-b8fe-065087710422   1076311080
shard4   172fc2bb-7322-11e7-bc2a-067f0dd1e340   1110976164

shard1 and shard3 connect but always show a blank UUID in the master's 'show slave hosts' output. shard2 and shard4 behave normally.
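To make the comparison between the master's view and the shard list less error-prone, it could be scripted. The following is only a sketch in Python: the helper names (`INVENTORY`, `parse_slave_hosts`, `classify`) are hypothetical, it does not connect to any database, and it assumes the 'show slave hosts' output is pasted in as text exactly as the mysql client prints it.

```python
# Sketch only: INVENTORY, parse_slave_hosts and classify are hypothetical
# helpers for illustration, not part of MySQL or RDS.

# server_id -> (name, server_uuid), copied from the list in the post.
INVENTORY = {
    491358223: ("shard1", "dc911309-2f96-11e7-b0d6-06be9eee759d"),
    651168699: ("shard2", "85bddaad-2f97-11e7-9b3c-02b50d934997"),
    1076311080: ("shard3", "87f7b13a-52d8-11e7-b8fe-065087710422"),
    1110976164: ("shard4", "172fc2bb-7322-11e7-bc2a-067f0dd1e340"),
}

# 'show slave hosts' output as pasted from the master.
SAMPLE = """\
+------------+------+------+-----------+--------------------------------------+
| Server_id  | Host | Port | Master_id | Slave_UUID                           |
+------------+------+------+-----------+--------------------------------------+
|  651168699 |      | 3306 | 706684217 | 85bddaad-2f97-11e7-9b3c-02b50d934997 |
| 1110976164 |      | 3306 | 706684217 | 172fc2bb-7322-11e7-bc2a-067f0dd1e340 |
|  491358223 |      | 3306 | 706684217 |                                      |
+------------+------+------+-----------+--------------------------------------+
"""


def parse_slave_hosts(text):
    """Return {server_id: slave_uuid} parsed from mysql client table output."""
    rows = {}
    for line in text.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # skip borders, prompts, and the row-count line
        cells = [c.strip() for c in line[1:-1].split("|")]
        if len(cells) != 5 or not cells[0].isdigit():
            continue  # skip the header row
        rows[int(cells[0])] = cells[4]
    return rows


def classify(rows):
    """Return {shard name: state} by comparing the master's view to INVENTORY."""
    status = {}
    for server_id, (name, uuid) in INVENTORY.items():
        if server_id not in rows:
            status[name] = "not listed"
        elif rows[server_id] == "":
            status[name] = "blank"
        elif rows[server_id] == uuid:
            status[name] = "ok"
        else:
            status[name] = "mismatch"
    return status


if __name__ == "__main__":
    for name, state in classify(parse_slave_hosts(SAMPLE)).items():
        print(name, state)
```

With the output pasted earlier in this post, the sketch reports shard1 as blank, shard3 as not listed, and shard2 and shard4 as ok, which matches what I see by eye.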

The slaves are connected as remote hosts using CALL mysql.rds_set_external_master, but all of the databases are hosted RDS databases. We do this so the slave databases can have 'extra' schemas that are not replicated from the master.

The end result is the following error in the 'show slave status' output on whichever of the two problem slaves is started last:

              
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'A slave with the same server_uuid/server_id as this slave has connected to the master; the first event 'mysql-bin-changelog.157797' at 169188, the last event read from '/rdsdbdata/log/binlog/mysql-bin-changelog.157797' at 538718, the last byte read from '/rdsdbdata/log/binlog/mysql-bin-changelog.157797' at 538718.'

All of the MySQL servers are running: Server version: 5.7.19-log MySQL Community Server (GPL)

This had been working for a long time, over a year. Other than MySQL minor version upgrades, I don't think anything has changed.

Edited by: jeffwrule on Jun 17, 2019 6:30 PM

Edited by: jeffwrule on Jun 17, 2019 6:32 PM

asked 5 years ago
223 views
6 Answers
0

Hello AWS, can I get a little love on this? My replication is down and the normal things I would do to fix this are not available to me. The first would be to generate a new UUID by deleting auto.cnf, which is not really possible with RDS. This needs to be looked at by the RDS team.

answered 5 years ago
0

The forums are for community discussion and do not function well for individualized technical support. Occasionally AWS personnel will respond to things here, but most of the time you need to open a support case.

Edited by: HalTemp on Jun 20, 2019 5:32 AM

HalTemp
answered 5 years ago
0

It looks like you are running self-managed read replicas to "have 'extra' schemas that are not replicated from the host." Are you aware that with RDS managed read replicas you can change the read-only flag to false in order to write data to the replica and have 'extra' schema objects that do not conflict with the schema objects being replicated from the master?

I don't know how to answer the server-id question. Is it possible that you created duplicate UUIDs by restoring the same snapshot to different read replicas?

Perhaps you can consider using RDS managed read replicas. If you do you will need to be careful to avoid creating any conflicts that break replication.

-Phil

AWS
MODERATOR
philaws
answered 5 years ago
0

The UUIDs are all unique and are posted in the original note. I gathered them by connecting to each database and running "show variables like '%server%'". We are sharding data across multiple databases, with a common part of the database coming from a common (master) database. In fact, they start out as read replicas briefly; then I promote them, add the new schema, and make them a slave of the original database. Since the new schemas are not part of that database, the only data that gets replicated is the original common schemas.

I was not aware that you could continue to be a 'read replica' and set the read-only flag off. An interesting option. Are there any side effects from doing this, around backups etc.? The data is important, so can I schedule backups separately on each read replica? I need to be able to recover them individually.

All the options for AWS support models that I know of for RDS are too expensive for our startup.

It still makes no sense that the database is getting connected with a blank UUID from the master's perspective, and that it works for 2 of the four databases but not the other 2.

Thanks for the feedback and any help/advice you can give is appreciated.

Edited by: jeffwrule on Jun 21, 2019 12:01 PM

answered 5 years ago
0

Yes, you can back up read replicas just as with any other RDS database. You can even convert them to Multi-AZ and/or create chained read replicas from them.

But as I said, be careful when turning off read-only: you need to avoid changes on the read replica that conflict with the source master. As long as you only write to schema objects that do not exist on the source master, you should be fine.

-Phil

AWS
MODERATOR
philaws
answered 5 years ago
0

This was fixed by rebooting the master server. The master had been up for 300+ days. I was not getting any specific error message, but something went wrong internally.

Rebooting the master and then restarting the slaves restored the correct UUIDs in the 'show slave hosts' output on the master, and allowed the multiple slaves to run in parallel again. That is, the master no longer saw the problem slaves as having the same (blank) UUID. They each have a unique UUID and things work as expected.
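For anyone who wants to confirm the same thing after a reboot, the check I did by eye could be sketched roughly as below. This is an assumption-laden Python sketch, not an RDS or MySQL feature: `slave_uuid_problems` is a hypothetical helper, and it expects the Slave_UUID column values from 'show slave hosts' to be supplied as plain strings.

```python
from collections import Counter


def slave_uuid_problems(uuids):
    """Return a list of problems found in the Slave_UUID values the master reports.

    Hypothetical helper: an empty list means every connected slave has a
    non-blank UUID and no two slaves share one.
    """
    problems = []
    cleaned = [u.strip() for u in uuids]

    # A blank UUID is the symptom described in this thread.
    blanks = sum(1 for u in cleaned if not u)
    if blanks:
        problems.append(f"{blanks} slave(s) registered with a blank UUID")

    # Two slaves reporting the same UUID will also be treated as duplicates.
    counts = Counter(u for u in cleaned if u)
    for uuid, n in sorted(counts.items()):
        if n > 1:
            problems.append(f"UUID {uuid} is reported by {n} slaves")

    return problems


if __name__ == "__main__":
    # Values as they appeared on the master before the reboot.
    before = [
        "85bddaad-2f97-11e7-9b3c-02b50d934997",
        "172fc2bb-7322-11e7-bc2a-067f0dd1e340",
        "",
    ]
    print(slave_uuid_problems(before))
```

An empty result after the reboot, with all four slaves listed, corresponds to the healthy state described above.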

answered 5 years ago
