Tag Archive for failover

Replication and auto-failover made easy with MySQL Utilities

MySQL utilities in Workbench

Utilities in MySQL Workbench

If you’re a user of MySQL Workbench then you may have noticed a pocket knife icon appear in the top right hand corner – click on that and a terminal opens which gives you access to the MySQL utilities. In this post I’m focussing on the replication utilities but you can also refer to the full MySQL Utilities documentation.

What I’ll step through is how to uses these utilities to:

  • Set up replication from a single master to multiple slaves
  • Automatically detect the failure of the master and promote one of the slaves to be the new master
  • Introduce the old master back into the topology as a new slave and then promote it to be the master again

Tutorial Video

Before going through the steps in detail here’s a demonstration of the replication utilities in action…

To get full use of these utilities you should use the InnoDB storage engine together with the Global Transaction ID functionality from the latest MySQL 5.6 DMR.

Do you really need/want auto-failover?

For many people, the instinctive reaction is to deploy a fully automated system that detects when the master database fails and then fails over (promotes a slave to be the new master) without human intervention. For many applications this may be the correct approach.

There are inherent risks to this though – What if the failover implementation has a flaw and fails (after all, we probably don’t test this out in the production system very often)? What if the slave isn’t able to cope with the workload and makes things worse? Is it just a transitory glitch and would the best approach have been just to wait it out?

Following a recent, high profile outage there has been a great deal of debate on the topic between those that recommend auto-failover and those that believe it should only ever be entrusted to a knowledgeable (of the application and the database architecture) and well informed (of the state of the database nodes, application load etc.) human. Of course, if the triggering of the failover is to be left to a human then you want that person to have access to the information they need and an extremely simple procedure (ideally a single command) to execute the failover. Probably the truth is that it all depends on your specific circumstances.

The MySQL replication utilities aim to support you whichever camp you belong to:

  • In the fully automated mode, the utilities will continually monitor the state of the master and in the event of its failure identify the best slave to promote – by default it will select the one that is most up-to-date and then apply any changes that are available on other slaves but not on this one before promoting it to be the new master. The user can override this behaviour (for example by limiting which of the slaves are eligible for promotion). The user is also able to bind in their own programs to be run before and after the failover (for example, to inform the application).
  • In the monitoring mode, the utility still continually checks the availability of the master, and informs the user if it should fail. The user then executes a single command to fail over to their preferred slave.

Step 1. Make sure MySQL Servers are configured correctly

For some of the utilities, it’s important that you’re using Global Transaction IDs; binary logging needs to be enabled; may as well use the new crash-safe slave functionality… It’s beyond the scope of this post to go through all of those and so instead I’ll just give example configuration files for the 5 MySQL Servers that will be used:

my1.cnf

[mysqld]
binlog-format=ROW
log-slave-updates=true
gtid-mode=on
disable-gtid-unsafe-statements=true # Use enforce-gtid-consistency from 5.6.9+
master-info-repository=TABLE
relay-log-info-repository=TABLE
sync-master-info=1
datadir=/home/billy/mysql/data1
server-id=1
log-bin=util11-bin.log
report-host=utils1
report-port=3306
socket=/home/billy/mysql/sock1
port=3306

my2.cnf

[mysqld]
binlog-format=ROW
log-slave-updates=true
gtid-mode=on
disable-gtid-unsafe-statements=true # Use enforce-gtid-consistency from 5.6.9+
master-info-repository=TABLE
relay-log-info-repository=TABLE
sync-master-info=1
datadir=/home/billy/mysql/data2
server-id=2
log-bin=util12-bin.log
report-host=utils1
report-port=3307
socket=/home/billy/mysql/sock2
port=3307

my3.cnf

[mysqld]
binlog-format=ROW
log-slave-updates=true
gtid-mode=on
disable-gtid-unsafe-statements=true # Use enforce-gtid-consistency from 5.6.9+
master-info-repository=TABLE
relay-log-info-repository=TABLE
sync-master-info=1
datadir=/home/billy/mysql/data3
server-id=3
log-bin=util2-bin.log
report-host=utils2
report-port=3306
socket=/home/billy/mysql/sock3
port=3306

my4.cnf

[mysqld]
binlog-format=ROW
log-slave-updates=true
gtid-mode=on
disable-gtid-unsafe-statements=true # Use enforce-gtid-consistency from 5.6.9+
master-info-repository=TABLE
relay-log-info-repository=TABLE
master-info-file=/home/billy/mysql/master4.info
datadir=/home/billy/mysql/data4
server-id=4
log-bin=util4-bin.log
report-host=utils2
report-port=3307
socket=/home/billy/mysql/sock4
port=3307

my5.cnf

[mysqld]
binlog-format=ROW
log-slave-updates=true
gtid-mode=on
disable-gtid-unsafe-statements=true # Use enforce-gtid-consistency from 5.6.9+
datadir=/home/billy/mysql/data5
master-info-repository=TABLE
relay-log-info-repository=TABLE
sync-master-info=1
#master-info-file=/home/billy/mysql/master5.info
server-id=5
log-bin=util5-bin.log
report-host=utils2
report-port=3308
socket=/home/billy/mysql/sock5
port=3308

The utilities are actually going to be run from a remote host and so it will be necessary for that host to access each of the MySQL Servers and so a user has to be granted remote access (note that the utilities will automatically create the replication user):

[billy@utils1 ~]$ mysql -h 127.0.0.1 -P3306 -u root -e "grant all on *.* to root@'%' with grant option;"
[billy@utils1 ~]$ mysql -h 127.0.0.1 -P3307 -u root -e "grant all on *.* to root@'%' with grant option;"
[billy@utils2 ~]$ mysql -h 127.0.0.1 -P3306 -u root -e "grant all on *.* to root@'%' with grant option;"
[billy@utils2 ~]$ mysql -h 127.0.0.1 -P3307 -u root -e "grant all on *.* to root@'%' with grant option;"
[billy@utils2 ~]$ mysql -h 127.0.0.1 -P3308 -u root -e "grant all on *.* to root@'%' with grant option;"

OK – that’s the most painful part of the whole process out of the way!

Set up replication

While there are extra options (such as specifying what username/password to use for the replication user or providing a password for the root user) I’m going to keep things simple and use the defaults as much as possible. The following commands are run from the MySQL Utilities terminal – just click on the pocket-knife icon in MySQL Workbench.

mysqlreplicate --master=root@utils1:3306 --slave=root@utils1:3307
# master on utils1: ... connected.
# slave on utils1: ... connected.
# Checking for binary logging on master...
# Setting up replication...
# ...done.

mysqlreplicate --master=root@utils1:3306 --slave=root@utils2:3306
# master on utils1: ... connected.
# slave on utils2: ... connected.
# Checking for binary logging on master...
# Setting up replication...
# ...done.

mysqlreplicate --master=root@utils1:3306 --slave=root@utils2:3307
# master on utils1: ... connected.
# slave on utils2: ... connected.
# Checking for binary logging on master...
# Setting up replication...
# ...done.

mysqlreplicate --master=root@utils1:3306 --slave=root@utils2:3308
# master on utils1: ... connected.
# slave on utils2: ... connected.
# Checking for binary logging on master...
# Setting up replication...
# ...done.

That’s it, replication has now been set up from one master to four slaves.

You can now check that the replication topology matches what you intended:

mysqlrplshow --master=root@utils1 --discover-slaves-login=root;
# master on utils1: ... connected.
# Finding slaves for master: utils1:3306

# Replication Topology Graph
utils1:3306 (MASTER)
   |
   +--- utils1:3307 - (SLAVE)
   |
   +--- utils2:3306 - (SLAVE)
   |
   +--- utils2:3307 - (SLAVE)
   |
   +--- utils2:3308 - (SLAVE)

Additionally, you can also check that any of the replication relationships is correctly configure:

mysqlrplcheck --master=root@utils1 --slave=root@utils2
# master on utils1: ... connected.
# slave on utils2: ... connected.
Test Description                                                     Status
---------------------------------------------------------------------------
Checking for binary logging on master                                [pass]
Are there binlog exceptions?                                         [pass]
Replication user exists?                                             [pass]
Checking server_id values                                            [pass]
Is slave connected to master?                                        [pass]
Check master information file                                        [pass]
Checking InnoDB compatibility                                        [pass]
Checking storage engines compatibility                               [pass]
Checking lower_case_table_names settings                             [pass]
Checking slave delay (seconds behind master)                         [pass]
# ...done.

Including the -s option would have included the output that you’d expect to see from SHOW SLAVE STATUSG on the slave.

Automated monitoring and failover

The previous section showed how you can save some serious time (and opportunity for user-error) when setting up MySQL replication. We now look at using the utilities to automatically monitor the state of the master and then automatically promote a new master from the pool of slaves. For simplicity I’ll stick with default values wherever possible but note that there are a number of extra options available to you such as:

  • Constraining which slaves are eligible for promotion to master; the default is to take the most up-to-date slave
  • Binding in your own scripts to be run before or after the failover (e.g. inform your application to switch master?)
  • Have the utility monitor the state of the servers but don’t automatically initiate failover

Here is how to set it up:

mysqlfailover --master=root@utils1:3306 --discover-slaves-login=root --rediscover

MySQL Replication Failover Utility
Failover Mode = auto     Next Interval = Wed Aug 15 13:19:30 2012

Master Information
------------------
Binary Log File    Position  Binlog_Do_DB  Binlog_Ignore_DB
util11-bin.000001  2586

Replication Health Status
+---------+-------+---------+--------+------------+---------+
| host    | port  | role    | state  | gtid_mode  | health  |
+---------+-------+---------+--------+------------+---------+
| utils1  | 3306  | MASTER  | UP     | ON         | OK      |
| utils1  | 3307  | SLAVE   | UP     | ON         | OK      |
| utils2  | 3306  | SLAVE   | UP     | ON         | OK      |
| utils2  | 3307  | SLAVE   | UP     | ON         | OK      |
| utils2  | 3308  | SLAVE   | UP     | ON         | OK      |
+---------+-------+---------+--------+------------+---------+

Q-quit R-refresh H-health G-GTID Lists U-UUIDs

mysqlfailover will then continue to run, refreshing the state – just waiting for something to go wrong.

Rather than waiting, I kill the master MySQL Server:

mysqladmin -h utils1 -P3306 -u root shutdown

Checking with the still-running mysqlfailover we can see that it has promoted utils1:3307.

MySQL Replication Failover Utility
Failover Mode = auto     Next Interval = Wed Aug 15 13:21:13 2012

Master Information
------------------
Binary Log File    Position  Binlog_Do_DB  Binlog_Ignore_DB
util12-bin.000001  7131

Replication Health Status
+---------+-------+---------+--------+------------+---------+
| host    | port  | role    | state  | gtid_mode  | health  |
+---------+-------+---------+--------+------------+---------+
| utils1  | 3307  | MASTER  | UP     | ON         | OK      |
| utils2  | 3306  | SLAVE   | UP     | ON         | OK      |
| utils2  | 3307  | SLAVE   | UP     | ON         | OK      |
| utils2  | 3308  | SLAVE   | UP     | ON         | OK      |
+---------+-------+---------+--------+------------+---------+

Q-quit R-refresh H-health G-GTID Lists U-UUIDs

Add the recovered MySQL Server back into the topology

After restarting the failed MySQL Server, it can be added back into the mix as a slave to the new master:

mysqlreplicate --master=root@utils1:3307 --slave=root@utils1:3306
# master on utils1: ... connected.
# slave on utils1: ... connected.
# Checking for binary logging on master...
# Setting up replication...
# ...done.

The output from mysqlfailover (still running) confirms the addition:

MySQL Replication Failover Utility
Failover Mode = auto     Next Interval = Wed Aug 15 13:24:38 2012

Master Information
------------------
Binary Log File    Position  Binlog_Do_DB  Binlog_Ignore_DB
util12-bin.000001  7131

Replication Health Status
+---------+-------+---------+--------+------------+---------+
| host    | port  | role    | state  | gtid_mode  | health  |
+---------+-------+---------+--------+------------+---------+
| utils1  | 3307  | MASTER  | UP     | ON         | OK      |
| utils1  | 3306  | SLAVE   | UP     | ON         | OK      |
| utils2  | 3306  | SLAVE   | UP     | ON         | OK      |
| utils2  | 3307  | SLAVE   | UP     | ON         | OK      |
| utils2  | 3308  | SLAVE   | UP     | ON         | OK      |
+---------+-------+---------+--------+------------+---------+

Q-quit R-refresh H-health G-GTID Lists U-UUIDs

If it were important that the recovered MySQL Server be restored as the master then it is simple to manually trigger the promotion (after quitting out of mysqlfailover):

mysqlrpladmin --master=root@utils1:3307 --new-master=root@utils1:3306 --demote-master 
  --discover-slaves-login=root switchover

# Discovering slaves for master at utils1:3307
# Checking privileges.
# Performing switchover from master at utils1:3307 to slave at utils1:3306.
# Checking candidate slave prerequisites.
# Waiting for slaves to catch up to old master.
# Stopping slaves.
# Performing STOP on all slaves.
# Demoting old master to be a slave to the new master.
# Switching slaves to new master.
# Starting all slaves.
# Performing START on all slaves.
# Checking slaves for errors.
# Switchover complete.
#
# Replication Topology Health:
+---------+-------+---------+--------+------------+---------+
| host    | port  | role    | state  | gtid_mode  | health  |
+---------+-------+---------+--------+------------+---------+
| utils1  | 3306  | MASTER  | UP     | ON         | OK      |
| utils1  | 3307  | SLAVE   | UP     | ON         | OK      |
| utils2  | 3306  | SLAVE   | UP     | ON         | OK      |
| utils2  | 3307  | SLAVE   | UP     | ON         | OK      |
| utils2  | 3308  | SLAVE   | UP     | ON         | OK      |
+---------+-------+---------+--------+------------+---------+
# ...done.

As always, we’d really appreciate people trying this out and giving us feedback!