MySQL replication and crash recovery
Posted: Sat, 16 July 2011
The question was recently asked, “How do I perform crash recovery on a MySQL master/slave database cluster? Because replication is asynchronous, a transaction that has been committed on the master may not have made it to any slave before the crash happens.”
The short answer is, you don’t. Asynchronous replication, by its very nature, is prone to these sorts of issues, and MySQL doesn’t make life easier in general by using query replay.
Some of the issues aren’t massive: for instance, if a slave crashes, the master will keep replication logs for the slave until it comes back up (as long as the slave comes back before the master decides some of the replogs are too old and starts deleting them, of course), so as long as you can live without a slave for a while, you can recover and continue along your merry way.
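If you want to keep an eye on this, it’s easy enough to script a check of how far behind a slave is and how long the master will hold onto binlogs waiting for it. Here’s a rough sketch using MySQL Connector/Python; the hostnames, credentials, and the choice of library are all placeholders, and any half-decent monitoring system will do the same thing for you anyway.

```python
# Quick replication health check: how far behind is the slave, and how long
# will the master keep binlogs around for it?  (Hostnames and credentials
# below are made up.)
import mysql.connector

slave = mysql.connector.connect(host="slave1.example.com",
                                user="repl_check", password="secret")
cur = slave.cursor(dictionary=True)
cur.execute("SHOW SLAVE STATUS")
status = cur.fetchone()
if status is None:
    print("This box isn't configured as a slave at all!")
else:
    print("IO thread running:  %s" % status["Slave_IO_Running"])
    print("SQL thread running: %s" % status["Slave_SQL_Running"])
    print("Seconds behind:     %s" % status["Seconds_Behind_Master"])

master = mysql.connector.connect(host="master.example.com",
                                 user="repl_check", password="secret")
mcur = master.cursor()
mcur.execute("SHOW VARIABLES LIKE 'expire_logs_days'")
name, days = mcur.fetchone()
# If a slave stays down longer than this, the master will have purged the
# binlogs it needs, and you're rebuilding that slave from a dump.
print("Master keeps binlogs for %s days" % days)
```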
Promoting a slave to be a master, on the other hand, is a situation fraught with peril. Here, you’ve just got to hope that your chosen slave is consistent with the master’s state at death (because it’s async, you have no guarantees about that), and that all the other slaves have the same ideas about what counts as the current state of the data. If your newly-promoted slave managed to apply an update that another slave didn’t, that slave will be out of sync with reality (until such time as the now-dead master comes back and replays all those queries… hahahaha). To even guarantee that any slaves have a consistent view of the data as compared to the new master, you’ve got to re-replicate everything from the new master, because MySQL’s “where are you up to?” counter is master-specific.
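For what it’s worth, here’s roughly what repointing a surviving slave at the newly-promoted master looks like, once you’ve already re-seeded that slave from a dump of the new master and noted the binlog coordinates the dump corresponds to (which is the painful part). A sketch only, with made-up hostnames, credentials, and coordinates:

```python
# Repoint a surviving slave at the freshly-promoted master.  This assumes the
# slave has ALREADY been reloaded from a dump of the new master, and the
# binlog coordinates below came from that dump -- the "where am I up to?"
# position recorded for the old master means nothing to the new one.
import mysql.connector

NEW_MASTER = "db2.example.com"      # the slave you just promoted
BINLOG_FILE = "mysql-bin.000007"    # from the dump's --master-data header
BINLOG_POS = 1234                   # ditto

slave = mysql.connector.connect(host="db3.example.com",
                                user="repl_admin", password="secret")
cur = slave.cursor()
cur.execute("STOP SLAVE")
# Values here are trusted admin-supplied constants, so plain string
# interpolation is fine; this is not a place for user input.
cur.execute(
    "CHANGE MASTER TO"
    "  MASTER_HOST = '%s',"
    "  MASTER_USER = 'repl',"
    "  MASTER_PASSWORD = 'repl_password',"
    "  MASTER_LOG_FILE = '%s',"
    "  MASTER_LOG_POS = %d" % (NEW_MASTER, BINLOG_FILE, BINLOG_POS))
cur.execute("START SLAVE")
```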
People who have to deal with this sort of thing these days tell me that the Maatkit tools are handy for dealing with many of these issues (monitoring replication, and repairing the data when slaves drift out of sync).
However, I prefer to keep right the hell away from MySQL replication altogether, after a year of having to gently coax an insane MySQL “tree” replication setup to do the right thing for any extended period, and watching a group of DBAs at Engine Yard go slowly insane.
My preferred alternatives are:
- For pure HA, replicate at the disk level using DRBD. You lose some of the supposed “benefits” of MySQL replication (“you can do dump backups off the slave! Woooooo!”), but you at least get consistency.
- If you think you need a harem[^1] for load purposes, use some other way of scaling MySQL that doesn’t involve throwing an update at an amorphous blob of servers and hoping it sticks in the right places. Sharding can work, as can aggressive caching (intelligent storage of precalculated query results in memcached can work wonders; there’s a sketch of that trick just after this list) or just buying a bigger machine.
- If you want to do anything “fancy” (my tree-replication scheme falls into this category), just don’t. MySQL replication isn’t the answer to these sorts of situations, it’s the question. The answer is “No!”.
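Since I keep banging on about the memcached trick, here’s roughly what “intelligent storage of precalculated query results” looks like. This is a sketch, not gospel: the libraries (python-memcached and MySQL Connector/Python), the table, and the query are all just illustrative.

```python
# Cache an expensive, mostly-static query result in memcached instead of
# hammering the database (or a flock of slaves) for it on every request.
# Library choice, table, and key names are all illustrative.
import memcache
import mysql.connector

mc = memcache.Client(["127.0.0.1:11211"])
db = mysql.connector.connect(host="db.example.com", user="app",
                             password="secret", database="shop")

def top_sellers(limit=10, ttl=300):
    key = "top_sellers:%d" % limit
    rows = mc.get(key)
    if rows is None:
        cur = db.cursor()
        cur.execute("SELECT product_id, SUM(qty) AS sold "
                    "FROM order_items GROUP BY product_id "
                    "ORDER BY sold DESC LIMIT %s", (limit,))
        rows = cur.fetchall()
        # Five minutes of staleness is fine for a "top sellers" box, and it
        # turns a nasty aggregate into one query every few minutes.
        mc.set(key, rows, time=ttl)
    return rows
```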
Never discount the value of the “Jaws” approach to scaling (“we’re gonna need a bigger boat^W machine”) – hardware is so ridiculously cheap, relative to the clueful sysadmin time needed to make (and keep) MySQL replication running, that you really need to be at the huge end of town (where other scaling issues are going to bite you in the arse first) before spending too much time on your database is worth the investment. Even then, you’ll get more bang for your buck from:
- Optimising your queries;
- Optimising your MySQL configuration;
- Using a k/v store for really non-relational data (sessions, I’m looking at you)
- Sharding your data (there’s a sketch of the routing idea below)
than you ever will from the constant, mind-numbing grind that is keeping MySQL replication going.
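And since sharding keeps coming up, here’s the guts of the idea: a dirt-simple router that decides which MySQL server a given user’s rows live on. Again, a sketch with made-up shard names, a naive hash, and no thought given to resharding, which is where the real pain lives.

```python
# Minimal shard router: pick the MySQL server holding a user's data by
# hashing the user id.  Shard list, hash choice, and schema are illustrative.
import zlib
import mysql.connector

SHARDS = [
    {"host": "db-shard0.example.com", "database": "app_shard0"},
    {"host": "db-shard1.example.com", "database": "app_shard1"},
]

def shard_for(user_id):
    # crc32 of the key, modulo the number of shards.  Fine until you need to
    # reshard, at which point consistent hashing starts to look attractive.
    idx = zlib.crc32(str(user_id).encode("ascii")) % len(SHARDS)
    params = SHARDS[idx]
    return mysql.connector.connect(user="app", password="secret", **params)

def orders_for(user_id):
    db = shard_for(user_id)
    cur = db.cursor()
    cur.execute("SELECT id, total FROM orders WHERE user_id = %s", (user_id,))
    return cur.fetchall()
```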
[^1]: The term comes from a time when a colleague near-screamed “This slave cluster keeps fucking me!” at a particularly intransigent set of MySQL slaves.