People say, if this is your problem, it's the best kind of problem to have.
It all started when we launched some really awesome new features. Then a certain celebrity (who I WILL NOT name) started using our app, tweeted about it, and within days membership and usage had tripled. And it keeps growing...
Problem is, our MySQL database can't handle the load any more. And MY problem is, it's MY problem. Every night, between 8pm and 11pm, the load spikes so hard that the server sometimes (not always) crashes. By day we try to figure out better ways to optimize and scale this thing, and by night we're watching the "top" stats with itchy fingers on the "reboot/restart" button.
A brief summary: the data is flowing fast, stays in our custody for a while, then we delete it. But as this app has grown, the inserts are outnumbering the deletes, and the tables are growing out of control. Ratio of reads/writes is about 45/55. During peak hours, the db is handling in excess of 200 queries per second.
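Since the data only stays in our custody for a while, one idea I'm weighing: rotate rows into per-day tables, so expiring old data becomes a cheap DROP TABLE instead of a massive DELETE that fights the inserts for locks. A rough sketch of the naming/rotation logic in Python (the `events` prefix and the 7-day retention window are made-up for illustration):

```python
from datetime import date, timedelta

PREFIX = "events"        # hypothetical table-name prefix
RETENTION_DAYS = 7       # hypothetical retention window

def table_for(day: date) -> str:
    """Name of the per-day table that rows written on `day` go into."""
    return f"{PREFIX}_{day.strftime('%Y%m%d')}"

def tables_to_drop(today: date, known_tables: list) -> list:
    """Per-day tables older than the retention window. Dropping a whole
    table is far cheaper than DELETEing millions of rows from one."""
    keep = {table_for(today - timedelta(days=n)) for n in range(RETENTION_DAYS)}
    return sorted(t for t in known_tables
                  if t.startswith(PREFIX + "_") and t not in keep)
```

The app writes to `table_for(date.today())`, and a nightly job drops whatever `tables_to_drop` returns.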
How do you scale MySQL?
- Vertical partitioning. Keep different kinds of data on separate machines, each fine-tuned to handle its own load
- Horizontal partitioning. Keep the complete schema on every machine, but split the rows across them so each machine holds only a slice
- Beef up the hardware
- Switch more tables over to the "MEMORY" engine. Is this a good idea?
- Give up on MySQL and move everything over to something like Cassandra. Lovin' the geek factor there, though I can't afford the three months it'll take to do it
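For the horizontal-partitioning option, the heart of it is just a deterministic row-to-shard mapping in the app layer. A minimal sketch, with an invented shard list and `user_id` as the assumed partitioning key:

```python
import hashlib

# Hypothetical shard map: each host carries the complete schema,
# but only its own slice of the rows.
SHARDS = ["db1.internal", "db2.internal", "db3.internal"]

def shard_for(user_id: int) -> str:
    """Pick the shard that owns this user's rows. Hashing the key first
    keeps the distribution even when the ids themselves are clustered
    (e.g. auto-increment)."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]
```

One design caveat: with plain modulo, adding a shard remaps most existing keys, so either plan a migration or look at consistent hashing from the start.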
How does Twitter do it?
I don't know much on the MySQL side, so my answers may sound very MS SQL-centric.
If you're getting that many writes, your indexes may be getting badly fragmented. Do you have a maintenance schedule to periodically rebuild them?
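On the MySQL side, the rough equivalent of an index rebuild is `OPTIMIZE TABLE` (plus `ANALYZE TABLE` to refresh the index statistics the query optimizer relies on). A small sketch that generates the maintenance statements for a list of tables; the table names are placeholders:

```python
def maintenance_statements(tables):
    """Build the nightly maintenance SQL for the given tables.
    OPTIMIZE rebuilds the table and its indexes and reclaims space;
    ANALYZE only refreshes index statistics, so it's much cheaper."""
    stmts = []
    for t in tables:
        stmts.append(f"OPTIMIZE TABLE `{t}`;")
        stmts.append(f"ANALYZE TABLE `{t}`;")
    return stmts
```

Worth noting: `OPTIMIZE TABLE` locks the table while it runs, so given the 8pm-11pm spikes described above it belongs in the quietest part of the night.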
I don't know how (or if) MySQL handles the issue of replication, but having 1 DB server for writes that pushes changes to another DB server that handles all the reads can often make a huge difference.
MySQL does support master/slave replication. In fact a master (for writes) can have multiple slaves (read-only). That's a good idea, and it'll buy us time while we explore how to scale up the DB that handles all the writes. Going forward, though, a master/slave setup solves only half of the problem: it spreads the reads across machines, but every write still lands on the single master.
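The read/write split usually lives in a thin routing layer in the app: writes go to the master, reads round-robin across the slaves. A minimal sketch (host names invented; a real version would hand back DB connections and account for replication lag on reads-after-writes):

```python
import itertools

class ReadWriteRouter:
    """Route writes to the single master, spread reads across slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)  # simple round-robin

    def host_for(self, sql):
        # Crude statement classification; a real router would also keep
        # anything inside a transaction pinned to the master.
        is_read = sql.lstrip().lower().startswith(("select", "show"))
        return next(self._slaves) if is_read else self.master
```

Usage would look like `ReadWriteRouter("master.internal", ["slave1.internal", "slave2.internal"])`, with every query passed through `host_for` before picking a connection.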
I'd like to learn how to introduce some redundancy into our data setup, so if one machine crashes, others can keep the data flowing. AFAIK the only open-source database that offers true redundancy for reliability is Cassandra. Am I wrong?