Saving the state of memcache when rebooting a system
Does anyone do this? I was thinking of writing a custom program.
Just curious how many memcache users out there do anything special to save the memory state of memcache before doing a reboot, and restore it after?
In my particular use of memcache, we accumulate quite a cache over time. It grows and is kept current via dirty bits, etc. After a few weeks our cache can reach close to 2 GB in size. A reboot of our server really hurts our performance, since the cache essentially has to be rebuilt slowly (rebuilding it too quickly would consume too many resources and bog the server down for hours).
The pages we cache are extremely heavy on database I/O, so once we get them cached it is a huge benefit, but we can't cache them all quickly or our server would slowly melt.
My plan was to create a simple program that saves the state of memcache before we do any server reboots, either to another database or to a flat file. Then, when the server comes back up, a second program would take that data and re-populate the memcache RAM with it.
I figured if I worked with a flat file, I could bring the server back online and reload the cache without bogging down the database.
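A rough sketch of that dump-and-restore idea in Python (the dict-backed FakeCache below is just a stand-in for a real memcache client such as pymemcache, which exposes the same get/set calls; the file format and all names here are hypothetical):

```python
import json

class FakeCache:
    """Stand-in for a memcache client; real clients expose get/set the same way."""
    def __init__(self):
        self._data = {}
    def set(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)

def dump_cache(cache, keys, path):
    """Write the known keys and their cached values to a flat file, one JSON object per line."""
    with open(path, "w") as f:
        for key in keys:
            value = cache.get(key)
            if value is not None:  # stale items may already have been evicted
                f.write(json.dumps({"key": key, "value": value}) + "\n")

def restore_cache(cache, path):
    """Read the flat file back and re-populate the cache."""
    with open(path) as f:
        for line in f:
            item = json.loads(line)
            cache.set(item["key"], item["value"])
```

The key point is that the dump must be driven by a list of keys you already know, since memcache itself cannot enumerate them.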
Anyone else have any experience in doing something like this? Pitfalls or words of advice?
Looks like an interesting idea. I see some problems though in a practical situation.
Memcache has no dump feature to move all its contents directly to a file, and no index feature to list the keys of all the stored data items. You need to program that by hand, and to do that you have to know the naming of the data items to retrieve them. And since memcache is a self-cleaning cache that removes stale items, you may have to request many more keys than are actually available.
Another problem I can see is execution time. Even though memcache is designed to be fast, pumping 2 GB of data into a file and back into memory may be a time-consuming operation. Ideally these actions should be performed asynchronously, with the memcache buffer stored and refilled slowly while your application is running. Otherwise a simple reboot may cause a long delay until your application is back up and running again.
Thanks for the response lammert. Great points.
Regarding the dumping of the cache, my implementation is custom built (by me), so that should be fine since I know the keys and how to pull the data.
You do raise some very valid points on the execution time of reloading the cache. 2 GB of data is either a big file or a lot of time pulling from a database. I think I would definitely load it asynchronously. But the tricky part is that when my server is rebooted, the machine often gets overwhelmed until the cache catches up, which can take a few hours. I think I would almost have to use a flat file to keep the database from getting bogged down. Since I have a dual-processor (quad-core) machine, I think it can handle the asynchronous work just fine. We will see. I am going to start working on something this week.
I'll post back how things work out.
Thanks again for your thoughts.
So how are things going with your implementation?
Sorry for the late reply. Project has been on the back burner. I did end up writing the first program, to save the state of my memcache.
I ended up choosing to save it to a separate database instead of a flat file. This gives me much more control over the data, and it doesn't tie up my live database while reloading.
My test runs loaded over 75,000 fully cached HTML pages into a separate database (drop table, then inserts) in under 2 minutes. I also kept the state of the dirty bit, so when I restore it, the site/software will know the state of each page in memory as well. The dirty bit is a custom memory field I created to flag when source data that affects a page has changed.
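In rough Python terms, the drop-and-insert save step might look like this, with SQLite standing in for the separate database (the table and column names are invented for illustration):

```python
import sqlite3

def save_cache_state(items, db_path):
    """Persist cache contents to a separate database.
    items: iterable of (key, html, dirty) tuples pulled from memcache."""
    conn = sqlite3.connect(db_path)
    conn.execute("DROP TABLE IF EXISTS cache_state")  # drop table, then re-insert everything
    conn.execute(
        "CREATE TABLE cache_state (key TEXT PRIMARY KEY, html TEXT, dirty INTEGER)"
    )
    conn.executemany(
        "INSERT INTO cache_state (key, html, dirty) VALUES (?, ?, ?)", items
    )
    conn.commit()
    conn.close()
```

Keeping the dirty bit as its own column means the restore step can hand back both the page and its state in one pass.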
I've also written the memcache_reload.pl script. It simply loops through the records and restores all the data to memory. This piece I have not tested yet, since it involves essentially bringing my server down and back up. I'm planning on doing some heavy testing during a late weekend night when traffic is at a low point.
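The reload loop, sketched in Python rather than Perl (the table name matches the invented one above; the batch size and pause are made-up knobs, and the cache argument is anything with a set() method standing in for a real client):

```python
import sqlite3
import time

def reload_cache(db_path, cache, batch_size=500, pause_seconds=1.0):
    """Restore saved rows into the cache, pausing between batches so the
    reload does not monopolize the server right after a reboot."""
    conn = sqlite3.connect(db_path)
    restored = 0
    for key, html, dirty in conn.execute("SELECT key, html, dirty FROM cache_state"):
        cache.set(key, (html, dirty))
        restored += 1
        if restored % batch_size == 0:
            time.sleep(pause_seconds)  # throttle to keep load manageable
    conn.close()
    return restored
```

The pacing addresses lammert's earlier point: the cache comes back gradually instead of hammering the box the moment it boots.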
This solution has also prompted me to think of other utilities I want to build. For instance, sometimes I need to make a small change to a page template, but doing so involves reloading 75,000 generated pages from the database. I'm designing a program that will simply take a snippet of code and replace it with a new snippet in the memcache memory. This will let me make minor tweaks to the template without having to rebuild my memcache from scratch. The process would back up the memory as I do on a reboot before making any changes, just in case a problem occurred. I could then restore the memory back to its previous state if needed (using my first set of programs...=).
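The core of that snippet-replace idea is small; a sketch with a plain dict standing in for the cache plus its key index (as the post says, you'd back the cache up first since a bad replacement touches every page at once):

```python
def replace_snippet(cache, old, new):
    """Swap one template snippet for another across all cached pages.
    cache: dict of key -> cached HTML string (stand-in for a memcache
    client combined with a key index). Returns how many pages changed."""
    changed = 0
    for key, page in cache.items():
        if old in page:
            cache[key] = page.replace(old, new)
            changed += 1
    return changed
```

Returning the count gives a quick sanity check against the number of pages you expected the tweak to touch.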
Yeah, it is late here and I'm probably biting off more than I can chew, but it is definitely a problem I need to solve since my hands are typically tied when it comes to making changes to these heavily cached pages. I can't afford to have my server bogged down for half the day whenever I need to tweak one of the page templates.
I'll stop back when I have gotten a bit further. I'm confident the first process is going to work fine (saving and reloading the cache). Not so sure about my second idea of replacing snippets of code directly in memory. =)
Thanks for sharing the preliminary results with us. 75,000 items in less than 2 minutes sounds very promising. Much better than half a day of slow server performance due to regeneration of the cached items. Looking forward to seeing the follow-up posts.
Just a quick update. I did end up completing a script that would replace chunks of data in memcache for minor changes. It worked for my first test run, but I quickly realized it was just too much maintenance and risk to use it regularly.
So I designed a new script to essentially "reload" memcache whenever I make a change to the affected scripts. It is quite simple and it involves a perl script that just loops and calls my pages with "wget". I had to configure my stat software to ignore the requests, and I also had to configure my anti-scraper software to ignore the requests as well.
I added in some controls to space the requests out over time and reduce load on the server. It works very well. It takes a full night of running to completely reload the cache, but my server sees little traffic at night anyway, so it works out perfectly. Plus it keeps me on a controlled change-release schedule. Any code changes get rolled out overnight and not during high-traffic times. A win-win solution. =)
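The pacing logic of that wget loop can be sketched in a few lines of Python (the fetch callable is a parameter purely so the pacing stands alone; in practice it would shell out to wget or use urllib, and the delay value is an invented default):

```python
import time

def warm_pages(urls, fetch, delay_seconds=2.0):
    """Request each cached page once, spaced out to limit server load.
    Regenerating each page as it is fetched repopulates memcache."""
    warmed = []
    for url in urls:
        fetch(url)
        warmed.append(url)
        time.sleep(delay_seconds)  # the spacing control: one request per interval
    return warmed
```

Stretching the delay trades total warm-up time for a lighter load, which is exactly the overnight-run trade-off described above.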