
Perl Tiered Cache Benchmarks

These benchmarks are for the Perl tiered caching modules originally discussed here. Ruby tiered cache benchmarks will follow later.

All tests were performed on a Xen virtual machine with 512MB of RAM using one core of an Intel Core2 Duo E8500 @ 3.16GHz. The Perl version is 5.10.0 running on Debian 5.0. The remote memcached server is 1.2.5; the local memcached is 1.2.2. The modules tested were Cache::Memcached::Fast, Cache::Mmap, Cache::FastMmap, and the custom Cache::Tiered, Cache::TieredFast, and Cache::TieredLmc modules (modules and benchmarking code are here).

The tiered configurations all use a remote memcached server: Cache::Tiered uses a local Cache::Mmap cache, Cache::TieredFast uses a local Cache::FastMmap cache, and Cache::TieredLmc uses a local memcached server.

Benchmarking results are below the fold. The first set of benchmarks is read-heavy, setting a value and reading it back 50,000 times (and re-setting it if need be); the second set sets and reads a value 50,000 times. The test value used is the zlib HOWTO HTML file. NB: Right now, with the tiered cache modules, you pass in the memcached or mmap objects you want to use. Eventually they will create them themselves.
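
For concreteness, the read-heavy run looks roughly like the sketch below. Cache::Memcached::Fast and Cache::FastMmap are used as documented, but the Cache::TieredFast constructor arguments, the server address, the cache key, and the file name are placeholders I've made up for illustration, since (per the note above) you currently just hand the tiered module the underlying cache objects yourself.

    use Cache::Memcached::Fast;
    use Cache::FastMmap;
    use Cache::TieredFast;   # the custom tiered module; its interface is assumed here

    # Remote memcached plus a short-lived local mmap cache in /dev/shm.
    my $memd  = Cache::Memcached::Fast->new({ servers => ['10.0.0.1:11211'] });  # placeholder address
    my $local = Cache::FastMmap->new(
        share_file  => '/dev/shm/tiered.fmm',
        expire_time => 5,                      # seconds
    );

    # Hypothetical constructor: pass in the caches you want tiered.
    my $cache = Cache::TieredFast->new(memcached => $memd, local => $local);

    # The test value: the zlib HOWTO HTML file.
    my $value = do { local $/; open my $fh, '<', 'zlib-howto.html' or die $!; <$fh> };

    # Read-heavy benchmark: set once, read back 50,000 times, re-setting on a miss.
    $cache->set('howto', $value);
    for (1 .. 50_000) {
        my $got = $cache->get('howto');
        $cache->set('howto', $value) unless defined $got;
    }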

Update: I forgot to add that the mmap files are stored in /dev/shm, but any memory-based filesystem should perform the same. Disk-based mmap files were not tested, but they seem unlikely to give the same speed benefit.

Tiered Caching Thoughts

I should have written about this a while ago, but predictably I never got around to it, so I have to recreate the benchmarks. These are just some initial comments.

One of the best things I did for Daily Kos when I was getting ready for the 2006 midterm elections was to start using memcached. When I was getting ready for the 2008 elections, I greatly expanded the use of memcached throughout the backend, and it was one of the biggest factors in getting through Election Day in one piece. With a multi-webhead setup like ours, memcached let us spread the work around so each webhead could take advantage of the page-rendering work the others had already done. Really handy, and it helped keep our load below 3 on Election Night.

Unfortunately, I saw areas where memcached wasn't perfect. No matter what, it's better than not caching at all, or than just caching locally, but using it as your only cache isn't ideal either. Fetching data from memcached over the network takes time that can really add up. You could cache things locally in your app too, but then the other processes wouldn't be able to take advantage of it.

What I've experimented with, and have Perl and Ruby clients for, is a tiered cache. Basically, the cache uses memcached for persistent and distributed cache storage, but also keeps a fast local cache on each machine with a short expiration time (I've been using 5 seconds, though shorter might be better), so all of the processes of your app can make use of it. When your app asks the cache for something, it first looks in the local cache, and if it's not there it looks in memcached. If it's in memcached, it retrieves the data; if it's not there either, it does whatever the app would do without the cache and then puts the result in memcached for future use. Either way, it stores the value in the local cache before moving on. After that, until it expires from the local cache, the app can fetch the data from the fast local cache without hitting memcached at all. The short expiration time is meant to keep data from going stale, but both clients also provide ways to delete data from the local cache at the same time it's deleted from memcached.
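
Stripped down, the read path is just two lookups plus a write-back into the local tier. Here's a minimal sketch of that logic in Perl, assuming Cache::FastMmap for the local cache and Cache::Memcached::Fast for memcached; the server address, keys, and the $compute callback (standing in for whatever the app would do on a full miss) are illustrative, not anything from either module.

    use Cache::Memcached::Fast;
    use Cache::FastMmap;

    my $memd  = Cache::Memcached::Fast->new({ servers => ['10.0.0.1:11211'] });  # placeholder address
    my $local = Cache::FastMmap->new(
        share_file  => '/dev/shm/local.fmm',
        expire_time => 5,    # short local TTL so data can't stay stale for long
    );

    # Tiered read: local cache first, then memcached, then the real work.
    sub cache_get {
        my ($key, $compute) = @_;

        my $value = $local->get($key);
        return $value if defined $value;

        $value = $memd->get($key);
        unless (defined $value) {
            # Full miss: do whatever the app would do without the cache,
            # then store the result in memcached for the other webheads.
            $value = $compute->();
            $memd->set($key, $value);
        }

        # Write back into the local cache so later requests on this
        # machine skip the network round trip entirely.
        $local->set($key, $value);
        return $value;
    }

    # Deletes have to hit both tiers so the local copy doesn't linger.
    sub cache_delete {
        my ($key) = @_;
        $local->remove($key);    # Cache::FastMmap's delete is remove()
        $memd->delete($key);
    }

    # Usage (render_front_page() is a stand-in for the real work):
    # my $html = cache_get('front_page', sub { render_front_page() });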

To implement the local cache, the Perl client uses Cache::Mmap (although now I've found Cache::FastMmap, which I don't remember seeing before). The Ruby client, developed with the help of the ever-helpful wycats, uses Berkeley DB for the local cache, but since the Ruby client is built with Moneta, swapping BDB out for whatever else you'd prefer would be easy.

I can't find the benchmarks I did back in March or thereabouts, so I'm going to have to recreate them. Some random guy on the internet's memory isn't a good benchmark, after all. I remember that the difference between the tiered cache and just memcached was pretty astonishing, but I don't remember the specifics. I'll get some benchmarks together over the next few days. I do remember that the difference wasn't as extreme if the cache was write-heavy, but the tiered cache was significantly faster with a read-heavy benchmark (which is more like DKos' cache usage). When the new benchmarks are done, I'll post them.