I should have written about this a while ago, but predictably I never got around to it, so I have to recreate the benchmarks. These are just some initial comments.
One of the best things I did for Daily Kos while getting ready for the 2006 midterm elections was to start using memcached. While getting ready for the 2008 elections, I greatly expanded the use of memcached throughout the backend, and it was one of the biggest factors in getting through Election Day in one piece. With a multi-webhead setup like ours, memcached let us spread the work around, so each webhead could take advantage of page-rendering work already done by the others. Really handy, and it helped keep our load below 3 on Election Night.
Unfortunately, I saw areas where memcached wasn't perfect. No matter what, it's better than not caching at all, or just caching locally, but using it as your only cache isn't ideal either. Fetching data from memcached over the network takes time that can really add up. You could cache stuff locally in your app too, but the other processes wouldn't be able to take advantage of it.
What I've experimented with, and have Perl and Ruby clients for, is a tiered cache. Basically, the cache uses memcached for longer-lived, distributed cache storage, but also keeps a fast cache with a short timeout (I've been using 5 seconds, but a shorter time might be better) that's local to the machine it's running on, so all of the processes of your app on that machine can make use of it. When your app gets something from the cache, it first looks in the local cache, and if it's not there it looks in memcached. If the data is in memcached, it retrieves it; if not, it does whatever the app would do without the cache and then puts the result in memcached for future use. Either way, it stores the data in the local cache before moving on. After that, until the entry times out of the local cache, the app can fetch the data from the fast local cache without having to hit memcached at all. The short expiration time is meant to limit the problem of data going stale, and both clients also provide ways to delete data from the local cache at the same time it's deleted from memcached.
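To make the lookup order concrete, here is a minimal sketch of the idea in Python. It is not the Perl or Ruby client, and every name in it (TieredCache, FakeRemoteCache, compute, local_ttl) is made up for illustration. It also simplifies two things: the memcached side is a plain in-memory stand-in so the example runs anywhere, and the local tier is a per-process dict rather than something shared by all processes on the machine the way Cache::Mmap or Berkeley DB would be.

```python
import time


class FakeRemoteCache:
    """Hypothetical stand-in for a memcached client (get/set/delete only)."""

    def __init__(self):
        self._data = {}
        self.gets = 0  # counts simulated network round trips

    def get(self, key):
        self.gets += 1
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


class TieredCache:
    """Short-lived local tier in front of a slower shared tier.

    Assumption: the local tier here is a per-process dict. A real
    client would use a store shared by every process on the machine.
    """

    def __init__(self, remote, local_ttl=5.0, clock=time.monotonic):
        self.remote = remote
        self.local_ttl = local_ttl  # seconds an entry stays in the local tier
        self.clock = clock
        self._local = {}  # key -> (expires_at, value)

    def get(self, key, compute):
        # 1. Try the fast local tier first.
        entry = self._local.get(key)
        if entry is not None and entry[0] > self.clock():
            return entry[1]
        # 2. Fall back to the shared tier. None is treated as a miss,
        #    so None itself cannot be cached in this sketch.
        value = self.remote.get(key)
        if value is None:
            # 3. Miss everywhere: do the real work and share the result.
            value = compute()
            self.remote.set(key, value)
        # 4. Either way, remember it locally for a short time.
        self._local[key] = (self.clock() + self.local_ttl, value)
        return value

    def delete(self, key):
        # Drop the key from both tiers at once (this machine's local tier only).
        self._local.pop(key, None)
        self.remote.delete(key)
```

Usage would look something like cache = TieredCache(FakeRemoteCache()) followed by cache.get("front_page", render_front_page), where render_front_page is whatever the app would have done without a cache. Repeated calls within the 5 second window never touch the remote tier.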
To implement the local cache, the Perl client uses Cache::Mmap (although I've since found Cache::FastMmap, which I don't remember seeing before). The Ruby client, developed with the help of the ever-helpful wycats, uses Berkeley DB for the local cache, but since the Ruby client is built with Moneta, swapping BDB out for whatever else you'd prefer would be easy.
I can't find the benchmarks I did back in March or thereabouts, so I'm going to have to recreate them. Some random guy on the internet's memory isn't a good benchmark, after all. I remember that the difference between the tiered cache and just memcached was pretty astonishing, but I don't remember the specifics. I do remember that the difference wasn't as extreme if the cache was write-heavy, but the tiered cache was significantly faster with a read-heavy benchmark (which is more like DKos' cache usage). I'll get some benchmarks together over the next few days, and when they're done, I'll post them.