This is not a huge update for mod_mcpage, but it changes the receive timeout and adds configuration options for auto-ejecting hosts, the server failure limit, and the retry limit. This fixes a problem where, if a memcached server went down, lighttpd + mod_mcpage would wait far too long before giving up on memcached. README.mod_mcpage has more information on how to use the new options. The repository is, of course, at https://github.
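The post names the new knobs without giving their exact directives, so a configuration sketch can only be illustrative; the option names below are guesses, and README.mod_mcpage has the real spellings.

```
# Hypothetical option names -- the post doesn't give the exact
# directives; check README.mod_mcpage for the real spellings.
mcpage.memcached-auto-eject-hosts     = "enable"  # drop dead servers from the pool
mcpage.memcached-server-failure-limit = 3         # failures before a server is ejected
mcpage.memcached-retry-limit          = 2         # retries before giving up on memcached
```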
This new version of mod_mcpage adds only one new feature (an option, mcpage.announce, to enable or disable the 'X-served-by-memcached' response header), but includes a number of behind-the-scenes improvements. Internally, the request handling was cleaned up and taking content from the chunkqueue was streamlined. The module will now also store pages that were compressed by the backend: if the client sends the appropriate headers, it receives the pre-compressed page; if it does not, the module decompresses the page before sending it to the client.
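Only the name mcpage.announce is given in the post; a minimal sketch of toggling it in the lighttpd config (the "enable"/"disable" values are an assumption following the usual lighttpd option convention):

```
# Toggle the 'X-served-by-memcached' response header.
# "enable"/"disable" follows the usual lighttpd option convention;
# see README.mod_mcpage for the accepted values.
mcpage.announce = "disable"
```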
In preparation for some DK4 stuff, I revisited my mod_mcpage work and finally knocked a couple of little things off its to-do list. As usual, this is a patch for lighttpd 1.5 (a version for 1.4 remains on the to-do list), and as usual there's a tarball too. In terms of obvious new features, this one doesn't have much.
And working quite nicely, too. This is a mod_mcpage patch for lighttpd 1.5. (There's a tarball too.) This is the long-awaited mod_mcpage where you can add a local memcached instance as a fast local cache with a quick timeout. From README.mod_mcpage in the tarball: This patch adds the ability to add a fast local memcached with a quick time-out for extra-quick serving. It tiers with the other memcached servers (of which you can have an array), so that data from the remote memcached servers gets put into the fast local memcached server, where it lives for a few seconds until it times out and gets fetched from the remote again.
Apparently I'll be able to get the patch ready sooner than I expected. I had been experimenting with using localmemcache's C API to store pages in the fast local cache, assuming that it would be faster than using a local instance of memcached running on the server, and that I'd just have to add expiration and LRU removal to localmemcache (not the smallest task, but certainly doable). After converting mod_mcpage to use the binary protocol, though, I got a bit of a speed boost, so that mod_mcpage now serves content about as quickly as lighttpd serves a static file (sometimes a few microseconds faster, sometimes a few microseconds slower).
No patch yet, but I added a local memcached instance that only stores stuff for a few seconds to mod_mcpage to see how it worked, and got some very interesting benchmarks. My original plan had been to craft up a cache with POSIX shared memory (which is pretty sweet), semaphores, and Glib hashes, but I thought about it some and decided to just let memcached handle the housekeeping stuff, since with a custom shared mem cache I'd have to worry about expiring older stuff and removing unused entries myself.
In summary, though, being served from a paravirtualized one-proc Xen VM with 128MB RAM, a local memcached instance of 16MB, and a remote memcached server with 64MB running on a fully virtualized one-proc Xen VM with 128MB RAM (processor-wise, the Xen host has an Intel Core2 Duo E8500 @ 3.16GHz):
Running 1000 requests with a concurrency of 1, 95% of all requests were served within 3ms with both local and remote memcacheds, within 17ms with just the remote memcached, and within 780ms with no memcached at all.
Running 1000 requests with a concurrency of 5, 95% were within 135ms with both (80% within 9ms, 50% within 7ms), within 90ms with just the remote (80% within 69ms, 50% within 63ms), and within 3766ms with no memcaching at all (80% within 3621ms, 50% within 3554ms).
Mean times for concurrency of one with both, remote only, and no memcached: 4.004ms, 13.878ms, and 778.267ms.
Mean times (across all requests) for concurrency of five with both, remote only, and no memcached: 3.926ms, 13.359ms, and 718.912ms.
Update: As I was cleaning up mod_mcpage to make a patch out of it in the relatively near future, I took out a bunch of random debugging output that was going into the log and ran ab again, this time with the index page of the site I was testing, comparing the tiered local + remote caches against the same file served statically. Over 10,000 requests at a concurrency of one, the tiered cache had a mean time per request of 2.216ms; the static file had a mean time of 2.110ms. Over 10,000 requests at a concurrency of five, the tiered cache had a mean time per request of 1.120ms across all requests; the static file had a mean time of 1.025ms across all requests.
Not too bad. Doing it in shared memory would be blazingly fast, but who knows when I'll have time to get all the little bits of that done. Something worth shooting for down the road though.
Full benchmarks (including the update) below the fold.
This is a working mod_mcpage patch for lighttpd 1.5. The issues with compressed data, mod_deflate, uncompressed data, and some other strange combinations have been ironed out. Local files, fastcgi, and proxied data have been tested with various combinations of mod_deflate turned on and off (in both lighttpd and the backend), and it's all working now. It took some jumping around, too. This module stores content, either local or proxied, in memcached so it can be served from there rather than hitting the disk or the backend server.
Another preliminary mod_mcpage patch for lighttpd 1.5 has been released. It has all the features of the previous version, with some issues resolved: checks were added so it doesn't try to load objects larger than 1MB (or a limit you define at compile time) into memcached, and it now stores the Expires: and Cache-Control: HTTP headers. Still to do: it needs to be non-blocking; an option to MD5 keys; binary and local data need more testing; MIME type checking for compression and appending debug data to pages.
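memcached rejects items larger than 1MB by default, which is why the module checks the size before attempting a store. A minimal sketch of the same guard in Ruby (the names MAX_ITEM_SIZE and store_page are illustrative, and the hash stands in for a real memcached client):

```ruby
# memcached's default item size limit; the real module lets you
# override this at compile time.
MAX_ITEM_SIZE = 1024 * 1024

# Store a page in the cache, skipping anything too large to fit.
# The Expires: and Cache-Control: headers are kept alongside the
# body so they can be replayed on a cache hit.
def store_page(cache, key, body, headers)
  return false if body.bytesize > MAX_ITEM_SIZE # too big; serve normally, don't cache
  cache[key] = {
    body:    body,
    headers: headers.slice("Expires", "Cache-Control")
  }
  true
end
```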
These benchmarks are for the Perl tiered caching modules originally discussed here. Ruby tiered cache benchmarks will follow later.
All tests were performed on a Xen virtual machine with 512MB of RAM using one core of an Intel Core2 Duo E8500 @ 3.16GHz. The Perl version is 5.10.0 running on Debian 5.0. The remote memcached server is 1.2.5; the local memcached is 1.2.2. The modules tested were Cache::Memcached::Fast, Cache::Mmap, Cache::FastMmap, and the custom Cache::Tiered and Cache::TieredFast modules (modules and benchmarking code are here).
The tiered configurations all use a remote memcached server: Cache::Tiered uses a local Cache::Mmap cache, Cache::TieredFast uses a local Cache::FastMmap cache, and Cache::TieredLmc uses a local memcached server.
Benchmarking results are below the fold. The first set of benchmarks is read-heavy, setting a value and reading it back 50,000 times (re-setting it if need be); the second set sets and reads a value 50,000 times. The test value used is the zlib HOWTO html file. NB: Right now, with the tiered cache modules, you pass in the memcached or mmap objects you want to use. Eventually, they will create them themselves.
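The read-heavy run described above has roughly this shape (sketched in Ruby rather than the Perl used for the actual runs; the hash is a stand-in for the real cache objects, and `read_heavy` is an illustrative name):

```ruby
require 'benchmark'

ITERATIONS = 50_000

# Read-heavy benchmark: set a value once, then read it back
# ITERATIONS times, re-setting it whenever it has expired out
# of the cache (a plain hash never expires, but the real mmap
# and memcached caches can).
def read_heavy(cache, key, value)
  cache[key] = value
  ITERATIONS.times do
    cache[key] = value if cache[key].nil? # reset on expiry
    cache[key]
  end
end

cache = {}
value = "x" * 16_384 # stand-in for the zlib HOWTO page
elapsed = Benchmark.realtime { read_heavy(cache, "page", value) }
puts format("%d reads in %.3fs", ITERATIONS, elapsed)
```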
Update: Forgot to add that the mmap files are stored in /dev/shm, but any memory-based filesystem should perform the same. Disk-based mmap files were not tested, but they don't seem likely to give the same speed benefit.
I should have written about this a while ago, but predictably I never got around to it, so I have to recreate the benchmarks. These are just some initial comments.
One of the best things I did for Daily Kos when I was getting ready for the 2006 midterm elections was to start using memcached. When I was getting ready for the 2008 elections, I greatly expanded the use of memcached throughout the backend, and it was one of the biggest factors in getting through Election Day in one piece. With a multi-webhead setup like ours, memcached let us spread the work around, so each webhead could take advantage of the page-rendering work done by the others. Really handy, and it helped keep our load below 3 on Election Night.
Unfortunately, I saw areas where memcached wasn't perfect. No matter what, it's better than not caching at all, or than caching only locally, but using it as your only cache isn't ideal either. Fetching data from memcached over the network takes time that can really add up. You could cache things locally in your app too, but then the other processes wouldn't be able to take advantage of it.
What I've experimented with, and have perl and ruby clients for, is a tiered cache. Basically, the cache uses memcached for persistent and distributed cache storage, but keeps a fast cache that quickly times out (I've been using 5 seconds, but a shorter time might be better) that's local to the machine it's running on, so all of the processes of your app can make use of it. When your app gets something from the cache, it first looks in the local cache, and if it's not there it looks in memcached. If it's in memcached, it retrieves the data (or, if it's not in memcached, it does whatever the app would do without the cache of course and then puts it in memcached for future use) and stores it in the local cache before moving on. After that, until it times out from the local cache, the app can fetch the data from the fast local cache without having to hit memcached at all. The short expiration time attempts to avoid the problem of data going stale, but both clients provide ways to delete data from the local cache at the same time it's deleted from memcached.
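The lookup described above can be sketched as follows (a minimal Ruby sketch, not the actual client: the store objects are stand-ins, with a plain hash plus expiry timestamps for the local tier and another hash standing in for a real memcached client on the remote tier):

```ruby
# Tiered cache: a fast, short-lived local tier in front of a
# persistent, shared remote tier (memcached in the real client).
class TieredCache
  LOCAL_TTL = 5 # seconds; short, to limit how stale local data can get

  def initialize(remote)
    @remote = remote # slow, shared, persistent tier
    @local  = {}     # fast, per-machine tier: key => [value, expires_at]
  end

  def get(key)
    # 1. Try the fast local tier first.
    value, expires_at = @local[key]
    return value if value && Time.now < expires_at

    # 2. Fall back to the remote tier.
    value = @remote[key]
    if value.nil?
      # 3. Full miss: the app regenerates the data (the block)
      #    and stores it remotely for future use.
      value = yield
      @remote[key] = value
    end

    # 4. Populate the local tier with a short expiry before returning.
    @local[key] = [value, Time.now + LOCAL_TTL]
    value
  end

  # Deleting removes the entry from both tiers so they don't diverge.
  def delete(key)
    @local.delete(key)
    @remote.delete(key)
  end
end
```

Until the local entry times out, repeated `get` calls never touch the remote tier at all, which is where the speedup in the benchmarks comes from.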
To implement the local cache, the Perl client uses Cache::Mmap (although now I've found Cache::FastMmap, which I don't remember seeing before). The ruby client, developed with the help of the ever-helpful wycats, uses Berkeley DB for the local cache, but since the ruby client is built with Moneta, swapping BDB out for whatever else you'd prefer would be easy.
I can't find the benchmarks I did back in March or thereabouts, so I'm going to have to recreate them. Some random guy on the internet's memory isn't a good benchmark, after all. I remember that the difference between the tiered cache and just memcached was pretty astonishing, but I don't remember the specifics. I'll get some benchmarks together over the next few days. I do remember that the difference wasn't as extreme if the cache was write heavy, but the tiered cache was significantly faster with a read heavy benchmark (which is more like DKos' cache usage). When the new benchmarks are done, I'll post them.
If it seems like all I'm writing about right now is mod_mcpage, well, it's what I've been doing the most work on. Thanks to the helpful folks in #lighttpd on freenode, who made some suggestions and found some problems, there have been some changes. I've made a new patch for mod_mcpage, this time against the svn branch of lighttpd 1.4.x, although it should apply against 1.4.19 and 1.4.20 as well. However, if you apply this patch against an earlier release (which I haven't actually tested), you would at minimum need to rerun autoreconf -fi, and possibly .
From the NYT: A new digital library of Europe’s cultural heritage crashed just hours after it went online and will be out of operation for several weeks, the European Commission said Friday, attributing the embarrassing failure to overwhelming public interest. Europeana, a Web site of two million documents, images, video and audio clips, opened on Thursday with international publicity and acclaim from researchers. But by Friday, those trying to log on were greeted with a message telling them that the service may not be running again until mid-December, while computer capacity is upgraded.
A while back, I discussed the caching used on DailyKos with a hacked up mod_magnet and a lua script to serve up pages for anonymous users out of memcached, so it would avoid hitting the much larger mod_perl apaches on the backend running Scoop. It worked very well, but I never liked having all the extra overhead of mod_magnet when I was only using a small amount of its functionality.
The two-tiered webserver setup (a lightweight frontend proxying to heavier application servers behind it) is common knowledge. As your site's traffic grows, however, performance will begin to suffer again and you'll have to find other ways to scale.
A common way to do that is to cache dynamic content for anonymous users. A good rule of thumb is that anonymous browsers make up 90% of your website's traffic, so if you can avoid regenerating those pages constantly for anonymous visitors, you can theoretically reduce your website's load to a tenth of what it was. Writing pages out to disk is a good way to cache pages, but it isn't perfect. If you're going to cache more than just the index page and other pages found at a consistent URL, you'll have to find a way to dump the pages out to disk and use some rewrite trickery to properly serve up the cached pages. If you have more than one webserver in a pool, the cache isn't shared between the machines. Finally, and this was becoming a problem on Daily Kos, dumping the files to disk can itself cause performance issues when you have multiple webservers all updating their caches simultaneously.
I solved this problem by moving from an Apache 1.3 proxy to a hacked up lighttpd proxy using mod_magnet and memcached to cache anonymous pages, sharing the work of caching them between all the webservers, and greatly expanding how much content I could cache. Below the fold is a description of how it works, how to set it up for yourself, caveats, and ideas for future development.
Note: This works for me, but like all things computer, it may not work for you for no clear reason whatsoever. Always make backups and have a disaster recovery plan in place.