Time To Pull The Plug

This is a subtitle. There are many like it, but this one is here.

Working on Mod_mcpage - Stupid Programming Tricks Edition

One of the things I want to do with mod_mcpage is move the job of placing pages into memcached out of the backend application and into lighttpd, so it’s all handled there. To do that, the page’s content type needs to be stored along with the page in memcached. The easiest approach is to put the content type at the beginning of the stored string, but I had to think for a bit to figure out the best way to do that. I came up with two. Each involves jumping through some hoops, but at different points.

The First Way

This is probably the more correct way to store the content type, but it requires somewhat fancier footwork further down the line. Here, we store the content type and the page together, with a null byte separating them.


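#include <stdlib.h>
#include <string.h>

char *content_type = "text/plain";
char *page = "Some text and whatnot";

/* Room for both strings plus two null bytes: one separator, one terminator. */
char *store = malloc(strlen(content_type) + strlen(page) + 2);
if (store == NULL) {
   … handle the error …
   }

char *i; /* We'll want this later to reset the pointer */
i = store;

strcpy(store, content_type); /* malloc() doesn't zero memory, so copy rather than concatenate */
store += strlen(content_type);
*(++store) = '\0';
strcpy(store, page);

/* Reset store back to the beginning */

store = i;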

The hoop jumping here is that strlen() won’t work on the stored string anymore, since it stops at the null byte in the middle, so you’ll need to get the length of the string to be stored in two steps. You’ll also need to extract the content type and the page the same way. Below, we’ll go ahead and do both at the same time to illustrate.


size_t contype_len = strlen(store);
i = store; /* Using char *i from our earlier example - if you want to free the stored string, you'll need the pointer somewhere. */

char *contype = malloc(contype_len + 1);
if (contype == NULL) {
   … handle the error …
   }
strcpy(contype, store); /* Now we have the content type in its own string */

/* Advance store past the end of the content type and its null byte. */

store += contype_len + 1;

/* Now let's get the length of the page, and the overall length of the stored string */
size_t page_len = strlen(store);
size_t overall_len = contype_len + page_len + 2;

/* Now we have both the content type and the original page (accessed from store) so we can use them. */

/* … do stuff … */

/* Once we're done, we can free them up. This is why we saved the pointer in i: by now store points partway into the block, and passing that to free() will lead to Bad Things. */

free(i);
free(contype);

Again, pretty straightforward. The downside is that when you need the length of the stored string, you have to count both null-terminated strings inside it, terminators included.
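Here’s a minimal sketch of the first way wrapped up as two helpers. The names pack_first_way() and first_way_len() are hypothetical (they aren’t part of mod_mcpage or any library). Packing uses memcpy() with lengths we already know instead of strcat(), and the length helper measures each null-terminated part separately so both terminators are counted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: store content_type, a null byte, page, and a
 * final null byte in one malloc'd buffer. Writes the total byte count
 * (both terminators included) to *out_len. Returns NULL on failure. */
char *pack_first_way(const char *content_type, const char *page,
                     size_t *out_len)
{
    assert(content_type != NULL && page != NULL && out_len != NULL);

    size_t ct_len = strlen(content_type);
    size_t page_len = strlen(page);
    size_t total = ct_len + 1 + page_len + 1;

    char *buf = malloc(total);
    if (buf == NULL)
        return NULL;

    memcpy(buf, content_type, ct_len + 1);        /* includes its '\0' */
    memcpy(buf + ct_len + 1, page, page_len + 1); /* includes its '\0' */

    *out_len = total;
    return buf;
}

/* Hypothetical helper: recover the total stored length from a packed
 * buffer by measuring each null-terminated part on its own. */
size_t first_way_len(const char *buf)
{
    size_t ct_len = strlen(buf);
    size_t page_len = strlen(buf + ct_len + 1);
    return ct_len + 1 + page_len + 1;
}
```

For "text/plain" and "Some text and whatnot" the total is 10 + 1 + 21 + 1 = 33 bytes, while a plain strlen() on the buffer would report only the 10 bytes of the content type.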

The Second Way

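This was actually the first way I thought of, but it’s probably the less correct way to do it. However, it mostly avoids having null bytes in the middle of the string. With this method, we store the length of the content type in one and a half chars (12 bits) at the beginning of the string we’re storing. The first char holds the top four bits of the length and is ORed with 0xF0 so the string never starts with a null byte; the second char holds the low eight bits, so it will still be a null byte whenever the length is a multiple of 256 (not a concern for any realistic content type). Twelve bits also limit content types to at most 4095 bytes. I was worried the bit manipulation might cause problems with endianness, but since the shifts and masks operate on the length’s value rather than on its layout in memory, the two bytes should come out in the same order on any machine. If I end up using this way, I’ll still set up Debian/390 under Hercules and see what happens.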
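To make the header arithmetic concrete, here’s a small sketch of encoding and decoding the one-and-a-half-char length on its own. The function names encode_ct_len() and decode_ct_len() are hypothetical. Because it only shifts and masks the integer’s value and writes the high nibble to byte 0 explicitly, the output byte order doesn’t depend on the machine’s endianness.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CONTENT_TYPE_LEN 4095 /* largest value that fits in 12 bits */

/* Hypothetical helper: write a 12-bit length into two bytes.
 * Byte 0 holds the top 4 bits, ORed with 0xF0 so it is never zero;
 * byte 1 holds the low 8 bits. Returns 0 on success, -1 if too long. */
int encode_ct_len(size_t len, unsigned char out[2])
{
    if (len > MAX_CONTENT_TYPE_LEN)
        return -1;
    out[0] = (unsigned char)(0xF0 | ((len >> 8) & 0x0F));
    out[1] = (unsigned char)(len & 0xFF);
    return 0;
}

/* Hypothetical helper: the reverse of encode_ct_len(). */
size_t decode_ct_len(const unsigned char in[2])
{
    assert((in[0] & 0xF0) == 0xF0); /* marker bits set by the encoder */
    return ((size_t)(in[0] & 0x0F) << 8) | in[1];
}
```

Encoding 10 (the length of "text/plain") gives the bytes 0xF0 0x0A, and 4095 gives 0xFF 0xFF. Encoding 256 gives 0xF1 0x00, which shows the caveat: when the low byte of the length is zero, the second header byte is a null byte.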


#include <stdlib.h>
#include <string.h>

char *content_type = "text/plain";
char *page = "Some text and whatnot";

/* Two length bytes, the content type, the page, and one null terminator. */
char *store = malloc(strlen(content_type) + strlen(page) + 3);
if (store == NULL) {
   … handle the error …
   }

/* Get the content type length and transfer it to the chars. */

#define MAX_CONTENT_TYPE_LEN 4095
size_t c = strlen(content_type);
/* Make sure that the length of the content type string isn't over 4K - 1. Unlikely, but you never know. */
if(c > MAX_CONTENT_TYPE_LEN){
   … handle the error …
   }

unsigned char b, d;
unsigned int fl = 0x00000F00;
unsigned int fg = 0x000000FF;
b = (c & fl) >> 8; /* mask out everything but the low four bits of the second byte, shift right by 8 bits so they land in the first byte, and assign to b. */
d = c & fg; /* mask out everything but the first byte. */
b |= 0xF0; /* Don't want to end up with a null byte there. */

*store++ = b;
*store++ = d;
strcpy(store, content_type); /* malloc() doesn't zero memory, so copy rather than concatenate */
strcat(store, page);
store -= 2; /* get it back down to where we started. */

So now the string is ready for storage. What do we do when we want to use it?


char *i; /* For storing store's starting point for later */
i = store;

unsigned char n = *store++;
unsigned char m = *store++;
n &= 0x0F; /* clear the bits in the top half of that byte that were there to prevent having a null byte. */

unsigned int clen = 0;
clen = n << 8;
clen |= m;

unsigned int y;
char *content_type = malloc(clen + 1);
if (content_type == NULL) {
   … handle the error …
   }
char *bb;
bb = content_type;
for(y = 0; y < clen; y++)
   *content_type++ = *store++;
*content_type = '\0';
content_type = bb;

/* Now we have the content type string and the page (accessed, as in the previous example, from store). */

/* … do stuff … */

/* We remembered to save the stored string's pointer in i earlier, because we need it to free our memory. */
free(i);
free(content_type);
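For comparison with the character-by-character loop above, here’s a sketch of the decode side as a single function that copies the content type with memcpy(). The name unpack_second_way() is hypothetical, and it assumes a well-formed buffer (a valid two-byte header, at least that many content type bytes, then a null-terminated page); it does no bounds checking against the actual stored length.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split a second-way buffer into a freshly
 * malloc'd content type string (returned via *content_type; the caller
 * frees it) and a pointer to the page inside buf (freed with buf).
 * Returns 0 on success, -1 on allocation failure. */
int unpack_second_way(const char *buf, char **content_type,
                      const char **page)
{
    const unsigned char *hdr = (const unsigned char *)buf;
    assert((hdr[0] & 0xF0) == 0xF0); /* marker bits from the encoder */

    size_t clen = ((size_t)(hdr[0] & 0x0F) << 8) | hdr[1];

    char *ct = malloc(clen + 1);
    if (ct == NULL)
        return -1;
    memcpy(ct, buf + 2, clen); /* content type bytes follow the header */
    ct[clen] = '\0';

    *content_type = ct;
    *page = buf + 2 + clen;    /* page runs to the final null byte */
    return 0;
}
```

The original buffer only needs to stay alive as long as the page pointer is in use; the content type copy is independent of it.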

Those were the two ways I’d come up with to handle storing the content type. The second one occurred to me because I wanted to avoid dealing with embedded nulls, but it’s also pretty complicated. The first requires remembering to handle the null bytes, but is a lot less complicated (and can handle theoretical content type strings longer than 4K, should they come up). I’m planning on using the first method for storing the pages in memcached, but I’m keeping the second in reserve should there end up being some overriding reason. Also, it’s so ridiculous, I couldn’t help but share it with the world (although the world will likely point and laugh, and I’ll deserve it).

Comments imported from the old site.
It occurs to me

… that *(++store) = '\0'; isn’t strictly necessary, since there would already be a null byte there. *store++ would be sufficient.

Further, it occurs to me

With the first way of doing it, you don’t need to malloc a new string for the content type either, or move the stored string’s pointer around and save a copy of it for later. Instead, this seems to work too.

Assume you have char *mc_ret, which is the value returned from memcached. Once you’ve determined whether it was gzipped and needs to be decompressed, you can just do this to get pointers to the content type and the page.

char *content_type, *page;
size_t u = strlen(mc_ret);
content_type = mc_ret;
page = mc_ret + u + 1;

Now you have pointers to the content type and the page you want to return, but can leave the original string returned by memcached (or the decompressed version) untouched, and can free it when you’re all done with everything. Of course, if you have to modify the content type string or the page further, they’ll need to be copied to their own area of memory.
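Here’s a sketch of that pointer-based split as a function, plus the copy step for when the content type does need to be modified. As in the comment, mc_ret stands for the (possibly decompressed) buffer from memcached; split_stored() and copy_content_type() are hypothetical names. The pointers are only valid for as long as mc_ret is.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: point content_type and page into mc_ret
 * without copying anything. Both pointers dangle once mc_ret is freed. */
void split_stored(const char *mc_ret, const char **content_type,
                  const char **page)
{
    assert(mc_ret != NULL);
    *content_type = mc_ret;
    *page = mc_ret + strlen(mc_ret) + 1; /* skip past the separator */
}

/* Hypothetical helper: make a private, modifiable copy of a string
 * (what POSIX strdup() does; ISO C only added strdup() in C23). */
char *copy_content_type(const char *content_type)
{
    size_t n = strlen(content_type) + 1; /* include the terminator */
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, content_type, n);
    return copy;
}
```

Only the copy is safe to edit; changing bytes through the split pointers would alter the buffer memcached handed back.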