For the last couple of months, I’ve been working off and on at porting xiafs, an ancient Linux filesystem and competitor to ext2 that lost out and was dropped from the Linux kernel back in 1999, to a modern kernel. I’m happy to announce the release of modern-xiafs on GitHub. Currently it works with the 2.6.32 kernel that shipped with Debian squeeze, but I intend to get it up to date with the latest kernel versions shortly.
You may, quite reasonably, wonder why someone would go to the effort of porting a 20-year-old filesystem. I wanted to learn more about filesystems, but filesystems are a pretty hard area to break into if you don’t already know what you’re doing.
I remembered seeing xiafs as an option when installing Slackware 3.6 when I was first starting to use Linux. I think I made a few floppies with xiafs on them, but even by that point xiafs had fallen out of use. I always wondered about it, though, and realized that porting it to modern Linux would be a great way to experiment and learn about filesystems because:
- It used to be in the Linux kernel, but wasn’t anymore. It had worked with Linux once, so it could do so again.
- It’s fairly simple. One of the problems xiafs had versus ext2 was that xiafs, being a relatively uninspired extension of the Minix filesystem, did not have much room to improve and expand. So while ext2 grew, and it and its descendants still roam the Earth to this day, xiafs stagnated and eventually died off. That simplicity and relative lack of features, however, make it an ideal learning tool because it’s easier to understand.
- It’s a simple filesystem, but still has features like variable length directory entries and atime/ctime/mtime timestamps.
- Since it was originally an extension of the Minix filesystem, the existing Minix filesystem code that is still in the Linux kernel could be used as a guide to facilitate porting.
- Finally, no one else had really worked on this much over the years. For good or ill, this would have to be my work.
That said, there were some strange decisions that went into it. One of the oddest things I found while digging around was the fact that the number of blocks used by an inode is stored in the first byte of each of the first three block pointers in the inode. It’s also a bit surprising that xiafs goes no further than doubly indirect block pointers, thus limiting file sizes to about 64MB. The inodes are only 64 bytes, too, so as things stand there’s no room in them to raise that limit. I assume it seemed like a reasonable limit in 1993, but I can’t help but think that it might have seemed a bit short-sighted even then.
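To make those two quirks concrete, here is a rough user-space sketch in Python (not the kernel code). It assumes the count lives in the top 8 bits of each of the first three 32 bit pointers, and that the filesystem uses 1 KiB blocks, 4 byte pointers and 8 direct pointers per inode. None of those details are stated above, and the function names are made up for illustration.

```python
# Sketch only. Assumptions (not from the post): the block count occupies the
# top 8 bits of the first three 32-bit pointers, leaving 24 bits per block
# number; blocks are 1 KiB; pointers are 4 bytes; the inode has 8 direct slots.

ADDR_MASK = 0x00FFFFFF  # low 24 bits: the actual block number (assumed)


def unpack_zones(raw_zones):
    """Split raw 32-bit pointers into (block_numbers, block_count)."""
    count = 0
    for i in range(3):
        # Pointer i contributes byte i of the 24-bit count.
        count |= (raw_zones[i] >> 24) << (8 * i)
    return [z & ADDR_MASK for z in raw_zones], count


def pack_zones(block_numbers, count):
    """Inverse of unpack_zones: fold a 24-bit count into the first three pointers."""
    raw = [b & ADDR_MASK for b in block_numbers]
    for i in range(3):
        raw[i] |= ((count >> (8 * i)) & 0xFF) << 24
    return raw


def max_file_blocks(block_size=1024, ptr_size=4, direct=8):
    """Blocks reachable through direct, singly and doubly indirect pointers."""
    per_block = block_size // ptr_size
    return direct + per_block + per_block * per_block
```

Under those assumptions `max_file_blocks()` comes to 8 + 256 + 65,536 blocks, which at 1 KiB each is a little over 64 MB and so lines up with the limit mentioned above.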
I did my development work in 32 bit and 64 bit Debian squeeze VirtualBox virtual machines. Most of the work was done in the 64 bit VM, but for various reasons I wanted to have a 32 bit machine available to test on. I also had a VM running Slackware 3.5 to be able to make the xiafs filesystems I was working on, since the 2.0.34 kernel that shipped with that version of Slackware had xiafs available as a module. This way, I’d be able to confirm that this code would work with actual xiafs filesystems from back in the day, as well as have a filesystem to work with before I got mkfs.xiafs ported. This turned out to be important later.
My first thought was to take the xiafs code from the 2.1.20 kernel (which was the last kernel version before it was dropped) and port that directly to 2.6.32, using the changes in the Minix filesystem code between 2.1.20 and 2.6.32 to guide me. I battled with doing it that way for a while, but there were some massive changes to Linux’s handling of filesystem stuff between the 2.4 and 2.6 series, so that didn’t end up working out very well. I finally decided that I would be better off going back to square one and using the Minix filesystem code in 2.6.32 as the basis for the port.
I did get a semi-working module out of that initial push, although it never got further than being able to mount without panicking and read the root directory in a somewhat garbled fashion. It wasn’t wasted work, though; while that first attempt at porting was a failed effort, it got me a lot more familiar with what the code was supposed to do and I learned that the original code was not even remotely 64 bit clean. I learned this after the 64 bit module couldn’t read the xiafs disk from the Slackware VM, while the Slackware VM couldn’t read the disk created with the version of mkfs.xiafs I got to compile on the 64 bit Debian VM. Looking over hexdumps of the device partitions, I noticed that while the right magic number was in the superblock of both disks, it was in a slightly different place on each. In fact, everything on the 64 bit version’s superblock was spaced out further than it was on the superblock of the filesystem created on the 32 bit Slackware VM.
It turned out that there were assumptions all over the place in the old xiafs code about the width of integers and long integers. Oops. This was easy enough to fix, at least. Specifying the size of the various integers more precisely let the 64 bit VM read the superblock of the Slackware-created disks.
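The same width problem is easy to reproduce outside the kernel. This is a small Python sketch, not the actual fix: the four field record and the names `NATIVE_LONGS`, `FIXED_U32S` and `read_fields` are hypothetical, and little-endian byte order is assumed because the disks came from x86 machines.

```python
import struct

# Native mode ("@") uses the host C compiler's sizes, so "l" (long) is
# 4 bytes on a typical 32-bit Linux build and 8 bytes on a 64-bit one.
NATIVE_LONGS = "@4l"

# Standard mode ("<") pins every field: "I" is always a 4-byte
# little-endian unsigned integer, whatever machine runs the code.
FIXED_U32S = "<4I"


def read_fields(buf, offset=0):
    """Read four consecutive 32-bit fields from a raw block (hypothetical layout)."""
    return struct.unpack_from(FIXED_U32S, buf, offset)
```

On a 64 bit Linux box `struct.calcsize(NATIVE_LONGS)` is typically 32 while `struct.calcsize(FIXED_U32S)` is always 16, which is the same kind of spreading out that showed up in the hexdumps.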
Once I decided that trying to port the old xiafs code directly was a dead end, I tried hollowing out the minix fs code instead and working from there. This was relatively straightforward, with most of the work involved being to look at the minix fs behavior, compare it with what the xiafs code did, and work out how to adapt the minix code to work that way. The minix fs and xiafs don’t work exactly the same way, and because there were so many changes to the filesystem code between when xiafs was in the kernel and now, it wasn’t always obvious how to change the minix code to work with the xiafs data structures. The most difficult parts involved the code that creates directory entries, updating the i_blocks count, and getting doubly indirect block pointers working.
Updating i_blocks wasn’t too bad in the end; I just had to find the right place to do it, and creating directory entries required a deeper understanding of how xiafs does the variable length directory entries. The minix fs uses a fixed length dentry, so before I could get all of that working correctly I had to figure out how to deal with the variable length entries and have it work with the page cache.
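For anyone who hasn’t run into variable length directory entries before, here is roughly what walking them looks like, as a Python sketch. The record layout (4 byte inode number, 2 byte record length, 1 byte name length, then the name) is an ext2-style assumption for illustration and may not match the real xiafs fields; `make_entry` and `iter_entries` are hypothetical helpers, and none of this touches the page cache side of the problem.

```python
import struct

# Assumed header: inode number, record length, name length (little-endian).
HEADER = struct.Struct("<IHB")


def make_entry(ino, name, rec_len):
    """Build one entry, zero-padded out to rec_len bytes."""
    body = HEADER.pack(ino, rec_len, len(name)) + name
    if len(body) > rec_len:
        raise ValueError("rec_len too small for this name")
    return body.ljust(rec_len, b"\0")


def iter_entries(block):
    """Yield (inode, name) for each live entry in a directory block."""
    pos = 0
    while pos + HEADER.size <= len(block):
        ino, rec_len, name_len = HEADER.unpack_from(block, pos)
        # A record must at least hold its own header and name, and must
        # not run past the end of the block.
        if rec_len < HEADER.size + name_len or pos + rec_len > len(block):
            raise ValueError("corrupt entry at offset %d" % pos)
        if ino != 0:  # inode 0 is treated as an unused slot (assumption)
            start = pos + HEADER.size
            yield ino, block[start:start + name_len]
        # The record length, not a fixed size, tells us where the next entry is.
        pos += rec_len
```

The contrast with a fixed length dentry is the last line of the loop: the next entry’s offset comes from data stored in the current one, so a single bad record length derails everything after it.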
The doubly indirect block thing was way trickier. It manifested when I’d copy large files onto the xiafs disk and then check the disk, either on the Slackware VM or, after I finished porting fsck.xiafs, on the modern 64 bit Debian box. As far as fsck.xiafs was concerned, a bunch of free blocks had been improperly marked as used, while the i_blocks count would be too big according to the inode. I went down a lot of rabbit holes trying to figure out why i_blocks wasn’t being incremented correctly, and why free blocks might be improperly marked as used. Once I figured out that the inode actually had the right number of blocks recorded, I started moving in the right direction. I compared hex dumps of a xiafs disk with a large file copied onto it from the Slackware VM and from the Debian VM and noticed an off by one error with the block pointer. Moving the pointer over with a hex editor fixed everything; reading the file was happy, fsck.xiafs was happy, everyone was OK. Editing block devices with a hex editor is not a good long term solution, however, so I turned back to the minix fs based block getting code and kept comparing it to the equivalent code in xiafs, looking for anything that might cause that off by one 32 bit int error. After working through a bunch of bitshifts with a calculator and finding that everything kept checking out and wouldn’t explain the error, it hit me.
It turned out that I had neglected to decrease the block count before trying to look up the first doubly indirect block pointer.
After I fixed that, it worked. Everything had come together, and I felt the xiafs port was ready to release.
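Here is my best reconstruction of that lookup logic as a Python sketch rather than the real kernel code. It assumes 8 direct pointers and 256 pointers per indirect block (1 KiB blocks with 4 byte pointers); `block_to_path` and the tuple it returns are made up for illustration.

```python
DIRECT = 8            # direct pointers in the inode (assumed)
PTRS_PER_BLOCK = 256  # 1 KiB block / 4-byte pointers (assumed)


def block_to_path(block):
    """Map a file-relative block number to the chain of pointer indices."""
    if block < 0:
        raise ValueError("negative block number")
    if block < DIRECT:
        return ("direct", block)
    block -= DIRECT
    if block < PTRS_PER_BLOCK:
        return ("indirect", block)
    # Skipping this subtraction is the kind of slip described above: the
    # first doubly indirect block would then map to outer index 1, not 0.
    block -= PTRS_PER_BLOCK
    if block < PTRS_PER_BLOCK * PTRS_PER_BLOCK:
        return ("double", block // PTRS_PER_BLOCK, block % PTRS_PER_BLOCK)
    raise ValueError("block beyond the doubly indirect range")
```

With these numbers, file block 264 is the first one reached through the doubly indirect pointer and should land at outer index 0, inner index 0. Leave out the second subtraction and it lands one 32 bit slot over instead, which matches what the hex dumps were showing.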
There’s still some more porting work to be done. This module doesn’t work with the 3.2.0 kernel that shipped with wheezy (and presumably doesn’t with kernels even more recent than that), and hasn’t been tested on big-endian systems at all. I’d like to get those issues ironed out, and also get to the bottom of an issue where sometimes unmounting very large (like close to the limit in allowed size) xiafs filesystems can cause kernel panics (smaller filesystems appear to be safe, however). This should never be used in a production environment, of course, but it shouldn’t cause panics like that anyway.
Update: The unmounting large filesystems bug appears to be fixed, and it now supports both 2.6.32 and 3.2.0 (in other words, the default Debian kernels for the last two releases). I’ll write more about the unmounting large filesystems bug later, but I learned a valuable lesson about cargo culting. It still hasn’t been tested on big-endian systems, however, and you really shouldn’t be using it on a production system in any case.
It would also be fun to use it as a base to learn about other features like journaling (a journaling filesystem with a 64 MB max filesize and 2GB max volume size might be kind of pointless, but then again all of this is a little pointless), or experiment with it in other ways like porting it to FUSE or making it an in-kernel filesystem on other operating systems.
The other thing I’d like to do is make an annotated version of the xiafs source. At the moment the xiafs source has fewer than 2,000 lines of code, but still covers a lot of important concepts. Ext2 in the 3.8 kernel, on the other hand, has over 6,700 lines of code, while ext3 and ext4 have over 12,500 and 30,000 respectively. I think an annotated version of the xiafs code might help others more easily learn what I had to figure out on my own. There seems to be a bit of a gap between high-level descriptions of what filesystems do and OH JESUS THAT MAKES ABSOLUTELY NO SENSE. It would be awesome to fill in that gap some.