Mysterious xtrabackup segfault problem solved

For about a month I’ve been battling a mysterious problem: running xtrabackup from a cron job to do a full database backup would fail on Sunday night, but would run fine when run by hand or when run from cron at some other time. It always failed at the same place in the database backup, and the only trace it left behind in the logs was this error:

[11592490.825845] xtrabackup[3436]: segfault at 84 ip 0000000000512fe5 sp 00007f179f3fbdf0 error 4 in xtrabackup[400000+260000]

I poked and prodded at it and kept adding tweaks that I thought might fix it, each of which worked fine when I ran the job by hand or in the cron right away. I would then sit back and wait for the next Sunday run, only to find each time that it had segfaulted once again. Eventually I changed the cron job itself to write a log whenever the offending job ran, and I finally saw what was breaking it:
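The post doesn't say exactly how the logging was added, so the following is only a sketch of one way to do it: a small wrapper that cron can call in place of the backup command, which appends everything the command prints (stdout and stderr) to a log file along with its exit status. The function name `run_and_log` and the way the log path and command are passed on the command line are assumptions made for illustration, not anything from the original setup.

```python
import datetime
import subprocess
import sys
from typing import Sequence


def run_and_log(command: Sequence[str], log_path: str) -> int:
    """Run `command`, appending its stdout and stderr to `log_path`.

    Returns the command's exit status. On POSIX systems a negative value
    means the process was killed by a signal (for example -11 for SIGSEGV).
    """
    with open(log_path, "a", encoding="utf-8") as log:
        stamp = datetime.datetime.now().isoformat()
        log.write(f"--- {stamp} running: {' '.join(command)}\n")
        # Flush our header before the child starts writing to the same file.
        log.flush()
        result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
        log.write(f"--- exit status: {result.returncode}\n")
    return result.returncode


if __name__ == "__main__":
    # Hypothetical usage: wrapper.py LOGFILE COMMAND [ARGS...]
    if len(sys.argv) < 3:
        sys.exit("usage: wrapper.py LOGFILE COMMAND [ARGS...]")
    status = run_and_log(sys.argv[2:], sys.argv[1])
    sys.exit(0 if status == 0 else 1)
```

The point of capturing stderr is that a segfault line in the system log says nothing about why the process died, while the program's own last message often does.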

/usr/bin/xtrabackup: Error writing file '/data/mysql/current/base/./foobar/somerandomtable#P#p8.ibd' (Errcode: 28)

Running perror 28 reports, of course, “OS error code 28: No space left on device”. So if you see xtrabackup segfault like this, check that you’re not running out of disk space on the device that you’re writing the backup files to.
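As a rough illustration of both halves of that advice, here is a minimal sketch that looks up what an OS error code means and checks how much free space a target filesystem has before a backup starts. The helper names (`describe_errno`, `has_free_space`) and the threshold in the usage section are hypothetical; how much space a backup actually needs depends on your data and is not something the sketch can know.

```python
import errno
import os
import shutil
import sys


def describe_errno(code: int) -> str:
    """Return the symbolic name and the OS message for an error code."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


def has_free_space(path: str, required_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has at least
    `required_bytes` bytes free. `path` must already exist."""
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    # Hypothetical usage: check.py [DIRECTORY]
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    required = 10 * 1024**3  # assumed threshold of 10 GiB, adjust to taste
    print(describe_errno(errno.ENOSPC))
    if not has_free_space(target, required):
        sys.exit(f"not enough free space under {target}")
```

Running a check like this at the start of the cron job turns a silent segfault into an explicit failure message.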

 