Software RAID Drive Replacement
Software RAID is such a finicky thing. I've blogged in the past about how to grow a software RAID array in Linux, but have since switched to using hardware RAID in all of my servers and a Drobo at home (which I wrote a little about here). My Drobo just got two new 2TB drives, and all that took was sliding them in, but one of my old servers lost a software-RAID drive last week.
This crashed the server for some reason, and a matching replacement for the Hitachi Ultra320 SCSI 73GB drives ran about $400, so I bought an IBM Ultra320 SCSI 73GB drive online for $50 shipped. It arrived this week, and I headed to the datacenter today to install it. What is a stupid-simple process with the Drobo turned out to be a lot more involved. It was made even more painful because, while the labeled size of the two drives was identical, the new one had a few fewer actual blocks than the old one. Should this happen to you (or me again in the future) with an ext2/3/4 data volume, these are the steps to take:
- Shut down the system
- Replace the failed drive
- Boot up from a recovery CD (I used a Gentoo install CD)
- Use fdisk to partition the new drive with a layout as close as you can to the old drive's. Here are my two drives:
Disk /dev/sda: 73.4 GB, 73407868928 bytes
255 heads, 63 sectors/track, 8924 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xc36bc36b
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1           5       40131   fd  Linux raid autodetect
/dev/sda2               6         249     1959930   82  Linux swap / Solaris
/dev/sda3             250        8924    69681937+  fd  Linux raid autodetect
Disk /dev/sdb: 72.9 GB, 72892735488 bytes
255 heads, 63 sectors/track, 8862 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xf496138a
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1           5       40131   fd  Linux raid autodetect
/dev/sdb2               6         249     1959930   82  Linux swap / Solaris
/dev/sdb3             250        8862    69183922+  fd  Linux raid autodetect
- Note any partition sizes that differ. In my case, the new sdb3 is 69183922 blocks, which is smaller than the 69681937 blocks on the old drive.
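If you want to double-check the sizes the kernel sees (independent of fdisk's cylinder math), a couple of read-only commands will do it; nothing here touches the disks:
grep 'sd[ab]3' /proc/partitions    # sizes in 1K blocks
blockdev --getsize64 /dev/sdb3     # size in bytes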
- Use resize2fs to shrink the existing filesystem on the good drive down to the size of the new partition. Yes, you are modifying the member partition directly instead of going through the software RAID (one of the few nice things about software RAID). Note that without a unit suffix resize2fs reads the number in filesystem blocks, so tack on a K to make it 1K blocks and match fdisk's Blocks column:
resize2fs -f /dev/sda3 69183922K
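If you'd like to sanity-check the numbers before shrinking, dumpe2fs (or tune2fs -l) will show the filesystem's current block size and block count without modifying anything:
dumpe2fs -h /dev/sda3 | grep -i 'block count\|block size'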
- Use mdadm to shrink the RAID device to the size of the new partition (mdadm's --size is in 1K blocks, i.e. kibibytes used from each member):
mdadm /dev/md127 --grow --size=69183922
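It's worth confirming the array really did shrink before going any further; mdadm --detail and /proc/mdstat are both read-only:
mdadm --detail /dev/md127 | grep -i 'array size'
cat /proc/mdstat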
- Just to be safe, run a filesystem check on the freshly shrunken RAID volume:
e2fsck -y /dev/md127
- Add the new drive to the RAID array:
mdadm /dev/md127 --add /dev/sdb3
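If mdadm complains that it can't add the drive because the old, failed member is still listed in the array, you may need to clear it out first; mdadm accepts the keywords failed and detached with --remove (only needed if you actually hit that error):
mdadm /dev/md127 --remove failed
mdadm /dev/md127 --remove detached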
- And wait for it to finish resyncing:
watch cat /proc/mdstat
If you run into device busy errors, you may need a reboot, to stop the RAID device (mdadm --stop /dev/md127), etc. And if you screw up the block sizes, all kinds of bad things happen. Also, if you just shrink the RAID volume but don't shrink the filesystem, or you shrink the RAID volume first, be prepared to spend far too much time trying to fix things! Sometimes it's faster (if your volume isn't close to full) to resize2fs down to a very small number, shrink the RAID volume to a very small number, let all the synchronizing happen, and then grow the RAID volume with "--grow --size=max" and resize your filesystem up to the new size of the RAID volume.
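As a rough sketch of that last approach, with made-up example numbers (the 40G is just an illustration; pick something comfortably bigger than the data on the volume but smaller than the new partition):
resize2fs -f /dev/sda3 40G                # shrink the filesystem well below the new size
mdadm /dev/md127 --grow --size=41943040   # shrink the array to match (41943040 KiB = 40G)
mdadm /dev/md127 --add /dev/sdb3          # add the new drive
# wait for /proc/mdstat to show the resync is finished, then grow back out
mdadm /dev/md127 --grow --size=max        # grow the array to the largest size that fits
resize2fs /dev/md127                      # grow the filesystem to fill the array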
There are worse things to spend part of a Saturday afternoon on, but I'd rather be outside!