[Mar. 10th, 2004|06:45 pm]
The Veritable TechNinja
[status |exhausted]

So... a SQL server I support goes down at about 6PM last night. We get the service call from the DBA that there have been multiple drive failures. Why he didn't call us after the first one failed, I don't know. We get to the site, and sure enough there are two bad disks in a 6-disk RAID5 array. Spooky thing is, this system has a separate 2-disk RAID1 array for the C: partition, which is POSTing OK, but it's not booting. So we try to reseat the dead drives, nada. We move drives from slot to slot to see if it's a backplane issue, and the failures follow the drives. I leave, knowing the server is fux0rated and the hardware contract has expired. The DBA has already ordered 3 disks. I come back today, and he's managed to get a different-brand disk (same SCSI revision and size) to try to rebuild to. It won't join the array; it'll only come up as "ready" or as its own volume. I decide to say "fuck it" at this point, force one of the dead disks online, and rebuild it. The rebuild completes. Now the OS will try to boot, but it BSODs, saying it can't load the software registry hive. I hose the OS partition, reinstall, and look at D:. It's readable, but there's some file corruption. So the DBA is going to verify what files he can, I'm going to restore the missing files from backup tape, then we're going to put the new disk in to bring the array all the way online, reformat the partition, and copy all the known good data back over. Once the other disks come, one's going in as a hot spare, and one's sitting in the rack as a "cold spare". I'm going to go home now.

