Wednesday, August 03, 2005

Woes of a Custom System

Monday night, horror struck. I mean the worst kind -- the kind that strikes fear into most sysadmins.

I was using my Linux desktop, which holds all of my important data, when it decided to stop working. I right-clicked to copy something, and gedit opened and the letter 'u' started going across the screen. I let it do this for about a minute before pressing Backspace. Evolution (my email client) closed. I stopped and thought about it. I pressed the Esc key. Firefox opened. I pressed Ctrl-Alt-Backspace to restart X. It shut down X, as it should.

So I stood there, thinking: should I restart? Should I make a fresh backup? Or what? What could be so wrong?

I changed the batteries in my wireless keyboard and mouse. Nothing worked. The mouse didn't move, the keyboard didn't respond. I couldn't switch virtual terminals. Yikes! So, I slowly reached down and hit the reset button, which got me two blank screens and a spinning hard drive light. Word to the non-techies ... that's BAD.

I turned it completely off for about 2 minutes, grabbed a Dr. Pepper, and turned it back on. The boot screen came up immediately, and it began booting. It started the network. It found the LVM volume groups. And then it had a problem running fsck.ext2 on one of the partitions. That sounds reasonable, except that my desktop runs ReiserFS exclusively. There aren't any ext2 or ext3 partitions. It dropped me to a shell, where I ran reiserfsck on all of my LVM logical volumes (the system is RAID 1, with LVM on top of that, and the necessary partitions on top of LVM) -- except my root file system. Everything was fine.

So, I restarted, and got the same error. I could only assume that it was complaining about the root file system, which it was able to mount fine. So I grabbed a copy of Knoppix, a Fedora Core 3 rescue CD, and a Linux image called sysrescue (one could think I've done this before) and started in.

Turns out, the root partition did have the problem, and reiserfsck wasn't able to fix it. It recommended running reiserfsck with --fix-fixable. So I ran that. It came out with still more problems. So, I ran it with --rebuild-tree. Worse idea, but I had no choice. It got about 20% done, and then croaked about being out of disk space. Oops.

Hans Reiser, it would seem, implemented some checking in --rebuild-tree, which marks the partition as unreadable until the rebuild completes successfully. Until then, the partition is unusable. That is, rebooting left me without my LVM or RAID, so I had to manually rebuild the raidtab to get RAID started, and then force a scan with LVM (which found my logical volumes and got them up and running).

At this point, the only thing to do was create a new, larger partition, binary copy the 2.5GB of data over to it using dd, and then rebuild the tree. If that worked -- great, I'd have the recoverable data, but would need to build a new system to use it. But my data was not on the root; it was on /home and /opt, which are different partitions, so I didn't need to go through all of that hassle. I did, however, need a new system to put my old data on.
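
For anyone wondering what a "binary copy" with dd amounts to: it reads the source block by block and writes those exact bytes to the target, which has to be at least as big. Here is a rough Python sketch of the same idea. The name block_copy and its parameters are made up for illustration; the real job would be done with dd itself, and any device paths passed in would be your own.

```python
import os


def block_copy(src_path, dst_path, block_size=1024 * 1024):
    """Copy src onto an existing, larger dst, block by block.

    Illustrative stand-in for a dd-style raw copy. The destination
    must already exist (as a block device would) and is not truncated.
    Returns the number of bytes copied.
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        if os.fstat(dst.fileno()).st_size and (
            os.fstat(src.fileno()).st_size > os.fstat(dst.fileno()).st_size
        ):
            # Only meaningful for regular files; block devices report size 0.
            raise ValueError("destination is smaller than source")
        while True:
            chunk = src.read(block_size)
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
        dst.flush()
        os.fsync(dst.fileno())  # make sure the data actually reaches the disk
    return copied
```

The destination is opened in "r+b" mode so that nothing past the copied bytes is touched, which is what leaves the extra room a tree rebuild would need.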

At which point I trucked over to my brand-new HP zv6008cl and started to put it through its paces ...
