Tag Archives: failure

S.M.A.R.T. (or not so smart)

I really get frustrated when it comes to hard drive diagnostics. Specifically: S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology).

The theory behind S.M.A.R.T. is that the hard drive is supposed to provide predictive information about its own impending failure.

The problem is, I’ve never had a S.M.A.R.T.-enabled drive accurately report a failure.
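For anyone who wants to see what the drive itself is claiming, here is a minimal sketch in Python. It assumes the smartmontools package is installed and that `smartctl -H` is run with enough privileges to query the drive; the device path `/dev/sda` and the function names are placeholders of mine, not something from the setup described here.

```python
import subprocess


def parse_health(output):
    """Return True/False for the drive's own pass/fail verdict, or None if absent."""
    for line in output.splitlines():
        # ATA drives print "SMART overall-health self-assessment test result: ...",
        # SCSI drives print "SMART Health Status: ...".
        if "overall-health" in line or "SMART Health Status" in line:
            _, _, verdict = line.partition(":")
            return verdict.strip() in ("PASSED", "OK")
    return None


def drive_health(device="/dev/sda"):
    """Ask smartctl for the health verdict of one device (path is an assumption)."""
    result = subprocess.run(
        ["smartctl", "-H", device], capture_output=True, text=True
    )
    return parse_health(result.stdout)
```

A result of `True` only means the drive says it is healthy, which is exactly the claim in question.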


Another upgrade, another problem

Yep, it happened again … about this time last year I was trying to upgrade my servers to Fedora Core 6 and ran into some problems.

Well, I decided it was time to upgrade to Fedora 8 … and, since I have time off, I figured this was a fine time to do it again.

Bad move.

Of course, in retrospect … there never would have been a good time to do the upgrade, based on the problems I encountered. At least I know my backup procedure is fairly good now.

I had been planning this upgrade for weeks … everything was set. In fact, the first half of the upgrade went smooth as silk. I upgraded the main web server (gondor) to Fedora 8 and it went pretty nicely. Only two issues, both of which were solved after a little research.

This gave me the confidence to proceed to upgrade Rivendell to Fedora 8.

I started the upgrade by booting from the CD so I could install Fedora from the DVD ISO image I had on a USB hard drive. Problem is, the system wouldn’t boot this way.


Hard Drive Failures

I’ve noticed something … in recent memory, I have not suffered one single hard drive failure.

I’ve only suffered multiple hard drive failures … all my drive failures seem to happen in batches.

Last weekend the refurbished Seagate hard drive in my laptop (Rohan) started generating errors. About the same time, the main drive in Gondor started to flake out.

My laptop had recently been backed up with Ghost, so restoring it to a spare 100 GB hard drive I had wasn’t a problem. I did struggle a bit because there was a Linux partition on the replacement drive … that Ghost didn’t know how to delete.

The drive in Gondor was a bit more problematic … Linux was reporting problems with the drive, and so were the Dell hard drive diagnostics, but when I ran Spinrite over it, no problems were reported.

I decided to let the drive sit and see if the problems came back.

Obviously they did … this time, however, when I ran Spinrite on the drive it found a bad cluster. Luckily it was able to recover the cluster. After Spinrite was done, I copied the old drive to a new 300 GB drive. Now I just have to get Dell to send me a new drive. Not sure what I’m going to do with a spare 80 GB SATA drive.

Of course, all these hard drive problems got me to thinking … why the heck don’t operating systems raise serious alerts when a drive failure is detected?

On Windows XP, the drive problem was silently being logged to the “System Event Log”. I think it should have popped up a warning message telling me that something was wrong.

On Linux, the drive problems were also being logged to syslog … but if you aren’t actively monitoring the system logs, it’s easy to miss something like that. I’m going to investigate some system monitoring software (something like Nagios) to keep an eye on problems of this nature.
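Until something like Nagios is in place, a small script run from cron could at least surface these log entries. The following is only a rough sketch: the log path `/var/log/messages` and the three patterns are assumptions for illustration, since kernel disk-error messages vary by driver and version.

```python
import re

# Illustrative patterns only; adjust them to the messages your kernel actually logs.
DISK_ERROR_PATTERNS = [
    re.compile(r"I/O error", re.IGNORECASE),
    re.compile(r"\bata\d+(\.\d+)?: .*(error|failed)", re.IGNORECASE),
    re.compile(r"medium error", re.IGNORECASE),
]


def find_disk_errors(lines):
    """Return the log lines that match any of the disk-error patterns."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in DISK_ERROR_PATTERNS)
    ]


def scan_log(path="/var/log/messages"):
    """Scan one syslog file (the default path is an assumption) for suspicious lines."""
    with open(path, errors="replace") as handle:
        return find_disk_errors(handle)
```

Calling `scan_log()` and mailing any non-empty result would turn a silent log entry into something much harder to miss.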