# new computer, bad hard drive?

## Xero

I recently got a new computer (p4 2.4ghz, abit ic7-g) and I put the hard drive from my previous computer in it, a wd1200jb. Soon after, I notice things getting a little weird, and while I'm doing an emerge sync my whole computer starts to freeze up. I reboot and I hear the click of death. I power it off completely, restart it, and the click is gone.

I manage to boot into gentoo and there's some file system corruption. By this point I'm flipped out so after making some last minute backups I ran fsck -c -c -C on all my partitions to check for bad blocks. Everything turned out good. I figured it was just software being screwy and that the click went away. To make sure the OS isn't corrupted I decide to reinstall gentoo, which I planned to do anyway to take advantage of p4 optimizations.
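
For reference, the bad-block check described above could be scripted roughly like this. It is only a sketch: the partition names are hypothetical, the `-c` options are passed through to e2fsck (so this assumes ext2/ext3 partitions), and by default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: print (or run) a bad-block check for each listed partition.
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to execute.
# Partitions should be unmounted before a real run.
check_partitions() {
    for part in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "fsck -c -c -C $part"
        else
            # -c given twice asks e2fsck for a non-destructive read-write test;
            # -C shows a progress indicator.
            fsck -c -c -C "$part"
        fi
    done
}

# Hypothetical partition list; adjust to the actual layout.
check_partitions /dev/hda1 /dev/hda3
```

Note that a clean result here does not prove the drive is healthy; it only says no bad blocks were seen during that particular pass.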

While I'm trying to reinstall gentoo, I go to mkreiserfs /dev/hda3 and it gives me some weird HDA:DMA INTERRUPT error. I reboot and still get it. I decide to take off the SATA-PATA adapter I was using, thinking maybe it was causing it. I reboot and this time it gets farther. So I'm starting from a stage 2 and everything seems to be going well until I get the error again, this time with the click of death following.

Did the new computer kill the hard drive? Is it some kind of weird kernel I/O bug causing it? The fact that the hard drive decides to do this right after I get my new computer is driving me nuts. I seriously doubt I damaged it in the process of moving it over, I was pretty careful. The only guess I have is that in my old computer, my hard drive was held on its side due to an odd case, and now it's mounted horizontally like most cases hold them. Perhaps the head alignment is screwy because of that? I plan to RMA the hard drive as soon as possible, but is it really the hard drive or could the computer have somehow caused it? I'd appreciate any insight.

I have an IBM 75gxp still surviving after almost 2 years of being on constantly and now my ~6 month old wd1200jb decides to go? I'm pissed.

----------

## robdavies

I doubt it's your hard drive. Download the manufacturer's fix utility if the problem with the drive persists.

Much more likely is SATA issues, and the relative newness of your mobo etc. Are you running with 2.4.22 vanilla sources, or one of the other sets like -ac which aim to provide state-of-the-art hardware support?

----------

## Malakin

Weird clicking noises usually mean a bad hard drive.

If the problem isn't totally reproducible, try heating up the system a little more than usual, like putting it in a bag or something, just don't overdo it. This will probably make it click like crazy and freeze up pretty quick if it's really the hard drive.

Like robdavies said, try WD's diagnostic software on it first and see if it comes up with anything. I've seen lots of bad drives that diagnostic software thought were ok, so it doesn't always work, but it works about 70% of the time so it's definitely worth trying before you try anything else.

If you have any data on the drive that you wish to get off you can leave it alone for a while to cool down and then blow a fan on the drive while you copy the data off, this usually keeps it going for longer before it freezes again.

----------

## Xero

I tried another hard drive and it's doing the same thing with it, I have a feeling it's not the harddrive now. I stopped using the sata-pata adapter and the problem still persisted. I was using kernel 2.4.22 but the gentoo install disk has 2.4.21 which is what it was using while I was installing. I've emailed abit in hope for a response but I'm not sure what they can do. I don't want to return something only to find out it wasn't the problem.

err.. more specifically, when doing the first emerge sync while reinstalling gentoo it started giving reiserfs file system corruption errors, not the DMA-related one that the other drive was giving, nor the clicking. But considering the similarity and whatnot, I have no doubt it's some kind of ide controller or motherboard related issue. I'm starting to wonder if the motherboard is damaged or something. I suppose a software-related error is possible, like a buggy kernel driver, but if I'm not even using sata that seems less likely. Plus I looked around and made sure people have had success with this motherboard before buying it, and never heard of any problems like this.

----------

## robdavies

What you've now said convinces me it's most unlikely to be the drive. I disagree somewhat with the l33t poster, in that I know most drive returns work when tested after being RMA-ed; all the issues I've had have been fdisk(8) corrupting partition tables, and were fixable.

Can you put back your drive in with old mobo etc and test?

Your friend with the Abit mobo is google.com/linux; see if others have similar issues. Maybe you ought to consider reading the LKML FAQ and posting; at least do a search of the list archive for issues. SATA is just too new to rely on.

One thing I meant to mention was checking IDE cables, it's amazing how a small nick in one can cause trouble. It can happen when you build a new box.

----------

## Xero

I did search google, but not google.com/linux. It doesn't seem to turn up any more information than google did. As for putting the drive back in my old computer, I plan to do that ASAP, I'll probably do it after I post this. Considering I've basically tried 2 cables (sata with sata-pata adapter, and a regular pata cable) I am ruling that one out. I could try another cable but I am pretty sure it's not that. When I swap the mobo, if I still see the problem (I doubt I will) I'll swap the cable. If this is indeed an ide controller error, I still am not sure about the clicking, but I suppose it's possible that the hard drive was receiving bad signals and it caused it to flip out a bit. I've read that clicking doesn't always mean a bad hard drive, and I have to agree with you that it does seem unlikely to be the drive. I'm off to swap the mobo now.

----------

## Malakin

 *Quote:*   

>  I disagree somewhat with the l33t poster, in that I know most drive returns work when tested after being RMA-ed; all the issues I've had have been fdisk(8) corrupting partition tables, and were fixable.

 I think you misunderstood me. Diagnostic software only correctly identifies bad drives about 70% of the time. I've never had to re-rma a drive.

If two different drives were clicking on the same motherboard then I think it's safe to say it's not the drives. Very odd, I've never seen a drive click before that wasn't dying.

If it's clicking in the installer maybe try something else like a Knoppix CD or even something from the evil empire.

----------

## robdavies

Yes, I think I thought you were saying the drive manufacturer's diagnostics often failed to call a failed drive failed, and therefore it was likely a paperweight.

----------

## Xero

I wasn't having 2 separate drives clicking. It was just the one; the other was getting I/O errors that were similar to the first drive's, but it didn't click. It makes me think that the drive was getting some weird requests from the ide controller causing the heads to flick around. Whether or not it's actually damaged, I'm about to find out, as I got another motherboard in and I'm about to start the installation.

On another note, while trying to mount a partition I had made on the mobo that the problem was on, it didn't mount and was screwy. I reformatted the partition and then it mounted fine. My home partition, which I had never even touched during all this mess, also mounted fine.

----------

## Xero

welp, the drive started clicking using a different motherboard. It must indeed be bad. DMA lost interrupt error again. As for the motherboard, I'll have to do more testing to find out what's really going on. I'm going to put the new motherboard back in and try that other drive again.
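
When comparing the same drive across two motherboards like this, it can help to count the IDE error lines in a saved kernel log instead of going by ear. The sketch below assumes the messages look like typical 2.4 IDE driver output (`hda: dma_intr: ...`, `hda: lost interrupt`); the function name and the sample log lines are illustrative, not taken from the machine in this thread.

```shell
#!/bin/sh
# Sketch: count IDE error lines for one drive in a saved kernel log.
# Usage: count_ide_errors LOGFILE [DRIVE]   (DRIVE defaults to hda)
count_ide_errors() {
    logfile=$1
    drive=${2:-hda}
    # Assumed message patterns; adjust them to whatever the kernel prints.
    grep -c -i -E "${drive}: (dma_intr|lost interrupt|dma_timer_expiry)" "$logfile"
}

# Illustrative sample only; not real output from the poster's system.
sample=$(mktemp)
cat > "$sample" <<'EOF'
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: lost interrupt
hdb: lost interrupt
EXT3-fs: mounted filesystem with ordered data mode.
EOF
count_ide_errors "$sample" hda
rm -f "$sample"
```

On a live system the log could be captured first with something like `dmesg > /tmp/kernel.log` and then passed to the function, once per motherboard.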

----------

## Xero

Using the drive that wasn't clicking, I've been able to get rid of the weird error I was getting with it. It was not the same one the clicking drive was getting; it was some reiserfs error that, after doing some searching, may have been caused by not rebooting after running fdisk. Anyway, I redid the partition tables, saved, rebooted and decided to use ext3 just to be safe, and so far so good. I'm still not entirely sure what was going on, but one thing is sure: I have one dead hard drive.

----------

## robdavies

Look, run the manufacturer's diagnostic utility and try 'repair drive'. They can click because of messed up partition tables etc. I've had drives which appeared to fail, but which I got the data off and got 'fixed', and which still work fine today, years after the 'problem'.

----------

## Xero

I've downloaded the diagnostic tool and I'm running it now. I'm not too worried about getting data off it as I've pretty much got that covered, but considering the fact that many hard drives aren't actually bad as you say, something I've also noticed, I figure it's worth a shot. I'm going to run as many diagnostic things on this drive as possible. The quick test reported a read error and told me to run the full media scan to correct it which is going to take about 40 minutes according to the estimate. It's running as of now.

I'm more worried about the hard drive independent errors I seem to be getting. I can't even get emerge -u system to work during installation. I get these "internal error: segmentation fault" messages, which is apparently GCC segfaulting. I installed slackware real quick on the spare drive to see if it'd still happen and indeed it did. If I keep re-emerging/compiling, eventually it seems to get past, but that's no solution, especially on larger things. It did it twice compiling X at 2 different points and seemed completely unable to compile a copy of GCC.

I'm even more clueless than before with this whole situation. I still think the motherboard may be bad, I've never heard of anything like this before with the i875p chipset, or the specific motherboard I have. It could also be the chip I suppose. I really don't know.

update: the diagnostic tool finished with code 223 which means errors were found and repaired, and the drive should now be defect free. I reran the quick test and it returned no errors. Perhaps the drive is now indeed fixed. I haven't tried using the drive yet but I will as soon as I get the chance.

----------

## Xero

The drive is still giving weird hda errors. I've searched about the compiler errors and did a memtest, and it turns out one of my sticks of ram has tons of errors. With that out, so far the compile errors seem to have disappeared.
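
A compile that segfaults at a different point each time, and sometimes succeeds on retry, is the classic sign of flaky hardware rather than a compiler bug. One rough way to check whether the failures are really gone is to repeat the same command several times and count the failures. This is a minimal sketch: `count_failures` is a made-up helper name, and `true`/`false` stand in for a real build command.

```shell
#!/bin/sh
# Sketch: run a command N times and report how many runs failed.
# Usage: count_failures N command [args...]
count_failures() {
    runs=$1
    shift
    failed=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard output; only the exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "$failed"
}

# Stand-in commands: "true" always succeeds, "false" always fails.
count_failures 5 true
count_failures 5 false
```

With a real, deterministic build in place of the stand-ins, any count other than 0 or N suggests an intermittent problem, which is worth following up with a full memtest pass.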

----------

## Malakin

If you don't plan on returning the bad ram you can try upping the dimm voltage by .1 or .2 volts, that will fix it sometimes. Obviously you can also try lowering the memory speed/timings, but this doesn't seem to work as often and then you're also decreasing your performance.

I run memtest on all new systems I assemble, memory errors are very common these days unfortunately. It's also common for a stick to fail in one machine but run fine in another at the same speed.

----------

## Xero

I plan to return the ram. It's top notch corsair xms 3200LLPT ram, I paid extra for it, and I expect it to work properly, especially at stock speeds. I did try lowering the timing settings with no luck. I am pretty sure newegg will take it back without much hassle, but I'll have to wait until Monday to find out.

----------

