# Hard Drive Errors (BadCRC), Getting More and More Serious

## magiuspendragon

I was running Gentoo with the latest gentoo-sources as of last week on an amd64 machine, and I was getting BadCRC errors in my kernel log. At first it was only while running MythTV, but later it began to happen all the time. I tried running smartctl and badblocks, switched cables, and changed my DMA settings. The tests came up clean, and I was still getting the error. So I decided to try a new install off the 2005.0 LiveCD that I had lying around. The install printed a BadCRC error again on the first emerge sync, so I bought a new hard drive, and the same thing happened. I then ran my hard drive vendor's diagnostic tool, and that came up clean as well. I continued with the install and got another CRC error after emerge system.

*EDIT* I also ran fsck off the LiveCD. The only time it tried to fix anything was when I accidentally ran it on my Windows partition :c/

*END EDIT*

Any thoughts on what I should do? Asus tech support suggests that it's the drivers, but it was happening on both my installed system and the LiveCD. I also can't seem to reproduce the error consistently right now.

I'm running an Athlon 64 FX-55 on an Asus A8N-SLI Deluxe motherboard. I have Maxtor DiamondMax 200 and 250 GB IDE hard drives in there now (the 250 is the new drive I bought), an Nvidia video card, and a Sound Blaster Audigy sound card. I can't provide the actual kernel log or the numbers for the error because neither drive has a logger at the moment.

Any other information you need, please let me know.

Any help would be appreciated,

Thanks

Last edited by magiuspendragon on Sat Feb 25, 2006 3:58 pm; edited 1 time in total

----------

## magiuspendragon

Just to follow up: if I boot without any hdparm settings, I don't get the BadCRC error, just the other half of it:

```

dma_intr: status=0x51 { DriveReady SeekComplete Error }

```

This happens about once every 2 or 3 minutes.

Any thoughts?
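For anyone reading along, the `status=0x51` value in that message is just the ATA status register, and the words in braces are its set bits. Here is a minimal Python sketch of the decoding. The bit names are an assumption based on the labels the old Linux IDE driver prints in dmesg; `decode_status` is a hypothetical helper, not part of any tool.

```python
# Bit masks of the ATA status register, paired with the labels the
# old Linux IDE driver appears to print (names are an assumption).
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_status(value):
    """Return the labels of every bit set in a status byte."""
    return [name for mask, name in STATUS_BITS if value & mask]


if __name__ == "__main__":
    # 0x51 = 0x40 + 0x10 + 0x01
    print(decode_status(0x51))  # ['DriveReady', 'SeekComplete', 'Error']
```

So the status line on its own only says "the drive is ready, the seek finished, and the last command reported an error". The follow-up `error=` line, when there is one, is what says which kind of error it was.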

----------

## linuxtuxhellsinki

https://forums.gentoo.org/viewtopic-t-384809-highlight-driveready+seekcomplete+error.html

----------

## magiuspendragon

That worked for a little while, but eventually I still got the error back. Changing the hdparm settings only prolongs the time between errors.

----------

## magiuspendragon

So I tried installing an x86 version of Gentoo, and that also turned up the same error when watching TV. It compiled everything okay, but running MythTV has reproduced the error.

Currently the only solution I can find is to turn off DMA with hdparm -d0A0 /dev/hda.

Is there any better fix? Someone said that setting -Xudma5 had solved their error, but that hasn't worked for me. Is there anything else I should try?
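As a side note on the `-X` option: as far as I know from the hdparm man page, a name like `udma5` is shorthand for a number, where PIO mode n is 8+n, multiword DMA mode n is 32+n, and UltraDMA mode n is 64+n. The sketch below shows that arithmetic in Python. `xfer_mode_value` is a made-up helper for illustration, and it only computes the number; it does not call hdparm or touch a drive.

```python
# Base values for the hdparm -X argument, as described in the man page:
# PIO mode n -> 8+n, multiword DMA mode n -> 32+n, UltraDMA mode n -> 64+n.
MODE_BASES = (("udma", 64), ("mdma", 32), ("pio", 8))


def xfer_mode_value(mode):
    """Translate a mode name such as 'udma5' into its numeric -X value."""
    mode = mode.lower()
    for prefix, base in MODE_BASES:
        number = mode[len(prefix):]
        if mode.startswith(prefix) and number.isdigit():
            return base + int(number)
    raise ValueError("unrecognised transfer mode: %r" % mode)


if __name__ == "__main__":
    for name in ("udma5", "udma2", "mdma2", "pio4"):
        print(name, "->", "-X%d" % xfer_mode_value(name))
```

So `-Xudma5` and `-X69` should ask for the same thing, and stepping down to `-Xudma2` (`-X66`) is a common thing to try when a cable or controller is suspected.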

----------

## Garvonis

I sometimes get that error too. And I too, just got done with badblocks (god it took forever, about 2 days). It turned up some actual "bad blocks" too. Though I'm confused at the moment because it seems if I use "fdisk /dev/hda" then I get odd messages that seem to say that it'll toss away all of what badblocks did.

But, in case I get that error again, do those "hdparms" belong in the fstab or what?

----------

## electrofreak

Sorry to bump this, but I'm getting frustrated to no end over it.

I'm getting the same errors every so often in 'dmesg' on Linux. I asked about it on #gentoo-amd64, and they basically said to just ignore it. So I did.

But I also dual boot with Windows XP SP2. I actually use Windows XP more than Linux (sad to say). Windows XP is 32-bit, but I'm running 64-bit Gentoo, and everything is working fine btw.

Now the frustrating part. In Linux, nothing seems to happen except that dmesg complains about the BadCRC errors from time to time; the rate varies, so I have no idea what to think. In Windows they are apparently happening too. It took a bit of research to find out, but after every 6 of these errors that Windows sees, it "downgrades" my transfer mode (UDMA 5, then 4, then 3, etc.) all the way down to PIO eventually, which is extremely slow. The odd thing is that it only does this on the master drive of my Primary channel. So I'm actually wondering if perhaps it is my hard drive giving me problems.

I know this isn't a Windows support forum, but does anyone have any idea what I can do about this? I have to reboot every few days to get the transfer mode back up to UDMA 5. It's very frustrating because, other than that, my computer is running fairly stable. Under Linux it doesn't seem to adjust the transfer mode at all, but if it did, it could be changed on the fly without a reboot anyway.

Just so everyone knows:

Primary:

   Master: WDC 160GB 8MB cache IDE

   Slave: WDC 200GB 8MB cache IDE

Secondary:

   Master: LITE-ON DVDRW

   Slave: DVDRW IDE 16X

Then now (as of 3 days ago) on SATA I have two 160GB HDs on ports 1 and 2. No problems have been seen at all with these drives and their transfer mode.

It is ONLY the 160GB IDE drive that does it. And the drive has been used in another computer with no noticeable problems. (I didn't really know of any issue, but I could be wrong.) I'm about to switch my Primary and Secondary to see if the problem just happens on the Primary Master channel or if it only happens on this hard drive.

 *Garvonis wrote:*   

> But, in case I get that error again, do those "hdparms" belong in the fstab or what?

 

To answer your question, hdparm settings are applied with the command "hdparm [options] /dev/[device]". They can be set as defaults in the config file '/etc/conf.d/hdparm' and applied at boot with 'rc-update add hdparm default'.

----------

## electrofreak

I would like to report to everyone the (probable) fix for these BadCRC errors. It seems that running two hard drives on one channel (both on Primary, for example) can cause problems such as these BadCRC errors. So the solution is to set everything up so that no two hard drives are running on the same channel. Depending on the manageability of your case and cables, this may be rather difficult to achieve without an add-on IDE controller, which is what I had to use. Since I've done this, I have had no problems with either drive. This is obviously a large relief given all the trouble these problems caused.

----------

## widan

BadCRC errors can also be caused by bad IDE cables and/or connections.

----------

## electrofreak

Possibly. I replaced my IDE cable a few times, so that apparently wasn't the problem.

However, all of a sudden, I'm getting the errors again. GRRRR. Gonna try replacing the cable again.

----------

## magiuspendragon

This has now gotten very serious. I came home last night to a locked-up computer. I couldn't ssh in or anything, so I had to restart. On restart, hdb1 failed to mount, and dmesg shows the following:

```

Buffer I/O error on device hdb1, logical block 342801

hdb: dma_intr: status=0x51 { DriveReady SeekComplete Error }

hdb: dma_intr: error=0x40 { UncorrectableError }, LBAsect=2742975, high=0, low=2742975, sector=2742959

ide: failed opcode was: unknown

end_request: I/O error, dev hdb, sector 2742959

```
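If errors like these start piling up in the kernel log, it can help to pull out the device, the error labels, and the sector numbers so you can see whether the same sectors keep failing. Below is a rough Python sketch that parses the `dma_intr: error=` line format shown above. The regular expression is my own guess at that one format and may not match other kernel versions; `parse_error_line` is a hypothetical helper.

```python
import re

# Matches lines like:
#   hdb: dma_intr: error=0x40 { UncorrectableError }, LBAsect=2742975, ... sector=2742959
# The LBAsect/sector parts are optional because some error lines omit them.
LINE_RE = re.compile(
    r"^(?P<dev>hd[a-z]): dma_intr: error=(?P<err>0x[0-9a-fA-F]+) "
    r"\{(?P<labels>[^}]*)\}"
    r"(?:, LBAsect=(?P<lba>\d+))?"
    r"(?:.*?sector=(?P<sector>\d+))?"
)


def parse_error_line(line):
    """Return a dict describing one dma_intr error line, or None if no match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    lba = match.group("lba")
    sector = match.group("sector")
    return {
        "device": match.group("dev"),
        "error": int(match.group("err"), 16),
        # Labels are taken from the log text itself, not decoded from the byte.
        "labels": match.group("labels").split(),
        "lba": int(lba) if lba is not None else None,
        "sector": int(sector) if sector is not None else None,
    }


if __name__ == "__main__":
    sample = (
        "hdb: dma_intr: error=0x40 { UncorrectableError }, "
        "LBAsect=2742975, high=0, low=2742975, sector=2742959"
    )
    print(parse_error_line(sample))
```

An `UncorrectableError` with a sector number attached is a different complaint from the earlier `BadCRC` messages: it is reported against a specific location on the disk rather than against the transfer over the cable.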

smartctl -a reported the following:

```

smartctl version 5.33 [x86_64-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen

Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===

Device Model:     Maxtor 6B200P0

Serial Number:    B40NJHEH

Firmware Version: BAH41B70

User Capacity:    203,928,109,056 bytes

Device is:        Not in smartctl database [for details use: -P showall]

ATA Version is:   7

ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0

Local Time is:    Sat Feb 25 09:27:36 2006 CST

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

General SMART Values:

Offline data collection status:  (0x80)   Offline data collection activity

               was never started.

               Auto Offline Data Collection: Enabled.

Self-test execution status:      (   0)   The previous self-test routine completed

               without error or no self-test has ever 

               been run.

Total time to complete Offline 

data collection:        (1622) seconds.

Offline data collection

capabilities:           (0x5b) SMART execute Offline immediate.

               Auto Offline data collection on/off support.

               Suspend Offline collection upon new

               command.

               Offline surface scan supported.

               Self-test supported.

               No Conveyance Self-test supported.

               Selective Self-test supported.

SMART capabilities:            (0x0003)   Saves SMART data before entering

               power-saving mode.

               Supports SMART auto save timer.

Error logging capability:        (0x01)   Error logging supported.

               No General Purpose Logging support.

Short self-test routine 

recommended polling time:     (   2) minutes.

Extended self-test routine

recommended polling time:     (  82) minutes.

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  3 Spin_Up_Time            0x0027   206   206   063    Pre-fail  Always       -       17488

  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       91

  5 Reallocated_Sector_Ct   0x0033   252   252   063    Pre-fail  Always       -       17

  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0

  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0

  8 Seek_Time_Performance   0x0027   247   243   187    Pre-fail  Always       -       47698

  9 Power_On_Hours          0x0032   230   230   000    Old_age   Always       -       24738

 10 Spin_Retry_Count        0x002b   253   252   157    Pre-fail  Always       -       0

 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0

 12 Power_Cycle_Count       0x0032   253   253   000    Old_age   Always       -       196

192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       0

193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       0

194 Temperature_Celsius     0x0032   035   253   000    Old_age   Always       -       27

195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       13365

196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline      -       0

197 Current_Pending_Sector  0x0008   252   252   000    Old_age   Offline      -       11

198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0

199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -       0

200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0

201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -       0

202 TA_Increase_Count       0x000a   253   252   000    Old_age   Always       -       0

203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       1

204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       0

205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0

207 Spin_High_Current       0x002a   253   252   000    Old_age   Always       -       0

208 Spin_Buzz               0x002a   253   252   000    Old_age   Always       -       0

209 Offline_Seek_Performnce 0x0024   253   253   000    Old_age   Offline      -       0

210 Unknown_Attribute       0x0032   253   252   000    Old_age   Always       -       0

211 Unknown_Attribute       0x0032   253   252   000    Old_age   Always       -       0

212 Unknown_Attribute       0x0032   253   253   000    Old_age   Always       -       0

SMART Error Log Version: 1

ATA Error Count: 79 (device log contains only the most recent five errors)

   CR = Command Register [HEX]

   FR = Features Register [HEX]

   SC = Sector Count Register [HEX]

   SN = Sector Number Register [HEX]

   CL = Cylinder Low Register [HEX]

   CH = Cylinder High Register [HEX]

   DH = Device/Head Register [HEX]

   DC = Device Command Register [HEX]

   ER = Error register [HEX]

   ST = Status register [HEX]

Powered_Up_Time is measured from power on, and printed as

DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,

SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 79 occurred at disk power-on lifetime: 7554 hours (314 days + 18 hours)

  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  5a 4a 08 3f 00 48 f0

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  f0 00 08 3f 00 48 f0 00      00:03:11.800  [VENDOR SPECIFIC]

  f0 00 08 3f 00 44 f0 00      00:03:10.392  [VENDOR SPECIFIC]

  f0 00 08 3f 00 40 f0 00      00:03:10.384  [VENDOR SPECIFIC]

  f0 00 08 3f 00 3c f0 00      00:03:10.375  [VENDOR SPECIFIC]

  f0 00 08 3f 00 38 f0 00      00:03:10.366  [VENDOR SPECIFIC]

Error 78 occurred at disk power-on lifetime: 7554 hours (314 days + 18 hours)

  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  5a 4a 08 66 8f 33 e0

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  f0 00 08 3f 00 4c f0 00      00:00:59.294  [VENDOR SPECIFIC]

  f0 00 08 3f 00 48 f0 00      00:00:59.244  [VENDOR SPECIFIC]

  f0 00 08 3f 00 44 f0 00      00:00:55.301  [VENDOR SPECIFIC]

  f0 00 08 3f 00 40 f0 00      00:00:55.292  [VENDOR SPECIFIC]

  f0 00 08 3f 00 3c f0 00      00:00:55.284  [VENDOR SPECIFIC]

Error 77 occurred at disk power-on lifetime: 7554 hours (314 days + 18 hours)

  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  5a 4a 48 3e 7d f4 e0

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  f0 00 08 3f 00 44 f0 00      00:00:55.301  [VENDOR SPECIFIC]

  f0 00 08 3f 00 40 f0 00      00:00:55.292  [VENDOR SPECIFIC]

  f0 00 08 3f 00 3c f0 00      00:00:55.284  [VENDOR SPECIFIC]

  f0 00 08 3f 00 38 f0 00      00:00:55.275  [VENDOR SPECIFIC]

  f0 00 08 3f 00 34 f0 00      00:00:55.266  [VENDOR SPECIFIC]

Error 76 occurred at disk power-on lifetime: 7554 hours (314 days + 18 hours)

  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  5a 4a 08 3f 00 50 f0

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  f0 00 08 3f 00 50 f0 00      06:16:10.575  [VENDOR SPECIFIC]

  f0 00 08 3f 00 4c f0 00      06:16:09.171  [VENDOR SPECIFIC]

  f0 00 08 3f 00 48 f0 00      06:16:09.076  [VENDOR SPECIFIC]

  f0 00 08 3f 00 44 f0 00      06:16:07.669  [VENDOR SPECIFIC]

  f0 00 08 3f 00 40 f0 00      06:16:07.660  [VENDOR SPECIFIC]

SMART Self-test log structure revision number 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Extended offline    Completed without error       00%      5209         -

SMART Selective self-test log data structure revision number 1

 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

```
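One thing that stands out in that dump: the overall self-assessment says PASSED, but the raw values for Reallocated_Sector_Ct (17) and Current_Pending_Sector (11) are not zero, while UDMA_CRC_Error_Count is 0. The Python sketch below pulls the attribute table out of `smartctl -a` text and lists the watched attributes whose raw value is above zero. The watch list is my own choice, raw values are vendor-specific, and `nonzero_watched` is a made-up helper, so treat the result as a hint rather than a verdict.

```python
# Attributes worth a second look when their raw value is above zero.
# This particular list is an assumption, not something smartctl defines.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}


def nonzero_watched(smartctl_text):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    found = {}
    for line in smartctl_text.splitlines():
        parts = line.split()
        # Attribute rows have 10 columns: ID, name, flag, value, worst,
        # thresh, type, updated, when_failed, raw_value.
        if len(parts) < 10 or not parts[0].isdigit() or not parts[2].startswith("0x"):
            continue
        name, raw = parts[1], parts[9]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            found[name] = int(raw)
    return found


if __name__ == "__main__":
    # A few rows copied from the smartctl output in this post.
    sample = """
  5 Reallocated_Sector_Ct   0x0033   252   252   063    Pre-fail  Always       -       17
197 Current_Pending_Sector  0x0008   252   252   000    Old_age   Offline      -       11
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -       0
"""
    print(nonzero_watched(sample))
```

Running `smartctl -a /dev/hdb` every so often and comparing these numbers over time is probably more telling than any single reading.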

I'm really at my wit's end. For those of you who read the whole post, this is the older hard drive, the one that started this whole failure problem. The new one SEEMS to be running, but I still get DMA errors unless I turn off DMA completely. I read on another forum that it's bad to leave DMA off, so I turned it back on, telling myself that I'd watch for any major signs of failure...

this seems like one to me.

Any advice?

----------

## electrofreak

Sry, no idea what to say about that magiuspendragon.

Ok, I think I have solved the problem... I'm not completely satisfied with the solution, but I guess I can be happy with it.

Ok... I've been conversing on the abit forums about the issue, then finally decided to RMA the board. I got approved and everything, then someone posted saying they had the same problem, and in further posting they recommended upgrading the BIOS. So I did that. And that didn't do anything. People are saying that it is actually an nForce4 issue.

So since it was completely isolated to Primary Master for me, I figured I'd just switch Primary and Secondary and try working that way. Well, since then I've had no errors, and I guess it's ok since I don't actually use my DVD drives all that much anyway.

Now, I'm not thrilled with the solution because, by my idea of good practice, CD/DVD drives shouldn't go before hard drives. Though I haven't really had any trouble sleeping with it set up this way, so I guess it's alright. I certainly would prefer not to have to RMA the board, because it's a lot of trouble to replace a motherboard.

But other people I've talked to with the problem say that they have it elsewhere, if not on all drives. So this is by no means a universal solution. The whole problem is very weird. And I guess I can't hate abit for it. But next time I'm going with Asus, just because.

----------

## magiuspendragon

Actually, it turned out that error was from a superblock corruption. I am actually running an Asus board, so it's not limited to abit. I upgraded hdparm the other day and haven't really seen the issue since, but if it comes back, I'll consider your solution. Hopefully it won't botch my system :)

Thanks for the feedback.

----------

