# Overheating and bad blocks [SOLVED]

## pmatos

Hello all, 

I think that, due to some overheating while compiling software, I've generated some bad blocks on my 320 GB disk. Random CRC errors while loading the kernel and crashes during startup are now becoming more frequent. Is there a way to check for these and fix them, or is replacing the disk the only way out?

Regards,

Paulo Matos

*Last edited by pmatos on Thu Jan 11, 2007 12:38 pm; edited 1 time in total*

----------

## Sten

 *pmatos wrote:*   

> Hello all, 
> 
> I think that, due to some overheating while compiling software, I've generated some bad blocks on my 320 GB disk. Random CRC errors while loading the kernel and crashes during startup are now becoming more frequent. Is there a way to check for these and fix them, or is replacing the disk the only way out?
> 
> Regards,
> ...

 

You can check for reallocated sectors (Reallocated_Sector_Ct) and bad sectors still waiting to be remapped (Current_Pending_Sector) by invoking

```
smartctl -a drive
```

and run a long SMART self-test (useful if there are no pending sectors but you still suspect an error on the drive) by invoking

```
smartctl --test=long drive
```

but if it finds errors on the drive, I strongly advise you to replace it. The easiest way is to dd it to another 320 GB drive, but since unreadable sectors may have damaged files, you should recompile the whole system after the transfer.
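A minimal sketch of pulling those two counters out of the attribute table, assuming the usual `smartctl -A` layout in which the raw value is the tenth whitespace-separated column. The function name `smart_sector_counts` is made up for illustration, and `/dev/hda` in the usage comment is only an example device.

```shell
#!/bin/sh
# Hypothetical helper: print "attribute raw_value" for the two counters
# named above. Reads `smartctl -A` output on stdin and assumes the raw
# value sits in column 10 of the attribute table.
smart_sector_counts() {
    awk '$2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector" { print $2, $10 }'
}

# Usage (needs root and smartmontools; the device name is an example):
#   smartctl -A /dev/hda | smart_sector_counts
```

A raw value above zero for either counter means the drive has already remapped sectors or has sectors it could not read.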

----------

## pmatos

 *Sten wrote:*   

>  *pmatos wrote:*   Hello all, 
> 
> I think that, due to some overheating while compiling software, I've generated some bad blocks on my 320 GB disk. Random CRC errors while loading the kernel and crashes during startup are now becoming more frequent. Is there a way to check for these and fix them, or is replacing the disk the only way out?
> 
> Regards,
> ...

 

Thanks for the suggestion. I ran that and reiserfsck, and everything is OK with the drive. The problem is the memory: 4 GB of RAM gone to waste. memtest86 shows thousands of errors! :-\

----------

## net-0

I think the same thing happened to me... only I get really impatient and forget to read things through. I'm going to stop before I break more.

Anyway, I think I borked my system by overheating the hard drive. I'm using reiserfs, and I use this system as a file server, so it's always running. I started noticing files missing, so I googled a few howtos on repairing damaged or missing files.

Right off the bat I thought I should try `reiserfsck --rebuild-tree -S -l /root/recovery.log /dev/hda4`, based on this howto:

http://antrix.net/journal/techtalk/reiserfs_data_recovery_howto.comments

It stops about 5 minutes into the rebuild and reports bad blocks. Little did I know that it's recommended to make a copy first, which makes sense, but I have already run this. Is there still hope of recovering any of my information?

I don't think the disk is damaged; I think there is corruption, and I'm clueless about recovery because I'm so impatient. But I have information I would like to save, and any help is great.

Thanks,

Hopefully someone will answer, even though it seems SOLVED has already been added to the title.
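The "make a copy first" step mentioned above could look roughly like this with plain `dd`. It is a sketch, not a tested recovery procedure: the function name `image_partition` and the paths in the usage comment are placeholders, and the image has to be written to a different, healthy disk with enough free space.

```shell
#!/bin/sh
# Hypothetical helper: copy a source (partition or file) into an image file
# and keep going past read errors.
#   conv=noerror  continue after a failed read instead of aborting
#   conv=sync     pad short or failed blocks with zeros so that offsets in
#                 the image stay aligned with the source
image_partition() {
    src=$1
    dst=$2
    dd if="$src" of="$dst" bs=4096 conv=noerror,sync
}

# Example (placeholder paths):
#   image_partition /dev/hda4 /mnt/backup/hda4.img
```

With `bs=4096`, one unreadable sector costs the whole 4096-byte block in the copy; a smaller block size loses less data per error but runs slower. Repair tools can then be pointed at the image instead of the original partition.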

----------

## feld

Follow the Bad Block HOWTO to locate the files that sit on the bad blocks and get those blocks reallocated. It should take care of you.

http://smartmontools.sourceforge.net/BadBlockHowTo.txt

----------

## net-0

I tried some of that howto and keep getting this:

```
gentoo web # smartctl -l selftest /dev/hda
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Warning! SMART Attribute Thresholds Structure error: invalid SMART checksum.
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       40%     13288         7100389
# 2  Short offline       Completed: read failure       60%      6926         38415721
# 3  Offline             Aborted by host               70%         0         -

gentoo web # smartctl -A /dev/hda
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Warning! SMART Attribute Thresholds Structure error: invalid SMART checksum.
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   181   181   063    Pre-fail  Always       -       29235
  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       149
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       3
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   249   237   187    Pre-fail  Always       -       40791
  9 Power_On_Minutes        0x0032   215   215   000    Old_age   Always       -       185h+55m
 10 Spin_Retry_Count        0x002b   252   252   157    Pre-fail  Always       -       1
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   253   253   000    Old_age   Always       -       278
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0032   043   253   000    Old_age   Always       -       49
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       7836
196 Reallocated_Event_Count 0x0008   250   250   000    Old_age   Offline      -       3
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       3
198 Offline_Uncorrectable   0x0008   252   251   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -       2
202 TA_Increase_Count       0x000a   253   252   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       0
204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   252   252   000    Old_age   Always       -       1
208 Spin_Buzz               0x002a   253   252   000    Old_age   Always       -       0
209 Offline_Seek_Performnce 0x0024   240   240   000    Old_age   Offline      -       162
210 Unknown_Attribute       0x0032   253   251   000    Old_age   Always       -       0
211 Unknown_Attribute       0x0032   253   252   000    Old_age   Always       -       0
212 Unknown_Attribute       0x0032   253   252   000    Old_age   Always       -       0
```

I think I made it worse with `reiserfsck --rebuild-tree -S -l /root/recovery.log /dev/hda4`.

----------

## net-0

And for giggles, here is the fdisk output with the partition layout.

```
gentoo web # fdisk -lu /dev/hda

Disk /dev/hda: 203.9 GB, 203928109056 bytes
255 heads, 63 sectors/track, 24792 cylinders, total 398297088 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1              63       80324       40131   83  Linux
/dev/hda2           80325     1092419      506047+  82  Linux swap / Solaris
/dev/hda3         1092420    10876004     4891792+  83  Linux
/dev/hda4        10876005   398283479   193703737+  83  Linux
```

I really want to recover the information on partition 4; I will be grateful for any help.

--neto
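The Bad Block HOWTO's mapping from a failing LBA to a filesystem block can be sketched with shell arithmetic. Going by the fdisk table above, the short self-test error (LBA 38415721) falls inside /dev/hda4, while the extended self-test error (LBA 7100389) falls inside /dev/hda3. The function name `lba_to_fsblock` is hypothetical, and the 4096-byte filesystem block size is an assumption: it is the usual reiserfs default but has not been confirmed for these partitions.

```shell
#!/bin/sh
# Hypothetical helper: convert an absolute LBA (512-byte sectors) into a
# block number relative to the start of a partition.
#   $1 = failing LBA reported by smartctl
#   $2 = first sector of the partition ("Start" column of fdisk -lu)
#   $3 = filesystem block size in bytes (assumed, not read from the disk)
# The intermediate product needs 64-bit shell arithmetic.
lba_to_fsblock() {
    echo $(( ($1 - $2) * 512 / $3 ))
}

lba_to_fsblock 38415721 10876005 4096   # error inside /dev/hda4
lba_to_fsblock 7100389 1092420 4096     # error inside /dev/hda3
```

Under those assumptions this prints 3442464 and 750996, the filesystem blocks that would need to be matched to files.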

----------

## net-0

This might be useful info too:

```
gentoo web # reiserfsck /dev/hda4
reiserfsck 3.6.19 (2003 www.namesys.com)

*************************************************************
** If you are using the latest reiserfsprogs and  it fails **
** please  email bug reports to reiserfs-list@namesys.com, **
** providing  as  much  information  as  possible --  your **
** hardware,  kernel,  patches,  settings,  all reiserfsck **
** messages  (including version),  the reiserfsck logfile, **
** check  the  syslog file  for  any  related information. **
** If you would like advice on using this program, support **
** is available  for $25 at  www.namesys.com/support.html. **
*************************************************************

Will read-only check consistency of the filesystem on /dev/hda4
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
###########
reiserfsck --check started at Sun Jan 21 00:08:52 2007
###########
Replaying journal..
Reiserfs journal '/dev/hda4' in blocks [18..8211]: 0 transactions replayed
Checking internal tree..

Bad root block 0. (--rebuild-tree did not complete)

Aborted
```

Like I said, I have no idea where to start... and the rebuild-tree thing was a horrible place to start.

----------

## net-0

I ran badblocks too:

```
gentoo ventserver # badblocks /dev/hda4
13769856
13769857
13769858
13769859
14986680
14986684
14986685
14986686
14986687
63389936
63389948
63389949
63389950
63389951
63389952
63389953
63389954
63389955
63389956
63389957
63389958
63389959
63389960
63389961
63389962
63389963
63389964
63389965
63389966
63389967
63389968
63389969
63389970
63389971
71939208
71939224
71939225
71939226
71939227
71939228
71939229
71939230
71939231
```

So does it look like the drive is dying? Or are these possibly bad blocks brought on by corruption?
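A list like this is easier to judge once consecutive block numbers are collapsed into runs. The sketch below does that with awk; the function name `collapse_ranges` is made up for illustration. Since badblocks was run without `-b`, the numbers are in its default 1024-byte blocks, counted from the start of /dev/hda4.

```shell
#!/bin/sh
# Hypothetical helper: read one block number per line (sorted ascending, as
# badblocks prints them) and print "first-last count" for each run of
# consecutive blocks.
collapse_ranges() {
    awk '
        NF == 0 { next }                          # skip blank lines
        !started { start = prev = $1; started = 1; next }
        $1 == prev + 1 { prev = $1; next }        # still inside the current run
        { print start "-" prev, prev - start + 1  # run ended, report it
          start = prev = $1 }
        END { if (started) print start "-" prev, prev - start + 1 }
    '
}

# Usage (the device name is the one from this thread):
#   badblocks /dev/hda4 | collapse_ranges
```

Fed the 43 block numbers above, this should report seven runs, the longest being 63389948-63389971 with 24 blocks.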

----------

