# partition problems

## upgrdman

I have a 250GB parallel ATA hard drive with a single 250GB ext3 partition on it (/dev/hdb1). I think it's failing:

http://www.cgartwork.com/misc/partition_problems.jpg

dmesg shows:

```
hdb: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdb: dma_intr: error=0x40 { UncorrectableError }, LBAsect=143, high=0, low=143, sector=143
ide: failed opcode was: unknown
end_request: I/O error, dev hdb, sector 143
EXT3-fs: can't read group descriptor 9
```
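For anyone reading along, the `status=0x51` and `error=0x40` values in that dmesg output are bit fields. Here is a minimal Python sketch of how they break down, assuming the classic ATA register layout. The names for the bits seen above match the kernel's text; the other names, the `decode` helper and both tables are my own labels for illustration, not anything taken from the kernel source.

```python
# Bit names for the ATA status and error registers (classic ATA layout).
# Names for bits not present in the dmesg output above are illustrative labels.
STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}
ERROR_BITS = {
    0x80: "BadSectorOrCRC",
    0x40: "UncorrectableError",
    0x20: "MediaChanged",
    0x10: "SectorIdNotFound",
    0x08: "MediaChangeRequested",
    0x04: "Aborted",
    0x02: "TrackZeroNotFound",
    0x01: "AddrMarkNotFound",
}


def decode(value, names):
    """Return the names of all bits set in value, highest bit first."""
    return [name for bit, name in sorted(names.items(), reverse=True) if value & bit]


print(decode(0x51, STATUS_BITS))  # ['DriveReady', 'SeekComplete', 'Error']
print(decode(0x40, ERROR_BITS))   # ['UncorrectableError']
```

So the drive itself is reporting that it could not read the sector even after error correction, which points at the media rather than the partition table.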

Looking at that photo I took of my screen, it looks like the partition table is messed up, since the partition should be the full size of the disk...

I tried changing IDE cables, thinking a defective one might be the cause, but nothing changed. This drive used to work perfectly; the problems started about two days ago. The first thing I noticed was that when I deleted files (really deleted, not just moved to Trash), the amount of free space stayed the same. So I rebooted, and then it showed the right amount of free space. Later that day I realized the partition was mounted read-only for some weird reason, so I rebooted again, and got the screen you see in the JPEG image. I rebooted several times with the same result. Then I switched IDE cables, and I still get the error :(

So is there any way to fix this? If not, is there any last-ditch attempt I can make to get my data back? Someone in IRC mentioned "dd" and "dd-rescue", but they left before I had a chance to ask more.

I have some partial backups, but they're a bit old and scattered among 100 or so CD-Rs and DVD-Rs...

I was thinking about setting up a software RAID mirror in the future. Would that have prevented this, or would the same error just have been mirrored to the other drive(s)?

Thanks,

--Farrell F.

----------

## masteroftheuniverse

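Software RAID 1 would help here: when one drive develops bad sectors, the mirror still holds a good copy. It is not a substitute for backups, though, since deletions and filesystem corruption are mirrored too. Before you do anything else, make a backup image of the partition with dd: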

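```
# /mnt/backup is a placeholder: use any mounted filesystem with about 250GB free
# (the output must be a file on a mounted filesystem, not a path under /dev/hda1)
# conv=noerror,sync keeps going past unreadable sectors and pads them with zeros
dd if=/dev/hdb1 of=/mnt/backup/hdb1.img bs=4k conv=noerror,sync
```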

Then run

```
fsck -V /dev/hdb1
```

and see what happens.
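To make the imaging idea concrete, here is a rough Python sketch of what copying with dd's `conv=noerror,sync` amounts to: read block by block, and when a block cannot be read, write zeros in its place so everything after it stays at the right offset. The names `rescue_copy` and `FlakySource` are made up for this illustration, and `FlakySource` only simulates a bad block in memory. On a real system the source would be the partition opened in binary mode as root, and a dedicated rescue tool does considerably more (retries, logging, reverse reads).

```python
import io


def rescue_copy(src, dst, total_size, block_size=4096):
    """Copy total_size bytes from src to dst; unreadable blocks become zeros.

    Returns the list of byte offsets whose block could not be read.
    """
    bad_offsets = []
    for offset in range(0, total_size, block_size):
        want = min(block_size, total_size - offset)
        try:
            src.seek(offset)
            chunk = src.read(want)
        except OSError:
            bad_offsets.append(offset)
            chunk = b""
        # Pad failed or short reads with zeros so later data keeps its offset.
        dst.write(chunk.ljust(want, b"\0"))
    return bad_offsets


class FlakySource(io.BytesIO):
    """Hypothetical stand-in for a failing disk: reads at bad offsets raise OSError."""

    def __init__(self, data, bad_offsets):
        super().__init__(data)
        self.bad_offsets = set(bad_offsets)

    def read(self, size=-1):
        if self.tell() in self.bad_offsets:
            raise OSError("simulated I/O error")
        return super().read(size)


data = bytes(range(16))
src = FlakySource(data, bad_offsets=[4])
dst = io.BytesIO()
bad = rescue_copy(src, dst, total_size=len(data), block_size=4)
print(bad)  # [4]
```

A small block size means less good data is thrown away around each bad sector, at the cost of a slower copy.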

----------

## syg00

Getting a backup is fine, but who happens to have a spare 250GB floating around on the primary drive?

Lucky you if so, but not me...

What you're looking for is probably dd_rescue.

Seen it mentioned in posts, haven't tried it personally. Give it a go.

Don't worry about that message from fdisk, it's kinda common.

*WORRY* about the other messages - looks like the disk is on the way out.

Get a backup ASAP.
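On the space question: a quick way to check whether a destination can hold a full image before starting a long copy. This is only a sketch. The helper name `can_hold_image` is made up, `"."` is a placeholder for wherever the image would go, and 250 * 10**9 is a round figure for a 250GB drive rather than its exact size.

```python
import shutil


def can_hold_image(dest_dir, image_bytes):
    """True if the filesystem containing dest_dir has at least image_bytes free."""
    return shutil.disk_usage(dest_dir).free >= image_bytes


# "." is a placeholder destination; 250 * 10**9 approximates a 250GB drive.
print(can_hold_image(".", 250 * 10**9))
```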

----------

## xxxx

My tips:

- Try another (smaller, new) cable; this solved the same problem on my system.

- Use the HDD manufacturer's diagnostic software.

- Are you overclocking?

----------

## upgrdman

I tried a different cable, and that didn't make any noticeable difference. And no, I am not overclocking anything...

I looked in portage, found smartmontools, and emerged it. I ran smartctl as root and got a long response with some discouraging results. Since I have close to no idea what they mean, I don't know how "trash worthy" this HDD is, or, more likely, whether it's a case for warranty service, since it's less than a year old.

```
farrell root # smartctl -a /dev/hdb -s on

smartctl version 5.33 [x86_64-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen

Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===

Device Model:     Maxtor 6Y250P0

Serial Number:    Y61ZHAAE

Firmware Version: YAR41BW0

User Capacity:    251,000,193,024 bytes

Device is:        In smartctl database [for details use: -P show]

ATA Version is:   7

ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0

Local Time is:    Sat Apr  9 01:09:52 2005 PDT

SMART support is: Available - device has SMART capability.

SMART support is: Disabled

=== START OF ENABLE/DISABLE COMMANDS SECTION ===

SMART Enabled.

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

General SMART Values:

Offline data collection status:  (0x80) Offline data collection activity

                                        was never started.

                                        Auto Offline Data Collection: Enabled.

Self-test execution status:      (   0) The previous self-test routine completed

                                        without error or no self-test has ever

                                        been run.

Total time to complete Offline

data collection:                 ( 363) seconds.

Offline data collection

capabilities:                    (0x5b) SMART execute Offline immediate.

                                        Auto Offline data collection on/off support.

                                        Suspend Offline collection upon new

                                        command.

                                        Offline surface scan supported.

                                        Self-test supported.

                                        No Conveyance Self-test supported.

                                        Selective Self-test supported.

SMART capabilities:            (0x0003) Saves SMART data before entering

                                        power-saving mode.

                                        Supports SMART auto save timer.

Error logging capability:        (0x01) Error logging supported.

                                        No General Purpose Logging support.

Short self-test routine

recommended polling time:        (   2) minutes.

Extended self-test routine

recommended polling time:        ( 107) minutes.

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  3 Spin_Up_Time            0x0027   185   185   063    Pre-fail  Always       -        20161

  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -        72

  5 Reallocated_Sector_Ct   0x0033   250   250   063    Pre-fail  Always       -        36

  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -        0

  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -        0

  8 Seek_Time_Performance   0x0027   253   249   187    Pre-fail  Always       -        63853

  9 Power_On_Minutes        0x0032   237   237   000    Old_age   Always       -        299h+33m

 10 Spin_Retry_Count        0x002b   253   252   157    Pre-fail  Always       -        0

 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -        0

 12 Power_Cycle_Count       0x0032   253   253   000    Old_age   Always       -        124

192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -        0

193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -        0

194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -        37

195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -        3152

196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline      -        0

197 Current_Pending_Sector  0x0008   250   250   000    Old_age   Offline      -        34

198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -        0

199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -        0

200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -        0

201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -        2

202 TA_Increase_Count       0x000a   253   252   000    Old_age   Always       -        0

203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -        0

204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -        0

205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -        0

207 Spin_High_Current       0x002a   253   252   000    Old_age   Always       -        0

208 Spin_Buzz               0x002a   253   252   000    Old_age   Always       -        0

209 Offline_Seek_Performnce 0x0024   253   253   000    Old_age   Offline      -        0

 99 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -        0

100 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -        0

101 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -        0

SMART Error Log Version: 1

Warning: ATA error count 443 inconsistent with error log pointer 5

ATA Error Count: 443 (device log contains only the most recent five errors)

        CR = Command Register [HEX]

        FR = Features Register [HEX]

        SC = Sector Count Register [HEX]

        SN = Sector Number Register [HEX]

        CL = Cylinder Low Register [HEX]

        CH = Cylinder High Register [HEX]

        DH = Device/Head Register [HEX]

        DC = Device Command Register [HEX]

        ER = Error register [HEX]

        ST = Status register [HEX]

Powered_Up_Time is measured from power on, and printed as

DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,

SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 443 occurred at disk power-on lifetime: 5377 hours (224 days + 1 hours)

  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 08 8f 00 00 f0  Error: UNC 8 sectors at LBA = 0x0000008f = 143

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  25 00 08 8f 00 00 f0 08      11:09:56.784  READ DMA EXT

  25 00 08 87 00 00 f0 08      11:09:56.784  READ DMA EXT

  25 00 08 7f 00 00 f0 08      11:09:56.784  READ DMA EXT

  25 00 08 77 00 00 f0 08      11:09:56.784  READ DMA EXT

  25 00 08 6f 00 00 f0 08      11:09:56.784  READ DMA EXT

Error 442 occurred at disk power-on lifetime: 5377 hours (224 days + 1 hours)

  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 2c 93 00 00 f0  Error: UNC 44 sectors at LBA = 0x00000093 = 147

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  25 00 2c 93 00 00 f0 08      11:09:52.736  READ DMA EXT

  25 00 30 8f 00 00 f0 08      11:09:51.712  READ DMA EXT

  25 00 34 8b 00 00 f0 08      11:09:50.688  READ DMA EXT

  25 00 38 87 00 00 f0 08      11:09:49.664  READ DMA EXT

  25 00 3c 83 00 00 f0 08      11:09:48.624  READ DMA EXT

Error 441 occurred at disk power-on lifetime: 5377 hours (224 days + 1 hours)

  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 30 8f 00 00 f0  Error: UNC 48 sectors at LBA = 0x0000008f = 143

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  25 00 30 8f 00 00 f0 08      11:09:51.712  READ DMA EXT

  25 00 34 8b 00 00 f0 08      11:09:50.688  READ DMA EXT

  25 00 38 87 00 00 f0 08      11:09:49.664  READ DMA EXT

  25 00 3c 83 00 00 f0 08      11:09:48.624  READ DMA EXT

  25 00 40 7f 00 00 f0 08      11:09:47.600  READ DMA EXT

Error 440 occurred at disk power-on lifetime: 5377 hours (224 days + 1 hours)

  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 30 8b 00 00 f0  Error: UNC 48 sectors at LBA = 0x0000008b = 139

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  25 00 34 8b 00 00 f0 08      11:09:50.688  READ DMA EXT

  25 00 38 87 00 00 f0 08      11:09:49.664  READ DMA EXT

  25 00 3c 83 00 00 f0 08      11:09:48.624  READ DMA EXT

  25 00 40 7f 00 00 f0 08      11:09:47.600  READ DMA EXT

  25 00 44 7b 00 00 f0 08      11:09:46.576  READ DMA EXT

Error 439 occurred at disk power-on lifetime: 5377 hours (224 days + 1 hours)

  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 30 87 00 00 f0  Error: UNC 48 sectors at LBA = 0x00000087 = 135

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  25 00 38 87 00 00 f0 08      11:09:49.664  READ DMA EXT

  25 00 3c 83 00 00 f0 08      11:09:48.624  READ DMA EXT

  25 00 40 7f 00 00 f0 08      11:09:47.600  READ DMA EXT

  25 00 44 7b 00 00 f0 08      11:09:46.576  READ DMA EXT

  25 00 48 77 00 00 f0 08      11:09:45.552  READ DMA EXT

SMART Self-test log structure revision number 1

No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1

 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

farrell root #
```

Does anyone know what the above means, or can anyone link to good sites about how to make sense of these error codes?
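As one way to read the error log entries, the register bytes on each "ER ST SC SN CL CH DH" line can be recombined into the sector address. This Python sketch assumes the usual 28-bit layout (low nibble of DH, then CH, CL, SN); the function names are made up for illustration. The failing commands here are READ DMA EXT, a 48-bit command whose full address has more bytes than this five-register summary shows, so treat the result as valid only for small LBAs like these.

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Combine the ATA task-file registers into a 28-bit LBA."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def parse_error_line(line):
    """Parse the 'ER ST SC SN CL CH DH' hex bytes at the start of an error-log line."""
    er, st, sc, sn, cl, ch, dh = (int(tok, 16) for tok in line.split()[:7])
    return {
        "error": er,
        "status": st,
        "sector_count": sc,
        "lba": lba28_from_registers(sn, cl, ch, dh),
    }


entry = parse_error_line(
    "40 51 08 8f 00 00 f0  Error: UNC 8 sectors at LBA = 0x0000008f = 143"
)
print(entry["lba"])  # 143
```

The result agrees with the LBA smartctl prints on the same line and with the sector number in dmesg, so all five logged errors sit within a few sectors of each other near the start of the disk.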

Thanks again,

--Farrell F.

----------

## matador

upgrdman,

I've had a similar problem. As noted in this thread, your drive reports Reallocated_Sector_Ct 36 and Current_Pending_Sector 34. That is a very bad sign: it indicates the drive is on its way out, so back up now!
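If you want to keep an eye on those counters, here is a small Python sketch that pulls them out of saved `smartctl -a` output and reports any with a nonzero raw value. The function name and the `WATCH` set are my own choices for illustration (adding Offline_Uncorrectable to the two named above is an assumption about what is worth watching), and the parsing assumes the ten-column attribute table shown earlier in the thread.

```python
# Attributes to watch; the exact set is an assumption made for this sketch.
WATCH = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}


def flagged_attributes(smartctl_output):
    """Return {name: raw_value} for watched attributes whose raw value is nonzero."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCH:
            raw = fields[9]
            if raw.isdigit() and int(raw) > 0:
                found[fields[1]] = int(raw)
    return found


sample = """\
  5 Reallocated_Sector_Ct   0x0033   250   250   063    Pre-fail  Always       -        36
197 Current_Pending_Sector  0x0008   250   250   000    Old_age   Offline      -        34
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -        0
"""
print(flagged_attributes(sample))
```

Note that the overall health line can still say PASSED while these raw counts are climbing, as it does in the output above.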

I would suggest checking whether your drive's manufacturer has a diagnostics program (Seagate has one for their drives) that can test the drive. If there is a problem, many manufacturers have a long-term warranty that might apply...

I don't know if you've solved it already, but it's good to rule out a hard drive error.

Good Luck

----------

## Sysa

*Quote:*

> I have a 250GB parallel ATA hard drive, and it has only one, 250GB, ext3 partition on it. (/dev/hdb1) I think it's failing:
> 
> http://www.cgartwork.com/misc/partition_problems.jpg
> ...

Run fdisk -lu /dev/hdb and post the output.
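The reason for asking for sector units: once the partition's start sector is known, the failing disk sector from dmesg (143) can be mapped to a block inside the filesystem. A Python sketch of the arithmetic follows. The start sector of 63 and the 4096-byte block size are assumptions (both were common defaults); substitute the real values from the fdisk output and the filesystem. The function name is made up.

```python
SECTOR_SIZE = 512  # bytes per sector, as assumed for this drive


def fs_block_for_sector(disk_sector, partition_start, block_size=4096):
    """Map an absolute disk sector to a block number inside the partition."""
    if disk_sector < partition_start:
        raise ValueError("sector lies before the start of the partition")
    return (disk_sector - partition_start) * SECTOR_SIZE // block_size


# 63 is an assumed start sector; replace it with the Start value fdisk reports.
print(fs_block_for_sector(143, partition_start=63))  # 10
```

Under those assumptions the bad sector lands in block 10, within the first few blocks of the filesystem where ext3 keeps its group descriptor table. That would be consistent with the "can't read group descriptor" message and with bad media rather than a damaged partition table.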

*Quote:*

> I tried changing IDE cables, thinking maybe a defective one was the cause, but nothing changed. This drive used to work perfectly, and I first had problems about two days ago. First thing I noticed was that when I deleted (not just Trash'd...deleted) files, the amount of free space remained the same. So I rebooted, and then it showed the right amount of free space. Later that day I realized it was mounted read-only for some weird reason, so I rebooted again, and I got that screen you see in the JPEG image. Rebooted several times, same thing. So I switched IDE cables, and I still get the error 
> 
> So is there any way to fix this, and if not, is there any last-ditch attempt I can make to get any of my data back? Someone in IRC mentioned something about "dd" and "dd-rescue" but they left before I had a chance to further inquire.
> ...


----------

