# Inconsistency on ext3 (now ext4) partition

## Tonglebeak

I've had two boot-ups in a row where something has come up regarding my ext3 partition. Yesterday, during boot, it said an inconsistency was detected, and fsck ran automatically. Now it booted again, saying an inconsistency was detected and telling me to run fsck manually (without -a or -p). Is this a sign that my HDD is dying?

[    6.264290] EXT3-fs warning: mounting fs with errors, running e2fsck is recommended

[    6.264530] EXT3 FS on sda2, internal journal

Linux h4x0r 2.6.30-gentoo-r5 #1 SMP Sat Aug 15 18:12:06 EDT 2009 x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 6000+ AuthenticAMD GNU/Linux

Last edited by Tonglebeak on Fri Sep 04, 2009 4:36 pm; edited 1 time in total

----------

## NeddySeagoon

Tonglebeak,

It's a sign that your filesystem is lightly toasted. It may not relate to problems with the drive.

Run smartmontools on the drive to read its internal error log.

Run the short test and the long test, which reads the entire drive.

If you are paranoid, get the drive test software from the maker's site. Be aware that this might want to do a destructive write test, which you probably don't want.

If the drive looks good, back it up with dd before you run fsck on it.

Making the metadata consistent may destroy some of your data. If so, it will end up in lost+found, but not in a very useful format.

In short - proceed cautiously.

If you have backups, cut your losses. Remake the filesystem and restore from your backups.

----------

## Tonglebeak

The short test came back clean. The long test is running now.

h4x0r aaron # smartctl -l selftest /dev/sda

smartctl version 5.38 [x86_64-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen

Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===

SMART Self-test log structure revision number 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Short offline       Completed without error       00%      9545         -

Also, the internal error log shows a total of two errors over the drive's entire life. The first occurred at 543 hours and the second at 6158 hours. It's now at 9545 hours.

----------

## d2_racing

When the other test finishes, post the result along with the command you ran.

----------

## Tonglebeak

```
h4x0r aaron # smartctl -a /dev/sda
smartctl version 5.38 [x86_64-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Maxtor DiamondMax Plus 9 family
Device Model:     Maxtor 6Y250M0
Serial Number:    Y66KKFBE
Firmware Version: YAR511W0
User Capacity:    251,000,193,024 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Sat Aug 29 19:11:28 2009 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x85) Offline data collection activity
                                        was aborted by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 363) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 106) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   180   180   063    Pre-fail  Always       -       22917
  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       626
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       0
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   251   245   187    Pre-fail  Always       -       64472
  9 Power_On_Minutes        0x0032   224   224   000    Old_age   Always       -       353h+27m
 10 Spin_Retry_Count        0x002b   253   252   157    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   247   247   000    Old_age   Always       -       2583
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       37
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       3250
196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -       0
202 TA_Increase_Count       0x000a   253   252   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       0
204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   253   252   000    Old_age   Always       -       0
208 Spin_Buzz               0x002a   253   252   000    Old_age   Always       -       0
209 Offline_Seek_Performnce 0x0024   191   191   000    Old_age   Offline      -       0
 99 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
100 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
101 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 2
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2 occurred at disk power-on lifetime: 6158 hours (256 days + 14 hours)
  When the command that caused the error occurred, the device was in an unknown state.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 b1 db 97 e8  Error: ICRC, ABRT at LBA = 0x0897dbb1 = 144169905
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 b1 db 97 e8 00      00:10:37.152  READ DMA
  c8 00 08 a9 d7 97 e8 00      00:10:37.136  READ DMA
  c8 00 08 79 68 97 e8 00      00:10:37.136  READ DMA
  c8 00 08 f9 67 97 e8 00      00:10:37.136  READ DMA
  c8 00 08 59 64 97 e8 00      00:10:37.136  READ DMA

Error 1 occurred at disk power-on lifetime: 543 hours (22 days + 15 hours)
  When the command that caused the error occurred, the device was in an unknown state.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 b0 21 06 e0  Error: ICRC, ABRT at LBA = 0x000621b0 = 401840
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 b0 21 06 e0 00      00:08:27.968  READ DMA EXT
  25 00 40 d0 09 06 e0 00      00:08:27.968  READ DMA EXT
  25 00 08 c8 09 06 e0 00      00:08:27.968  READ DMA EXT
  25 00 08 c0 e9 05 e0 00      00:08:27.968  READ DMA EXT
  25 00 08 f8 dd 05 e0 00      00:08:27.952  READ DMA EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      9547         -
# 2  Short offline       Completed without error       00%      9545         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

h4x0r aaron # smartctl -H /dev/sda
smartctl version 5.38 [x86_64-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

h4x0r aaron # smartctl -l selftest /dev/sda
smartctl version 5.38 [x86_64-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      9547         -
# 2  Short offline       Completed without error       00%      9545         -

h4x0r aaron #
```

Seems good to me. So what's the next step I should take?

Also, before I forget yet again: I recently added the sync attribute to my Steam directory in Wine.

http://appdb.winehq.org/appview.php?iVersionId=1554

It says to use

```
chattr -R +S ~/.wine/drive_c/Program\ Files/Steam
```

so I did. This couldn't be causing the problem, could it?

----------

## NeddySeagoon

Tonglebeak,

Your drive looks OK to me, and no configuration command you give your system can make your filesystem inconsistent.

Your next step is to read the fsck man page, or the e2fsck man page if it's an extX filesystem.

Some filesystem checkers have a pretend option, which lets you see what will be done without doing anything to the filesystem.

If you have space, back the partition up into a file on another partition before you allow fsck to make any changes.
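As a rough illustration of the backup step: for e2fsck the "pretend" mode is the `-n` switch, which opens the filesystem read-only and answers "no" to every question. The image copy itself is normally done with dd; the sketch below does the same byte-for-byte copy in Python. The function name `image_partition` and the paths in the usage note are made up for illustration, and it assumes the partition is unmounted and the script runs with permission to read the device.

```python
import os


def image_partition(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy src_path to dst_path byte for byte and return the bytes copied.

    src_path may be a block device such as /dev/sda2 (needs root) or a
    regular file; data is moved in chunk_size pieces (1 MiB by default).
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # empty read means end of device/file
                break
            dst.write(chunk)
            copied += len(chunk)
        # Make sure the image really is on disk before fsck touches the original.
        dst.flush()
        os.fsync(dst.fileno())
    return copied
```

A call might look like `image_partition("/dev/sda2", "/mnt/backup/sda2.img")`, where `/mnt/backup` is a hypothetical mount point on another drive with enough free space. Unlike `dd conv=noerror`, this sketch stops with an exception on the first read error.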

----------

## Tonglebeak

For now I'm copying all my files to another hard drive with cp -a; then I'm going to wipe my main ext3 system and rebuild it as ext4 (I also plan on nuking the Windows partition at the same time :) ).

----------

## energyman76b

While the disk itself looks good, the DMA errors worry me.

You should check your cabling - maybe replace it.
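For anyone wondering why the two logged errors point at cabling: both have error register 0x84, which smartctl decodes as ICRC plus ABRT. ICRC is an interface CRC error, meaning the data was damaged on the link between drive and controller during an Ultra DMA transfer, not on the platter. The sketch below redoes that decoding from the raw register bytes in the log. The helper names are invented for illustration and only the bits I am sure of are named. Note also that both errors are thousands of power-on hours old (543 and 6158 versus 9547 now), so they may be unrelated to the current trouble.

```python
# Subset of ATA error-register bits; any other bit is reported by its hex mask.
ERROR_BITS = {
    0x80: "ICRC",  # interface CRC error on an Ultra DMA transfer
    0x40: "UNC",   # uncorrectable data error
    0x10: "IDNF",  # requested address not found
    0x04: "ABRT",  # command aborted
}


def decode_error(er):
    """Return the names of the bits set in the ER byte, highest bit first."""
    names = []
    for bit in range(7, -1, -1):
        mask = 1 << bit
        if er & mask:
            names.append(ERROR_BITS.get(mask, hex(mask)))
    return names


def lba28(sn, cl, ch, dh):
    """Rebuild a 28-bit LBA from the SN, CL, CH and DH register bytes."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register bytes copied from "Error 2" and "Error 1" in the smartctl output.
print(decode_error(0x84), lba28(0xB1, 0xDB, 0x97, 0xE8))
print(decode_error(0x84), lba28(0xB0, 0x21, 0x06, 0xE0))
```

The two LBAs this produces, 144169905 and 401840, match what smartctl printed next to each error.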

----------

## d2_racing

In fact, change your cable and rerun the test in a couple of hours or days.

----------

## Tonglebeak

Well, after going through hell (I did not know the things I had to do, like having a LiveCD ready beforehand >:( ), I rebuilt the system using ext4 and copied everything over. When I finally got things running again, I got the same inconsistency error... I ran fsck.ext4 and I'll have to wait to see what happens. And my, oh my, how fast fsck.ext4 was :)

Last edited by Tonglebeak on Fri Sep 04, 2009 4:08 pm; edited 1 time in total

----------

## Tonglebeak

GRRR, it did it again. What the _hell_ could be causing this? And yes, I did swap SATA cables already. Is it possible my hard drive is starting to fail even though SMART sees no issues? I haven't seen any visible data loss either.

----------

## d2_racing

Can you try your HDD in another box?

Maybe your motherboard is dying too.

----------

## Jaglover

I'd take a multimeter and check the actual voltages. Motherboard sensors are not to be trusted, and an out-of-spec PSU can cause the weirdest errors.

My 2 cents. :)

----------

## Tonglebeak

I just don't understand why it's just this one drive that is causing the problems. I have a second HDD that has two ext4 partitions and a swap partition, and they have never had any issues (one partition holds portage, the other is tmp).

----------

## d2_racing

OK, maybe it's the port on the motherboard that causes this.

Can you plug your HDD into another SATA port on the motherboard?

----------

## NeddySeagoon

Tonglebeak,

Do not conclude that it's a drive problem yet.

It's something in the data path from the CPU to the drive platter, but almost certainly not the drive, CPU or RAM.

The CPU and RAM are common to the other drives.

That leaves the South Bridge, the drive controller chipset, the data cable, the connectors at both ends of the data cable, the motherboard tracking that connects the various parts together, and possibly your PSU, especially if you have different loads on the power wires to the hard drives or the wire lengths are different. Yes, we are looking for something that marginal.

----------

## devsk

Run badblocks on this disk right away. It will take a while if the disk is large, so start it at night.

If you have a backup of the whole disk, run the destructive read-write badblocks test. That's more exhaustive and faster than the non-destructive read-write test. On newly acquired drives, I typically run:

```
# <disk> is the whole device, e.g. /dev/sda; -w overwrites (destroys) all data on it
badblocks -t random -s -v -w <disk>
```

On a used disk, I replace -w with -n. Even then, I make sure I have a full backup of the disk before I run badblocks.

BTW, how are you creating your ext4 filesystem?

----------

## Tonglebeak

I read up more on badblocks, and saw it could be invoked through e2fsck. So I ran e2fsck -cvf on /dev/sda1 (using a LiveCD of course, with the partition not mounted). It took about an hour and a half. It found 10-15 different inodes that were part of a "corrupted orphaned linked list", whatever that means. It also found "block bitmap differences" and sections of disk space that were reported as "free" when they weren't (or maybe it was the other way around). Of course I allowed e2fsck to repair all of it. By the way, 0 bad blocks were found. So we'll see what happens...

----------

## Tonglebeak

Well, this has happened 4 more times. Is there any way I can have fsck tell me which file/directory/whatever is being affected?

----------

## Tonglebeak

Here we go:

Extended attribute block for inode 200353 (/usr/src/linux-2.6.31-gentoo/kernel/dma.o) is invalid (1073741824)

I've already swapped cables, swapped SATA ports, and run badblocks (which found nothing wrong), and SMART is OK... I'm out of ideas, other than that the drive itself is going wonky.

Remember, I have a 2nd HDD in this machine which has never had any errors (ext4 partitions as well), and I tried using my primary drive on the same port that the 2nd one has never had an issue with, yet I still end up with these errors...
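One observation on the number in that e2fsck message, offered as a hint and not a diagnosis. Assuming the value in parentheses is the bogus extended attribute block number, and assuming the object file originally had no extended attribute block at all (so the field should have been 0), the stored value differs from the correct one in exactly one bit. The sketch below checks the arithmetic. The 4 KiB block size is also an assumption, and the byte count is the whole-disk capacity from the smartctl output, so the real partition holds even fewer blocks.

```python
def set_bits(value):
    """Return the positions of the bits that are set in value (bit 0 = LSB)."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


reported = 1073741824           # number printed by e2fsck for inode 200353
disk_bytes = 251_000_193_024    # "User Capacity" from smartctl -a
block_size = 4096               # assumed ext4 block size
max_blocks = disk_bytes // block_size

print(hex(reported), set_bits(reported))   # which bits are set
print(max_blocks, reported > max_blocks)   # can such a block exist on this disk?
```

The value is exactly 2^30, a single set bit, and it lies far beyond the roughly 61 million blocks the disk can hold, which is why e2fsck rejects it. A lone flipped bit would fit a marginal fault somewhere in the data path, as suggested earlier in the thread, though one sample proves nothing.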

----------

