# EXT3 getting corrupted

## cotam

I've had Gentoo installed for a year (but hadn't used it for a couple of months).  I finally decided to boot into it, and when I DCC'd a big file (around 200-300 MB) to my computer, the transfer stopped at about 20%.  When I tried deleting the file, I found that the filesystem had gone read-only.  So I rebooted, and of course it detected some corruption (which was eventually fixed with a manual fsck).  Since then, it has corrupted itself at random times.

So I decided to repartition, reformat and reinstall Gentoo, which worked for a while, until I extracted a couple of 700 MB RARs, deleting the RARs and keeping the cue/bin files afterwards.  I burned them to a DVD with k3b and left the computer running for the night, only to find that my filesystem was read-only again, with a good few corruptions after reboot  :Smile: 

Any idea what could be the cause of this?  It's a 120 GB Seagate SATA drive, which is fairly new to be failing already.  I have XP and a backup NTFS partition on the same drive, and neither has ever had a problem.  Is ext3 that sensitive to storing and deleting large files?  Any advice would be appreciated.  I'm not sure whether switching to ReiserFS or XFS would solve my problems.

----------

## skally

Cotam, 

this does not sound nice at all!

But let me assure you that ext3 in recent kernels is usually rock solid. Many people (including me) regularly copy, move or delete files larger than 700 MB on ext3 without any problems.

I can't say much about SATA, but I think it should be equally reliable.

Anyway, the important question for both is:

Which kernel version do you use and is it patched?

Without this info, nobody can check for known bugs or equivalent issues.

Good Luck!

----------

## cotam

I'm using gentoo-dev-sources (2.6.10-r5) with no extra patches.  I have an Asus A7N8X Deluxe and a 120 GB Seagate SATA drive (with SATA support enabled in the kernel).  I see that 2.6.10-r6 is out, but I doubt it'll fix my problem  :Sad: 

----------

## skally

Ok, perhaps you may be interested in this...?

http://www.techspot.com/vb/topic5457.html (Asus Bios update cures corruption)

Good Luck, again! 

 :Wink: 

----------

## cotam

Thanks, but I have BIOS revision 1007 installed.  That thread seems to be about the 1004 beta BIOS, so I don't think that's the issue.

----------

## lotw

 *skally wrote:*   

> Cotam, 
> 
> this does not sound nice at all!
> 
> But let me assure you that ext3 in recent kernels is usually rock solid. Many people (including me) regularly copy, move or delete files larger than 700 MB on ext3 without any problems.
> ...

 

I've been using ext3 on my 10,000 RPM SATA drive, which is only a few months old, and I regularly delete 1.4 GB to 2 GB files without problems.  I love seeing 89 MB/s when I copy from one SATA drive to another.

----------

## cotam

Okay, it happened again overnight (while downloading a 4 GB file over BitTorrent: it stopped at 1.2 GB and reported a read-only filesystem).   Here is my dmesg:

sh-2.05b$ dmesg

status=0x51 { DriveReady SeekComplete Error }

ata2: error=0x40 { UncorrectableError }

ata2: status=0x51 { DriveReady SeekComplete Error }

ata2: error=0x40 { UncorrectableError }

SCSI error : <1 0 0 0> return code = 0x8000002

Current sda: sense = 70  3

ASC=11 ASCQ= 4

Raw sense data:0x70 0x00 0x03 0x00 0x00 0x00 0x00 0x0a 0x00 0x00 0x00 0x00 0x110x04 0x00 0x00 0x00 0x00

end_request: I/O error, dev sda, sector 221994901

ata2: status=0x51 { DriveReady SeekComplete Error }

ata2: error=0x40 { UncorrectableError }

ata2: status=0x51 { DriveReady SeekComplete Error }

ata2: error=0x40 { UncorrectableError }

ata2: status=0x51 { DriveReady SeekComplete Error }

ata2: error=0x40 { UncorrectableError }

ata2: status=0x51 { DriveReady SeekComplete Error }

ata2: error=0x40 { UncorrectableError }

ata2: status=0x51 { DriveReady SeekComplete Error }

ata2: error=0x40 { UncorrectableError }

SCSI error : <1 0 0 0> return code = 0x8000002

Current sda: sense = 70  3

SCSI error : <1 0 0 0> return code = 0x8000002

Current sda: sense = 70  3

ASC=11 ASCQ= 4

Raw sense data:0x70 0x00 0x03 0x00 0x00 0x00 0x00 0x0a 0x00 0x00 0x00 0x00 0x110x04 0x00 0x00 0x00 0x00

end_request: I/O error, dev sda, sector 229272213

EXT3-fs error (device sda3): ext3_free_branches: Read failure, inode=657045, block=2238126

Aborting journal on device sda3.

ext3_abort called.

EXT3-fs error (device sda3): ext3_journal_start_sb: Detected aborted journal

Remounting filesystem read-only

EXT3-fs error (device sda3) in start_transaction: Journal has aborted

EXT3-fs error (device sda3) in start_transaction: Journal has aborted

EXT3-fs error (device sda3) in start_transaction: Journal has aborted

ext3_reserve_inode_write: aborting transaction: Journal has aborted in __ext3_journal_get_write_access<2>EXT3-fs error (device sda3) in ext3_reserve_inode_write: Journal has aborted

ext3_reserve_inode_write: aborting transaction: Journal has aborted in __ext3_journal_get_write_access<2>EXT3-fs error (device sda3) in ext3_reserve_inode_write: Journal has aborted

EXT3-fs error (device sda3) in ext3_orphan_del: Journal has aborted

EXT3-fs error (device sda3) in ext3_truncate: Journal has aborted

__journal_remove_journal_head: freeing b_committed_data

__journal_remove_journal_head: freeing b_committed_data

__journal_remove_journal_head: freeing b_committed_data

__journal_remove_journal_head: freeing b_committed_data

journal commit I/O error

journal commit I/O error

EXT3-fs error (device sda3) in start_transaction: Journal has aborted

EXT3-fs error (device sda3) in start_transaction: Journal has aborted

EXT3-fs error (device sda3) in start_transaction: Journal has aborted

EXT3-fs error (device sda3) in start_transaction: Journal has aborted

EXT3-fs error (device sda3) in start_transaction: Journal has aborted

EXT3-fs error (device sda3) in start_transaction: Journal has aborted

EXT3-fs error (device sda3) in start_transaction: Journal has aborted

EXT3-fs error (device sda3) in start_transaction: Journal has aborted

EXT3-fs error (device sda3) in start_transaction: Journal has aborted

EXT3-fs error (device sda3) in start_transaction: Journal has aborted

EXT3-fs error (device sda3) in start_transaction: Journal has aborted

EXT3-fs error (device sda3) in start_transaction: Journal has aborted

EXT3-fs error (device sda3) in start_transaction: Journal has aborted

EXT3-fs error (device sda3) in start_transaction: Journal has aborted

EXT3-fs error (device sda3) in start_transaction: Journal has aborted

EXT3-fs error (device sda3) in start_transaction: Journal has aborted

EXT3-fs error (device sda3) in start_transaction: Journal has aborted

EXT3-fs error (device sda3) in start_transaction: Journal has aborted

EXT3-fs error (device sda3) in start_transaction: Journal has aborted

EXT3-fs error (device sda3) in start_transaction: Journal has aborted

EXT3-fs error (device sda3) in start_transaction: Journal has aborted

EXT3-fs error (device sda3) in start_transaction: Journal has aborted

EXT3-fs error (device sda3) in start_transaction: Journal has aborted

Any help would be appreciated.  Thanks.

----------

## lotw

 *cotam wrote:*   

> Okay, it happened again overnight (while downloading a 4 GB file over BitTorrent: it stopped at 1.2 GB and reported a read-only filesystem).   Here is my dmesg:
> 
> sh-2.05b$ dmesg
> 
> status=0x51 { DriveReady SeekComplete Error }
> ...

 

Going by that message, it sounds like the SATA drive has one or more bad sectors that the OS is trying to read and failing.
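To see why the log points at the disk surface rather than at ext3, the register values can be decoded by hand. The following is only a rough sketch: the bit names mirror the ones the kernel prints, the tables list just the codes I am sure of, and the function names (`decode_bits`, `describe_sense`) are made up for illustration. It assumes the raw sense data is in the fixed format (first byte 0x70), with the two fused bytes `0x110x04` read as `0x11 0x04`.

```python
# Bit masks for the ATA status and error registers, named as the kernel prints them.
ATA_STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

ATA_ERROR_BITS = {
    0x80: "BadCRC",
    0x40: "UncorrectableError",
    0x20: "MediaChanged",
    0x10: "SectorIdNotFound",
    0x08: "MediaChangeRequested",
    0x04: "DriveStatusError",
    0x02: "TrackZeroNotFound",
    0x01: "AddrMarkNotFound",
}

# A deliberately small subset of SCSI sense keys and ASC/ASCQ pairs.
SENSE_KEYS = {
    0x0: "No Sense",
    0x1: "Recovered Error",
    0x2: "Not Ready",
    0x3: "Medium Error",
    0x4: "Hardware Error",
    0x5: "Illegal Request",
}

ASC_ASCQ = {
    (0x11, 0x00): "Unrecovered read error",
    (0x11, 0x04): "Unrecovered read error - auto reallocate failed",
}


def decode_bits(value, names):
    """Return the names of all bits set in value, highest bit first."""
    return [name for mask, name in sorted(names.items(), reverse=True) if value & mask]


def describe_sense(raw):
    """Decode sense key, ASC and ASCQ from fixed-format sense data."""
    key = raw[2] & 0x0F           # sense key is the low nibble of byte 2
    asc, ascq = raw[12], raw[13]  # additional sense code and qualifier
    return (SENSE_KEYS.get(key, "unknown"), ASC_ASCQ.get((asc, ascq), "unknown"))


if __name__ == "__main__":
    print(decode_bits(0x51, ATA_STATUS_BITS))
    print(decode_bits(0x40, ATA_ERROR_BITS))
    raw = bytes([0x70, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x0A, 0x00,
                 0x00, 0x00, 0x00, 0x11, 0x04, 0x00, 0x00, 0x00, 0x00])
    print(describe_sense(raw))
```

Read this way, status 0x51 with error 0x40 and sense key 3 (ASC 0x11, ASCQ 0x04) all describe the same thing: a read the drive could not complete and could not remap.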

----------

## cotam

Well, I repartitioned before reinstalling, and it still happened again.  It's a fairly new drive, so I'm not sure how it would have bad sectors already  :Smile:   But it sure looks like something is wrong.   Perhaps the Silicon Image SATA module in the kernel has some problems?  I wish someone using the A7N8X Deluxe with SATA would tell me whether they've experienced any problems like this.

----------

## ronvenema

There is an option in your kernel: Device Drivers > Block Devices > ATA/ATAPI/MFM/RLL support > Include IDE/ATA 2 disk support > Use Multimode by default. Try that.

----------

## cotam

That's already compiled into the kernel.  Thanks.

----------

## jbpros

 *cotam wrote:*   

> Well, I repartitioned before reinstalling, and it still happened again.  It's a fairly new drive, so I'm not sure how it would have bad sectors already   But it sure looks like something is wrong.   Perhaps the Silicon Image SATA module in the kernel has some problems?  I wish someone using the A7N8X Deluxe with SATA would tell me whether they've experienced any problems like this.

 

Even if it is a new drive, you should consider trying another disk. I have seen many cases where "fairly new drives" were bad. Those are soooo fragile!

```
end_request: I/O error, dev sda, sector 221994901
```

This is unfortunately explicit.  :Rolling Eyes: 
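For anyone who wants to pull every failing sector out of a long dmesg dump, here is a minimal sketch. It only matches the `end_request: I/O error` line format shown in this thread, and the 512-byte sector size is an assumption; the helper names `failing_sectors` and `byte_offset` are invented for the example.

```python
import re

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors

# Matches lines such as "end_request: I/O error, dev sda, sector 221994901".
IO_ERROR_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def failing_sectors(log_text):
    """Return a sorted list of unique (device, sector) pairs found in log_text."""
    return sorted({(m.group(1), int(m.group(2))) for m in IO_ERROR_RE.finditer(log_text)})


def byte_offset(sector, sector_size=SECTOR_SIZE):
    """Convert a sector number into a byte offset from the start of the disk."""
    return sector * sector_size


if __name__ == "__main__":
    log = """\
end_request: I/O error, dev sda, sector 221994901
end_request: I/O error, dev sda, sector 229272213
"""
    for dev, sector in failing_sectors(log):
        gigabytes = byte_offset(sector) / 1e9
        print(f"{dev}: sector {sector} is about {gigabytes:.1f} GB into the disk")
```

With the two sectors from the log above, and assuming 512-byte sectors, both land roughly 114 to 117 GB into the disk, which is near the end of a 120 GB drive.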

----------

## albrow

Actually, I've been getting exactly the same errors.  I'm also using a Silicon Image controller.

It almost seems as though the hard drive has too much activity and then trips out.  For example, I have a small script that scans playlists and creates a large directory tree of symlinks to my MP3s, all sorted by album.  (I'm quite proud of it!  :Cool:  )  I was running Grip last Thursday at the same time as running the script and the drive fell over.  This morning it happened again whilst I was copying a few large files from a Win2k machine through Samba on to the Linux box, as well as deleting another few files off.

I'm running 2.6.10-gentoo-r6 at the mo.  It's also happened once before a while back.  The disk, I admit, is nearly full, but it's been this full before and hasn't had any problems, which is why I'm not sure if it really is a physical error.  Unfortunately I can't test it because the SATA drivers don't have SMART capabilities.

----------

## etnoy

How is your HDD controller? I'd check both it and the HDD to see whether either has any problems. I cannot recall the names, but I am pretty sure that there are some good disk checkers around. Try Google.

----------

## barrct

Same sort of issues here, but I'm not on SATA, I'm running SCSI.

We had a power outage that lasted for about an hour, so the box had to shut itself off. I'm running apcupsd and it worked correctly, doing a graceful shutdown. Upon reboot, some things were missing or damaged, so I took the box back down to run drive checks, and I've got about 200 MB worth of lost data. I never see the switch to read-only mode, though, just data loss on reboot after running for a while.

I'm running 2.4.24 on a Sun with RAID and the qlogicpti driver in the kernel.

The drive/array that I'm getting the most corruption on is a set of 8 drives split into a mirrored pair of 4-drive RAID 5 arrays. This is a production server, and I didn't feel like worrying about dropping a drive and losing the box.  :Smile: 

There are two physical controllers with a separate SCSI array attached to each one. There is a single RAID 5 on each array, and the two are then mirrored across the SCSI chains just in case a cable or controller bites the big one.

Also, it never seems as though the journal is really kicking in. I never see any sort of journal-related messages. Is there a way to see what's in the journal?
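One place to start is the superblock header printed by `dumpe2fs -h`: a filesystem with a journal lists `has_journal` among its features, and `needs_recovery` shows up when the journal still has to be replayed. The journal contents themselves can be dumped with the `logdump` command in `debugfs`. Below is a rough sketch of the first check only. The function names are hypothetical, the device path is whatever you pass on the command line, and it assumes `dumpe2fs` is installed and that you run it as root.

```python
import subprocess
import sys


def parse_features(header_text):
    """Return the set of feature flags listed in `dumpe2fs -h` output."""
    for line in header_text.splitlines():
        if line.startswith("Filesystem features:"):
            return set(line.split(":", 1)[1].split())
    return set()


def journal_status(device):
    """Run `dumpe2fs -h` on device (needs root) and report the journal flags."""
    result = subprocess.run(
        ["dumpe2fs", "-h", device],
        capture_output=True,
        text=True,
        check=True,
    )
    features = parse_features(result.stdout)
    return {
        "has_journal": "has_journal" in features,
        "needs_recovery": "needs_recovery" in features,
    }


if __name__ == "__main__":
    # Usage (as root): python journal_check.py <device>
    if len(sys.argv) == 2:
        print(journal_status(sys.argv[1]))
```

If `has_journal` comes back false, the filesystem is effectively being used as ext2, which would explain never seeing any journal activity.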

----------

