# ext2, ext3, xfs corruption - hardware related???

## Ace_of_Spades

I have a brand-new system here:

   * MSI K7D Master-L (BIOS v1.5)

   * 2x AMD Athlon MP 2000+

   * 2x 512MB Hynix PC2100 CL2.5 ECC

   * 2x Promise Ultra 133 TX2 (PDC20269 with latest BIOS)

   * 4x Maxtor 6E030L0 Ultra DMA 133 7200 RPM - 30GB

   * Enermax PSU EG-651P-VE (550W)

   * Nvidia Riva-TNT2 M64 32MB

I've had no problems installing Gentoo 1.2 and 1.4_rc3 on LVM on software RAID5 (3 RAID disks, 1 spare). But when I create an ext2 or ext3 filesystem larger than 30GB, e2fsck -f finds tens of thousands of errors without the filesystem ever having been mounted.

BTW, the xfs behavior is also strange. Creating large files (>1GB) of random numbers and checking them with md5sum causes the system to fail. After that, many executables are not found, even those on other partitions. After a reboot (by powering off, since the reboot command could no longer be found) and a RAID resync, everything is fine again. The same happens with ext2 and ext3.
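For anyone who wants to repeat the md5sum test, here is a minimal sketch of one way to do it. The function name `verify_random_file`, the example path, and the use of /dev/urandom (chosen so the read does not block) are assumptions for illustration, not the exact commands used in this thread. It hashes the random stream while writing it and compares that with the checksum of the file as read back.

```shell
# verify_random_file PATH SIZE_MB  (hypothetical helper, not from the thread)
# Streams SIZE_MB MiB of random data into PATH, hashing the stream as it is
# written, then hashes the file as read back and compares the two sums.
verify_random_file() {
    path=$1
    size_mb=$2
    bytes=$((size_mb * 1048576))

    # Checksum of the data as generated, taken before it reaches the disk.
    written=$(head -c "$bytes" /dev/urandom | tee "$path" | md5sum | cut -d' ' -f1)
    sync

    # Checksum of what the filesystem hands back.
    readback=$(md5sum < "$path" | cut -d' ' -f1)

    if [ "$written" = "$readback" ]; then
        echo "OK $path"
    else
        echo "MISMATCH $path"
        return 1
    fi
}

# Example (path and size are placeholders):
#   verify_random_file /mnt/test/random.bin 2048
```

Note that a file smaller than the machine's RAM may be read back from the page cache rather than from the disks, so the test file probably needs to be larger than memory to exercise the controller and cables.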

Another command that kills the system in the same way is 'tar -cWvplf' on large directory trees like portage or src.

What I did to find the error:

   * HD check with Powermax from Maxtor

   * changed UDMA 133 cables

   * tried with only one DIMM

   * changed CPUs

   * tried with each CPU alone

   * tried another UDMA controller (CMD649) and the onboard IDE controller instead of the Promise controllers

   * tried with UDMA5 (-X69) instead of UDMA6

   * tried kernels: gentoo, 2.4.21pre5-ac1, redhat, 2.4.19 ....
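A side note on the -X69 in the list above: hdparm's numeric -X argument for an Ultra DMA mode is 64 plus the mode number, so UDMA5 is 69 and UDMA6 is 70. A small sketch of that arithmetic follows. The helper name `udma_xfer_value` and the device in the usage comment are made up for illustration, and the helper only covers modes 0 to 6.

```shell
# udma_xfer_value MODE  (hypothetical helper)
# Prints the numeric value hdparm -X expects for Ultra DMA mode MODE,
# which is 64 + MODE. Only modes 0..6 are handled in this sketch.
udma_xfer_value() {
    mode=$1
    case $mode in
        [0-6]) echo $((64 + mode)) ;;
        *) echo "UDMA mode not covered by this sketch: $mode" >&2; return 1 ;;
    esac
}

# Example (device name is a placeholder):
#   hdparm -X"$(udma_xfer_value 5)" /dev/hda
```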

memtest86 has now been running the extended tests for 12 hours with no error messages.

I do not know what to try next!!!

 :Question:  Could anybody give me a hint please  :Question: 

----------

## bLanark

I suspect hardware here. 

Is there any chance that you can take the drives out and put them in a Windows machine for some non-destructive testing using the utilities on the IBM or Maxtor site? Or if your BIOS has a S.M.A.R.T. test option (possibly on the CD that came with it?) then run that.

I had a laptop HDD go and the symptoms were the same - repeated fscks for no reason. I had a server IDE drive go but the symptoms were entirely different - any process accessing the drive would just hang.

Of course, it might not be hardware. Can you remove LVM from the equation and create partitions on each drive in turn? 

BTW, I use LVM with ReiserFS with a partition greater than 100G without problems.

----------

## Ace_of_Spades

I ran all the tests in Powermax from Maxtor's website (including the low-level format) - no errors!!

----------

## bLanark

 *Quote:*   

> I ran all the tests in Powermax from Maxtor's website (including the low-level format) - no errors!!

 

Hmm, yes, missed that at first.

OK, the drives are 30 GB and the errors only show up when the partitions are bigger than 30 GB - right?

Anything in the syslog? Can you turn up the logging level for the LVM stuff? I can't see how to, but it must be possible, certainly most of the utilities (e.g. vgdisplay) have a -v or -vv option. 

You might want to try evms instead of lvm, if you're at a stage where you can afford to start again. 

Ally

----------

## Ace_of_Spades

My first try on installing gentoo 1.2 was directly on raid1 and raid5 partitions. So I think LVM isn't the bad boy.

On partitions up to 25GB (I don't know the exact limit) e2fsck -f works without problems after creating the filesystem. But the strange errors described above (tar, md5sum) occur on those partitions too.

On partitions of about 50GB (RAID5) e2fsck -f finds tens of thousands of errors like:

"Inode XXX is in use, but has dtime set", "Inode XXX has compression flag set on filesystem without compression support", "Inode XXX has illegal block(s)",  "Too many illegal blocks in inode XXX"

even though the filesystem was never mounted.
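With that many messages it can help to see which kinds dominate. Below is a rough sketch that replaces every number in the e2fsck output with N and counts identical messages. The function name `summarize_fsck_messages` and the device in the usage comment are assumptions for illustration only.

```shell
# summarize_fsck_messages  (hypothetical helper)
# Reads e2fsck output on stdin, replaces each run of digits with "N" so that
# messages differing only in inode/block numbers collapse together, and
# prints "COUNT MESSAGE" lines sorted by descending count.
summarize_fsck_messages() {
    awk 'NF { gsub(/[0-9]+/, "N"); count[$0]++ }
         END { for (msg in count) printf "%d %s\n", count[msg], msg }' |
        sort -rn
}

# Example (device is a placeholder; -f forces the check, -n answers "no" to
# every repair question so nothing is modified):
#   e2fsck -fn /dev/vg0/data 2>&1 | summarize_fsck_messages
```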

I can't try anything at the moment because memtest86 v3.0 is running (test 11) and I'm looking forward to the results --> probably no errors, I bet!

----------

## taskara

hardware.. or perhaps KERNEL

----------

## Ace_of_Spades

 :Twisted Evil:    thx a lot taskara for your v e r y helpful post!

 :Question:   what kind of hardware

 :Question:   do you have a kernelconfig for me - or what do you mean by KERNEL 

----------

## bLanark

OK, for what it's worth, I've got this version of LVM, and this kernel:

```

altair root # vgcreate --version

vgcreate: Logical Volume Manager 1.0.5

Heinz Mauelshagen, Sistina Software  15/07/2002 (IOP 10)

```

```

altair root # uname -a

Linux altair 2.4.19-gentoo-r9

```

I am NOT currently spanning more than one HDD. I have a single 120 GB drive, one volume group and one physical volume - if I remember the terminology correctly.

The other drive I was using is dodgy and is back with the vendor for "testing". I am using reiserFS, not xfs or ext2 or ext3.

I'm NOT using RAID either.

Sorry I can't be of more help. 

Oh, this machine is running Gentoo 1.2, not 1.4, i.e.:

```

gcc version 2.95.3 20010315 (release)

```

----------

## taskara

 *Ace_of_Spades wrote:*   

>    thx a lot taskara for your v e r y helpful post!
> 
>   what kind of hardware
> 
>   do you have a kernelconfig for me - or what do you mean by KERNEL 

 

 :Razz: 

I'm just saying that maybe the kernel you are using is corrupting the filesystems.

try a vanilla kernel, or beta kernel..

is it happening when you INSTALL gentoo.. or AFTER you have made your system and put in your own kernel?

----------

## Ace_of_Spades

thx, but as mentioned in my first post I tried nearly every 2.4 kernel on the market.

memtest86 v3.0 has finished all tests (extended included) without errors!

----------

## taskara

try a new hard drive

try a different ide controller

upgrade your power supply

get a shotgun

----------

## Ace_of_Spades

I did another install last night which seems to work without problems.

the following two parameters have been changed:

 * partition tables were made with cfdisk instead of fdisk

 * two 60 cm (24") UDMA133 cables were replaced by two 45 cm (18") shielded ones

The system now runs gentoo 1.4_rc3 with 2.4.20-gentoo-r1 on ext3 on lvm on raid5 without any errors.

 :Wink:  Special thanks to my little pink dancing elephant who put me back on the right way!

Finally - can anybody explain to me which of the above changes did the trick? (If cfdisk is not just a frontend to fdisk, could it be that fdisk is buggy?)

----------

## taskara

I would say it was the cables.. fdisk has nothing to do with your filesystem.. so it doesn't seem like the culprit.

long IDE cables are notorious for problems.. especially if they aren't shielded.

then again.. maybe there was a new package released since you did your first install.. who knows!? At least it works!  :Wink: 

----------

## klimg

I wouldn't bet that you are out of the woods. I always get I/O errors from the HD after 3-4 months with ext2/3 filesystems on a Samsung drive, which does point to a defective drive.

The first time that happened I ran an HD test from Cerberus, which is supposed to destroy defective hardware, for a week flat. No problems. With reiserfs everything works fine.

----------

## Ace_of_Spades

 *Quote:*   

>  I always get I/O errors from the HD after 3-4 months with ext2/3 filesystems

 

You are right - e2fsck reported errors again, but the other tests (tar; generating 4 files of 4GB each from /dev/random simultaneously and checking them with md5sum) work fine at the moment.

Think I will switch to xfs.

----------

## taskara

I went to reiserfs.. it's nice  :Smile: 

----------

## klimg

I am pretty sure in my case it's some issue with my cheapass everything-onboard mobo (I've seen some stuff about I/O errors with SiS chipsets on the kernel list). But reiser never gave me any trouble.

----------

## taskara

OOOOHHHH you didn't mention you have an SIS chipset!!! lol  :Wink: 

----------

## dweigert

The MAXIMUM length for an IDE cable is 18 inches. Beyond that you can get corruption.

Dan

----------

## Ace_of_Spades

Solved the problem myself, with the help of Google.

The solution is posted in:

https://forums.gentoo.org/viewtopic.php?t=41321

----------

## ben_h

Jeez that's interesting. Glad you've got it solved!

Although, I think the PS/2 issue was only half your problem -- the other half was probably the IDE cables being too long.

In any case, enjoy your (now stable) dual MP box   :Very Happy: 

----------

