# Filesystem corruption using ext3 & md driver

## hades

I have recently implemented RAID 1 using the standard kernel md driver. I am having problems with directories suddenly becoming corrupt.

By corrupt I mean that when I run ls (or any other file operation) I get "xxx: file not found" for each file (xxx being the filename). It seems to affect a whole directory at a time, with the rest of the filesystem appearing fine.
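
A minimal sketch of how the symptom above could be scanned for across a tree. It assumes the failure shows up as a "no such file" error when an entry that the directory lists is stat'ed, which is what ls reports. The function name `find_broken_dirs` is made up for illustration.

```python
import os


def find_broken_dirs(root):
    """Map each directory to the entries it lists that cannot be stat'ed."""
    broken = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                # lstat does not follow symlinks, so a dangling symlink
                # is not reported as a broken entry.
                os.lstat(os.path.join(dirpath, name))
            except FileNotFoundError:
                broken.setdefault(dirpath, []).append(name)
    return broken


if __name__ == "__main__":
    for directory, names in find_broken_dirs("/usr/include").items():
        print(f"{directory}: {len(names)} entries listed but not found")
```

On a healthy filesystem this should print nothing. A directory showing up here matches the "whole directory at a time" behaviour described above.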

As these are ext3 partitions, I have tried touching /forcefsck so that a complete fsck is done. This returned no issues. I then copied off what I could, recreated the fs, and copied the data back. This fixed the issue, for a while.

Two days later, the problem came back, this time on the root fs. I bit the bullet and reinstalled Gentoo from stage1.

Again, everything was dandy for a day or two. Today it's back again. This time I tried running debugfs on the offending fs. What I noticed is that I can cd into the directory, and can even cat one of the files (it happens to be /usr/include/linux this time). But when I ls from the shell, I get the same "xxx: file not found".

This is how I have my box set up:

- Eden ITX-800 motherboard with 2 80GB HDs, each the master on the prim/sec channels

- md for raid 1 is compiled into the kernel, not a module

- all partitions are type fd to autostart the arrays

- /dev/md0 is /boot

- /dev/md1 is /

- /dev/md2 is /usr2

- /proc/mdstat output is fine, no issues with the array

- kernel is the gentoo-sources, 2.4.19

- flags are -march=i586 -O3 -pipe

- gentoo 1.4rc1
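
For anyone wanting to script the /proc/mdstat check from the list above, here is a rough sketch. It assumes the usual layout in which each array's status line ends in a bracketed string such as `[UU]`, with `_` marking a missing member. The `SAMPLE` text is invented for illustration and is not output from this box, and `degraded_arrays` is a hypothetical helper name.

```python
import re

# Invented sample in the usual /proc/mdstat layout (not from the poster's box).
SAMPLE = """\
Personalities : [raid1]
md1 : active raid1 hdc2[1] hda2[0]
      10241344 blocks [2/2] [UU]
md0 : active raid1 hdc1[1] hda1[0]
      104320 blocks [2/1] [U_]
unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return the names of arrays whose member status contains a '_'."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = re.search(r"\[([U_]+)\]\s*$", line)
        if current and status and "_" in status.group(1):
            degraded.append(current)
    return degraded


if __name__ == "__main__":
    with open("/proc/mdstat") as f:
        print(degraded_arrays(f.read()))
```

An empty list corresponds to the "no issues with the array" state reported above, so this only rules out a degraded mirror. It says nothing about filesystem-level damage.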

This sounds weird to me. Why is fsck returning no errors when there is an issue? Why can I ls and cat files in debugfs, but not outside it? Why do I have corruption when my filesystems have been shut down cleanly (the journal should protect me from this anyway)? Is there something I can do in debugfs to fix the dir?

I have to put it down as an md / kernel bug, as the RAID is the only new thing that I have introduced.

If you have read this far, thanks   :Smile: 

----------

## hades

Using debugfs I have tracked the problem to the inode flags displayed with the stat <dir> command.

Normal dirs have 0x0, whereas the "problem" ones have 0x1000.

Changing the flags to 0x0 fixes the directory, but some get changed straight back. Weird!!!

I have done some searching to find out what the flags mean, but no luck yet. All I know is that they are for extended functionality and can be listed and changed with lsattr and chattr, but lsattr does not show anything of interest   :Confused: 

Next step is to compile a vanilla kernel to see if the gentoo one is the problem....

----------

## edcjones

I have the same problem but I don't use raid. "dumpe2fs" shows that my ext3 partitions all have the needs_recovery flag set. If I mount the ext3 partitions as ext2, things are better. See https://forums.gentoo.org/viewtopic.php?t=24848
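
A small sketch of how the feature list could be pulled out of `dumpe2fs -h <device>` output to look for needs_recovery. It assumes the output contains a line beginning with "Filesystem features:". The sample line is illustrative only, and `filesystem_features` is a hypothetical helper name.

```python
# Illustrative line in the format printed by `dumpe2fs -h` (not real output
# from any machine in this thread).
SAMPLE = "Filesystem features:      has_journal filetype needs_recovery sparse_super"


def filesystem_features(dumpe2fs_output):
    """Return the list of feature names from dumpe2fs header output."""
    for line in dumpe2fs_output.splitlines():
        if line.startswith("Filesystem features:"):
            return line.split(":", 1)[1].split()
    return []


if __name__ == "__main__":
    features = filesystem_features(SAMPLE)
    print("needs_recovery" in features)
```

One caveat: as far as I know, ext3 also sets needs_recovery while the filesystem is mounted read-write, so the flag is most telling when it shows up on an unmounted partition.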

----------

## hades

Well, I compiled a vanilla kernel and the problem vanished. Looks like it's the plain-jane sources for me.

I would still like to know what the 0x1000 extended attribute means.

Hades

----------

## tytso

It sounds like the gentoo kernel has an early version of the htree patches that is corrupting directories.  My guess is that it's the fencepost bug when splitting a node.   

An updated set of kernel patches can be found here:

http://thunk.org/tytso/linux/extfs-2.4-update

The 2.4.20-rc1 patches are missing one or two minor bug fixes that are in the 2.5 code base (I'll get them updated versus 2.4.20 when I have a moment), but they should work a whole lot better than what gentoo is currently using.
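
On the earlier question about 0x1000: in the ext2/ext3 headers that bit is the directory index flag (EXT2_INDEX_FL / EXT3_INDEX_FL), which is the flag the htree code sets on hashed directories. The sketch below decodes a flags value as printed by debugfs `stat`. The table covers only a subset of the flag bits, and `decode_inode_flags` is a hypothetical helper name.

```python
# Subset of ext2/ext3 inode flag bits; names follow the EXT2_*_FL constants.
INODE_FLAGS = {
    0x0001: "SECRM",         # secure deletion
    0x0002: "UNRM",          # undelete
    0x0004: "COMPR",         # compress file
    0x0008: "SYNC",          # synchronous updates
    0x0010: "IMMUTABLE",     # immutable file
    0x0020: "APPEND",        # append-only
    0x0040: "NODUMP",        # skip in dump(8)
    0x0080: "NOATIME",       # do not update atime
    0x1000: "INDEX",         # hash-indexed (htree) directory
    0x4000: "JOURNAL_DATA",  # ext3: journal file data
}


def decode_inode_flags(value):
    """Return the names of the known flag bits set in value, lowest bit first."""
    names = [name for bit, name in sorted(INODE_FLAGS.items()) if value & bit]
    known_mask = 0
    for bit in INODE_FLAGS:
        known_mask |= bit
    unknown = value & ~known_mask
    if unknown:
        names.append(f"UNKNOWN(0x{unknown:x})")
    return names


if __name__ == "__main__":
    print(decode_inode_flags(0x1000))
```

A directory that flips between 0x0 and 0x1000 is therefore being switched between plain and indexed layout, which is consistent with the htree patches being involved.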

----------

