# fsck on RAID devices breaks everything - HPT370

## ybby

hi all,

i spent many hours setting up a raid device, but fsck always complains and breaks the system. i'd like to describe the problem to get your opinion.

i bought an Adaptec ATA RAID 1200A PCI IDE controller with the HPT370 chipset. i plugged my 2 seagate hdds into the first ide cable. i recompiled my kernel with HPT37X support + raid1 as a module. after reboot the chipset seems to be recognised and my new hdds are there (/dev/hde + /dev/hdf).

the first thing i tried was to set up RAID via the Adaptec BIOS. i made an array, but i quickly realized it was not useful at all because it can only do Software Raid, so i gave up and deleted the array in the adaptec bios.

after that i performed a mkraid /dev/md0 (/dev/hde + /dev/hdf). the mke2fs -j /dev/md0 + mount /dev/md0 were OK, so i began working on it and filled /dev/md0 with lots of data. but it all began to go wrong at the first fsck -y .... there was no way to restart the array ...

```
md: raidstart(pid 14532) used deprecated START_ARRAY ioctl. This will not be supported beyond 2.6

md: invalid raid superblock magic on hde1

md: hde1 has invalid sb, not importing!

md: autostart failed!

tour1 root # raidstart /dev/md0

/dev/md0: Invalid argument

```

ok what happened then? perhaps i forgot to set the partition type to fd? perhaps i should build the array from /dev/hde1 + /dev/hdf1 only? i did this... but e2fsck always breaks everything and fsck -y sometimes even enters an endless loop!

WTF? i read about wiping the whole disk, so i went for a dd if=/dev/zero of=/dev/hde .... i lost the initial geometry of my seagate. fdisk and cfdisk are unable to set it back... no matter. let's go with the new geometry and retry the whole mkraid+mke2fs+fsck /dev/md0 circus.... the same.... fsck endless loop ....

WTF again? RAID 1 does not work? let's try all this directly on /dev/hde1 without RAID .... the same. many dd if=/dev/zero runs later, still the fsck endless loop .... and no way to mount....

```

mount /dev/hde1 /mnt/hde1

mount: wrong fs type, bad option, bad superblock on /dev/hde1,

       or too many mounted file systems

dmesg :

EXT3-fs error (device hde1): ext3_check_descriptors: Inode table for group 2 not in group (block 8388608)!

EXT3-fs: group descriptors corrupted !

```

.... are my hdds dead? they are brand new, SMART does not report anything, badblocks says they are ok so .... let's plug that cable with the 2 seagates into the motherboard controller....  :Surprised:  miracle ! i managed to fill the whole disk.

```
while true; do dd if=/dev/zero of=file.`date  +%k%M%S` bs=1024 count=1048576;sleep 1;df -k .;done
```

and fsck is ok  :Smile:  ( at least once )

ok, then let's restart... replug the HDDs into the HPT370 controller and go through the whole mkraid+mke2fs+fsck /dev/md0 circus ... but fsck still enters an endless loop, raidstart does not work, and mdadm is no better...

```
tour1 root # mke2fs -j /dev/hde1

mke2fs 1.35 (28-Feb-2004)

Filesystem label=

OS type: Linux

Block size=4096 (log=2)

Fragment size=4096 (log=2)

14663680 inodes, 29304560 blocks

1465228 blocks (5.00%) reserved for the super user

First data block=0

895 block groups

32768 blocks per group, 32768 fragments per group

16384 inodes per group

Superblock backups stored on blocks:

        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,

        4096000, 7962624, 11239424, 20480000, 23887872

Writing inode tables: done

Creating journal (8192 blocks): done

Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 26 mounts or

180 days, whichever comes first.  Use tune2fs -c or -i to override.

tour1 root # fsck -f -y /dev/hde1

fsck 1.35 (28-Feb-2004)

e2fsck 1.35 (28-Feb-2004)

Group descriptors look bad... trying backup blocks...

Inode table for group 0 is not in group.  (block 8388618)

WARNING: SEVERE DATA LOSS POSSIBLE.

Relocate? yes

Backing up journal inode block information.

fsck.ext2: A block group is missing an inode table while reading bad blocks inode

This doesn't bode well, but we'll try to go on...

Pass 1: Checking inodes, blocks, and sizes

Relocating group 0's inode table to 10...

Restarting e2fsck from the beginning...

Group descriptors look bad... trying backup blocks...

Superblock has a bad ext3 journal (inode 8).

Clear? yes

*** ext3 journal has been deleted - filesystem is now ext2 only ***

Inode table for group 0 is not in group.  (block 8388618)

WARNING: SEVERE DATA LOSS POSSIBLE.

Relocate? yes

fsck.ext2: A block group is missing an inode table while reading bad blocks inode

This doesn't bode well, but we'll try to go on...

Pass 1: Checking inodes, blocks, and sizes

Relocating group 0's inode table to 10...

Restarting e2fsck from the beginning...

Group descriptors look bad... trying backup blocks...

Inode table for group 0 is not in group.  (block 8388618)

WARNING: SEVERE DATA LOSS POSSIBLE.

Relocate? yes

fsck.ext2: A block group is missing an inode table while reading bad blocks inode

This doesn't bode well, but we'll try to go on...

Pass 1: Checking inodes, blocks, and sizes

Relocating group 0's inode table to 10...

Restarting e2fsck from the beginning...

Group descriptors look bad... trying backup blocks...

Inode table for group 0 is not in group.  (block 8388618)

WARNING: SEVERE DATA LOSS POSSIBLE.

Relocate? yes

fsck.ext2: A block group is missing an inode table while reading bad blocks inode

This doesn't bode well, but we'll try to go on...

Pass 1: Checking inodes, blocks, and sizes

Relocating group 0's inode table to 10...

Restarting e2fsck from the beginning...

Group descriptors look bad... trying backup blocks...

Inode table for group 0 is not in group.  (block 8388618)

WARNING: SEVERE DATA LOSS POSSIBLE.

Relocate? yes

fsck.ext2: A block group is missing an inode table while reading bad blocks inode

This doesn't bode well, but we'll try to go on...

Pass 1: Checking inodes, blocks, and sizes

Relocating group 0's inode table to 10...

Restarting e2fsck from the beginning...

Group descriptors look bad... trying backup blocks...

Inode table for group 0 is not in group.  (block 8388618)

WARNING: SEVERE DATA LOSS POSSIBLE.

Relocate? yes

fsck.ext2: A block group is missing an inode table while reading bad blocks inode

This doesn't bode well, but we'll try to go on...

Pass 1: Checking inodes, blocks, and sizes

/dev/hde1: e2fsck canceled.

dmesg:

EXT3-fs error (device ide2(33,0)): ext3_check_descriptors: Block bitmap for group 880 not in group (block 0)!

EXT3-fs: group descriptors corrupted !

EXT2-fs error (device ide2(33,1)): ext2_check_descriptors: Inode table for group 0 not in group (block 8388608)!

EXT2-fs: group descriptors corrupted!

```

WTF again? e2fsck is not good? ok, let's try reiserfs.... great, fsck is OK just after the mkreiserfs, i'm happy, let's fill that disk with gigabytes of files.... shit, fsck -y repairs stuff ..... so.... again, borked... after many --rebuild-tree runs, here is what i get :

```
fsck -y -f /dev/hdf1

fsck 1.35 (28-Feb-2004)

reiserfsck 3.6.18 (2003 www.namesys.com)

*************************************************************

** If you are using the latest reiserfsprogs and  it fails **

** please  email bug reports to reiserfs-list@namesys.com, **

** providing  as  much  information  as  possible --  your **

** hardware,  kernel,  patches,  settings,  all reiserfsck **

** messages  (including version),  the reiserfsck logfile, **

** check  the  syslog file  for  any  related information. **

** If you would like advice on using this program, support **

** is available  for $25 at  www.namesys.com/support.html. **

*************************************************************

Will read-only check consistency of the filesystem on /dev/hdf1

Will put log info to 'stdout'

###########

reiserfsck --check started at Fri Nov 19 20:14:14 2004

###########

Replaying journal..

Reiserfs journal '/dev/hdf1' in blocks [18..8211]: 0 transactions replayed

Checking internal tree..

Bad root block 0. (--rebuild-tree did not complete)

Warning... fsck.reiserfs for device /dev/hdf1 exited with signal 6.

tour1 root # mkreiserfs /dev/hdf1

mkreiserfs 3.6.18 (2003 www.namesys.com)

A pair of credits:

Nikita Danilov  wrote  most of the core  balancing code, plugin infrastructure,

and directory code. He steadily worked long hours, and is the reason so much of

the Reiser4 plugin infrastructure is well abstracted in its details.  The carry

function, and the use of non-recursive balancing, are his idea.

Elena Gryaznova performed testing and benchmarking.

Guessing about desired format.. Kernel 2.4.26-win4lin-r8 is running.

Format 3.6 with standard journal

Count of blocks on the device: 29304560

Number of blocks consumed by mkreiserfs formatting process: 9106

Blocksize: 4096

Hash function used to sort names: "r5"

Journal Size 8193 blocks (first block 18)

Journal Max transaction length 1024

inode generation number: 0

UUID: 1b1ee623-8298-48b7-9520-e8de3c525ba0

ATTENTION: YOU SHOULD REBOOT AFTER FDISK!

        ALL DATA WILL BE LOST ON '/dev/hdf1'!

Continue (y/n):y

Initializing journal - 0%....20%....40%....60%....80%....100%

Syncing..ok

Tell your friends to use a kernel based on 2.4.18 or later, and especially not a

kernel based on 2.4.9, when you use reiserFS. Have fun.

ReiserFS is successfully created on /dev/hdf1.

tour1 root #  fsck -y -f /dev/hdf1

fsck 1.35 (28-Feb-2004)

reiserfsck 3.6.18 (2003 www.namesys.com)

*************************************************************

** If you are using the latest reiserfsprogs and  it fails **

** please  email bug reports to reiserfs-list@namesys.com, **

** providing  as  much  information  as  possible --  your **

** hardware,  kernel,  patches,  settings,  all reiserfsck **

** messages  (including version),  the reiserfsck logfile, **

** check  the  syslog file  for  any  related information. **

** If you would like advice on using this program, support **

** is available  for $25 at  www.namesys.com/support.html. **

*************************************************************

Will read-only check consistency of the filesystem on /dev/hdf1

Will put log info to 'stdout'

###########

reiserfsck --check started at Fri Nov 19 20:15:19 2004

###########

Replaying journal..

No transactions found

Checking internal tree..finished

Comparing bitmaps..finished

Checking Semantic tree:

finished

No corruptions found

There are on the filesystem:

        Leaves 1

        Internal nodes 0

        Directories 1

        Other files 0

        Data block pointers 0 (0 of them are zero)

        Safe links 0

###########

reiserfsck finished at Fri Nov 19 20:15:39 2004

###########

tour1 root #  fsck -y  /dev/hdf1

fsck 1.35 (28-Feb-2004)

reiserfsck 3.6.18 (2003 www.namesys.com)

*************************************************************

** If you are using the latest reiserfsprogs and  it fails **

** please  email bug reports to reiserfs-list@namesys.com, **

** providing  as  much  information  as  possible --  your **

** hardware,  kernel,  patches,  settings,  all reiserfsck **

** messages  (including version),  the reiserfsck logfile, **

** check  the  syslog file  for  any  related information. **

** If you would like advice on using this program, support **

** is available  for $25 at  www.namesys.com/support.html. **

*************************************************************

Will read-only check consistency of the filesystem on /dev/hdf1

Will put log info to 'stdout'

###########

reiserfsck --check started at Fri Nov 19 20:15:48 2004

###########

Replaying journal..

No transactions found

Checking internal tree..finished

Comparing bitmaps..finished

Checking Semantic tree:

finished

No corruptions found

There are on the filesystem:

        Leaves 1

        Internal nodes 0

        Directories 1

        Other files 0

        Data block pointers 0 (0 of them are zero)

        Safe links 0

###########

reiserfsck finished at Fri Nov 19 20:16:08 2004

###########

tour1 root # cat toFillDisk ;cd /mnt/hdf/

while true; do dd if=/dev/zero of=file.`date  +%k%M%S` bs=1024 count=1048576;sleep 1;done

tour1 hdf # while true; do dd if=/dev/zero of=file.`date  +%k%M%S` bs=1024 count=1048576;df -k .;sleep 1;done

1048576+0 records in

1048576+0 records out

Filesystem           1K-blocks      Used Available Use% Mounted on

/dev/hdf1            117214656   1082464 116132192   1% /mnt/hdf

1048576+0 records in

1048576+0 records out

Filesystem           1K-blocks      Used Available Use% Mounted on

/dev/hdf1            117214656   2132088 115082568   2% /mnt/hdf

1048576+0 records in

1048576+0 records out

Filesystem           1K-blocks      Used Available Use% Mounted on

/dev/hdf1            117214656   3181704 114032952   3% /mnt/hdf

1048576+0 records in

1048576+0 records out

Filesystem           1K-blocks      Used Available Use% Mounted on

/dev/hdf1            117214656   4231324 112983332   4% /mnt/hdf

1048576+0 records in

1048576+0 records out

Filesystem           1K-blocks      Used Available Use% Mounted on

/dev/hdf1            117214656   5280940 111933716   5% /mnt/hdf

1048576+0 records in

1048576+0 records out

Filesystem           1K-blocks      Used Available Use% Mounted on

/dev/hdf1            117214656   6330560 110884096   6% /mnt/hdf

494593+0 records in

494592+0 records out

tour1 hdf # 

tour1 root #  fsck -y  /dev/hdf1

fsck 1.35 (28-Feb-2004)

reiserfsck 3.6.18 (2003 www.namesys.com)

*************************************************************

** If you are using the latest reiserfsprogs and  it fails **

** please  email bug reports to reiserfs-list@namesys.com, **

** providing  as  much  information  as  possible --  your **

** hardware,  kernel,  patches,  settings,  all reiserfsck **

** messages  (including version),  the reiserfsck logfile, **

** check  the  syslog file  for  any  related information. **

** If you would like advice on using this program, support **

** is available  for $25 at  www.namesys.com/support.html. **

*************************************************************

Will read-only check consistency of the filesystem on /dev/hdf1

Will put log info to 'stdout'

###########

reiserfsck --check started at Fri Nov 19 20:28:59 2004

###########

Replaying journal..

Reiserfs journal '/dev/hdf1' in blocks [18..8211]: 0 transactions replayed

Checking internal tree../  1 (of  10)/  1 (of 170)bad_directory_item: block 8211: The directory item [1 2 0x1 DIR (3)] has the entry (3) "file.201822" with a not legal state (204), (4) expected

bad_directory_item: block 8211: The directory item [1 2 0x1 DIR (3)] has the entry (6) "file.202320" with a not legalstate (204), (4) expected

/  7 (of 170)block 13225: The number of items (129) is incorrect, should be (1)

 the problem in the internal node occured (13225), whole subtree is skipped

/  3 (of  10)/  8 (of 170)block 8747253: The level of the node (0) is not correct, (1) expected

 the problem in the internal node occured (8747253), whole subtree is skipped

/  4 (of  10)block 8912382: The level of the node (0) is not correct, (2) expected

 the problem in the internal node occured (8912382), whole subtree is skipped

finished

Comparing bitmaps..vpf-10640: The on-disk and the correct bitmaps differs.

Bad nodes were found, Semantic pass skipped

3 found corruptions can be fixed only when running with --rebuild-tree

###########

reiserfsck finished at Fri Nov 19 20:29:32 2004

tour1 root # mount /dev/hdf1 /mnt/hdf/

tour1 root # cd !$

cd /mnt/hdf/

tour1 hdf # ll

/bin/ls: file.202003: Permission denied

/bin/ls: file.202320: Permission denied

total 4688905

drwxr-xr-x   3 root root           272 Nov 19 20:27 .

drwxr-xr-x  16 root root          4096 Nov  9 14:18 ..

-rw-r--r--   1 root root    1073741824 Nov 19 20:18 file.201730

-rw-r--r--   1 root root    1073741824 Nov 19 20:20 file.201822

-rw-r--r--   1 root 8388608 1073741824 Nov 19 20:23 file.202122

-rw-r--r--   1 root root    1073741824 Nov 19 20:27 file.202520

-rw-r--r--   1 root root     506463232 Nov 19 20:28 file.202725

```

so what the hell is wrong? the HPT370 driver? let's upgrade from kernel 2.4.26-win4lin-r8 to 2.6.9-win4lin ... the newer kernel changes nothing... same situation...

here is some data you may want to check ....

```
tour1 root # fdisk -l /dev/hde

Disk /dev/hde: 120.0 GB, 120034123776 bytes

255 heads, 63 sectors/track, 14593 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System

/dev/hde1               1         609     4891761   fd  Linux raid autodetect

/dev/hde2             610        1218     4891792+  fd  Linux raid autodetect

/dev/hde3            1219       14593   107434687+  fd  Linux raid autodetect

tour1 root # fdisk -l /dev/hdf

Disk /dev/hdf: 120.0 GB, 120034123776 bytes

255 heads, 63 sectors/track, 14593 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System

/dev/hdf1               1         609     4891761   fd  Linux raid autodetect

/dev/hdf2             610        1218     4891792+  fd  Linux raid autodetect

/dev/hdf3            1219       14593   107434687+  fd  Linux raid autodetect

ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx

VP_IDE: IDE controller at PCI slot 0000:00:07.1

VP_IDE: chipset revision 6

VP_IDE: not 100% native mode: will probe irqs later

VP_IDE: VIA vt82c686a (rev 14) IDE UDMA66 controller on pci0000:00:07.1

    ide0: BM-DMA at 0xc000-0xc007, BIOS settings: hda:DMA, hdb:pio

    ide1: BM-DMA at 0xc008-0xc00f, BIOS settings: hdc:DMA, hdd:DMA

Probing IDE interface ide0...

hda: ST340014A, ATA DISK drive

ide0 at 0x1f0-0x1f7,0x3f6 on irq 14

Probing IDE interface ide1...

hdc: WDC WD800JB-00CRA1, ATA DISK drive

hdd: ST3120022A, ATA DISK drive

ide1 at 0x170-0x177,0x376 on irq 15

HPT370: IDE controller at PCI slot 0000:00:0f.0

PCI: Enabling device 0000:00:0f.0 (0005 -> 0007)

PCI: Found IRQ 11 for device 0000:00:0f.0

HPT370: chipset revision 3

HPT37X: using 50MHz internal PLL

HPT370: 100% native mode on irq 11

    ide2: BM-DMA at 0xec00-0xec07, BIOS settings: hde:DMA, hdf:DMA

    ide3: BM-DMA at 0xec08-0xec0f, BIOS settings: hdg:pio, hdh:pio

Probing IDE interface ide2...

hde: ST3120026A, ATA DISK drive

hdf: ST3120026A, ATA DISK drive

ide2 at 0xdc00-0xdc07,0xe002 on irq 11

Probing IDE interface ide3...

Probing IDE interface ide3...

Probing IDE interface ide4...

ide4: Wait for ready failed before probe !

Probing IDE interface ide5...

ide5: Wait for ready failed before probe !

hda: max request size: 1024KiB

hda: 78165360 sectors (40020 MB) w/2048KiB Cache, CHS=16383/255/63, UDMA(66)

hda: cache flushes supported

 /dev/ide/host0/bus0/target0/lun0: p1 p2 p3

hdc: max request size: 128KiB

hdc: 156301488 sectors (80026 MB) w/8192KiB Cache, CHS=65535/16/63, UDMA(66)

hdc: cache flushes not supported

 /dev/ide/host0/bus1/target0/lun0: p1 p2 < p5 >

hdd: max request size: 1024KiB

hdd: 234441648 sectors (120034 MB) w/2048KiB Cache, CHS=16383/255/63, UDMA(66)

hdd: cache flushes supported

 /dev/ide/host0/bus1/target1/lun0: p1 p2

hde: max request size: 1024KiB

hde: 234441648 sectors (120034 MB) w/8192KiB Cache, CHS=16383/255/63, UDMA(100)

hde: cache flushes supported

 /dev/ide/host2/bus0/target0/lun0: p1 p2 p3

hdf: max request size: 1024KiB

  hdf: 234441648 sectors (120034 MB) w/8192KiB Cache, CHS=16383/255/63, UDMA(100)

hdf: cache flushes supported

 /dev/ide/host2/bus0/target1/lun0: p1 p2 p3

kernel used : win4lin-sources  2.4.26-r3 / 2.4.26-r8 / 2.6.9

fileutils-4.1.11-r1

reiserfsprogs-3.6.18

mdadm-1.6.0

raidtools-1.00.3-r2

the hard disk configuration has changed many times a day for the last 2 weeks ;-)

```

so here are my questions :

1) with ext3 i enter endless loops, with reiserfs --rebuild-tree repairs nothing. i tend to believe that the filesystems are not guilty but ... should i file bug reports somewhere?

2) is it possible that a driver as old as the one for the HPT370 chipset still has bugs?

3) i'd like to try putting hde on one cable (ide2) and hdf on another (ide3) ... but i've got no free cables at the moment and i do not want to complicate the situation by unplugging the HDDs that are in service. but is it possible that the chipset does not correctly handle simultaneous access to hard drives on the same cable? would this be a useful test?

4) am i so bad at googling and doc reading that i missed something obvious? do you see anything wrong in all these manipulations? any suggestions?

i really need advice, i have already lost many hours fdisk-ing and mkfs-ing, despite all the great things i learned  :Smile: 

Thanks for reading such a long  post...

Regards,

YbbY

[Last edited by ybby on Mon Nov 29, 2004 1:29 pm; edited 2 times in total]

----------

## NeddySeagoon

ybby,

You should not be trying to set up a RAID system with hde and hdf.

They are the master and slave devices on the same IDE channel. It should work, but the performance will be poor because the controller cannot overlap commands and data transfers for two drives on one channel.

It's probably not your problem though.

----------

## ybby

Thx for the advice NeddySeagoon, but at the moment all this runs on a 4 year old computer i use as a server (PIII 500MHz), so performance is not really an issue  :Wink: .

of course, i plan to buy another cable to do what i said in point 3). but then, if i set up another RAID array on the 2 slave devices, and if my hypothesis about simultaneous access to hard drives on the same cable is correct, i will probably run into the same problem  :Sad: 

does anybody think this could be submitted to the kernel devs?

is anybody successfully using raid1 on hpt370?

----------

## zeek

I think you will have more luck if you use mdadm rather than the raidtools package.  Make sure your partition types are set to 'fd' (linux raid autodetect).  No need for /etc/raidtab or mkraid.

raidtools seems to be one of those packages that was never quite finished...
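
For reference, a minimal sketch of what an mdadm-only setup might look like for the two partitions discussed in this thread. The device names are the ones from the thread; the `run` helper and the `DRY_RUN` variable are hypothetical names added for illustration. By default the script only prints the commands, because `mdadm --create` writes new RAID superblocks to the member partitions.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create a two-disk RAID1 array and show its state.
# usage: create_mirror /dev/md0 /dev/hde1 /dev/hdf1
create_mirror() {
    md=$1; a=$2; b=$3
    run mdadm --create "$md" --level=1 --raid-devices=2 "$a" "$b"
    run cat /proc/mdstat        # shows the initial resync progress
    run mdadm --detail "$md"    # lists the state of both members
}

create_mirror /dev/md0 /dev/hde1 /dev/hdf1
```

Both partitions should already have type fd before this is run for real with `DRY_RUN=0`.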

----------

## ybby

yes, i pasted the output of fdisk -l /dev/hd[ef], and now my partitions are correctly set to fd type  :Wink:  it was my first beginner's mistake but i will never forget this again...

once the problem occurred, i did it all once again only with mdadm (Create/Start/etc...) ... it did not change anything.

furthermore, even if /etc/raidtab is not needed, /etc/mdadm.conf is needed if you want the gentoo init scripts to initialise the RAID at startup ( see "/etc/init.d/checkfs" )
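
As an illustration, an /etc/mdadm.conf for this array could hold one DEVICE line and one ARRAY line. The sketch below only prints such lines to stdout; the function name `mdadm_conf_lines` is hypothetical, and whether the Gentoo init scripts need anything beyond these two lines is an assumption the thread does not confirm.

```shell
#!/bin/sh
# Print a minimal mdadm.conf for one array to stdout.
# usage: mdadm_conf_lines /dev/md0 /dev/hde1 /dev/hdf1
mdadm_conf_lines() {
    array=$1; shift
    # DEVICE lists the partitions mdadm is allowed to scan
    echo "DEVICE $*"
    # devices= expects the members as a comma-separated list
    members=$(echo "$*" | tr ' ' ',')
    echo "ARRAY $array devices=$members"
}

mdadm_conf_lines /dev/md0 /dev/hde1 /dev/hdf1
```

The output can be reviewed first and then redirected into /etc/mdadm.conf.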

i thought that once the array is started / synchronized, raidtools code was not involved anymore. once the array is created, isn't it supposed to run only kernel code? how could raidtools code be the cause of the corruption? if raidtools are the cause, why does everything remain corrupted even after the dd if=/dev/zero, even without RAID, in ext2 or reiserfs?

do you think that if i repair everything by plugging my hdds back into the motherboard controller, and then create/start the RAID device only with mdadm, it would change something? if yes, what exactly would be the difference?

sorry for raising all these new questions but i want to understand a bit more about what happens before plugging my hdds back into the motherboard controller... it is a painful operation for me, the server is headless, on top of the furniture in my kitchen, and it is heavy with all these hdds  :Wink: 

----------

## zeek

 *ybby wrote:*   

> once the problem occurred, i did it all once again only with mdadm (Create/Start/etc...) ... it did not change anything.
> 
> furthermore, even if /etc/raidtab is not needed, /etc/mdadm.conf is needed if you want the gentoo init scripts to initialise the RAID at startup ( see "/etc/init.d/checkfs" )

 

Interesting, I have my system on md RAID1 devices and don't use mdadm.conf:

```
sock root # df -h

Filesystem            Size  Used Avail Use% Mounted on

/dev/md0               31G   24G  6.8G  78% /

/dev/md1               44G   37G  6.8G  85% /home

none                  506M     0  506M   0% /dev/shm

sock root # egrep -v '^#' /etc/mdadm.conf

sock root #

```

But back to the original problem, I missed the part where you tried it directly on the disks with ext2 and reiserfs and had it fail also.  I think that would eliminate md raid from the potential culprits and leave only the HPT 370 controller, the kernel driver, and the disks.  

HPT370 driver has been around since 2000.

Not a fun problem -- good luck!

----------

## ybby

[EDIT]

i followed the advice and ended up :

-using only mdadm

-buying a new ide cable to have one hdd on each controller

the result is that everything works perfectly now

my conclusion is that i will never use raidtools again, as they seem to be the cause of my problems.

however, i still have doubts about the HPT370 chipset. i do not know if i will ever try to plug in the 2 other hdds that could be connected as slaves on each controller.

Thx for help,

YbbY.

----------

## ybby

oki, i was happy too soon.... 

```

# grep -i ext3 /var/log/syslog

Nov 29 14:12:57 tour1 kernel: EXT3-fs error (device md0): ext3_free_blocks: Freeing blocks not in datazone - block = 1073774078, count = 1

Nov 29 14:12:57 tour1 kernel: ext3_free_branches: aborting transaction: Journal has aborted in __ext3_journal_get_write_access<3>ext3_reserve_inode_write: aborting transaction: Journal has aborted in __ext3_journal_get_write_access<2>EXT3-fs error (device md0) in ext3_reserve_inode_write: Journal has aborted

Nov 29 14:12:57 tour1 kernel: EXT3-fs error (device md0) in ext3_truncate: Journal has aborted

Nov 29 14:12:57 tour1 kernel: ext3_reserve_inode_write: aborting transaction: Journal has aborted in __ext3_journal_get_write_access<2>EXT3-fs error (device md0) in ext3_reserve_inode_write: Journal has aborted

Nov 29 14:12:57 tour1 kernel: EXT3-fs error (device md0) in ext3_orphan_del: Journal has aborted

Nov 29 14:12:58 tour1 kernel: ext3_reserve_inode_write: aborting transaction: Journal has aborted in __ext3_journal_get_write_access<2>EXT3-fs error (device md0) in ext3_reserve_inode_write: Journal has aborted

Nov 29 14:12:58 tour1 kernel: EXT3-fs error (device md0) in ext3_delete_inode: Journal has aborted

Nov 29 14:12:58 tour1 kernel: ext3_abort called.

Nov 29 14:12:58 tour1 kernel: EXT3-fs error (device md0): ext3_journal_start: Detected aborted journal

Nov 29 14:17:38 tour1 kernel: EXT3-fs warning (device md0): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure

Nov 29 14:17:38 tour1 kernel: EXT3-fs warning (device md0): ext3_clear_journal_err: Marking fs in need of filesystem check.

Nov 29 14:17:38 tour1 kernel: EXT3-fs warning: mounting fs with errors, running e2fsck is recommended

Nov 29 14:17:38 tour1 kernel: EXT3 FS on md0, internal journal

Nov 29 14:17:38 tour1 kernel: EXT3-fs: recovery complete.

Nov 29 14:17:38 tour1 kernel: EXT3-fs: mounted filesystem with ordered data mode.

Nov 29 14:18:11 tour1 kernel: EXT3-fs error (device md0): ext3_free_blocks: Freeing blocks not in datazone - block = 1073774078, count = 1

Nov 29 14:18:11 tour1 kernel: ext3_free_branches: aborting transaction: Journal has aborted in __ext3_journal_get_write_access<3>ext3_reserve_inode_write: aborting transaction: Journal has aborted in __ext3_journal_get_write_access<2>EXT3-fs error (device md0) in ext3_reserve_inode_write: Journal has aborted

Nov 29 14:18:11 tour1 kernel: EXT3-fs error (device md0) in ext3_truncate: Journal has aborted

Nov 29 14:18:11 tour1 kernel: ext3_reserve_inode_write: aborting transaction: Journal has aborted in __ext3_journal_get_write_access<2>EXT3-fs error (device md0) in ext3_reserve_inode_write: Journal has aborted

Nov 29 14:18:11 tour1 kernel: EXT3-fs error (device md0) in ext3_orphan_del: Journal has aborted

Nov 29 14:18:11 tour1 kernel: ext3_reserve_inode_write: aborting transaction: Journal has aborted in __ext3_journal_get_write_access<2>EXT3-fs error (device md0) in ext3_reserve_inode_write: Journal has aborted

Nov 29 14:18:11 tour1 kernel: EXT3-fs error (device md0) in ext3_delete_inode: Journal has aborted

Nov 29 14:18:11 tour1 kernel: ext3_abort called.

Nov 29 14:18:11 tour1 kernel: EXT3-fs error (device md0): ext3_journal_start: Detected aborted journal

Nov 29 14:20:38 tour1 kernel: EXT3-fs error (device md0): ext3_check_descriptors: Inode table for group 0 not in group (block 8388618)!

Nov 29 14:20:38 tour1 kernel: EXT3-fs: group descriptors corrupted !

Nov 29 14:21:10 tour1 kernel: EXT3-fs error (device md0): ext3_check_descriptors: Inode table for group 0 not in group (block 8388618)!

Nov 29 14:21:10 tour1 kernel: EXT3-fs: group descriptors corrupted !

```

i wanted RAID to have safer filesystems, but i got the exact opposite  :Sad: 
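
A side note on the numbers in these logs, offered as an observation rather than a diagnosis: some of the bad values differ from a plausible value by exactly one bit. e2fsck reports the group 0 inode table at block 8388618 and relocates it to block 10, and the earlier `ll` listing shows group id 8388608 on one file where every other file has root (gid 0). The small sketch below (the function name `xor_bits` is made up for this example) prints the XOR of an observed and an expected value in hex, followed by the number of differing bits.

```shell
#!/bin/sh
# Print the XOR of two numbers in hex, then how many bits differ.
# usage: xor_bits OBSERVED EXPECTED
xor_bits() {
    x=$(( $1 ^ $2 ))
    n=$x; count=0
    # count the set bits of the XOR one at a time
    while [ "$n" -gt 0 ]; do
        count=$(( count + (n & 1) ))
        n=$(( n >> 1 ))
    done
    printf '0x%X %d\n' "$x" "$count"
}

xor_bits 8388618 10   # reported inode table block vs. block 10 chosen by e2fsck
xor_bits 8388608 0    # group id shown for file.202122 vs. gid 0 (root)
```

Both calls print `0x800000 1`, i.e. only bit 23 differs. The block number 1073774078 in the log above equals 2^30 + 32254, which would also fit a single flipped bit, but the thread gives no expected value to confirm that.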

----------

## Tuna

ahoi,

interesting problem. i am familiar with that one. i bought myself the same Adaptec AHA1200A controller to run a RAID1 and had the same problems as you. i tried everything from kernel versions to patches and ended up in the same place: the filesystem gets corrupted.

here is what really happens *i think*. the data is written to the disk correctly, but the reads produce errors. you can test it this way: copy a file large enough that it won't get cached (500mb for example) and run md5sum on the file on the raid several times. you should get a different result each time (if the file doesn't get cached). also, if you compare the file (well, any broken version) against the original you should see that some bytes are off by a constant value (0x80 or something.. i can't remember). since the errors are at different places each time .. you could theoretically read the file 5 times, compare the versions and reconstruct the original data from them. sux of course.
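
The test described above could be scripted roughly like this. It is only a sketch: the function names `distinct_checksums` and `diff_offsets` are made up for this example, and it assumes the file is big enough (or the filesystem has been remounted) so that the reads really hit the disk and not the page cache.

```shell
#!/bin/sh
# Read FILE several times and print how many distinct md5 checksums appear.
# 1 means every read returned the same bytes; more than 1 means unstable reads.
# usage: distinct_checksums FILE [READS]
distinct_checksums() {
    file=$1; reads=${2:-5}
    i=0
    while [ "$i" -lt "$reads" ]; do
        md5sum < "$file" | cut -d' ' -f1
        i=$((i + 1))
    done | sort -u | wc -l | tr -d ' '
}

# Print the byte offsets (1-based, as cmp numbers them) where two files differ.
# usage: diff_offsets ORIGINAL COPY
diff_offsets() {
    cmp -l "$1" "$2" | awk '{ print $1 }'
}

# example (paths are placeholders):
#   distinct_checksums /mnt/md0/bigfile 5
#   diff_offsets /home/me/bigfile /mnt/md0/bigfile | head
```

`cmp -l` also prints the two differing byte values in octal, which is where a constant difference like the 0x80 mentioned above would show up.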

this is what i did. i tried every possible drive combination on the controller and came to the conclusion that the A channel of the card is either broken or buggy because of the hpt driver. as long as at least one drive in the raid array was on the A channel i was in pain. the raid works flawlessly on the B channel of the controller. so plug them in as hdg and hdh and check if that helps.

i guess this is a badly made controller from adaptec.. if you have the chance, take it back to where you bought it and get a decent controller instead. or maybe adaptec wants to release a non-buggy firmware? btw i tried updating to the latest firmware.. no dice.

----------

## ybby

Thx a lot Tuna for this precious information

i still do not know if i will try to take it back to where i bought it, or plug the hdds in as hdg + hdh ...

However, anybody reading this, never buy adaptec !

----------

## Tuna

it remains to be seen whether that failure is adaptec's fault. maybe we should email Andre Hedrick, the maintainer of the hpt36x/37x driver, to report this as a bug? and while you're at it, send him your controller too if you don't want it anymore, so he can test the hardware  :Wink:  it is also possible that the RAID code is to blame here. at least the controller works fine in windows, and with raid too (well i think it did.. but i'm not too certain i did enough tests). and running the drives in single mode on the A channel did not cause any problems either.

----------

