# RAID0 has lower perf than separate disks: why is / slower?

## doublehp

HDA and HDC are the primary and secondary masters of the motherboard's IDE chains. There are no slaves.

The disks are identical (consecutive serial numbers) and partitioned symmetrically.

```
moon-gen-3 ~ # fdisk -l /dev/hda

Disk /dev/hda: 160.0 GB, 160041885696 bytes
255 heads, 63 sectors/track, 19457 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xa430a430

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1         123      987966   83  Linux
/dev/hda2             124        1583    11727450    7  HPFS/NTFS
/dev/hda3            1584        1827     1959930   fd  Linux raid autodetect
/dev/hda4            1828       19458   141615526+   5  Extended
Partition 4 does not end on cylinder boundary.
/dev/hda5            1828        3044     9775521   83  Linux
/dev/hda6            3045        3076      257008+  83  Linux
/dev/hda7            3077        5000    15454498+  fd  Linux raid autodetect
/dev/hda8            5001       11000    48194968+  fd  Linux raid autodetect
/dev/hda9           11001       19457    67930821   fd  Linux raid autodetect
moon-gen-3 ~ # fdisk -l /dev/hdc

Disk /dev/hdc: 160.0 GB, 160041885696 bytes
255 heads, 63 sectors/track, 19457 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x667edcf1

   Device Boot      Start         End      Blocks   Id  System
/dev/hdc1               1         123      987966   83  Linux
/dev/hdc2             124        1583    11727450    b  W95 FAT32
/dev/hdc3            1584        1827     1959930   fd  Linux raid autodetect
/dev/hdc4            1828       19457   141612975    5  Extended
/dev/hdc5            1828        2801     7823623+  a5  FreeBSD
/dev/hdc6            2802        3076     2208906   82  Linux swap / Solaris
/dev/hdc7            3077        5000    15454498+  fd  Linux raid autodetect
/dev/hdc8            5001       11000    48194968+  fd  Linux raid autodetect
/dev/hdc9           11001       19457    67930821   fd  Linux raid autodetect
moon-gen-3 ~ #

```

(Yes, I made a small mistake in hda4; the problem is at the end of the partition, so I really can't see how it could affect the performance of the logical partitions it contains.)

hda7+hdc7 => md7

hda8+hdc8 => md8

```
moon-gen-3 ~ # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md3 : active raid0 hdc3[1] hda3[0]
      3919616 blocks 64k chunks

md8 : active raid0 hdc8[1] hda8[0]
      96389760 blocks 64k chunks

md29 : active raid0 hdc9[1] hda9[0]
      135861504 blocks 64k chunks

md7 : active raid0 hdc7[1] hda7[0]
      30908800 blocks 64k chunks

unused devices: <none>
moon-gen-3 ~ #

```

md7 and md8 are formatted as ext3; they do not seem to have identical features. md7 is my / and md8 is my /home:

```
moon-gen-3 ~ # tune2fs -l /dev/md7 >/tmp/md7
moon-gen-3 ~ # tune2fs -l /dev/md8 >/tmp/md8
moon-gen-3 ~ # diff /tmp/md7 /tmp/md8
2c2
< Filesystem volume name:   <none>
---
> Filesystem volume name:   Home
4c4
< Filesystem UUID:          f1d3eba4-e57b-4b1d-9e3b-852ef1944675
---
> Filesystem UUID:          282573c3-4e31-45e9-904a-371af6c2b52b
7c7
< Filesystem features:      has_journal resize_inode dir_index filetype sparse_super large_file
---
> Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
13,17c13,17
< Inode count:              3866624
< Block count:              7727200
< Reserved block count:     100000
< Free blocks:              2739794
< Free inodes:              3288918
---
> Inode count:              6029312
> Block count:              24097440
> Reserved block count:     0
> Free blocks:              8998872
> Free inodes:              5759015
21c21
< Reserved GDT blocks:      1022
---
> Reserved GDT blocks:      1018
24c24
< Inodes per group:         16384
---
> Inodes per group:         8192
26,31c26,31
< Filesystem created:       Sat Feb 23 01:59:27 2008
< Last mount time:          Wed Jun 10 09:45:32 2009
< Last write time:          Mon Sep 21 16:44:34 2009
< Mount count:              0
< Maximum mount count:      222
< Last checked:             Mon Sep 21 16:44:34 2009
---
> Filesystem created:       Wed Aug 19 17:39:47 2009
> Last mount time:          Mon Sep 21 11:21:49 2009
> Last write time:          Mon Sep 21 11:21:49 2009
> Mount count:              53
> Maximum mount count:      190
> Last checked:             Wed Aug 19 17:39:47 2009
33c33
< Next check after:         Sat Mar 20 15:44:34 2010
---
> Next check after:         Mon Feb 15 16:39:47 2010
37c37,39
< Inode size:             128
---
> Inode size:             256
> Required extra isize:     28
> Desired extra isize:      28
39,40c41,43
< Default directory hash:   tea
< Directory Hash Seed:      a387c15c-a23e-4d72-8a09-a3a4e031fd0f
---
> First orphan inode:       3686433
> Default directory hash:   half_md4
> Directory Hash Seed:      3c1821f5-2f06-48a8-b917-287ef3afa94f
moon-gen-3 ~ #

```

Speed of hda alone:

```
moon-gen-3 ~ # hdparm -t /dev/hda

/dev/hda:
 Timing buffered disk reads:  162 MB in  3.01 seconds =  53.88 MB/sec
moon-gen-3 ~ #

```

Speed of hdc alone: 54 MB/s. Speed of hda and hdc when running the test on both at the same time: 32 MB/s each.

Speed of hda7, hda8, hdc7 and hdc8 individually: 44 MB/s.

Speed of hda7 and hdc7 at the same time: 31 MB/s.

Speed of md7: 48 MB/s; md8: 52 MB/s.

File write speed (/tmp is on / = md7, /home is on md8):

```
moon-gen-3 ~ # dd if=/dev/zero of=/tmp/plop bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 35.058 s, 29.9 MB/s
moon-gen-3 ~ # dd if=/dev/zero of=/home/plop bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 14.1813 s, 73.9 MB/s
moon-gen-3 ~ # dd if=/dev/zero of=/tmp/plop bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 30.3401 s, 34.6 MB/s
moon-gen-3 ~ # dd if=/dev/zero of=/home/plop bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 15.9537 s, 65.7 MB/s
moon-gen-3 ~ #

```

In practice, md7 is as slow as a single drive, while md8 works as expected for RAID0.

Why is md7 slower than md8? Is it because the features are not exactly the same? I can't see how the directory hash algorithm could affect the write speed of a single file ...

Is it because md7 is my /? If so, why exactly is / always slower? Because it is /, or because of one particular subdirectory?

md8 is physically further in on the platters, so if one partition should be a bit slower, it is md8 that should be slower than md7, by less than 5%; I can't see any reason why md8 is nearly twice as fast as md7.

md7 is as fast as a single drive, while md8 really benefits from the RAID0. Why doesn't md7 gain any benefit?

I have also always found that the drive holding / was slower after a system boot than when testing the same drive from a live CD. And the / partition in particular has always been significantly slower than the rest of the drive.

I am about to renew my drives, and I need to understand the cause of this problem so that I can partition the new drives more efficiently  :Smile: 

Thanks.

----------

## HeissFuss

Do you see the same pattern with hdparm on the md devices?  What are your mount options?

----------

## doublehp

 *HeissFuss wrote:*   

> Do you see the same pattern with hdparm on the md devices?

 

I do not understand your question.

hdparm is not always reliable. hdparm on RAID1 devices always, ALWAYS returned me the speed of ONE drive (one or the other alternately; this is especially visible when the drives are not identical, or if you keep an eye on the HDD LEDs), while dd on a file always returns a value coherent with what really happens in daily use.
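On the dd side, one caveat: without a final flush, part of the write can still sit in the page cache when dd prints its rate. Adding conv=fdatasync makes dd include the flush in the timing; a sketch (the path and size here are just an example):

```shell
# time the write including the final fdatasync(), so the page cache
# does not inflate the reported rate
dd if=/dev/zero of=/tmp/ddtest bs=1M count=64 conv=fdatasync
rm -f /tmp/ddtest
```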

 *HeissFuss wrote:*   

> What are your mount options?

 

```
moon-gen-3 ~ # cat /etc/mtab
/dev/md7 / ext3 rw,noatime 0 0
proc /proc proc rw,nosuid,nodev,noexec 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec 0 0
udev /dev tmpfs rw,nosuid,size=10240k,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,gid=5,mode=620 0 0
/proc /mnt/debian/proc none rw,bind 0 0
shm /dev/shm tmpfs rw,noexec,nosuid,nodev 0 0
/dev/md8 /home ext3 rw 0 0
/dev/hda2 /mnt/NTFS_winXP fuseblk rw,allow_other,default_permissions,blksize=4096 0 0
/dev/hdc1 /mnt/doublehp ext3 rw 0 0
/dev/hdc2 /mnt/FAT32_12G vfat rw,uid=1000,utf8 0 0
/dev/hda5 /mnt/debian ext3 rw 0 0
/home /mnt/debian/home none rw,bind 0 0
/dev/sda1 /mnt/DHP_1T ext3 rw,nosuid 0 0
usbfs /proc/bus/usb usbfs rw,noexec,nosuid,devmode=0664,devgid=85 0 0
binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,noexec,nosuid,nodev 0 0
nfsd /proc/fs/nfsd nfsd rw,noexec,nosuid,nodev 0 0
moon-gen-3 ~ # mount
/dev/md7 on / type ext3 (rw,noatime)
proc on /proc type proc (rw,nosuid,nodev,noexec)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec)
udev on /dev type tmpfs (rw,nosuid,size=10240k,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,gid=5,mode=620)
/proc on /mnt/debian/proc type none (rw,bind)
shm on /dev/shm type tmpfs (rw,noexec,nosuid,nodev)
/dev/md8 on /home type ext3 (rw)
/dev/hda2 on /mnt/NTFS_winXP type fuseblk (rw,allow_other,default_permissions,blksize=4096)
/dev/hdc1 on /mnt/doublehp type ext3 (rw)
/dev/hdc2 on /mnt/FAT32_12G type vfat (rw,uid=1000,utf8)
/dev/hda5 on /mnt/debian type ext3 (rw)
/home on /mnt/debian/home type none (rw,bind)
/dev/sda1 on /mnt/DHP_1T type ext3 (rw,nosuid)
usbfs on /proc/bus/usb type usbfs (rw,noexec,nosuid,devmode=0664,devgid=85)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,noexec,nosuid,nodev)
nfsd on /proc/fs/nfsd type nfsd (rw,noexec,nosuid,nodev)
moon-gen-3 ~ # cat /etc/fstab | grep md
/dev/md7                /               ext3            noatime         0 1
/dev/md3                none            swap            sw              0 0
/dev/md8                /home           ext3            defaults        0 2
moon-gen-3 ~ #
```

I see a funny test to do  :Smile: 

```
moon-gen-3 ~ # swapoff /dev/md3
moon-gen-3 ~ # dd if=/dev/zero of=/dev/md3 bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 12.1899 s, 86.0 MB/s
moon-gen-3 ~ # dd if=/dev/zero of=/dev/md3 bs=1M count=3000
3000+0 records in
3000+0 records out
3145728000 bytes (3.1 GB) copied, 31.3817 s, 100 MB/s
moon-gen-3 ~ # dd if=/dev/zero of=/dev/md3 bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 10.7487 s, 97.6 MB/s
moon-gen-3 ~ # dd if=/dev/zero of=/dev/md3 bs=1M count=3000
3000+0 records in
3000+0 records out
3145728000 bytes (3.1 GB) copied, 32.4331 s, 97.0 MB/s
moon-gen-3 ~ #
```

THAT is good perf for a nice swap  :Smile: 

Note that the following syntax is (nearly) equivalent: dd if=/dev/zero of=/dev/md3 bs=1M count=3k
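(The suffix is binary, which is why it is only "nearly" equivalent: k/K means 1024 in dd, so count=3k is 3072 blocks, not 3000. A quick check on a throwaway file:)

```shell
# K (1024) is a binary suffix in dd, so count=3k means 3072 blocks,
# not 3000; demonstrated on a scratch file
dd if=/dev/zero of=/tmp/dd_suffix_demo bs=1K count=1K 2>/dev/null
stat -c %s /tmp/dd_suffix_demo    # 1024 * 1024 = 1048576 bytes
rm -f /tmp/dd_suffix_demo
```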

Raw MD is faster than the file system; I am not surprised by that; but I am surprised to lose 26% (compare dd on md3 and on /home/plop). I know I should consider the physical location of the partition, but hda8-hdc8 do not extend past the half of the platter, so density should not have an impact (anyway, md8 is FURTHER in than md7); the density impact should not be visible before 80% (run badblocks on a full drive, with a timer in hand and "vmstat 1" in a console, and you will see what I mean).

----------

## Roman_Gruber

It seems you are using hardware RAID. But most hardware RAID controllers are fake RAID; or do you have a separate RAID controller card?

I also have fake RAID in my box, and I have set up LVM. So why do you use hardware RAID and not LVM? LVM gave me an enormous performance boost, and my discs are encrypted with LUKS too!! The performance with LUKS and LVM is better than with single-disc I/O.

----------

## doublehp

 *tw04l124 wrote:*   

> It seems you are using hardware raid. But most of the hardware raid controllers are fake raid or do you have a seperate raid controller card?

 

...

 *doublehp wrote:*   

> 
> 
> ```
> moon-gen-3 ~ # cat /proc/mdstat
> 
> ...

 

... does that seem like hardware to you?

 *tw04l124 wrote:*   

> I also have fake raid in my box and I have set up lvm. so, why do you use hardware raid and not lvm? lvm gave me enormous performance boost, but my discs have encryption with luks too!! The performance with luks and lvm is bigger then with single disc io.

 

I may go for LVM next time; I have just never felt comfortable with it, and never dared to try. fdisk+mdadm was enough for me; I fear I don't understand LVM properly; I tried it once and gave up. LVM would not fix the problems I have (moving/shifting a partition without unmounting) and would introduce new ones (Windows and BSD cannot be installed in LVM). I am just happy with fdisk.

----------

## HeissFuss

You could try remounting with the data=writeback option and see if you see improvements.  ext3 is reliable, but its performance is typically not the best.  If you have a filesystem you don't mind wiping, you could do a comparison of ext3, ext4 and xfs.  An extent-based filesystem using delayed allocation should give better write performance for large files, and should also yield more throughput when reading back large files, especially if they were written or modified over a longer period of time (fragmentation is much more of an issue on ext3 than on xfs or ext4).
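For a non-root filesystem that is a one-line fstab change; ext3 generally refuses to switch the data= mode on a live remount, so for / it has to come from the kernel command line instead. A sketch (device names match this box, but treat it as an example, not a recipe):

```
# /etc/fstab: journal only metadata on /home (md8)
/dev/md8   /home   ext3   defaults,data=writeback   0 2

# for /: pass it at boot, e.g. in the bootloader entry
# kernel /boot/vmlinuz root=/dev/md7 rootflags=data=writeback
```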

I can't explain why one ext3 device would be slower than the other though.  That's kind of a mystery unless / has seen a lot of writes+deletes already (once again, fragmentation on ext3 is an issue.)

If you're going to test XFS, make sure to mount with logbufs=8.

As for LVM2 striping (LVM default is concatenation), I had very poor performance with it the last time I tested it compared to md raid0.  YMMV.

----------

## doublehp

I am not asking why the rate is slower on md7 than on md3, but why md7 is slower than md8. Both are ext3. Even when not in RAID, I have always seen that the / partition is slower than the other ones. And in my box ATM, it's 50% slower.

----------

## HeissFuss

The only thing I can think of is fragmentation.  If you've been running the system for a while, deleting and creating files will make a lot of holes which will get filled up by segments of a large file.  The / partition will see a lot more of that kind of activity due to /tmp, /var and /usr/portage being located there vs the more static activity of /home.

A simple check is to run 'filefrag -v' on your dd created files and compare the one on md7 vs the one on md8.  If md7 looks much worse than md8, fragmentation is probably the issue.  Short term you can try 'shake' on that filesystem to try to reduce fragmentation and see if speed improves.  Long term you may want to look at different filesystems as I mentioned earlier.
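If shake doesn't cut it, a crude per-file workaround is to copy the file and rename the copy over the original: the fresh copy gets freshly allocated (ideally more contiguous) blocks. A sketch on a scratch file (don't do this to a file that is in use):

```shell
# crude per-file defrag: a fresh copy gets newly allocated blocks,
# then atomically replaces the original
# /tmp/defrag_demo is a scratch example; substitute the real file
f=/tmp/defrag_demo
dd if=/dev/zero of="$f" bs=1M count=4 2>/dev/null
cp -p "$f" "$f.new"     # copy forces a new allocation
mv "$f.new" "$f"        # rename over the original
# then recheck the extent count with: filefrag -v "$f"
rm -f "$f"
```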

----------

## doublehp

Ok, you may be right:

```
moon-gen-3 ~ # dd if=/dev/zero of=/tmp/plop bs=1M count=1k
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 31.6017 s, 34.0 MB/s
moon-gen-3 ~ # dd if=/dev/zero of=/home/plop bs=1M count=1k
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 16.2332 s, 66.1 MB/s
moon-gen-3 ~ # filefrag -v
Usage: filefrag [-v] file ...
moon-gen-3 ~ # filefrag -v /tmp/iso/grml/grml64_2008.11.iso | wc
   7692   61513  423976
moon-gen-3 ~ # filefrag -v /home/plop | wc
    108     841    5729
moon-gen-3 ~ # filefrag -v /tmp/plop | wc
   8511   68065  475810
moon-gen-3 ~ # dd if=/dev/zero of=/tmp/plop2 bs=1M count=1k
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 32.0217 s, 33.5 MB/s
moon-gen-3 ~ # dd if=/dev/zero of=/tmp/plop3 bs=1M count=1k
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 26.0038 s, 41.3 MB/s
moon-gen-3 ~ # filefrag -v /tmp/plop | wc
   8511   68065  475810
moon-gen-3 ~ # filefrag -v /tmp/plop2 | wc
   7643   61121  414556
moon-gen-3 ~ # filefrag -v /tmp/plop3 | wc
   5367   42913  285076
moon-gen-3 ~ #

```

I will do two tests tomorrow: run the same test from a live CD, just to see whether md7 being a live, mounted / matters (or not), and then ... defrag md7.

BTW: how do I defrag?

----------

## HeissFuss

You can try sys-fs/shake to somewhat defrag.  It won't defrag files that are currently being written, and it may only help with existing files (new files may still be fragmented when created).  It doesn't hurt to try shake before something more drastic, though.  The best option would be to tar up the filesystem's contents, reformat it, and then restore fresh.  If you're doing that, though, you may as well switch to a different filesystem that's less susceptible to fragmentation, such as xfs or ext4.  xfs even has an online defrag option with the standard tools.  I'm not aware whether ext4 has one or not.

----------

## doublehp

After running shake on a file, the fragmentation count was ... the same.

I did not understand how to shake a directory. IMHO, I should shake the full disk in order to optimise free space.

----------

## HeissFuss

Shake operates by making a new copy of the file that you're trying to defragment.  If your filesystem doesn't have enough free space for that copy, it can't do anything with it (similar to lower level filesystem defragmenters.)

Unfortunately shake is a user-space defragmenter, so it won't be able to do a very complete job even when it does work.  Also, I suspect it will have trouble if your free space is heavily fragmented or if your filesystem is nearly full.

----------

## doublehp

21G used in a 30G volume is not what I call "full". I created a 1G file with dd.

Any lower-level defrag?

IRC people told me to copy the whole partition to another disk, and restore. I may do that soon. I still don't think it will really make a difference. I am sad that ext3 performance depends on the free space ...

----------

## HeissFuss

ext2 and xfs have defragmenters.  There are rumors of ext4 possibly getting one too, but other than that, I don't know of any other Linux filesystems with them.

If you're going to restore the filesystem, you should seriously consider moving off of ext3 entirely if possible.

----------

## doublehp

 *HeissFuss wrote:*   

> ext2 and xfs have defragmenters.  There are rumors of ext4 possibly getting one too, but other than that, I don't know of any other Linux filesystems with them.
> 
> If you're going to restore the filesystem, you should seriously consider moving off of ext3 entirely if possible.

 

Can you expand ?

Why would ext2 have a defrag and not ext3? What is it called?

Are you telling me to remove ext3 forever?

I thought about booting CD, then:

$ mv /mnt/gentoo/* /mnt/big_disk/

$ mv /mnt/big_disk/* /mnt/gentoo

It may break hard links, if any, but that should not be big trouble.

Are you telling me to remove ext3 and move to an alternative FS? No way I'll try ReiserFS; I've heard about too many problems. I stopped using XFS because I was losing files and data on each power failure. I now have a UPS, but XFS may still cause file and data loss on a kernel panic or freeze. And I still have many freezes (several per month). ext3 remains the most reliable FS.

I am not ready to try ext4 yet. I cannot use it before it is supported by Debian/stable (I never really boot any CD; I only boot network images or local rescue systems; and when Gentoo breaks, I have two local Debians to help; so, to install or repair Gentoo, I need Debian to be able to write to Gentoo's partitions: I can't use ext4 before Debian has it in stable: linux + mkfs + fsck properly packaged).

BTW: are there any options or features to pass to mkfs when dealing with RAID volumes?

----------

## HeissFuss

From wikipedia ext3 page:

There is no online ext3 defragmentation tool that works on the filesystem level. An offline ext2 defragmenter, e2defrag, exists but requires that the ext3 filesystem be converted back to ext2 first. But depending on the feature bits turned on in the filesystem, e2defrag may destroy data; it does not know how to treat many of the newer ext3 features.

You should also check out e2freefrag (man e2freefrag) to check freespace fragmentation.
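e2freefrag works on an unmounted device, and even on a plain image file, so you can get a feel for it without touching the live system. A throwaway example (sizes and paths are arbitrary; on the real box it would be e2freefrag /dev/md7):

```shell
# build a tiny ext3 image and inspect its free-space fragmentation;
# no root needed since the image is never mounted
img=/tmp/frag_demo.img
dd if=/dev/zero of="$img" bs=1M count=16 2>/dev/null
mkfs.ext3 -q -F "$img"
e2freefrag "$img"
rm -f "$img"
```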

I see that moving off of ext3 is not an option for you at this point.  I agree that reiserfs is not a good alternative since it seems to perform even worse than ext3 when mostly full/fragmented.  I'm not sure why you've seen issues with XFS unless you ran it quite a while ago.  There have been fixes related to data consistency in kernels 2.6.18 and 2.6.22.  I've been running it for years and haven't had any issues after power loss/lockups.  I have had data eaten by reiserfs though.

If you're going to reformat, you should do the following:

tar -cf /mnt/big_disk/gentoo.tar /mnt/gentoo

umount and reformat the partition, then mount it back to /mnt/gentoo

tar -xpf /mnt/big_disk/gentoo.tar -C /

(tar strips the leading / from member names, so extracting with -C / restores the tree under /mnt/gentoo; -p preserves permissions on extract.)

tar should work better than mv.

As far as RAID-specific format options for ext3, I'm not sure if there are any.  I know that ext4 lets you set stride and stripe-width, but I don't think ext3 has those options.  In the long run, those probably don't matter anyway.  The biggest thing is to make sure that your partition is aligned with the RAID stripe, since that can actually have a performance impact.  I suggest you google it.
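(For what it's worth, newer e2fsprogs do appear to accept these for ext3 too. With 64k chunks and 4k blocks on a two-disk raid0, stride = 64/4 = 16 blocks and stripe-width = 16 * 2 = 32; sketched here on a scratch image rather than the real md device:)

```shell
# stride = chunk / fs block = 64k / 4k = 16 blocks;
# stripe-width = stride * 2 data disks = 32 blocks
img=/tmp/stride_demo.img
dd if=/dev/zero of="$img" bs=1M count=64 2>/dev/null
mkfs.ext3 -q -F -b 4096 -E stride=16,stripe-width=32 "$img"
tune2fs -l "$img" | grep -i stride     # the superblock records the stride
rm -f "$img"
```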

----------

## doublehp

Would a reformat help more than ... not doing it?

Why is tar better than mv?

Yes, I tried XFS a long time ago: 2002-2005. I may try it for my /gentoo; my / is just a system; if I crash my /, I can reinstall; boring but acceptable. I would not take this risk with user data and backup drives for years  :Smile:  /gentoo is OK.

 *Quote:*   

> make sure that your partition is aligned with the raid stripe

 

Not sure what you mean. I just make sure that partitions start at the same sector (plus identical sizes on twin disks ...); I only use RAID 0 and 1. No way I'll use JBOD. I may try RAID 5 or 6; not sure yet. It will depend on off-topic factors.

I won't bother with defrag if mv/tar can give better and faster results  :Smile: 

ext4 on Debian: in fact, they seem to have supported it for a long time (in testing):

```
moon-gen-3:/lib/modules# find | grep ext4
./2.6.24-1-686/kernel/fs/ext4
./2.6.24-1-686/kernel/fs/ext4/ext4dev.ko
./2.6.30-2-686/kernel/fs/ext4
./2.6.30-2-686/kernel/fs/ext4/ext4.ko
./2.6.22-3-686/kernel/fs/ext4
./2.6.22-3-686/kernel/fs/ext4/ext4dev.ko
moon-gen-3:/lib/modules# ls -lha `which mkfs.ext4`
-rwxr-xr-x 5 root root 48K 2008-10-13 05:33 /sbin/mkfs.ext4
moon-gen-3:/lib/modules#

```

 :Razz: 

----------

## HeissFuss

Reformatting is cleaner than just deleting everything.  At that point, what's the harm in recreating the filesystem fresh, since there's nothing on it anyway?

I tend to use tar for this type of thing for a couple of reasons.  First, it's faster to restore from a tar.  Second, you can move it around, or create/extract it through an ssh pipe.  If mv works for you, though, go ahead.

If you're not using LVM, forget what I said about stripe alignment.  If you're going to use RAID-5, though, the ext4 stride/stripe-width options are something you should look at.

Good luck.

----------

