# Software RAID 10 drive utilisation

## parityboy

I've been hunting around on Google trying to get some info on this, and it's a bit of a struggle.  Basically my question is this: assuming a 4-drive RAID 10, what will the drive utilisation be for a sequential read? Two striped drives? Three? All four?

If only two, why?  Is there any way to get a stacked RAID to use the drives fully, i.e. RAID 1+0 or RAID 0+1?

----------

## aidanjt

I assume it'll try to read across all 4 disks at the same time.  For example, given this disk/block arrangement:

```
disks:  1 2 3 4
--------------
blocks: a b a b
        c d c d
```

When you try to read blocks a to d, it'll read a from disk 1, b from disk 2, c from disk 3 and d from disk 4, all at the same time.  The only way you'll know for sure is by testing it: if hdparm -t on an individual disk gives you 40MB/s and hdparm -t on the RAID 10 array gives you ~160MB/s, then you'll know it's working as I've shown.  Your mileage may vary.
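That test can be sketched as follows — the device names are examples, and the 40/160 figures are the hypothetical numbers from above, not measurements:

```shell
# Baseline each member disk, then the whole array (needs root; example names):
#   for dev in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do hdparm -t "$dev"; done
#   hdparm -t /dev/md0

# The speedup factor tells you how many members a sequential read really touches:
single=40     # MB/s from hdparm -t on one member disk
array=160     # MB/s from hdparm -t on the md device
echo "speedup: $((array / single))x"
```

A factor near 4 would mean all four spindles are contributing; a factor near 2 would mean only one disk of each mirror pair is.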

----------

## Mad Merlin

For sequential reads, you'll see a 2x speedup for a 4-disk RAID 10 set vs a single disk; you can't really do better than that without losing redundancy (by switching to RAID 5 or RAID 0, for example). The reason is that you can't effectively interleave the reads: most of the time would be spent seeking, not reading.

However, for (small) random reads or multiple simultaneous reads, you'll see closer to a 4x speedup, because different sets of disks can service different requests simultaneously.
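One way to see that effect is a multi-job random-read benchmark. A minimal fio job file might look like this — the md device name is an example, and reading a raw device needs root:

```ini
; four concurrent 4k random readers against the array
[global]
filename=/dev/md0   ; example device name -- adjust to your array
direct=1            ; bypass the page cache
rw=randread
bs=4k
runtime=30
time_based
group_reporting

[readers]
numjobs=4
iodepth=16
```

With numjobs=1 you would expect results much closer to the single-disk case; raising it toward the number of disks is where the extra spindles start to pay off.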

----------

## parityboy

This is something I've never really understood about mirroring systems.  I've heard that smart RAID controllers can achieve near-RAID 0 speeds with RAID 1 by making all reads interleaved reads - i.e. RAID 1 for writing, RAID 0 for reading.

I've often wondered why Linux cannot embody this model.  Why not employ an interleaved read mode with a mirrored system?  RAID 0 has to do this as a matter of course, so why not employ it for mirrored systems too?

----------

## Vietor

 *parityboy wrote:*   

> This is something I've never really understood with mirroring systems.  I've heard that smart RAID controllers can achieve near RAID 0 speeds with RAID 1 making all reads interleaved reads - i.e. RAID 1 writing, RAID 0 reading.
> ...
 

I'm pretty sure the answer is that it can.

```
hdparm -t /dev/sda4

/dev/sda4:
 Timing buffered disk reads:  222 MB in  3.03 seconds =  73.37 MB/sec

hdparm -t /dev/md3

/dev/md3:
 Timing buffered disk reads:  838 MB in  3.01 seconds = 278.80 MB/sec

cat /proc/mdstat
...
md3 : active raid10 sdd4[3] sdc4[2] sdb4[1] sda4[0]
      272478208 blocks 64K chunks 2 far-copies [4/4] [UUUU]
```

----------

## Mad Merlin

 *Vietor wrote:*   

>  *parityboy wrote:*   This is something I've never really understood with mirroring systems.  I've heard that smart RAID controllers can achieve near RAID 0 speeds with RAID 1 making all reads interleaved reads - i.e. RAID 1 writing, RAID 0 reading.
> ...

 

Could you try the same again with dd, reading at least twice as much as you have RAM (i.e. >= 8G on a 4G system)? I'm curious.

```
dd if=/dev/sda4 of=/dev/null bs=1M count=8192
dd if=/dev/md3 of=/dev/null bs=1M count=8192
```
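An alternative to reading twice RAM, for what it's worth, is to flush the page cache before the run (needs root); a sketch, using the same example devices:

```shell
# Flush dirty pages, then drop the cache so dd actually hits the disk:
#   sync && echo 3 > /proc/sys/vm/drop_caches
#   dd if=/dev/md3 of=/dev/null bs=1M count=8192

# Sanity check: 8192 x 1MiB buffers is an 8 GiB transfer.
echo $((8192 * 1024 * 1024))
```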

----------

## Vietor

 *Mad Merlin wrote:*   

> Could you try the same again with dd, and read at least twice as much as you have RAM (ie, >= 8G on a 4G system)? I'm curious.
> ```
> ...

 

Results are similar; this is on an active system, so none of the runs was completely uninterrupted.  The system has 4GB of RAM, so, as requested, the tests do an 8GB read.

```
dd if=/dev/sda4 of=/dev/null bs=1M count=8K
8192+0 records in
8192+0 records out
8589934592 bytes (8.6 GB) copied, 119.785 s, 71.7 MB/s

dd if=/dev/md3 of=/dev/null bs=1M count=8K
8192+0 records in
8192+0 records out
8589934592 bytes (8.6 GB) copied, 35.9709 s, 239 MB/s
```

----------

## parityboy

@ Vietor:

Many, many thanks for your info - it was very helpful.  I noticed you use the "far" data layout; perhaps that is what gives the scalable performance.

I still wonder why Linux's RAID 1 personality can't deliver this kind of performance.  If you had two of your disks in RAID 1, you should get around 140MB/sec with an interleaved read, but the current implementation pegs you at single-drive performance.  I know that it will distribute reads between the two drives IF there are two calling processes, but for a single process only one drive at a time is employed.

I'd love to know why, since the RAID 1 personality must assume that both drives have consistent data.
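For what it's worth, md lets you sidestep the RAID 1 limitation by building a two-disk array with the raid10 personality and the far layout: the same redundancy as a mirror, but a single sequential reader can stripe across both members. A sketch, with hypothetical device names:

```shell
# Two-disk "mirror" with striped reads (destructive -- example devices only):
#   mdadm --create /dev/md0 --level=10 --layout=f2 \
#         --raid-devices=2 /dev/sda1 /dev/sdb1

# The layout shows up in /proc/mdstat as "far-copies"; checking a sample line:
mdstat='md0 : active raid10 sdb1[1] sda1[0] 488253440 blocks 64K chunks 2 far-copies [2/2] [UU]'
case "$mdstat" in
  *far-copies*) echo "far layout active" ;;
  *)            echo "not a far layout"  ;;
esac
```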

----------

## Mad Merlin

 *Vietor wrote:*   

>  *Mad Merlin wrote:*   Could you try the same again with dd, and read at least twice as much as you have RAM (ie, >= 8G on a 4G system)? I'm curious.
> ...

 

Hmm... interesting. I'll have to give that a whirl myself when I get the chance. I assumed RAID 10 performance based on the performance of RAID 1 and RAID 0.

----------

## jormartr

The day I created my home RAID 10 with 4 disks, I ran some tests:

a. destroy the old array, create it again, and create a multi-gigabyte file (testing write time)

b. reboot

c. test read time

a, b, c .....

This was done with the different RAID 10 layouts (o2, n2, f2) ... and f2 gave me almost 4x the read speed of a single drive.

The test was just as simple as creating an md device on empty disks, dd'ing some gigabytes from /dev/zero to a file on the empty partition, and measuring the time taken to write and read it, nothing else.

I didn't get 400% of a single drive's read speed, but something like 360% - 380%.
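One pass of that cycle can be sketched roughly as follows. The device names, filesystem and sizes are examples; echoing 3 into drop_caches stands in for the reboot between write and read, and the 370% figure below is just the midpoint of the range reported above:

```shell
# One per-layout pass (destructive; repeat with --layout=n2 / o2 / f2):
#   mdadm --create /dev/md0 --level=10 --layout=f2 --raid-devices=4 /dev/sd[abcd]1
#   mkfs.ext4 /dev/md0 && mount /dev/md0 /mnt/test
#   dd if=/dev/zero of=/mnt/test/big bs=1M count=8192     # timed write
#   sync && echo 3 > /proc/sys/vm/drop_caches             # stand-in for reboot
#   dd if=/mnt/test/big of=/dev/null bs=1M                # timed read

# The reported f2 result as a multiple of one drive:
awk 'BEGIN { printf "f2 read speed: %.1fx one drive\n", 370 / 100 }'
```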

----------

## Vietor

To add some more data points to the discussion, here are some bonnie++ runs for a 4-disk (WD5002ABYS) RAID 10 array. This is on a different system than the one the earlier numbers came from.

```
64K stripe, near

Version 1.93c       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zbox-dce-netboo 12G  1136  97 157192  27 89282  14  2102  96 202628  20 571.0  16
Latency              8769us    3010ms     147ms   56118us   47698us     637ms
Version 1.93c       ------Sequential Create------ --------Random Create--------
zbox-dce-netboot    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 13692  57 +++++ +++ 12338  43 14108  64 +++++ +++ 11419  41
Latency             28226us      97us   24458us   30941us      73us   65092us

64K stripe, far

Version 1.93c       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zbox-dce-netboo 12G  1145  97 141549  24 89353  16  2096  99 406629  41 554.0  16
Latency              8833us    3518ms     187ms   13548us   34003us   80105us
Version 1.93c       ------Sequential Create------ --------Random Create--------
zbox-dce-netboot    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 14282  63 +++++ +++ 10322  41 13200  57 +++++ +++  9703  37
Latency             25408us      78us   44247us   39839us      75us   49854us

64K stripe, offset

Version 1.93c       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zbox-dce-netboo 12G  1150  97 156791  27 90112  15  2167  99 199985  19 548.2  15
Latency              8429us    3163ms     226ms   10403us   44001us   66004us
Version 1.93c       ------Sequential Create------ --------Random Create--------
zbox-dce-netboot    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 12002  56 +++++ +++  9908  36 10472  52 +++++ +++  7296  27
Latency             80707us     111us   57962us   72803us      84us     110ms
```

These are all single worker tests. If I have a chance to take some multi-worker tests on the same hardware, and a set of tests on a single drive from the array, I'll add them to the thread.
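For anyone wanting to reproduce these, the invocation would be along these lines (the target directory is an example; -s should be roughly twice RAM, which is where the 12G size above comes from):

```shell
# bonnie++ run producing tables like the ones above:
#   bonnie++ -d /mnt/test -s 12G -n 16

# The headline difference, taking the sequential-input (block) column:
near=202628   # K/sec, 64K stripe, near layout
far=406629    # K/sec, 64K stripe, far layout
awk -v n="$near" -v f="$far" 'BEGIN { printf "far/near = %.1fx\n", f / n }'
```

In other words, the far layout roughly doubles single-stream sequential input on this hardware while the other columns stay within noise of each other.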

----------

## linuxtuxhellsinki

 *Vietor wrote:*   

> To add some more data points to the discussion, here are some bonnie++ runs for a 4 disk (WD5002ABYS) RAID-10 array. This is on a different system that the earlier numbers I posted.

 

Are you using AHCI/NCQ with those WD drives?  If yes, do you have this kind of error in your logs?

```
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x0)
```

I'm trying to find out whether the NCQ implementation is bad in these WD5002ABYS drives or in my Supermicro mobo, because I get this error message whenever the system is under high I/O load, and the only way to get rid of the errors is to disable NCQ.

----------

## Vietor

 *linuxtuxhellsinki wrote:*   

> Are you using AHCI/NCQ with those WD-drives ?  If yes, then do you've this kind of errors in log ?
> ```
> ...

 

I'm not seeing any errors like that.

The system is using a Tyan B7002 motherboard (basically an S7002), with an "Intel 82801JI (ICH10 Family) SATA AHCI Controller". Disks are configured in AHCI mode.

Are you seeing an issue with the 500GB WD RE3 drives specifically, or with any WD RE3 drives? I've run many, many WD RE(1,2,3) drives on a Tyan B7002 and a Tyan S2915, as well as many VIA Mini-ITX boards and some Asus desktop boards, and seen nothing like what you report.

----------

## linuxtuxhellsinki

 *Vietor wrote:*   

> Are you seeing an issue with the 500GB WD RE3 drives specifically? Any WD RE3 drives? I've run many many WD RE(1,2,3) drives on a Tyan B7002 and Tyan S2915, as well as many VIA Mini-ITX boards, and some Asus Desktop boards, and seen nothing like what you report.

 

Yes, mostly with RE2 drives (WD5000ABYS/WD5001ABYS) in some "older" SuperMicro servers with an "Intel Corporation 82801GR/GH (ICH7 Family) Serial ATA Storage Controller AHCI (rev 01)".  It doesn't happen with all of them, though, so maybe I have to look for differences in the mobos' firmware or something.

----------

