# [REOPEN] SATA on ICH10 poor performance

## flipy

Hi,

Comparing two machines, one with Red Hat and the other with Gentoo, there is a difference in SATA performance for cached reads.

One gets >3500 MB/sec and the other only 1800 MB/sec.

Both have the same configuration (Intel Core 2 Quad Q8400, ICH10, Q43).

On the Gentoo machine only the AHCI kernel option is selected as built-in (IDE/ATA is not selected).
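For reference, the relevant part of the kernel config would look something like this (a sketch using the standard Kconfig symbol names; the exact option set on the box may differ):

```
# Serial ATA via libata, with AHCI built in
CONFIG_ATA=y
CONFIG_SATA_AHCI=y
# Legacy IDE left off
# CONFIG_IDE is not set
```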

hdparm -tT /dev/sda

```
/dev/sda:
 Timing cached reads:   3586 MB in  2.00 seconds = 1793.92 MB/sec
 Timing buffered disk reads:  384 MB in  3.01 seconds = 127.60 MB/sec
```
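As a side note, the two figures measure different things: `-T` times reads from the page cache (so RAM/CPU), while `-t` reads from the disk. The MB/sec column is just MB divided by seconds, which a one-liner can confirm (a sketch; the awk field numbers assume the exact line format above):

```shell
# Recompute the rate hdparm prints from its raw numbers (sanity check only;
# hdparm's own figure uses more precise internal timing, hence the small gap).
echo " Timing cached reads:   3586 MB in  2.00 seconds = 1793.92 MB/sec" |
  awk '{ printf "%.2f MB/sec\n", $4 / $7 }'
# prints 1793.00 MB/sec
```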

hdparm -I /dev/sda

```
/dev/sda:

ATA device, with non-removable media
   Model Number:       ST3500418AS
   Serial Number:      5VM89C97
   Firmware Revision:  HP34
   Transport:          Serial
Standards:
   Used: unknown (minor revision code 0x0029)
   Supported: 8 7 6 5
   Likely used: 8
Configuration:
   Logical      max   current
   cylinders   16383   16383
   heads      16   16
   sectors/track   63   63
   --
   CHS current addressable sectors:   16514064
   LBA    user addressable sectors:  268435455
   LBA48  user addressable sectors:  976773168
   Logical/Physical Sector size:           512 bytes
   device size with M = 1024*1024:      476940 MBytes
   device size with M = 1000*1000:      500107 MBytes (500 GB)
   cache/buffer size  = 16384 KBytes
   Nominal Media Rotation Rate: 7200
Capabilities:
   LBA, IORDY(can be disabled)
   Queue depth: 32
   Standby timer values: spec'd by Standard, no device specific minimum
   R/W multiple sector transfer: Max = 16   Current = 16
   Recommended acoustic management value: 208, current value: 0
   DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5
        Cycle time: min=120ns recommended=120ns
   PIO: pio0 pio1 pio2 pio3 pio4
        Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
   Enabled   Supported:
      *   SMART feature set
          Security Mode feature set
      *   Power Management feature set
      *   Write cache
      *   Look-ahead
      *   WRITE_BUFFER command
      *   READ_BUFFER command
      *   DOWNLOAD_MICROCODE
      *   48-bit Address feature set
      *   Device Configuration Overlay feature set
      *   Mandatory FLUSH_CACHE
      *   FLUSH_CACHE_EXT
      *   SMART error logging
      *   SMART self-test
      *   General Purpose Logging feature set
      *   WRITE_{DMA|MULTIPLE}_FUA_EXT
      *   64-bit World wide name
          Write-Read-Verify feature set
      *   WRITE_UNCORRECTABLE_EXT command
      *   {READ,WRITE}_DMA_EXT_GPL commands
      *   Segmented DOWNLOAD_MICROCODE
      *   Gen1 signaling speed (1.5Gb/s)
      *   Gen2 signaling speed (3.0Gb/s)
      *   Native Command Queueing (NCQ)
      *   Phy event counters
          Device-initiated interface power management
      *   Software settings preservation
      *   SMART Command Transport (SCT) feature set
      *   SCT Long Sector Access (AC1)
      *   SCT LBA Segment Access (AC2)
      *   SCT Error Recovery Control (AC3)
      *   SCT Features Control (AC4)
      *   SCT Data Tables (AC5)
          unknown 206[12] (vendor specific)
Security:
   Master password revision code = 65534
      supported
   not   enabled
   not   locked
      frozen
   not   expired: security count
      supported: enhanced erase
   80min for SECURITY ERASE UNIT. 80min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 5000c50021c41f16
   NAA      : 5
   IEEE OUI   : 000c50
   Unique ID   : 021c41f16
Checksum: correct
```

sdparm --all /dev/sda

```
/dev/sda: ATA       ST3500418AS       HP34
Read write error recovery mode page:
  AWRE        1
  ARRE        0
  TB          0
  RC          0
  EER         0
  PER         0
  DTE         0
  DCR         0
  RRC         0
  COR_S       0
  HOC         0
  DSOC        0
  WRC         0
  RTL         0
Caching (SBC) mode page:
  IC          0
  ABPF        0
  CAP         0
  DISC        0
  SIZE        0
  WCE         1
  MF          0
  RCD         0
  DRRP        0
  WRP         0
  DPTL        0
  MIPF        0
  MAPF        0
  MAPFC       0
  FSW         0
  LBCSS       0
  DRA         0
  NV_DIS      0
  NCS         0
  CSS         0
Control mode page:
  TST         0
  TMF_ONLY    0
  D_SENSE     0
  GLTSD       1
  RLEC        0
  QAM         0
  QERR        0
  RAC         0
  UA_INTLCK   0
  SWP         0
  ATO         0
  TAS         0
  AUTOLOAD    0
  BTP        -1
  ESTCT      30
```

lspci

```
00:00.0 Host bridge: Intel Corporation 4 Series Chipset DRAM Controller (rev 03)
00:02.0 VGA compatible controller: Intel Corporation 4 Series Chipset Integrated Graphics Controller (rev 03)
00:02.1 Display controller: Intel Corporation 4 Series Chipset Integrated Graphics Controller (rev 03)
00:03.0 Communication controller: Intel Corporation 4 Series Chipset HECI Controller (rev 03)
00:03.2 IDE interface: Intel Corporation 4 Series Chipset PT IDER Controller (rev 03)
00:03.3 Serial controller: Intel Corporation 4 Series Chipset Serial KT Controller (rev 03)
00:19.0 Ethernet controller: Intel Corporation 82567LM-3 Gigabit Network Connection (rev 02)
00:1a.0 USB Controller: Intel Corporation 82801JD/DO (ICH10 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801JD/DO (ICH10 Family) USB UHCI Controller #5 (rev 02)
00:1a.2 USB Controller: Intel Corporation 82801JD/DO (ICH10 Family) USB UHCI Controller #6 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801JD/DO (ICH10 Family) USB2 EHCI Controller #2 (rev 02)
00:1b.0 Audio device: Intel Corporation 82801JD/DO (ICH10 Family) HD Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801JD/DO (ICH10 Family) PCI Express Port 1 (rev 02)
00:1c.1 PCI bridge: Intel Corporation 82801JD/DO (ICH10 Family) PCI Express Port 2 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801JD/DO (ICH10 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801JD/DO (ICH10 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801JD/DO (ICH10 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801JD/DO (ICH10 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a2)
00:1f.0 ISA bridge: Intel Corporation 82801JD (ICH10D) LPC Interface Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801JD/DO (ICH10 Family) SATA AHCI Controller (rev 02)
```

Kernel config: http://pastebin.com/4HU26Rd3

Any hints?

Thanks!

*Last edited by flipy on Thu Sep 30, 2010 11:36 am; edited 2 times in total*

----------

## d2_racing

What kernel version are you using on both boxes?

----------

## flipy

On the Gentoo machine:

```
Linux AS5 2.6.32-gentoo-r7 #4 SMP Thu Jun 10 09:47:59 CEST 2010 x86_64 Intel(R) Core(TM)2 Quad CPU Q8400 @ 2.66GHz GenuineIntel GNU/Linux
```

```
/dev/sda:
 Timing cached reads:   3588 MB in  2.00 seconds = 1795.00 MB/sec
 Timing buffered disk reads:  376 MB in  3.01 seconds = 124.74 MB/sec
```

On the Red Hat machine: 

```
Linux dre027 2.6.18-194.3.1.el5 #1 SMP Sun May 2 04:22:18 EDT 2010 i686 i686 i386 GNU/Linux
```

```
/dev/sda:
 Timing cached reads:   6956 MB in  2.00 seconds = 3480.58 MB/sec
 Timing buffered disk reads:  330 MB in  3.01 seconds = 109.80 MB/sec
```

----------

## dmpogo

 *flipy wrote:*   

> On the Gentoo machine:
> 
> ```
> Linux AS5 2.6.32-gentoo-r7 #4 SMP Thu Jun 10 09:47:59 CEST 2010 x86_64 Intel(R) Core(TM)2 Quad CPU Q8400 @ 2.66GHz GenuineIntel GNU/Linux
> ```
> ...

 

BTW, buffered disk reads are the more important figure.

----------

## krinn

It tests cache performance, so memory, not SATA or disk performance.

And then I suppose accessing memory on a 32-bit arch can beat a 64-bit one for speed when using little memory.

On another note, your Gentoo performs better with the disk than the Red Hat box does, and your title is way off, as 124 MB/s is not what you should call "poor" performance, as many users could confirm.

----------

## Zebbeman

I had similar thoughts - isn't this supposed to be 300 MB/sec in theory on a SATA 2 drive?

```
Timing buffered disk reads:  318 MB in  3.01 seconds = 105.78 MB/sec
```

----------

## NeddySeagoon

flipy,

What versions of hdparm are you using? There was an error of a factor of 2 in the output, which Gentoo fixed a year or so ago.

Maybe that's the cause of your problem ... an old hdparm on the Red Hat box.
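Worth confirming first with `hdparm -V` on each box before comparing any numbers. A tiny sketch of the idea (the `major`/`comparable` helper names are made up for illustration; `hdparm -V` prints something like "hdparm v9.20"):

```shell
# Compare hdparm major versions before trusting cross-machine numbers.
major() { echo "$1" | sed 's/^v//' | cut -d. -f1; }

comparable() {
    # $1, $2: version strings, e.g. "9.20" and "6.6"
    [ "$(major "$1")" = "$(major "$2")" ]
}

if comparable "9.20" "6.6"; then
    echo "versions comparable"
else
    echo "different major versions - do not compare these numbers directly"
fi
# prints: different major versions - do not compare these numbers directly
```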

----------

## flipy

 *krinn wrote:*   

> It tests cache performance, so memory, not SATA or disk performance.
> 
> And then I suppose accessing memory on a 32-bit arch can beat a 64-bit one for speed when using little memory.
> 
> On another note, your Gentoo performs better with the disk than the Red Hat box does, and your title is way off, as 124 MB/s is not what you should call "poor" performance, as many users could confirm.

 

Yes, I know it tests cache performance, but it is still odd that an old x86_32 kernel gives better performance than a brand new one.

And it is far below what is expected on a SATA2 drive, right?

 *NeddySeagoon wrote:*   

> flipy, 
> 
> What versions of hdparm are you using? There was an error of a factor of 2 in the output, which Gentoo fixed a year or so ago. 
> 
> Maybe that's the cause of your problem ... an old hdparm on the Red Hat box.

 

I've heard of that, but it would multiply all the results, wouldn't it?

I'm using v9.20 on the Gentoo machine.

The Red Hat machine has v6.6.

 *Zebbeman wrote:*   

> I had similar thoughts - is this the speed that in theory should be 300 MB/sec on a sata 2 drive? 

 

Right, but I don't know what more can be tweaked in the kernel and compiled as built-in...

Not an expert, but I've taken a deep look at almost all options related to this motherboard/chipset and end with the current configuration.

Thanks!

EDIT: I've checked hdparm's results on a new machine with the latest amd64 livecd, and they are the same as mine.

I'll try it with the x86_32 livecd and compare results!

----------

## dmpogo

 *Zebbeman wrote:*   

> I had similar thoughts - is this the speed that in theory should be 300 MB/sec on a sata 2 drive?
> 
> ```
> Timing buffered disk reads:  318 MB in  3.01 seconds = 105.78 MB/sec
> ```
> ...

 

Are you joking? For mechanical 7200 RPM drives, typical advertised speeds are 100-120 MB/s;

see for example the specs PDF for WD:

http://www.wdc.com/wdproducts/library/?id=120&type=8

In real life (and hdparm is not real life, so it is closer to the advertised values) people are happy with a sustained 90 MB/s.

The buffer-to-host speed of SATA2 (3Gb/s) is never the bottleneck for a modern single mechanical drive.

SSD drives are faster on reads, though

----------

## NeddySeagoon

flipy,

Please test with the same versions of hdparm or, better, get bonnie.
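Until then, a rough hdparm-independent cross-check is a timed sequential read with dd, which both machines already have. A sketch (it uses a scratch file; to hit the actual disk, read from /dev/sda as root, ideally with `iflag=direct` to bypass the cache):

```shell
# Create a 64 MiB scratch file, then time a sequential read of it.
# dd reports throughput on stderr when it finishes.
dd if=/dev/zero of=testfile bs=1M count=64 2>/dev/null
sync
dd if=testfile of=/dev/null bs=1M
rm -f testfile
```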

----------

## eccerr0r

 *flipy wrote:*   

> [I've heard of that, but it would multiply all the results, wouldn't it?
> 
> I'm using v9.20 on the Gentoo machine.
> 
> The Red Hat machine has v6.6.
> ...

 

Yes, this is it.  Around hdparm version 6.9 there was a huge algorithm change that made a lot of people qq about their cached read rates, when nothing had really changed...  You should still verify for yourself that this is indeed the issue, but based on the versions you state, it likely is.

----------

## flipy

Well,

As most of you commented, the hdparm version on the Red Hat machine was old, so the results weren't accurate.

Thanks for your time!

----------

## d2_racing

No problem  :Razz: 

----------

## flipy

 *flipy wrote:*   

> Well,
> 
> As most of you commented, the hdparm version on the Red Hat machine was old, so the results weren't accurate.
> 
> Thanks for your time!

 

I've tested it against other machines and their results are far better than my machine's.

Also, they're using almost the same setup.

Is there any way to tweak it?

I've recompiled the kernel, checked all the config files, but no help.

Where else can I get more information?

BTW, does anyone know a good tutorial on detecting all your hardware and compiling the kernel accordingly?

----------

## krinn

You can do lspci -k and it tells you the driver names to use.

Or you can get a seed there to build a kernel with it to test : https://forums.gentoo.org/viewtopic-t-829476.html

Many users report it to be "fast", so I assume it should not have bad default settings.

But again, the first time you were comparing a 32-bit vs a 64-bit kernel, and now you are comparing kernels on different computers.

Kinda like comparing the weight of a duck vs a duck in water.

And now comparing the weight of a duck vs a witch.

(I know, not really good comparisons, but I love Monty Python.)
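For completeness, lspci -k prints a "Kernel driver in use" line per device, so the SATA controller can be checked directly. A sketch (the excerpt below is illustrative and stands in for the real output so the filter can be shown; on the real box, pipe `lspci -k` through the grep instead):

```shell
# On a live system: lspci -k | grep -A 2 -i "sata controller"
lspci_k_excerpt='00:1f.2 SATA controller: Intel Corporation 82801JD/DO (ICH10 Family) SATA AHCI Controller (rev 02)
	Kernel driver in use: ahci'
echo "$lspci_k_excerpt" | grep -i "kernel driver"
# prints: 	Kernel driver in use: ahci
```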

----------

## flipy

 *krinn wrote:*   

> You can do lspci -k and it tell you drivers names to use.
> 
> Or you can get a seed there to build a kernel with it to test : https://forums.gentoo.org/viewtopic-t-829476.html
> 
> Many users report it to be "fast", so i assume it should have not bad default settings.
> ...

 

Nice comparison, though!

It's my work computer and it seems to just "work", although other coworkers have reported better performance with the same hardware.

I've been testing and comparing with other machines just to find out whether there is something wrong with the hardware; that's why I'm running some tests against different OSs/archs.

Now I'm using the latest kernel in the tree (2.6.35-r10), and it's the same.

```
AS5 ~ # uname -a
Linux AS5 2.6.35-gentoo-r10 #2 SMP Thu Oct 14 17:17:08 CEST 2010 x86_64 Intel(R) Core(TM)2 Quad CPU Q8400 @ 2.66GHz GenuineIntel GNU/Linux
AS5 ~ # hdparm -tT /dev/sda
/dev/sda:
 Timing cached reads:   3584 MB in  2.00 seconds = 1792.59 MB/sec
 Timing buffered disk reads:  386 MB in  3.01 seconds = 128.33 MB/sec
```

I've tried to tune the kernel for this specific hardware, but I may be missing something that decreases performance, especially in I/O operations.

I couldn't find a compile-your-own-kernel tutorial that explains in detail how to detect all your hardware and where to find information about what to compile into the kernel.

The Gentoo wiki seems to have some references, but mostly for laptops.

----------

## Anon-E-moose

seagate

```
/dev/sda:
 Timing cached reads:   2338 MB in  2.00 seconds = 1169.14 MB/sec
 Timing buffered disk reads:  318 MB in  3.01 seconds = 105.74 MB/sec
```

maxtor

```
/dev/sdb:
 Timing cached reads:   2312 MB in  2.00 seconds = 1156.15 MB/sec
 Timing buffered disk reads:  236 MB in  3.01 seconds =  78.36 MB/sec
```

hitachi

```
/dev/sdc:
 Timing cached reads:   2320 MB in  2.00 seconds = 1160.18 MB/sec
 Timing buffered disk reads:  368 MB in  3.01 seconds = 122.24 MB/sec
```

all sata2 drives on 2.6.35 with an nvidia chipset (sata) using hdparm-9.28

on other motherboards, it might give different results (different chipsets)

----------

## krinn

 :Smile: 

```
/dev/sdb:
 Timing cached reads:   18124 MB in  1.99 seconds = 9103.21 MB/sec
 Timing buffered disk reads: 876 MB in  3.00 seconds = 291.60 MB/sec
```

nah, told you, x86 rulesssss for memory speed, while x86-64 rules when it comes to handling big memory sizes.

Portage 2.1.9.14 (default/linux/x86/10.0/desktop, gcc-4.4.3, glibc-2.11.2-r0, 2.6.36-rc6 i686)

----------

## flipy

 *krinn wrote:*   

> 
> 
> /dev/sdb:
> 
>  Timing cached reads:   18124 MB in  1.99 seconds = 9103.21 MB/sec
> ...

 

Damn, such good results under x86!

Well, at least it seems that it's ok to have that performance under 64 bits.

----------

## flipy

 *krinn wrote:*   

> 
> 
> /dev/sdb:
> 
>  Timing cached reads:   18124 MB in  1.99 seconds = 9103.21 MB/sec
> ...

 

I've given it another chance and tested using the latest stable iso for Gentoo x86.

Ran hdparm -tT, and what I got was almost the same values as with the x86_64 version.

So it does not make any difference between architectures.

----------

## aCOSwt

 *krinn wrote:*   

> ...i love monty python

 

I would have bet so, because I always considered that your help was... hmmm... helpful!    :Very Happy: 

On my x86_64 + P43 + Caviar Black, I notice a slight difference depending on the io scheduler selected, in a way which seems surprising to me:

Regarding the Timing buffered disk reads, I get :

- noop : 113.30 MB/s

- deadline : 113.50 MB/s

(Same exact figures for many different tries)

Cannot test with the cfq as I did not build it.

Which io scheduler are you using, flipy?
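For anyone wanting to reproduce this: the active scheduler is the bracketed entry in /sys/block/&lt;dev&gt;/queue/scheduler, and writing a name there (as root) switches it. A sketch, with a stand-in string so the parsing is visible (only the sysfs path is the real interface):

```shell
# Real interface: cat /sys/block/sda/queue/scheduler
# prints e.g. "noop deadline [cfq]" - brackets mark the active scheduler.
line="noop deadline [cfq]"                  # stand-in for the sysfs read
active=$(echo "$line" | sed 's/.*\[\(.*\)\].*/\1/')
echo "active scheduler: $active"
# prints: active scheduler: cfq
# To switch (as root): echo deadline > /sys/block/sda/queue/scheduler
```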

----------

