# SATA: Slow cached reads

## kshade

Hi,

I just realized that most apps load faster inside the virtual machine I set up (running Archlinux) than directly on the host (Gentoo, of course). Firefox for example takes about 15 seconds on the host and only 8 inside the VM. While investigating that strange behavior I did some performance tests with hdparm, here's the (averaged) result:

```
root@gm ~ # hdparm -tT /dev/sda

/dev/sda:
 Timing cached reads:   670 MB in  2.00 seconds = 334.43 MB/sec
 Timing buffered disk reads:  180 MB in  3.01 seconds =  59.75 MB/sec
```
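
A minimal sketch of how such an average could be scripted, assuming `hdparm -T` keeps the output format shown above. The function name `avg_cached` and the run count of five are made up for illustration:

```shell
# avg_cached: read hdparm output on stdin and print the mean of all
# "Timing cached reads" figures (the MB/sec value is the next-to-last field).
avg_cached() {
    awk '/Timing cached reads/ { sum += $(NF-1); n++ }
         END { if (n) printf "%.2f\n", sum / n }'
}

# Hypothetical usage (needs root): five runs against /dev/sda.
# for i in 1 2 3 4 5; do hdparm -T /dev/sda; done | avg_cached
```

Piping the raw output through a filter keeps the individual runs visible if the loop is run without the final pipe, which helps spot outliers before averaging.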

The buffered reads are OK, but the cached reads are way too low.

Some information about my system:

lspci

```
00:00.0 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:00.1 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:00.2 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:00.3 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:00.4 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:00.7 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI bridge [K8T800/K8T890 South]
00:0c.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 0a)
00:0c.1 Input device controller: Creative Labs SB Live! Game Port (rev 0a)
00:0f.0 IDE interface: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
00:0f.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800/K8T890 South]
00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 60)
00:13.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:00.0 VGA compatible controller: nVidia Corporation NV43 [GeForce 6600/GeForce 6600 GT] (rev a2)
```

hdparm -I /dev/sda

```
/dev/sda:

ATA device, with non-removable media
   Model Number:       SAMSUNG SP1614C
   Serial Number:      0696J1FX201693
   Firmware Revision:  SW100-27
Standards:
   Used: ATA/ATAPI-7 T13 1532D revision 0
   Supported: 7 6 5 4
Configuration:
   Logical      max   current
   cylinders   16383   16383
   heads      16   16
   sectors/track   63   63
   --
   CHS current addressable sectors:   16514064
   LBA    user addressable sectors:  268435455
   LBA48  user addressable sectors:  312579695
   device size with M = 1024*1024:      152626 MBytes
   device size with M = 1000*1000:      160040 MBytes (160 GB)
Capabilities:
   LBA, IORDY(can be disabled)
   Standby timer values: spec'd by Standard, no device specific minimum
   R/W multiple sector transfer: Max = 16   Current = 16
   Recommended acoustic management value: 254, current value: 254
   DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 udma7
        Cycle time: min=120ns recommended=120ns
   PIO: pio0 pio1 pio2 pio3 pio4
        Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
   Enabled   Supported:
          SMART feature set
          Security Mode feature set
      *   Power Management feature set
      *   Write cache
      *   Look-ahead
      *   Host Protected Area feature set
      *   WRITE_BUFFER command
      *   READ_BUFFER command
      *   DOWNLOAD_MICROCODE
          SET_MAX security extension
      *   Automatic Acoustic Management feature set
      *   48-bit Address feature set
      *   Device Configuration Overlay feature set
      *   Mandatory FLUSH_CACHE
      *   FLUSH_CACHE_EXT
      *   SMART error logging
      *   SMART self-test
      *   SATA-I signaling speed (1.5Gb/s)
Security:
   Master password revision code = 65534
      supported
   not   enabled
   not   locked
   not   frozen
   not   expired: security count
      supported: enhanced erase
   56min for SECURITY ERASE UNIT. 56min for ENHANCED SECURITY ERASE UNIT.
Checksum: correct
```

Thanks in advance.

----------

## guero61

I assume you're asking for advice on speeding things up, but for the record your post simply reads as a statement, not as a request for help.

That said, on most vanilla Linux systems the "cached reads" section is actually a measure of your processor-memory I/O (man 8 hdparm) and not disk I/O or even interaction with the disk's cache.  There isn't much tuning you can do to alter that short of overclocking.  334MB/s seems to translate roughly into the realm of PC-133 memory (1/2 bus bandwidth); is that about right?

Depending on how they're set up (loop device, raw partition, LVM, etc), Xen domU clients can benefit from the host's disk cache as well as their own, likely explaining your boost in speed within a VM.  The only reliable way to test disk I/O within a VM or anything else is not by seat-of-the-pants measurement, but by ensuring your caches are flushed, using accurate load timing, and documenting your methodology so it is repeatable.
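
As a rough illustration of that kind of repeatable measurement, here is a minimal sketch. It assumes a Linux kernel of 2.6.16 or later (for `/proc/sys/vm/drop_caches`) and root privileges; the function name `time_cold` is made up, and timing is only to whole seconds:

```shell
# time_cold: flush dirty pages, try to drop the page cache, then time the
# given command. Dropping caches needs root; without it only a warning is shown.
time_cold() {
    sync
    if ! { echo 3 > /proc/sys/vm/drop_caches; } 2>/dev/null; then
        echo "warning: could not drop caches, result may be warm" >&2
    fi
    start=$(date +%s)
    "$@" > /dev/null 2>&1
    status=$?
    end=$(date +%s)
    echo "elapsed_s=$((end - start)) exit=$status"
}

# Hypothetical usage; the path is a placeholder, not a real file:
# time_cold cat /path/to/large/file
```

Running the same command several times through `time_cold` and noting kernel, filesystem, and hardware alongside the numbers is what makes the result comparable between host and VM.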

----------

## kshade

 *guero61 wrote:*   

> I assume you're asking for advice on speeding things up, but for the record your post simply reads as a statement, not as a request for help.

 

You're right, sorry. Shouldn't have posted just after getting up.

 *guero61 wrote:*   

> That said, on most vanilla Linux systems the "cached reads" section is actually a measure of your processor-memory I/O (man 8 hdparm) and not disk I/O or even interaction with the disk's cache.  There isn't much tuning you can do to alter that short of overclocking.

 

Thanks, didn't know that.

 *guero61 wrote:*   

> 334MB/s seems to translate roughly into the realm of PC-133 memory (1/2 bus bandwidth); is that about right?

 

No, it's actually 3x 512MB DDR-400 RAM. My old PC (which has PC-133 RAM) performs about the same when it comes to cached reads but falls far behind when comparing buffered disk reads. I found various benchmark results on machines similar to mine and those indicated that I should get about 1000 MB/sec.
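
For comparison, the theoretical peak of a single 64-bit memory channel can be worked out with a little arithmetic. This is only a sketch of the ceiling (transfers per second times 8 bytes); real benchmarks, `hdparm -T` included, land well below it. The function name `peak_mb_s` is made up:

```shell
# peak_mb_s: theoretical peak of one 64-bit memory channel in MB/s.
# $1 = millions of transfers per second; a 64-bit bus moves 8 bytes each.
peak_mb_s() {
    echo $(( $1 * 8 ))
}

peak_mb_s 100   # PC100   -> 800
peak_mb_s 133   # PC133   -> 1064
peak_mb_s 333   # DDR-333 -> 2664
peak_mb_s 400   # DDR-400 -> 3200
```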

 *guero61 wrote:*   

> Depending on how they're set up (loop device, raw partition, LVM, etc), Xen domU clients can benefit from the host's disk cache as well as their own, likely explaining your boost in speed within a VM.

 

It's VirtualBox and the HDD is just a file. I don't really care much about HDD performance in the VM; it just got me investigating the low performance of the host.

Some memory benchmarks:

xfbsuite.pl -r

```
RAM: 1518
*** MB/s ***
Function
Copy:  893
Scale: 894
Add:   1002
Triad: 998
```

bashmark --just-mem-test

```
:  T   E   S   T        :    :S C O R E :  : R A T I O:
:-----------------------------------------------------:
:Memory r/w (cached)    :    :      1936:  :      +61%:
:Memory de-/alloc       :    :       585:  :      -11%:
```

----------

## kshade

Dang, looks like those 3 DIMMs don't get along well; memtest told me that they were running at PC100 speed. I removed one of the three and got these much better results:

```
/dev/sda:
 Timing cached reads:   1004 MB in  2.00 seconds = 501.73 MB/sec
 Timing buffered disk reads:  182 MB in  3.02 seconds =  60.23 MB/sec
```

```
RAM: 1010
*** MB/s ***
Function
Copy:  1783
Scale: 1816
Add:   1937
Triad: 2062
```

```
:  T   E   S   T        :    :S C O R E :  : R A T I O:
:-----------------------------------------------------:
:Memory r/w (cached)    :    :      1973:  :      +64%:
:Memory de-/alloc       :    :       676:  :       +3%:
```

----------

## drescherjm

 *Quote:*   

> Dang, looks like those 3 DIMMs don't get along well; memtest told me that they were running at PC100 speed. I removed one of the three and got these much better results

 

The slowdown is normal with a lot of DDR systems. The motherboard BIOS will generally run the memory at a slower bus speed when a channel has more than one double-sided DIMM installed, to minimize stability issues caused by the increased noise on the bus.
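
One way to see what the BIOS reports for each module is `dmidecode`. The sketch below assumes its memory-device output contains `Speed:` lines; the helper name `dimm_speeds` is made up, and the reported value may not match the clock the memory is actually running at, so a memtest readout remains the more direct check:

```shell
# dimm_speeds: read `dmidecode --type memory` output on stdin and print the
# value of each "Speed:" field, one line per memory device.
dimm_speeds() {
    awk -F': *' '/^[ \t]*Speed:/ { print $2 }'
}

# Hypothetical usage (needs root):
# dmidecode --type memory | dimm_speeds
```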

----------

## kshade

 *drescherjm wrote:*   

>  *Quote:*   Dang, looks like those 3 DIMMs don't get along well; memtest told me that they were running at PC100 speed. I removed one of the three and got these much better results 
> 
> The slowdown is normal with a lot of DDR systems. The motherboard BIOS will generally run the memory at a slower bus speed when a channel has more than one double-sided DIMM installed, to minimize stability issues caused by the increased noise on the bus.

 

Hmm, the mainboard's manual states that it will run at full speed with all banks occupied.

I ran some tests on my notebook (Pentium M 1.5 GHz, 512 MB RAM, Intel 82801 chipset, IDE drive) and got these results:

```
Timing cached reads:   1264 MB in  2.00 seconds = 631.84 MB/sec
Timing buffered disk reads:   72 MB in  3.06 seconds =  23.56 MB/sec
```

Faster than my desktop (Athlon 64 3200+, 1024 MB RAM, VIA K8T800+ chipset) for some reason.

Bashmark, on the other hand, yields these numbers:

```
:  T   E   S   T        :    :S C O R E :  : R A T I O:
:-----------------------------------------------------:
:Memory r/w (cached)    :    :       690:  :      -43%:
:Memory de-/alloc       :    :       412:  :      -37%:
```

OK for a notebook, I think, and clearly below the desktop machine.

Here's the notebook's drive information:

```
ATA device, with non-removable media
   Model Number:       FUJITSU MHT2060AT
   Serial Number:      NN78T4814BNM
   Firmware Revision:  0022
Standards:
   Used: ATA/ATAPI-6 T13 1410D revision 3a
   Supported: 6 5 4
Configuration:
   Logical      max   current
   cylinders   16383   16383
   heads      16   16
   sectors/track   63   63
   --
   CHS current addressable sectors:   16514064
   LBA    user addressable sectors:  117210240
   device size with M = 1024*1024:       57231 MBytes
   device size with M = 1000*1000:       60011 MBytes (60 GB)
Capabilities:
   LBA, IORDY(cannot be disabled)
   Standby timer values: spec'd by Standard, no device specific minimum
   R/W multiple sector transfer: Max = 16   Current = 16
   Advanced power management level: 128 (0x80)
   Recommended acoustic management value: 254, current value: 254
   DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5
        Cycle time: min=120ns recommended=120ns
   PIO: pio0 pio1 pio2 pio3 pio4
        Cycle time: no flow control=240ns  IORDY flow control=120ns
Commands/features:
   Enabled   Supported:
      *   SMART feature set
          Security Mode feature set
      *   Power Management feature set
      *   Write cache
      *   Look-ahead
      *   Host Protected Area feature set
      *   WRITE_BUFFER command
      *   READ_BUFFER command
      *   DOWNLOAD_MICROCODE
      *   Advanced Power Management feature set
          Power-Up In Standby feature set
      *   SET_MAX security extension
      *   Automatic Acoustic Management feature set
      *   Device Configuration Overlay feature set
      *   Mandatory FLUSH_CACHE
      *   SMART error logging
      *   SMART self-test
Security:
   Master password revision code = 65534
      supported
   not   enabled
   not   locked
      frozen
   not   expired: security count
   not   supported: enhanced erase
   60min for SECURITY ERASE UNIT.
HW reset results:
   CBLID- above Vih
   Device num = 0 determined by the jumper
Checksum: correct
```

----------

## guero61

 *kshade wrote:*   

> 
> 
>  *guero61 wrote:*   That said, on most vanilla Linux systems the "cached reads" section is actually a measure of your processor-memory I/O (man 8 hdparm) and not disk I/O or even interaction with the disk's cache.  There isn't much tuning you can do to alter that short of overclocking. 
> 
> Thanks, didn't know that.
> ...

 

Neither did I, until recently.  That's what so many of us get for blindly running commands someone else posts without reading the man page and actually understanding what's going on.

I'm sure the mainboard's manual stating that it will run at "full speed with all banks occupied" is probably one of those weasel statements - the kind with microscopic footnotes that say, "unless you hold your mouth wrong, or use more than one double-sided DIMM, or..."

It's unfortunate, but if it's really tied to 2 vs. 3 DIMMs, you may just have to choose between being slower and having less RAM, unless you purchase a pristine set of matching single-sided DIMMs.

----------

## kshade

 *guero61 wrote:*   

> I'm sure the mainboard's manual stating that it will run at "full speed with all banks occupied" is probably one of those weasel statements - the kind with microscopic footnotes that say, "unless you hold your mouth wrong, or use more than one double-sided DIMM, or..."
> 
> It's unfortunate, but if it's really tied to 2 vs. 3 DIMMs, you may just have to choose between being slower and having less RAM, unless you purchase a pristine set of matching single-sided DIMMs.

 

Seems so. I just looked up that table in the manual; it says that 3 double-sided modules can run at DDR333 speed and 3 single-sided modules at DDR400 speed.

It runs at DDR-400 speed now, but the cached reads from the HD are still way too slow. I'll try tinkering with some BIOS settings, but I doubt that will help. Any other suggestions?

----------

## luminoso

Did you find a solution?

----------

## kshade

Not really. I bought a new HDD (I wanted some more space anyway) and reinstalled Gentoo. The system feels a lot more responsive now and is actually measurably quicker (but not by that much).

----------

