# Poor SSD write performance despite TRIM

## Hypnos

Since my SSD performance has been deteriorating, I decided to (finally) upgrade the MMCRE28G8MXP-0VBL1 in my Thinkpad X301 from the VBM1EL1Q Lenovo firmware to the VBM19C1Q Samsung-direct firmware, which enables TRIM and NCQ.

`hdparm -I /dev/sda` indicates TRIM is enabled, and a test using `hdparm --fibmap` and `hdparm --read-sector` shows that TRIM is working.  I also ran the old wiper.sh script for good measure to make sure the free space was TRIMmed.

Before flashing, bonnie++ showed sequential output at ~50MiB/s, far below the >= 100MiB/s I had previously.  Unfortunately, this benchmark shows no improvement after the firmware change and the procedure below.  Sequential input continues to be ~250MiB/s.  (File size: 8GB)

I get a similarly disappointing result from the test described in the Arch wiki:

```
# dd if=/dev/zero of=tempfile bs=1M count=1024 conv=fdatasync,notrunc
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 17.7783 s, 60.4 MB/s
# echo 3 > /proc/sys/vm/drop_caches
# dd if=tempfile of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 4.97858 s, 216 MB/s
# dd if=tempfile of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 0.510984 s, 2.1 GB/s
```
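For anyone reproducing this, the three steps are: a write timed through to disk (`conv=fdatasync`), an uncached read, and a cached read.  The same pattern can be exercised on a small scratch file (the 8 MiB size here is arbitrary; dropping caches between the reads, as above, requires root, and the timing differences only show up on real hardware):

```shell
# Same pattern as the test above, on an 8 MiB scratch file:
#  1) write with fdatasync so the timing includes the flush to disk
#  2) first read (uncached on real hardware after dropping caches)
#  3) second read (served from the page cache, hence much faster)
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1M count=8 conv=fdatasync,notrunc 2>/dev/null
dd if="$f" of=/dev/null bs=1M 2>/dev/null
dd if="$f" of=/dev/null bs=1M 2>/dev/null && echo "done"
rm -f "$f"
```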

Some additional info:

* My `uname -a`:

```
Linux anodyne 2.6.38-tuxonice-r1 #1 SMP PREEMPT Sat Jun 11 19:43:28 JST 2011 x86_64 Intel(R) Core(TM)2 Duo CPU U9600 @ 1.60GHz GenuineIntel GNU/Linux
```

* My `hdparm -I` is here.

* My `tune2fs -l` is here.

* My scheduler is CFQ.  noop does not improve performance, and it increases desktop latency during heavy I/O.

* My SATA link is not restricted:

```
# dmesg | grep SATA
ahci 0000:00:1f.2: AHCI 0001.0200 32 slots 4 ports 3 Gbps 0x1 impl SATA mode
ata1: SATA max UDMA/133 abar m2048@0xf0826000 port 0xf0826100 irq 43
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
```

Any ideas?  If you have an SSD, what do you get with these benchmarks?

----------

## disi

Intel X25-M 120GB, not much better :/

```
disi-bigtop ~ # dd if=/dev/zero of=tempfile bs=1M count=1024 conv=fdatasync,notrunc
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 10.2319 s, 105 MB/s
# echo 3 > /proc/sys/vm/drop_caches
disi-bigtop ~ # dd if=tempfile of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 3.95367 s, 272 MB/s
disi-bigtop ~ # dd if=tempfile of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 0.111187 s, 9.7 GB/s
```

This is with CFQ and no discard mount option, but once a day I run:

```
fstrim /
```
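A daily run like this can be scheduled with an ordinary root cron entry, for example (the schedule and binary path here are just an illustration; adjust to your system):

```shell
# Illustrative root crontab entry: TRIM free space on / every day at 03:00
# (add via `crontab -e` as root; fstrim is part of util-linux)
0 3 * * * /sbin/fstrim /
```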

```
Linux disi-bigtop 3.0.4-gentoo #11 SMP Thu Sep 8 20:07:53 BST 2011 x86_64 Intel(R) Core(TM) i7-2820QM CPU @ 2.30GHz GenuineIntel GNU/Linux
```

----------

## Hypnos

Your result is comparable to this one.

I was hoping for one comparable to this.

----------

## whig

I'm no SSD expert, but I hear our partitions need to be aligned, on 512kB or even 2MB boundaries.  Without alignment, a single write can hit two erase blocks.  See here.

----------

## Hypnos

As you can see from my tune2fs output, my filesystem starts at block 0 (partitionless disk) and has stride and stripe-width 256 with a 4KiB block size.  Thus, the alignment is zero-offset with 1MiB spacing.
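As a sanity check, that arithmetic can be spelled out (a minimal sketch; the 1 MiB alignment target is derived from the stripe-width 256 x 4 KiB figures above):

```shell
# Check that the filesystem start is aligned to the assumed 1 MiB boundary.
start_byte=0       # partitionless disk: filesystem starts at byte 0
stripe_blocks=256  # from tune2fs: stripe width in filesystem blocks
block_bytes=4096   # 4 KiB filesystem block
align_bytes=$(( stripe_blocks * block_bytes ))   # 256 * 4096 = 1048576 = 1 MiB
if [ $(( start_byte % align_bytes )) -eq 0 ]; then
  echo "aligned to $align_bytes bytes"
else
  echo "misaligned"
fi
```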

----------

## Hypnos

Just for kicks I did a secure erase with hdparm, then ran this from sysrescuecd:

```
# dd if=/dev/zero of=/dev/sda bs=1M count=1024 conv=fdatasync,notrunc
```

Then, after a fresh format, I did the usual test:

```
# dd if=/dev/zero of=tempfile bs=1M count=1024 conv=fdatasync,notrunc
```

Both times I get ~140MB/s, which is acceptable.  However, once I restore my system from backup, performance goes back down to ~60MB/s (from either sysrescuecd-1.5.1 or my installed system).

Fragmentation does not seem to be the problem (nor should it matter so much for an SSD):

```
# e2freefrag /dev/sda
Device: /dev/sda
Blocksize: 4096 bytes
Total blocks: 31258710
Free blocks: 11630767 (37.2%)

Min. free extent: 4 KB
Max. free extent: 2064256 KB
Avg. free extent: 57224 KB

HISTOGRAM OF FREE EXTENT SIZES:
Extent Size Range :  Free extents   Free Blocks  Percent
    4K...    8K-  :           272           272    0.00%
    8K...   16K-  :            40            94    0.00%
   16K...   32K-  :            39           213    0.00%
   32K...   64K-  :            33           343    0.00%
   64K...  128K-  :            47          1101    0.01%
  128K...  256K-  :            57          2553    0.02%
  256K...  512K-  :            66          6177    0.05%
  512K... 1024K-  :            59          9944    0.09%
    1M...    2M-  :            25          9836    0.08%
    2M...    4M-  :            55         42548    0.37%
    4M...    8M-  :            48         76085    0.65%
    8M...   16M-  :             6         18337    0.16%
   16M...   32M-  :             6         33103    0.28%
   32M...   64M-  :            11        133687    1.15%
   64M...  128M-  :            23        554510    4.77%
  256M...  512M-  :             1         90080    0.77%
  512M... 1024M-  :             4        781720    6.72%
    1G...    2G-  :            21       9870706   84.87%
```

----------

## HeissFuss

Did you try running the test with ext4 write barriers disabled?  You wouldn't want to actually disable them for normal usage, but it may be what's affecting your throughput.

----------

## krinn

Might your SSD just be too old, with its lifetime write cycles running out?

 *http://en.wikipedia.org/wiki/Write_amplification wrote:*   

> The ATA Secure Erase command is designed to remove all user data from a drive. With an SSD without integrated encryption, this command will put the drive back to it original out-of-box state. This will initially restore its performance to the highest possible level and the best (lowest number) possible write amplification, but as soon as the drive starts garbage collecting again the performance and write amplification will start returning to the former levels.

 

----------

## Hypnos

 *HeissFuss wrote:*   

> Did you try running the test with ext4 write barriers disabled?  You wouldn't want to actually disable them for normal usage, but it may be what's affecting your throughput.

 

With `barrier=0` I still get ~60MB/s on my write test.

----------

## Hypnos

krinn,

Why wouldn't TRIM solve the problem described in your quote?

----------

## krinn

I'm not quite sure -- read the link the quote came from.  What I understand is that TRIM just blanks the area of the SSD where the file structure is recorded (on a hard disk I would call that the FAT area), but the data remains on disk.

And an SSD can only write to a cleared area, so even if you cheat the SSD by just erasing the "FAT", the SSD will soon try to write to an area where data is still present, and will need to perform the erase before writing the new data.  It also looks like the more a cell is used, the more time/tries it needs to erase and rewrite.

At bare minimum, it's clear an SSD needs two operations to write one thing: one to erase the cell and one to write it.  That should also explain the garbage collector: instead of writing to an available cell that still holds data and needs an erase first, it's better to write to an already-cleared cell, since you then need just one operation.  You also get the benefit of averaging out cell use (if a cell isn't clear it has been used once, so instead of re-using it again, writing to an already-cleared cell evens out the number of writes on both).

That's what I understood; I don't really know the mechanics of an SSD, just that lifetime is a problem with such a device.

At first SSDs were designed for laptops: a user installs his system and never really alters it afterwards, so you get only a few writes for his data.
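The operation-count argument above can be put in toy-model form (the counts are purely illustrative, not measured from any real drive):

```shell
# Toy model: programming a pre-erased cell is one operation; a dirty cell
# first needs an erase, so it costs two operations in the worst case.
ops_clean=1              # program only
ops_dirty=$(( 1 + 1 ))   # erase + program
echo "clean cell: $ops_clean op, dirty cell: $ops_dirty ops"
```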

----------

## Hypnos

krinn, 

My understanding:

1) TRIM should significantly improve write performance, since cells are cleared during idle periods instead of immediately before a write.  Moreover, the improvement should be drastic for writes much smaller than the erase block size.

2) TRIM does not increase SSD life -- in the end you have to erase what you have to erase, and write what you have to write.
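Point 1 can be made concrete with back-of-the-envelope arithmetic (the 512 KiB erase-block size is an assumption typical of NAND of that era, not a datasheet figure for this drive):

```shell
# Worst-case write amplification when a small write lands in a block that
# must be erased first: the whole erase block gets rewritten.
write_kib=4     # one filesystem block
erase_kib=512   # assumed NAND erase-block size
echo "worst-case amplification: $(( erase_kib / write_kib ))x"   # 128x
```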

Is it possible that I'm encountering too many bad blocks after loading up my disk?  But then I should already be seeing FS corruption, and it's solid as a rock ...

----------

## Hu

TRIM support should be able to improve SSD lifetime somewhat.  Although you are correct that the erase must be incurred either way, the other part of TRIM is that the region is marked as free in the FTL.  By advising the drive which areas are free, you prevent the drive's garbage collection from copying free areas, which means fewer erase cycles overall.
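A toy garbage-collection count illustrates this (the page counts are invented for illustration): when the FTL reclaims an erase block, it must copy forward every page it still believes is valid, and TRIM lets it skip pages the filesystem has already freed.

```shell
# Reclaiming one erase block: only pages the FTL considers valid are copied.
truly_valid=40           # pages still holding live data
stale_but_unmarked=60    # freed by the filesystem, but the FTL was never told
# Without TRIM the FTL must copy the stale pages too; with TRIM it skips them.
copies_without_trim=$(( truly_valid + stale_but_unmarked ))
copies_with_trim=$truly_valid
echo "pages copied without TRIM: $copies_without_trim"   # 100
echo "pages copied with TRIM: $copies_with_trim"         # 40
```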

----------

