# Problems with mvsas, hard drives, SSD, and LG BDROM [solved]

## RayDude

I'm putting this here so someone in my place in the future doesn't have to go through what I went through.

I have a Sandy Bridge Celeron server running Gentoo, and I just put together a six-disk Raid 6 array of 3TB drives, bought cheap on Black Friday, to replace my aging and failing Raid 5 arrays.

I have had a hell of a time finding a SATA controller to expand my four-port SATA motherboard to the eight ports I need. Here's a list of the troubles:

1) A Silicon Image PCI Raid card was too slow (only 70 MB/sec for the hard drives).

2) Two Silicon Image PCI-Express dual-SATA cards didn't play nice with each other; only one worked.

3) Fry's did not have a four-port PCI-Express SATA card.

4) LSI Logic SATA cards apparently don't support commercial 3TB drives, even with the IT mode BIOS.

5) Cheap Marvell SATA cards provided by Syba won't last more than 20 hours.

6) When connected to the PCI card, the LG BDROM slowed the Raid6 array to 5MB / second read. I had to leave it disconnected.

I have almost destroyed the Raid 6 array three times. One time (the Syba fiasco), three drives errored out of the array simultaneously. I saw my life flash before my eyes during that episode. Thank god software Raid is so good.
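For anyone who wants to keep an eye on an md array while testing a flaky controller, here is a minimal sketch that reads `/proc/mdstat`-style text and reports arrays with missing members. The function name `degraded_arrays` and the sample text are made up for illustration (the sample is not output from this server), and the parsing assumes the usual `[n/m] [UU__]` status field.

```python
import re

# Illustrative /proc/mdstat text (assumed layout, not taken from the post).
SAMPLE = """Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdg1[5] sdf1[4] sde1[3] sdd1[2] sdc1[1] sdb1[0]
      11720536064 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/4] [UUUU__]
unused devices: <none>
"""

_ARRAY = re.compile(r"^(md\w+) :")
_STATUS = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return {array_name: missing_member_count} for arrays with missing members."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        header = _ARRAY.match(line)
        if header:
            current = header.group(1)
            continue
        status = _STATUS.search(line)
        if status and current:
            # Each "_" in the status field marks a member that is not active.
            missing = status.group(3).count("_")
            if missing:
                result[current] = missing
            current = None
    return result


print(degraded_arrays(SAMPLE))
```

On a live system the same function could be fed `open("/proc/mdstat").read()`, for example from a cron job that mails you when the result is not empty.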

The latest card that I bought is here: http://www.newegg.com/Product/Product.aspx?Item=N82E16816101358

This is a Marvell SAS card (the kernel driver is called mvsas and lives under the SCSI drivers).

Here are some observations:

1) The mvsas card's BIOS takes precedence over the motherboard SATA controller, so its devices become sda, sdb, and so on.

2) The SAS cables are labeled correctly, but the mvsas driver makes the highest in-use port sda, which frankly makes no sense. In my case I had to connect my SSD boot drive to cable 3 when I was only using one connector.
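Since the sdX letters here depend on which controller and port get enumerated first, one way to avoid surprises is to look drives up by their persistent names under `/dev/disk/by-id` (a udev convention on most Linux systems). The sketch below is only an illustration; `stable_names` is a made-up helper name, and it simply returns an empty mapping if the directory does not exist.

```python
import os


def stable_names(by_id_dir="/dev/disk/by-id"):
    """Map each kernel device name (e.g. 'sda') to its persistent ID symlinks."""
    mapping = {}
    if not os.path.isdir(by_id_dir):
        return mapping
    for entry in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, entry)
        if not os.path.islink(link):
            continue
        # Resolve the symlink to the device node it currently points at.
        target = os.path.basename(os.path.realpath(link))
        mapping.setdefault(target, []).append(entry)
    return mapping


if __name__ == "__main__":
    for dev, ids in sorted(stable_names().items()):
        print(dev, "->", ", ".join(ids))
```

Running it before and after moving a cable shows which physical drive ended up behind which sdX name, and the by-id names can be used in fstab or a bootloader config instead of the raw letters.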

After getting everything up and running I benchmarked the setup with hdparm -t and the array was a respectable 180 MB / second with individual drives showing up as 120 MB / second. Better than what I got with the PCI card.
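If you want to repeat this kind of benchmark across several drives or kernel versions, a small wrapper around `hdparm -t` can pull out the throughput figure and average a few runs. This is a rough sketch, not what was used for the numbers above: the helper names are made up, the sample line is illustrative rather than real output from this server, and only the `MB/sec` form of the result line is handled. `hdparm` normally needs root.

```python
import re
import subprocess

# Illustrative hdparm timing line (assumed format, not output from the post).
SAMPLE = " Timing buffered disk reads: 364 MB in  3.01 seconds = 120.93 MB/sec"

_RATE = re.compile(r"=\s*([\d.]+)\s*MB/sec")


def parse_hdparm_rate(output):
    """Return the read rate in MB/sec from hdparm timing output, or None."""
    match = _RATE.search(output)
    return float(match.group(1)) if match else None


def measure(device, runs=3):
    """Run `hdparm -t` several times on a device and return the mean MB/sec."""
    rates = []
    for _ in range(runs):
        proc = subprocess.run(
            ["hdparm", "-t", device],
            capture_output=True, text=True, check=True,
        )
        rate = parse_hdparm_rate(proc.stdout)
        if rate is not None:
            rates.append(rate)
    return sum(rates) / len(rates) if rates else None


print(parse_hdparm_rate(SAMPLE))
```

Something like `measure("/dev/md0")` versus `measure("/dev/sdb")` would give the array and single-drive figures side by side; the device paths are placeholders.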

After running for a day I noticed that I was getting errors in dmesg and syslogs:

```
Dec 30 03:10:01 server kernel: sas: Enter sas_scsi_recover_host busy: 1 failed: 1
Dec 30 03:10:01 server kernel: sas: ata4: end_device-0:3: cmd error handler
Dec 30 03:10:01 server kernel: sas: ata1: end_device-0:0: dev error handler
Dec 30 03:10:01 server kernel: sas: ata2: end_device-0:1: dev error handler
Dec 30 03:10:01 server kernel: sas: ata3: end_device-0:2: dev error handler
Dec 30 03:10:01 server kernel: sas: ata4: end_device-0:3: dev error handler
Dec 30 03:10:01 server kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0
```

These were generated in reams and reams continuously. I searched and searched and did the following to fix the issue:

1) Upgraded to the latest kernel, which broke mvsas and required a custom patch that isn't in the kernel yet.

2) Upgraded the BDROM firmware.

Then I moved the BDROM from port 0 to port 8 of the SAS controller and voila! The errors went away (except when the BDROM is accessed).

So the moral of the story is: sometimes it's not the controller or the hard drives. Sometimes it's the junky BDROM that's at fault!

And the best part is: the Raid 6 array is reading at almost 500 MB/second.

Thanks for reading, I hope this will help someone in the future.

Update:

MEH! It came back!

```
sr 0:0:3:0: command ffff8800b8a82e80 timed out
sas: Enter sas_scsi_recover_host busy: 1 failed: 1
sas: ata4: end_device-0:3: cmd error handler
sas: ata1: end_device-0:0: dev error handler
sas: ata2: end_device-0:1: dev error handler
sas: ata3: end_device-0:2: dev error handler
sas: ata4: end_device-0:3: dev error handler
sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
```
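A hint on reading this log: `sr` is the kernel's SCSI CD-ROM driver, so the command that timed out came from an optical drive, and target 3 in `0:0:3:0` appears to line up with `end_device-0:3`, the only port that gets a `cmd error handler` line. When the log is reams long, a quick tally makes that pattern easier to see. The sketch below is an illustration only; `summarize` is a made-up name and the regexes are written against the message shapes shown in this post, not against every variant the kernel can print.

```python
import re
from collections import Counter

# Log lines copied from the post above.
LOG = """sr 0:0:3:0: command ffff8800b8a82e80 timed out
sas: Enter sas_scsi_recover_host busy: 1 failed: 1
sas: ata4: end_device-0:3: cmd error handler
sas: ata1: end_device-0:0: dev error handler
sas: ata2: end_device-0:1: dev error handler
sas: ata3: end_device-0:2: dev error handler
sas: ata4: end_device-0:3: dev error handler
sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
"""

_HANDLER = re.compile(r"sas: (ata\d+): (end_device-\d+:\d+): (cmd|dev) error handler")
_TIMEOUT = re.compile(r"\b(\w+) (\d+:\d+:\d+:\d+): command \S+ timed out")


def summarize(log_text):
    """Count recovery cycles, per-port cmd errors, and per-device timeouts."""
    recoveries = 0
    cmd_errors = Counter()
    timeouts = Counter()
    for line in log_text.splitlines():
        if "Enter sas_scsi_recover_host" in line:
            recoveries += 1
        handler = _HANDLER.search(line)
        # Only "cmd" lines name the port whose command failed; "dev" lines
        # show up for every port during recovery.
        if handler and handler.group(3) == "cmd":
            cmd_errors[(handler.group(1), handler.group(2))] += 1
        timeout = _TIMEOUT.search(line)
        if timeout:
            timeouts[(timeout.group(1), timeout.group(2))] += 1
    return recoveries, cmd_errors, timeouts


recoveries, cmd_errors, timeouts = summarize(LOG)
print(recoveries, dict(cmd_errors), dict(timeouts))
```

Feeding it a whole syslog file instead of `LOG` would show whether one port is always behind the `cmd error handler` lines, which is the device worth moving first.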

Update:

Well, I made the errors go away by swapping the BDROM with one of the hard drives attached to the motherboard, but the performance of everything went down. The SSD is now only doing 225 MB/sec (was 450 MB/sec) and the md array is only 220 MB/sec.

I'm wondering if I got higher performance with the 3.5.7 kernel than I do with 3.7.1, but honestly I'm really tired of this BS.

It's working, it's fast enough...

----------

