# LSI3108 driver failure -- disks not available after boot

## dbishop

I have a new system that uses a Supermicro AOM-S3108M-H8 SAS controller based on an LSI 3108 "invader" part.

The F/W seems to work correctly when it loads as a BIOS option ROM, and it correctly recognizes the attached disks. When my kernel (gentoo-sources 4.1.12) boots, the megaraid_sas driver loads but reports this in dmesg:

```
machine ~ # dmesg | grep -C 2 megasas
[    7.709375] hid-generic 0003:0557:2419.0004: input: USB HID v1.00 Mouse [HID 0557:2419] on usb-0000:00:14.0-13.1/input1
[    8.957241] random: systemd-udevd urandom read with 70 bits of entropy available
[    9.616812] megasas: 06.806.08.00-rc1
[    9.616888] megasas: 0x1000:0x005d:0x15d9:0x0809: bus 3:slot 0:func 0
[    9.617442] megasas: Waiting for FW to come to ready state
[    9.629500] megasas: FW in FAULT state!!
[    9.629503] megaraid_sas 0000:03:00.0: megasas: FW restarted successfully from megasas_init_fw!
[    9.807533] ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver - version 4.0.1-k
[    9.807534] ixgbe: Copyright (c) 1999-2014 Intel Corporation.
--
[   11.170768] ixgbe 0000:01:00.0 dmz: renamed from eth0
[   23.430286] random: nonblocking pool is initialized
[   39.662901] megasas: Waiting for FW to come to ready state
[   39.662903] megasas: FW in FAULT state!!
```
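For anyone comparing their own logs against this one, here is a minimal Python sketch (not part of the driver or any LSI tool) that pulls the megasas fault lines out of a dmesg excerpt. The function name `summarize_megasas` and the `SAMPLE` constant are illustrative assumptions; the sample lines are copied from the output above.

```python
import re

# Matches kernel log lines such as "[    9.629500] megasas: FW in FAULT state!!"
LINE_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<msg>.*)$")


def summarize_megasas(log_text):
    """Return (fault_timestamps, restarted) for a dmesg excerpt.

    fault_timestamps: kernel timestamps (seconds) of "FW in FAULT state" lines.
    restarted: True if the driver logged a successful firmware restart.
    """
    faults = []
    restarted = False
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m or "megasas" not in m.group("msg"):
            continue  # skip blank lines, grep separators, and other drivers
        msg = m.group("msg")
        if "FW in FAULT state" in msg:
            faults.append(float(m.group("ts")))
        elif "FW restarted successfully" in msg:
            restarted = True
    return faults, restarted


# Lines copied from the dmesg output in this post.
SAMPLE = """\
[    9.617442] megasas: Waiting for FW to come to ready state
[    9.629500] megasas: FW in FAULT state!!
[    9.629503] megaraid_sas 0000:03:00.0: megasas: FW restarted successfully from megasas_init_fw!
[   39.662901] megasas: Waiting for FW to come to ready state
[   39.662903] megasas: FW in FAULT state!!
"""

if __name__ == "__main__":
    faults, restarted = summarize_megasas(SAMPLE)
    print("fault timestamps:", faults)
    print("restart logged:", restarted)
```

On the excerpt above this finds two fault timestamps roughly 30 seconds apart, with one logged firmware restart in between.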

The output from lspci is this:

```
machine ~ # lspci -s 03:00.0 -vvxxx
03:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS-3 3108 [Invader] (rev 02)
        Subsystem: Super Micro Computer Inc MegaRAID SAS-3 3108 [Invader]
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 26
        Region 0: I/O ports at 5000 [size=256]
        Region 1: Memory at c7300000 (64-bit, non-prefetchable) [size=64K]
        Region 3: Memory at c7200000 (64-bit, non-prefetchable) [size=1M]
        Expansion ROM at c7100000 [disabled] [size=1M]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L0s, Exit Latency L0s <2us, L1 <4us
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range BC, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
        Capabilities: [d0] Vital Product Data
                Unknown small resource type 00, will not decode more.
        Capabilities: [a8] MSI: Enable- Count=1/1 Maskable+ 64bit+
                Address: 0000000000000000  Data: 0000
                Masking: 00000000  Pending: 00000000
        Capabilities: [c0] MSI-X: Enable- Count=97 Masked-
                Vector table: BAR=1 offset=0000e000
                PBA: BAR=1 offset=0000f000
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [1e0 v1] #19
        Capabilities: [1c0 v1] Power Budgeting <?>
        Capabilities: [148 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 0
                ARICtl: MFVC- ACS-, Function Group: 0
        Kernel modules: megaraid_sas
00: 00 10 5d 00 03 01 10 00 02 00 04 01 08 00 00 00
10: 01 50 00 00 04 00 30 c7 00 00 00 00 04 00 20 c7
20: 00 00 00 00 00 00 00 00 00 00 00 00 d9 15 09 08
30: 00 00 10 c7 50 00 00 00 00 00 00 00 0b 01 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 01 68 03 06 08 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 03 00 00 10 d0 02 00 25 80 00 10
70: 20 28 09 00 83 54 41 00 40 00 83 10 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00
90: 00 00 00 00 0e 00 00 00 03 00 1e 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 05 c0 80 01 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 11 00 60 00 01 e0 00 00 01 f0 00 00 00 00 00 00
d0: 03 a8 00 80 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
```
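The raw hex dump at the end of the lspci output can be cross-checked against the decoded fields. The sketch below is only an illustration in Python, assuming the standard PCI type 0 configuration header layout (vendor ID at 0x00, device ID at 0x02, command register at 0x04, subsystem vendor and device IDs at 0x2C and 0x2E). The helper names `parse_dump`, `le16`, and `decode_header` are hypothetical.

```python
# First three rows of the config-space dump, copied from the lspci output.
HEADER_HEX = """\
00: 00 10 5d 00 03 01 10 00 02 00 04 01 08 00 00 00
10: 01 50 00 00 04 00 30 c7 00 00 00 00 04 00 20 c7
20: 00 00 00 00 00 00 00 00 00 00 00 00 d9 15 09 08
"""


def parse_dump(text):
    """Turn 'offset: xx xx ...' rows into bytes (rows assumed contiguous from 0)."""
    data = bytearray()
    for line in text.splitlines():
        if ":" not in line:
            continue
        _, _, rest = line.partition(":")
        data.extend(int(byte, 16) for byte in rest.split())
    return bytes(data)


def le16(data, offset):
    """Read a little-endian 16-bit field, as PCI config space stores them."""
    return int.from_bytes(data[offset:offset + 2], "little")


def decode_header(data):
    """Decode a few fields of a type 0 PCI configuration header."""
    command = le16(data, 0x04)
    return {
        "vendor": le16(data, 0x00),
        "device": le16(data, 0x02),
        "subsys_vendor": le16(data, 0x2C),
        "subsys_device": le16(data, 0x2E),
        "io_enabled": bool(command & 0x1),   # command bit 0: I/O space
        "mem_enabled": bool(command & 0x2),  # command bit 1: memory space
        "bus_master": bool(command & 0x4),   # command bit 2: bus master
    }


if __name__ == "__main__":
    hdr = decode_header(parse_dump(HEADER_HEX))
    print("0x%04x:0x%04x:0x%04x:0x%04x" % (
        hdr["vendor"], hdr["device"], hdr["subsys_vendor"], hdr["subsys_device"]))
    print("I/O:", hdr["io_enabled"], "Mem:", hdr["mem_enabled"],
          "BusMaster:", hdr["bus_master"])
```

Decoding these rows gives 0x1000:0x005d with subsystem 0x15d9:0x0809, the same ID string megasas prints in dmesg, and a command register with I/O and memory decoding enabled but bus mastering off, which agrees with the `Control:` line.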

Other details:

```
machine ~ # uname -a
Linux bolan 4.1.12-gentoo #1 SMP Wed Jan 13 08:48:15 XST 2016 x86_64 Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz GenuineIntel GNU/Linux
```

```
machine ~ # lsmod
Module                  Size  Used by
ixgbe                 158809  0
x86_pkg_temp_thermal     2896  0
megaraid_sas           90578  0
mdio                    2711  1 ixgbe
tpm_tis                 7733  0
```

----------

## Keruskerfuerst

What is FW?

Possible causes:

1. Mainboard defective (PCIe bus)

2. Controller defective

----------

## dbishop

I believe that "FW" in this case means "firmware". 

I did some additional research on this and got some help from one of the excellent Gentoo devs who had seen this specific problem some time ago.

The workaround is simply to suffer the time-wasting boot sequence by leaving the controller's BIOS option ROM turned on. For me it only works if the firmware is configured in "legacy" mode, not in "EFI" mode (which appears to be an issue with Supermicro's BIOS, not the LSI/Agere code).

The history is this: This particular firmware is known to have had problems from the beginning. I believe the FW's authors expected these cards to be used in systems where the BIOS (and now EFI) needs to know before boot which disks are available, since the disks may have been hardware-RAIDed (with a proprietary method) and might be the boot drives. It seems they never expected that their boot option ROM code would not get executed first.

Since pointlessly tolerating this time-sucking OpROM ballet is not desirable for me -- boot time is critical and the disks are non-raided data-only -- I turned the MegaRaid OpROM off. This is neither an unusual nor a discouraged practice -- like turning off PXE in machines that will never pixie-boot. Scanning disks ahead of boot serves no useful purpose for me. Frustratingly, with the Linux kernel and the megaraid_sas driver, skipping the OpROM causes the firmware to "fail" -- which I believe in this case means it ends up in a "confused state" with respect to the kernel and driver. Several driver workarounds have been tried, but clearly they aren't effective. The driver maintainer stated a while back that the real fix needs to be done in the 3108's firmware. It is my opinion that this will never happen, because Avago couldn't care less about open markets like the Linux market.

The most disturbing and disappointing problem is that Avago seems to be sequestering all Linux support. They do have a CLI control interface, but it is fetch-restricted in Portage, and the link in the ebuild is broken. It took a while to come up with the right search key on Avago's site to get the file to betray its location. This mindset seems to be driven by the same megalomania that grips the likes of Broadcom and Qualcomm. They cannot stand the notion that someone would pay good money to use their wares in a way they can't dictate and forever subjugate. I avoid all three companies' goods wherever possible, especially the latter two. I'll change when they change.

I guess for me I'll have to find an alternative, but for now it works provided I let the thing's OpROM run ahead of hardware booting.

Last edited by dbishop on Fri Jan 15, 2016 12:20 pm; edited 2 times in total

----------

## Keruskerfuerst

Is there any firmware update available for the controller?

----------

## dbishop

It is already running the latest firmware. I am sending a request to Avago to see if they want to do something about this, but given that they've known about it for years (going back to the LSI days), I am not especially hopeful.

What I can say is this: As of this post, if you're using Linux -- not just Gentoo, since this is a limitation/bug in the SoC's firmware -- I would strongly advise against any LSI 3008- or LSI 3108-based SoC design -- which is to say, avoid anything that runs the megaraid_sas driver.

Areca's products, while not perfect, are much better solutions. For me, I need a built-in/on-board solution. My application is power and size restricted, and boot time is critical. So I may need to avoid SAS, and maybe even Supermicro altogether, since SAS is almost always an add-in PCIe solution these days.

----------

