# Kernel upgrade breaks my Intel C606 SAS

## Zolcos

I tried upgrading my kernel from 3.5.4-r1 to 3.8.6. The new kernel can boot OK except the drives on my Intel C606 SAS controller are no longer recognized. The controller itself does seem to be recognized as I see its name flash by on boot and I get the right number of /dev/sas_host* devices.

In configuring the new kernel I made sure to use the same settings in regard to firmware. The old kernel loads the driver as a module and the firmware file is located at /lib/firmware/isci/isci_firmware.bin. I can still boot the old kernel and this process works.

I also tried something different -- making the c606 driver built-in and telling it to include the firmware blob in the kernel. (This is what I wanted to do anyway but the old kernel would panic on boot when I tried this). This time, it had the same result as the module approach -- the kernel boots successfully, the c606 is recognized, but the drives do not appear.
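One way to double-check that the firmware-related settings really match between the two kernels is to pull the relevant options out of both .config files and compare them. A minimal sketch, assuming both source trees are still around (the example paths are hypothetical) and that the options worth comparing are CONFIG_SCSI_ISCI, CONFIG_FW_LOADER, CONFIG_FIRMWARE_IN_KERNEL, CONFIG_EXTRA_FIRMWARE and CONFIG_EXTRA_FIRMWARE_DIR; the function names are made up for illustration:

```shell
# Print the firmware/isci related options from a kernel .config, sorted.
# The option list is an assumption about what matters here; extend as needed.
fw_options() {
    grep -E '^(# )?CONFIG_(SCSI_ISCI|FW_LOADER|FIRMWARE_IN_KERNEL|EXTRA_FIRMWARE|EXTRA_FIRMWARE_DIR)[= ]' "$1" | sort
}

# Show only the lines that differ between two configs.
# Returns 0 if the selected options are identical, 1 if they differ.
compare_fw_options() {
    old=$(mktemp) && new=$(mktemp) || return 1
    fw_options "$1" > "$old"
    fw_options "$2" > "$new"
    diff "$old" "$new"
    status=$?
    rm -f "$old" "$new"
    return $status
}

# Example (paths are hypothetical):
# compare_fw_options /usr/src/linux-3.5.4-gentoo-r1/.config /usr/src/linux-3.8.6-gentoo/.config
```

Lines such as "# CONFIG_FW_LOADER is not set" are matched too, so an option that was silently dropped by the config migration shows up in the diff.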

I should note there weren't any errors on the kernel make aside from a lot of "section mismatch" but I always see that message.

Is there a way I can check at runtime if the firmware file is being loaded successfully by the kernel? It would at least rule out the firmware related configuration as the culprit here... I'm only so fixated on that because it was the source of a similar problem when I initially installed Gentoo on this machine.
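For the runtime check, two things are cheap to look at: whether the blob is where a modular driver would request it from, and whether the kernel log says anything about isci or firmware. A minimal sketch; the helper names are made up for illustration, and the exact wording of the driver's log messages is not assumed, so the filter only matches on the words "isci" and "firmware":

```shell
# Report whether the firmware blob exists and is non-empty.
# Default path is the one mentioned above; pass another as $1.
check_fw_file() {
    fw=${1:-/lib/firmware/isci/isci_firmware.bin}
    if [ -s "$fw" ]; then
        echo "present: $fw"
    else
        echo "missing or empty: $fw"
        return 1
    fi
}

# Keep only kernel log lines that mention isci or firmware (reads stdin).
isci_log_lines() {
    grep -iE 'isci|firmware'
}

# Typical use after booting the kernel under test:
# check_fw_file
# dmesg | isci_log_lines
```

With the driver built in and the blob included through the kernel config, the copy under /lib/firmware is presumably not the one being used, so in that setup only the log check is informative.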

----------

## wcg

Did you try enabling SCSI_LOGGING? You need /proc filesystem and sysctl support, too, to enable it.

There is a script for configuring it in the sg3_utils package. It ends up as "/usr/sbin/scsi_logging_level". There is a man page for the script, and being a shell script, you can read the script itself.

[edit:] Since this needs to be enabled after boot by writing to a file in /proc, it may not enhance the information that you get in dmesg for scsi device initialization.[/edit]

That might provide a little more information in dmesg [not; see edit above] when scsi devices are initialized by the kernel. Messages after boot probably end up in /var/log/ somewhere. (I install sysklogd and have klogd log kernel messages directly to a file in /var/log/, rather than going through syslog, so I always know where to look for kernel messages. Requires some /etc/syslog.conf reconfiguration from defaults.)
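For reference, the setting lives in /proc/sys/dev/scsi/logging_level (sysctl name dev.scsi.logging_level), which only exists with SCSI_LOGGING and sysctl support built in. A rough sketch of reading and setting it by hand; the function names are illustrative, and the value 448 is an assumption that corresponds to scan logging at its maximum level (7 shifted left by 6 bits, going by the shifts in scsi_logging.h):

```shell
# Location of the SCSI logging bit mask; override for testing.
SCSI_LOG_FILE=${SCSI_LOG_FILE:-/proc/sys/dev/scsi/logging_level}

# Print the current logging level.
get_scsi_logging() {
    cat "$SCSI_LOG_FILE"
}

# Set a new logging level (needs root on a real system).
set_scsi_logging() {
    echo "$1" > "$SCSI_LOG_FILE"
}

# Example: turn on verbose scan logging, check it, then turn it off again.
# set_scsi_logging 448
# get_scsi_logging
# set_scsi_logging 0
```

Setting the level and then forcing a rescan should at least log what the scan of the already-registered hosts does, even though it is too late for the boot-time probe.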

----------

## wcg

PS: If you can figure out what all the script does, you may be able to hack around in /usr/src/linux/drivers/scsi/scsi_logging.h to enable it at boot, but I did not immediately see what to change looking at it.

(It would be useful if one could pass a kernel parameter via grub's kernel command line to enable and configure it. But I did not see an option for that documented anywhere.)
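For what it is worth, the kernel's parameter list does appear to document a scsi_logging_level= option for the SCSI core, which would be spelled scsi_mod.scsi_logging_level= on the command line and takes the same bit mask as the /proc file. A hedged sketch of building that value from the 3-bit fields in scsi_logging.h; the facility names are labels chosen for this example, and the shift values should be checked against the header of the kernel actually in use:

```shell
# Compute a SCSI logging bit mask for one facility.
# Each facility has a 3-bit level (0-7) at a fixed shift; the shifts below
# are taken from drivers/scsi/scsi_logging.h and should be verified there.
scsi_log_mask() {
    case "$1" in
        error)      shift_by=0 ;;
        timeout)    shift_by=3 ;;
        scan)       shift_by=6 ;;
        mlqueue)    shift_by=9 ;;
        mlcomplete) shift_by=12 ;;
        llqueue)    shift_by=15 ;;
        llcomplete) shift_by=18 ;;
        hlqueue)    shift_by=21 ;;
        hlcomplete) shift_by=24 ;;
        ioctl)      shift_by=27 ;;
        *) echo "unknown facility: $1" >&2; return 1 ;;
    esac
    level=$2
    [ "$level" -ge 0 ] && [ "$level" -le 7 ] || return 1
    echo $(( level << shift_by ))
}

# Print a kernel command line fragment for scan logging at level 7.
echo "scsi_mod.scsi_logging_level=$(scsi_log_mask scan 7)"
```

Appending the printed fragment to the kernel line in grub would, if the parameter behaves as documented, turn scan logging on before the controllers are probed.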

----------

## Zolcos

SCSI_LOGGING is enabled and I have that script but getting the logging to happen when the devices are first being initialized seems like a challenge.

btw I should mention that my system uses mdev because of that udev update that broke a lot of gentoo boxes (I have a separate /var), not sure if it makes a difference here

----------

## wcg

I have no idea if mdev would be responsible. I got eudev working alright. It supports having /var and /usr on separate filesystems (like udev before the big change). Works more or less the same as <= udev-181. The one difference is that you need an empty /etc/udev/rules.d/80-net-name-slot.rules file to prevent eudev from following udev's "persistent network names" behavior (prevent it from renaming eth0 and/or wlan0 to something else at boot).

(I left out the "legacy-libudev" USE flag for eudev. Nothing broke.)

----------

## wcg

PS:

When sg3_utils was installed, emerge installed some dependency of it along with rescan_scsi_bus, which is actually a shell script in /usr/sbin (rescan_scsi_bus.sh). Maybe scsi_logging_level in combination with that would get you some useful info. I did not see a man page for it, but it has help in the script, i.e.

```
rescan_scsi_bus -h
```

(The one thing I did not see was what the syntax for a "Host" should be, i.e. "--hosts=[LIST]". LIST of what? Scsi hosts, we know that, but specified how exactly? A pci bus address? Maybe the way they are listed in dmesg when encountered by the kernel at boot? 0-1-2-3? There are some symbolic links in /sys/bus/scsi/devices/, maybe you can try those for the names of "hosts" for rescan_scsi_bus.)

Anyway, if you figure out how to specify a host to it, it can search for luns, targets, etc. And if it is not finding any, SCSI_LOGGING can probably report the details if it has been enabled with scsi_logging_level.
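On the question of how a host is named: the kernel numbers SCSI hosts (host0, host1, ...) under /sys/class/scsi_host/, and each entry has a proc_name file naming the driver behind it, so the host numbers are presumably what the script's host list expects. A small sketch that lists them, plus the usual sysfs way of asking one host to rescan; the function names are hypothetical, and the directory is a parameter only so the listing can be tried against a scratch directory:

```shell
# List SCSI hosts and the driver name each one reports.
# $1 is the scsi_host class directory (defaults to the real sysfs path).
list_scsi_hosts() {
    dir=${1:-/sys/class/scsi_host}
    for h in "$dir"/host*; do
        [ -d "$h" ] || continue
        name=$(cat "$h/proc_name" 2>/dev/null) || name=
        echo "${h##*/} ${name:-unknown}"
    done
}

# Ask one host to rescan all channels, targets and LUNs (needs root).
# "- - -" is the wildcard form accepted by the sysfs scan file.
# $1 is the host number, $2 an optional alternate class directory.
rescan_host() {
    dir=${2:-/sys/class/scsi_host}
    echo "- - -" > "$dir/host$1/scan"
}

# list_scsi_hosts
# rescan_host 0
```

Running the listing under both kernels also shows at a glance which host numbers belong to the C606 in each case, independent of the order they were probed in.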

----------

## Zolcos

I had to install rescan-scsi-bus separately, and here's what I got:

With the old (working) kernel: http://pastebin.com/qE0JyYWT

With the new kernel: http://pastebin.com/W3Bt39AC

Looks like it finds the isci device (C606) alright, but doesn't see the four drives that are on it when using the new kernel.

I was able to configure my logger to put kernel messages outside /var so I can get them when /var isn't working. I'm working on getting it to catch something interesting from scsi_logging.

----------

## wcg

There is the command line option

```
--nooptscan
```

but according to the embedded help it only applies to a LUN search ("optimize scan" is apparently an early out if LUN 0 is not found; --nooptscan would instruct it to keep going in that case). I do not know if it applies to targets as well as LUNs.

Anyway, you can experiment with command line options that do not look risky (that only modify what information is searched for and where it looks for it exactly), like

```
--channels=0,1
```

I looked at the script, it seems to mostly use information in /sys/ and /proc/, so it is relying on the kernel to find the hardware at boot and query it for configuration information.

I find the fact that the old kernel output reports the onboard SAS hosts as iscsi and the new one reports them as ahci suspicious. (Is the kernel using the correct driver for these devices?)
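One way to answer that question directly is lspci -k, which prints the kernel driver bound to each PCI device. A sketch that trims its output down to the device line and the "Kernel driver in use" line for anything that looks like a storage controller; the function name is made up, and the matching pattern is a guess at how these controllers are labelled, so it may need adjusting:

```shell
# Filter `lspci -k` output (on stdin) down to storage controllers
# and the driver bound to each of them.
storage_drivers() {
    awk '
        # Device lines start in column 1; remember whether this one is storage.
        /^[^ \t]/ { show = ($0 ~ /SAS|SATA|SCSI|RAID/); if (show) print; next }
        # Indented detail lines belong to the most recent device line.
        show && /Kernel driver in use/ { print }
    '
}

# lspci -k | storage_drivers
```

A storage controller that prints no "Kernel driver in use" line under one of the kernels has no driver bound at all, which would be a different problem from a bound driver that fails to find its drives.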

There is a linux-scsi mailing list with a lot of nuts-and-bolts discussion of kernel scsi code:

http://vger.kernel.org/vger-lists.html#linux-scsi

----------

## Zolcos

*wcg wrote:*

> I find the fact that the old kernel output reports the onboard SAS hosts as iscsi and the new one reports them as ahci suspicious.

I think they are still being detected correctly, just in a different order -- you can see the ocz-vertex drives end up on hosts 2 and 3 in the new kernel instead of 0 and 1 like in the old kernel

----------

## wcg

[edit:] Oh, I see what you are saying, the iscsi devices are hosts 0 and 1 when the new kernel boots. So, if it is using the correct device driver for those SAS host interfaces, I have no idea why the kernel would fail to detect the connected drives at boot. And both kernels detect the same drives connected to the mpt2sas controller, so the problem seems to be specific to the driver for the C606 SAS controller rather than to all SAS controllers. I still think you need to consult the mailing list for bug/patch reports.[/edit]

[deleted blather that reflected not consulting the two pastebins for clarification.]

Anyway, you can search the mailing list for posts with keywords "C606 ISCSI missing drives". Someone else may have already reported your problem and newer kernels may already have a boot probe device detection patch for it.

----------

## wcg

PS:

I hesitate to mention this, it is becoming such a cliche these days of aggressive power saving invading every aspect of the operation of our hardware, but I wonder if on-board acpi code is spinning down the drives before the kernel can detect them. It would be a bizarre kind of error that boot code could easily work around if it knew that could happen. Something to keep in mind if you don't come across any other reports of the same problem on the same hardware.

(Not very likely. A change in the kernel like this usually arrives with more than one report on various mailing lists.)

----------

## wcg

[edit:] After some www browsing and rechecking /usr/src/linux/.config, I see that the C600 SAS driver is actually "ISCI" rather than "ISCSI". That makes searching for bug reports a little more effective, since one can exclude "iscsi" hits.[/edit]

[obsolete comments]

When searching for relevant bug reports, one needs to look closely to be sure the report is relevant to the ISCSI driver ("Intel SCSI" for the intel SAS controllers) and not "iSCSI", an "scsi over networks" protocol that predates the intel hardware driver. ( http://linux-iscsi.org/wiki/Main_Page )

(I did not see anything relevant in the linux-acpi bugzilla or mailing list.)

[/obs]

[edit:]

This is a very new driver, in terms of mainline kernel integration. Intel had an internal version sometime before, but the source in the public kernels was first introduced into 3.0-rc6, with a big set of patches merged in 3.7-rc6. I would have expected it to work better in a 3.8.x kernel than a 3.5.x kernel, but your results demonstrate that would be overly optimistic.

Even using the correct search term ("isci"), I still do not find anything relevant to that SAS driver in the linux-acpi bugzilla and mailing list. That does not necessarily mean that acpi is not the real problem, only that no one else has identified it as such. ( https://01.org/linux-acpi )

With luck, SCSI_LOGGING turns up some useful information.

[/edit]

----------

## RAPHEAD

Hi, 

Did you find out anything regarding this?

I'm also having problems with a c600 based system in combination with an Intel RKSAS4 Upgrade Key to enable SAS and recent Linux Kernels.

My problem is that I cannot make use of a big drive (4TB) while smaller SAS drives work.

Funnily it works with a very old SuSE Linux Enterprise 11 SP2 with Kernel 3.0.xx.

But although this kernel also has the ISCI module, it does not get loaded (lsmod does not show it).
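An empty lsmod does not by itself mean the driver is absent, since lsmod only lists loadable modules and a built-in driver would not show up there. A rough sketch for telling the cases apart; the function name and the optional root argument are illustrative, and the /sys/module check assumes the driver exposes an entry there when built in (which is typical only for drivers that have parameters):

```shell
# Classify a driver as loaded module, known to /sys/module, or absent.
# $1 = driver name, $2 = alternate root directory (for testing; default /).
driver_state() {
    root=${2:-}
    if grep -q "^$1 " "$root/proc/modules" 2>/dev/null; then
        echo "loaded as module"
    elif [ -d "$root/sys/module/$1" ]; then
        echo "present in /sys/module, probably built in"
    else
        echo "not loaded"
    fi
}

# driver_state isci
```

If this reports "not loaded" on the old kernel while the drives still work, some other driver must be handling the controller there, and the lspci -k output for that machine would say which one.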

Greets

----------

