# hardware raid (megaraid) question

## Xamindar

Anyone have any experience with hardware raid scsi controllers?

I am trying to get an "HP Netraid" working which should be supported by the megaraid module.

But when I modprobe megaraid the system just locks up right there.

Are there any special parameters I am supposed to add?  Does it matter if I set the controller to "I2O" or "mass storage"?

Thanks for any pointers.

----------

## commonloon

I believe the controller should be set to mass storage, else you would need to load up the I2O modules instead. You may need to try a different version of the megaraid module. It has been my experience that raid modules in general are very sensitive to the firmware on the respective card. It may be that the module is not happy with the firmware.

----------

## Xamindar

I'm about to double check it again in mass storage mode.

I just tried it in i2o mode and loaded the i2o_block driver.  The module loaded fine, but I don't see any new block devices, so it didn't work.  It's my understanding that the i2o devices will show up as /dev/hdX versus the megaraid driver which should show drives as /dev/sdX?

Thanks for the clarification.

Here is what dmesg says about it:

```
I2O subsystem v$Rev$
i2o: max drivers = 8
i2o: Checking for PCI I2O controllers...
i2o: I2O controller found on bus 1 at 121.
iop0: PCI I2O controller at FF400000 size=4194304
iop0: Installed at IRQ 20
iop0: Activating I2O controller...
iop0: This may take a few minutes if there are many devices
iop0: Timeout Initializing
iop0: could not activate controller
I2O controller: probe of 0000:01:0f.1 failed with error -110
I2O Block Device OSM v$Rev$
block-osm: registered device at major 80
I2O ProcFS OSM v$Rev$
I2O SCSI Peripheral OSM v$Rev$
EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended
```

So it didn't work for some reason.
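One note on the log above: the `-110` in the probe failure looks like a negated Linux errno, and 110 is `ETIMEDOUT` on Linux, which agrees with the `Timeout Initializing` line just before it. A minimal sketch of a lookup, assuming the standard Linux errno numbers (the `errno_name` helper is made up for illustration and only covers a handful of values):

```shell
# errno_name: hypothetical helper mapping a (possibly negated) Linux errno
# number to its symbolic name. Only a few common values are listed.
errno_name() {
    n=${1#-}            # strip the leading minus sign the kernel prints
    case "$n" in
        5)   echo EIO ;;
        12)  echo ENOMEM ;;
        16)  echo EBUSY ;;
        19)  echo ENODEV ;;
        110) echo ETIMEDOUT ;;
        *)   echo "unknown($n)" ;;
    esac
}

errno_name -110   # prints ETIMEDOUT
```

So the driver found the card but gave up waiting for it to answer, rather than failing to recognise it at all.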

Here is a section from lspci if it helps:

```

0000:01:0f.0 PCI bridge: Intel Corporation 80960RP [i960 RP Microprocessor/Bridge] (rev 03) (prog-if 00 [Normal decode])
   Flags: bus master, medium devsel, latency 64
   Bus: primary=01, secondary=02, subordinate=02, sec-latency=0
   I/O behind bridge: 00000000-00000fff
   Memory behind bridge: ffb00000-ffbfffff
   Prefetchable memory behind bridge: ff100000-ff1fffff

0000:01:0f.1 I2O: Intel Corporation 80960RP [i960RP Microprocessor] (rev 03) (prog-if 01)
   Subsystem: Hewlett-Packard Company HP NetRAID-1Si
   Flags: medium devsel, IRQ 20
   Memory at ff400000 (32-bit, prefetchable)
```

----------

## Xamindar

I ran the Windows 2000 setup program and it showed the raid array just fine and I was able to format it.  So it doesn't make sense that Linux wouldn't be able to use it.  This question is probably better suited for a kernel mailing list though.

----------

## commonloon

Trying LKML and kerneltrap sounds like a good idea. I'd search 1st then post obviously. I'd also look for notes from LSI on the card and kernel module.

----------

## Xamindar

Hey thanks for the tips.  I looked around those sites but was unsuccessful.

Would the *new* driver work with this card?  Or does it not even support this old hardware?  I haven't checked it out yet though.

EDIT:  A little update, I installed windows 2000 on the machine and it finds the raid array just perfectly.  No problems under windows 2000.  I formatted the drive and copied lots of files over to it.  But windows is useless on this computer.  BUT WHY NOT LINUX!!!???  I have never seen important hardware work on windows but not on linux before now.

----------

## commonloon

I had a raid system that had a drive go bad on it and thus the late reply... kind of ironic  :Wink: 

The megaraid source should have a list of supported controllers. Look in /usr/src/linux/drivers/scsi/megaraid.c, etc. Mine shows...

```

 * Supported controllers: MegaRAID 418, 428, 438, 466, 762, 467, 471, 490, 493
 *                                      518, 520, 531, 532
```

Does your controller post something like this during boot up? Does it give you a firmware version?
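If you want to check a model number against that comment quickly, something like this works. It is only a sketch: the list is copied from the header quoted above, `is_supported` is a made-up helper name, and the model numbers in the loop are just example inputs.

```shell
# Model numbers copied from the "Supported controllers" comment quoted above.
SUPPORTED="418 428 438 466 762 467 471 490 493 518 520 531 532"

# is_supported: hypothetical helper; succeeds if the model is in the list.
is_supported() {
    case " $SUPPORTED " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example inputs only; substitute the number printed on your board.
for model in 438 999; do
    if is_supported "$model"; then
        echo "$model: listed"
    else
        echo "$model: not listed"
    fi
done
```

Being listed only means the driver claims the board; as mentioned earlier, the firmware revision can still trip it up.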

----------

## Xamindar

Yes my controller is a 438.  It says that on the board, not on boot up.  I did however flash the firmware with the latest one I found on the internet which effectively decreased the firmware version (it originally said it was released some time in 2000, but after the flash it says some time in 1999) and it still locks up.

I also have a better 493 board here but I have been unable to find supported ecc ram for it.  I hope to be able to try this one once I find some.

But for now I am trying different options such as compiling the module in the kernel and downgrading to a previous version to see if it works.  I highly doubt this will fix my problem though.

----------

## Xamindar

I was also thinking.  Could it be possible that another module is interfering with the megaraid one?  For example I also have 2 aic7xxx controllers as well, one of which has the gentoo drive on it.  But I can't really test it as the os has to be on this controller.  Is that in any way possible?

----------

## commonloon

I haven't ever personally seen one module interfering with another. You might pull out the extra aic controller w/o the os on it, just for kicks and to eliminate a variable. All your device nodes show up, right?

```

# show kernel messages that mention sd devices, megaraid, or adapters
dmesg | grep -Ei 'dev/sd|mega|adapt'
# confirm the device nodes exist
ls -la /dev/sd[abc]
```

I did STRUGGLE for more than a month recently getting a controller to work. That said, that machine currently has 55 days of uptime with no errors whatsoever, and it is used heavily as a database machine. Not that 55 days is anything to brag about, but before "fixing" it I could literally untar a big file and the array would go read-only because of errors. In that case, I had a bad drive rev from Seagate as well as new firmware on the controller that simply didn't work with the mainstream kernel module. I had to update the firmware on one of the drives, update the controller firmware, and install the latest kernel module from the manufacturer, then voilà, it worked. My experience has been that LSI MegaRAID and Adaptec cards are probably the best supported for Linux...
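A slightly more scriptable version of the device node check, as a rough sketch. The `missing_nodes` name is made up, and the three paths are only the ones from the `ls` above; adjust them to whatever your box should have.

```shell
# missing_nodes: hypothetical helper; prints each given path that is not
# a block device node, so empty output means every node is present.
missing_nodes() {
    for node in "$@"; do
        [ -b "$node" ] || echo "$node"
    done
}

missing_nodes /dev/sda /dev/sdb /dev/sdc
```

A node existing does not prove the driver attached a disk to it, so it is still worth reading the dmesg output alongside this.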

----------

## Xamindar

Well, I finally drove down to weirdstuff today and found some ECC ram for that 493 board.  Put the board in and booted linux to try it.  It does the SAME THING when I modprobe the megaraid module.  Linux freezes just like it did with the other board.  I also tried compiling a 2.6.8 kernel with the megaraid module built in the kernel.  That kernel boots until it gets to the megaraid section.  Then it displays that it found a megaraid controller on address blabla, pci 3, adapter 0 whatever and lists irq 11.  Then the kernel is frozen at that spot.

I don't get it.  Now it looks like it isn't the raid controller, it must be Linux.  This is really frustrating.  Does anyone know if there is a kernel option to handle irqs better?  I'm thinking that might be causing a problem.

----------

## Xamindar

I acquired a normal pci scsi controller to put in this box so that I can now move the raid controller to another pci slot (before, the onboard scsi controllers would only be loaded first if the raid controller was in the last pci slot).  I moved it to the second pci slot and this time the kernel does not totally freeze but it still gives errors and I am not able to use the volumes.

Here is a snippet from dmesg:

```

scsi2: PCI error Interrupt at seqaddr = 0x8
scsi2: Data Parity Error Detected during address or write data phase
megaraid: found 0x101e:0x1960:bus 2:slot 0:func 0
scsi3:Found MegaRAID controller at 0xc38ec000, IRQ:15
megaraid: [H01.08:G01.02] detected 3 logical drives.
megaraid: Firmware H.01.07, H.01.08, and H.01.09 on 1M/2M controllers
megaraid: do not support 64 bit addressing.
megaraid: DISABLING 64 bit support.
megaraid: channel[0] is raid.
megaraid: channel[1] is raid.
scsi3 : LSI Logic MegaRAID H01.08 254 commands 16 targs 5 chans 7 luns
scsi3: scanning scsi channel 0 for logical drives.
  Vendor: MegaRAID  Model: LD0 RAID1  4339R  Rev:   H 
  Type:   Direct-Access                      ANSI SCSI revision: 02
SCSI device sdb: 8886272 512-byte hdwr sectors (4550 MB)
sdb: asking for cache data failed
sdb: assuming drive cache: write through
SCSI device sdb: 8886272 512-byte hdwr sectors (4550 MB)
sdb: asking for cache data failed
sdb: assuming drive cache: write through
 sdb:<4>megaraid: ABORTING-180d cmd=28 <c=0 t=0 l=0>
megaraid: ABORTING-180d[7d], fw owner.
megaraid: reservation reset failed.
megaraid: RESET-180d cmd=28 <c=0 t=0 l=0>
megaraid: RESET-180d[7d], fw owner.
megaraid: reservation reset failed.
megaraid: RESET-180d cmd=28 <c=0 t=0 l=0>
megaraid: RESET-180d[7d], fw owner.
megaraid: aborted cmd 180d[7d] complete.
megaraid: reservation reset failed.
megaraid: RESET-180d cmd=28 <c=0 t=0 l=0>
scsi: Device offlined - not ready after error recovery: host 3 channel 0 id 0 lun 0
SCSI error : <3 0 0 0> return code = 0x50000
end_request: I/O error, dev sdb, sector 0
Buffer I/O error on device sdb, logical block 0
scsi3 (0:0): rejecting I/O to offline device
Buffer I/O error on device sdb, logical block 0
 unable to read partition table
Attached scsi disk sdb at scsi3, channel 0, id 0, lun 0
Attached scsi generic sg1 at scsi3, channel 0, id 0, lun 0,  type 0
megaraid: ABORTING-1819 cmd=12 <c=0 t=1 l=0>
megaraid: ABORTING-1819[7d], fw owner.
megaraid: aborted cmd 1819[7d] complete.
megaraid: reservation reset failed.
megaraid: RESET-1819 cmd=12 <c=0 t=1 l=0>
megaraid: reservation reset failed.
megaraid: RESET-1819 cmd=12 <c=0 t=1 l=0>
megaraid: reservation reset failed.
megaraid: RESET-1819 cmd=12 <c=0 t=1 l=0>
scsi: Device offlined - not ready after error recovery: host 3 channel 0 id 1 lun 0
megaraid: ABORTING-181e cmd=12 <c=0 t=2 l=0>
megaraid: ABORTING-181e[7d], fw owner.
megaraid: aborted cmd 181e[7d] complete.
megaraid: reservation reset failed.
megaraid: RESET-181e cmd=12 <c=0 t=2 l=0>
megaraid: reservation reset failed.
megaraid: RESET-181e cmd=12 <c=0 t=2 l=0>
megaraid: reservation reset failed.
megaraid: RESET-181e cmd=12 <c=0 t=2 l=0>
scsi: Device offlined - not ready after error recovery: host 3 channel 0 id 2 lun 0
```

These are the errors after modprobing the megaraid driver.  It does the same thing compiled into the kernel but the machine is not able to finish booting this way.
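For reading the `cmd=` values in those megaraid lines: assuming the driver prints the SCSI opcode there in hex (an assumption, not something the log itself states), they can be decoded with a small table. The `scsi_opcode_name` helper is made up for illustration and only lists a few standard opcodes.

```shell
# scsi_opcode_name: hypothetical helper; maps a hex SCSI opcode to its name.
# Assumes the cmd= field in the megaraid messages is the opcode in hex.
scsi_opcode_name() {
    case "$1" in
        00) echo "TEST UNIT READY" ;;
        12) echo "INQUIRY" ;;
        1a) echo "MODE SENSE(6)" ;;
        25) echo "READ CAPACITY(10)" ;;
        28) echo "READ(10)" ;;
        2a) echo "WRITE(10)" ;;
        *)  echo "unknown" ;;
    esac
}

# The two values that show up in the log above.
for op in 28 12; do
    echo "cmd=$op -> $(scsi_opcode_name "$op")"
done
```

Read that way, the first failure is a plain read on target 0, which would fit the "unable to read partition table" message, and the failures on targets 1 and 2 are inquiries that never complete.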

It looks like at one point it found /dev/sdb but that device is not there any more.  I can't fdisk it.

Has anyone ever encountered these types of errors?  

Here is a link to all of dmesg in case that helps:  http://xamindar.radnimax.com/raid_error.txt
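To pull out just the devices that were taken offline from a long dmesg like that one, a rough sketch follows. The `offlined_targets` name is made up, and the sample lines are copied from the snippet above; on a live box you would pipe `dmesg` into it instead.

```shell
# offlined_targets: hypothetical helper; reads dmesg text on stdin and prints
# one "host channel id lun" line per "Device offlined" message.
offlined_targets() {
    sed -n 's/^scsi: Device offlined .*: host \([0-9]*\) channel \([0-9]*\) id \([0-9]*\) lun \([0-9]*\)$/\1 \2 \3 \4/p'
}

# Sample input copied from the log above.
offlined_targets <<'EOF'
scsi: Device offlined - not ready after error recovery: host 3 channel 0 id 0 lun 0
megaraid: reservation reset failed.
scsi: Device offlined - not ready after error recovery: host 3 channel 0 id 2 lun 0
EOF
```

In the full snippet, ids 0, 1 and 2 on host 3 all go offline, which lines up with the "detected 3 logical drives" message.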

----------

## Xamindar

Hmm, the above was when I was trying to use the 493 raid controller and it looks like that controller is broken as windows couldn't see it either.

I ended up putting the first one back in at a different pci slot this time and it appears to be working now.  I also had to fiddle with the scsi termination on the drive bay board to use this raid controller as it only has one internal connector.  I now have all six drives on it working.  This computer is very old, I guess all pci slots are not created equal.

Thanks for your help commonloon. :Wink: 

----------

## commonloon

No problem... a lot of mobos have PCI slots of different speeds, e.g., one might be 32-bit/33 MHz while the others might be 64-bit/66 MHz, as you prob know... also, termination has always been trickier than it should be. Some of those scsi ribbon cables have the connectors basically, for lack of a better way to describe it, stapled in.

Cool that you got it working.

----------

