# Detecting disk failure on cpqarray

## Spiffster

Hello all,

I have an old Compaq ProLiant DL360 with two 36 GB, 10k RPM disks mirrored on a Symbios Logic 53C1510 hardware RAID controller. The RAID works fine (cpqarray driver, vanilla 2.6.20.1 kernel), but I am looking for a way to monitor the status of the individual disks so that I am notified if one fails. I have searched high and low but have not found anything useful so far. Any suggestions?

I don't think I can use S.M.A.R.T., since the disks are not presented individually, only through the RAID device.

With the cpqarray driver, my root partition is as follows:

/dev/ida/c0d0p1 on / type reiserfs (rw,usrquota)

The controller is:

0000:00:01.0 RAID bus controller: LSI Logic / Symbios Logic 53C1510 (rev 02)

----------

## killfire

This may be a very stupid way to test it, but if you have only one hard drive connected, does anything show up in the cpqarray info in the kernel messages or lspci? I know that doesn't help, but it is a possible way to start sniffing around. Also, have you tried messaging the people who maintain cpqarray? These are the project admins on SourceForge: cmaupin, compaqadm, jcagle, mhaselden, murthysaripalli, smcameron...

Hopefully that gives you _some_ help, even if it doesn't actually solve your problem.

----------

## hurgh

Hi,

Not sure if this is going to be useful for you or not, but I have a system set up with the same type of RAID card.

My devices show up as /dev/ida/c0d0*.

I just had a drive fail in the array. I noticed it by visual inspection of the drives (while walking past my server the other day), and thought I would post the messages I saw.

In dmesg I get this:

```
Non Fatal error on ida/c0d0
```

It is repeated a lot of times, and fills the entire output of dmesg.

I also get the same sort of message in /var/log/messages (or whatever your system log is set to).

It repeats about 4-5 times a minute and fills up the logs quickly.

As soon as I replaced the drive, the messages stopped and the system seems to be running fine.
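Since the driver logs that line repeatedly while a disk is failed, one rough way to get notified is to count matching lines in the system log from a cron job. The following is only a sketch: the pattern is taken from the single dmesg line quoted above, the function names are made up for illustration, and other driver versions may word the message differently.

```python
import re
from collections import Counter

# Pattern based on the one dmesg line quoted in this post (an assumption:
# other cpqarray versions may phrase the message differently).
ERROR_RE = re.compile(r"Non Fatal error on (ida/c\d+d\d+)")


def count_array_errors(lines):
    """Return a Counter mapping device name to the number of matching lines."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def scan_log(path="/var/log/messages"):
    """Count matching lines in a log file; the default path is an assumption."""
    with open(path, errors="replace") as handle:
        return count_array_errors(handle)
```

Calling `scan_log()` and sending a mail whenever the returned counter is non-empty would be enough for a basic alert. Note that it only reports that the logical drive is logging errors, not which physical disk is at fault.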

The new disk has not finished syncing yet, so I will let you know if there is a message in the logs or something when the disk is finished syncing.

Hope this helps.

-Hurgh-

----------

## sanman

You could try the Insight agents that are supplied on the SmartStart CDs. I think they are RPMs. If there is a failure they write to the log, or you can set them up to use SNMP and report to an Insight server.

----------

## richard.scott

I've got a DL360 G1 and I've just found this in portage:

```
*  sys-apps/arrayprobe
      Latest version available: 2.0
      Latest version installed: [ Not Installed ]
      Size of files: 81 kB
      Homepage:      http://www.strocamp.net/opensource/arrayprobe.php
      Description:   CLI utility that reports the status of a HP (Compaq) array controller (both IDA & CCISS supported).
      License:       GPL-2
```

This is what it reports while the array rebuilds after a drive has been replaced:

```
# arrayprobe
WARNING Arrayprobe Logical drive 0 on /dev/ida/c0d0: Logical drive is is currently recovering
#
```

Now I can keep an eye on my hard drives.
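To turn that into a notification, a small wrapper can run arrayprobe periodically and look at the first word of its output. This is a minimal sketch under assumptions: only the `WARNING` prefix appears in the output above, so the `OK` and `CRITICAL` prefixes are guesses, and the helper names are hypothetical.

```python
import subprocess


def classify(output):
    """Map the first word of arrayprobe's output to a coarse status.

    Only "WARNING" is confirmed by the output shown in this thread;
    "OK" and "CRITICAL" are assumed prefixes.
    """
    words = output.strip().split(None, 1)
    first = words[0].upper() if words else ""
    if first == "OK":
        return "ok"
    if first == "WARNING":
        return "warning"
    if first == "CRITICAL":
        return "critical"
    return "unknown"


def run_arrayprobe(command=("arrayprobe",)):
    """Run arrayprobe (needs root and the tool installed) and classify it."""
    result = subprocess.run(list(command), capture_output=True, text=True)
    return classify(result.stdout), result.stdout.strip()
```

From cron, anything other than `"ok"` returned by `run_arrayprobe()` could trigger a mail containing the raw output line.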

----------

