# Raid5: non-fresh drives after system crash [solved]

## ingemar

Hi

I have a 4x500G raid5 set up using software raid. Earlier today my system crashed (probably because of some glitch with the graphics card). After rebooting, I'm unable to mount my raid. I've read here and there about people having similar problems after a system crash, but re-adding their drives to the array solved things for them. In my case, that doesn't work...

Various commands and their outputs:

```
# dmesg | grep md

md: raid0 personality registered for level 0

md: raid1 personality registered for level 1

md: raid6 personality registered for level 6

md: raid5 personality registered for level 5

md: raid4 personality registered for level 4

md: Autodetecting RAID arrays.

md: Scanned 4 and added 4 devices.

md: autorun ...

md: considering sde1 ...

md:  adding sde1 ...

md:  adding sdd1 ...

md:  adding sdc1 ...

md:  adding sdb1 ...

md: created md0

md: bind<sdb1>

md: bind<sdc1>

md: bind<sdd1>

md: bind<sde1>

md: running: <sde1><sdd1><sdc1><sdb1>

md: kicking non-fresh sde1 from array!

md: unbind<sde1>

md: export_rdev(sde1)

md: kicking non-fresh sdd1 from array!

md: unbind<sdd1>

md: export_rdev(sdd1)

raid5: not enough operational devices for md0 (2/4 failed)

raid5: failed to run raid set md0

md: pers->run() failed ...

md: do_md_run() returned -5

md: md0 stopped.

md: unbind<sdc1>

md: export_rdev(sdc1)

md: unbind<sdb1>

md: export_rdev(sdb1)

md: ... autorun DONE.

md: md0 stopped.
```

```
# cat /proc/mdstat

Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] 

md0 : inactive sdb1[0](S) sde1[3](S) sdd1[2](S) sdc1[1](S)

      1953535744 blocks
```

```
# mdadm --examine --scan /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

ARRAY /dev/md0 level=raid5 num-devices=4 UUID=d4a96563:dc6433d2:42f0def5:658a0146
```

```
# mdadm --examine /dev/sdd1

/dev/sdd1:

          Magic : a92b4efc

        Version : 00.90.00

           UUID : d4a96563:dc6433d2:42f0def5:658a0146

  Creation Time : Tue Feb 12 21:45:18 2008

     Raid Level : raid5

  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)

     Array Size : 1465151808 (1397.28 GiB 1500.32 GB)

   Raid Devices : 4

  Total Devices : 4

Preferred Minor : 0

    Update Time : Sat Feb 23 15:19:57 2008

          State : clean

 Active Devices : 4

Working Devices : 4

 Failed Devices : 0

  Spare Devices : 0

       Checksum : af432117 - correct

         Events : 0.48

         Layout : left-symmetric

     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State

this     2       8       49        2      active sync   /dev/sdd1

   0     0       8       17        0      active sync   /dev/sdb1

   1     1       8       33        1      active sync   /dev/sdc1

   2     2       8       49        2      active sync   /dev/sdd1

   3     3       8       65        3      active sync   /dev/sde1
```

```
# mdadm --examine /dev/sde1

/dev/sde1:

          Magic : a92b4efc

        Version : 00.90.00

           UUID : d4a96563:dc6433d2:42f0def5:658a0146

  Creation Time : Tue Feb 12 21:45:18 2008

     Raid Level : raid5

  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)

     Array Size : 1465151808 (1397.28 GiB 1500.32 GB)

   Raid Devices : 4

  Total Devices : 4

Preferred Minor : 0

    Update Time : Sat Feb 23 15:19:57 2008

          State : clean

 Active Devices : 4

Working Devices : 4

 Failed Devices : 0

  Spare Devices : 0

       Checksum : af432129 - correct

         Events : 0.48

         Layout : left-symmetric

     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State

this     3       8       65        3      active sync   /dev/sde1

   0     0       8       17        0      active sync   /dev/sdb1

   1     1       8       33        1      active sync   /dev/sdc1

   2     2       8       49        2      active sync   /dev/sdd1

   3     3       8       65        3      active sync   /dev/sde1
```

```
# mdadm /dev/md0 --fail /dev/sdd1 --remove /dev/sdd1 

mdadm: cannot get array info for /dev/md0
```

```
# mdadm -A /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 

mdadm: /dev/md0 assembled from 2 drives - not enough to start the array.
```

Where others have succeeded (removing the non-fresh drives and adding them again), mdadm fails on my system. All the drives show up with fdisk -l /dev/sd[b-e].

I don't suspect a hardware issue, since others have had the same problems after a system crash, and I'm in that exact scenario right now. Does anyone have an idea what could be wrong? Help would be much appreciated =)

----------

## Yak

I have a similar raid5 array, but I've never had to deal with something like this. I'm curious about the output of mdadm --examine on your sdb1 and sdc1. Could you post those also?

----------

## HeissFuss

Also, could we get mdadm -D /dev/md0?

The reason the array isn't starting is (presumably) that the event counts on sde1 and sdd1 are lower than on the other drives. Was there anything going on with the array before the crash (recently added or failed drives, or a sync in progress)?
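A quick way to eyeball the counters, if it helps (just a sketch; device names as in your earlier output, and mdadm needs root):

```shell
# Print the Events counter from each member's superblock so the
# stale ones stand out (device names taken from the original post)
for d in /dev/sd[b-e]1; do
    printf '%s: ' "$d"
    mdadm --examine "$d" | grep 'Events'
done
```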

----------

## ingemar

Thank you for your replies!

It didn't occur to me to check the other discs, and the output does differ between the "faulty" discs and the ones that aren't:

```
# mdadm --examine /dev/sdb1

/dev/sdb1:

          Magic : a92b4efc

        Version : 00.90.00

           UUID : d4a96563:dc6433d2:42f0def5:658a0146

  Creation Time : Tue Feb 12 21:45:18 2008

     Raid Level : raid5

  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)

     Array Size : 1465151808 (1397.28 GiB 1500.32 GB)

   Raid Devices : 4

  Total Devices : 4

Preferred Minor : 0

    Update Time : Sat Feb 23 15:30:32 2008

          State : clean

 Active Devices : 2

Working Devices : 2

 Failed Devices : 2

  Spare Devices : 0

       Checksum : af43238e - correct

         Events : 0.52

         Layout : left-symmetric

     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State

this     0       8       17        0      active sync   /dev/sdb1

   0     0       8       17        0      active sync   /dev/sdb1

   1     1       8       33        1      active sync   /dev/sdc1

   2     2       0        0        2      faulty removed

   3     3       0        0        3      faulty removed
```

The same goes for /dev/sdc1

And:

```
# mdadm -D /dev/md0        

mdadm: md device /dev/md0 does not appear to be active.
```

 *Quote:*   

> The reason the array isn't starting is that the event count (presumably) on devices sde1 sdd1 are lower than for the array/other drives. Was there anything going on with the array prior to the crash (recently added/failed drives or syncing?)

 

Yes, when the system crashed I was using the array quite extensively, although with no reading operations going on. I was extracting a set of rars containing a ~23G file, so I guess the array was under heavy load, but no administrative operations were running on it. I've had this raid up and running for a few weeks, and similar rars have been extracted from it before.

Thanks again for your help! All my music resides on this array, and I'd be devastated if it were ruined!

----------

## HeissFuss

I wonder what happened for the event count to get out of sync on half the drives...

If you're pretty confident it wasn't a hardware issue with the drives or controller (are those two drives on the same controller, by chance?), you should be able to force a re-assembly safely.

```

mdadm --stop /dev/md0

mdadm -A --force /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

```

Then run a fsck before trying to mount.

----------

## flybynite

I just went through this, I know it can be devastating.

If your data is very important, make copies of the disks before doing anything else, because forcibly assembling the array changes it in ways that may make things worse.
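As a rough sketch of what that imaging could look like (assumptions: /mnt/backup is a made-up destination with ~500G free per disk, and dd will happily overwrite whatever of= points at, so double-check the paths):

```shell
# Image each raid member before any --force attempt; conv=noerror,sync
# keeps going past unreadable sectors and pads them with zeros
for d in sdb sdc sdd sde; do
    dd if=/dev/$d of=/mnt/backup/$d.img bs=1M conv=noerror,sync
done
```

GNU ddrescue, if you have it, is often recommended instead for disks that are actually failing, since it retries bad regions more carefully.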

I didn't make any backups but I was lucky :)

I had a 3-disk raid (hda4, hde1, hdg1) that failed while rebuilding the degraded array, so I had two kicked disks, which won't work with raid 5.  I used --force to update the event counts.  Your filesystem might be broken, so check it immediately if the array assembles.

 *Quote:*   

> 
> 
> mdadm --assemble /dev/md0 /dev/hda4 /dev/hde1 /dev/hdg1 --force
> 
> mdadm: forcing event count in /dev/hde1(1) from 60897278 upto 60897284
> ...

 

Note that it used the fresher of the two kicked discs and started the array, still degraded but now rebuildable.

In my case the filesystem wasn't in use, so there was no damage to it.  Since yours was in use, it probably is damaged.  It might fsck cleanly and lose only the files you were unpacking, or the whole thing might be fubar'ed :(

----------

## Yak

I've been reading about assemble mode in the mdadm man page. It mentions an --update=summaries option to update the superblocks, but I suppose that wouldn't help since the event counts also differ. I also read about --force, but I would be pretty freaked out to try it without, as flybynite said, having a copy of the drives. Maybe someone could elaborate on the differing event counts, and the risks and extent of potential data loss?

----------

## ingemar

It's alive!

Forcibly assembling the raid seems to have worked, and fsck came back ok! I guess I'm lucky I didn't extract the file to the raid itself, but to another drive in my system. I have to admit I was a little nervous passing the --force argument, but the array is operational and mounted, and reading produces no errors whatsoever. I'm so relieved! Thank you all for your help!

----------

## Yak

Glad it worked out okay for you :)

That had to be a rather nerve-wracking experience if you didn't have backups.

----------

## flybynite

 *Yak wrote:*   

> Maybe someone could elaborate on the differing event counts and the risks and extent of potential data loss?

 

You could write a book about that.

Say raid 5 writes data1, data2 and the checksum, but the power dies before the checksum gets written to disk: the old checksum is still there, but it is now incorrect.  The event counts won't match, so we know something is wrong; the raid driver doesn't try to guess and refuses to assemble the array.

Now, how do you get the data?  

1. data1 + checksum

2. data2 + checksum

3. data1 + data2

Only one of the three will work; the others are garbage.
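A toy sketch of why, with single bytes standing in for blocks and shell arithmetic's ^ playing the role of the parity XOR:

```shell
# old stripe on disk: data1=65, data2=66, parity = 65 XOR 66
p_old=$(( 65 ^ 66 ))         # stale parity left over from before the crash

d1_new=67; d2_new=68         # crash: new data written, parity never updated

# option 1: rebuild data2 from data1 + stale parity -> garbage
echo "rebuilt data2: $(( d1_new ^ p_old )), wanted $d2_new"
# option 2: rebuild data1 from data2 + stale parity -> garbage
echo "rebuilt data1: $(( d2_new ^ p_old )), wanted $d1_new"
# option 3: read data1 and data2 directly -> the only correct combination
echo "direct read: $d1_new $d2_new"
```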

The best guess is to take the devices with the latest event count and hope, but this may not work depending on the order in which the disks were actually written.

A clean fsck only means the filesystem is coherent; the data might still be garbage in two of the three ways to assemble the array.

That, unfortunately, is why they say raid is no substitute for backups.  Raid protects against a disk failure; it is still possible to corrupt the array when you're dealing with power loss or certain hardware failures.

----------

## Yak

Thanks for the explanation.

----------

## ingemar

Today my system crashed again, except this time I was not using it for anything but surfing the web. The raid has once again been marked as faulty, and if I want it back up I guess forcing an assemble is the only way. My situation is exactly like before (the outputs of mdadm and the other commands in my first posts are the same). First, though, I'd like help interpreting this log from the crash; it might help me pinpoint the source of the problem.

The output is quite extensive, but I guess it's better to paste more than less:

```
Up until this point everything seems to be working as it should...

Feb 24 18:09:45 vallby NVRM: Xid (0003:00): 13, 0000 01014200 00000062 00000188 000189fa 00000006

Feb 24 18:09:45 vallby ata5.00: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xa frozen

Feb 24 18:09:45 vallby ata5.00: irq_stat 0x00400040, connection status changed

Feb 24 18:09:45 vallby ata5: SError: { PHYRdyChg DevExch }

Feb 24 18:09:45 vallby ata5.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0

Feb 24 18:09:45 vallby res 40/00:00:bf:4b:38/00:00:3a:00:00/40 Emask 0x10 (ATA bus error)

Feb 24 18:09:45 vallby ata5.00: status: { DRDY }

Feb 24 18:09:45 vallby ata5: hard resetting link

Feb 24 18:09:45 vallby NVRM: Xid (0003:00): 13, 0000 01014200 00000062 00000184 000189fa 00000006

Feb 24 18:09:46 vallby NVRM: Xid (0003:00): 13, 0000 01014200 00000062 00000188 000189fa 00000006

Feb 24 18:09:46 vallby NVRM: Xid (0003:00): 13, 0000 01014200 00000062 00000184 000189fa 00000004

Feb 24 18:09:46 vallby NVRM: Xid (0003:00): 13, 0000 01014200 00000062 00000188 000189fa 00000006

Feb 24 18:09:47 vallby ata4.00: exception Emask 0x10 SAct 0x3 SErr 0x4010000 action 0xa frozen

Feb 24 18:09:47 vallby ata4.00: irq_stat 0x00400040, connection status changed

Feb 24 18:09:47 vallby ata4: SError: { PHYRdyChg DevExch }

Feb 24 18:09:47 vallby ata4.00: cmd 60/b0:00:3f:d5:a9/00:00:02:00:00/40 tag 0 ncq 90112 in

Feb 24 18:09:47 vallby res 40/00:04:3f:d5:a9/00:00:02:00:00/40 Emask 0x10 (ATA bus error)

Feb 24 18:09:47 vallby ata4.00: status: { DRDY }

Feb 24 18:09:47 vallby ata4.00: cmd 60/d0:08:ef:d5:a9/00:00:02:00:00/40 tag 1 ncq 106496 in

Feb 24 18:09:47 vallby res 40/00:04:3f:d5:a9/00:00:02:00:00/40 Emask 0x10 (ATA bus error)

Feb 24 18:09:47 vallby ata4.00: status: { DRDY }

Feb 24 18:09:47 vallby ata4: hard resetting link

Feb 24 18:09:47 vallby NVRM: Xid (0003:00): 13, 0000 01014200 00000062 00000188 000189fa 00000006

Feb 24 18:09:47 vallby NVRM: Xid (0003:00): 13, 0000 01014200 00000062 00000188 000189fa 00000006

Feb 24 18:09:47 vallby NVRM: Xid (0003:00): 13, 0000 01016100 0000008a 00000190 00018a09 00000002

Feb 24 18:09:48 vallby ata5: SATA link down (SStatus 0 SControl 300)

Feb 24 18:09:48 vallby ata5: failed to recover some devices, retrying in 5 secs

Feb 24 18:09:48 vallby ata4: SATA link down (SStatus 0 SControl 300)

Feb 24 18:09:48 vallby ata4: failed to recover some devices, retrying in 5 secs

Feb 24 18:09:53 vallby ata5: hard resetting link

Feb 24 18:09:53 vallby ata4: hard resetting link

Feb 24 18:09:54 vallby ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

Feb 24 18:09:54 vallby ata5.00: configured for UDMA/133

Feb 24 18:09:54 vallby ata5: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x9 t4

Feb 24 18:09:54 vallby ata5: irq_stat 0x00000040, connection status changed

Feb 24 18:09:54 vallby ata5.00: configured for UDMA/133

Feb 24 18:09:54 vallby ata5: EH complete

Feb 24 18:09:54 vallby sd 4:0:0:0: [sde] 976773168 512-byte hardware sectors (500108 MB)

Feb 24 18:09:54 vallby sd 4:0:0:0: [sde] Write Protect is off

Feb 24 18:09:54 vallby sd 4:0:0:0: [sde] Mode Sense: 00 3a 00 00

Feb 24 18:09:54 vallby sd 4:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Feb 24 18:09:54 vallby sd 4:0:0:0: [sde] 976773168 512-byte hardware sectors (500108 MB)

Feb 24 18:09:54 vallby sd 4:0:0:0: [sde] Write Protect is off

Feb 24 18:09:54 vallby sd 4:0:0:0: [sde] Mode Sense: 00 3a 00 00

Feb 24 18:09:54 vallby sd 4:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Feb 24 18:09:54 vallby ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

Feb 24 18:09:54 vallby ata4.00: configured for UDMA/133

Feb 24 18:09:54 vallby ata4: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x9 t4

Feb 24 18:09:54 vallby ata4: irq_stat 0x00000040, connection status changed

Feb 24 18:09:54 vallby ata4.00: configured for UDMA/133

Feb 24 18:09:54 vallby ata4: EH complete

Feb 24 18:09:54 vallby sd 3:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)

Feb 24 18:09:54 vallby sd 3:0:0:0: [sdd] Write Protect is off

Feb 24 18:09:54 vallby sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00

Feb 24 18:09:54 vallby sd 3:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Feb 24 18:09:54 vallby sd 3:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)

Feb 24 18:09:54 vallby sd 3:0:0:0: [sdd] Write Protect is off

Feb 24 18:09:54 vallby sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00

Feb 24 18:09:54 vallby sd 3:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Feb 24 18:09:55 vallby ata5.00: exception Emask 0x10 SAct 0x1 SErr 0x4010000 action 0xa frozen

Feb 24 18:09:55 vallby ata5.00: irq_stat 0x00400040, connection status changed

Feb 24 18:09:55 vallby ata5: SError: { PHYRdyChg DevExch }

Feb 24 18:09:55 vallby ata5.00: cmd 60/80:00:3f:d7:a9/00:00:02:00:00/40 tag 0 ncq 65536 in

Feb 24 18:09:55 vallby res 40/00:04:3f:d7:a9/00:00:02:00:00/40 Emask 0x10 (ATA bus error)

Feb 24 18:09:55 vallby ata5.00: status: { DRDY }

Feb 24 18:09:55 vallby ata5: hard resetting link

Feb 24 18:09:58 vallby ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

Feb 24 18:09:58 vallby ata5.00: configured for UDMA/133

Feb 24 18:09:58 vallby ata5: EH complete

Feb 24 18:09:58 vallby sd 4:0:0:0: [sde] 976773168 512-byte hardware sectors (500108 MB)

Feb 24 18:09:58 vallby sd 4:0:0:0: [sde] Write Protect is off

Feb 24 18:09:58 vallby sd 4:0:0:0: [sde] Mode Sense: 00 3a 00 00

Feb 24 18:09:58 vallby sd 4:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Feb 24 18:11:17 vallby ata5: limiting SATA link speed to 1.5 Gbps

Feb 24 18:11:17 vallby ata5: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen

Feb 24 18:11:17 vallby ata5: irq_stat 0x00400040, connection status changed

Feb 24 18:11:17 vallby ata5: SError: { PHYRdyChg DevExch }

Feb 24 18:11:17 vallby ata5: hard resetting link

Feb 24 18:11:21 vallby ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

Feb 24 18:11:21 vallby ata5.00: configured for UDMA/133

Feb 24 18:11:21 vallby ata5: EH complete

Feb 24 18:11:21 vallby sd 4:0:0:0: [sde] 976773168 512-byte hardware sectors (500108 MB)

Feb 24 18:11:21 vallby sd 4:0:0:0: [sde] Write Protect is off

Feb 24 18:11:21 vallby sd 4:0:0:0: [sde] Mode Sense: 00 3a 00 00

Feb 24 18:11:21 vallby sd 4:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Feb 24 18:11:21 vallby ata5: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xa frozen

Feb 24 18:11:21 vallby ata5: irq_stat 0x00400040, connection status changed

Feb 24 18:11:21 vallby ata5: SError: { PHYRdyChg DevExch }

Feb 24 18:11:21 vallby ata5: hard resetting link

Feb 24 18:11:21 vallby ata4.00: exception Emask 0x10 SAct 0x1 SErr 0x4010000 action 0xa frozen

Feb 24 18:11:21 vallby ata4.00: irq_stat 0x00400040, connection status changed

Feb 24 18:11:21 vallby ata4: SError: { PHYRdyChg DevExch }

Feb 24 18:11:21 vallby ata4.00: cmd 60/08:00:37:5c:18/00:00:0d:00:00/40 tag 0 ncq 4096 in

Feb 24 18:11:21 vallby res 40/00:04:37:5c:18/00:00:0d:00:00/40 Emask 0x10 (ATA bus error)

Feb 24 18:11:21 vallby ata4.00: status: { DRDY }

Feb 24 18:11:21 vallby ata4: hard resetting link

Feb 24 18:11:22 vallby ata4: SATA link down (SStatus 0 SControl 300)

Feb 24 18:11:22 vallby ata4: failed to recover some devices, retrying in 5 secs

Feb 24 18:11:22 vallby ata5: SATA link down (SStatus 0 SControl 310)

Feb 24 18:11:22 vallby ata5: failed to recover some devices, retrying in 5 secs

Feb 24 18:11:27 vallby ata5: hard resetting link

Feb 24 18:11:30 vallby Clocksource tsc unstable (delta = 294808342 ns)

Feb 24 18:11:30 vallby Time: hpet clocksource has been installed.

Feb 24 18:11:32 vallby ata5: SATA link down (SStatus 0 SControl 310)

Feb 24 18:11:33 vallby ata5: failed to recover some devices, retrying in 5 secs

Feb 24 18:11:36 vallby ata5: hard resetting link

Feb 24 18:11:38 vallby ata5: SATA link down (SStatus 0 SControl 310)

Feb 24 18:11:38 vallby ata5.00: disabled

Feb 24 18:11:38 vallby ata4: hard resetting link

Feb 24 18:11:38 vallby sd 4:0:0:0: rejecting I/O to offline device

Feb 24 18:11:38 vallby sd 4:0:0:0: [sde] Result: hostbyte=0x01 driverbyte=0x00

Feb 24 18:11:38 vallby end_request: I/O error, dev sde, sector 219700655

Feb 24 18:11:38 vallby raid5: Disk failure on sde1, disabling device. Operation continuing on 3 devices

Feb 24 18:11:38 vallby sd 4:0:0:0: rejecting I/O to offline device

Feb 24 18:11:38 vallby sd 4:0:0:0: rejecting I/O to offline device

Feb 24 18:11:38 vallby ata4: SATA link down (SStatus 0 SControl 300)

Feb 24 18:11:38 vallby ata4: failed to recover some devices, retrying in 5 secs

Feb 24 18:11:38 vallby ata5: EH complete

Feb 24 18:11:38 vallby ata5.00: detaching (SCSI 4:0:0:0)

Feb 24 18:11:38 vallby sd 4:0:0:0: [sde] Synchronizing SCSI cache

Feb 24 18:11:38 vallby sd 4:0:0:0: [sde] Result: hostbyte=0x04 driverbyte=0x00

Feb 24 18:11:38 vallby sd 4:0:0:0: [sde] Stopping disk

Feb 24 18:11:38 vallby sd 4:0:0:0: [sde] START_STOP FAILED

Feb 24 18:11:38 vallby sd 4:0:0:0: [sde] Result: hostbyte=0x04 driverbyte=0x00

Feb 24 18:11:44 vallby ata4: hard resetting link

Feb 24 18:11:44 vallby ata4: SATA link down (SStatus 0 SControl 300)

Feb 24 18:11:44 vallby ata4.00: disabled

Feb 24 18:11:45 vallby ata4: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x9 t4

Feb 24 18:11:45 vallby ata4: irq_stat 0x00000040, connection status changed

Feb 24 18:11:45 vallby ata4: hard resetting link

Feb 24 18:11:45 vallby ata4: SATA link down (SStatus 0 SControl 300)

Feb 24 18:11:45 vallby sd 3:0:0:0: [sdd] Result: hostbyte=0x00 driverbyte=0x08

Feb 24 18:11:45 vallby sd 3:0:0:0: [sdd] Sense Key : 0xb [current] [descriptor]

Feb 24 18:11:45 vallby Descriptor sense data with sense descriptors (in hex):

Feb 24 18:11:45 vallby 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 

Feb 24 18:11:45 vallby 0d 18 5c 37 

Feb 24 18:11:45 vallby sd 3:0:0:0: [sdd] ASC=0x0 ASCQ=0x0

Feb 24 18:11:45 vallby end_request: I/O error, dev sdd, sector 219700279

Feb 24 18:11:45 vallby raid5:md0: read error not correctable (sector 219700216 on sdd1).

Feb 24 18:11:45 vallby raid5: Disk failure on sdd1, disabling device. Operation continuing on 2 devices

Feb 24 18:11:45 vallby sd 3:0:0:0: rejecting I/O to offline device

Feb 24 18:11:45 vallby sd 3:0:0:0: rejecting I/O to offline device

Feb 24 18:11:45 vallby sd 3:0:0:0: rejecting I/O to offline device

Feb 24 18:11:45 vallby sd 3:0:0:0: rejecting I/O to offline device

Feb 24 18:11:45 vallby ata4: EH complete

Feb 24 18:11:45 vallby ata4.00: detaching (SCSI 3:0:0:0)

Feb 24 18:11:45 vallby sd 3:0:0:0: [sdd] Result: hostbyte=0x01 driverbyte=0x00

Feb 24 18:11:45 vallby end_request: I/O error, dev sdd, sector 136626623

Feb 24 18:11:45 vallby md: super_written gets error=-5, uptodate=0

Feb 24 18:11:45 vallby raid5:md0: read error not correctable (sector 136626816 on sdd1).

Feb 24 18:11:45 vallby raid5:md0: read error not correctable (sector 136626824 on sdd1).

Feb 24 18:11:45 vallby raid5:md0: read error not correctable (sector 136626832 on sdd1).

Feb 24 18:11:45 vallby raid5:md0: read error not correctable (sector 136626840 on sdd1).

Feb 24 18:11:45 vallby raid5:md0: read error not correctable (sector 136626848 on sdd1).

Feb 24 18:11:45 vallby raid5:md0: read error not correctable (sector 136626856 on sdd1).

Feb 24 18:11:45 vallby raid5:md0: read error not correctable (sector 136626864 on sdd1).

Feb 24 18:11:45 vallby raid5:md0: read error not correctable (sector 136626872 on sdd1).

Feb 24 18:11:45 vallby raid5:md0: read error not correctable (sector 136626880 on sdd1).

Feb 24 18:11:45 vallby raid5:md0: read error not correctable (sector 136626888 on sdd1).

Feb 24 18:11:45 vallby raid5:md0: read error not correctable (sector 136626896 on sdd1).

Feb 24 18:11:45 vallby raid5:md0: read error not correctable (sector 136626904 on sdd1).

Feb 24 18:11:45 vallby raid5:md0: read error not correctable (sector 136626912 on sdd1).

Feb 24 18:11:45 vallby raid5:md0: read error not correctable (sector 136626920 on sdd1).

Feb 24 18:11:45 vallby raid5:md0: read error not correctable (sector 136626928 on sdd1).

Feb 24 18:11:45 vallby raid5:md0: read error not correctable (sector 136626936 on sdd1).

Feb 24 18:11:45 vallby raid5:md0: read error not correctable (sector 136626944 on sdd1).

Feb 24 18:11:45 vallby raid5:md0: read error not correctable (sector 136626952 on sdd1).

Feb 24 18:11:45 vallby raid5:md0: read error not correctable (sector 136626960 on sdd1).

Feb 24 18:11:45 vallby sd 3:0:0:0: [sdd] Synchronizing SCSI cache

Feb 24 18:11:45 vallby sd 3:0:0:0: [sdd] Result: hostbyte=0x04 driverbyte=0x00

Feb 24 18:11:45 vallby sd 3:0:0:0: [sdd] Stopping disk

Feb 24 18:11:45 vallby sd 3:0:0:0: [sdd] START_STOP FAILED

Feb 24 18:11:45 vallby sd 3:0:0:0: [sdd] Result: hostbyte=0x04 driverbyte=0x00

Feb 24 18:11:45 vallby RAID5 conf printout:

Feb 24 18:11:45 vallby --- rd:4 wd:2

Feb 24 18:11:45 vallby disk 0, o:1, dev:sdb1

Feb 24 18:11:45 vallby disk 1, o:1, dev:sdc1

Feb 24 18:11:45 vallby Buffer I/O error on device md0, logical block 82388030

Feb 24 18:11:45 vallby disk 2, o:0, dev:sdd1

Feb 24 18:11:45 vallby disk 3, o:0, dev:sde1

Feb 24 18:11:45 vallby lost page write due to I/O error on md0

Feb 24 18:11:45 vallby Buffer I/O error on device md0, logical block 82388031

Feb 24 18:11:45 vallby lost page write due to I/O error on md0

Feb 24 18:11:45 vallby Buffer I/O error on device md0, logical block 82388016

Feb 24 18:11:45 vallby lost page write due to I/O error on md0

Feb 24 18:11:45 vallby Buffer I/O error on device md0, logical block 82388032

Feb 24 18:11:45 vallby lost page write due to I/O error on md0

Feb 24 18:11:45 vallby Buffer I/O error on device md0, logical block 82388017

Feb 24 18:11:45 vallby lost page write due to I/O error on md0

Feb 24 18:11:45 vallby Buffer I/O error on device md0, logical block 82388033

Feb 24 18:11:45 vallby lost page write due to I/O error on md0

Feb 24 18:11:45 vallby Buffer I/O error on device md0, logical block 82388018

Feb 24 18:11:45 vallby lost page write due to I/O error on md0

Feb 24 18:11:45 vallby Buffer I/O error on device md0, logical block 82388034

Feb 24 18:11:45 vallby lost page write due to I/O error on md0

Feb 24 18:11:45 vallby Buffer I/O error on device md0, logical block 82388019

Feb 24 18:11:45 vallby lost page write due to I/O error on md0

Feb 24 18:11:45 vallby Buffer I/O error on device md0, logical block 82388035

Feb 24 18:11:45 vallby lost page write due to I/O error on md0

Feb 24 18:11:45 vallby RAID5 conf printout:

Feb 24 18:11:45 vallby --- rd:4 wd:2

Feb 24 18:11:45 vallby disk 0, o:1, dev:sdb1

Feb 24 18:11:45 vallby disk 1, o:1, dev:sdc1

Feb 24 18:11:45 vallby disk 2, o:0, dev:sdd1

Feb 24 18:11:45 vallby RAID5 conf printout:

Feb 24 18:11:45 vallby --- rd:4 wd:2

Feb 24 18:11:45 vallby disk 0, o:1, dev:sdb1

Feb 24 18:11:45 vallby disk 1, o:1, dev:sdc1

Feb 24 18:11:45 vallby disk 2, o:0, dev:sdd1

Feb 24 18:11:45 vallby RAID5 conf printout:

Feb 24 18:11:45 vallby --- rd:4 wd:2

Feb 24 18:11:45 vallby disk 0, o:1, dev:sdb1

Feb 24 18:11:45 vallby disk 1, o:1, dev:sdc1

Feb 24 18:11:45 vallby Aborting journal on device md0.

Feb 24 18:11:45 vallby mdadm: Fail event detected on md device /dev/md0, component device /dev/sdd1

Feb 24 18:11:45 vallby mdadm: Fail event detected on md device /dev/md0, component device /dev/sde1

Feb 24 18:11:45 vallby ext3_abort called.

Feb 24 18:11:45 vallby EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal

Feb 24 18:11:45 vallby Remounting filesystem read-only
```

Might there be a problem with the sata controller on the motherboard after all? Or insufficient power for the drives to work as expected? The reason I doubt hardware issues is that both sdd and sde are brand new, along with the motherboard. I've had them for a few weeks, and there were no problems whatsoever when I copied data back from various backup disks and over ssh... But maybe I shouldn't trust new hardware as much as I do?

Edit: After rebooting, with the array not running but the disks plugged in, my system still crashes. This is what /var/log/messages has to say about it:

```
Feb 24 19:00:58 vallby ata5: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xa frozen

Feb 24 19:00:58 vallby ata5: irq_stat 0x00400040, connection status changed

Feb 24 19:00:58 vallby ata5: SError: { PHYRdyChg DevExch }

Feb 24 19:00:58 vallby ata5: hard resetting link

Feb 24 19:00:58 vallby ata4: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xa frozen

Feb 24 19:00:58 vallby ata4: irq_stat 0x00400040, connection status changed

Feb 24 19:00:58 vallby ata4: SError: { PHYRdyChg DevExch }

Feb 24 19:00:58 vallby ata4: hard resetting link

Feb 24 19:00:59 vallby ata5: SATA link down (SStatus 0 SControl 300)

Feb 24 19:00:59 vallby ata4: SATA link down (SStatus 0 SControl 300)

Feb 24 19:00:59 vallby ata4: failed to recover some devices, retrying in 5 secs

Feb 24 19:00:59 vallby ata5: failed to recover some devices, retrying in 5 secs

Feb 24 19:01:04 vallby ata5: hard resetting link

Feb 24 19:01:04 vallby ata4: hard resetting link

Feb 24 19:01:04 vallby ata4: SATA link down (SStatus 0 SControl 300)

Feb 24 19:01:04 vallby ata5: SATA link down (SStatus 0 SControl 300)

Feb 24 19:01:04 vallby ata5: failed to recover some devices, retrying in 5 secs

Feb 24 19:01:04 vallby ata4: failed to recover some devices, retrying in 5 secs

Feb 24 19:01:09 vallby ata5: hard resetting link

Feb 24 19:01:09 vallby ata4: hard resetting link

Feb 24 19:01:09 vallby ata4: SATA link down (SStatus 0 SControl 300)

Feb 24 19:01:09 vallby ata5: SATA link down (SStatus 0 SControl 300)

Feb 24 19:01:09 vallby ata5.00: disabled

Feb 24 19:01:09 vallby ata4.00: disabled

Feb 24 19:01:10 vallby ata4: EH complete

Feb 24 19:01:10 vallby ata5: EH complete

Feb 24 19:01:10 vallby ata4.00: detaching (SCSI 3:0:0:0)

Feb 24 19:01:10 vallby sd 3:0:0:0: [sdd] Synchronizing SCSI cache

Feb 24 19:01:10 vallby sd 3:0:0:0: [sdd] Result: hostbyte=0x04 driverbyte=0x00

Feb 24 19:01:10 vallby sd 3:0:0:0: [sdd] Stopping disk

Feb 24 19:01:10 vallby sd 3:0:0:0: [sdd] START_STOP FAILED

Feb 24 19:01:10 vallby sd 3:0:0:0: [sdd] Result: hostbyte=0x04 driverbyte=0x00

Feb 24 19:01:10 vallby ata5.00: detaching (SCSI 4:0:0:0)

Feb 24 19:01:10 vallby sd 4:0:0:0: [sde] Synchronizing SCSI cache

Feb 24 19:01:10 vallby sd 4:0:0:0: [sde] Result: hostbyte=0x04 driverbyte=0x00

Feb 24 19:01:10 vallby sd 4:0:0:0: [sde] Stopping disk

Feb 24 19:01:10 vallby sd 4:0:0:0: [sde] START_STOP FAILED

Feb 24 19:01:10 vallby sd 4:0:0:0: [sde] Result: hostbyte=0x04 driverbyte=0x00
```

All that happened is that my screen went black, but I am still able to log in over ssh.

----------

## Yak

Sort of a rough translation:

```
Feb 24 18:09:45 vallby ata5.00: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xa frozen

Feb 24 18:09:45 vallby ata5.00: irq_stat 0x00400040, connection status changed

Feb 24 18:09:45 vallby ata5: SError: { PHYRdyChg DevExch }

Feb 24 18:09:45 vallby ata5.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0

Feb 24 18:09:45 vallby res 40/00:00:bf:4b:38/00:00:3a:00:00/40 Emask 0x10 (ATA bus error)

Feb 24 18:09:45 vallby ata5.00: status: { DRDY }

Feb 24 18:09:45 vallby ata5: hard resetting link

Feb 24 18:09:47 vallby ata4.00: exception Emask 0x10 SAct 0x3 SErr 0x4010000 action 0xa frozen

Feb 24 18:09:47 vallby ata4.00: irq_stat 0x00400040, connection status changed

Feb 24 18:09:47 vallby ata4: SError: { PHYRdyChg DevExch }

```

It's like the sata cables lost connection or were unplugged. 

The stuff in this lkml thread looks really similar: http://lkml.org/lkml/2007/1/12/25

```

Feb 24 18:09:53 vallby ata5: hard resetting link

Feb 24 18:09:53 vallby ata4: hard resetting link

Feb 24 18:09:54 vallby ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

Feb 24 18:09:54 vallby ata5.00: configured for UDMA/133

Feb 24 18:09:54 vallby ata5: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x9 t4

Feb 24 18:09:54 vallby ata5: irq_stat 0x00000040, connection status changed

Feb 24 18:09:54 vallby ata5.00: configured for UDMA/133

Feb 24 18:09:54 vallby ata5: EH complete

Feb 24 18:09:54 vallby sd 4:0:0:0: [sde] 976773168 512-byte hardware sectors (500108 MB)

Feb 24 18:09:54 vallby sd 4:0:0:0: [sde] Write Protect is off

Feb 24 18:09:54 vallby sd 4:0:0:0: [sde] Mode Sense: 00 3a 00 00

Feb 24 18:09:54 vallby sd 4:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Feb 24 18:09:54 vallby sd 4:0:0:0: [sde] 976773168 512-byte hardware sectors (500108 MB)

Feb 24 18:09:54 vallby sd 4:0:0:0: [sde] Write Protect is off

Feb 24 18:09:54 vallby sd 4:0:0:0: [sde] Mode Sense: 00 3a 00 00

Feb 24 18:09:54 vallby sd 4:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Feb 24 18:09:54 vallby ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

Feb 24 18:09:54 vallby ata4.00: configured for UDMA/133

Feb 24 18:09:54 vallby ata4: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x9 t4

Feb 24 18:09:54 vallby ata4: irq_stat 0x00000040, connection status changed

Feb 24 18:09:54 vallby ata4.00: configured for UDMA/133

Feb 24 18:09:54 vallby ata4: EH complete

Feb 24 18:09:54 vallby sd 3:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)

Feb 24 18:09:54 vallby sd 3:0:0:0: [sdd] Write Protect is off

Feb 24 18:09:54 vallby sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00

Feb 24 18:09:54 vallby sd 3:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Feb 24 18:09:54 vallby sd 3:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)

Feb 24 18:09:54 vallby sd 3:0:0:0: [sdd] Write Protect is off

Feb 24 18:09:54 vallby sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00

Feb 24 18:09:54 vallby sd 3:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

```

Both sata links are reset and sde and sdd are available again. The links fail and reset a couple times..

```

Feb 24 18:11:38 vallby sd 4:0:0:0: rejecting I/O to offline device

Feb 24 18:11:38 vallby sd 4:0:0:0: [sde] Result: hostbyte=0x01 driverbyte=0x00

Feb 24 18:11:38 vallby end_request: I/O error, dev sde, sector 219700655

```

Until a read (or write?) to sde fails while the sata link is still down.

```

Feb 24 18:11:38 vallby raid5: Disk failure on sde1, disabling device. Operation continuing on 3 devices

```

raid5 sees the failure and disables sde.  

```

Feb 24 18:11:45 vallby end_request: I/O error, dev sdd, sector 219700279

Feb 24 18:11:45 vallby raid5:md0: read error not correctable (sector 219700216 on sdd1).

Feb 24 18:11:45 vallby raid5: Disk failure on sdd1, disabling device. Operation continuing on 2 devices

```

But the same thing soon happens with sdd.  Then the array fails and a whole bunch of read/write errors on md0 follow...

```

Feb 24 18:11:45 vallby md: super_written gets error=-5, uptodate=0

Feb 24 18:11:45 vallby raid5:md0: read error not correctable (sector 136626816 on sdd1).

Feb 24 18:11:45 vallby lost page write due to I/O error on md0

Feb 24 18:11:45 vallby Buffer I/O error on device md0, logical block 82388031

```

This all happens within a couple of minutes. I'm not sure, but I would think if it were something with hotplug or the sata drivers it would fail right off, not run for a while and then die. What about sda and sdb? Are they the same model drive? Connected to the motherboard also? Same controller chipset? Changed kernels recently?

The power supply is a possibility. I know you're running at least four drives.. what's the rest of your hardware setup like? 

It could also be the sata cables or even the drives themselves. Two new drives both going bad would be a pretty rotten coincidence though.

----------

## ingemar

That translation seems very reasonable.. Thank you!

sda is my "main" drive, from which I run Linux, and where my /home partition resides as well.. sdb and sdc are the exact same model as sdd and sde, the only exception being that these drives have been in my system for about six months running as a raid 0 without any problems.. All my drives are plugged into the motherboard controller, so I guess the same chipset handles all these drives? And as for changing kernels recently, this system is only a few weeks old, so there hasn't been much time for changing kernels..

I'll try replacing the sata cable to sde, and if that doesn't make any difference, I'll try to put sdd and sde on an external power supply to see if that helps..

But there's one thing I'd like to ask: isn't it only sde that is misbehaving? When I try running 'badblocks' on /dev/sdd1, I get nothing. Not in the terminal window and not anything in /var/log/messages, but when I try the same thing on /dev/sde1 the drive freaks out again, loses the connection, and resets it again.. Might there be a problem with just sde, and this brings sdd down with it?
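
In case it matters, this is roughly what I'm running to test the drives (read-only mode, so it shouldn't touch the data; the device names are of course my own):

```shell
# Read-only surface scan of one partition -- badblocks defaults to
# read-only, so no data is written to the drive
badblocks -sv /dev/sde1

# SMART status and the drive's own error log often show the problem
# first (needs smartmontools installed)
smartctl -H /dev/sde
smartctl -l error /dev/sde
```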

----------

## Cyker

 *flybynite wrote:*   

>  *Yak wrote:*   Maybe someone could elaborate on the differing event counts and the risks and extent of potential data loss? 
> 
> You could write a book about that.
> 
> The raid 5 writes data1, data2 and checksum, but the power dies before checksum gets written to the disk, the old checksum is still there but is now incorrect.  The event counts won't match so we know something is wrong and the raid driver doesn't try to guess and refuses to assemble the array.
> ...

 

Do you know if enabling the "write-intent-bitmap" (mdadm /dev/mdX -Gb internal) will help with that scenario?

I enabled it so that if I crashed, the RAID wouldn't spend 5 days running a consistency check on itself, but don't know if it will help things at stripe level...
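
For reference, this is roughly how I set mine up (from memory, so double-check the syntax; /dev/md0 is just an example array name):

```shell
# Add an internal write-intent bitmap to an existing array
# (same as the -Gb internal shorthand above)
mdadm --grow --bitmap=internal /dev/md0

# Verify it took -- /proc/mdstat should now show a "bitmap:" line
cat /proc/mdstat
```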

----------

## Yak

 *ingemar wrote:*   

> That translation seems very reasonable.. Thank you!
> 
> sda is my "main" drive, from which I run Linux, and where my /home partition resides as well.. sdb and sdc are the exact same model as sdd and sde, the only exception being that these drives have been in my system for about six months running as a raid 0 without any problems.. All my drives are plugged into the motherboard controller, so I guess the same chipset handles all these drives? And as for changing kernels recently, this system is only a few weeks old, so there hasn't been much time for changing kernels..

 

Probably the same chipset then. Some motherboards have an extra controller chipset with fake raid features; those usually have a different color sata connector on the motherboard. I would need to know the motherboard model, or you could look it up in the manual, to be sure. If they are all on the same controller then I'd think a driver issue is not so likely. Even less so if it's been running fine for a few weeks with the same kernel drivers.

 *Quote:*   

> 
> 
> But there's one thing I'd like to ask: isn't it only sde that is misbehaving? When I try running 'badblocks' on /dev/sdd1, I get nothing. Not in the terminal window and not anything in /var/log/messages, but when I try the same thing on /dev/sde1 the drive freaks out again, loses the connection, and resets it again.. Might there be a problem with just sde, and this brings sdd down with it?

 

Interesting.. Yeah, in the previous error messages you've posted, sde always bombed out first, followed shortly by sdd. I know with PATA drives on the same cable this is likely, but I thought with sata drives on separate cables it would be different.. maybe not, though. This makes me suspect a drive/cable/controller problem more than a power supply problem. After all, when you are testing with badblocks only one drive is under load, but when reading/writing to the array all four are under load. 

If replacing the cable to sde doesn't fix it, then you could try exchanging sde for sdb or sdc and see if the errors follow the same drive or repeat on the same sata port. Though if you start swapping cables, draw a diagram of how they were connected to the motherboard or something so you don't get them all mixed up! And maybe sticky notes on the drives :)  Then you can test each one at a time with badblocks. Not sure how software raid likes having the drives all switched around, though; might want to put them back after testing just to be safe. 

Or if you have access to another machine or an external enclosure you could isolate the sde drive that way and see if it errors again.

----------

## HeissFuss

I agree that it looks like further testing is necessary. It could be that the mobo ties SATA ports in pairs (wouldn't surprise me). If you have any extra ones, see if moving sdd to a different one isolates it from sde.

With the raid autodetect partition type, the cable ordering won't matter. It'll assemble in order.
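
By the way, if the array still won't start because of the non-fresh members, the usual way out is a forced assemble. Rough sketch, at your own risk -- stripes that were mid-write during the crash may come back inconsistent:

```shell
# Compare event counters first -- the kicked drives will be behind
mdadm --examine /dev/sd[b-e]1 | grep -i events

# Force assembly from the freshest members; mdadm picks the best
# set and brings the stale ones up to date
mdadm --assemble --force /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
```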

----------

## ingemar

I tried switching cables while still keeping sde as sde, but I get the same errors... Swapping sdd and sde on the motherboard results in sdd failing instead, so I guess the drive is faulty.. I've contacted the place where I bought it, and they are letting me send it back for a replacement. And I guess they are going to test it as well.. I just hope the error shows up in their tests too =S

----------

## CombinedEffort

Hi,

I was intrigued by your problem, since I've had similar over the last few months:

https://forums.gentoo.org/viewtopic-t-641372-highlight-.html

I went as far as replacing ALL the disks in the RAID (any excuse for an upgrade), but the problem persisted.

My array seems to be stable on 2.6.22-gentoo-r5 - anything 2.6.23 or later *seems* to cause the problem.

I'm curious about your hardware setup since, IMHO, this may be a driver issue:

```
ted ~ # lspci

00:00.0 Host bridge: nVidia Corporation C55 Host Bridge (rev a2)

00:00.1 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)

00:00.2 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)

00:00.3 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)

00:00.4 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)

00:00.5 RAM memory: nVidia Corporation C55 Memory Controller (rev a2)

00:00.6 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)

00:00.7 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)

00:01.0 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)

00:01.1 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)

00:01.2 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)

00:01.3 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)

00:01.4 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)

00:01.5 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)

00:01.6 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)

00:02.0 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)

00:02.1 RAM memory: nVidia Corporation Unknown device 03bc (rev a1)

00:02.2 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)

00:03.0 PCI bridge: nVidia Corporation C55 PCI Express bridge (rev a1)

00:09.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a1)

00:0a.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a2)

00:0a.1 SMBus: nVidia Corporation MCP55 SMBus (rev a2)

00:0b.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1)

00:0b.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2)

00:0d.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)

00:0e.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a2)

00:0e.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a2)

00:0e.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a2)

00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2)

00:0f.1 Audio device: nVidia Corporation MCP55 High Definition Audio (rev a2)

00:18.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a2)

01:00.0 VGA compatible controller: nVidia Corporation G71 [GeForce 7300 GS] (rev a1)

02:06.0 Multimedia controller: Philips Semiconductors SAA7146 (rev 01)

02:07.0 Multimedia video controller: Conexant CX23880/1/2/3 PCI Video and Audio Decoder (rev 05)

02:07.1 Multimedia controller: Conexant CX23880/1/2/3 PCI Video and Audio Decoder [Audio Port] (rev 05)

02:07.2 Multimedia controller: Conexant CX23880/1/2/3 PCI Video and Audio Decoder [MPEG Port] (rev 05)

02:07.4 Multimedia controller: Conexant CX23880/1/2/3 PCI Video and Audio Decoder [IR Port] (rev 05)

02:08.0 Ethernet controller: Lite-On Communications Inc LNE100TX (rev 20)

02:0e.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link)

03:00.0 VGA compatible controller: nVidia Corporation G71 [GeForce 7300 GS] (rev a1)

```

Cheers,

Rich.

----------

## ingemar

This is what's in my box:

```
# lspci 

00:00.0 Host bridge: Intel Corporation DRAM Controller (rev 02)

00:1a.0 USB Controller: Intel Corporation USB UHCI Controller #4 (rev 02)

00:1a.1 USB Controller: Intel Corporation USB UHCI Controller #5 (rev 02)

00:1a.2 USB Controller: Intel Corporation USB UHCI Controller #6 (rev 02)

00:1a.7 USB Controller: Intel Corporation USB2 EHCI Controller #2 (rev 02)

00:1b.0 Audio device: Intel Corporation HD Audio Controller (rev 02)

00:1c.0 PCI bridge: Intel Corporation PCI Express Port 1 (rev 02)

00:1c.4 PCI bridge: Intel Corporation PCI Express Port 5 (rev 02)

00:1c.5 PCI bridge: Intel Corporation PCI Express Port 6 (rev 02)

00:1d.0 USB Controller: Intel Corporation USB UHCI Controller #1 (rev 02)

00:1d.1 USB Controller: Intel Corporation USB UHCI Controller #2 (rev 02)

00:1d.2 USB Controller: Intel Corporation USB UHCI Controller #3 (rev 02)

00:1d.7 USB Controller: Intel Corporation USB2 EHCI Controller #1 (rev 02)

00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)

00:1f.0 ISA bridge: Intel Corporation LPC Interface Controller (rev 02)

00:1f.2 SATA controller: Intel Corporation 6 port SATA AHCI Controller (rev 02)

00:1f.3 SMBus: Intel Corporation SMBus Controller (rev 02)

01:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12)

02:00.0 IDE interface: Marvell Technology Group Ltd. 88SE6101 single-port PATA133 interface (rev b2)

03:00.0 VGA compatible controller: nVidia Corporation NV42 [GeForce 6800 XT] (rev a2)

04:03.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev c0)

```

..and all the drives are plugged into intel's AHCI sata controller, but I guess yours are on an nVidia controller, so the driver should be pretty different.. I kind of hope that my drive is faulty now, otherwise I'll have to pay a fee for the testing.. But I guess I shouldn't worry too much...

----------

## CombinedEffort

What kernel version are you running?

My SATA drivers look like this:

```
ted linux # cat .config| grep SATA

# CONFIG_BLK_DEV_IDE_SATA is not set

CONFIG_SATA_AHCI=y

# CONFIG_SATA_SVW is not set

# CONFIG_SATA_MV is not set

CONFIG_SATA_NV=y

# CONFIG_SATA_QSTOR is not set

# CONFIG_SATA_PROMISE is not set

# CONFIG_SATA_SX4 is not set

# CONFIG_SATA_SIL is not set

# CONFIG_SATA_SIL24 is not set

# CONFIG_SATA_SIS is not set

# CONFIG_SATA_ULI is not set

# CONFIG_SATA_VIA is not set

# CONFIG_SATA_VITESSE is not set

# CONFIG_SATA_INIC162X is not set

```

Cheers,

Rich.

----------

## ingemar

Hello.. I just wanted to drop by and say that everything seems to work now! I've replaced sde with a new disk, and the array has completely recovered! I guess I'll have to count on some data corruption, but most of the files seem to be intact... Thanks for all the help!
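
For the record, the replacement went something like this (from memory, so take it as a sketch; copying the partition table like this assumes the new drive is at least as big as the old one):

```shell
# Copy the MBR partition table from a healthy member to the new disk
sfdisk -d /dev/sdd | sfdisk /dev/sde

# Add the new partition to the array; the rebuild starts automatically
mdadm --manage /dev/md0 --add /dev/sde1

# Watch the resync progress
watch cat /proc/mdstat
```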

----------

