# Problems with nvidia sata drivers or RAID

## Raboo

Hello

I'm getting some nasty-looking error messages in my syslog, and I wonder what is wrong and how I can fix it.

Perhaps it's because I'm running fakeRAID/BIOS RAID using dm-raid?

Here is a sample:

```
attempt to access beyond end of device

sdb: rw=0, want=2930303995, limit=976773168

Buffer I/O error on device sdb4, logical block 351789328

attempt to access beyond end of device

sdd: rw=0, want=2930303995, limit=976773168

Buffer I/O error on device sdd4, logical block 351789328
```

Here is the full log; I've included everything from dmesg that I think has to do with the SATA disks/drivers:

```
Driver 'sd' needs updating - please use bus_type methods

Driver 'sr' needs updating - please use bus_type methods

--- snip ---

sata_nv 0000:00:0e.0: version 3.5

--- snip ---

scsi0 : sata_nv

scsi1 : sata_nv

ata1: SATA max UDMA/133 cmd 0x9f0 ctl 0xbf0 bmdma 0xf700 irq 21

ata2: SATA max UDMA/133 cmd 0x970 ctl 0xb70 bmdma 0xf708 irq 21

ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

ata1.00: ATA-7: WDC WD5000AAKS-00TMA0, 12.01C01, max UDMA/133

ata1.00: 976773168 sectors, multi 16: LBA48 NCQ (depth 0/32)

ata1.00: configured for UDMA/133

ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

ata2.00: ATA-7: ST3500630AS, 3.AAK, max UDMA/133

ata2.00: 976773168 sectors, multi 16: LBA48 NCQ (depth 0/32)

ata2.00: configured for UDMA/133

scsi 0:0:0:0: Direct-Access     ATA      WDC WD5000AAKS-0 12.0 PQ: 0 ANSI: 5

sd 0:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)

sd 0:0:0:0: [sda] Write Protect is off

sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00

sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

sd 0:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)

sd 0:0:0:0: [sda] Write Protect is off

sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00

sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

 sda: unknown partition table

sd 0:0:0:0: [sda] Attached SCSI disk

sd 0:0:0:0: Attached scsi generic sg0 type 0

scsi 1:0:0:0: Direct-Access     ATA      ST3500630AS      3.AA PQ: 0 ANSI: 5

sd 1:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)

sd 1:0:0:0: [sdb] Write Protect is off

sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00

sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

sd 1:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)

sd 1:0:0:0: [sdb] Write Protect is off

sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00

sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

 sdb: sdb1 sdb2 sdb3 sdb4

 sdb: p4 exceeds device capacity

sd 1:0:0:0: [sdb] Attached SCSI disk

sd 1:0:0:0: Attached scsi generic sg1 type 0

ACPI: PCI Interrupt Link [APSJ] enabled at IRQ 20

ACPI: PCI Interrupt 0000:00:0e.1[B] -> Link [APSJ] -> GSI 20 (level, low) -> IRQ 20

PCI: Setting latency timer of device 0000:00:0e.1 to 64

scsi2 : sata_nv

scsi3 : sata_nv

ata3: SATA max UDMA/133 cmd 0x9e0 ctl 0xbe0 bmdma 0xf200 irq 20

ata4: SATA max UDMA/133 cmd 0x960 ctl 0xb60 bmdma 0xf208 irq 20

ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

ata3.00: ATA-8: ST3500320AS, AD14, max UDMA/133

ata3.00: 976773168 sectors, multi 16: LBA48 NCQ (depth 0/32)

ata3.00: configured for UDMA/133

ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

ata4.00: ATA-8: ST3500320AS, SD15, max UDMA/133

ata4.00: 976773168 sectors, multi 16: LBA48 NCQ (depth 0/32)

ata4.00: configured for UDMA/133

scsi 2:0:0:0: Direct-Access     ATA      ST3500320AS      AD14 PQ: 0 ANSI: 5

sd 2:0:0:0: [sdc] 976773168 512-byte hardware sectors (500108 MB)

sd 2:0:0:0: [sdc] Write Protect is off

sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00

sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

sd 2:0:0:0: [sdc] 976773168 512-byte hardware sectors (500108 MB)

sd 2:0:0:0: [sdc] Write Protect is off

sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00

sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

 sdc: unknown partition table

sd 2:0:0:0: [sdc] Attached SCSI disk

sd 2:0:0:0: Attached scsi generic sg2 type 0

scsi 3:0:0:0: Direct-Access     ATA      ST3500320AS      SD15 PQ: 0 ANSI: 5

sd 3:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)

sd 3:0:0:0: [sdd] Write Protect is off

sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00

sd 3:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

sd 3:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)

sd 3:0:0:0: [sdd] Write Protect is off

sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00

sd 3:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

 sdd: sdd1 sdd2 sdd3 sdd4

 sdd: p4 exceeds device capacity

sd 3:0:0:0: [sdd] Attached SCSI disk

sd 3:0:0:0: Attached scsi generic sg3 type 0

--- snip ---

device-mapper: uevent: version 1.0.3

device-mapper: ioctl: 4.12.0-ioctl (2007-10-02) initialised: dm-devel@redhat.com

device-mapper: dm-raid45: initialized v0.2427

--- snip ---

device-mapper: dm-raid45: /dev/sdb is raid disk 0

device-mapper: dm-raid45: /dev/sda is raid disk 1

device-mapper: dm-raid45: /dev/sdc is raid disk 2

device-mapper: dm-raid45: /dev/sdd is raid disk 3

device-mapper: dm-raid45: 128/128/256 sectors chunk/io/recovery size, 64 stripes

device-mapper: dm-raid45: algorithm "xor_32", 8 chunks with 8212MB/s

device-mapper: dm-raid45: RAID5 (left symmetric) set with net 3/4 devices

device-mapper: dm-raid45: No regions to recover

--- snip ---

attempt to access beyond end of device

sdb: rw=0, want=2930303995, limit=976773168

Buffer I/O error on device sdb4, logical block 351789328

attempt to access beyond end of device

sdb: rw=0, want=2930303995, limit=976773168

Buffer I/O error on device sdb4, logical block 351789328

attempt to access beyond end of device

sdb: rw=0, want=2930304187, limit=976773168

Buffer I/O error on device sdb4, logical block 351789352

attempt to access beyond end of device

sdb: rw=0, want=2930304187, limit=976773168

Buffer I/O error on device sdb4, logical block 351789352

attempt to access beyond end of device

sdd: rw=0, want=2930303995, limit=976773168

Buffer I/O error on device sdd4, logical block 351789328

attempt to access beyond end of device

sdd: rw=0, want=2930303995, limit=976773168

Buffer I/O error on device sdd4, logical block 351789328

attempt to access beyond end of device

sdd: rw=0, want=2930304187, limit=976773168

Buffer I/O error on device sdd4, logical block 351789352

attempt to access beyond end of device

sdd: rw=0, want=2930304187, limit=976773168

Buffer I/O error on device sdd4, logical block 351789352

attempt to access beyond end of device

sdd: rw=0, want=2930304195, limit=976773168

Buffer I/O error on device sdd4, logical block 351789353

attempt to access beyond end of device

sdd: rw=0, want=2930304195, limit=976773168

Buffer I/O error on device sdd4, logical block 351789353

attempt to access beyond end of device

sdd: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdd: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdd: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdd: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdd: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdd: rw=0, want=2930304139, limit=976773168

attempt to access beyond end of device

sdd: rw=0, want=2930304187, limit=976773168

attempt to access beyond end of device

sdd: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdd: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdb: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdb: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdb: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdb: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdb: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdb: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdb: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdb: rw=0, want=2930304139, limit=976773168

attempt to access beyond end of device

sdb: rw=0, want=2930304187, limit=976773168

attempt to access beyond end of device

sdb: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdb: rw=0, want=2930304195, limit=976773168

--- snip ---

attempt to access beyond end of device

sdb: rw=0, want=2930303995, limit=976773168

Buffer I/O error on device sdb4, logical block 351789328

attempt to access beyond end of device

sdb: rw=0, want=2930303995, limit=976773168

attempt to access beyond end of device

sdb: rw=0, want=2930304187, limit=976773168

attempt to access beyond end of device

sdb: rw=0, want=2930304187, limit=976773168

attempt to access beyond end of device

sdb: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdb: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdb: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdb: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdb: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdb: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdb: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdb: rw=0, want=2930304139, limit=976773168

attempt to access beyond end of device

sdb: rw=0, want=2930304187, limit=976773168

attempt to access beyond end of device

sdb: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdb: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdd: rw=0, want=2930303995, limit=976773168

attempt to access beyond end of device

sdd: rw=0, want=2930303995, limit=976773168

attempt to access beyond end of device

sdd: rw=0, want=2930304187, limit=976773168

attempt to access beyond end of device

sdd: rw=0, want=2930304187, limit=976773168

attempt to access beyond end of device

sdd: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdd: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdd: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdd: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdd: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdd: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdd: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdd: rw=0, want=2930304139, limit=976773168

attempt to access beyond end of device

sdd: rw=0, want=2930304187, limit=976773168

attempt to access beyond end of device

sdd: rw=0, want=2930304195, limit=976773168

attempt to access beyond end of device

sdd: rw=0, want=2930304195, limit=976773168

--- snip ---
```

Also, I don't know if it's needed, but here is my fstab:

```
# /etc/fstab: static file system information.

/dev/mapper/nvidia_egbdcdib1      /boot      ext2      noauto,noatime   1 2

/dev/mapper/nvidia_egbdcdib3      /      ext3      defaults,noatime   0 1

/dev/cdrom               /mnt/cdrom   auto      noauto,ro   0 0

shm         /dev/shm   tmpfs         nodev,nosuid,noexec   0 0

tmpfs         /tmp      tmpfs      size=1000m,mode=1777   0 0

tmpfs         /var/tmp/portage tmpfs   size=4000m,uid=250,gid=250,mode=0775   0 0

/dev/mapper/nvidia_egbdcdib4 /mnt/storage   ntfs-3g users,user,noauto,silent,umask=000    0 0

/dev/mapper/nvidia_egbdcdib2 /mnt/win      ntfs-3g users,user,noauto,silent,umask=000    0 0
```

----------

## NeddySeagoon

Raboo,

Something is trying to access partitions on an individual drive, and that's not allowed.

On a dm-raid raid set, you have a single partition table that describes the partitions on the volume set.

Look with 

```
fdisk -l
```

Only one of the drives in your raid set should have a partition table describing all the space.

```
attempt to access beyond end of device

sdb: rw=0, want=2930303995, limit=976773168

Buffer I/O error on device sdb4, logical block 351789328 
```

Your error comes from something using the partition table, which describes several drives' worth of space, to access something on a single drive.
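The numbers in the log bear this out; a quick sanity check with the sector counts from the messages above (a sketch, assuming four 500 GB drives in a RAID5 set as described in this thread):

```shell
# Each 500 GB drive reports 976773168 sectors (the "limit" in the log).
# A 4-drive RAID5 set nets three drives' worth of usable space.
per_disk=976773168
array=$((3 * per_disk))   # usable sectors across the RAID5 set
want=2930303995           # the sector the kernel was asked to read
echo "array sectors: $array"
echo "requested    : $want"
# The request is valid for the array but roughly 3x past a single disk:
[ "$want" -lt "$array" ] && [ "$want" -gt "$per_disk" ] && \
  echo "inside the array, beyond any one disk"
```

So the requested sector is a perfectly sensible address for the composite volume, just not for one member disk.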

Depending on your BIOS, you may be able to see /dev/sd.. entries for the underlying drives, and even the partitions on the drive holding the partition table, but nothing should ever try to use them.

Looking deeper into your dmesg ...

```
 sda: unknown partition table

 sdb: sdb1 sdb2 sdb3 sdb4

 sdb: p4 exceeds device capacity 

 sdc: unknown partition table 

 sdd: sdd1 sdd2 sdd3 sdd4

 sdd: p4 exceeds device capacity 
```

shows you have two copies of the partition table. I was expecting only one, but it's raid5, and it would never do to lose your only copy of the partition table; that was faulty thinking on my part ... so that's ok.

So, what is doing accesses to the underlying drives?

----------

## Raboo

 *NeddySeagoon wrote:*   

> So, what is doing accesses to the underlying drives ?

 

Good question... How do I get help with that? Do I make a list of processes running and what's in the crontab?

----------

## NeddySeagoon

Raboo,

Reboot and post the entire dmesg ... it may just be there.

Whatever it is, is doomed to fail one way or another. Even if it asks for a block number that actually exists, it won't like what is read.

Something is doing raw device access on a device it's not supposed to. Rebuild your kernel with timestamps on. It's under

```
 Kernel hacking  ---> 

[*] Show timing information on printks    
```

This puts timestamps on kernel messages. Maybe something is polling the drive, e.g. an automounter trying to do something.
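As a sketch of what the timestamps buy you: each dmesg line gains a seconds-since-boot prefix, so you can measure how far apart the error bursts are. The two lines below are fabricated samples in the kernel's timestamp format, not taken from this log:

```shell
# Two hypothetical timestamped dmesg lines:
t1='[   42.123456] attempt to access beyond end of device'
t2='[   52.654321] attempt to access beyond end of device'
# Strip each down to the whole seconds and subtract:
s1=$(printf '%s\n' "$t1" | sed 's/^\[ *\([0-9]*\)\..*/\1/')
s2=$(printf '%s\n' "$t2" | sed 's/^\[ *\([0-9]*\)\..*/\1/')
echo "gap between bursts: $((s2 - s1))s"   # prints 10s for these samples
```

A regular gap like that points at something polling on a timer rather than a one-off scan.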

----------

## Raboo

Okay, hope it helps; the output of my dmesg is really long:

 *Quote:*   

> [    0.000000] Linux version 2.6.25-gentoo-r7 (root@chefen) (gcc version 4.1.2 (Gentoo 4.1.2 p1.0.2)) #4 SMP Fri Sep 19 00:49:17 CEST 2008
> 
> [    0.000000] Command line: root=/dev/ram0 init=/linuxrc ramdisk=8192 real_root=/dev/mapper/nvidia_egbdcdib3 dodmraid
> 
> [    0.000000] BIOS-provided physical RAM map:
> ...

 

----------

## NeddySeagoon

Raboo,

You have both dmraid and kernel raid built:

```
[ 9.177288] md: raid6 personality registered for level 6

[ 9.177290] md: raid5 personality registered for level 5

[ 9.177292] md: raid4 personality registered for level 4

[ 9.312424] md: raid10 personality registered for level 10 
```

It should be harmless, but kernel raid will try to auto-assemble raid sets from the underlying partitions if the partition type is 0xfd.

Assembling raid sets needs reads of the underlying partitions, which is what you are seeing.
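You can check whether any partition carries the 0xfd (Linux raid autodetect) type that triggers that scan; as root, `fdisk -l /dev/sdb` will show the Id column. A sketch of what to look for (the fdisk output line below is a fabricated sample, not from this system):

```shell
# A sample line as fdisk -l would print it for a 0xfd partition:
line='/dev/sdb4        60802      121601   488336310   fd  Linux raid autodetect'
# The "Id" column holds the partition type; fd means md will scan it:
case "$line" in
  *' fd '*) echo "0xfd partition: kernel raid will try to auto-assemble it" ;;
  *)        echo "no raid-autodetect type here" ;;
esac
```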

Rebuild your kernel with

```
  │ │    --- Multiple devices driver support (RAID and LVM)              │ │  

  │ │    <>   RAID support   
```

That's kernel raid off. It's not used by dmraid.

----------

## Raboo

 *NeddySeagoon wrote:*   

> 
> 
> Rebuild your kernel with
> 
> ```
> ...

 

I tried that; it didn't help... Could it be the device-mapper or something?

----------

## NeddySeagoon

Raboo,

Post the startup dmesg again please - that was all I could see last time.

It shouldn't be device-mapper; that should use the BIOS-provided mappings to access the composite device.

----------

## Raboo

It seems it doesn't fit in a post, so I've posted it on pastebin: http://pastebin.com/f7583b021

btw NeddySeagoon, thank you for your interest in my problem

----------

## NeddySeagoon

Raboo,

Whatever it is appears to have two goes about 10 seconds apart, then gives up.

Do you have automounter support in your kernel?

It polls drives to see if media has been inserted. It's not supposed to poll hard drives, but is it?

It's another thing that does low-level raw device access.

If you like automounter support (I don't), you probably need it.

If the problem is really two bursts at startup, I would be tempted to let it go.

You need to rebuild your kernel if you want to try without automounter support, but like I say, you may not like the answer, so is it worth asking the question?
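Before rebuilding, it's worth confirming the automounter is actually in the running kernel. If the kernel was built with CONFIG_IKCONFIG_PROC, `zgrep AUTOFS /proc/config.gz` will show it without digging out the .config file. A sketch of reading the result (the config line below is a made-up sample):

```shell
# A sample line as it would appear in /proc/config.gz:
opt='CONFIG_AUTOFS4_FS=m'
# =y means built in, =m means built as a module, anything else is absent:
case "$opt" in
  *=y) echo "automounter built in" ;;
  *=m) echo "automounter built as a module" ;;
  *)   echo "automounter not enabled" ;;
esac
```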

----------

## Raboo

I tried disabling the automounter in the kernel, but it didn't help...

----------

## danomac

I've been struggling with this for a few days now. I'm using nvraid with a raid10 over 4 drives.

I have a possible solution; it sure isn't pretty, but it does shut up the flood of messages.

The issue lies with how the RAID stores the partition tables: in my case there are two sets raided together to get the raid10. The RAID stores the partition table on the first disk of each set, and of course the partition table spans the whole array, not just that single disk. So the kernel complains whenever the addressing falls outside the physical disk. (In my case that means tons of messages, as / is mounted on the fourth partition...) So I had floods of messages for sda and sdc.

Enough of my crappy explanation. In short, after a lot of research, it doesn't harm anything. However, those messages make dmesg absolutely useless when trying to troubleshoot something. After much experimenting and reading about udev and hal, I found a mechanism to silence the error messages simply by ignoring the individual partitions on the first disk of each array. I thought about ignoring the entire disk at first, but decided against it in case nvraid is still using it internally.

There are two parts to this solution (and I must add I'm a udev newbie, so try this at your own risk!!!): one is udev and the other is hal.

First, udev: we need to ignore the partitions of the first disk in each array. Create /etc/udev/rules.d/09dmraid.rules with these contents (you will have to edit this to suit your configuration):

```

KERNEL=="sda1", OPTIONS+="ignore_device"

KERNEL=="sda2", OPTIONS+="ignore_device"

KERNEL=="sda3", OPTIONS+="ignore_device"

KERNEL=="sda4", OPTIONS+="ignore_device"

KERNEL=="sdc1", OPTIONS+="ignore_device"

KERNEL=="sdc2", OPTIONS+="ignore_device"

KERNEL=="sdc3", OPTIONS+="ignore_device"

KERNEL=="sdc4", OPTIONS+="ignore_device"

```

In that example I'm ignoring two raid sets, sda and sdc. Remember, don't ignore the entire disk; who knows what will happen!
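For the new rules to take effect without a reboot, udev has to re-read them; the command name depends on your udev version, and both need root, so they are shown commented here. The runnable part below is just a quick sanity check on a sample rule line like the ones above:

```shell
# Reload udev rules (run as root; pick the one your udev version has):
#   udevcontrol reload_rules          # older udev, as around 2008
#   udevadm control --reload-rules    # newer udev
# Sanity-check that a rule line pairs a KERNEL match with the ignore option:
rule='KERNEL=="sda1", OPTIONS+="ignore_device"'
echo "$rule" | grep -q 'KERNEL=="sd' && echo "rule targets a partition"
echo "$rule" | grep -q 'ignore_device' && echo "rule ignores the device"
```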

Second, hal: we need to create /etc/hal/fdi/preprobe/00dmraid.fdi with these contents (you absolutely, positively will have to edit this to suit your configuration, see below...):

```

<?xml version="1.0" encoding="UTF-8"?>

<deviceinfo version="0.2">

        <device>

                <match key="scsi.model" string="ST3500320NS">

                        <merge key="info.ignore" type="bool">true</merge>

                </match>

        </device>

</deviceinfo>

```

In order to edit this properly, you need to know the model of the drive you want to ignore. To do this, simply use:

```

$ cat /sys/block/sd[abcd]/device/model

ST3500320NS

ST3500320NS

ST3500320NS

ST3500320NS

```

Of course these should all be the same, but you never know...

----------

## devsk

What version of dmraid are you using? rc13 and before were known to have that issue. Try 1.0.0_rc14 and the latest stable kernel.

EDIT: the only version available in portage is rc14, so most likely this is not your problem... :Sad: 

----------

## Raboo

Thanks, this seems to work, but I wonder: is the problem fixed, or are the messages just suppressed?

----------

## devsk

 *Raboo wrote:*   

> Thanks, this seems to work, but I wonder, is the problem fixed or is the messages suppressed?

You mean you were not running rc14?

rc14 definitely fixed the problem, because I got that error on my NVRAID some time back. And no, it did not just suppress the messages.... :Smile: 

----------

## Raboo

I'm running rc14, but I still had to apply danomac's fix for the error messages to disappear from syslog. So my question still stands: is this a fix for the problem or just a hack to make the messages go away?

----------

## danomac

That's just to hide the messages. I had to do that so I could troubleshoot some driver in the kernel, and it was hard to get information out of it with millions of messages in it.

Something in dmraid is causing the issue. As I said earlier, it seems to be harmless.

I'm also running rc14 and I still have the issues. I checked genkernel, and it's running rc14 too...

Again, the problem lies with how the partition tables are stored on the arrays.

Edit: If using dmraid, I wonder why udev even bothers mapping sda-sdd. You'd think it wouldn't be needed...

----------

## Raboo

 *danomac wrote:*   

> Edit: If using dmraid, i wonder why udev even bothers mapping sda-sdd. You'd think it wouldn't be needed...

 

Hmm, isn't it possible to create a udev rule to not map sda-sdd?

----------

## danomac

 *Raboo wrote:*   

> 
> 
> Hmm isn't it possible to create a udev rule to not map sda-sdd?

 

It is, but I read somewhere on the web that this might be a bad idea, so I started probing around looking for ways to ignore things instead. That's why I ignored the partitions and not the entire drive in my udev rules.

I'd set up a test PC to try it out, but I only have one PC with onboard RAID, and I need it to work.  :Wink: 

----------

