# Mysterious DRDY

## Corona688

I have two 2TB disks in a software RAID I'm trying to use.

```
~ # grep -v "^#" /etc/mdadm.conf
DEVICE /dev/sd[cd]
ARRAY /dev/md1 UUID=a863f109:403305bb:11806f70:607e1cb2

~ # fdisk -l /dev/md1

Disk /dev/md1: 2000.4 GB, 2000398843904 bytes
2 heads, 4 sectors/track, 488378624 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk identifier: 0x63671c9f

    Device Boot      Start         End      Blocks   Id  System
/dev/md1p1               1       16385       65538   83  Linux
/dev/md1p2           16386      540674     2097156   83  Linux
/dev/md1p3          540675     1064963     2097156   82  Linux swap / Solaris
/dev/md1p4         1064964   488378624  1949254644    5  Extended
/dev/md1p5         1064964     6307844    20971522   83  Linux
/dev/md1p6         6307845     8929285    10485762   83  Linux
/dev/md1p7         8929286    16793606    31457282   83  Linux
/dev/md1p8        16793607   121651207   419430402   83  Linux
/dev/md1p9       121651208   488378624  1466909666   83  Linux
```

Because of what I believed at the time to be a controller failure, I failed one disk out of the array so I could use the other independently through a different controller (still through /dev/md1). Now I have both disks back on their normal controller:

```
~ # cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdd[1] sdc[2]
      1953514496 blocks [2/1] [_U]
      [=>...................]  recovery =  6.9% (136338688/1953514496) finish=412.5min speed=73414K/sec

unused devices: <none>
```

But I'm getting things like this in dmesg:

```
[  781.088191] ata5: illegal qc_active transition (00000007->ffffffff)
[  781.088233] ata5.00: exception Emask 0x2 SAct 0x7 SErr 0x0 action 0x6 frozen
[  781.088242] ata5.00: cmd 60/80:00:00:02:67/00:00:03:00:00/40 tag 0 ncq 65536 in
[  781.088244]          res 40/00:10:00:03:67/00:00:03:00:00/40 Emask 0x2 (HSM violation)
[  781.088247] ata5.00: status: { DRDY }
[  781.088254] ata5.00: cmd 60/80:08:80:02:67/00:00:03:00:00/40 tag 1 ncq 65536 in
[  781.088255]          res 40/00:10:00:03:67/00:00:03:00:00/40 Emask 0x2 (HSM violation)
[  781.088258] ata5.00: status: { DRDY }
[  781.088265] ata5.00: cmd 60/80:10:00:03:67/00:00:03:00:00/40 tag 2 ncq 65536 in
[  781.088267]          res 40/00:10:00:03:67/00:00:03:00:00/40 Emask 0x2 (HSM violation)
[  781.088270] ata5.00: status: { DRDY }
[  781.088275] ata5: hard resetting link
[  781.572702] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  781.575868] ata5.00: configured for UDMA/133
[  781.575878] ata5: EH complete
```

...errors which, eerily, the mdadm sync doesn't seem to give a damn about.

The errors are usually rare, but at least a few happen each day, and not always on the same disk. Sometimes the machine even locks up completely. If I use only one disk, the errors become extremely rare, but they still happen once every few weeks. What's going on?
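Since the errors are not always on the same disk, it may help to tally them per ATA device. The following is a minimal sketch, assuming the kernel log uses the `ataN.NN: exception` format shown in the dmesg excerpt above; the function name `count_ata_exceptions` is made up for illustration.

```shell
# Hypothetical helper: count libata "exception" lines per ATA device.
# Reads kernel log text on stdin, prints "<device> <count>" per line.
count_ata_exceptions() {
    grep -o 'ata[0-9]*\.[0-9]*: exception' |  # keep only e.g. "ata5.00: exception"
        cut -d: -f1 |                         # strip down to the device, e.g. "ata5.00"
        sort | uniq -c |                      # count occurrences per device
        awk '{ print $2, $1 }'                # reorder to "device count"
}
```

It could be used as `dmesg | count_ata_exceptions`. If one port dominates the tally, that would point at that disk or its cable; an even spread would hint more at the controller or driver.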

----------

## Corona688

Bump!

----------

## linuxtuxhellsinki

SSD users have reported similar errors, and there is some kind of udev-rule workaround.

A few links for information:

https://ata.wiki.kernel.org/index.php/Libata_error_messages

> this error can be anything - driver bug, faulty device, controller and/or cable.

http://tali.admingilde.org/linux-docbook/libata/ch07.html#excatHSMviolation

Check the SATA cables, and maybe try turning NCQ off or on?
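One way to toggle NCQ at runtime is the `queue_depth` attribute in sysfs: libata treats a queue depth of 1 as NCQ off. This is only a sketch; the function names and the `SYSFS_ROOT` override are hypothetical and exist so the helpers can be dry-run against a scratch directory.

```shell
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the current queue depth for a disk name such as "sdc".
ncq_depth() {
    cat "$SYSFS_ROOT/block/$1/device/queue_depth"
}

# Set the queue depth to 1, so only one command is outstanding at a time.
# Needs root on a real system and does not persist across reboots.
ncq_disable() {
    echo 1 > "$SYSFS_ROOT/block/$1/device/queue_depth"
}
```

For the array in this thread that would be `ncq_disable sdc; ncq_disable sdd`, then watching whether the HSM violations stop. Booting with the `libata.force=noncq` kernel parameter should have a similar effect for all ports.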

There was also something about disabling HAL polling (in the case of CD drives):

```
# hal-disable-polling --device /dev/scd0
```

----------

## Corona688

It's not an SSD. They're on an adaptor card, so disabling it in the BIOS isn't possible (and may not be relevant anyway, since they're not SSDs). I could try replacing the cables -- again -- but I believe them to be fine.
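A rough way to test the cables without swapping them is the SMART interface CRC counter, which many drives report as attribute 199 (`UDMA_CRC_Error_Count`). The sketch below assumes the usual `smartctl -A` table layout with the raw value in the tenth column; attribute numbering and naming vary by vendor, and the function name `crc_error_count` is made up for illustration.

```shell
# Hypothetical helper: print the raw value of SMART attribute 199
# from "smartctl -A" output read on stdin.
crc_error_count() {
    # Assumes the ID is in column 1 and RAW_VALUE in column 10.
    awk '$1 == 199 { print $10 }'
}
```

Usage would be `smartctl -A /dev/sdc | crc_error_count`, repeated for `/dev/sdd`. A value that keeps climbing between checks would suggest link-level problems such as a bad cable or connector; a value that stays flat while the errors continue would make the cables less likely.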

----------

## Corona688

Bump!

----------

## Corona688

Bump!

----------

## Corona688

Bump!

----------

