# Gentoo KVM guest loses disk access

## mrray

Hi all! (Long time, no see)

Anyways, I have a Gentoo guest running on a KVM host with a cloud provider.

The guest resides on an SSD array, which gives massive performance on a server that runs semi-I/O-intensive stuff like amavisd-new and some other things.

The problem, however, is that the guest keeps losing disk access at random intervals. It can be after 4 weeks, but it has also happened after as little as 8 hours of uptime.

A reboot solves the immediate problem, but it means I have to be available to do just that, and I like (and need) my beauty sleep.

The provider's support department has been very forthcoming on this issue and has made some configuration changes to the virtual hardware, but they also suggested I head on over here and ask if anyone has seen the same problem with Gentoo KVM guests.

Here is the kernel log from the guest:

```
Apr 20 03:07:21 [kernel] [367916.497177] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

Apr 20 03:07:21 [kernel] [367916.497184] ata1.00: failed command: WRITE DMA

Apr 20 03:07:21 [kernel] [367916.497188] ata1.00: cmd ca/00:08:5b:f3:e9/00:00:00:00:00/e2 tag 0 dma 4096 out

Apr 20 03:07:21 [kernel] [367916.497188] res 40/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)

Apr 20 03:07:21 [kernel] [367916.497190] ata1.00: status: { DRDY }

Apr 20 03:07:21 [kernel] [367916.502915] ata1: soft resetting link

Apr 20 03:07:21 [kernel] [367916.654706] ata1.01: NODEV after polling detection

Apr 20 03:07:21 [kernel] [367916.655703] ata1.00: configured for MWDMA2

Apr 20 03:07:21 [kernel] [367916.655711] ata1.00: device reported invalid CHS sector 0

Apr 20 03:07:21 [kernel] [367916.655728] sd 0:0:0:0: [sda]

Apr 20 03:07:21 [kernel] [367916.655730] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE

Apr 20 03:07:21 [kernel] [367916.655731] sd 0:0:0:0: [sda]

Apr 20 03:07:21 [kernel] [367916.655733] Sense Key : Aborted Command [current] [descriptor]

Apr 20 03:07:21 [kernel] [367916.655735] Descriptor sense data with sense descriptors (in hex):

Apr 20 03:07:21 [kernel] [367916.655736] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00

Apr 20 03:07:21 [kernel] [367916.655740] 00 00 00 00

Apr 20 03:07:21 [kernel] [367916.655743] sd 0:0:0:0: [sda]

Apr 20 03:07:21 [kernel] [367916.655744] Add. Sense: No additional sense information

Apr 20 03:07:21 [kernel] [367916.655746] sd 0:0:0:0: [sda] CDB:

Apr 20 03:07:21 [kernel] [367916.655746] Write(10): 2a 00 02 e9 f3 5b 00 00 08 00

Apr 20 03:07:21 [kernel] [367916.655751] end_request: I/O error, dev sda, sector 48886619

Apr 20 03:07:21 [kernel] [367916.655754] Buffer I/O error on device sda3, logical block 103

Apr 20 03:07:21 [kernel] [367916.655755] lost page write due to I/O error on sda3

Apr 20 03:07:21 [kernel] [367916.655772] ata1: EH complete

Apr 20 03:07:21 [kernel] [367916.829822] REISERFS abort (device sda3): Journal write error in flush_commit_list

Apr 20 03:10:08 [kernel] [367988.070159] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

Apr 20 03:10:08 [kernel] [367988.070165] ata1.00: failed command: WRITE DMA

Apr 20 03:10:08 [kernel] [367988.070169] ata1.00: cmd ca/00:08:04:fb:00/00:00:00:00:00/e1 tag 0 dma 4096 out

Apr 20 03:10:08 [kernel] [367988.070169] res 40/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)

Apr 20 03:10:08 [kernel] [367988.070171] ata1.00: status: { DRDY }

Apr 20 03:10:08 [kernel] [367988.070288] ata1: soft resetting link

Apr 20 03:10:08 [kernel] [367988.221587] ata1.01: NODEV after polling detection

Apr 20 03:10:08 [kernel] [367988.222453] ata1.00: configured for MWDMA2

Apr 20 03:10:08 [kernel] [367988.222458] ata1.00: device reported invalid CHS sector 0

Apr 20 03:10:08 [kernel] [367988.222474] sd 0:0:0:0: [sda]

Apr 20 03:10:08 [kernel] [367988.222476] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE

Apr 20 03:10:08 [kernel] [367988.222478] sd 0:0:0:0: [sda]

Apr 20 03:10:08 [kernel] [367988.222480] Sense Key : Aborted Command [current] [descriptor]

Apr 20 03:10:08 [kernel] [367988.222483] Descriptor sense data with sense descriptors (in hex):

Apr 20 03:10:08 [kernel] [367988.222484] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00

Apr 20 03:10:08 [kernel] [367988.222490] 00 00 00 00

Apr 20 03:10:08 [kernel] [367988.222493] sd 0:0:0:0: [sda]

Apr 20 03:10:08 [kernel] [367988.222495] Add. Sense: No additional sense information

Apr 20 03:10:08 [kernel] [367988.222497] sd 0:0:0:0: [sda] CDB:

Apr 20 03:10:08 [kernel] [367988.222498] Write(10): 2a 00 01 00 fb 04 00 00 08 00

Apr 20 03:10:08 [kernel] [367988.222504] end_request: I/O error, dev sda, sector 16841476

Apr 20 03:10:08 [kernel] [367988.222507] Buffer I/O error on device sda2, logical block 2097152

Apr 20 03:10:08 [kernel] [367988.222508] lost page write due to I/O error on sda2

Apr 20 03:10:08 [kernel] [367988.222527] ata1: EH complete

Apr 20 03:10:08 [kernel] [368034.360717] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

Apr 20 03:10:08 [kernel] [368034.360723] ata1.00: failed command: WRITE DMA

Apr 20 03:10:08 [kernel] [368034.360727] ata1.00: cmd ca/00:08:14:fd:10/00:00:00:00:00/e2 tag 0 dma 4096 out

Apr 20 03:10:08 [kernel] [368034.360727] res 40/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)

Apr 20 03:10:08 [kernel] [368034.360729] ata1.00: status: { DRDY }

Apr 20 03:10:08 [kernel] [368034.360848] ata1: soft resetting link

Apr 20 03:10:08 [kernel] [368034.512510] ata1.01: NODEV after polling detection

Apr 20 03:10:08 [kernel] [368034.513452] ata1.00: configured for MWDMA2

Apr 20 03:10:08 [kernel] [368034.513456] ata1.00: device reported invalid CHS sector 0

Apr 20 03:10:08 [kernel] [368034.513471] sd 0:0:0:0: [sda]

Apr 20 03:10:08 [kernel] [368034.513473] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE

Apr 20 03:10:08 [kernel] [368034.513474] sd 0:0:0:0: [sda]

Apr 20 03:10:08 [kernel] [368034.513476] Sense Key : Aborted Command [current] [descriptor]

Apr 20 03:10:08 [kernel] [368034.513478] Descriptor sense data with sense descriptors (in hex):

Apr 20 03:10:08 [kernel] [368034.513479] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00

Apr 20 03:10:08 [kernel] [368034.513483] 00 00 00 00

Apr 20 03:10:08 [kernel] [368034.513485] sd 0:0:0:0: [sda]

Apr 20 03:10:08 [kernel] [368034.513486] Add. Sense: No additional sense information

Apr 20 03:10:08 [kernel] [368034.513488] sd 0:0:0:0: [sda] CDB:

Apr 20 03:10:08 [kernel] [368034.513489] Write(10): 2a 00 02 10 fd 14 00 00 08 00

Apr 20 03:10:08 [kernel] [368034.513494] end_request: I/O error, dev sda, sector 34667796

Apr 20 03:10:08 [kernel] [368034.513502] Buffer I/O error on device sda2, logical block 4325442

Apr 20 03:10:08 [kernel] [368034.513503] lost page write due to I/O error on sda2

Apr 20 03:10:08 [kernel] [368034.513520] ata1: EH complete

Apr 20 03:10:08 [kernel] [368084.435738] ata1.00: limiting speed to MWDMA1:PIO2

Apr 20 03:10:08 [kernel] [368084.435744] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

Apr 20 03:10:08 [kernel] [368084.435748] ata1.00: failed command: WRITE DMA

Apr 20 03:10:08 [kernel] [368084.435753] ata1.00: cmd ca/00:08:b4:10:39/00:00:00:00:00/e2 tag 0 dma 4096 out

Apr 20 03:10:08 [kernel] [368084.435753] res 40/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)

Apr 20 03:10:08 [kernel] [368084.435755] ata1.00: status: { DRDY }

Apr 20 03:10:08 [kernel] [368084.435876] ata1: soft resetting link

Apr 20 03:10:08 [kernel] [368084.587551] ata1.01: NODEV after polling detection

Apr 20 03:10:08 [kernel] [368084.588471] ata1.00: configured for MWDMA1

Apr 20 03:10:08 [kernel] [368084.588476] ata1.00: device reported invalid CHS sector 0

Apr 20 03:10:08 [kernel] [368084.588493] sd 0:0:0:0: [sda]

Apr 20 03:10:08 [kernel] [368084.588494] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE

Apr 20 03:10:08 [kernel] [368084.588496] sd 0:0:0:0: [sda]

Apr 20 03:10:08 [kernel] [368084.588497] Sense Key : Aborted Command [current] [descriptor]

Apr 20 03:10:08 [kernel] [368084.588501] Descriptor sense data with sense descriptors (in hex):

Apr 20 03:10:08 [kernel] [368084.588502] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00

Apr 20 03:10:08 [kernel] [368084.588508] 00 00 00 00

Apr 20 03:10:08 [kernel] [368084.588511] sd 0:0:0:0: [sda]

Apr 20 03:10:08 [kernel] [368084.588512] Add. Sense: No additional sense information

Apr 20 03:10:08 [kernel] [368084.588514] sd 0:0:0:0: [sda] CDB:

Apr 20 03:10:08 [kernel] [368084.588515] Write(10): 2a 00 02 39 10 b4 00 00 08 00

Apr 20 03:10:08 [kernel] [368084.588524] Buffer I/O error on device sda2, logical block 4653750

Apr 20 03:10:08 [kernel] [368084.588526] lost page write due to I/O error on sda2

Apr 20 03:10:08 [kernel] [368084.588548] ata1: EH complete

```
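
A note for reading the log: each `Write(10)` CDB line carries the same address that the following `end_request` line reports. Here is a minimal Python sketch that decodes it, assuming the standard 10-byte WRITE(10) layout (opcode `0x2a`, a 4-byte big-endian LBA in bytes 2 to 5, a 2-byte big-endian transfer length in bytes 7 and 8). The function name `decode_write10` is made up for illustration.

```python
def decode_write10(cdb_hex):
    """Return (lba, block_count) for a SCSI WRITE(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between byte pairs is ignored
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # starting logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # number of blocks to write
    return lba, blocks


# First failed write from the log above.
lba, blocks = decode_write10("2a 00 02 e9 f3 5b 00 00 08 00")
print(lba, blocks)  # 48886619 8
```

The decoded LBA matches the sector in the `end_request` line, and 8 blocks at 512 bytes each (an assumption about the sector size) is the 4096-byte transfer shown in the `dma 4096 out` line. So these are ordinary 4 KiB writes timing out, not writes to unusual addresses.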

I originally thought it was a hardware issue, but the provider seems to think differently, and I just have to go with that since I don't have access to the hardware...

Any ideas?

----------

## NeddySeagoon

mrray,

It looks like a HDD or HDD data cable issue.

The SMART error log would be useful.

I guess the storage your provider shows your KVM as sda is spread over a lot of physical devices.

I'm surprised to see your storage appear as /dev/sda too.  That suggests you are working through the emulated hardware that KVM provides.

The virtio driver is faster but your provider may not want to use that.  Your block devices would be /dev/vda ...  then.

Bugs in KVM cannot be ruled out. Search the kernel bugtracker.

Will your KVM provider provide storage access via virtio?

That may help narrow down the problem.
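
As a rough illustration of the `/dev/sda` versus `/dev/vda` point, the sketch below guesses the disk bus from the kernel's block device names under `/sys/block`. It is only a name-based heuristic: `sd*` covers any disk handled by the SCSI/libata layer, so on a KVM guest it usually means an emulated controller but does not prove it. The function names and label strings are made up for this example.

```python
import os


def classify_disk(name):
    """Guess the disk bus from a kernel block device name (heuristic only)."""
    if name.startswith("vd"):
        return "virtio-blk"
    if name.startswith("sd"):
        return "scsi-or-ata"
    if name.startswith("hd"):
        return "legacy-ide"
    return "other"


def list_disks(sys_block="/sys/block"):
    """Map every device listed in sysfs to its guessed bus type."""
    if not os.path.isdir(sys_block):
        return {}
    return {name: classify_disk(name) for name in sorted(os.listdir(sys_block))}


if __name__ == "__main__":
    for name, bus in list_disks().items():
        print(name, bus)
```

On the guest described in this thread, `sda` would come back as `scsi-or-ata`. After a switch to virtio, the same disk would be expected to show up as `vda`.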

----------

## mrray

 *NeddySeagoon wrote:*   

> mrray,
> 
> It looks like a HDD or HDD data cable issue.
> 
> The SMART error log would be useful.

 

I installed smartmontools just now, so I will see what I can cough up and keep you posted. 

 *NeddySeagoon wrote:*   

> I'm surprised to see your storage appear as /dev/sda too.  That suggests you are working through the emulated hardware that KVM provides.
> 
> The virtio driver is faster but your provider may not want to use that.  Your block devices would be /dev/vda ...  then.
> 
> Bugs in KVM cannot be ruled out. Search the kernel bugtracker.
> ...

 

I have to do a bit of guesswork here, as I have very limited experience with the KVM hypervisor, but this VM was converted from its original Xen setup to KVM when I migrated it onto SSD storage.

I am quite sure the provider can give me access via virtio, but wouldn't that require a lot of work on my part? Gentoo and virtio do not seem to be good friends; at least that is what I deduced from a quick Google search.
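
On the question of how much work virtio would be: the guest-side part is mostly a kernel configuration matter. The sketch below checks a kernel config for `CONFIG_VIRTIO_PCI` and `CONFIG_VIRTIO_BLK`, which are assumed here to be the minimum for a virtio disk on a PCI-based KVM guest. The helper name `virtio_status` and the sample text are hypothetical. The real config normally lives in `/usr/src/linux/.config`, or in `/proc/config.gz` if the kernel exposes it.

```python
# Assumed minimum set for a virtio-blk disk on a PCI-based KVM guest.
REQUIRED = ("CONFIG_VIRTIO_PCI", "CONFIG_VIRTIO_BLK")


def virtio_status(config_text):
    """Return {option: 'y' | 'm' | 'missing'} for each required option."""
    found = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Lines like "# CONFIG_FOO is not set" are comments, i.e. disabled.
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        found[key] = value
    return {opt: found.get(opt, "missing") for opt in REQUIRED}


# Hypothetical excerpt of a kernel .config, for illustration only.
sample = "CONFIG_VIRTIO_PCI=y\n# CONFIG_VIRTIO_BLK is not set\n"
print(virtio_status(sample))
```

If the root filesystem sits on the virtio disk, the options would need to be built in (`y`) rather than as modules (`m`), unless an initramfs loads them. Any `/dev/sda` references in `/etc/fstab` and in the bootloader's `root=` parameter would also have to become `/dev/vda`.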

----------

