# I/O errors

## Adel Ahmed

After a power outage I get the following:

Dec 31 15:10:52 pc.home kernel: sd 4:0:0:0: [sdd] tag#24 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08

Dec 31 15:10:52 pc.home kernel: sd 4:0:0:0: [sdd] tag#24 Sense Key : 0x3 [current] [descriptor] 

Dec 31 15:10:52 pc.home kernel: sd 4:0:0:0: [sdd] tag#24 ASC=0x11 ASCQ=0x4 

Dec 31 15:10:52 pc.home kernel: sd 4:0:0:0: [sdd] tag#24 CDB: opcode=0x28 28 00 00 00 00 80 00 00 08 00

Dec 31 15:10:52 pc.home kernel: blk_update_request: I/O error, dev sdd, sector 128

Dec 31 15:10:52 pc.home kernel: Buffer I/O error on dev sdd, logical block 16, async page read

Dec 31 15:10:52 pc.home kernel: ata5: EH complete

I want to make sure this is a hardware error and that nothing more can be done before I remove that disk from my raid configuration (btrfs raid5).

has anyone recovered from this error before?

when I try to mount I get:

mount: wrong fs type, bad option, bad superblock on /dev/sdc,

       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try

       dmesg | tail or so.

pc / # blkid /dev/sdc

/dev/sdc: LABEL="raid" UUID="906ed4e3-52c5-4eb7-bd06-4810c0b84902" UUID_SUB="f6058c50-230b-46f1-8afb-c13a05bd5089" TYPE="btrfs"

pc / # blkid /dev/sdd

pc / # 

pc / # blkid /dev/sde

/dev/sde: LABEL="raid" UUID="906ed4e3-52c5-4eb7-bd06-4810c0b84902" UUID_SUB="0c835328-dbcd-488e-b524-337b3cbfc6ce" TYPE="btrfs"

----------

## szatox

 *Quote:*   

> after a power outage

 

Perhaps it's just the filesystem that got corrupted? That has happened to me a few times... even with "mature" and "journaled" stuff like ext3. And when it happens, it can report I/O errors and reject mount attempts just like in your case.

Now, I don't know btrfs itself, but there should be an fsck for it, probably supporting the -p option (automagically repair errors that do not require manual intervention), and running it is typically enough to let you recover from FS corruption.

----------

## NeddySeagoon

Adel Ahmed,

It can be poor quality SATA data cables.

```
Dec 31 15:10:52 pc.home kernel: blk_update_request: I/O error, dev sdd, sector 128 
```

That's right at the start of the drive. That's very odd. It would be in the primary GPT partition table if you use GPT, and then it's only read once at startup.

The HDD should have relocated the data there but it seems it can no longer read its own writing.
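As a rough cross-check of where that failing read sits, the numbers in the log can be worked through with shell arithmetic. This is only a sketch: it assumes 512-byte logical sectors and 4 KiB pages (neither is stated in the thread), and the function names are made up for illustration. The ten hex bytes are the READ(10) CDB copied from the kernel log.

```shell
#!/bin/sh
# Assumptions (not stated in the thread): 512-byte logical sectors, 4 KiB pages.

# Byte offset of a sector: sector_to_offset SECTOR [SECTOR_SIZE]
sector_to_offset() {
    echo $(( $1 * ${2:-512} ))
}

# Page-sized block containing a sector: sector_to_block SECTOR [SECTOR_SIZE] [PAGE_SIZE]
sector_to_block() {
    echo $(( $1 * ${2:-512} / ${3:-4096} ))
}

# LBA from a READ(10) CDB given as ten hex bytes (bytes 2-5 hold the LBA).
read10_lba() {
    echo $(( (0x$3 << 24) | (0x$4 << 16) | (0x$5 << 8) | 0x$6 ))
}

# Transfer length in sectors from the same CDB (bytes 7-8).
read10_len() {
    echo $(( (0x$8 << 8) | 0x$9 ))
}

sector_to_offset 128                          # prints 65536
sector_to_block 128                           # prints 16
read10_lba 28 00 00 00 00 80 00 00 08 00      # prints 128
read10_len 28 00 00 00 00 80 00 00 08 00      # prints 8
```

Under those assumptions, sector 128 is byte offset 65536 (64 KiB), which is page-sized block 16 and matches the "logical block 16" line in the log. Btrfs keeps its primary superblock at the 64 KiB offset, which would be consistent with blkid printing nothing for /dev/sdd.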

Get the SMART data (smartmontools) with smartctl -a /dev... and post it here.

You should be able to provoke the error again by rereading that block ... or it will work and maybe get relocated.
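One way to reread just that block is a small dd wrapper like the one below. It is a sketch under assumptions: the function name is made up, 512-byte sectors are assumed, and the start sector and count (128 and 8) are taken from the log above. It only reads from the device and never writes to it.

```shell
#!/bin/sh
# reread_sectors DEVICE START_SECTOR COUNT
# Reads COUNT 512-byte sectors starting at START_SECTOR and reports the result.
# With GNU dd, adding iflag=direct would bypass the page cache; it is left out
# here so the function also works on ordinary files.
reread_sectors() {
    dev=$1
    start=$2
    count=$3
    if dd if="$dev" of=/dev/null bs=512 skip="$start" count="$count" 2>/dev/null; then
        echo "read OK"
    else
        echo "read FAILED"
    fi
}

# Usage against the suspect disk (run as root):
#   reread_sectors /dev/sdd 128 8
```

If the read fails, the kernel log should show the same media error again; if it succeeds, the drive managed to read the sector this time.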

If the drive is under warranty, don't 'footer'. That log fragment will justify a warranty replacement.

----------

## Adel Ahmed

Changed the SATA cable, no dice.

Here's the journalctl output from when I tried to run fsck on /dev/sdc:

Dec 31 21:42:28 pc.home rpc.mountd[308]: authenticated unmount request from 192.168.1.4:762 for /media/raid (/media/raid)

Dec 31 21:42:29 pc.home rpc.mountd[308]: authenticated mount request from 192.168.1.4:806 for /media/raid (/media/raid)

Dec 31 21:42:29 pc.home kernel: ata5.00: exception Emask 0x0 SAct 0x100 SErr 0x0 action 0x0

Dec 31 21:42:29 pc.home kernel: ata5.00: irq_stat 0x40000008

Dec 31 21:42:29 pc.home kernel: ata5.00: cmd 60/08:40:80:00:00/00:00:00:00:00/40 tag 8 ncq 4096 in

                                         res 41/40:00:80:00:00/00:00:00:00:00/00 Emask 0x409 (media error) <F>

Dec 31 21:42:29 pc.home kernel: ata5.00: configured for UDMA/133

Dec 31 21:42:29 pc.home kernel: sd 4:0:0:0: [sdd] tag#8 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08

Dec 31 21:42:29 pc.home kernel: sd 4:0:0:0: [sdd] tag#8 Sense Key : 0x3 [current] [descriptor] 

Dec 31 21:42:29 pc.home kernel: sd 4:0:0:0: [sdd] tag#8 ASC=0x11 ASCQ=0x4 

Dec 31 21:42:29 pc.home kernel: sd 4:0:0:0: [sdd] tag#8 CDB: opcode=0x28 28 00 00 00 00 80 00 00 08 00

Dec 31 21:42:29 pc.home kernel: blk_update_request: I/O error, dev sdd, sector 128

Dec 31 21:42:29 pc.home kernel: Buffer I/O error on dev sdd, logical block 16, async page read

Dec 31 21:42:29 pc.home kernel: ata5: EH complete

Dec 31 21:42:32 pc.home rpc.mountd[308]: authenticated unmount request from 192.168.1.4:778 for /media/raid (/media/raid)

Dec 31 21:42:33 pc.home kernel: ata5.00: exception Emask 0x0 SAct 0x100000 SErr 0x0 action 0x0

Dec 31 21:42:33 pc.home kernel: ata5.00: irq_stat 0x40000008

Dec 31 21:42:33 pc.home kernel: ata5.00: cmd 60/08:a0:80:00:00/00:00:00:00:00/40 tag 20 ncq 4096 in

                                         res 41/40:00:80:00:00/00:00:00:00:00/00 Emask 0x409 (media error) <F>

Dec 31 21:42:33 pc.home kernel: ata5.00: configured for UDMA/133

Dec 31 21:42:33 pc.home kernel: sd 4:0:0:0: [sdd] tag#20 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08

Dec 31 21:42:33 pc.home kernel: sd 4:0:0:0: [sdd] tag#20 Sense Key : 0x3 [current] [descriptor] 

Dec 31 21:42:33 pc.home kernel: sd 4:0:0:0: [sdd] tag#20 ASC=0x11 ASCQ=0x4 

Dec 31 21:42:33 pc.home kernel: sd 4:0:0:0: [sdd] tag#20 CDB: opcode=0x28 28 00 00 00 00 80 00 00 08 00

Dec 31 21:42:33 pc.home kernel: blk_update_request: I/O error, dev sdd, sector 128

Dec 31 21:42:33 pc.home kernel: Buffer I/O error on dev sdd, logical block 16, async page read

Dec 31 21:42:33 pc.home kernel: ata5: EH complete

pc ~ # btrfsck /dev/sdc

warning, device 2 is missing

checksum verify failed on 21037056 found BB7411C5 wanted 5F97AC73

bytenr mismatch, want=21037056, have=65536

Couldn't read chunk tree

Couldn't open file system

pc ~ # btrfsck /dev/sdd

No valid Btrfs found on /dev/sdd

Couldn't open file system

and the same info in journalctl

smartctl:

http://pastebin.com/xQjNZSyR

I'd much rather lose the hardware than the data. I'm using raid 5 and I will simply cough up the money to buy a new drive to keep my data protected. I'm definitely still under warranty and I'll start looking for that warranty.

Of course, I would like to be sure the drive is damaged before I go through the ordeal of getting a refund

I really appreciate your assistance

----------

## NeddySeagoon

Adel Ahmed,

```
Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
```

The drive has one sector that it would like to relocate but can't, because it can't read it.

That's one sector that it knows about.  There may be more.
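To keep an eye on that counter without reading the whole report, the raw value can be pulled out of the attribute table with awk. This is a minimal sketch: the function name is made up, and it assumes the ten-column layout shown in the SMART output quoted above.

```shell
#!/bin/sh
# Reads `smartctl -A` style attribute lines on stdin and prints the raw value
# (10th column) of attribute 197, Current_Pending_Sector.
pending_sectors() {
    awk '$1 == 197 { print $10 }'
}

# Sample line copied from the SMART output quoted above.
printf '%s\n' \
  '197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1' \
  | pending_sectors        # prints 1

# Against a real drive (needs root and smartmontools):
#   smartctl -A /dev/sdd | pending_sectors
```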

You won't have any problems getting a warranty replacement and WD will even put one in the post before you need to return yours.  Put the smart log in the RMA form when you fill it in.

I didn't check your warranty status as I would need your region.

----------

## Adel Ahmed

Thanks for the link. I live in Egypt and I'll get to replacing the hard drive as soon as I get things fixed.

I'm going to buy a new 1 TB hard disk tomorrow to get the raid in place, but for now, how do I mount the degraded raid?

----------

## NeddySeagoon

Adel Ahmed,

Is that btrfs on top of mdadm raid or is btrfs doing the raid too?

Knowing that you are in Egypt, your warranty expires on 14 May 2017.

I had 2 WD greens in a raid5 go down within 15 minutes of one another.  They were in warranty too. 

Your raid may be assembled but not running.  Look in /proc/mdstat.

Keep your smartctl log.  If it's a raid set with mdadm you can add the faulty drive back to the raid.  It will then get resynced, which will rewrite all the data on it.

This may force the bad block to be relocated.

----------

## Adel Ahmed

btrfs is doing the raid as well 

I'm not eager to use the data right now, I just want to make sure it is safe :)

Sorry to hear about those 2 hard disks :( that must've been a terrible incident.

I'm unplugging the other disks from the motherboard just in case.

----------

## NeddySeagoon

Adel Ahmed,

I don't know how to bring up a btrfs raid in degraded mode.
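For btrfs doing its own RAID, the usual way to bring up a filesystem with a missing member is the `degraded` mount option. The sketch below assumes /dev/sdc is one of the surviving members and /media/raid is the mount point (both names appear earlier in the thread); the `run` helper and `DRY_RUN` variable are made up for this example so that nothing is executed unless asked. Adding `ro` keeps the mount read-only while the data is being checked. Given that btrfsck could not read the chunk tree, there is no guarantee this mount succeeds here.

```shell
#!/bin/sh
# Dry-run by default: commands are only printed unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# mount_degraded DEVICE MOUNTPOINT
# Mounts a multi-device btrfs read-only with one member missing.
mount_degraded() {
    run mount -t btrfs -o degraded,ro "$1" "$2"
}

mount_degraded /dev/sdc /media/raid
```

Running it with DRY_RUN=0 as root would perform the actual mount.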

I actually only lost a single block and that was in the middle of a DVD rip somewhere.

I had 3 out of 5 good disks and the one that had been kicked out most recently.

ddrescue got back all but one block of that disk.  I could have done without the learning experience.

I really wasn't looking forward to ripping 1500+ DVDs again.

----------

## Adel Ahmed

Well, thank god for that. I hope I don't have to go through that EVER.

I'll wait till I buy my hard disk and replace the damaged one

----------

## Ant P.

The good news is, it's straightforward to yank failing disks from a btrfs RAID:

https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_failed_devices

Adding a replacement right away is optional if you've got enough capacity in the remaining good devices. The device-delete/balance command pair will do the right thing.

There's also a btrfs-replace subcommand, but I can't find any good examples for how to use that.
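For what it's worth, a replace run might look roughly like the sketch below. It assumes the failed member is devid 2 (from the "device 2 is missing" warning earlier in the thread), that the new disk shows up as /dev/sdf (a hypothetical name), and that the filesystem is mounted at /media/raid. The `run` helper and `DRY_RUN` variable are made up for this example. Whether replace works on a raid5 profile depends on the kernel and btrfs-progs versions, so check the documentation for the versions in use first.

```shell
#!/bin/sh
# Dry-run by default: commands are only printed unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# replace_failed_device SRC_DEVID_OR_DEVICE TARGET_DEVICE MOUNTPOINT
replace_failed_device() {
    src=$1
    target=$2
    mnt=$3
    # Start copying/rebuilding onto the new device.
    run btrfs replace start "$src" "$target" "$mnt"
    # Show the progress of the running replace operation.
    run btrfs replace status "$mnt"
}

# devid 2 and /dev/sdf are assumptions for illustration.
replace_failed_device 2 /dev/sdf /media/raid
```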

----------

## Adel Ahmed

I added the new device, removed the old one, and the rebalance completed successfully.

The filesystem was mounted and the data was present.

One reboot later:

pc ~ # mount -a

mount: wrong fs type, bad option, bad superblock on /dev/sdb,

       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try

       dmesg | tail or so.

[  130.644798] BTRFS info (device sdb): enabling auto defrag

[  130.644806] BTRFS info (device sdb): disk space caching is enabled

[  130.658952] verify_parent_transid: 16 callbacks suppressed

[  130.658960] BTRFS (device sdb): parent transid verify failed on 1699771072512 wanted 270891 found 270069

[  130.668081] BTRFS (device sdb): parent transid verify failed on 1699771088896 wanted 270891 found 270069

[  130.742669] BTRFS: bdev /dev/sdc errs: wr 274243, rd 0, flush 271639, corrupt 0, gen 0

[  130.813959] BTRFS (device sdb): parent transid verify failed on 1699621765120 wanted 271706 found 270443

[  130.888539] BTRFS (device sdb): parent transid verify failed on 1698604236800 wanted 271748 found 270340

[  130.929401] BTRFS (device sdb): parent transid verify failed on 1698810986496 wanted 271756 found 270353

[  130.953951] BTRFS (device sdb): parent transid verify failed on 1698069135360 wanted 271732 found 270326

[  130.980951] BTRFS (device sdb): parent transid verify failed on 1698078883840 wanted 271731 found 270329

[  131.004352] BTRFS (device sdb): parent transid verify failed on 1698083651584 wanted 271732 found 270329

[  131.051876] BTRFS (device sdb): parent transid verify failed on 1698882453504 wanted 271763 found 270358

[  131.240464] BTRFS (device sdb): parent transid verify failed on 1698776694784 wanted 271757 found 270353

[  135.180623] ------------[ cut here ]------------

[  135.180642] WARNING: CPU: 1 PID: 1522 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x46/0x110()

[  135.180645] BTRFS: Transaction aborted (error -5)

[  135.180647] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 tun bridge stp llc ipt_REJECT nf_reject_ipv4 xt_conntrack iptable_filter via_rhine r8169 ohci_pci ohci_hcd

[  135.180719] CPU: 1 PID: 1522 Comm: mount Tainted: G        W       4.0.5-gentoo #1

[  135.180723] Hardware name: Gigabyte Technology Co., Ltd. GA-790XT-USB3/GA-790XT-USB3, BIOS F4 05/13/2010

[  135.180727]  0000000000000000 ffffffff81758719 ffffffff8158b2dd ffff88028daf7ab8

[  135.180735]  ffffffff810817ac ffff8800cb4a7160 ffff8800c0507000 00000000fffffffb

[  135.180746]  ffffffff8164b330 0000000000000ae6 ffffffff81081825 ffffffff81750510

[  135.180753] Call Trace:

[  135.180766]  [<ffffffff8158b2dd>] ? dump_stack+0x4a/0x74

[  135.180777]  [<ffffffff810817ac>] ? warn_slowpath_common+0x7c/0xb0

[  135.180786]  [<ffffffff81081825>] ? warn_slowpath_fmt+0x45/0x50

[  135.180793]  [<ffffffff8124d796>] ? __btrfs_abort_transaction+0x46/0x110

[  135.180800]  [<ffffffff8126a479>] ? btrfs_run_delayed_refs.part.66+0x129/0x280

[  135.180810]  [<ffffffff8127a1ab>] ? btrfs_commit_transaction+0x3b/0x9f0

[  135.180817]  [<ffffffff810a48f7>] ? preempt_count_add+0x47/0xa0

[  135.180824]  [<ffffffff81590c31>] ? _raw_spin_unlock+0x11/0x30

[  135.180830]  [<ffffffff8129b031>] ? release_extent_buffer+0x21/0xc0

[  135.180837]  [<ffffffff812ba032>] ? btrfs_recover_log_trees+0x392/0x450

[  135.180843]  [<ffffffff812717a0>] ? free_root_pointers+0x60/0x60

[  135.180849]  [<ffffffff812b7710>] ? replay_one_extent+0x6c0/0x6c0

[  135.180858]  [<ffffffff81277ca8>] ? open_ctree+0x17a8/0x20d0

[  135.180866]  [<ffffffff8124f08e>] ? btrfs_mount+0x60e/0x880

[  135.180873]  [<ffffffff81133b9b>] ? pcpu_alloc+0x35b/0x680

[  135.180881]  [<ffffffff8115a56c>] ? mount_fs+0xc/0x90

[  135.180890]  [<ffffffff8117325d>] ? vfs_kern_mount+0x5d/0x110

[  135.180897]  [<ffffffff81175ef3>] ? do_mount+0x1b3/0xab0

[  135.180903]  [<ffffffff81121082>] ? __get_free_pages+0x12/0x50

[  135.180911]  [<ffffffff81176af3>] ? SyS_mount+0x83/0xd0

[  135.180920]  [<ffffffff81591532>] ? system_call_fastpath+0x12/0x17

[  135.180924] ---[ end trace f7322caa403bc2aa ]---

[  135.180930] BTRFS: error (device sdb) in btrfs_run_delayed_refs:2790: errno=-5 IO failure

[  135.183392] BTRFS: error (device sdb) in open_ctree:2898: errno=-5 IO failure (Failed to recover log tree)

[  135.747992] verify_parent_transid: 132 callbacks suppressed

[  135.748001] BTRFS (device sdb): parent transid verify failed on 1698203516928 wanted 271612 found 270330

[  135.775232] BTRFS (device sdb): parent transid verify failed on 1699047669760 wanted 271132 found 270376

[  135.798876] BTRFS (device sdb): parent transid verify failed on 1698344681472 wanted 271222 found 270331

[  135.872745] BTRFS (device sdb): parent transid verify failed on 1698960424960 wanted 270745 found 270367

[  135.888021] BTRFS (device sdb): parent transid verify failed on 1698530361344 wanted 271739 found 270331

[  135.909895] BTRFS (device sdb): parent transid verify failed on 1698501984256 wanted 271740 found 270336

[  135.924789] BTRFS (device sdb): parent transid verify failed on 1698543632384 wanted 271486 found 270079

[  135.925251] BTRFS (device sdb): parent transid verify failed on 1698502017024 wanted 271740 found 270332

[  135.932269] BTRFS (device sdb): parent transid verify failed on 1698512224256 wanted 271617 found 270337

[  135.955203] BTRFS (device sdb): parent transid verify failed on 1698502066176 wanted 271613 found 270332

[  136.299296] BTRFS: open_ctree failed
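To get a feel for how far apart the generations in those "parent transid verify failed" lines are, the wanted and found numbers can be subtracted with awk. This is only a sketch with a made-up function name; it assumes the message format shown in the dmesg output above.

```shell
#!/bin/sh
# Reads kernel log lines on stdin and, for each "parent transid verify failed"
# message, prints wanted minus found (how many generations the block lags).
transid_gaps() {
    awk '/parent transid verify failed/ {
        for (i = 1; i <= NF; i++) {
            if ($i == "wanted") wanted = $(i + 1)
            if ($i == "found")  found  = $(i + 1)
        }
        print wanted - found
    }'
}

# First message from the log above:
printf '%s\n' \
  '[  130.658960] BTRFS (device sdb): parent transid verify failed on 1699771072512 wanted 270891 found 270069' \
  | transid_gaps           # prints 822

# Against the live kernel log:
#   dmesg | transid_gaps
```

A "found" generation that lags "wanted" by hundreds of transactions suggests that at least one device holds stale metadata, which would be consistent with the write and flush error counters reported for /dev/sdc, though the log alone does not prove it.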

----------

