# csum failed ino

## Adel Ahmed

Again I keep getting these messages. This is a bad omen: whenever I get them, FS corruption is just around the corner.

[ 5478.403071] BTRFS warning (device sdb): csum failed ino 260 off 63901696 csum 2566472073 expected csum 4261454497

[ 5478.403345] BTRFS warning (device sdb): csum failed ino 260 off 63901696 csum 2566472073 expected csum 4261454497

[ 5478.403808] BTRFS warning (device sdb): csum failed ino 260 off 63901696 csum 2566472073 expected csum 4261454497

[ 5478.404050] BTRFS warning (device sdb): csum failed ino 260 off 63901696 csum 2566472073 expected csum 4261454497

[ 5478.405374] BTRFS warning (device sdb): csum failed ino 260 off 63901696 csum 2566472073 expected csum 4261454497

[ 5478.405616] BTRFS warning (device sdb): csum failed ino 260 off 63901696 csum 2566472073 expected csum 4261454497

[ 5478.405825] BTRFS warning (device sdb): csum failed ino 260 off 63901696 csum 2566472073 expected csum 4261454497

[ 5478.406056] BTRFS warning (device sdb): csum failed ino 260 off 63901696 csum 2566472073 expected csum 4261454497

[ 5478.407265] BTRFS warning (device sdb): csum failed ino 260 off 63901696 csum 2566472073 expected csum 4261454497

[ 5478.407501] BTRFS warning (device sdb): csum failed ino 260 off 63901696 csum 2566472073 expected csum 4261454497

Any idea how to deal with these errors?

This is a RAID 5 btrfs FS made up of four 1 TB disks.
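All ten warnings point at the same inode and offset, so a first step is to work out which file is actually affected. A sketch, assuming the array is mounted at /media/raid (the mount point that appears later in this thread):

```shell
# Resolve inode 260 to its file path(s) on the mounted filesystem
# (btrfs inspect-internal needs root):
btrfs inspect-internal inode-resolve 260 /media/raid

# The failing offset is byte 63901696 inside that file; with the default
# 4 KiB block size that corresponds to block:
echo $((63901696 / 4096))   # -> 15601
```

If the resolved file is the VM disk image, that would tie the csum warnings directly to the boot failure described below.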

Just as I had expected, corruption:

Whenever I start up my newly created RHEL 7 VM I get a kernel panic: "unable to mount root fs on unknown-block". And this is for a snapshot I had reverted to plenty of times today. Before the panic comes up I get:

failure reading sector 0x1e508 from 'hd0'

thanks

----------

## ct85711

Well, considering it's all at the same offset, I'd start by checking your drives for bad sectors. I'd suggest running smartctl on your drives and seeing what it says for Reallocated_Sector_Ct and also Current_Pending_Sector, IIRC (I may be missing some other attributes, but other people should be able to fill in what I'm missing). Depending on the results, I'd start getting your backup ready for the worst case, and a new drive (or drives) ready to replace the failing one(s). With one failing drive your data is still relatively safe, but you can't lose another until the RAID is fully rebuilt. Rebuilding the RAID to replace the failed drive can itself push other drives into failing, so the risk is there.
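The attribute check described above can be scripted with smartctl. A minimal sketch; the device list /dev/sd[b-e] is an assumption, adjust it to your array's actual members:

```shell
# Hypothetical helper: keep only the SMART attributes that indicate
# failing sectors on the platters.
check_attrs() {
  grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'
}

# Query each array member (device names are an assumption; requires
# smartmontools and root):
for dev in /dev/sd[b-e]; do
  echo "== $dev =="
  smartctl -A "$dev" | check_attrs
done
```

Nonzero raw values on any of those three attributes would point at genuinely failing media rather than a transport problem.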

----------

## davidm

You could run a btrfs scrub on the array. With RAID 5 it should have redundant data available to correct the errors present. You might also run a SMART extended test, as well as a non-destructive surface scan using badblocks (but make sure you use the non-destructive option).

Of course, as said previously, you should have backups. But that goes without saying, because you should always have backups of anything you cannot afford to lose. RAID isn't a backup, and especially not btrfs RAID, as it isn't entirely stable and well tested. That goes double for the RAID 5/6 implementations.
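The three checks suggested above could look like the following sketch. It prints the commands instead of running them by default, since scrub, SMART tests and badblocks act on live hardware; set DRYRUN=0 to execute for real. The device name /dev/sdb is an assumption.

```shell
# Dry-run wrapper: echo each command unless DRYRUN=0 is set.
DRYRUN=${DRYRUN:-1}
run() { if [ "$DRYRUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# Scrub: with btrfs raid5, blocks that fail their checksum can be rebuilt
# from parity. -B runs in the foreground and prints statistics at the end.
run btrfs scrub start -B /media/raid

# Extended SMART self-test on one member (repeat per disk):
run smartctl -t long /dev/sdb

# Non-destructive read-write badblocks pass: -n preserves existing data.
# Never use -w on a disk whose contents you care about; it overwrites it.
run badblocks -nsv /dev/sdb
```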

----------

## Adel Ahmed

All zeros:

http://pastebin.com/sw4ewkhZ

I see no I/O errors so far.

I do see this in dmesg, though:

[ 5327.890593] ata3.00: exception Emask 0x50 SAct 0x10000 SErr 0x90a00 action 0xe frozen

[ 5327.890596] ata3.00: irq_stat 0x01400000, PHY RDY changed

[ 5327.890601] ata3.00: cmd 60/80:80:00:d3:31/01:00:01:00:00/40 tag 16 ncq 196608 in

                        res 40/00:80:00:d3:31/00:00:01:00:00/40 Emask 0x50 (ATA bus error)
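That trace ("PHY RDY changed" plus an ATA bus error) usually points at the link itself, i.e. cabling, connector or power, rather than the platters. A sketch for pulling ATA link errors out of the kernel log and tying ataN port numbers back to sdX names (the sysfs layout is standard for SATA disks, but worth verifying on your system):

```shell
# Filter ATA exception / bus-error lines out of the kernel log:
ata_errors() { grep -E 'ata[0-9]+(\.[0-9]+)?: (exception|irq_stat|cmd|res)'; }
dmesg | ata_errors

# Map sdX block devices to their ATA ports: the ataN token appears in the
# sysfs symlink target, so "ata3" in dmesg can be matched to a disk.
for d in /sys/block/sd?; do
  [ -e "$d" ] && echo "${d##*/}: $(readlink "$d" | grep -o 'ata[0-9]*')"
done
```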

----------

## Adel Ahmed

Tried a scrub on the FS, and dmesg just went crazy:

http://pastebin.com/XAxnYpVQ

ended up with:

ERROR: There are uncorrectable errors.

pc ~ # btrfs scrub status /media/raid/

scrub status for 4be16663-041d-4aa8-8557-e272e0d534af

	scrub started at Tue Feb 16 14:22:06 2016 and finished after 268 seconds

	total bytes scrubbed: 5.54GiB with 432 errors

	error details: read=16 csum=416

	corrected errors: 384, uncorrectable errors: 48, unverified errors: 0

I definitely need to take a look at the ATA bus errors.

Any idea what those might be?

All the errors are on disk /dev/sdc, which is good; at least I'm narrowing things down:

SMART Attributes Data Structure revision number: 10

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  1 Raw_Read_Error_Rate     0x000f   118   100   006    Pre-fail  Always       -       186957558

  3 Spin_Up_Time            0x0003   098   096   000    Pre-fail  Always       -       0

  4 Start_Stop_Count        0x0032   088   088   020    Old_age   Always       -       12377

  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0

  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       7451

  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       892

 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0

 12 Power_Cycle_Count       0x0032   088   088   020    Old_age   Always       -       12337

183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0

184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0

187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0

188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0

189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0

190 Airflow_Temperature_Cel 0x0022   053   053   045    Old_age   Always       -       47 (Min/Max 47/47)

194 Temperature_Celsius     0x0022   047   047   000    Old_age   Always       -       47 (0 17 0 0 0)

195 Hardware_ECC_Recovered  0x001a   038   031   000    Old_age   Always       -       186957558

197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0

198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0

199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       119 (88 230 0)

241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       1923886593

242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       54783455
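The scrub blames /dev/sdc, yet the sector-health attributes above (Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable, UDMA_CRC_Error_Count) are all zero, which fits a link problem (bad SATA cable, loose connector, flaky power) better than failing media. Two things worth checking, sketched below; the scrub arithmetic uses the numbers from the report above, and /media/raid is the mount point from this thread:

```shell
# From the scrub report: 432 errors total (16 read + 416 csum), of which
# 384 were corrected from parity and 48 could not be reconstructed:
corrected=384; uncorrectable=48
echo "errors accounted for: $((corrected + uncorrectable))"   # -> 432

# btrfs keeps its own per-device error counters across the array; nonzero
# read_io_errs / corruption_errs on one member confirm the suspect disk:
btrfs device stats /media/raid
```

The 48 uncorrectable errors mean raid5 had no good copy to rebuild from for those blocks, so the affected files (the VM image, by the look of it) will need restoring from backup even after the link problem is fixed.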

----------

