# LVM cache recurring corruption -- how to reinitialize device

## mbar

Hello all!

Since yesterday I have had a strange problem with my LVM RAID 5 (HDD) + LVM cache (SSD) setup. Here are some details:

- this is a relatively new minimal setup (fresh install on 25th of August 2017) on a J3455-ITX board (4-core Celeron): only cryptsetup and LVM RAID 5 (built-in), 1 x SSD as /dev/sda (rootfs/system is not encrypted), 5 x 1.5 TB Samsung HDDs, encrypted: /dev/sdb1 -> /dev/mapper/crypt1 and so on

- on top of crypt[1-5] devices is LVM RAID 5 (no MD Raid layer)

- added a 100 GB SSD cache on the encrypted /dev/sda4 partition -- reference: https://rwmj.wordpress.com/2014/05/22/using-lvms-new-cache-feature/

All was working OK for like 3 weeks. Two days ago I had to replace one HDD as it began to fail.

So:

- I had to uncache the LVM (went ok) -- reference: https://rwmj.wordpress.com/2014/05/23/removing-the-cache-from-an-lv/

- removed one drive

- added new encrypted 1,5 TB drive

- resynced LVM RAID 5

- fsck -- all OK

- LVM, filesystem status -- healthy

- then I reattached the SSD cache. All seemed to be working OK, the cache was up and running

- first reboot: LVM missing, cache device corrupted

I had done the cache removal/reattach a few times as a test before the drive was replaced, and it went without any errors then.

Since the cache corruption I have had to do a manual recovery to uncache the LV and regain access to the data: I had to edit the vgcfgbackup output by hand and do a vgcfgrestore.

This is similar to https://www.redhat.com/archives/linux-lvm/2016-December/msg00015.html -- not a single tool could help me uncache an LV with a corrupt cache.
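For the record, newer lvm2 also has a one-step command for this; a sketch of what I would try first (assuming an lvm2 version that supports `--uncache`, and using the volume names from this setup):

```shell
# Flush dirty blocks, detach the cache and delete the cache pool in one go.
lvconvert --uncache vg0/lvol0

# If the pool metadata is too corrupt to flush, forcing removal of the
# pool discards the dirty blocks instead of flushing them.
lvremove --force vg0/cache0
```

Only when even that fails does the vgcfgbackup/vgcfgrestore surgery become necessary.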

Anyway, after manual recovery the data was intact, so I tried to do it again.

I did pvremove on the SSD, then pvcreate, vgextend and so on. None of those commands displayed any error message, so I was sure the SSD cache had been properly reinitialized.

The LV stayed cached until the next reboot (today), when it went missing again. It seems to be unusable now, for reasons unknown to me.

Is there any way to check / wipe the SSD cache partition (apart from overwriting it with /dev/zero)?
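For instance, I suppose zeroing just the first few MiB would be enough to clear the LVM label and the cache superblock without rewriting all 100 GB (plus `wipefs -a` for known signatures). A quick sanity check of that idea on a scratch file standing in for the partition (paths hypothetical):

```shell
# Scratch file standing in for the cache partition; on the real system
# this would be /dev/mapper/luks_cache.
DEV=/tmp/fake-cache-partition.img

# Simulate a partition with stale metadata at its start.
dd if=/dev/urandom of="$DEV" bs=1M count=8 2>/dev/null

# Zero only the first 4 MiB -- where LVM labels and the cache-pool
# superblock live -- instead of the whole device.
dd if=/dev/zero of="$DEV" bs=1M count=4 conv=notrunc 2>/dev/null

# Verify the header is now all zeroes.
nonzero=$(head -c 4194304 "$DEV" | tr -d '\000' | wc -c | tr -d ' ')
[ "$nonzero" -eq 0 ] && echo "header wiped"
rm -f "$DEV"
```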

----------

## mbar

I don't understand this:

```
root@carbon:~# lvcreate -n cache0meta -L 120M vg0 /dev/mapper/luks_cache
  Logical volume "cache0meta" created.
root@carbon:~# lvcreate -n cache0 -l 25568 vg0 /dev/mapper/luks_cache
  Logical volume "cache0" created.
root@carbon:~# ls /dev/mapper/
control     luks_lvm1  luks_lvm3  luks_lvm5   vg0-cache0meta  vg0-lvol0_rimage_0  vg0-lvol0_rimage_2  vg0-lvol0_rimage_4  vg0-lvol0_rmeta_1  vg0-lvol0_rmeta_3
luks_cache  luks_lvm2  luks_lvm4  vg0-cache0  vg0-lvol0       vg0-lvol0_rimage_1  vg0-lvol0_rimage_3  vg0-lvol0_rmeta_0   vg0-lvol0_rmeta_2  vg0-lvol0_rmeta_4
root@carbon:~# cache_check
No input file provided.
Usage: cache_check [options] {device|file}
Options:
  {-q|--quiet}
  {-h|--help}
  {-V|--version}
  {--clear-needs-check-flag}
  {--super-block-only}
  {--skip-mappings}
  {--skip-hints}
  {--skip-discards}
root@carbon:~# cache_check /dev/mapper/vg0-cache0
examining superblock
  superblock is corrupt
    bad checksum in superblock
root@carbon:~# cache_check /dev/mapper/vg0-cache0meta
examining superblock
  superblock is corrupt
    bad checksum in superblock
root@carbon:~# lvconvert --type cache-pool --poolmetadata vg0/cache0meta vg0/cache0
  Using 128,00 KiB chunk size instead of default 64,00 KiB, so cache pool has less then 1000000 chunks.
  WARNING: Converting logical volume vg0/cache0 and vg0/cache0meta to cache pool's data and metadata volumes with metadata wiping.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
Do you really want to convert vg0/cache0 and vg0/cache0meta? [y/n]: y
  Converted vg0/cache0_cdata to cache pool.
root@carbon:~# ls /dev/mapper/
control     luks_lvm1  luks_lvm3  luks_lvm5  vg0-lvol0_rimage_0  vg0-lvol0_rimage_2  vg0-lvol0_rimage_4  vg0-lvol0_rmeta_1  vg0-lvol0_rmeta_3
luks_cache  luks_lvm2  luks_lvm4  vg0-lvol0  vg0-lvol0_rimage_1  vg0-lvol0_rimage_3  vg0-lvol0_rmeta_0   vg0-lvol0_rmeta_2  vg0-lvol0_rmeta_4
root@carbon:~# cache_check /dev/mapper/vg0-cache0meta
/dev/mapper/vg0-cache0meta: No such file or directory
root@carbon:~# cache_check /dev/mapper/luks_
luks_cache  luks_lvm1   luks_lvm2   luks_lvm3   luks_lvm4   luks_lvm5
root@carbon:~# cache_check /dev/mapper/luks_cache
examining superblock
  superblock is corrupt
    bad checksum in superblock
```

This is on a newly wiped/trimmed SSD partition, with a NEW LUKS key, luksFormat, pvcreate, etc.

If I add the cache to my LVM RAID 5, then it will get b0rked on reboot.
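For reference, the full reinitialization sequence I keep running, condensed into one place (destructive, shown as a sketch only; device and volume names as used above):

```shell
# Recreate the encrypted cache PV from scratch (destroys sda4 contents).
cryptsetup luksFormat /dev/sda4
cryptsetup luksOpen /dev/sda4 luks_cache
pvcreate /dev/mapper/luks_cache
vgextend vg0 /dev/mapper/luks_cache

# Carve out data + metadata LVs on the SSD and bind them into a cache pool.
lvcreate -n cache0meta -L 120M vg0 /dev/mapper/luks_cache
lvcreate -n cache0 -l 25568 vg0 /dev/mapper/luks_cache
lvconvert --type cache-pool --poolmetadata vg0/cache0meta vg0/cache0

# Attach the pool to the RAID 5 LV.
lvconvert --type cache --cachepool vg0/cache0 vg0/lvol0
```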

----------

## MageSlayer

Are you sure your SSD is ok?

----------

## Roman_Gruber

 *Quote:*   

>   superblock is corrupt
> 
>     bad checksum in superblock 

Did you check your cables, connections, power supply, redo the wiring?

Of course, latest firmware on the drive. Is the drive healthy?

 *Quote:*   

>  /dev/mapper/vg0-cache0meta: No such file or directory 

Looks like it does not exist or is not visible to the operating system.

Sometimes I had to initialize it / make it visible to the OS with vgscan and vgchange -ay (or whatever the commands are, please check the manpages!). Sometimes only a reboot did the trick on some sysrescue-cd discs.

--

I never had a broken SSD; I sold my 5-year-old daily-used Plextor SSD recently. Out of habit I usually sell and replace HDDs every second or third year on average.

----------

## mbar

SSD seems to be healthy.

/dev/sda2 is a 16 GB system partition that has no trouble reading, writing, updating.

SMART info is clean, dmesg too: no errors, not even CRC32 ones.

But I'll convert sda4 to plain ext4 and run some tests with files.

vg0-cache0meta is hidden by LVM after it is added to the pool as a cache for the HDDs. Hence you can't check it explicitly.
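The hidden pieces can still be listed with `lvs -a`, which shows internal volumes in brackets; roughly along these lines (names illustrative, attribute columns omitted):

```shell
lvs -a vg0
# Internal volumes appear bracketed, e.g.:
#   [cache0]        -- the cache pool itself
#   [cache0_cdata]  -- pool data
#   [cache0_cmeta]  -- pool metadata (what cache_check would need)
#   [lvol0_corig]   -- the original (uncached) origin LV
```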

----------

## mbar

The SSD is OK. I just did a long SMART test, plus a 25 GB copy with md5 checksum verification on a BTRFS partition (of course I rebooted the machine in the meantime):

```
smartctl -a /dev/sda
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.12.0-1-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG MZNTD128HAGM-00000
Serial Number:    S15YNYAD625624
LU WWN Device Id: 5 002538 50003cf55
Firmware Version: DXT2300Q
User Capacity:    128,035,676,160 bytes [128 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Sep 16 08:29:39 2017 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
(...)
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       4090
 12 Power_Cycle_Count       0x0032   096   096   000    Old_age   Always       -       3664
177 Wear_Leveling_Count     0x0013   095   095   000    Pre-fail  Always       -       56
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   069   048   000    Old_age   Always       -       31
195 Hardware_ECC_Recovered  0x001a   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
235 Unknown_Attribute       0x0012   099   099   000    Old_age   Always       -       252
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       10202610775

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      4089         -

root@carbon:~# free && sync && echo 3 > /proc/sys/vm/drop_caches && free
              total        used        free      shared  buff/cache   available
Mem:        8021580      232696      137676        5104     7651208     7497636
Swap:       7812092        4608     7807484
              total        used        free      shared  buff/cache   available
Mem:        8021580      232176     7703052        5104       86352     7599356
Swap:       7812092        4608     7807484

[dest dir on SSD, btrfs, after reboot] md5sum *
c7caf4e97cadf52a2489a176284ed8f4  1.mkv
1095527685e2aba668bee2c2958229af  2.mkv
19e2153cce10b4317e2add27747c4356  3.mkv
62b1cac0498a245a69056506e4d6356c  4.mkv
26f9230e3da7158a87c60d526ed7eb26  5.mkv
79e4d0db765a93ac5dee1b2ed1b53e39  6.mkv
a8c438783ee10fe75fcb3ba2cd636238  7.mkv
81e141a09074e7a16756fb472458df9e  8.mkv
706a2ee617186be6796b20d425eb836d  9.mkv
a13e62ddad20cc0e796f7e9a46c09a83  10.mkv

[source dir on HDD] md5sum *
c7caf4e97cadf52a2489a176284ed8f4  1.mkv
1095527685e2aba668bee2c2958229af  2.mkv
19e2153cce10b4317e2add27747c4356  3.mkv
62b1cac0498a245a69056506e4d6356c  4.mkv
26f9230e3da7158a87c60d526ed7eb26  5.mkv
79e4d0db765a93ac5dee1b2ed1b53e39  6.mkv
a8c438783ee10fe75fcb3ba2cd636238  7.mkv
81e141a09074e7a16756fb472458df9e  8.mkv
706a2ee617186be6796b20d425eb836d  9.mkv
a13e62ddad20cc0e796f7e9a46c09a83  10.mkv
```

----------

## mbar

Seems I'm getting onto something:

```
root@carbon:~# dd if=/dev/zero of=/dev/mapper/luks_cache status=progress
433003520 bytes (433 MB, 413 MiB), 9 s, 48.1 MB/s      ^C^C^C
root@carbon:~#
root@carbon:~# dd if=/dev/zero of=/dev/mapper/luks_cache bs=8M status=progress
3447717888 bytes (3.4 GB, 3.2 GiB), 3.00394 s, 1.1 GB/s
dd: error writing '/dev/mapper/luks_cache': No space left on device
489+0 records in
488+0 records out
4093915136 bytes (4.1 GB, 3.8 GiB), 3.80779 s, 1.1 GB/s
```

In short, I tried to wipe the encrypted block device (on top of the 100 GB /dev/sda4 partition) and the write failed after just over 4 GB with a "no space left on device" message.

dmesg has this at the end:

```
Sep 16 10:27:27 carbon systemd[1]: Stopped target Encrypted Volumes.
Sep 16 10:27:27 carbon systemd[1]: Stopping Cryptography Setup for luks_cache...
Sep 16 10:27:27 carbon systemd[1]: Stopped Cryptography Setup for luks_cache.
Sep 16 10:27:37 carbon kernel: CMCI storm detected: switching to poll mode
```

The encrypted block device "luks_cache" was simply kicked out of the system, and I think it is connected to the "CMCI storm" (first time I've seen this).

https://forums.gentoo.org/viewtopic-p-8115134.html <-- here is similar hardware (Intel Celeron J3355 CPU).

Slower writes to the plain BTRFS ran at approx. 100 MB/s (copying from HDD) and the system handled 25 GB with no problem.

4 GB of high-speed writes (~1 GB/s) seems to overwhelm it. Where should I look next -- software or hardware?

Is there any kernel switch that could help here (I'm still on the 4.12 series on this machine)?

----------

## mbar

Writing to the raw (unencrypted) sda4 device seems OK, no storm here:

```
107487428608 bytes (107 GB, 100 GiB), 872.027 s, 123 MB/s
dd: error writing '/dev/sda4': No space left on device
25630+0 records in
25629+0 records out
107497914368 bytes (107 GB, 100 GiB), 887.286 s, 121 MB/s
```

----------

## mbar

OK, the question is:

why is a raw write to the encrypted device so much faster (I suspect some kind of buffering in the device-mapper layer?) than a raw write to the unencrypted device?
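My working theory (an assumption, not verified): without `oflag=direct` or a final sync, dd reports the rate at which pages enter the page cache, not the device, so a freshly opened dm device can appear to absorb gigabytes per second until writeback catches up. A small self-contained demo against an ordinary file (paths hypothetical):

```shell
# Buffered write: dd's reported speed mostly measures copying into the
# page cache (RAM), not into the backing store.
dd if=/dev/zero of=/tmp/dd-demo.bin bs=4M count=8 2>/dev/null

# conv=fsync flushes the data before dd exits, so the timing includes the
# real write-out; oflag=direct would bypass the page cache on every write.
dd if=/dev/zero of=/tmp/dd-demo-sync.bin bs=4M count=8 conv=fsync 2>/dev/null

# Both files hold the same 32 MiB; only the reported speeds differ.
size_buffered=$(stat -c '%s' /tmp/dd-demo.bin)
size_synced=$(stat -c '%s' /tmp/dd-demo-sync.bin)
echo "$size_buffered $size_synced"   # prints: 33554432 33554432
rm -f /tmp/dd-demo.bin /tmp/dd-demo-sync.bin
```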

----------

## mbar

Small success here: just like in the referenced thread, upgrading the kernel to 4.13.x seems to have fixed both the "superfast writes" and the device being kicked out of device-mapper:

```
dd if=/dev/zero of=/dev/mapper/luks_cache bs=4M status=progress
9441378304 bytes (9.4 GB, 8.8 GiB), 67.0111 s, 141 MB/s
...
```

Write speed seems normal, dmesg reports no storm.

----------

## mbar

I reinitialized the LVM cache and I'm at a loss here:

```
root@carbon:~# cache_check /dev/mapper/luks_cache
examining superblock
  superblock is corrupt
    bad checksum in superblock
```

EDIT:

I did a quick test:

```
root@carbon:~# lvconvert --type cache --cachepool vg0/cache0 vg0/lvol0
Do you want wipe existing metadata of cache pool vg0/cache0? [y/n]: y
  WARNING: Data redundancy is lost with writeback caching of raid logical volume!
  Logical volume vg0/lvol0 is now cached.
...
root@carbon:~# lvremove vg0/cache0
Do you really want to remove and DISCARD logical volume vg0/cache0? [y/n]: y
  Flushing 0 blocks for cache vg0/lvol0.
  Logical volume "cache0" successfully removed
```

I haven't rebooted with the cache enabled yet.

----------

## mbar

This is the last episode in this series  :Smile:  (I hope) -- or "how I learned to stop worrying and love the cache".

After extensive testing on the unencrypted device (I even moved the partition 20 gigabytes to another location, and also tried a smaller size), wiped with zeroes, I came to the conclusion that the cache_check "superblock corruption" status is probably a bug. I even tried with a downgraded 0.6.1 version.

I disabled cache_check in lvm.conf, and my LVM RAID 5 with BTRFS has already survived 3 reboots, with a btrfsck after each reboot (uncached and cached -- no errors).
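For reference, the knob in question (as I understand it; check lvm.conf(5) for your version) lives in the global section of /etc/lvm/lvm.conf, where an empty executable path should skip the check:

```
global {
    # Empty string = don't run cache_check on cache pool activation.
    cache_check_executable = ""
}
```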

```
root@carbon:~# ./lvmcache-statistics.sh 
-------------------------------------------------------------------------
LVM [2.02.173(2)] cache report of found device /dev/vg0/lvol0
-------------------------------------------------------------------------
- Cache Usage: 4.6% - Metadata Usage: 23.7%
- Read Hit Rate: 27.1% - Write Hit Rate: 65.9%
- Demotions/Promotions/Dirty: 0/5412/0
- Feature arguments in use: metadata2 writeback 
- Core arguments in use : migration_threshold 2048 smq 0 
  - Cache Policy: stochastic multiqueue (smq)
- Cache Metadata Mode: rw
- MetaData Operation Health: ok
root@carbon:~# lvs
  LV    VG  Attr       LSize  Pool     Origin        Data%  Meta%  Move Log Cpy%Sync Convert
  lvol0 vg0 Cwi-aoC--- <5,46t [cache0] [lvol0_corig] 4,68   23,74           0,00     
```

Now we wait.

----------

