# dmcrypt encrypted swap crashes system

## nicolasbock

I have all my partitions encrypted with dmcrypt, including 2 swap partitions. I use this setup for the swap partitions:

```
/etc/conf.d/dmcrypt:

swap=crypt-swap-1
pre_mount='mkswap -f ${dev}'
source='/dev/sda2'

swap=crypt-swap-2
pre_mount='mkswap -f ${dev}'
source='/dev/sdb2'
```

Booting the system works flawlessly: I get no complaints, and the "free" command reports the correct amount of swap. However, once an application allocates large amounts of memory and paging to swap starts, the system just hangs. I had not noticed this before, and I am pretty sure that when I set up this system about a year ago, the encrypted swap worked just fine. Of course, a lot has changed on the system since then through regular software updates, and since I have a fairly decent amount of physical memory, a stability issue with swap might have gone unnoticed for quite a while. Unfortunately I don't see any error messages in the log files, and I am pretty much in the dark as to the reason for these crashes.

I have a few more data points that might help in figuring out where the problem lies. As a check, right after booting, I "swapoff"ed the encrypted swap, removed the dmcrypt mapping ("cryptsetup remove crypt-swap-1"), and re-enabled unencrypted swap on the 2 partitions ("mkswap /dev/sda2; swapon /dev/sda2"). Now the system is rock solid and does not crash when swapping. Also, as I mentioned above, since all my other partitions are encrypted as well, the dmcrypt mapping itself must work. However, since the normal partitions use LUKS and the swap partitions do not, this last point might not be very meaningful.

Since swapping and device mapping are done in the kernel, my best guess is that this is a kernel problem. I am using 2.6.34-gentoo-r11 on AMD64. Any help is greatly appreciated.

----------

## manaka

From your description, it seems to be a kernel bug. To confirm, check whether the kernel prints any message to the console (it usually does when hitting a bug). The kernel's last messages before a hang typically aren't logged by syslog, so you won't find them in any file after rebooting.

To see the kernel messages, switch to a text console if necessary (for example, Ctrl-Alt-F2 if inside an X session). Then enable kernel message logging to the console with dmesg -n8, and start the memory-demanding applications.

----------

## mr.sande

Have you tried making an encrypted swap that is permanent? From what I can tell, you are recreating the encrypted swap partitions on each boot, so maybe there is a problem with that part of your setup, although I don't understand why there should be.

I can confirm that encrypted swap works, as I'm currently using it myself; the only difference is that mine is inside LVM. I'm currently using 2.6.34-gentoo-r12.

----------

## nicolasbock

Thanks for the replies.

manaka, I did exactly as you said, and I don't get any kernel error messages. I did notice something, however. Since switching to a text console, I noticed that the system doesn't actually freeze. It's more as if the disks freeze while the rest of the system is still up. I can switch consoles with the keyboard, for instance, and I can ping the machine. I cannot do much else, though: anything that seems to involve disk access, e.g. logging in, switching to the graphical console, or trying to abort the high-memory job, is frozen. Since the system is still up to some extent, I can use the magic SysRq keys, but I haven't figured out yet how they might help find out more about this problem.

mr.sande: Unfortunately your suggestion didn't make any difference. I looked at the differences between my swap partitions and the regular ones and found 2 obvious ones: (1) I am using "direct" encryption on swap and LUKS on the regular partitions, and (2) the cipher was aes-plain on swap and aes-cbc-essiv:sha256 on the regular partitions. I converted the swap partitions to LUKS and changed the cipher to match the regular partitions, but this had no effect. The system still freezes once swap is needed.

I also switched kernels, and went with vanilla 2.6.34.5, also with no change.

----------

## manaka

 *Quote:*   

> 
> 
> I did exactly as you said and I don't get any kernel error messages. I did notice something however. Since I switched to a text console I noticed that the system doesn't actually freeze. It's more like as if the disks freeze, but the rest of the system is still up. I can switch consoles with the keyboard for instance. I can ping the machine.
> 
> 

 

When disk I/O freezes, the kernel usually prints some error messages, but it appears you don't get any of these.

SysRq would let you kill processes, but not selectively, which is what you need. Perhaps its T function (show-task-states) can provide some insight into what's happening. As your system doesn't write log messages to disk while it is stuck, a serial console (or netconsole) would be advisable in order to record all those lines.

----------

## nicolasbock

I posted this question to the dm-crypt mailing list, and it appears that this is a known problem. From what I have gathered so far, the system is not actually crashing; it's just stuck in I/O. I let it stay in this frozen state, and after about 20 minutes my job actually unfroze and the system returned to normal. According to the people on the mailing list, this is a known kernel problem related to something called write barriers. Supposedly the issue is being worked on right now, and 2.6.36 is already improved over 2.6.34. Unfortunately, in my particular case, upgrading to 2.6.36 didn't fix it, but I was told that 2.6.37, once it is out, will be even better. Well, let's see. Here is the link to my initial post to dm-crypt:

http://www.saout.de/pipermail/dm-crypt/2010-November/001329.html

In the meantime, I see only 2 options: Do not use swap, or go to unencrypted swap.

----------

## Watcom

I have the same problem, the first time it happened was probably a few months ago. I remember it used to work when I first set it up, maybe a couple years back. Something got broken in the kernel, dm-crypt, or both.

In my case the system completely freezes for 5-10 minutes: the mouse pointer freezes and it's completely unresponsive. The HDD LED blinks very slowly while it's frozen, so I'd say there's almost no disk activity. Remote access is also impossible; even ping gets no reply.

Since I rarely hit swap this is hardly ever a problem for me but it's certainly very annoying when it happens. I'm surprised to find almost nothing on the subject when googling.

----------

## nicolasbock

Yes, I was surprised by the lack of Google hits too. I guess not many people use encrypted swap, so few have noticed this.

----------

## frostschutz

I use encrypted swap, with zero problems - however my entire setup is custom (full disk encryption dmcrypt/luks with custom kernel and custom initramfs), not using the Gentoo dmcrypt init/config system.

As for the kernel, I'm on 2.6.36.1 on amd64, but I've been using encryption for some time...

----------

## rh1

I also use encrypted swap. Mine is permanently encrypted for hibernation reasons. I haven't had any issues at all with it; it works perfectly. I'm also using my own initramfs and tuxonice-sources-2.6.36. I've never tried the init/config system.

----------

## nicolasbock

rh1 and frostschutz, have you tried to allocate a lot of memory to force the memory manager to swap out? Can you fill the swap while the system stays responsive?

----------

## rh1

 *Quote:*   

> Can you fill the swap while the system stays responsive?

 

Sure can. I had to set up a large tmpfs and copy a bunch of stuff to it to get it to start using swap. This is from top, taken while posting this from my desktop in Firefox, with emerge -uND world running and some torrent files downloading in the background.

```
Mem:   6119300k total,  6076704k used,    42596k free,   156056k buffers
Swap: 15734604k total,  6339256k used,  9395348k free,  5546048k cached
```

I'm not sure how large a hibernation image is but it always works fine for that too.

----------

## nicolasbock

Thanks, now I am really impressed. I use an older kernel, 2.6.34-gentoo-r12; maybe that's the problem. Although I did try 2.6.36, with no change. Maybe the tuxonice patches fix something?

----------

## rh1

 *Quote:*   

> mr.sande: Unfortunately your suggestion didn't make any difference. I looked at the difference between my swap partitions and the regular ones. I found 2 obvious ones: 1) I am using "direct" encryption on swap and LUKS on regular, and (2) the cipher was aes-plain on swap and aes-cbc-essiv:sha256 on the regular partitions. I made the swap partitions LUKS and changed the cipher to match the one of the regular partitions, but this did not have any effect. The system still freezes once swap is needed. 

 

Just curious, when you tried this earlier, did you still use the init script or did you encrypt and activate manually?

----------

## Watcom

Or maybe something is different in the Gentoo init scripts/configs? Of course, if a race condition is occurring, there's definitely a bug in the kernel, but perhaps it only occurs under certain odd conditions. For me the problem started a few months ago, possibly after a kernel upgrade, I don't know.

In my fstab I have:

```
/dev/mapper/crypt-swap  none            swap            sw              0 0
```

In my /etc/conf.d/dmcrypt I have:

```
## swap
# Swap partitions. These should come first so that no keys make their
# way into unencrypted swap.
# If no options are given, they will default to: -c aes -h sha1 -d /dev/urandom
# If no makefs is given then mkswap will be assumed
swap=crypt-swap
source='/dev/sda2'
```

This reminds me that I recently had a problem with my non-LUKS partitions: the defaults changed, so I had to specify the cipher manually. Maybe this is a dmcrypt config bug? Just speculating.

----------

## nicolasbock

rh1: I tried both ways. It didn't make any difference.

Watcom: The dm-crypt mailing list discussion I linked a little further up seems to suggest that this is a kernel bug. It was claimed there that "write barriers" in the kernel (don't ask me why or what they do) cause this behavior. Supposedly the kernel people are working on it, but it's not quite fixed yet. That's why I am surprised that the tuxonice sources apparently work fine.

----------

## frostschutz

I'm using unpatched vanilla kernel. So I don't think it's related to tuxonice.

If it's because of write barriers, those are most likely disabled for me. I don't think they work through DM-Crypt and LVM.

(Swap -> LVM -> LUKS -> GPT Partition -> SATA AHCI drive)

----------

## nicolasbock

Thanks frostschutz,

I am running vanilla-sources-2.6.36.2 now and can report that all of my problems with swap have gone away. Amazing!

----------

## Watcom

Then it must be something only present in gentoo-sources. That's strange because I thought tuxonice had all the gentoo patches and more.

----------

## nicolasbock

I take back what I posted earlier. It still doesn't work. What happened was that I tried my memory allocation test first on the console without load on the system. That worked great. The second time I tried (after posting here) it was under light IO load with X and gnome running, and the system froze again. There is no difference as far as I can tell between the gentoo and the vanilla sources.

Maybe to make this more concrete, here is the code I am running. I have 8GB physical memory and 8GB swap in 2 swap partitions of 4GB each. They are encrypted using /etc/conf.d/dmcrypt.

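```
#include <stdio.h>
#include <stdlib.h>

int
main (void)
{
  char *data = NULL;
  size_t i;
  size_t L;
  size_t N;
  int c;

  L = 12; /* Number of GB to allocate. */
  N = L*1024*1024*1024; /* Number of bytes (size_t is 64-bit on AMD64). */
  printf("starting to allocate %zu bytes = %zu GB\n", N*sizeof(char), L);
  data = (char*) malloc(sizeof(char)*N);
  if (data == NULL)
  {
    fprintf(stderr, "malloc of %zu bytes failed\n", N);
    return 1;
  }
  /* Write every byte so that all pages are actually backed by memory. */
  for (i = 0; i < N; i++)
  {
    data[i] = (char) rand();
  }
  printf("done.\n");
  /* Keep the memory allocated until Enter is pressed (or stdin hits EOF). */
  while ((c = fgetc(stdin)) != '\n' && c != EOF) {}
  free(data);
  return 0;
}
```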

The relevant section from /etc/conf.d/dmcrypt:

 *Quote:*   

> swap=crypt-swap-1
> 
> #options='-d /dev/urandom'
> 
> pre_mount='mkswap -f ${dev}'
> ...

 

where I added the pre_mount option to get rid of the error message I get otherwise, complaining that mkswap will not touch a whole disk.

----------

