# mdadm RAID5 + LUKS + Ext4: help with options

## Havin_it

Hi,

I'm trying to set up a mdadm 3-drive RAID5 array, on which I'll create a LUKS volume holding an Ext4 partition for /home. I'm considering getting a swap partition in there too, though I also have the option of putting this on the smaller HDD that holds the Gentoo install.

The first problem I've hit is the difficulty of establishing the best (or at least sensible) options to set when doing so. I started working off this posting as it seemed to mostly match my plan (the part for /dev/md7 in the posting), so I did the following:

```

mdadm --create /dev/md0 --level=5 --auto=md --chunk=128 --spare-devices=0 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd

cryptsetup --cipher=aes-xts-plain64 --hash=sha256 --key-size=256 --align-payload=512 luksFormat /dev/md0 /root/keyfile

```
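For what it's worth, `--align-payload` is given in 512-byte sectors, so the value can be sanity-checked against the array geometry. A minimal sketch of that arithmetic, assuming the 128 KiB chunk and three-drive RAID5 from the `mdadm` command (the variable names are illustrative only):

```shell
# Geometry taken from the mdadm command; adjust to match your own array.
chunk_kib=128        # --chunk=128
raid_devices=3       # --raid-devices=3
data_disks=$(( raid_devices - 1 ))   # RAID5 spends one chunk per stripe on parity

# One full stripe of data, in KiB and in 512-byte sectors.
stripe_kib=$(( chunk_kib * data_disks ))
stripe_sectors=$(( stripe_kib * 1024 / 512 ))

echo "full data stripe: ${stripe_kib} KiB = ${stripe_sectors} sectors"
```

With these numbers a full data stripe is 256 KiB, i.e. 512 sectors, so `--align-payload=512` starts the encrypted payload exactly one stripe into the array.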

However that's as far as that gets me, as the remainder of that item discusses putting XFS on the LUKS volume. I've nothing against XFS but think I'd rather stick with the relatively known quantity (for me) of Ext4. Also from what little I have picked up about XFS, it won't perform well with a regular churn of small files (my dev repos will live in there). So, some questions:

1) What with the underlying encryption, will it actually make any difference what alignment/block-size/stride/stripe-size options I use for the Ext4 format?

2) Would it be better (performance-wise) to put the encrypted swap as:

(a) A separate encrypted volume on top of the RAID, alongside the /home LUKS volume;

(b) A "normal" swap partition inside the LUKS volume, alongside the Ext4 /home partition (this was initially my plan); or

(c) An encrypted swap partition on the smaller OS/boot drive?

3) I wanted to make the dmcrypt initscript depend on mdadm (by adding "need mdadm"), but it appears they want to start in the opposite order (I got caught in a dependency loop when I tried to start the modified dmcrypt). This seems odd! Is there another way to make init ensure the RAID is good before trying to start dmcrypt? (There'll be other services needing the LUKS /home to be mounted before they can start.)

Probably more to ask, but those are all that immediately come to mind. For reference, the drives comprising the RAID5 set are 3 identical 500GB SATAII drives, sector size 512B. The boot drive is a 250GB SATAII. Processor for the box (a headless HP Microserver) is an AMD Turion II Neo N40L. TIA for anyone taking a crack at any/all of these queries.

----------

## NeddySeagoon

Havin_it,

```
mdadm --create /dev/md0 --level=5 --auto=md --chunk=128 --spare-devices=0 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd
```

is a bit unusual.

You normally donate partitions to md devices, not whole drives. This is no longer the problem it once was and the kernel can work with partitioned /dev/mdX.

Until recently, it would allow you to partition but not actually use the partitions.

I use lvm on top of raid5 on my HP Microserver but my paranoia level is lower than yours - I don't use encryption at all.

This wiki page is nice and simple and it's easy to leave out the bits you don't want.

You won't need the initrd as you have root on a single drive.

I use my microserver for KVMs so I have the bare metal install in one LVM PV and the VMs in another.

----------

## frostschutz

 *Havin_it wrote:*   

> 1) What with the underlying encryption, will it actually make any difference what alignment/block-size/stride/stripe-size options I use for the Ext4 format?

 

Recent versions of cryptsetup/LUKS take care to get the alignment right, i.e. the LUKS metadata at the beginning of the device is padded out to a multiple of MiB (1 or 2 MiB, I don't remember which).

That should be good enough for alignment questions as underlying raid chunk size or drive block size is usually smaller than 1 MiB (4k, 128k, whatever). Thus the encryption should not matter in terms of alignment questions, the raid chunk size might though (not sure how large ext4 metadata is and how it aligns by default).

 *Havin_it wrote:*   

> 2) Would it be better (performance-wise) to put the encrypted swap as:
> 
> (a) A separate encrypted volume on top of the RAID, alongside the /home LUKS volume;
> 
> (b) A "normal" swap partition inside the LUKS volume, alongside the Ext4 /home partition (this was initially my plan); or
> ...

 

With swap you can use a separate encrypted volume, which has the advantage that you can use a completely random key every time you boot, so no one can decrypt it afterwards as there is no password. Performance-wise it's better to have separate encrypted volumes, assuming the kernel still uses only one thread per volume: with separate encrypted volumes in use concurrently, a multicore CPU is utilized better. If your encryption is accelerated somehow (AES-NI) it probably won't matter though.
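A rough sketch of the random-key swap idea, written as a dry run that only prints the commands. It assumes a reasonably recent cryptsetup that understands `open --type plain` (older releases spell this differently), and `/dev/md1` and `cryptswap` are made-up names rather than anything from this thread:

```shell
# Print (do not run) the commands for a throw-away encrypted swap device.
# $1 = backing block device, $2 = device-mapper name; both are placeholders.
random_key_swap_cmds() {
    dev=$1
    name=$2
    # Plain dm-crypt mode: no LUKS header, and the key is read fresh from /dev/urandom.
    echo "cryptsetup open --type plain --cipher aes-xts-plain64 --key-size 256 --key-file /dev/urandom $dev $name"
    echo "mkswap /dev/mapper/$name"
    echo "swapon /dev/mapper/$name"
}

# Hypothetical device and mapping names.
random_key_swap_cmds /dev/md1 cryptswap
```

Because the key is never stored anywhere, the swap contents become unreadable as soon as the mapping is closed or the machine is powered off. The same key loss means this kind of swap cannot be used for hibernation.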

 *Havin_it wrote:*   

> 
> 
> 3) I wanted to make the dmcrypt initscript depend on mdadm (by adding "need mdadm"), but it appears they want to start in the opposite order (I got caught in a dependency loop when I tried to start the modified dmcrypt). This seems odd! Is there another way to make init ensure the RAID is good before trying to start dmcrypt? (There'll be other services needing the LUKS /home to be mounted before they can start.)

 

Not sure about that. I don't use the dmcrypt init script at all - my initramfs does the decrypting and then there's a separate volume for backups where the backup script itself takes care of it (luks container is only open while the backup process is running).

----------

## Havin_it

Thanks both for the replies. I'll answer one at a time as time is short right now...

 *NeddySeagoon wrote:*   

> Havin_it,
> 
> ```
> mdadm --create /dev/md0 --level=5 --auto=md --chunk=128 --spare-devices=0 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd
> ```
> ...

 

Is it *bad* to do it this way though? The way I saw it, creating partitions when I'm using the whole drives anyway would just be a needless extra layer of abstraction (and take a teaspoonful away from the overall capacity, I guess).

You also mentioned LVM, which a lot of people do seem to involve in this type of setup, but I'm not seeing what benefits it would provide either, really.

If you have a Microserver too, you'll know it lives up to the "Micro" part, which I love about it, but that does make it a little more portable than its predecessor (a big tank of a thing!) in the event of burglary. That's the main motivation for the encryption; it might not be necessary for the whole capacity I guess, but that just seemed easier and more flexible -- I've always shied away from partitioning when not absolutely necessary because I fear ending up with the free space not in the partition where I need it...

Thanks for the wiki link, shall chew over it soon as time allows.

----------

## NeddySeagoon

Havin_it,

There is nothing I need to encrypt on my microserver, it's my mailserver, firewall and media server.

It's also in my garage, about 50m from the house, on the end of 1G wired networking.

It would be a nuisance if it were stolen ... mostly because of the time it took to feed it 1200 DVDs. I still have the CD collection to do.

The attraction of LVM is that it allows you to move space around from one 'partition' to another as needs change.

LVM is also good for Virtual Machines as you can donate a logical volume to a VM and have the VM make a filesystem there. It saves the overhead of a filesystem in a file for a VM.

There is no overhead for partitioning, other than the space lost to the partition table, 1 physical sector. 

You can reclaim the rest if you wish. Each partition is a block device to the kernel. The 'overhead' is in setting up the block device at boot, maybe a few microseconds.

----------

## Havin_it

 *NeddySeagoon wrote:*   

> Havin_it,
> 
> There is nothing I need to encrypt on my microserver, it's my mailserver, firewall and media server.
> 
> It's also in my garage, about 50m from the house, on the end of 1G wired networking.

 

50m garden huh? I'm guessing you don't live as close to the Banana Flats as I do   :Rolling Eyes:  and my building's main door doesn't lock   :Shocked: 

Mostly a web/mail/media server in my case, as I said it doesn't all need encrypting but some of it definitely does. Sadly, no hardware crypto, so read speed on the LUKS volume is about 1/3 that of the raw mdadm volume  :Sad:  Probably I'll split this up into 2 partitions and only encrypt the vital part. However:

 *NeddySeagoon wrote:*   

> The attraction of LVM is that it allows you to move space around from one 'partition' to another as needs change.
> 
> LVM is also good for Virtual Machines as you can donate a logical volume to a VM and have the VM make a filesystem there. It saves the overhead of a filesystem in a file for a VM.

 

No likely use for VMs on this box, and as much as I read about LVM I still can't see how it makes trading space between partitions any easier. Per Ye Wiki, I'd say it just makes it more complicated as you have to resize the LVs as well as the filesystems (if they support it). What am I missing?

On the other hand, it does raise a couple of interesting new possibilities I hadn't thought of. I could do a RAID6 across all 4 drives and have the rest in RAID5 (I had been wondering what to do with the extra space on the boot drive that's far more than needed for the Gentoo install). Or maybe use part of the RAID5 set as a snapshot/mirror target for the boot drive. Too many choices!

 *NeddySeagoon wrote:*   

> There is no overhead for partitioning, other than the space lost to the partition table, 1 physical sector. 
> 
> You can reclaim the rest if you wish. Each partition is a block device to the kernel. The 'overhead' is in setting up the block device at boot, maybe a few microseconds.

 

Certainly if I go for any of the more complex arrangements mooted above, I guess I'll be doing it anyway, but were I just to stick with Plan A, would there be any actual downside to not using partitions?

----------

## frostschutz

 *Havin_it wrote:*   

> I'd say it just makes it more complicated as you have to resize the LVs as well as the filesystems (if they support it). What am I missing?

 

Resizing the LV and resizing the filesystem each take a single command.

Without LVM, if you wanted to resize traditional partitions, you also have to move tons of data around since the only way to resize a traditional partition is to move the partitions next to it out of the way too.
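To make the "single command each" point concrete, here is a dry-run sketch that only prints the two commands for growing an ext4 filesystem on an LV. The volume path `/dev/vg/home` and the `+10G` increase are invented for illustration:

```shell
# Print (do not run) the commands that grow an LV and then its ext4 filesystem.
# $1 = logical volume path, $2 = size change as understood by lvextend (e.g. +10G).
grow_lv_cmds() {
    lv=$1
    delta=$2
    echo "lvextend -L $delta $lv"   # extend the logical volume
    echo "resize2fs $lv"            # grow ext4 to fill the enlarged volume
}

# Hypothetical volume name and size.
grow_lv_cmds /dev/vg/home +10G
```

Neither step relocates existing data; LVM just maps additional free extents from the volume group onto the end of the volume.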

 *Quote:*   

> were I just to stick with Plan A, would there be any actual downside to not using partitions?

 

The only possible downside is that without partitions, $some_software may not be able to detect what the drive is being used for, and may be inclined to partition/format it.

----------

## Havin_it

 *frostschutz wrote:*   

> 
> 
> Without LVM, if you wanted to resize traditional partitions, you also have to move tons of data around since the only way to resize a traditional partition is to move the partitions next to it out of the way too.

 

I still don't get it   :Embarassed:   wouldn't this be the case anyway? I mean, the data is there one way or another and still needs to be moved, so what's the difference? Sorry if I am being terribly dim.

----------

## NeddySeagoon

Havin_it,

The space making up a logical volume need not be contiguous as it must be with a conventional partition. It need not even all be on the same drive.

```
  --- Logical volume ---
  LV Name                /dev/vg/var
  VG Name                vg
  LV UUID                lPqpOK-Ps1H-WKmD-9cqa-PDsb-r31i-KDSKEt
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                59.00 GiB
  Current LE             15104
  Segments               3
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:3
```

Notice the Segment count. This logical volume and its filesystem have been extended twice. No data moves, no reboots, no downtime.

It's all done 'on the fly'. Shrinking is a little more complex.

This logical volume is similar to three partitions scattered randomly on some number of drives (three or fewer) being used seamlessly as a single partition.
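The size and extent count in that output agree with each other if the volume group uses 4 MiB extents. That is LVM's default and an assumption here, since the thread does not show `vgdisplay`. A quick check of the arithmetic:

```shell
current_le=15104     # "Current LE" from the lvdisplay output
pe_size_mib=4        # assumption: LVM's default extent size; confirm with vgdisplay

lv_size_mib=$(( current_le * pe_size_mib ))
lv_size_gib=$(( lv_size_mib / 1024 ))

echo "${current_le} extents x ${pe_size_mib} MiB = ${lv_size_mib} MiB = ${lv_size_gib} GiB"
```

That comes out at 60416 MiB, or 59 GiB, matching the reported LV Size. To see where each of the three segments physically lives, `lvdisplay -m` prints the extent-to-device mapping.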

----------

## Havin_it

DOOOHHHHH Okay I get it now. Thanks for getting me there in the end   :Very Happy: 

I'm still doubtful that I'll use it, as I think whatever partitioning scheme I finally decide on will be one I'll want to be happy with long-term anyway (hence all this agonising). But it is something to consider.

Now, going back to the issue of correct tuning for the filesystem on top of the LUKS volume. I put a (gpt) partition table on it, and gparted then reported its sector-size as 512, but is this accurate? The volume is reported as being the same capacity as the underlying mdadm volume, but shouldn't the encryption make it smaller? (Or maybe this is only true of Truecrypt, which is what I've previously used for drive encryption?)

Something else regarding encryption: I've got the AES (x86_64) item enabled in the kernel, but the standard AES item is force-selected. Is the former definitely the one that'll be used, or do I need to somehow specify it when running cryptsetup luksFormat?

----------

## frostschutz

It'll use the fastest AES available for your CPU... (if you enabled it)

Most 4k disks still report their sector size as 512 bytes for compatibility reasons. Just align everything to MiB boundaries anyway.

Encryption (LUKS) loses you 1MiB since LUKS stores its metadata at the beginning of the device and then adds empty space, so the payload will again be aligned to MiB boundary.

You'll probably lose a much larger amount of space for filesystem metadata (superblock, backup superblocks, journal, etc.) than for LUKS metadata.
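On the sector-size point: the kernel exposes the logical and physical sector sizes separately in sysfs, so both can be read without any extra tools. A small sketch, assuming the drives appear as `sdb`, `sdc` and `sdd` as in the `mdadm` command earlier in the thread (the second argument exists only so the function can be pointed at a different directory for testing):

```shell
# Print the logical and physical sector sizes the kernel reports for a disk.
# $1 = disk name (e.g. sdb); $2 = sysfs block directory, defaults to /sys/block.
sector_sizes() {
    disk=$1
    sysfs=${2:-/sys/block}
    logical=$(cat "$sysfs/$disk/queue/logical_block_size")
    physical=$(cat "$sysfs/$disk/queue/physical_block_size")
    echo "$disk logical=$logical physical=$physical"
}

# Only query disks that actually exist on this machine.
for d in sdb sdc sdd; do
    if [ -r "/sys/block/$d/queue/physical_block_size" ]; then
        sector_sizes "$d"
    fi
done
```

This only shows what the drive firmware tells the kernel, so a drive that misreports its physical size will still look like a 512-byte drive here. That is one more reason to align to MiB boundaries regardless.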

----------

## Havin_it

 *frostschutz wrote:*   

> 
> 
> Most 4k disks still report their sector size as 512 byte for compatibility reasons. Just align everything to MiB boundaries anyway.

 

Yikes. I did some reading about this and it sounds like a right headache. Is there any way of knowing for sure whether the drives are 4KB or 512B? The model numbers are

WD5000AAKS (3x WD Caviar Blue 500GB 3Gbps)

VB0250EAVER (1x HP 250GB 3Gbps)

In both cases gparted reported physical sector-size of 512B.

Does this mean I should create partitions on the RAID5 set to avoid misalignment, or will using them raw as I have been thus far be OK? I got advice here recently on partitioning an SSD and was advised to leave 1MB before the first partition: is this the same underlying reason?

And is it actually correct that (metadata notwithstanding) 1MB of data in the LUKS volume will = 1MB on the underlying RAID volume? How does LUKS achieve this parity where Truecrypt cannot? Sorry for so many questions but I really want to understand this well...

ADD: I just noticed the LinuxQuestions post I linked up top was using 4KB-sector drives, so I guess as long as they knew what they were doing (always a gamble), then what I've done so far is OK. And if his equations are to be trusted, and as you say the LUKS layer doesn't make a difference to any of this, that means for my 3-drive RAID5 I'd want

Spindle = 2

stride = (128*1024)/4096 = 32

stripe-width = 32*2 = 64
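The same arithmetic as a small helper, in case the chunk size or drive count changes later. This is only a sketch: it assumes a 4096-byte ext4 block size, and `/dev/mapper/home` is a placeholder for whatever the opened LUKS device ends up being called. The final command is printed, not executed:

```shell
# Compute ext4 stride and stripe-width for a RAID5 array.
# $1 = chunk size in KiB, $2 = total drives in the array, $3 = fs block size in bytes.
ext4_raid5_opts() {
    chunk_kib=$1
    drives=$2
    block=$3
    stride=$(( chunk_kib * 1024 / block ))      # filesystem blocks per chunk
    width=$(( stride * (drives - 1) ))          # one drive's worth per stripe is parity
    echo "stride=$stride,stripe-width=$width"
}

opts=$(ext4_raid5_opts 128 3 4096)
# Placeholder device name; this only prints the mkfs command.
echo "mkfs.ext4 -b 4096 -E $opts /dev/mapper/home"
```

For a 4-drive RAID5 with the same chunk size, `ext4_raid5_opts 128 4 4096` gives `stride=32,stripe-width=96`, so the optimal stripe-width depends on how many drives are in the array.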

----------

## Havin_it

Y'know, I can't see anything to suggest that these drives are 4k "Advanced format". There's a denial of such here for the WD ones, and following the gist of that, the HP one being even smaller capacity is less likely to be so. Nothing concrete; HP don't seem to mention anywhere on their site that they even make HDDs, and the only mention of Advanced Format is in offering a Windows tool to identify such drives. WD say all their AF drives are labelled as such, hopefully that goes for HP too, so I'll pop all the drives later and have a look.

----------

## frostschutz

Just align everything to 1MiB anyway; then you're good for the future (if a drive breaks and you have to replace it) and it doesn't really matter if the drive uses 512b, 4k, or larger blocks (as long as they don't get larger than 1MiB).

----------

## Havin_it

 *frostschutz wrote:*   

> Just align everything to 1MiB anyway; then you're good for the future (if a drive breaks and you have to replace it) and it doesn't really matter if the drive uses 512b, 4k, or larger blocks (as long as they don't get larger than 1MiB).

 

So does that mean needing to leave a 1MB gap at the start of each drive (even if being used in raw style as above)?

----------

## frostschutz

No, you don't have to leave a 1MiB gap when you use it raw.

However if you use a partition table the partitions should start on MiB boundaries.

----------

## Havin_it

Okay, consider me convinced on that point if there's nothing lost from it. Future-proofing rocks  :Wink: 

So I'm now thinking I might make my overall strategy more complex based on some of the ideas mentioned in this thread. I still prefer to keep a regular physical partition on the smaller drive to boot off, but this only needs to be about 14GB. So now I'm thinking I could make a 4-drive RAID5 array across all 4 drives using the remaining ~236GB from that drive (a 708GB volume), then another 3-drive RAID5 with the remaining space on the 3 big drives (528GB). And if my understanding of LVM is correct, I could layer that on top to make a single logical volume (totalling 1236GB) from the two arrays, and then do as I please with that: some encrypted, some not. That gives me about the same total space to play with as my original plan, but more of it protected. Am I sound?
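Checking those capacity figures: RAID5 gives up one member's worth of space to parity, so usable space is (members - 1) x member size. A quick sketch using the sizes quoted above, treating 264GB as what is left on each 500GB drive after the 236GB slice:

```shell
# Usable capacity of a RAID5 array: one member's worth of space goes to parity.
# $1 = number of member devices, $2 = size of each member (any unit).
raid5_usable() {
    echo $(( ($1 - 1) * $2 ))
}

four_drive=$(raid5_usable 4 236)    # 236GB slice from each of the 4 drives
three_drive=$(raid5_usable 3 264)   # 500GB - 236GB = 264GB left on each big drive

echo "4-drive array: ${four_drive}GB"
echo "3-drive array: ${three_drive}GB"
echo "combined:      $(( four_drive + three_drive ))GB"
```

This reproduces the 708GB, 528GB and 1236GB figures.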

----------

## Havin_it

OK, I've started working towards the above plan by shrinking the root partition on the small disk to 20GB and making a second one with the excess. As I'm doing this on the CLI, can I just check something:

In parted, I have the boot partition starting at sector 2048 (to satisfy 1MB alignment). Is this right, or should it be 2049s?

Better get this right now before I go further...

EDIT: Whatever the answer I assume it then applies to subsequent partitions too, so my second partition's start-sector should be 2048*N, rather than (2048*N)+1.

EDIT EDIT: Think I have that one answered, as I put a partition on one of the other drives using MiB alignment with gparted (now I've got GUI back) and parted showed it as starting at 2048s.
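The 2048 vs 2049 question can also be settled with arithmetic: sectors are numbered from 0, so with 512-byte sectors, sector 2048 begins exactly 1MiB (1048576 bytes) into the disk. A minimal check, assuming 512-byte logical sectors unless told otherwise:

```shell
# Succeeds if a partition starting at the given sector begins on a MiB boundary.
# $1 = start sector, $2 = logical sector size in bytes (defaults to 512).
is_mib_aligned() {
    start_sector=$1
    sector_bytes=${2:-512}
    [ $(( start_sector * sector_bytes % 1048576 )) -eq 0 ]
}

for s in 2048 2049 4096; do
    if is_mib_aligned "$s"; then
        echo "sector $s: aligned"
    else
        echo "sector $s: NOT aligned"
    fi
done
```

So 2048 and any multiple of it are aligned, while 2049 is not. parted also has an `align-check` command that can be run against an existing partition.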

So now I have the 2 raid5 volumes, one across 4 drives and one across 3. My plan is to make a LVM2 volume-group from these, but do I need to partition them first or can I use them "raw"?

----------

## Havin_it

Right, got there in the end. I realised the above wouldn't work so I've thrown out LVM, because my VG would be made up of two arrays with different "Spindle" values, meaning different optimal stripe-widths for the filesystem.

So instead I've made 3 RAID5 arrays:

1) 6GB 3-drive array (3GB per drive), encrypted swap

2) 640GB 4-drive array (~213GB per drive), holding 6GB encrypted /var/tmp and the rest encrypted /home (best raw read speed to compensate for the encryption overhead)

3) 500GB 3-drive array (250GB per drive), unencrypted, for media

Thanks for all the input, I learned a lot!

----------

