# [solved] Most reliable filesystem on large raid volume

## MarcusXP

I will create one RAID-6 volume consisting of 8x 2TB drives (total usable capacity about 11.5 TB).

I need to decide what filesystem to use on this volume.

What my requirements are:

(1). Reliability: The filesystem should be able to recover after a crash/unclean shutdown, with minimum data loss.

(2). A FS that is old enough, or has enough tools on the market, so I can recover most of the data in case of human error (e.g. running mke2fs on that volume) or if, for some other reason, the partition table becomes corrupted.

(3). The FS must be able to handle a partition that large (~11.5TB)

(4). Performance: not critical. This volume will be used for storing important personal data (wedding movies, photos, etc.), so most filesystems should be able to deliver the minimum needed in a home environment...

After doing some research, I came to the conclusion that ext2/ext3 would be the way to go.. the question is which one?

I had problems with ext4 partitions getting corrupted after the system was shut down unexpectedly (power loss), and not only once.. TWICE! So ext4 is out of the question. Plus, the tools to recover data from ext4 are not widely available and not that good yet, whereas tools to recover data from ext2/ext3 exist and have reached a certain level of maturity. Also, from my experience, it seems that ext4 sometimes corrupts data - I have 2 servers, one desktop and one laptop running Gentoo, and I have the impression that ext4 corrupted some files on some occasions (but I might be wrong, or there were some other problems) - maybe you guys have had similar experiences?

As for other filesystems, JFS, XFS, ReiserFS.. my requirement no. (2) and partly, no. (1), pretty much eliminates them.

With a block size of 4KB, both ext2 and ext3 seem to support partitions up to 16TB, and files up to 4TB, so requirement (3) seems to be met by them (according to Wikipedia).

I would really appreciate some feedback regarding your experience with filesystems, especially regarding requirements (1) and (2): what filesystems have you used, did you see data corruption on them, did you have problems when the system was shut down unexpectedly (power failure, reset, etc.), and what tools do you know of that can recover data after partition corruption, or data deleted by mistake, that would give one filesystem an advantage over another?

I tend to go for ext2 because tools to recover lost data from it seem to exist; however, Wikipedia states that:

"Its main advantage over ext2 is journaling which improves reliability and eliminates the need to check the file system after an unclean shutdown."

http://en.wikipedia.org/wiki/Ext3

Because of journaling, however, I think it would be a bit harder to recover lost/deleted data from it (but I might be wrong, so please correct me).

BTW, requirement no. (2) exists because recently I ran "mkfs.ext3 /dev/sdb1 -m 0" on a filesystem that contained data and was already formatted with ext4. I did this by mistake, believe me.. the stupid drive names "sdb" and "sdc" confused me for some unknown reason. After a few seconds I realized what I was doing and pressed CTRL-C, but it had already started creating some inodes, so the partition is not usable anymore. Now I am struggling to recover the data from that partition (about 6.8TB), so if anyone knows a good tool to repair ext4 partitions or to recover data from them, it would be greatly appreciated.

BTW, why doesn't "mke2fs" ask a simple question, like "Are you sure you want to do this - Yes/No", with a quick summary of the partition the user is about to format/recreate (total size, used size, existing filesystem on it, etc.)? It would be a great guard against human stupidity (at least in my case, it most likely would've worked..)

Also, some best practices against disasters (human-made or hardware/software issues) would be appreciated as well. I've read that "partimage" is a good tool for saving a partition image, but it can be used with ext2/ext3 only (not with ext4 yet), and I am not sure if it can be used with other FSes like XFS, ReiserFS, etc.. so this counts toward requirement no. (2) as well, I guess...

thanks a lot, and Happy New Year, Gentoo users!

Last edited by MarcusXP on Thu Jan 07, 2010 12:38 am; edited 2 times in total

----------

## Earthwings

Using ext3 on a fileserver at work. It is a combination of a hardware RAID controller with 12 devices, two RAID 10 volumes, LVM2 and ext3, and it stores critical data. So far (three years) there have been three or four unclean shutdowns because of power failures (the UPS battery is short-lived and Linux doesn't get its powerdown signal) and kernel panics due to XEN. To my knowledge no data was lost at any time. Ext2 is not an option in my opinion, as the needed file system checks take hours to complete, which is highly annoying. Ext4 has not been out in the wild long enough yet.

Note that we also do daily backups, which are a must for any critical data. As you described above, user errors can easily wipe out all data regardless of the underlying storage system.

For data recovery, I have used testdisk successfully on other systems.

 *Quote:*   

> BTW, why doesn't "mke2fs" ask a simple question, like "Are you sure you want to do this - Yes/No", with a quick summary of the partition the user is about to format/recreate (total size, used size, existing filesystem on it, etc.)? It would be a great guard against human stupidity (at least in my case, it most likely would've worked..)

 

It is used in scripts / setup tools and generally assumes the user knows what he is doing. Then again, an --interactive switch like the one cp/mv have probably wouldn't hurt.

----------

## jormartr

Ext2 is not an option, never, because it lacks journaling. Ext3 is ext2 plus journaling.

As for the mkfs command, you can pass it the -n parameter and it will just print what it would do. Commands do not usually prompt; everything is built on the assumption that you know what you are doing, and asking about everything would make it impossible to automate things.
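
That dry run can be tried safely against a scratch image file instead of a real disk. A hedged sketch (the paths are made up, and -F is only there because the target is a regular file rather than a block device):

```
truncate -s 64M /tmp/scratch.img         # empty 64 MB file standing in for a partition
mke2fs -n -F -t ext3 /tmp/scratch.img    # -n: print what would be done, write nothing
head -c 67108864 /dev/zero | cmp - /tmp/scratch.img && echo "image untouched"
```

The same -n flag would have shown the superblock/inode layout mke2fs was about to create, without destroying anything.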

There are other file systems, but the most tested is ext3.

----------

## gentoo_ram

I'm not sure why you are against XFS.  I've used it for years and have found it reliable.  It's specifically designed for large filesystems and is especially good at large files.

I did a test and found you were right about using mke2fs on an existing filesystem.  That utility will always go forward.  Interestingly mkfs.xfs will give you a warning if it detects a pre-existing XFS or ext* filesystem.  You must specifically pass the "-f" flag to override the check.

The xfsprogs package includes utilities like xfs_check, xfs_db (the debugger), and others that can help recover data.  But if you really need to use them, then you're in a world of hurt no matter which filesystem you're using.

Here's another interesting difference.  I believe ext* filesystems can be re-sized only while they are not mounted.  XFS filesystems are resized only while mounted.  So if you are using LVM, you can re-size XFS pretty easily.

The only real workload I wouldn't use XFS for is lots of small files, especially deleting them.  That's slow.  But large files are fast.

----------

## lagalopex

xfs broke for me after a power failure... I've been using ext3 for some years now, without serious problems.

I even added a drive to the raid6 and resized it, without any shutdown  ;)

 *man resize2fs wrote:*   

> The resize2fs program will resize ext2, ext3, or ext4 file systems. It can be used to enlarge or shrink an unmounted file system located on device. If the filesystem is mounted, it can be used to expand the size of the mounted filesystem, assuming the kernel supports on-line resizing. (As of this writing, the Linux 2.6 kernel supports on-line resize for filesystems mounted using ext3 only.)

 

----------

## drescherjm

xfs or ext4. At home and at work I am now in the process of migrating my 2 to 7TB software servers away from xfs to ext4. One reason is that xfs tends to be slow for operations that deal with thousands of small files. For me ext3 is not an option because it is very slow on operations with large files. Try deleting a 20 GB file on ext3 and you will see what I mean. I have seen this take minutes on ext3, while on xfs it takes a few seconds (if that).

An additional benefit to ext4 over xfs is that with ext4 you can shrink volumes.

[EDIT]Since you said performance is not important, I would say ext3 is your best choice.[/EDIT]

----------

## drescherjm

 *Quote:*   

> Also, some best-practices against disasters

 

Back up your data if it is essential that you do not lose it. No raid level is ever a substitute for backups. At work I double-backup everything on my raid servers with a dual-drive LTO2 archive and, at the moment, 80 LTO2 tapes. This is however way too expensive for a home user. 

Raid also does not protect against a total filesystem loss caused by catastrophic file system corruption, and of course it does not prevent 

```
rm -rf /mnt/my_really_large_raid
```

If you cannot afford a backup, one way to minimize the chances of total loss is to use LVM on top of the array and several smaller filesystems on top of that.

----------

## MarcusXP

 *drescherjm wrote:*   

> xfs or ext4. At home and at work I am now in the process of migrating my 2 to 7TB software servers away from xfs to ext4. One reason is that xfs tends to be slow for operations that deal with thousands of small files. For me ext3 is not an option because it is very slow on operations with large files. Try deleting a 20 GB file on ext3 and you will see what I mean. I have seen this take minutes on ext3, while on xfs it takes a few seconds (if that).
> 
> An additional benefit to ext4 over xfs is that with ext4 you can shrink volumes.
> 
> [EDIT]Since you said performance is not important, I would say ext3 is your best choice.[/EDIT]

 

I tend to go for ext3 from what I've seen so far. There is NO WAY I will use ext4 on a partition where I store valuable data.

I already had TWO filesystem failures (with ext4) when the system shut down improperly (power loss) during read/write operations.

Luckily I didn't have anything important there, so I didn't care too much. But it suggests that ext4 is not mature enough yet, if unclean shutdowns can cause these kinds of problems..

Plus, occasionally I noticed some file corruption on my / partition (which might or might not be related to the fact that / was using the ext4 filesystem..)

Also, there was a nasty bug in KDE4 that was causing file corruption during heavy read/write activity on the filesystem (many p2p users experienced it for sure) - there was a big fuss about it a while ago, and I'm not sure if it was fixed by the KDE team, by the FS/kernel developers, or on both sides. All this suggests that ext4 is not mature enough yet and should not be used on partitions where you store important data.

I might give btrfs a try for my / partition soon (probably on my laptop and/or desktop), but I will not try ext4 again, that's for sure.

For better protection, I was thinking of setting up a 2nd server (I already have the hardware) to act as an rsync mirror of the first one.. so that I have another copy at all times.

As a best practice against potential data loss, I think keeping a copy of the partition table using "partimage" is a good idea.

If I had that, I probably would've been able to restore my partition after I ran the "mke2fs" command.. does anyone know if this is true or not?

----------

## MarcusXP

 *gentoo_ram wrote:*   

> I'm not sure why you are against XFS.  I've used it for years and have found it reliable.  It's specifically designed for large filesystems and is especially good at large files.
> 
> I did a test and found you were right about using mke2fs on an existing filesystem.  That utility will always go forward.  Interestingly mkfs.xfs will give you a warning if it detects a pre-existing XFS or ext* filesystem.  You must specifically pass the "-f" flag to override the check.
> 
> The xfsprogs package includes utilities like xfs_check, xfs_db (the debugger), and others that can help recover data.  But if you really need to use them, then you're in a world of hurt no matter which filesystem you're using.
> ...

 

I'm not against XFS, I am open to suggestions.. but so far XFS has seemed less reliable than ext3. It is indeed good with large files, which makes it pretty appealing, as most of my files are large..

About the xfsprogs package.. why does mkfs.xfs give a warning if it detects an existing filesystem, but mke2fs doesn't? That sucks.... a damn warning saying a filesystem is already there would've protected me against my own stupidity  :(  Performing a critical action like creating a filesystem should throw a warning if one already exists.. it's COMMON SENSE!

I might also give LVM a try, especially because I'd like to learn to use it.  :)

----------

## MarcusXP

But I see that no one gives ReiserFS a chance here?

It should be pretty reliable (it is pretty old), with good/fair performance, as far as I've read.

And from what I've seen, there are tools for data recovery from ReiserFS as well..

----------

## LesCoke

I use JFS.  JFS seems to provide the best mix of being fast for small files while also being fast for large files.  However, I agree with the assertion that there are not enough tools that understand how to recover data from such file systems.  I have since learned about testdisk, but haven't played with it yet.

Last year I had two drives go down in one of my servers.  Both were on the same IDE cable; one drive died and went busy, which disrupted communication with the other drive, causing evms' bad blocks plug-in under my JFS partition to go haywire and corrupt the file-system.  I had backups, but I played with the tools I knew about to see how much data I could retrieve from the blown file-system.  I proved I could recognize certain types of files and retrieve them, given enough patience and a keen eye, using 'od -A x -t x1z', but fragmentation and files larger than the extent size foiled most attempts.  With a bit more study of the data structures within a JFS volume, I believe a recovery tool is possible.

After that incident, I tried a 3-way raid 1 mirror; I'm uncomfortable spanning a volume across multiple drives and figured 3-way redundancy of the data was a reasonable option.  Well,...  I quickly found it wasn't.  I managed to do something stupid that took down the kernel while the volume was being accessed.  Upon restart the volume came back up and proceeded with its task of rebuilding the degraded mirror.  With the 3-way mirror, I figure, only one of the drives had a good journal, and the other two, being stale, appeared to the raid software to be in a better state, so it began writing them back over the first.  The directory that was being written when the system went down disappeared into lost+found, with all files having names unrelated to the originals.  Fortunately backups got us back to the prior day's work.

I like the idea of raid 6, but haven't tried it, for the same reasons I'm uncomfortable spanning a volume across more than one drive.  I currently take hourly rsnapshots to a secondary drive and perform regular backups as before.  rsnapshot is nice in that I can use it to track hourly file changes for weeks, with only the extra space used by the files that have changed.
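
The hourly-rsnapshot setup mentioned above comes down to a few lines of rsnapshot.conf. A minimal sketch (the paths are examples, and fields must be separated by real TAB characters):

```
snapshot_root	/mnt/backupdrive/snapshots/
interval	hourly	24
interval	daily	7
backup	/mnt/my_really_large_raid/	localhost/
```

Then "rsnapshot hourly" goes in cron; unchanged files are hard-linked between snapshots, so each extra snapshot only costs the space of the files that actually changed.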

Lessons learned:

I dumped using the bad block plug-in in evms.

On most new drives I have also dumped evms completely, since it is no longer included in the latest Live System Recovery CD and doesn't appear to have been updated in the last two years (I still need an alternative to evms for managing luks volume mounting by label, to cope with device naming and order differing from one kernel to another).  That being said, the only problem I had with evms was the incident with the bad block layer going haywire.  Lack of universal support in live distros is my only reason to look for an alternative.

Do not use drives on the same IDE cable as a redundancy measure for the same volume.  That probably rings true for SATA port expanders as well.

backup, backup, backup,... Raid is not a backup.

Les

Last edited by LesCoke on Wed Jan 06, 2010 1:46 am; edited 1 time in total

----------

## drescherjm

 *Quote:*   

> As best practice against potential data loss, I think copy of the partition table using "partimage" is a good idea.
> 
> If I had that, probably I would've been able to restore my partition after I ran the "mk2fs" command.. anyone knows if this is true or not?

 

You use sfdisk to back up and restore the partition table. Although I have made partition table backups for a couple of dozen machines, I do not recall ever needing to restore one. However, since it's a very small file (less than 512 bytes), it's not like I will worry about the space.

 *Quote:*   

> But I see that no one gives ReiserFS a chance here? 

 

A few years ago I migrated my systems (home and work) away from it to xfs because of poor performance. 

 *Quote:*   

> There is NO WAY I will use ext4 on a partition where I store valuable data.
> 
> I already had TWO filesystem failures (with ext4) when the system shut down improperly (power loss) during read/write operation.
> 
> Luckily I didn't have anything important there, so I didn't care too much. But it suggests that ext4 is not mature enough yet, if unclean shutdowns can cause these kinds of problems.. 

 

I have only once experienced a problem with ext4, and that was with a 2.6.26 kernel sometime in 2008, under a very specific situation: I was enlarging the filesystem online while running a vm on the filesystem that was being enlarged. This caused a kernel bugcheck.

Now I have over 2TB of data on ext4 at home and 7TB or so on ext4 at work. The work data all gets backed up with the LTO2 archive but the home data only gets part of the data backed up.

----------

## MarcusXP

 *drescherjm wrote:*   

> 
> 
> I have only once experienced a problem with ext4, and that was with a 2.6.26 kernel sometime in 2008, under a very specific situation: I was enlarging the filesystem online while running a vm on the filesystem that was being enlarged. This caused a kernel bugcheck.
> 
> Now I have over 2TB of data on ext4 at home and 7TB or so on ext4 at work. The work data all gets backed up with the LTO2 archive but the home data only gets part of the data backed up.

 

Given my bad luck with ext4 so far, combined with my stupidity (formatting the partition by mistake), I'd like to keep my data on something that is a bit more mature and has a better chance of recovery in case of failure/human error.

There are only a few commercial tools to recover data from ext4; I've only tried R-Studio so far. It seems to work pretty well, but after it scanned everything (which took like 2 days), when I try to see the filesystem structure (loading the file list also takes about 10-15 minutes), it crashes, probably due to the very large number of files... so they might improve that in a later version. However, if I cancel loading the files at an earlier point, I am able to see some of the files, and I was even able to recover a big file (about 1GB), and its integrity seems to be ok!

I'll just keep the drives in a safe place for the day when a good tool arrives to repair the partition, or to recover the data if the partition cannot be repaired (maybe a newer version of testdisk will be able to repair the partition table?).

So (for me) the winner is EXT3 for now. If I change my mind, it will probably be XFS, due to the fact that it works well with large files and large raid arrays..

----------

## drescherjm

 *Quote:*   

> Given my bad luck with ext4 so far, combined with my stupidity (formatting the partition by mistake), I'd like to keep my data on something that is a bit more mature and has a better chance of recovery in case of failure/human error. 

 

Understood.

 *Quote:*   

> So (for me) the winner is EXT3 for now.

 

This is the best choice for reliability.

----------

