# Some ext3 Filesystem Tips

## codergeek42

Copyright (c) 2005 Peter Gordon

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license can be found here.

Overview

I'm a big fan of the Third Extended ("ext3") filesystem. Its in-kernel and userspace code has been tried, tested, fixed, and improved upon more than that of almost any other Linux-compatible filesystem. It's simple, robust, and extensible. In this article I intend to explain some tips that can improve both the performance and the reliability of the filesystem.

In this document, /dev/hdXY will be used as a generic partition. You should replace this with the actual device node for your partition, such as /dev/hdb1 for the first partition of the primary slave disk or /dev/sda2 for the second partition of your first SCSI or Serial ATA disk.

I: Using The tune2fs and e2fsck Utilities

Before we begin, we need to make sure you are comfortable with using the tune2fs utility to alter the filesystem options of an ext2 or ext3 partition. Please make sure to read the tune2fs man page:

```
$ man tune2fs 
```

It's generally a good idea to run a filesystem check using the e2fsck utility after you've completed the alterations you wish to make on your filesystem. This will verify that your filesystem is clean and fix it if needed. You should also read the manual page for the e2fsck utility if you have not yet done so:

```
$ man e2fsck 
```

 :Exclamation:  WARNING: Make sure any filesystems are cleanly unmounted before altering them with the tune2fs or e2fsck utilities! (Boot from a LiveCD such as Knoppix if you need to.) Altering or tuning a filesystem while it is mounted can cause severe corruption! You have been warned!

II: Using Directory Indexing

This feature improves file access in large directories or directories containing many files by using hashed b-trees to store the directory information. It's perfectly safe to use, and it provides a fairly substantial improvement in most cases; so it's a good idea to enable it:

```
# tune2fs -O dir_index /dev/hdXY
```

This will only take effect with directories created on that filesystem after tune2fs is run. In order to apply this to currently existing directories, we must run the e2fsck utility to optimize and reindex the directories on the filesystem:

```
# e2fsck -D /dev/hdXY

```

 :Idea:  Note: This should work with both ext2 and ext3 filesystems. Depending on the size of your filesystem, this could take a long time. Perhaps you should go get some coffee  :Wink: 

III: Enable Full Journaling

By default, ext3 partitions mount with the 'ordered' data mode. In this mode, all data is written to the main filesystem and its metadata is committed to the journal, whose blocks are logically grouped into transactions to decrease disk I/O. This tends to be a good default for most people. However, I've found a method that increases both reliability and performance (in some situations): journaling everything, including the file data itself (known as 'journal' data mode). Normally, one would think that journaling all data would decrease performance, because the data is written to disk twice: once to the journal then later committed to the main filesystem, but this does not seem to be the case. I've enabled it on all nine of my partitions and have only seen a minor performance loss in deleting large files. In fact, doing this can actually improve performance on a filesystem where much reading and writing is to be done simultaneously. See this article written by Daniel Robbins on IBM's website for more information: 

http://www-106.ibm.com/developerworks/linux/library/l-fs8.html#4

In fact, putting /usr/portage on its own ext3 partition with journal data mode seems to have decreased the time it takes to run `emerge --sync` significantly. I've also seen slight improvements in compile time.

There are two different ways to activate journal data mode. The first is by adding data=journal as a mount option in /etc/fstab. If you do it this way and want your root filesystem to also use it, you should also pass rootflags=data=journal as a kernel parameter in your bootloader's configuration. In the second method, you will use tune2fs to modify the default mount options in the filesystem's superblock:

```
# tune2fs -O has_journal -o journal_data /dev/hdXY
```
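For reference, the first method's /etc/fstab entry might look like the hypothetical line below (the device, mount point, and other options are placeholders you would adapt):

```
# <device>   <mount point>  <type>  <options>              <dump> <pass>
/dev/hdXY    /home          ext3    noatime,data=journal   0      2
```

For the root filesystem, the rootflags=data=journal parameter would additionally be appended to the kernel line in your bootloader's configuration, as described above.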

Please note that the second method may not work for older kernels; in particular, Linux 2.4.20 and below will likely disregard the default mount options in the superblock. If you're feeling adventurous, you may also want to tweak the journal size. (I've left the journal size at the default.) A larger journal may give you better performance (at the cost of more disk space and longer recovery times). Please be sure to read the relevant section of the tune2fs manual before doing so:

```
# tune2fs -J size=$SIZE /dev/hdXY
```

IV: Disable Lengthy Boot-Time Checks

 :Exclamation:  WARNING: Only do this on a journalling filesystem such as ext3. This may or may not work on other journalling filesystems such as ReiserFS or XFS, but has not been tested. Doing so may damage or otherwise corrupt other filesystems. You do this AT YOUR OWN RISK. 

Hmm... It seems that our ext3 filesystems are still being checked every 30 mounts or so. This is a good default for many people because it helps prevent filesystem corruption when you have hardware issues, such as bad IDE/SATA/SCSI cabling, power supply failures, etc. One of the driving forces for creating journalling filesystems was that the filesystem could easily be returned to a consistent state by recovering and replaying the needed journalled transactions. Therefore, we can safely disable these mount-count- and time-dependent checks if we are certain the filesystem will be quickly checked to recover the journal if needed to restore filesystem and data consistency. Before you do this, please make sure your filesystem entry in /etc/fstab has a positive integer in its 6th field (pass) so that it is checked at boot time automatically. You may then disable the periodic checks using the following command:

```
# tune2fs -c 0 -i 0 /dev/hdXY
```
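As mentioned above, the sixth (pass) field of the /etc/fstab entry controls the boot-time check; a hypothetical entry with that field set looks like this (1 is conventionally used for the root filesystem, 2 for others):

```
# <device>   <mount point>  <type>  <options>  <dump> <pass>
/dev/hdXY    /              ext3    noatime    0      1
```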

V: Checking The Filesystem Options Using tune2fs

Well, now that we've tweaked our filesystem, we want to make sure those tweaks are applied, right?  :Smile:  Conveniently, we can check these options using the tune2fs utility quite easily. To list all the contents of the filesystem's superblock, we can pass the "-l" (lowercase "L") option to tune2fs:

```
# tune2fs -l /dev/hdXY
```

Unlike the other tune2fs calls, this can be run on a mounted filesystem without harm, since it doesn't access or attempt to change the filesystem at such a low level.

This will give you a lot of information about the filesystem, including the block/inode information, as well as the filesystem features and default mount options, which we are looking for. If all goes well, the relevant part of the output should include "dir_index" and "has_journal" flags in the Filesystem features listing, and should show a default mount option of "journal_data".
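As a quick sanity check, the two relevant lines can be filtered out of the rather long "-l" output. The sample below is illustrative (a real feature list will contain more flags); on a real system you would pipe tune2fs -l /dev/hdXY into grep instead of echoing a sample:

```shell
# Illustrative sample of the two relevant lines from `tune2fs -l`:
sample='Filesystem features:      has_journal dir_index filetype sparse_super
Default mount options:    journal_data'

# On a real system:  tune2fs -l /dev/hdXY | grep -E 'features|mount options'
echo "$sample" | grep -q dir_index    && echo "dir_index enabled"
echo "$sample" | grep -q has_journal  && echo "journal present"
echo "$sample" | grep -q journal_data && echo "data journaling is the default"
```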

This concludes this filesystem tweaking guide for now. Happy hacking!  :Very Happy: 

----------

## i92guboj

Nice thing!

Just one thing: wouldn't it be:

```

tune2fs -O dir_index,has_journal /dev/hdXY

tune2fs -o journal_data /dev/hdXY

```

in place of:

```

tune2fs -O dir_index /dev/hdXY

tune2fs -o has_journal,journal_data /dev/hdXY

```

 :Question: 

Edit for clarification: notice that the 'has_journal' parameter is valid for the '-O' (upper-case 'O') option, not for '-o' (lower-case 'o').

----------

## codergeek42

Gah! I can't believe I didn't see that.  :Embarassed:   Thanks 6thpink...

----------

## jetsaredim

is there a way to list the options that have been set for a particular ext3 fs?

I'm not sure if things are working correctly...  I formatted the FS with mke2fs -j /dev/vg/ext3, but when I went to mount it, it seemed to be using the ext2 kernel module...

----------

## saffy

Thanks!

You are absolutely right about ext3 and full journal mode. I have several systems running a combination of XFS and ReiserFS, and my main workstation has used both for various periods. Recently I backed up my workstation and switched to ext3 with full journal, and I have to report that the performance is nothing short of amazing! I also don't 'notice' disk access as much as I used to with XFS or ReiserFS; ext3 appears to be much smoother, with disk access not interrupting normal workstation activity during heavy usage.

I would also recommend adding orlov and commit=9999 to your mount options.

As you can tell I am now a big fan of ext3 full journal mode  :Cool: 

----------

## i92guboj

 *codergeek42 wrote:*   

> Gah! I can't believe I didn't see that.   Thanks 6thpink...

 

Thanks to u, I just got an error and looked into the man page  :Laughing:  Now my super-ext3-home reads smooth. The rest of my fs's are reiserfs; I think I'm gonna try ext3, since I really prefer a hard-stable system, and ext3 is for me the best fs in the whole world when it comes to reliability.  :Cool: 

----------

## jetsaredim

So, if I were looking for the best performance, I would run the commands in the original post?

I'm trying to do some performance testing for some research I'm working on...  It's not easy trying to find good recommendations of performance tuning options for any of the filesystems (ext3, reiserfs, etc...)

----------

## i92guboj

 *jetsaredim wrote:*   

> So, if I were looking for the best performance, I would run the commands in the original post?
> 
> I'm trying to do some performance testing for some research I'm working on...  Its not easy trying to find good recommendations of performance tuning options for any of the filesystems (ext3, reiserfs, etc...)

 

Yes, codergeek corrected it.

----------

## codergeek42

 *jetsaredim wrote:*   

> is there a way to list the options for a particular ext3 fs options that have been set?

 `tune2fs -l /dev/hdXY` will list the current contents of the filesystem's superblock.

----------

## acasto

 *codergeek42 wrote:*   

> 
> 
> There are two different ways to activate journal data mode. The first is by adding journal=data as a mount option in /etc/fstab. If you do it this way and want your root filesystem to also use it, you should also pass rootfsflags=data=journal as a kernel parameter in your bootloader's configuration. 

 

Shouldn't that be data=journal for the option to /etc/fstab? Or does the order not matter?

- Adam

----------

## codergeek42

 *acasto wrote:*   

> 
> 
> Shouldn't that be data=journal for the option to /etc/fstab? Or does the order not matter?
> 
> 

 Your kernel has to mount your root filesystem before it can read the /etc/fstab file that it contains, so only putting it in there would affect everything _except_ your root partition  :Wink:  . It's kind of like the chicken-and-egg problem...

----------

## acasto

 *codergeek42 wrote:*   

> Your kernel has to mount your root filesystem before it can read the /etc/fstab file that it contains, so only putting it in there would affect everything _except_ your root partition  . It's kind of like the chicken-and-egg problem...

 

I thought that when it boots up and initially mounts the root filesystem as read-only, it then looks at /etc/fstab and applies it upon remount to read-write. I just ask because I've been using data=journal in my /etc/fstab on the root filesystem for about the last year, and it shows up as being mounted properly with that option.

- Adam

----------

## codergeek42

Hm... perhaps since Gentoo's initscripts do remount the root filesystem, that does apply. Good point, Adam. Thanks.  :Smile: 

----------

## OneOfMany

I think the above post was pointing out that the above instructions shouldn't be "journal=data", but "data=journal" (for "/etc/fstab").

----------

## codergeek42

 *OneOfMany wrote:*   

> I think the above post was pointing out that the above instructions shouldn't be "journal=data", but "data=journal" (for "/etc/fstab").

  :Laughing:  I'm stupid. I didn't even notice that.  :Embarassed:  Thanks OneOfMany

----------

## barrct

I _should_ be running all of my drives and partitions in ext3, but I seem to end up with corruptions regularly. Is there a way to view the journal, or see if it is functioning?

----------

## codergeek42

 *barrct wrote:*   

> I _should_ be running all of my drives and partitions in ext3, but I seem to end up with corruptions regularly. Is there a way to view the journal, or see if it is functioning?

 Malfunctioning? That's definitely something I don't expect ext3 to do. Perhaps your hardware is going dead? Do other filesystems (such as ReiserFS, XFS, or *shudder* FAT32) work ok on the same disk(s)?

----------

## irondog

Is there any way to do defragmentation on ext3? My experience is that heavily used ext3 partitions become slower and slower while the number of files in the filesystem and the used disk space don't significantly change.

----------

## codergeek42

ext3 automagically handles defragmentation as you use it. The best thing I can think of is to try re-optimizing the filesystem structure by running the following:

```
# e2fsck -D /dev/hdXY
```

(/dev/hdXY should be unmounted as explained in the first post in order to avoid filesystem corruption.)

----------

## barrct

 *codergeek42 wrote:*   

>  *barrct wrote:*   I _should_ be running all of my drives and partitions in ext3, but I seem to end up with corruptions regularly. Is there a way to view the journal, or see if it is functioning? Malfunctioning? That's definitely something I don't expect ext3 to do. Perhaps your hardware is going dead? Do other filesystems (such as ReiserFS, XFS, or *shudder* FAT32) work ok on the same disk(s)?

 

The hardware has seemed to be fine. I haven't used it much, well at all really, in anything other than our main server, since it requires an HVD SCSI connection, which isn't exactly a common thing.

As for the hardware going bad, well, it would have to be happening on every drive, and every drive is mirrored to another array. The setup is like this.

```
Server
 |--- Array 1: A B C D D D D
 |--- Array 2: A B C D D D D
```

Each array is on a different controller with different cables, and the drives are mirrored across the two arrays; PLUS, the four D drives are in a RAID 5 that is mirrored.

So if there is something going bad, I think that if ext3 can't catch it, then the RAID5 should and if it's the card or cable, the mirror should.

right?

So is there any way to test the journal? To view it? To view its size?

----------

## codergeek42

That I'm not sure of, sorry  :Confused: 

----------

## barrct

Yea, it's got me fairly perplexed as well. Of course, I was making sure that my backups were running when I saw this.

```
Apr 12 03:02:25 William kjournald starting. Commit interval 5 seconds

Apr 12 03:02:25 William EXT3 FS 2.4-0.9.19, 19 August 2002 on md(9,3), internal journal

Apr 12 03:02:25 William EXT3-fs: mounted filesystem with ordered data mode.

Apr 12 03:09:53 William kjournald starting. Commit interval 5 seconds

Apr 12 03:09:53 William EXT3-fs: mounted filesystem with ordered data mode.
```

So that leads me to think that the journal is running at least.

----------

## c0bblers

Hi,

After having used XFS for a LONG time I decided to switch totally to ext3 after playing with ubuntu on ext3.  Here's what I noticed....disk head movement (or what sounds like it anyway).  With XFS the disk is thrashed to within an inch of its life at times compared to ext3; this probably isn't as noticeable with some disks, but my Barracudas are whisper quiet when the head isn't moving.  It's particularly noticeable when scrolling through a list of emails in a folder in Evolution, which for some reason causes a LOT of HD activity on XFS....ext3 is whisper quiet by comparison.  I've noticed no slowdown of any note, even with the full data journalling codergeek suggests.  In fact, some apps seem to start up quicker...though that may be a placebo effect.  All in all I'm very happy with ext3 so far, plus I have the warm fuzzy feeling of full data journalling keeping my precious data together.  :Wink:   I'm officially a convert.

Cheers,

James

----------

## codergeek42

 *c0bblers wrote:*   

>  I'm officially a convert.

 Excellent.  :Cool:   :Very Happy: 

----------

## Crazor

 *saffy wrote:*   

> I would also recommend adding orlov and commit=9999 to you mount options.

 

when does one benefit from using orlov?

and what exactly does commit=9999 mean?

----------

## codergeek42

You do not need to use orlov as a mount option, since, according to the mount(8) man page, it is the default if neither oldalloc nor orlov is specified. This option would tell ext2/ext3 whether to use the old inode allocator or the new Orlov inode allocator.

I also highly recommend against using commit=9999. This mount option specifies how often (in seconds) the data is synced to disk. Setting it too high may cause excessive usage of memory and possibly CPU/swap resources. This really is not needed and (from my experience) will not give you a large performance increase at all.
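For illustration only, if you did want to raise the commit interval slightly from its 5-second default, a modest value in a hypothetical fstab entry (device and mount point are placeholders) would look like this:

```
/dev/hdXY    /data    ext3    data=journal,commit=15    0 2
```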

Edit: Disabled smiley.

----------

## i92guboj

 *codergeek42 wrote:*   

> You do not need to use orlov as a mount option, since, according to the mount(8) man page, it is the default if neither oldalloc nor orlov is specified. This option would tell ext2/ext3 whether to use the old inode allocator or the new Orlov inode allocator.
> 
> I also highly recommend against using commit=9999. This mount option specifies how often (in seconds) the data is synced to disk. Setting it too high may cause excessive usage of memory and possibly CPU/swap resources. This really is not needed and (from my experience) will not give you a large performance increase at all.

 I agree, and I would also add that such a high commit interval raises the probability of data loss, and we don't want that (if we wanted that, we would be using reiser4  :Twisted Evil:  )

----------

## Boris27

The doc by drobbins says it's rootflags=data=journal instead of rootfsflags. I was kinda wondering why it wasn't working  :Wink: 

PS: I'm on dir_index now  :Smile:  Nice stuff

----------

## codergeek42

Oops.  :Embarassed:  Thanks Boris27.  :Cool: 

----------

## monkey89

After each of the filesystems I'm mounting is listed in the boot scripts, Gentoo always says (check at next mount).  Ubuntu isn't doing this, and it's dual-booting with the same partitions.  Is this fixable?

I've done everything as said in the howto.  Otherwise, thanks for the tips, hopefully it will speed things up.

Edit: Weird, but even though ubuntu and gentoo both have e2fsprogs 1.35, upgrading gentoo's to 1.37 seems to hide the warning.

-Monkey

----------

## warrior

Hi, guys.

Has anybody tried to compare tuned ext3fs performance with reiserfs 3.6?

I'm not sure that the namesys tests reflect reality...

----------

## wing

 *warrior wrote:*   

> Hi, guys.
> 
> Has anybody tried to compare tuned ext3fs performance with reiserfs 3.6?
> 
> I'm not sure that the namesys tests reflect reality...

 

I know this isn't what you asked, so you can ignore my post. I'm just going to recap my experiences with two systems and reiser3, reiser4, and ext3.

I'm reinstalling my system with ext3 from a previous install with reiser4. I've enabled ext3's b-trees and I honestly cannot notice the difference. emerge sync goes as fast as it ever did, as do my tar extractions and whatnot. Also, it is nice to use a FS that you know you have a good chance of getting your data back from in a worst-case scenario. I'm using a SATA setup here though; that might affect things for either better or worse. 

So I'll tell you my experience in the past. I've had horrible experiences with reiser3 on my x86 ATA system: I'd install, and literally three to four boots down the line my fs would be hosed. Their fsck tools are no guarantee of safety either; they barely helped at all, and at best they just chucked EVERYTHING into /lost+found. ext3 on the same system performed pretty much on the same level.

Honestly, I love that feeling you can get from being on the bleeding edge, but on a system like gentoo it's good to know you're planning for the long term. Even though reiser3 has been around for a while, I think I like the contentment of stability better than the excitement of the bleeding edge (though I can't help but be ~amd64  :Smile: ).

edit: missing words yay!

and thank you vvv  :Smile: 

Last edited by wing on Thu Apr 28, 2005 8:26 am; edited 2 times in total

----------

## warrior

 *wing wrote:*   

> 
> 
> I know this isn't what you asked, you can ignore my post. I'm just going to recap my experiences with two systems and reiser3, reiser4 and ext3.
> 
> 

 

It's really a good post.

I'm trying to decide for myself what I want.

I don't have a SATA drive; I'm using a notebook. Originally it had a 4200rpm/8192KB-cache drive. It's a slow one, much slower than SATA, you know... I've found another drive, 5400rpm/16384KB, just to increase performance. I also had a bad experience with reiserfs, but that was a few years in the past... Maybe current versions are more stable. And it would be a vexing situation if I upgraded my hardware but used a slow fs.

I understand it's not really a question, just speaking out...

Maybe somebody has been in the same situation and can tell his experience...

----------

## ruben

I have an iBook with a 30GB 4200rpm/2MB drive, which is pretty slow. I had it running with the /-partition on Reiserfs and the /home-partition on ext3, but over time I just got the impression that things were getting slower and slower on my laptop, mainly starting up applications. From gdm to a fully working gnome desktop took 45 seconds, while the cpu didn't seem busy at all during those 45 secs. Then I read this thread and also about the optimisation to use B-trees for the directories, and I decided to back up all my data, reformat and repartition the harddrive, and copy everything back.

I already suspected that my /-partition must have suffered a lot from fragmentation... the /-partition has always been reasonably full, and I also think that a gentoo system is especially 'hard' on the filesystem. I run "~ppc" on the iBook, and there is frequent installing of new programs and removing of old programs, plus a lot of creating of temporary files during compiles; I think all those operations cause fragmentation on your partition.

So I made 3 partitions: one for /home, one for /, and one partition that is used to store all ebuilds and as compile space for portage. I want my data in /home to be as secure as possible, so I used ext3 for that. For / I just took plain old ext2 with the B-tree optimisation stuff. And for the third partition, I also took ext2 and I made sure that the block size was small, since this partition will contain a lot of tiny files. I took ext2 for those partitions since it is faster than ext3, and I don't care that much if I'd lose something from those partitions (besides, all data on the iBook is also put on backup).

Anyways, the important thing here is that fragmentation can deteriorate your performance, and a good way to 'defragment' is to back up the whole partition (make a tar.gz), reformat, and copy your data back: all data which probably belongs together (in the same directory) will be copied right next to each other on the harddisk, and this can speed up application startup. My gdm-to-fully-working-gnome-desktop time went from 45 seconds (on reiserfs) to 20 seconds (on ext2). You can read in several places that ext2/3 does not need defragmenting... well... I'm still suspicious about that (a lot of emerging/unmerging on a nearly full partition just can't be good for the data layout on your harddisk), but I hope it'll do better than reiserfs in the long term. In any case, I also don't have the feeling that this ext2/3 is slower than reiserfs (like it was initially).
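The backup-reformat-restore cycle described above can be sketched as follows; the paths are hypothetical, and the destructive mkfs step is only indicated in a comment so the sketch stays safe to run:

```shell
# Sketch of the 'defragment by rebuild' procedure (hypothetical paths).
src=$(mktemp -d)   # stands in for the old, fragmented partition
dst=$(mktemp -d)   # stands in for the freshly formatted partition
echo "precious data" > "$src/file.txt"

# 1. Back up the partition's contents into a tarball
tar czf /tmp/partition-backup.tar.gz -C "$src" .

# 2. On real hardware you would now do something like:
#      umount /dev/hdXY && mkfs.ext3 /dev/hdXY && mount /dev/hdXY

# 3. Restore; related files land next to each other on the fresh fs
tar xzf /tmp/partition-backup.tar.gz -C "$dst"
cat "$dst/file.txt"
```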

----------

## acdispatcher

codergeek42-

Thanks for the tip.  I went from reiserfs to ext3 with no problems.  No difference in speed either.

I ended up creating a new partition, "rsync"ing everything over to the new ext3 partition, modifying fstab and stuff, rebooting, and zoooom!!!

----------

## codergeek42

 *ruben wrote:*   

> You can read in several places that ext2/3 does not need defragmenting... well... i'm still suspicious about that (a lot of emerging/unmerging on a nearly full partition just can't be good for the data layout on your harddisk),

 You don't need to defragment ext2/ext3 because, as you use the filesystem, file blocks and inodes are moved around and reallocated to keep the data nearly contiguous. It's not perfect, but it works fairly well, and you should almost never see a performance degradation caused by the filesystem's fragmentation.

----------

## hbp4c

 *codergeek42 wrote:*   

>  *ruben wrote:*   You can read in several places that ext2/3 does not need defragmenting... well... i'm still suspicious about that (a lot of emerging/unmerging on a nearly full partition just can't be good for the data layout on your harddisk), You don't need to defragment ext2/ext3 because, as you use the filesystem, file blocks and inodes are moved around and reallocated to keep the data nearly contiguous. It's not perfect, but it works fairly well, and you should almost never see a performance degradation caused by the filesystem's fragmentation.

 

If the filesystem gets very full (let's say, <5% free space available), fragmentation goes to hell, as one would expect.  I fsck'ed a 99% full ext3 disk yesterday and had all kinds of fragmented files and problems.

----------

## codergeek42

 *hbp4c wrote:*   

> If the filesystem gets very full (let's say, <5% free space available), fragmentation goes to hell, as one would expect.  I fsck'ed a 99% full ext3 disk yesterday and had all kinds of fragmented files and problems.

 That's a good point (and quite true, I might add). This is also why I recommend creating filesystems slightly larger than the size you expect to need. Thanks for bringing this to my attention.  :Smile: 

----------

## prymitive

What about space usage? reiser 3.6 is very good at this, and reiser4 is even a little better (well, it packs data more densely on disk, but for now it reserves 5% of the space for the safety of commits, so actually it will probably waste more space than it saves  :Wink: ; I hope they will change that 5% to twice the RAM size or whatever). I remember that I tested ext3 vs reiser3.6/4 vs xfs, and using default settings ext3 had the lowest amount of free space (I used mostly small files like kernel sources and mp3's), xfs was a little better, and the best one was reiser3.6 (as I said, reiser4 had a slightly lower amount of used space, but the formatted partition size was smaller due to that 5% it reserves). If I remember correctly the values were:

r3.6 - about 2GB

xfs - about 2.5GB

ext3 - about 2.6GB

this was on 10GB partition, all using default settings.

btw, reiser4 is really fast for me and it made my hard drive much quieter; unfortunately its performance degrades with time and you need to use the repacker (which is not ready yet) to restore it.

----------

## ndbruin

 :Exclamation:  WARNING: Make sure any filesystems are unmounted before altering them with the tune2fs or e2fsck utilities! (Boot from a LiveCD if you need to.) Altering or tuning a filesystem while it is mounted can cause severe corruption! You have been warned!

I just tried this out and you will get corruption, so be warned! (had to see it happen  :Wink:  )

Furthermore I am now also convinced of using ext3 instead of reiserfs, so hope to see some more performance options  :Very Happy: 

Thanks for these tweaks

----------

## codergeek42

@prymitive: You can adjust your block size and inode count when you initially create the filesystem. Setting it to use 1KB block sizes and 1 inode per 1KB should give you the most efficient space usage:

```
# /sbin/mkfs.ext3 -b 1024 -i 1024 /dev/hXY
```

Be warned, though, that decreasing the block size and increasing the inode allocation in this manner can cause a significant performance decrease if the filesystem is to store larger files as well (since it has to do more journalling, I/O, and resource allocation).

@ndbruin: I warned you about that, silly!  :Razz:  Glad to have another Ext3 fan though.  :Smile: 

----------

## prymitive

 *codergeek42 wrote:*   

> @prymitive: You can adjust your block size and inode count when you initially create the filesystem. Setting it to use 1KB block sizes and 1 inode per 1KB should give you the most efficient space usage:
> 
> ```
> # /sbin/mkfs.ext3 -b 1024 -i 1024 /dev/hXY
> ```
> ...

 

I know that you can change the block size, but as you said it costs us speed. reiserfs has 4KB blocks, but it can pack several files into one cluster so they are kept inside the directory tree; ntfs can also do this, but only for files that are 650B or smaller. So under reiserfs big files are read at full speed while small files use only the space they need, without a low block size (at least in theory).

----------

## hbp4c

 *prymitive wrote:*   

> 
> 
> I know that you can change the block size, but as you said it costs us speed. reiserfs has 4KB blocks, but it can pack several files into one cluster so they are kept inside the directory tree; ntfs can also do this, but only for files that are 650B or smaller. So under reiserfs big files are read at full speed while small files use only the space they need, without a low block size (at least in theory).

 

A lot of people who use reiserfs use the notail option, which doesn't pack the small files into a shared block of data.  This is known to increase performance noticeably, at the cost of space.

In order to compare oranges with oranges, you should compare sizes of files on a reiserfs filesystem with notail option, without notail option, and compare that to ext3.  If you reference that against speed on all three filesystems, you'd be dangerously close to getting valid and useful results.

----------

## regeya

Wow, y'all just discovered dir_index?  I had suggested it a few times before I went Ubuntu, and the response I usually got was "um, why don't you just use reiser?"

Um, because I like my data and loathe restoring from backups?  :Wink: 

----------

## syrrus

ext3 supports POSIX ACLs, which are still fairly underused in the Linux world. Check out these links:

http://gentoo-wiki.com/HOWTO_Use_filesystem_ACLs

http://www.suse.de/~agruen/acl/linux-acls/online/

http://www.suse.de/~agruen/ea-acl-copy/

http://security.linux.com/article.pl?sid=04/07/28/1746258&tid=23&tid=35

Also this series is a good compilation of information.

http://www-106.ibm.com/developerworks/library/l-fs.html

http://www.linuxplanet.com/linuxplanet/reports/4136/2/

----------

## jdgill0

The ext3 filesystem can make files immutable, such that not even root can delete a file while its immutable bit is set.  I have not run across any mention of this in the forums myself.  It seems this could be a nice security feature for such things as config files.

Does anyone here have an opinion on immutable files?  I posted this here, because it's my understanding that only ext2,3 have immutable file support.

----------

## syrrus

No, ReiserFS has support as well.

----------

## jdgill0

syrrus,

What command do you use to set the immutable attribute under reiserfs?  The man pages for chattr and lsattr indicate they only function with ext2/3.

----------

## syrrus

```
hera etc # grep /dev/hdc3 /etc/fstab

/dev/hdc3               /               reiserfs                noatime,notail,acl                              0 0

hera etc # chattr +i /etc/shadow

hera etc # lsattr /etc/shadow

----i-------- /etc/shadow

hera etc #
```

----------

## syrrus

Another little tidbit that needs to be mentioned when talking about these immutable bits is the following.

As root, I can simply chattr -i /etc/shadow anytime I want, and it'll be like it never happened. However, with seclvl (a Linux implementation of BSD secure levels) the behavior mimics that of BSD. So when using these attributes, remember to echo "2" > /sys/seclvl/seclvl if you have this support built into your kernel.

I know the hardened kernel series supports this, and it is always a great idea to use in any secure server implementation.

----------

## jdgill0

syrrus,

It seems I am misunderstanding a few things about immutable.  Adding acl to my fstab for my /home partition does let root chattr +i somefile, but only root can do this.  Also, I was under the impression with ext2/3 that once the immutable bit was set, the file could not be deleted, i.e. you could not rm somefile until the immutable bit was unset.  I was able to remove a file with the immutable bit set on my reiserfs filesystem.  I thought that was the point of immutable: to not be able to remove the file with that bit set -- only reading or appending the file was allowed. Lastly, I did not think immutable with ext2/3 was an ACL thing; I thought it was built into the ext2/3 filesystem?

[EDIT]

I have done some more playing with chattr and lsattr. Under the reiserfs filesystem it "appears" I can set the various bits; however, it also appears they hold no meaning, as I can still do whatever I want to the files as root or as a user, regardless of what bits have been set. Unfortunately I do not have an ext2/3 filesystem to play with.

----------

## syrrus

Ok, ACLs and immutable are completely different. ACL means access control list; that's just a major enhancement to rwx.

The filesystem flags maintained by chattr are just like the BSD ones that interact with the secure level. If you chattr +i, the file is immutable, but a simple chattr -i can make that entire concept null. Using the BSD secure level implementation for Linux actually enforces the rules you set on the file.

----------

## jdgill0

syrrus,

See my [EDIT] that I added to my last post -- guess I was too slow  :Sad:  ... anyways, as I said in the edit, it seems to me the chattr bits do nothing under reiserfs from what I can see. Setting chattr +i somefile or even chattr +u somefile (for undeletable) does not change what I can or cannot do to the files, although lsattr somefile clearly shows the bits set.

[EDIT]

I was able to set up an ext3 partition.  Under ext3 the immutable bit works as I was expecting it to -- i.e. the file cannot be deleted until the immutable bit is unset. Which brings me back to my original point --> as an example, setting the immutable bit on config files that you have modified would help keep you from overwriting them by accident with an etc-update.  However, I am still not sure why non-root users are not allowed to use chattr, as it seems it could be useful to normal users for keeping important files of their own from being deleted.  All the useful things I have seen with ext3, and the compatibility of programs that modify/recover ext3, are certainly moving me a lot closer to switching from reiserfs to ext3.

[EDIT 2]

(I add this bit just to clear up the confusion on chattr/lsattr with reiserfs)

You can use the BSD levels (i.e. chattr and lsattr) with reiserfs.  To do so, you must use the attrs mount option.
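For example, a hypothetical /etc/fstab line enabling it (the device node and the other options are only placeholders, not my actual setup):

```
/dev/hdc3   /   reiserfs   noatime,notail,attrs   0 0
```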

I originally posted about chattr/lsattr for ext3 thinking it might be of interest to those who are looking to use ext3, but I wasn't sure if it fit in with codergeek's ext3 howto in this thread.

----------

## syrrus

Most likely the bits are not obeyed until the BSD secure level is elevated. I will experiment on one of my newer installs and see exactly what's happening.

[This thread isn't exactly the best place to continue this conversation. Feel free to PM me so we can do some more research.]

----------

## tcostigl

I assume this would also apply to a software raid partition(md raid) on /dev/md* formatted with ext3.

----------

## Juzna

I just wonder if I can change to full journal without any downtime on my server, with a simple -o remount. Will this break my fs, or can I do it?

----------

## codergeek42

 *tcostigl wrote:*   

> I assume this would also apply to a software raid partition(md raid) on /dev/md* formatted with ext3.

 I've no experience with that but since the tune2fs/e2fsck tools operate on the filesystem itself, it should work just fine on RAID disks or other media which use the ext2 or ext3 filesystem.  *juzna wrote:*   

> I just wonder if I can change to full journal without any downtime on my server, with a simple -o remount. Will this break my fs, or can I do it?

 I just tried it, and my kernel gave me an error saying I couldn't do that: *Quote:*   

> EXT3-fs: cannot change data mode on remount

 This was trying to remount my /usr/portage with 'data=ordered'. Unmounting it, then mounting it with 'data=ordered' works just fine though, so what you're trying to do does not seem possible.

----------

## fallow

good to see this how-to here  :Smile:  maybe more and more users will see the goodies in lower CPU usage and better interactivity of the rest of the system thanks to it  :Wink:  [in comparison to reiser*].

I personally use the dir_index feature and writeback mode. Yeah, I like less journalling; full journalling, for example, gives me too high CPU usage with filesystem RW operations.

cheers & thanks & greetings  :Smile: 

----------


## darklegion

 *codergeek42 wrote:*   

> @prymitive: You can adjust your block size and inode count when you initially create the filesystem. Setting it to use 1KB block sizes and 1 inode per 1KB should give you the most efficient space usage:
> 
> ```
> # /sbin/mkfs.ext3 -b 1024 -i 1024 /dev/hdXY
> ```
> ...

 

I tried this out and the results were not at all promising. Listed here is the free space before and after changing the block size/inode count:

```

Before: 

/dev/hdc               74G   33M   74G   1% /nxbox

After:

/dev/hdc               66G  8.1M   62G   1% /nxbox

```

Well, at least the journal is slightly smaller *laughs*. I'm guessing that larger drives are designed to work with larger block sizes, although I don't know if you can call an 80 GB drive *large* anymore.

EDIT: I forgot to enable -m0 to get rid of the superuser-reserved-space bullshit, which gave me 66 GB, but that is still significantly smaller.

----------

## torchZ06

 *codergeek42 wrote:*   

> 
> 
> There are two different ways to activate journal data mode. The first is by adding data=journal as a mount option in /etc/fstab. If you do it this way and want your root filesystem to also use it, you should also pass rootflags=data=journal as a kernel parameter in your bootloader's configuration. In the second method, you will use tune2fs to modify the default mount options in the filesystem's superblock:

 

am i correct in interpreting this statement as meaning if you're running a new kernel and set the flags in the superblock that you DON'T have to add anything to /etc/fstab or your grub.conf in order to take advantage of journal data mode?

is there some way to see what mount options are being used-- not reading the superblock with tune2fs, but rather to actually see what mount is using?

----------

## codergeek42

 *torchZ06 wrote:*   

> am i correct in interpreting this statement as meaning if you're running a new kernel and set the flags in the superblock that you DON'T have to add anything to /etc/fstab or your grub.conf in order to take advantage of journal data mode?

 That's correct. The flags in the superblock are default mount flags. Unless you specify otherwise (via command-line options or /etc/fstab), those options will always be used when mounting. *Quote:*   

> is there some way to see what mount options are being used-- not reading the superblock with tune2fs, but rather to actually see what mount is using?

 You can check your kernel log with `dmesg`, and you should see something similar to the following for each ext3 partition:

```
$ dmesg

[...]

EXT3 FS on hda3, internal journal

EXT3-fs: mounted filesystem with journal data mode. 

EXT3 FS on hda5, internal journal

EXT3-fs: mounted filesystem with journal data mode. 

[...etc...]
```
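Alternatively -- a general technique, not something shown in the posts above -- you can read /proc/mounts, which lists the options the kernel is actually using for each mounted filesystem (including the data= mode on ext3 entries):

```shell
# Dump the kernel's view of every mounted filesystem; on ext3 lines,
# look for data=journal or data=writeback in the options field.
cat /proc/mounts
```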

Hth!

----------

## mauricev

 *codergeek42 wrote:*   

> You don't need to defragment  ext2/ext3 because as you use the filesystem file blocks and inodes are moved around and reallocated to keep the data nearly contiguous. It's not perfect, but it works fairly well and you should almost never see a performance degradation caused by the filesystem's fragmentation.

 

I posed this statement to the ext3 mailing list and asked if it were true. One of the ext3 developers, Theodore Ts'o, responds...

No, not true.  (At least not today)

Ext2/3 has advanced algorithms to make sure that the blocks that are allocated avoid fragmentation, but it is not doing any kind of dynamic moving of blocks/inodes.  

(At least, not yet; there has been some talk about creating enough kernel hooks so that a user-space program could do dynamic defragmentation of the filesystem, but none of this exists at the moment.)

						- Ted

----------

## codergeek42

Hmmm, I thought ext3 did dynamic reallocation like that, but I guess not. It still kicks butt as an excellent FS anyhoo.  :Wink:  Thanks, mauricev.  :Embarassed: 

EDIT: Link to message: https://www.redhat.com/archives/ext3-users/2005-June/msg00026.html

----------

## blaster999

Hi codergeek42, great guide! I think I'm gonna dump my reiser3.6 partition and switch to ext3. One question (a little OT): is there an FS which actually does full dynamic relocation?

----------

## codergeek42

blaster999,

I think Reiser4 is supposed to use dancing trees to achieve this. I'm not too sure, however, if that's for the file data or purely the metadata or what it's for, as I've not read too much about it.

----------

## m0p

I'll try these tweaks tomorrow cause I'm going to bed soon. Another ext3 fan here too  :Smile: 

----------

## fbvortex

Has anyone here successfully used the rootflags=data=journal kernel command-line parameter for their root partition?  If so, after boot, what does a listing of the "mount" command show for / ?  On my setup, after having done that, the mount listing for / does not show data=journal.

The kernel message log does show that rootflags=data=journal is getting passed in.

How can I tell if / is correctly being mounted with data=journal when it goes rw?

----------

## codergeek42

 *fbvortex wrote:*   

> How can I tell if / is correctly being mounted with data=journal when it goes rw?

 You should see something like the following in your kernel log: *Quote:*   

> EXT3-fs: mounted filesystem with journal data mode.
> 
> VFS: Mounted root (ext3 filesystem) readonly.
> 
> EXT3 FS on hda3, internal journal

 When it remounts it as read/write no messages seem to appear in my kernel log. I don't use the "rootflags=data=journal" method though, since I use tune2fs to set the default mount option in my filesystems' superblocks.

----------

## fbvortex

codergeek42,

Can you tell me what the output of 'mount' (without any options) is for your / filesystem?  I'd like to see if the data=journal is supposed to show up there.

----------

## codergeek42

 *fbvortex wrote:*   

> codergeek42,
> 
> Can you tell me what the output of 'mount' (without any options) is for your / filesystem?  I'd like to see if the data=journal is supposed to show up there.

Sure:

```
$ mount | grep hda3

/dev/hda3 on / type ext3 (rw)
```

And the superblock information:

```
tune2fs 1.37 (21-Mar-2005)

[...]

Filesystem magic number:  0xEF53

Filesystem revision #:    1 (dynamic)

Filesystem features:      has_journal dir_index filetype needs_recovery sparse_super

Default mount options:    journal_data user_xattr

Filesystem state:         clean

Errors behavior:          Continue

Filesystem OS type:       Linux

[...]
```

----------

## chevelle

dig it

----------

## codergeek42

Err...you have backups I hope?  :Confused:  I don't know if those errors are fixable...

----------

## alari

Is the information in the first post up-to-date?

I'm gonna do a reinstall on my system. I have used reiser4 for about 18 months or so and it's damn slow (I don't have time to mess with the repacker, said to be coming in reiser4.1).

Can I use the tips in the first post after I format my partitions with ext3 during the install?

----------

## codergeek42

 *alari wrote:*   

> Is the information in the first post up-to-date?
> 
> I'm gonna do a reinstall on my system. I have used reiser4 for about 18 months or so and it's damn slow (I don't have time to mess with the repacker, said to be coming in reiser4.1).
> 
> Can I use the tips in the first post after I format my partitions with ext3 during the install?

 Yes, the information is up-to-date to my knowledge. I'm using kernel 2.6.12-gentoo-r6 with e2fsprogs version 1.38.

----------

## Francis85

One of my computers here is a PowerMac 7300 with a 1 GHz G3 CPU upgrade card and 1 GB of RAM.

That machine has a 50 MHz bus speed, and when using the journal_data mount option on my ext3 filesystem, the machine would really choke.

```

kernel BUG in expand at mm/page_alloc.c:413! 

Oops: Exception in kernel mode, sig: 5 [#1] 

PREEMPT 

NIP: C0043E28 LR: C0043E24 SP: E5C9DC10 REGS: e5c9db60 TRAP: 0700 Not tainted 

MSR: 00021032 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 11 

TASK = e727b0b0[7523] 'ftp' THREAD: e5c9c000 

Last syscall: 4 

GPR00: 00000001 E5C9DC10 E727B0B0 00000001 C0C8F040 C0368EBC C0368E70 C0C8F018 

GPR08: 00000080 C0368F1B C0C8F080 C0368F1C 24022424 1004F070 E603F3A8 00000001

GPR16: E5C9DCC0 C036C4FC E74919A0 000003C8 000001E0 0000001F 0000001E 00001032 

GPR24: 000000D2 00000000 00000000 C0368E1C C0C8F000 00000001 00000002 C0368E74 

NIP [c0043e28] __rmqueue+0xc8/0x140 

LR [c0043e24] __rmqueue+0xc4/0x140 

Call trace: 

[c00444ec] buffered_rmqueue+0x26c/0x2e8 

[c0044888] __alloc_pages+0x278/0x410 

[c00411b0] generic_file_buffered_write+0x124/0x648 

[c0041ce0] __generic_file_aio_write_nolock+0x2f0/0x588 

[c0041ff0] generic_file_aio_write+0x78/0x18c 

[c00e86bc] ext3_file_write+0x20/0xe4 

[c00620f0] do_sync_write+0x9c/0x104 

[c00622dc] vfs_write+0x184/0x1ac 

[c00623e0] sys_write+0x4c/0x90 

[c0004380] ret_from_syscall+0x0/0x44 

note: ftp[7523] exited with preempt_count 2 

scheduling while atomic: ftp/0x10000002/7523 

Call trace: 

[c02f0154] schedule+0x74c/0x77c 

[c02f0c68] cond_resched+0x48/0x64 

[c004f7a4] unmap_vmas+0x5c8/0x5d4 

[c0054bc8] exit_mmap+0x70/0x158 

[c0016cf0] mmput+0x54/0x100 

[c001b654] exit_mm+0xc8/0x1e4 

[c001bb58] do_exit+0xec/0xbf0 

[c0004e98] _exception+0x0/0xa8 

[c0004f08] _exception+0x70/0xa8 

[c0004a88] ret_from_except_full+0x0/0x4c 

[c0043e28] __rmqueue+0xc8/0x140 

[c00444ec] buffered_rmqueue+0x26c/0x2e8 

[c0044888] __alloc_pages+0x278/0x410 

[c00411b0] generic_file_buffered_write+0x124/0x648 

[c0041ce0] __generic_file_aio_write_nolock+0x2f0/0x588 

Bad page state at prep_new_page (in process 'kjournald', page c0c8f080) 

flags:0x000122d1 mapping:00000000 mapcount:65536 count:-1013692927 

Backtrace: 

Call trace: 

[c0043980] bad_page+0x64/0xbc 

[c00444ac] buffered_rmqueue+0x22c/0x2e8 

[c0044888] __alloc_pages+0x278/0x410 

[c00481e4] cache_alloc_refill+0x318/0x598 

[c0047cdc] kmem_cache_alloc+0x64/0x68 

[c0043104] mempool_alloc_slab+0x1c/0x2c 

[c0042f18] mempool_alloc+0x5c/0x140 

[c0068498] bio_alloc_bioset+0x28/0x1ac 

[c00658f4] submit_bh+0xb4/0x194 

[c00fefd8] journal_commit_transaction+0x6f8/0x1498 

[c01023b8] kjournald+0xdc/0x284 

[c0006ea4] kernel_thread+0x44/0x60 

Trying to fix it up, but a reboot is needed 

    <any write access fails until I reboot the machine, and freezes the active console>

```

I would get these errors by initiating an FTP transfer on my LAN. The transfer would start at 11 MB/s, then drop and drop to about 6 MB/s, and then die with this error. Disabling journal_data fixes the issue, although I'm back to using reiserfs. Even fsck would provoke such errors! In other words, anything which needs relatively quick access to the drive.

Now, as long as it is able to saturate a 100baseTX link, I have no issue with it, since this machine is my LAN's file server.

Maybe this could be helpful to someone..

----------

## ruben

It might not be related in any way, but i see in your post the following:

```
PREEMPT
```

Just wondering if you have enabled 'kernel preemption' in your kernel config.  You should disable it on PPC machines; it does not work there and can lead to all kinds of weird errors. (Alternatively, you can enable it, but then you should also enable SMP, even if your machine is not a dual-processor one.)

----------

## Francis85

It is not enabled in my config, I just checked.

----------

## codergeek42

This isn't really a support forum (perhaps you should post your problem in the Kernel & Hardware forum).

That said, if you're not running your FTP daemon, does mounting it with full data journalling work okay? Perhaps the FTP daemon makes a bad system call or something.  :Confused: 

----------

## Francis85

Nope.. as I said, even fsck.ext3 would make the kernel go nuts! (These kernel panics would sometimes leave the filesystem completely unrecoverable, unless I booted from another drive and ran fsck.ext3 from there.)

That said, I know this is not a support forum, but I wanted to mention this here in case anyone else gets such odd behavior.  :Smile: 

----------

## neuron

Anyone know if someone has figured out what causes journal mode to be faster than writeback yet?

----------

## vipernicus

can full_journal be safely disabled?

----------

## codergeek42

 *vipernicus wrote:*   

> can full_journal be safely disabled?

 Yes. Running tune2fs again will make the filesystem use ordered journalling mode (the default):

```
# tune2fs -o journal_data_ordered /dev/hdXY
```
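To double-check the result, you can list the superblock again (with /dev/hdXY as the placeholder device node, as elsewhere in this thread):

```
# tune2fs -l /dev/hdXY | grep 'Default mount options'
```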

Hope that helps.

----------

## pv

I'm going to reinstall Gentoo and make the root partition to be ext3fs (now it is reiserfs).

I've run a small test.

First I ran something like:

```
mke2fs -j /dev/hda3

tune2fs -o journal_data -O dir_index /dev/hda3

mount -o noatime /dev/hda3 /mnt/tmp

time tar xjf portage-20050712.tar.bz2 -C /mnt/tmp

Real         1m4s

User          19s

System         5s

```

Then

```
mkreiserfs /dev/hda3

mount -o noatime,notail,data=journal /dev/hda3 /mnt/tmp

time tar xjf portage-20050712.tar.bz2 -C /mnt/tmp

Real          44s

User          18s

System        11s

```

I'm sorry, but ext3fs doesn't seem to be better (meaning faster) in terms of real time. Unpacking a kernel, I had similar results. (Note that while mounting the reiserfs partition I used the notail option.) Where can I find (if I can) results saying that ext3 is better than reiserfs? Was I quite wrong in thinking this thread states that ext3fs is both more stable and faster than reiserfs?

BTW, my friend has Fedora Core N and a system very much like mine, except his root partition is ext3, and his system looks MUCH faster than my Gentoo. Is the reason some Fedora patches increasing ext3fs performance?

----------

## codergeek42

 *pv wrote:*   

> I'm sorry but ext3fs doesn't seem to be better (meaning faster) in terms of real time. Unpacking a kernel I had the similar results. (Note that while mounting reiserfs partition I used notail option).

 ReiserFS is heavily optimized to be very fast with large quantities of small files (such as kernel sources or the portage tree). This is why you see it so much faster, even with full data journalling. *Quote:*   

>  Where can I (if can) found results saying that ext3 is better than reiserfs? 

 I'm not a believer in benchmarks, though Google would probably have some. The way I found this out was by installing a full ReiserFS-based system (same full journalling, etc.) and a full ext3-based system (same hardware, kernel, etc.). While ext3 does tend to be slightly slower in this regard, it was clearly the winner for me in terms of interactivity and "feeling" fast.  *Quote:*   

> Was I quite wrong thinking this thread states that ext3fs is both more stable and faster than reiserfs?

 I'd say it's much more stable than ReiserFS, but it seems to be about on par with it for general desktop use in terms of speed, etc. *Quote:*   

> BTW, my frient has Fedora Core N and a system very like my one except his root partition is ext3, and his system looks MUCH faster than my Gentoo. Is the reason of that some Fedora patches increasing ext3fs performance?

 Fedora does not use full journalling by default, only ordered journalling (writes the metadata to the journal, then flushes the data and commits the journal while trying to do I/O in large blocks).

I don't want this thread to turn into a ReiserFS vs. Ext3 debate though (that's been discussed in other threads).

----------

## wrc1944

 *Quote:*   

> neuron wrote:
> 
> Anyone know if somoene has figured out what causes journal mode to be faster than writeback yet?

 

I'm still wondering too. Has there been any more recent info about preferring ext3's full journalling mode over writeback mode? As someone now in the process of converting multiple Gentoo systems' / partitions back to ext3 (tweaked with dir_index, journal_data) from reiserfs, I'd really like to know whether the info in this thread, and the Robbins article about journal mode being much faster, is in the final analysis correct.

Reading stuff like the link below (as well as many other supposedly reliable "expert" sources) leads one to wonder what the truth really is, as almost all of them insist that writeback offers better performance at the expense of data protection.

http://www.linuxplanet.com/linuxplanet/reports/4136/5/

One other basic question for the ext3 experts: 

Am I correct in thinking that all the tune2fs options can be run on unmounted ext3 / partitions (with a full Linux installation already existing on said partition), as long as you run e2fsck -D after tune2fs, and the result will be no data loss, or any other negative impact? 

I have several other Linux distros installed with default ext3 / partitions that I'd like to tweak up a bit.

Oh yeah- Many thanks to codergeek42 for tying this all together for me- the reiserfs fragmentation was becoming a problem.

 I guess what prompted me to go with reiserfs 3+ years ago in the first place was the abysmal default ext3 performance of my first Linux distros; reiserfs was definitely an improvement at the time, especially with the 2.4 kernels of the day.

Any more thoughts on the commit= setting? It seems to me that syncing all its buffers every 5 seconds (the default) is overkill and must affect performance to some extent. Maybe not commit=9999, but what about a more reasonable sync interval -- say, every 60, 120, 512, or whatever number of seconds?

A quote from http://navindra.blogspot.com/2004/10/kde-dot-news-ext3s-miserable-failure.html

 *Quote:*   

> And the last interesting parameter is commit. commit defaults to 5 in ext3, and it means that ext3 will sync all its buffers - yes, that's a sync every 5 seconds. It harms performance a _lot_ and ext3 defaults to that value because ext3 developers are really _paranoic_ about data safety. You can mout ext3 it with huge values to increase performance
> 
> In short, if you want to do _fair_ benchmarks reiser vs ext3, you must:
> 
> o use htree
> ...

 

Again, this guy recommends data=writeback and commit=xxxx.  (this blog was from Oct 2004, so it's not too outdated)

----------

## codergeek42

 *wrc1944 wrote:*   

> Reading stuff like on the link below (as well as many other supposedly reliable "expert" sources) leads one to wonder what the truth really is, as most all insist that writeback offers better performance at the expense of data protection.

 Well, I'm personally a fan of "if it gets done much more reliably with a small sacrifice in performance, do it." I've noticed only minor slowdowns on my machine with full journalling enabled, though your experiences may vary of course. In fact, it "feels" just as fast as ordered data mode to me.  *Quote:*   

> One other basic question for the ext3 experts: 
> 
> Am I correct in thinking that all the tune2fs options can be run on unmounted ext3 / partitions (with a full Linux installation already existing on said partition), as long as you run e2fsck -D after tune2fs, and the result will be no data loss, or any other negative impact? 

 Correct. Just make sure it's unmounted before doing this. If it is mounted, I can almost guarantee you severe data loss.  *Quote:*   

> I have several other Linux distros installed with default ext3 / partitions that I'd like to tweak up a bit.

 It should work the same, as long as they all have recent kernel and e2fsprogs versions, etc. *Quote:*   

> Oh yeah- Many thanks to codergeek42 for tying this all together for me- the reiserfs fragmentation was becoming a problem.

 I'm happy to help.  :Smile:  *Quote:*   

> Any more thoughts on the commit=  setting?  It seems to me syncing all it's buffers the default every 5 seconds is overkill, and must affect performance to some extent. Maybe not commit=9999, but what about a more reasonable sync interval- say every 60, 120, 512, or whatever number of seconds???

 That's up to you to tweak as you see fit. I've left it at the default and it works fine. Remember though, raising it sacrifices stability for performance (what if the journal write is still held in RAM during a power failure?). *Quote:*   

> Again, this guy recommends data=writeback and commit=xxxx.  (this blog was from Oct 2004, so it's not too outdated)

 It really depends on your goals and tests. Like I said, my first priority with my data is keeping it safe. Generally speaking, I can read/write that data Fast Enough(tm).  :Smile: 
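If you do decide to experiment with commit=, it's just a mount option; a hypothetical /etc/fstab entry (device, mount point, and interval are only examples) would look something like:

```
/dev/hdXY   /home   ext3   defaults,data=journal,commit=60   0 2
```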

----------

## wrc1944

I must be misunderstanding what codergeek42 and 6thpink are saying about commit=xxxx.

I do understand 6thpink's statement: *Quote:*   

> I agree and also add that a so high commit raises the probability of data loss, and we dont want that (if we wanted we would be using reiser4

 

 which seems to agree with the kernel ext3 documentation.

FROM KERNEL DOC *Quote:*   

> Ext3 can be told to sync all its data and metadata
> 
> every 'nrsec' seconds. The default value is 5 seconds.
> 
> This means that if you lose your power, you will lose,
> ...

 

What I don't get is codergeeks statement: *Quote:*   

> This mount option specifies how often (in second intervals) to sync the data to disk. Setting this too high may cause excessive usage of memory and possibly CPU/swap resources.

 

If the number is set higher, meaning fewer actual syncs in a given time period, how can that cause more memory and/or CPU/swap usage?

In other words, wouldn't a longer interval between syncs mean less memory/CPU/swap usage for any given time period, because there is less activity? Put yet another way, if you use the default commit=5 (which equals 60 syncs per 5 minutes), wouldn't commit=60 mean only 5 syncs every 5 minutes, cutting down on a lot of background activity and translating into slightly better performance? And the higher the commit= number, the fewer syncs per time period, and the higher the potential for data loss on a crash. So the question seems to be: where is the point at which the data-loss potential exceeds the performance benefit of a smaller number of syncs?
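The arithmetic is easy to sanity-check in a shell:

```shell
# Journal commits in a 5-minute window for two commit= settings
period=300                                 # 5 minutes, in seconds
echo "commit=5:  $((period / 5)) syncs"    # prints: commit=5:  60 syncs
echo "commit=60: $((period / 60)) syncs"   # prints: commit=60: 5 syncs
```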

If this is incorrect, where am I going wrong? I think I'm getting a headache at this point  :Rolling Eyes: 

----------

## codergeek42

 *wrc1944 wrote:*   

> What I don't get is codergeeks statement: *Quote:*   This mount option specifies how often (in second intervals) to sync the data to disk. Setting this too high may cause excessive usage of memory and possibly CPU/swap resources. 
> 
> If the number is set higher, meaning less number of actual syncs for a given time period, how can that cause more memory and/or cpu/swap usage? 

 Sorry for not being clear there. What I meant was that, since the filesystem driver has been instructed to sync to the disk less often, it will likely use more resources, because the longer sync interval means that data which has not yet been written to disk must be held in memory. This could lead to more swap usage (if you don't have a lot of memory and set the commit= option too high). I don't actually remember why I thought it would increase CPU usage noticeably.  :Embarassed:  If and/or when I do, I'll post that too.  *Quote:*   

> So the question seems to be: where is the point at which the data-loss potential exceeds the performance benefit of a smaller number of syncs?

 I'm not an expert on this at all. Basically, I very much trust the kernel hackers who are doing this, so these two (three?) settings are the only options I've changed from the default mke2fs options. Like I mentioned though, you could also play with the journal size (`tune2fs -J size=$SIZE /dev/hdXY`); but be sure to read the relevant section of the man page for the limitations of that size.

----------

## pv

 *codergeek42 wrote:*   

> I don't actually remember why I thought it would increase CPU usage noticeably.  If and/or when I do, I'll post that too.

 

I think that after, for example, an hour of buffering data in memory without saving it to disk, the processor (and the disk itself) will have to write much more data at once than after 30 seconds, consuming more time while doing so.

 *codergeek42 wrote:*   

> Basically, I very much trust the kernel hackers who are doing this.

 

Agreed. The kernel is the kernel, and it must be stable as well as productive (fast and requiring little memory), and nobody but the kernel hackers knows how it works and how it can be improved.

----------

## wrc1944

Thanks codergeek42 and pv.  

Your clarifications really help me understand this much better, and make perfect sense now that I think more carefully about it.

So I guess the conclusion might be that there's really no significant benefit from having fewer syncs per given time period, and in fact it might even be worse. I guess the theory would be that a very small sync every 5 seconds is far less noticeable (and affects overall performance less) than going along in some process and suddenly having it preempted by a huge sync, even if you have plenty of memory and aren't using swap at all.

Now the only question is how long can I resist the temptation to play around with the commit= setting? And of course then there's the theory that maybe I'm just too much of a nit-picker about these types of things for my own good.   :Smile: 

----------

## RuiP

Another convert here.

codergeek42 you only convince me by half.  :Wink:  The other half was my 75% full reiserfs partition that make my gentoo soo sloooow last 2-3 month   :Laughing: 

After reading that reiserfs is not in development anymore, I gave up investing time in a dead horse and made my move.

I changed to a mix of ext3 (with your tuning suggestions) and xfs for partitions with large files, and everything returned to the good old speed.

many thanks!

oh btw, I still keep my /usr/portage on ext2 (emerge --sync is a nice journal system  :Smile: ). Do you notice a real improvement, in speed terms, from moving to the magic ext3 with journal mode?

----------

## codergeek42

 *RuiP wrote:*   

> oh btw, I still keep my /usr/portage on ext2 (emerge --sync is a nice journal system ). Do you notice a real improvement, in speed terms, from moving to the magic ext3 with journal mode?

 To be honest, every partition on my system (including /boot) is Ext3 with these tweaks, so I have no comparison with a "standard" Ext2 setup.  :Smile: 

----------

## wrc1944

I had forgotten about what RuiP mentions - that reiserfs is not actively developed anymore. I guess all efforts are going into reiser4, where a 6-month trial (/ included) didn't convince me that it was ready - same gradual slow-down problem, but worse.

As for being a convert, I think I'm almost there.  :Very Happy:   I'll shake down this one ext3 tweaked box for a few more days, and then convert my other reiserfs Gentoo boxes if all goes well.

I have noticed 2 little things so far. One, the shell script and python script icons reverting to generic (no others did), and two, on bootup my system sometimes hangs at loading "local start" items for about 30-40 seconds before continuing to a normal kdm boot, whereas it used to pause only 2 seconds under reiserfs. I don't know if that's ext3 related or not, but I haven't synced and emerged anything since I converted /, so it seems likely ext3 could be the reason.

EDIT: The booting "hang" was fixed by resetting my modem and router, so not ext3 related.

----------

## enderandrew

Has anyone tried these patches?

http://www.bullopensource.org/ext4/

I can't get all six to play nicely with each other.

----------

## codergeek42

 *enderandrew wrote:*   

> Has anyone tried these patches?
> 
> http://www.bullopensource.org/ext4/
> 
> I can't get all six to play nicely with each other.

 I'm waiting for those to hit upstream.  :Very Happy: 

----------

## wrc1944

I looked at the benchmarks on this page, but unfortunately, for some reason they didn't test with journal_data. After reading  this thread and a few other sources, one would think journal_data definitely warrants comparison testing. Surely they (the ext2/ext3 improvement project people) must also be aware of the information and conclusions on journal_data mentioned here, but since they don't even mention it, I'm wondering once again what all this really means.  How could they ignore the Dan Robbins article's statement and conclude journal_data wasn't worth testing?

 *Quote:*   

> Therefore, ext3's data=journal mode, which was assumed to be the slowest of all ext3 modes in nearly all conditions, actually turns out to have a major performance advantage in busy environments where interactive IO performance needs to be maximized. Maybe data=journal mode isn't so sluggish after all!

 

Or is this info now too outdated to be considered, given newer kernels, gcc versions, and far more powerful hardware, etc?

----------

## the_g_cat

Hi all,

Just wanted to ask if there is a way to retrieve the journal and read some parts of it. Bottom line: I have accidentally deleted some files, and it is clear to me that I won't be able to undelete them, but I'd need to know the names of the files so I can evaluate what I would need to start looking for again and what not.

Thanks,

Cat

----------

## wrc1944

Well, I guess I'm now an "official" tuned-ext3 convert, as all my Gentoo installations are now converted. Again, many thanks to codergeek42 for sharing this great info with all of us!

I noticed fallow on page 3 of this thread uses writeback mode.  I was wondering if anyone had any more info, comparisons, or thoughts on this, as opposed to using the journal_data mode? 

I had at first considered using writeback, but after reading the Robbins article and comments here, I chose journal_data. I'm still wondering if I made the correct choice.  Don't get me wrong- I'm very pleased with the change, and even converted my gcc-4.1 test box / to tuned ext3. I just wish to squeeze every last % of performance out of my Linux boxes.

If I did decide to try writeback, am I correct in thinking I can just freely switch back and forth between modes by booting a Knoppix CD and running:

tune2fs -o journal_data_writeback /dev/hdXY

and then, e2fsck -D /dev/hdXY

on my unmounted Linux partitions? (And, of course, experience no difficulties?)

----------

## ExZombie

The tune2fs manpage has a short and to-the-point description of the three available journal modes. Writeback mode is the fastest, but if the filesystem is not unmounted cleanly you might (and will) find that some files contain old data. To be exact, the files that were written to last and did not yet have their data synced will contain old data.

This is not actually that bad, provided you manually run the 'sync' command before doing something that could hard-lock your system. Having 'Magic SysRq' support compiled into your kernel can also be useful  :Smile:  .

And yes, you can freely change journal modes. It's not even necessary to use tune2fs - with it you are only changing the defaults written in the superblock. You can simply supply the option in fstab.
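For example, a hedged /etc/fstab sketch (the device node and mount point here are placeholders; adjust them to your system). An option given in fstab overrides whatever default tune2fs stored in the superblock:

```shell
# /etc/fstab fragment (sketch): mount with writeback journalling for this boot,
# regardless of the default mode recorded in the superblock.
/dev/hdXY   /home   ext3   defaults,noatime,data=writeback   0 2
```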

----------

## XenoTerraCide

```
Filesystem features:      has_journal dir_index filetype needs_recovery sparse_super
```

 what do the last 2 features mean?

----------

## codergeek42

needs_recovery means that the filesystem is marked "dirty". If it is currently mounted, then this is perfectly normal behavior, as it will be marked "clean" when it is unmounted normally. This is to tell the startup scripts that the partition needs to be fsck'ed before it can be used again if, for example, a power outage occurs and the partition is not unmounted correctly.

sparse_super means that the filesystem stores fewer backup copies of the superblock, which increases the space available for actual data, and is usually very safe since there are still several backup superblock copies anyway.
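If you want to check which features a given filesystem has yourself, something like this should work (the device node is a placeholder):

```shell
# Print the feature list from the superblock (safe on a mounted filesystem):
tune2fs -l /dev/hdXY | grep 'Filesystem features'

# dumpe2fs also shows where the (sparse) backup superblocks live:
dumpe2fs /dev/hdXY | grep -i 'superblock at'
```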

Edit: Some minor wording corrections. 

----------

## XenoTerraCide

yeah it's mounted... and otherwise says clean... thanks...

----------

## jedsen

My reiserfs box still beats out my ext3 box when emerge syncing by about 2-3x, even after these tweaks. And no problems yet (knock on wood).

Last edited by jedsen on Thu Dec 22, 2005 10:44 am; edited 1 time in total

----------

## BeteNoire

Ok, there is a nice discussion here, but I've got a question: what ext3 options do you recommend for a large filesystem (data storage) of about 65 GB, which will store 10000-12000 files of 3-15 MB each?

I had ext3 on this partition once, created with default options and was very unhappy with its performance. Then I switched to xfs and noticed better performance, but now I want to give ext3 another try.

I want to give it maximum read performance, as writes will not occur very often - this is to be a storage-only filesystem.

----------

## enderandrew

I'm still looking for a way to get these working.

http://www.bullopensource.org/ext4/

----------

## itsr0y

Well, in case anyone wants some actual (if meaningless) numbers, here they are:

```
+----------+------+----------+
|          | ext3 | reiserfs |
+----------+------+----------+
| bootup   | 45   |  50      |
| login    | 10   |  12      |
| sync     | 506  |  93      |
| updatedb | 662  |  90      |
| ctags    | 24   |  55      |
+----------+------+----------+
```

Notes:

- all numbers are in seconds
- reiserfs mounted with noatime,notail
- ext3 mounted with noatime and with the dir_index option (not the writeback thingie)
- bootup is from grub to the gdm login
- login is from password entry in gdm to PekWM and my desktop apps (conky, etc) finishing loading
- sync is an "emerge sync"
- updatedb is just that
- ctags is running "exuberant-ctags -R" on /usr/include

Background:

I had a reiserfs partition of about 8 gigs at the end of my disk that I needed to make bigger.  So I wiped off my windows partition (30 gigs) and made a new ext3 partition at the front of the drive.  Then I did a "cp -ax" to copy all files from the old reiserfs to the new ext3 partition.  Before copying, I did a "tune2fs -O dir_index /dev/hda3".  Then, I booted up into each and ran the above commands.  Each partition was in the same exact state when I ran them.  If you really care for more details, let me know.

Conclusion:

DAMN! ext3 is SLOW when trying to access a lot of files.  I mean, it is like 500% slower doing emerge sync and updatedb!  Interestingly enough, booting and ctags were a little faster.  Maybe this has to do with the dir_index option.  If I can disable it and try running the test again, I will.  Can I just run "tune2fs -O ^dir_index /dev/hda3", and will that get rid of all dir indexes?  Also, maybe that writeback thing can speed things up.  Does this mean you put the journal on its own partition?  How do you set that up?

Hopefully someone might find this interesting or useful.  If you'd like me to try anything else, or you think you can help me fix that SUPER SLOW emerge problem, please tell me!

----------

## ExZombie

Definitely try writeback; I think (but am not sure) reiserfs uses it as well. It would be interesting to see if it is really any faster. I am too lazy to do any kind of benchmarking  :Wink:  .

As for putting the journal on a separate partition of the same disk, it's useless, or even harmful to performance, because the head has to travel to a completely different part of the disk to write the journal. On the other hand, putting it on a separate drive should boost performance significantly, especially in full data journalling mode.

----------

## pv

itsr0y, please try data=journal when mounting ext3/reiserfs, or journal_data with tune2fs.

I ran some tests, including unpacking portage-20050712 and linux-2.6.13.4, and copying my full /home partition, the latter being about 4.5 GB of files. I did that with data=journal,noatime on both reiserfs and ext3, and with dir_index for ext3, and I had results like yours (except for sync and updatedb). My results for the portage tree and the kernel are described above. As to the 4.5 GB of /home files (KDE config, all the distfiles I use, my work files, etc), ext3 is 10-20% FASTER than reiserfs.
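To reproduce that setup, something along these lines should do it (a sketch; the device node and mount point are placeholders):

```shell
# One-off test mount with full data journalling:
mount -t ext3 -o data=journal,noatime /dev/hdXY /mnt/test

# Or store data=journal as the default mount behaviour in the superblock,
# so a plain "mount" picks it up (run on an unmounted filesystem):
tune2fs -o journal_data /dev/hdXY
```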

----------

## pv

I have just seen TWO kjournald processes in the process list. I guess it's because I have two ext3 partitions mounted now.

Can I reduce the number of kjournalds to only ONE? I'm afraid TWO daemons eat too many resources, even though I don't really need both of them.

Why can't a single daemon handle the disk access tasks?

----------

## itsr0y

Ok, I ran a few more tests and I found reiserfs to be WAY faster than ext3:

```
+----------+-----+-----+-----+-----+-----+-----+
|          | (1) | (2) | (3) | (4) | (5) | (6) |  all times in seconds
+----------+-----+-----+-----+-----+-----+-----+  (1) original reiserfs (notail)
| bootup   | 50  | 45  | 45  |     |     | 45  |  (2) ext3 (dir_index)
| login    | 12  | 10  | 10  |     |     | 10  |  (3) ext3 ()
| sync     | 93  | 506 |     |     |     | 53  |  (4) ext3 (journal_data)
| updatedb | 90  | 662 | 567 | 729 | 735 | 57  |  (5) ext3 (writeback,dir_index)
| ctags    | 55  | 24  | 23  |     |     | 16  |  (6) reiserfs ()
+----------+-----+-----+-----+-----+-----+-----+
| space*   | 7.6 |          8.0          | 7.0 |  * space consumption in GB
+----------+-----+-----+-----+-----+-----+-----+
```

Setup:

I had an 8 GB reiserfs partition (1) using notail at the end of the drive.  I created a new 22 GB partition at the beginning of the drive, first using ext3 (2-5), then finally switched over to reiserfs with tail packing (6), copying the data from the original partition using cp -ax.  I tried the options listed above, and they only seemed to slow down the system.  The blanks indicate tests I didn't perform.

Conclusion:

I'm back on reiserfs, and I'm never going to ext3 again.  It was ridiculously slow and used up a heck of a lot more space.  I think any speedup gained from it (such as the faster ctags and bootup time) was simply because it was defragged by copying the data to a new partition.  Additionally, I will no longer use notail, since it just uses up space and seems slower anyway.

I realize my test results are far from definitive, but there is no way I can live with the 10-12x slowdown in emerge sync and updatedb!

----------

## XenoTerraCide

well I stopped using reiser cause it caused issues... and recovery wasn't as good.

----------

## wrc1944

itsr0y,

I notice you didn't list ext3 with BOTH dir_index and journal_data enabled for any of your tests. Apparently, that combination is supposedly the best for desktop interactive response. I've used that combination for 2+ weeks now (coming from reiserfs) and notice no slowdown in emerge sync and updatedb on 4 different Gentoo installations. I'm still looking for more reports on writeback vs. journal_data mode as to which is best overall.

Your better reiserfs performance without notail is very curious, as it seems to contradict most info I've ever seen.

----------

## andrewd18

codergeek42 -

Thank you for your excellent documentation. I've been using ext3 for a while - reiserfs on SuSE was iffy at best with recovery, and Partition Magic recognizes ext3, so it made my life easier to switch over. Little tweaks like these continue to solidify my growing affection for ext3. Thanks much.

~~ Andrew D.

----------

## codergeek42

andrewd18, Happy to help.  :Smile: 

----------

## neenee

as i was running out of space on my separate root partition, i had to increase its size - i could not shrink my /home partition to make room for it (it was xfs), so i had to make new partitions anyway and put back the files.

so i moved to ext3  :Smile: 

everything seems to be working fine, no problems at boot, and i like that my kernel is a bit smaller because i could remove xfs support.

portage seems a bit faster than on xfs, which is good too.

thanks for the nice guide / tips  :Wink: 

----------

## BeteNoire

I wonder why, oh why, everyone ignores my previous post in this thread?

----------

## enderandrew

It has been stated a few times (including in a few posts above yours) that dir_index and journal_data seem to offer the best results for ext3.

Depending on your kernel version, you can also try the ext3 kernel patches I posted above.  They claim to offer improved performance, but they were designed for 2.6.11 and don't work with the latest kernels.

----------

## ruben

 *BeteNoire wrote:*   

> Ok, there is a nice discussion here but I've got a question: what ext3 options do you recommend for a large filesystem (data storage) about 65 GB, which will store 10000-12000 files size of 3-15 MB?
> 
> I had ext3 on this partition once, created with default options and was very unhappy with its performance. Then I switched to xfs and noticed better performance, but now I want to give ext3 another try.
> 
> I want to give it maximum read performance as writes will not occur very often - this is to be only storage filesystem.

 

One should always try to use the right filesystem for the job. Maybe xfs is better for the kind of usage you describe. If you go with ext3, you should use a large block size and definitely enable the dir_index optimisation. I do think, however, that at a size of 65 GB, mkfs.ext3 will automatically choose a large block size.

Here's what i'd use:

```
mkfs.ext3 /dev/hdxn -b 4096 -m 1 -O dir_index
```

A large block size, and only 1% reserved for root (instead of the default 5%).

I would mount it with data=writeback and commit=600. I believe this will give you good performance and is in line with the usage you describe.
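Put together, a hedged sketch of that setup (the device node and the /mnt/storage mount point are placeholders, not anything from this thread):

```shell
# Create the filesystem as suggested above: 4k blocks, 1% reserved,
# directory indexing enabled from the start.
mkfs.ext3 /dev/hdXY -b 4096 -m 1 -O dir_index

# Matching /etc/fstab line: writeback journalling, 10-minute commit interval.
# /dev/hdXY   /mnt/storage   ext3   noatime,data=writeback,commit=600   0 2
```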

----------

## pv

 *ruben wrote:*   

> I would mount it with data=writeback and commit=600. I believe this will give you good performance and is in line with the usage you describe.

 

A lot has already been said about large values for the commit= option, so I don't think it's good to make it more than 10-20 seconds. I HAVE TRIED it: even with 5 seconds, the system sometimes (rarely) becomes very unresponsive for 1-2 seconds.

As to data=writeback, I live in a city where nobody knows when the electricity may be turned off, so I prefer data=journal.

 *BeteNoire wrote:*   

> I wonder why, oh why, everyone ignores my previous post in this thread?

 

As for me, unfortunately I cannot give you any advice beyond what was already given in this thread. You can also read 

```
man tune2fs

man mke2fs

```

----------

## itsr0y

I don't understand why the method of writing the journal would have anything to do with read speed.  Even if I turned off journalling (which should give the biggest performance boost), I can't imagine that it would have any effect on the read speed, and certainly not the 10x speedup required to beat reiserfs.  As for dir_index, enabling it should improve read access, but in my tests it clearly did not.

----------

## ruben

 *itsr0y wrote:*   

> Conclusion:
> 
> DAMN! ext3 is SLOW when trying to access a lot of files.  I mean, it is like 500% slower doing emerge sync and updatedb!  Interestingly enough, booting and ctags were a little faster.  Maybe this has to do with the dir_index option.  If I can disable it and try running the test again, I will.  Can I just run "tune2fs -O ^dirindex /dev/hda3" and will that get rid of all dir indexes?  Also, maybe that writeback thing can speed things up.  Does this mean you put the journal on its own partition?  How do you set that up?
> 
> Hopefully someone might find this interesting or useful.  If you'd like me to try anything else, or you think you can help me fix that SUPER SLOW emerge problem, please tell me!

 

These results don't really surprise me. Especially not the results of the "sync" test. If you'd like to know why, check the on-disk size of /usr/portage (excluding the distfiles directory). You would see that on the reiserfs system this is a *lot* smaller than on the ext3 system. The original reiserfs partition is 8 GB in size; i wonder how big its block size is. The ext3 partition is 22 GB in size, so i guess its block size is 4k, which means you'll have a lot of waste, since /usr/portage contains lots of tiny files. If you'd like to use ext3 for /usr/portage (excluding distfiles), i'd just stick to ext2 with dir_index and a block size of 1k: journalling isn't important here, ext2 is faster, and with lots of tiny files a small block size means less waste.

The difference between the two reiserfs timings is not very surprising either. There are two issues here: fragmentation and 'notail'. It seems you don't really know what "notail" actually means, since you say "notail just uses up space and seems slower anyway". Of course "notail" uses up more space, since it actually disables a space optimisation (tail packing). Without "notail", data that would only partially fill a block is stored in the tree itself; this means that very small files and the ends of files go into the tree. With "notail", the small files and the end of a file always consume a whole block, e.g. a 4096-byte block, even if only 100 bytes are used. So, you gain space with tail packing. This also means that your "sync" is going to be faster, since it needs to access less disk space. So "notail" is not necessarily faster: it can be faster since it uses less cpu time, but it can also be slower, since with tail packing you need to access fewer blocks on disk for the same amount of data. The other factor is fragmentation: believe me, reiserfs gets fragmented over time much more than ext2/3. So the difference in timings between the two reiserfs measurements is also caused by fragmentation (although i don't know how old the 8 GB partition is in terms of disk activity). I've seen that fragmentation can have a dramatic effect on the performance of a reiserfs partition.

The "sync" and "updatedb" results on the ext3 system are worse than on the reiserfs system. Lots of tiny files is the strong point of reiserfs, so that explains the "sync" performance. I don't have an explanation for "updatedb"; it would mean that reiserfs is faster at reading directory entries than ext3. You did enable dir_index *before* copying the files to the ext3 partition, right?  I never measured "updatedb" performance myself, though; I've done some tests with "sync" performance and "gnome" startup time. Also, reiserfs uses a hash function to find files in directories... maybe that has something to do with it too.

Another thing to keep in mind is that, by default, ext3 does more journalling than reiserfs does. To make that equal, you'd need to use "writeback" mode. In addition, by default ext3 commits the journal every 5 seconds, while i've read that reiserfs does that much less frequently. Finally, you did not use the "noatime" option, which means that for each read you also do a write, to update the access time of the file or directory. I'm not sure whether that kind of write is also journalled, but it can also have an influence.

As for me, i still have some reiserfs partitions on my desktop, but on my laptop i only use ext3. Performance degradation due to fragmentation was really bad with reiserfs. For /usr/portage, i used a small ext2 partition to (a) limit any fragmentation to that partition, (b) get better performance, and (c) use a small block size. However, nowadays i use Debian on my laptop and have only ext3 partitions. As a final remark, i cared more about the startup time of gnome than about "sync" or "updatedb" performance. Ext3 uses less cpu time than reiserfs, and that kind of application can run in the background without dragging the whole system down.

----------

## ruben

 *itsr0y wrote:*   

> I don't understand why the method of writing the journal would have anything to do with read speed.  ven if I turned off journaling (which should give the biggest performance boost), I can't imagine that it would have any affect on the read speed, and certainly not the 10x speedup required to beat reiserfs.  As for dir_index, enabling it should improve read access, but in my tests it clearly did not.

 

It depends on what is put in the journal. I'm not sure, but i wouldn't be surprised if an "ok" entry is added to the journal every 5 seconds even when you only do reads. And as i mentioned in the other post, there is a write for updating the access time. As for "dir_index", it should be enabled before files are put on the filesystem; otherwise it only takes effect for files added later on. Also, "dir_index" improves the time needed to find the location of a file on disk; it does not improve the time needed to read the file itself, but that lookup is part of the total read speed, of course. Its effect will be most visible in directories with lots of files. "dir_index" improves read access compared to ext2/3 without it.
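For a filesystem that already has files on it, the recipe from earlier in the thread applies: enable the feature, then let e2fsck build hashes for the directories that already exist (the device node is a placeholder; run this on an unmounted filesystem):

```shell
tune2fs -O dir_index /dev/hdXY   # turn on hashed directory indexing
e2fsck -D /dev/hdXY              # -D optimizes (re-indexes) existing directories
```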

----------

## BeteNoire

I have been following this thread for a few weeks, trying to find the best solution for me, but there is always one question which concerns me: if ext3 is the best Linux filesystem (as some say), then why doesn't it use its best features by default? Why do I have to read manuals, run tests, or search for reliable opinions to get its best features to work?

I switched to reiserfs some time ago because I was unhappy with ext3 performance. I got the best performance and overall responsiveness from reiserfs without any tuning. I never had any corruption and never lost any data stored on my reiserfs partitions. So I stayed with it.

But now it's time to re-think my choices, and if someone can explain why I have to go through some special procedures to get ext3's best features - why it doesn't use them by default - I'd be very thankful.

----------

## alexlm78

Very useful, thanks a lot.

----------

## pv

 *BeteNoire wrote:*   

> But now it's time to re-think my solutions, and if someone can explain why do I have to do some special procedures to get ext3's best features, why it doesn't use them by default - I'd be very thankful.

 

1. As you can notice, MOST Linux programs MUST be configured before use. This is the reason novices are often afraid of Linux and try using "simpler" software, and this is why Linux is a VERY configurable OS (especially the source-based distros). If you've configured some Linux feature (for example, the network, a filesystem, or compiler flags), you have an idea of HOW IT WORKS. In some other OSes, if you've configured some feature, you just have an idea of HOW TO CONFIGURE IT. In other words, you cannot correctly configure Linux without knowledge of the basic principles of how it works. I think it's wonderful  :Very Happy: 

2. The idea of "the best features" is relative. For example, I think data=journal is the best for its stability. But many people think it's too slow and recommend that everybody use data=writeback.

Linux is YOUR choice. So YOU (not YOUR DISTRO'S VENDOR) HAVE to choose.

----------

## XenoTerraCide

 *Quote:*   

> but there is always one question which concerns me: if ext3 is the best linux filesystem (as some say) then why it doesn't use its best features by default? 

the answer is simply this: what's best is your choice. it's like Windows XP: its flashy XP style and animations are all on by default, but I make my system go back to the 2000-and-prior look, because the only system I've run where this causes no performance loss is my 64-bit laptop. the defaults are what the developer(s) think is best for most people. correct me if I'm wrong, but doesn't data=journal consume a little more hard drive space? which is probably why it's off by default. and data=writeback... couldn't that cause problems if the system loses power suddenly? so the defaults for, say, ext3 are a compromise between speed and space. this is an opinion, however; feel free to dispute it.

----------

## linuxtuxhellsinki

I'm also using ext2/3 on almost all of my partitions; only a few computers' /home partitions are reiserfs.

The only place where I think reiserfs (or ext2) could be better, because it's fast with small files, is /usr/portage (& /tmp). So I think I'll make one partition of about 4 GB, mount it on /usr/portage, and then use some /usr/portage/tmp/ as the /tmp dir for compiling etc. (which is cleaned at boot time).

Those are not so important, and it's easy to sync the portage tree back if something happens   :Wink: 

I was also trying the dir_index option for the first time, on one machine's /usr partition where I copied the whole system from another almost identical P4, and emerging & syncing feel really fast with it.

That could be because there's a 10K RPM SCSI drive in it, while the older machine has a 7.2K one.

Maybe I could run some tests, but it'd not be fair to that older system, since its filesystems aren't fresh  :Rolling Eyes: 

codergeek42 - Thanks for a nice, informative thread   :Smile: 

PS. Copying the whole system to another 'quite' similar computer is a really nice 'feature' of the Linux world   :Razz: 

----------

## XenoTerraCide

hey linuxtuxhellsinki, how did you do the copying of the whole filesystem? I wanted to do that once but no one could tell me how. I'm sure I could figure it out now... but I'm wondering how you did it.

----------

## pv

 *XenoTerraCide wrote:*   

> Correct me if I'm wrong but doesn't data=journal consume a little more hard drive space which is probably why it's off by default.

 

Sorry, you are wrong. When an ext3 partition is created, the disk area for the journal is allocated in any case. When the filesystem is mounted, the journal already exists, so there is no difference in disk usage between data=journal and data=writeback.

BUT! Using data=journal forces more data to be written to disk, because the data is written first to the journal and only then to its final location. That increases the time necessary to write data to disk.

 *XenoTerraCide wrote:*   

> And data=writeback... couldn't that cause problems if the system loses power suddenly?

 

Yes, data=writeback can cause problems in this case.

----------

## XenoTerraCide

yeah, but data=journal isn't the default, and neither is data=writeback.

----------

## XenoTerraCide

the default is ordered mode (only metadata goes through the journal). and the question asked was why they don't use the best options by default. my answer for data=journal is that it would take up more hard drive space than, say, metadata-only journalling. I don't know why dir_index wouldn't be on by default, though.

----------

## codergeek42

 *XenoTerraCide wrote:*   

> the default is metadata. and the question asked is why do they not use the best option's by default. my answer for data=journal is that it would take up more hard drive space than say metadata.

 To my knowledge, the size of the journal does not change, whichever data-writing mode you choose (unless you run `tune2fs -J <options>`).

----------

## XenoTerraCide

well, if that's the case, I may be wrong; I'm really making an educated guess. if data=journal commits more to the journal than standard metadata journaling, then it would make sense that it would take more space... how much, I don't know. but if you say it doesn't, I stand corrected.

----------

## StringCheesian

So is a "The default ext3 options suck, change them!" petition in order? Whose attention do we bring this to, anyway?

----------

## linuxtuxhellsinki

 *XenoTerraCide wrote:*   

> hey linuxtuxhellsinki how did you do the copying of the whole filesystem. I wanted to do that once but no one could tell me how. I'm sure I could do it now... but I'm wondering how you did it.

 

I just installed the new (SCSI) drive into the other computer & made partitions & filesystems on it.

I mounted the partitions at /mnt/zda/, ../boot, ../usr, & the original Gentoo at /mnt/gentoo/ etc., via a LiveCD, so that nothing changes during the copy and there's no /dev, /proc & /tmp to exclude from it.

Then I ran "cp -a /mnt/gentoo/* /mnt/zda/" to copy all the data to the new drive.

I fixed the entries in /etc/fstab and grub.conf, and the hostname & IP in /etc/conf.d/.

Then I booted into my Gentoo to install grub on the new drive (I could probably have just chrooted). I had to double-check this part because I had 4 hard drives & the two SCSI drives had almost identical partition tables, so grub showed them the same way (hd2 & hd3) with similar partitions. So I checked them in another console with fdisk -l.

Shutdown, move the drive to the other computer (with a similar SCSI card), boot & that's it - voila  :Cool: 

To clarify a bit: both computers were P4s (1.4 & 1.6 GHz) with quite similar Intel chipsets but different mobos. I had to run xorgconfig because there was a different GPU (I always boot to text mode first).

At first I thought I'd have to build a new kernel for the other computer, but then I realized the chipsets were so similar that the same kernel should work (& so it did).

But I think you can copy a whole system to another machine, as long as there aren't any big architectural differences (like P4 -> AMD64), by just building a new kernel with support for the right chipset & hardware  :Rolling Eyes: 

& now I remember when I first installed Gentoo on this very same ThinkPad 600 that I'm writing this from, and used it (a PII 300 MHz) to install Gentoo onto a PI 133 MHz ThinkPad's hard drive, since that one had no CD-ROM or NIC, and that went well too, even though it was only my second install of Gentoo (total n00b)  :Very Happy: 

----------

## lnxz

 *XenoTerraCide wrote:*   

> well if that's the case I may be wrong. I'm really making an educated guess. if data=journal commits more to the journal than standard metadata journaling then it would make sense that it would take more space... how much i don't know. but if you say it doesn't I stand corrected.

 

I don't think it commits *more*; it just commits the metadata before it commits to the disk.

----------

## XenoTerraCide

from tune2fs manpage.

```
       journal_data
              When the filesystem is mounted with journalling
              enabled, all data (not just metadata) is committed
              into the journal prior to being written into the
              main filesystem.

       journal_data_ordered
              When the filesystem is mounted with journalling
              enabled, all data is forced directly out to the main
              file system prior to its metadata being committed to
              the journal.

       journal_data_writeback
              When the filesystem is mounted with journalling
              enabled, data may be written into the main filesystem
              after its metadata has been committed to the journal.
              This may increase throughput, however, it may allow
              old data to appear in files after a crash and journal
              recovery.
```

So what we're saying is that in the default mode all data is still committed to the journal? It does metadata, then the file, then the rest of the data? Because if the default only journals metadata, it would seem that committing all the data to the journal would take more space, although the difference may be extremely minuscule. But everyone here says I'm wrong, so I probably am... Even if I were right, I'm not changing my options: it obviously doesn't take up enough space to worry about, and it's way better than what NTFS does. I can't say I thought reiserfs was better; when I had FS problems I couldn't get them fixed in reiserfs either. I've never played with xfs or jfs, though.
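For anyone following along, the manpage modes above correspond to `data=` mount options. A sketch of fstab lines with placeholder devices and mount points (data=ordered is the usual ext3 default):

```
/dev/hdXY  /      ext3  noatime,data=ordered    0 1   # default: metadata journalled, file data flushed first
/dev/hdXY  /var   ext3  noatime,data=journal    0 2   # everything through the journal; safest, most journal traffic
/dev/hdXY  /tmp   ext3  noatime,data=writeback  0 2   # metadata only; fastest, old data may appear after a crash
```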

----------

## ruben

With "data=journal", there is effectively more data written to the journal, that's for sure. The size of the journal is at least 1024 filesystem blocks. But if I got it right, the journal is flushed each time a "commit" happens, thus by default every 5 seconds, and I suppose it is also flushed when it fills up. So I suppose that with "data=journal" there might be more 'forced' commits if you have a high write load.

As to the question of whether ext3 is the "best" filesystem, and why it does not have the "best" options by default: let me tell you that there is no such thing as the absolute "best" filesystem, and no such thing as the absolute "best" set of default options. You choose a filesystem for its strengths, and that's what makes that particular filesystem the "best" one for *your* needs. The same goes for the options: the best set is completely dependent on your situation and your needs. The developers just chose a set of defaults which seem reasonable for most use cases.

----------

## carpman

Hello, been reading this thread with great interest, normally use this layout

```
/boot   ext2
/       ext3
/usr    ext3
/var    reiserfs
/opt    xfs
/home   xfs
/tmp    reiserfs
```

May well try the suggestion in 1st post on laptop and see what occurs.

I am also building a server for a socketmail.com project and currently have

```
/var/lib/mysql   xfs
/var/www         reiserfs
```

Now, after reading this, I'm thinking of changing to ext3 with these optimisations, but I'm not sure whether it would benefit the system.

Any feedback as to why I should change would be appreciated.

cheers

----------

## linuxtuxhellsinki

Maybe /home to reiser (for performance) ? ? ?

----------

## XenoTerraCide

I would say /home on ext3, because reiserfs is geared towards smaller files and you're more likely to have medium-sized files in /home. And ext3 has better recovery options.

----------

## carpman

Thanks for the replies; I was thinking more of the server partitions.

I chose xfs for MySQL as I anticipate a large database, and reiserfs for /var/www as it will hold users' files.

I have been happy with xfs, but am now thinking that ext3 with optimisation may be better for /var/www, the extra data security being one reason.

For my home systems I use xfs for /home and am happy with it, so I need convincing that ext3 would be better. Maybe /home as ext3 with optimisation, with the directories holding large files (images, mp3s, etc.) as xfs?

The only problem I have run into with xfs is resizing when used with lvm2; this alone may make me change if I can get good performance with ext3.

----------

## RuiP

Hi. The question about /home should not be, I believe, about optimisation, but about safety. 

XFS is good, but a lot of people have had problems when the system was not properly shut down due to a sudden power-off: 

their files became empty or filled with zeros. 

Any other part of the system can eventually be "recovered" by reinstalling, but your personal data is unique. 

Your letters, docs, photos, whatever. If something goes wrong, there is no way to get that data back!

Besides, XFS shines mainly with large files. Its deletes benchmark poorly, and on small-file reads and writes (which is what usually happens while a user is logged in, with configs and caches being written all the time) it is usually beaten by ext3 or reiser. 

The only point I can see in using XFS for /home is if you work mostly with video or large images; in that case you may gain some speed, and those files usually come from scanners and cameras, so a copy or a redo is available most of the time. Even then, I prefer a separate XFS partition linked into my home. I use one for files and ISOs downloaded from the internet, and for Gentoo distfiles and Ubuntu archives.

My 2 cents. Hope they help in some way.

----------

## neenee

please keep filesystem-choice discussions

confined to The Filesystem Choice thread  :Wink: 

----------

## carpman

 *RuiP wrote:*   

> Hi, the question about /home should not be, i believe, optimazation, but safety. 
> 
> XFS is good, but a lot of people have had problems when the system was not properly shut down due to a sudden power-off: 
> 
> their files became empty or filled with zeros. 
> ...

 

Hello, that was my plan for the future layout, but it still does not answer my question about the server: will ext3 with optimisation be better than xfs for a large database? From the posts I gather that, although slightly slower, ext3 with optimisation is going to be better for /var/www.

/var/www is going to get a lot of simultaneous reads and writes.

----------

## pv

Hi all!

Several hours ago my computer was suddenly halted due to electricity problems. When electricity was restored I booted Gentoo and the following happened.

While mounting ReiserFS I saw a message like

```
ReiserFS: hda7: replayed 19 transactions in 1 seconds
```

which I could then find in /var/log/messages or see with dmesg.

But my ext3 partition (kernel module) didn't show a similar message, although while booting I saw something like

```
Checking all filesystems
/dev/hda3: recovering journal
/dev/hda3: clean
```

The latter is (as I found in /etc/init.d/checkfs) due to running fsck on the corresponding partition. Moreover, I'm unable to find anything concerning journal recovery in /var/log/messages.

So I have a few questions:

1. Does the ext3 kernel module recover the journal itself (without using the fsck program) while mounting an uncleanly-unmounted filesystem?

2. If so, why doesn't it show the corresponding message in /var/log/messages?

3. Can I rely on the kernel module and, for example, disable the checkfs script during the boot process?

----------

## codergeek42

 *pv wrote:*   

> So I have a few questions:
> 
> 1. Does the ext3 kernel module recover the journal itself (without using the fsck program) while mounting an uncleanly-unmounted filesystem?

 It seems to, from my tests. Nice catch.  :Smile:  *Quote:*   

> 2. If so, why doesn't it show the corresponding message in /var/log/messages?

 All it shows is that "recovering journal" message. I think you can enable JBD (journalling block device layer) debugging for more verbose messages. *Quote:*   

> 3. Can I rely on the kernel module and, for example, disable the checkfs script during the boot process?

 Though you probably could, I wouldn't. This is probably a matter of personal preference, though...

----------

## pv

 *codergeek42 wrote:*   

>  *Quote:*   2. If so, why doesn't it show the corresponding message in /var/log/messages? All it shows is that "recovering journal" message. I think you can enable JBD (journalling block device layer) debugging for more verbose messages.

 

I made a stupid mistake  :Sad: 

JBD debugging doesn't solve this, because the problem is the following.

My set of ext3 partitions unfortunately doesn't contain the root one, so Gentoo treats them as local filesystems. During bootup, the Gentoo startup scripts first check all local filesystems marked in /etc/fstab with a last field not equal to 0, and only then mount them. So the startup script runs fsck, which recovers the ext3 partition, and then mounts it. Since fsck recovers the journal, the kernel has nothing to do: the journal is already clean by the time the filesystem is mounted. That's why a message like 'recovering journal' doesn't appear in /var/log/messages.

When I marked the ext3 partitions with 0 as the last field in /etc/fstab, fsck didn't check them while booting after a hard reset, and the kernel wrote the recovery message to /var/log/messages even without JBD debugging.

I hope this helps somebody in understanding the Linux boot process.

----------

## wrc1944

After much thought and re-reading everything, I decided to try the ext3 journal_data_writeback mode (changed from my original journal_data choice), so I booted from a Knoppix CD and ran:

```
tune2fs -o journal_data_writeback /dev/hda3
```

and then, as an experiment, edited my fstab ext3 lines to:

```
/dev/hda1   /boot   ext3   noauto,noatime,commit=120   1 2
/dev/hda3   /       ext3   noatime,commit=120          0 1
```

Then I rebooted back into Gentoo, and dumpe2fs shows the new journal mode, but apparently not the new commit=120 mount option. What am I missing? And if commit=120 is in effect, how would I know for sure?

EDIT: dmesg shows:

```
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with writeback data mode
```

So I guess commit=120 isn't in force, for some reason I'm missing.

---------------------

EDIT NUMBER TWO:

Reading some more, I decided that instead of fstab I actually needed to edit the grub.conf kernel boot line. From the kernel docs:

commit=nrsec (*) Ext3 can be told to write all its data and metadata every 'nrsec' seconds. The default value is 5 seconds. ...  

What is the correct syntax? I've tried adding commit=nrsec=120, commit=120, and commit-nrsec (120) to my grub.conf kernel line, like this. Where am I going wrong? I've googled for hours, read hundreds of sites, and not found one example of the correct syntax. All I've seen is what the kernel docs and manpages say, but no real-world example of what they actually mean syntax-wise. All my attempts and reboots result in dmesg still reporting the same default 5-second interval.

```
title=2.6.13-gvivid
root (hd0,0)
kernel (hd0,0)/boot/2.6.13-gvivid root=/dev/hda3 video=vesafb:ywrap,mtrr vga=0x317 splash=verbose,theme:gentoo CONSOLE=/dev/tty1 quiet commit=120
initrd /boot/fbsplash-gentoo-1024x768
```

----------------------------------------------

EDIT NUMBER 3: APPARENTLY SOLVED!

I added:

```
rootflags=commit=120
```

to my kernel boot line in grub.conf, and dmesg now shows:

```
kjournald starting.  Commit interval 120 seconds
EXT3-fs: mounted filesystem with writeback data mode.
```

How ridiculous! I should have figured this out much sooner.  :Embarassed: 
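The pattern wrc1944 landed on, generalized (device names here are placeholders, not his exact layout):

```
# Non-root filesystems: commit= works as a plain fstab mount option:
#     /dev/hdXY   /home   ext3   noatime,commit=120   0 2
#
# The root filesystem is mounted by the kernel before fstab is read,
# so pass the option on the kernel line in grub.conf instead:
#     kernel /boot/vmlinuz root=/dev/hda3 rootflags=commit=120
```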

------------------------------------------------

```
mymachine wrc # dumpe2fs /dev/hda3
dumpe2fs 1.38 (30-Jun-2005)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          484ed04b-a858-425c-a917-0a25a1b28990
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal dir_index filetype needs_recovery sparse_super
Default mount options:    journal_data_writeback
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              1221600
Block count:              2441376
Reserved block count:     122068
Free blocks:              1027232
Free inodes:              842422
First block:              0
Block size:               4096
Fragment size:            4096
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16288
Inode blocks per group:   509
Filesystem created:       Sat Dec 10 09:35:31 2005
Last mount time:          Tue Jan  3 14:00:59 2006
Last write time:          Tue Jan  3 14:00:59 2006
Mount count:              16
Maximum mount count:      26
Last checked:             Mon Dec 19 11:46:25 2005
Check interval:           0 (<none>)
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal inode:            8
First orphan inode:       977337
Default directory hash:   tea
Directory Hash Seed:      36681716-6609-4e78-a3f2-0699a3adb1e9
Journal backup:           inode blocks
Group 0: (Blocks 0-32767)
  Primary superblock at 0, Group descriptors at 1-1
  Block bitmap at 2 (+2), Inode bitmap at 3 (+3)
  Inode table at 4-512 (+4)
  0 free blocks, 16277 free inodes, 2 directories
  Free blocks:
  Free inodes: 12-16288
Group 1: (Blocks 32768-65535)
  Backup superblock at 32768, Group descriptors at 32769-32769
  Block bitmap at 32770 (+2), Inode bitmap at 32771 (+3)
  Inode table at 32772-33280 (+4)
  27697 free blocks, 14292 free inodes, 560 directories
  Free blocks: 33833-36863, 36869-38911, 38914-43007, 43012-51200, 51203-53247,  etc.  etc.
```

----------

## carpman

Hello, I have just finished building a system using ext3 and the tips given here, except /usr/src, for which I used reiserfs.

While compiling a number of apps, emerge stopped, complaining of no space on the 1.5 GB /usr/portage partition. Strange; I thought that would be enough.

I emptied /usr/portage/distfiles on this system and on my notebook (reiserfs) to compare sizes:

```
Notebook, reiserfs /usr/portage  = 175 MB
New system, ext3 /usr/portage    = 580 MB
```

Both of these are on their own partitions.

I have to say that is a big difference, and if it's the same across the whole system that's not good; but I suspect it's the many small files in portage. I may change this partition to reiserfs.

Anyway, the rest of the system feels nice and snappy.

----------

## ruben

@carpman:

Could you tell us the block size on the ext3 system (with dumpe2fs)? You might want that to be 1024 bytes per block; that should save space with all the tiny files in /usr/portage.

----------

## XenoTerraCide

/usr/portage/distfiles is 1.6G on mine:

```
SLAVE-I ~ # du -sh /usr/portage/
2.1G    /usr/portage/
```

----------

## carpman

 *ruben wrote:*   

> @carpman:
> 
> Could you tell us the block size on the ext3 system? (with dumpe2fs)  You might want that to be 1024 bytes per block, that should save space with all the tiny files in /usr/portage.

 

Thanks for the reply; here is the first part of the output. I take it you don't want all of it?

```
dumpe2fs /dev/vg/usrportage
dumpe2fs 1.38 (30-Jun-2005)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          0bb5438b-e4b2-4afc-90ee-2d304ccea4e2
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal dir_index filetype needs_recovery sparse_super
Default mount options:    journal_data
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              192000
Block count:              384000
Reserved block count:     19200
Free blocks:              229280
Free inodes:              58978
First block:              0
Block size:               4096
Fragment size:            4096
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16000
Inode blocks per group:   500
Filesystem created:       Wed Jan  4 11:58:04 2006
Last mount time:          Sat Jan  7 00:09:43 2006
Last write time:          Sat Jan  7 00:09:43 2006
Mount count:              6
Maximum mount count:      37
Last checked:             Wed Jan  4 11:58:04 2006
Check interval:           15552000 (6 months)
Next check after:         Mon Jul  3 11:58:04 2006
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal inode:            8
Default directory hash:   tea
Directory Hash Seed:      810e1be4-fdec-4858-9225-a69341b0d545
Journal backup:           inode blocks
```

----------

## XenoTerraCide

I have a question for ruben so he can educate me... why did we ask carpman for the output of dumpe2fs? Couldn't we have gotten that information from

```
tune2fs -l /dev/xyz 
```

without all the crap dumpe2fs puts out?

----------

## RuiP

 *XenoTerraCide wrote:*   

> I have a question for ruben so he can educate me... why did we ask carpmen for the output of dumpe2fs? couldn't we have gotten that information from 
> 
> ```
> tune2fs -l /dev/xyz 
> ```
> ...

 

What's that for?

tune2fs -l output is short, but I think carpman's computer should have resisted the brutal impact of dumpe2fs  :Wink: 

carpman, you can re-format your partition with a smaller block size to optimize space:

```
mke2fs -j -b 1024 /dev/whatever
```

----------

## XenoTerraCide

I was curious... it's more that dumpe2fs is a cosmetic pain, and I don't see anything in this situation that makes running dumpe2fs more useful than tune2fs -l. I was hoping to learn something... but maybe there's nothing to learn? I do have a question, though: is there an advantage to larger block sizes as opposed to smaller ones? Because I would think smaller is better.

----------

## RuiP

They have the same functionality. 

tune2fs -l is just an informative mode of running tune2fs, a tool that can make changes to partitions. 

dumpe2fs is a tool that only gives information on a partition, but as much of it as possible. In this case, of course, that wasn't necessary...

To be picky, all that's needed here is: tune2fs -l /dev/hdxy | grep size

About size: it seems that a larger block size gives better performance, but it takes more space. Much more. I have a 1G partition for /usr/portage; with the default block size of 4096 it was 97% full, and with block size 1024:

```
/dev/hda6              1059093    198935    806342  20% /usr/portage
```

Portage is a very special case of a folder tree: thousands of very small files. It's different from a regular Linux root system or /home or /usr, which can hold files of all sizes.
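The space cost RuiP describes can be sketched with a little arithmetic (these numbers are hypothetical, not measured): a file always occupies whole blocks, so a tiny ebuild wastes most of a 4 KiB block but only part of a 1 KiB block.

```
#!/bin/sh
# On-disk space consumed by one file of a given size: round the size up
# to the next multiple of the block size.
space_used() {
    # $1 = file size in bytes, $2 = block size in bytes
    echo $(( ( ($1 + $2 - 1) / $2 ) * $2 ))
}
space_used 300 1024    # a 300-byte ebuild on 1 KiB blocks
space_used 300 4096    # the same file on 4 KiB blocks: 4x the space
```

Multiplied across tens of thousands of portage files, that per-file rounding is where the 175 MB vs. 580 MB gap comes from.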

----------

## carpman

 *RuiP wrote:*   

> they have the same funcionality. 
> 
> tune2fs -l is just an informative mode of runnig tune2fs, a tool that can made changes the parttions. 
> 
> dumpe2fs is a tool to give only informationon a partition, but as much as possible. For the case, of course thats was not necessary...
> ...

 

I think in this case reiserfs will be better; the data on this partition is not an issue, which is the same reason I have /usr/src as reiserfs.

cheers

----------

## ruben

 *XenoTerraCide wrote:*   

> I have a question for ruben so he can educate me... why did we ask carpmen for the output of dumpe2fs? couldn't we have gotten that information from 
> 
> ```
> tune2fs -l /dev/xyz 
> ```
> ...

 

dumpe2fs was overkill; the way you mention is clearly better. I was in a bit of a hurry and just thought of dumpe2fs.

@carpman:

The example from RuiP makes clear why I asked: a small block size is important for /usr/portage. Still, RuiP's numbers surprise me a bit. I know block size has a big impact, but between block size 1024 and block size 4096 the space used shouldn't differ by more than a factor of 4, I'd expect (factor 4 being the 'worst' case: every file smaller than a 1024-byte block). Maybe you also changed something else? (Reduced the amount of space reserved for root, maybe?)

----------

## carpman

 *ruben wrote:*   

> 
> 
> dump2efs was overkill, the way you mention is clearly better. I was a bit in a hurry and just thought about dumpe2fs.
> 
> @carpman:
> ...

 

I just did as the guide suggested; I should note these partitions were newly created, not converted.

I am going to convert to reiserfs by moving the data off, converting, and then moving it back to the new FS. I am happy with ext3 on the other partitions; it just seems that for /usr/portage (and /usr/src) it may not be the best way forward.

----------

## RuiP

ruben, you are right. Here are my values for the same partition with different block sizes:

1024:

```
/dev/hda6              1059093    198935    806342  20% /usr/portage
```

2048:

```
/dev/hda6              1059326    313058    692452  32% /usr/portage
```

4096:

```
/dev/hda6              1059360    561948    443596  56% /usr/portage
```

I was just going from memory, but I decided to experiment with block sizes. 

I remember now that my problem was not being 99% full: with a slightly smaller partition I didn't have enough space for portage. 

It reported free space, but cp -a stopped in the middle complaining about no space. 

If I resize my partition to less than 1000000 blocks and use the default block size, the cp process stops after copying most of the files and reports no free space, while df and du still report plenty of free space!!

Even at that size, if I make a new temp folder inside it and try to copy all of /usr/portage/app-office to /usr/portage/temp, it reports not enough free space, although it says 443596 blocks are available!!??

edit: oh, btw, I use ext2 (without journalling) and tune2fs -O dir_index on that partition.

----------

## ruben

 *RuiP wrote:*   

> If I resize my partition to less than 1000000 blocks and use the default block size, the cp process stops after copying most of the files and reports no free space, while df and du still report plenty of free space!!
> 
> Even at that size, if I make a new temp folder inside it and try to copy all of /usr/portage/app-office to /usr/portage/temp, it reports not enough free space, although it says 443596 blocks are available!!??

 

The numbers you get now are much more credible, but they still show a very big impact.

For what you mention about the free space, I can think of only one thing: you might be running out of inodes when you make the partition smaller than 1000000 blocks. You can check that for sure with "df -i". One inode is needed for each directory and file. I don't know how mkfs.ext2 computes the default number of inodes for a partition, but say you have a block size of 4096 and all files fit in one block; then you'd certainly want as many inodes as blocks. In general, though, that'd be a waste of space, since typically most files span multiple blocks.

I just checked my root file system (ext3, but it's the same for ext2), and from the output of tune2fs:

```
Inode count:              917504
Block count:              1835008
Reserved block count:     18350
Free blocks:              115966
Free inodes:              613709
```

So it seems to have half as many inodes as blocks. And as you can see, I still have plenty of inodes left. But that means that with tiny files it might be worth increasing the number of inodes when creating the file system (you can't change it afterwards).
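A sketch of the relationship (the 16384 bytes-per-inode figure here is an assumption, a commonly seen mke2fs default; check /etc/mke2fs.conf or the man page for yours):

```
#!/bin/sh
# The inode count at mkfs time is roughly partition size divided by the
# bytes-per-inode ratio (the mke2fs -i option); halving the ratio doubles
# the inodes available for tiny files.
inode_count() {
    # $1 = filesystem size in bytes, $2 = bytes-per-inode ratio
    echo $(( $1 / $2 ))
}
inode_count $(( 1024 * 1024 * 1024 )) 16384   # 1 GiB at an assumed default ratio
inode_count $(( 1024 * 1024 * 1024 )) 8192    # same size, twice the inodes
```

So something like `mke2fs -j -i 8192 /dev/hdXY` (or an explicit count with `-N`) at creation time would be the way to provision extra inodes; as noted, it can't be changed afterwards.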

I just checked "df -i" on the same partition, and it gives me this:

```
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/hda4             917504  236996  680508   26% /
```

I wonder why it reports more inodes still available than tune2fs.

----------

## XenoTerraCide

hey codergeek, can you add a part about tune2fs -l to your howto?

----------

## codergeek42

 *XenoTerraCide wrote:*   

> hey codergeek can you add a part about the tune2fs -l in your howto?

 Good idea. I've added that to the end of my guide. Thanks.   :Smile: 

----------

## RuiP

Hello ruben,

you were right, I ran out of free inodes. 

My results for the 1G partition with -b 1024 and the default number of inodes:

```
Inode count:              135168
Block count:              1076320
Reserved block count:     53816
Free blocks:              859976
Free inodes:              1952
```

so with a block size of 4096 it should have even fewer... 

My previous experiments with smaller partition sizes, always running out of space while df still showed free space, were almost certainly caused by this.

Here are the new results with the default block size (4096) and the inode count raised by 50%, on the same partition:

```
Inode count:              202752
Block count:              269080
Reserved block count:     13454
Free blocks:              122109
Free inodes:              69536
```

That is 35x more free inodes with a block size 4x higher!

Thanks for your suggestion.

I wonder if there is some penalty for raising the inode count?...

----------

## wrc1944

Thought I'd post a link to an update of the LinuxGazette filesystem tests. I was one of the people who emailed Justin and asked him to re-run his great tests with a new kernel (he used 2.6.14.4 for these). Apparently he used a default ext3, without the dir_index and data=writeback (or data=journal) modes. I just changed all my Gentoo boxes from reiserfs to ext3 (with the ext3 tweaks), so I'd really be interested in having his tests run with those ext3 options, to see if there is a significant improvement.

http://linuxgazette.net/122/piszcz.html

A quick perusal of these new tests indicates that:

On test 001 (touch 10,000 files) ext2/3 lag way behind (perhaps the default commit interval of 5 seconds is the cause; mine is set to 600 seconds).

On test 004 (make 10,000 directories), again ext2/3 lag way behind.

Table for tests 015-019 (split a 10 MB file into 1000/1024/2048/4096/8192-byte pieces). (The graph makes the differences much more striking.)

```
                                           ext2   ext3   JFS   R3    R4    XFS
015  Split 10M File into 1000 Byte Pieces  57.26  57.77  2.99  4.35  2.95  4.87
016  Split 10M File into 1024 Byte Pieces  28.73  28.97  2.24  4.04  2.61  4.01
017  Split 10M File into 2048 Byte Pieces   7.02   6.98  1.39  2.26  1.55  1.95
018  Split 10M File into 4096 Byte Pieces   1.85   1.83  0.67  1.05  0.99  0.98
019  Split 10M File into 8192 Byte Pieces   0.58   0.58  0.36  0.56  0.62  0.57
```

At 4096 and 8192 bytes, ext2/3 essentially equal all the others, but they drastically lag below that size (maybe a block-size tweak for certain circumstances would improve this?).

On the 18 other tests, ext2/3 equal or beat all the other filesystems. I'm assuming this is with ext3 defaults (for what they're worth). All 21 tests are also listed by CPU usage. Unfortunately there was no very-small-file test, and I still think reiserfs might be significantly better for a dedicated /usr/portage partition; I'll probably set that up next on all my systems.

More FS info in the Gentoo slowdown thread:

https://forums.gentoo.org/viewtopic-p-3016647.html#3016647

----------

## wrc1944

I just ran across this brand-new patch adding multiple block allocation to current ext3, against 2.6.15. Does anyone know more about it, or have any experiences to share?

http://lwn.net/Articles/167266/

I had just compiled a new 2.6.15-ck1 kernel, but this ext3 patch looked so good after googling what it's about that I had to try it. I noticed it had been included in 2.6.15-mm3, so I went ahead and tried mm3.

It definitely seems to have snappier desktop responsiveness than ck1, and ck1 is great too. No benchmarks from my systems, just subjective impressions, but I'm not one to hype up a placebo effect. I wish I had first tried recompiling ck1 with the ext3 patch added, so I'd know whether it was the mm3 stuff or the ext3 multiple-block-allocation patch.

I will definitely (try to) incorporate it into all the kernels I build that don't have it from now on.

----------

## ruben

 *RuiP wrote:*   

> I wander if there is some penalty for raising the inode numbers?...

 

I think the only penalty is a bit more space "wasted" on the inodes.

 *wrc1944 wrote:*   

> Thought I'd post a link to an update of the LinuxGazette File system tests...

 

It would be interesting to see the same benchmarks but with the "dir_index" optimisation. Tests 1, 4 & 15 all indicate that creating thousands of files in 1 directory is expensive. I think that creating a new file in a directory becomes more expensive as there are more files already in that directory.  With the "dir_index" option, this should become cheaper.  The question is whether this will make other operations more expensive.

----------

## XenoTerraCide

I have new news on how much faster /usr/portage is on its own partition with a 1024-byte block size, ext3 with data=journal and dir_index, as opposed to being part of the / filesystem with 4096 and the same options. 

I have a laptop with a 4200 RPM HD; the drive is so slow it decreases the computer's performance, but it's the one with /usr/portage on its own partition. My desktop has a 7200 RPM SATA drive. Started within seconds of each other, the laptop beat the desktop on emerge sync by minutes. I didn't time it, but I was amazed. However, the laptop has the faster processor by far (Athlon XP 2200 vs. Athlon 64 3200), so that may have a lot to do with it, depending on how much the processor is used to build the cache.

----------

## wrc1944

XenoTerraCide,

Thanks for the info. Is this partition just the /usr/portage directory itself, or the entire /usr, and what size (GB) is it?

I just moved my entire portage tree to a 3 GB reiserfs partition for small-file performance (portage directory only, on /mnt/portage, hda6), and also moved /var and /tmp to their own partitions (ext3, dir_index, data=writeback, 4096). I notice a good improvement as well.

In your case, I guess it's hard to figure out how much effect slow drive/fast 64-bit CPU/board vs. faster drive/slower 32-bit CPU/board has.

I'll have to investigate the 1024 block size more as it relates to partition size. I guess it would be good for really small files, but what about the much larger files in distfiles?

I guess the question is: do the 1024 block size and the ext3 options equal or closely approach reiserfs's small-file performance?

Another interesting point: IMO there's no real need for journalling on a dedicated portage partition, so why not disable it and save the overhead, on either ext3 or reiserfs? Seems like that would also help performance.

----------

## XenoTerraCide

I think reiser is faster, but I had small files disappear in reiser, so I went with ext3. And it's just /usr/portage, a 2.5G partition. du says 1.2G is being used; that's an improvement from 2.1G on my desktop.

----------

## RuiP

Hi, guys.

I get the feeling that I led some of you to wrong conclusions. 

The suggestion to go for smaller blocks was to get more space, or even just enough space — not to improve performance.

In theory, I think, a larger block size should improve performance. 

Anyway, as ruben so well pointed out, raising the number of inodes will allow for much better use of space with the default block size (for the portage tree, of course).

XenoTerraCide, you say your /usr/portage partition is 1.2 of 2.5G. That's strange. Do you have the distfiles dir inside /usr/portage?

That is, imho, one of the major nonsenses of portage's design. Mixing a dir with large files and a dir with thousands of microscopic files is insane!

If you move distfiles to another partition (I use one with xfs) and make a link from /usr/portage, or change the variable path in /etc/make.conf, you will end up with a much less fragmented filesystem, and you could go back to the 4096 block size and use a partition as small as 0.6G.
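A rough sketch of that move (paths are just my example; DISTDIR is the standard make.conf variable):

```
# Move distfiles onto its own partition, then point portage at it:
mount /dev/hdXY /mnt/distfiles               # the new partition
mv /usr/portage/distfiles/* /mnt/distfiles/
rmdir /usr/portage/distfiles
ln -s /mnt/distfiles /usr/portage/distfiles

# ...or, instead of the symlink, set the path in /etc/make.conf:
#   DISTDIR="/mnt/distfiles"
```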

mine is 

```
Filesystem    Size  Used  Avail  Use% Mounted on

/dev/hda6     713M  553M  124M  82% /usr/portage
```

About benchmarking performance, emerge sync is probably the worst thing one could try, and a one-time test is meaningless. It depends on too many things: server/mirror speed, connection/net speed, state of the portage trees (older => slower), number of updates available (weekends => faster => developers need to rest  :Smile: ), whether the random screensaver that starts in the middle of the resync is a heavy CPU eater or a light one, etc...

For a while I took notes of times, to make some statistics, till I realized I was being stupid. 

emerge.log has all the information I need, going back a long time. 

I have logs since September 04, so I did a quick spreadsheet.

From Jan 05 till Aug 05 I used a reiserfs portage inside the system, with only a separate partition for distfiles (and the usual /boot and /home too, irrelevant for this). I synced 155 times.

From Sep 05 till Dec 05 I used a separate /usr/portage, ext2 (no journalling) with block size 1024 and the dir_index option. Synced 55 times.

Here are my average results:

```
reiserfs: 2m 47s

ext2:     2m 11s
```

I don't think that small difference has any meaning. Equal speed, basically.

(In a month or two I'll redo this with my new ext2 block size of 4096. I don't expect much difference anyway...)

Since I was testing something that could have huge fluctuations, but could easily be simulated under better controlled conditions, I tried another test. emerge sync is just an rsync of files over the net plus an extra resync of the metadata cache. 

So I did a brute-force test: resynced a full tree from an existing portage on my original partition to several small partitions with different filesystems. 

I did it 3 times, rebooting between runs (so Linux wouldn't cache the file tree).

Results (averages):

```
|reiserfs |  2m 30s  |
|ext3     |  2m 22s  |
|ext2     |  1m 25s  |
|xfs      |  4m 10s  |

```

Again reiserfs and ext3 take more or less the same time, ext2 almost half the time, and xfs, with no surprises, shows that it's not a good choice for this case. 

Hope this is of any use, or at least interesting.

edit: oops, I typed date abbreviations in Portuguese; corrected now.

Last edited by RuiP on Thu Mar 23, 2006 2:47 pm; edited 2 times in total

----------

## codergeek42

 *wrc1944 wrote:*   

> Another interesting point: IMO there's no real need for journaling on a dedicated portage partition, so why not disable it and save the overhead, either on ext3 or reiserfs?  Seems like that would also help performance.

 Firstly, the point I mentioned was that full journalling is supposed to, in fact, increase performance on partitions which have a lot of virtually simultaneous reads/writes. Secondly, there is no way that I am aware of to disable journalling in ReiserFS.  :Smile: 

----------

## XenoTerraCide

No, putting portage in its own partition is supposed to increase performance. The 1024 was to save space, but I wasn't sure how much.

----------

## RuiP

 *XenoTerraCide wrote:*   

> no putting portage in it's own partition is supposed to increase performance. the 1024 was to save space but I wasn't sure how much.

 

Yes, but what I was pointing out is that you seem to have a portage tree of 1.2G. That's too big for just portage. You probably have your distfiles inside /usr/portage. 

The increase in performance will come from avoiding fragmentation and from better use of space on the partition, which is hard to obtain when very small files are mixed with large tar.gz and tar.bz2 archives... Moving distfiles will give you more space than the block size change, avoid fragmentation, and let you choose a different filesystem for that partition if you want to try one.

----------

## wrc1944

codergeek42, 

After you convinced me to go ext3 on  all my Gentoo boxes / partitions, I first tried full journalling for a week or two, then tried data=writeback. I honestly can't discern any difference- both are really good. I guess the key is if one's partitions   have a lot of simultaneous reads/writes going on. I might switch back.

BTW, I've been fooling around with the new multiple block allocation ext3 patches. I compiled a new mm3 kernel (which already has them), and also applied them to  2.6.15-ck1 with no problem, and just finished applying them to the new 2.6.15-nitro1 kernel (haven't booted into it yet- had to fix one reject file, and 2 other rejects  appeared correct when I looked at them- anyway, nitro1 compiled fine after I added the new ext3 stuff).

Have you ever tried XFS tuned for normal size files?  I was reading on that, and it seemed interesting- might test it when I have time.

----------

## codergeek42

 *wrc1944 wrote:*   

> Have you ever tried XFS tuned for normal size files?  I was reading on that, and it seemed interesting- might test it when I have time.

 To be honest, I've never tried XFS for any of my partitions. However, I may give it a try at some point on my music/videos storage partition to see how it fares against my current dir_index+journal_data ext3  :Smile: 

----------

## wrc1944

codergeek42,

 I'll probably go back to dir_index+journal_data ext3 again, as opposed to dir_index+writeback, but my main uncertainty still remains: do I have enough simultaneous reads/writes going on to make it better than writeback? 

I guess I'll really need to do my own set of benchmarks to really know anything other than subjective impressions- and I guess the benchmarks are also suspect, depending on which ones you use, and what your real-world computer usage is.

Also: just booted into the new nitro with the added ext3 multiple block allocation patches included. Works fine — as good as any other 2.6.15-xxx I've played with. Guess I should set up a better testing environment where I can really screw around (not on my main Gentoo box) to make any valid comparisons.

I'm still patiently waiting for a new 2.6.15-archck to test these ext3 patches on- been running archck for a while as my favorite kernel.

----------

## XenoTerraCide

Yeah, distfiles is in there and I get what you're saying. Unfortunately, repartitioning is somewhat of a pain, is it not? To resize and add a partition, especially because my extended partition is hda2. I'm not gonna repartition at this time.

----------

## wrc1944

Yeah- just how far can we Gentooists go? Maybe it's time for a dedicated tuned  XFS partition for distfiles, and another reiserfs partition for the rest of the portage tree. At some point,  it surely becomes completely preposterous (unless serious brain exercise is NOT the primary objective).   At 61, in my case, it is.   :Very Happy: 

At this point, repartitioning a bunch of Gentoo boxes is a recreational exercise!  :Very Happy: 

----------

## XenoTerraCide

I have a question: at what point did the ext3 thread become the filesystem-of-choice thread? https://forums.gentoo.org/viewtopic.php?p=2965333#2965333

----------

## wrc1944

XenoTerraCide,

 I'm sure you know all this stuff, and they are talking about the same thing, but for others: I would think it was because 3-5 years ago, when people first got into Linux, they discovered the default ext2/3 performance was preposterously slow; they subsequently tried reiserfs, saw a serious desktop improvement (as I did, temporarily), and switched. 

In my case, I was really ignorant, but later on, as filesystems evolved, and thanks to my education in the Gentoo forums and googling, I became wiser about the supposedly non-existent Linux fragmentation stuff, especially with reiserfs.  :Wink:  The fact is, every file system fragments; it's just a matter of degree.

As  hopefully your own research will teach you, there is no ONE best FS for all circumstances, or  distros, or a file system "of choice" to cover all things, and make it simple.

----------

## XenoTerraCide

I agree, and although these fs fragment less than, say, fat32, this is the ext3 tips discussion thread; it's really not the place to compare and contrast reiserfs, xfs, jfs, blah blah blah... but then again this isn't my thread. Maybe I should let codergeek42 decide if he thinks talking about reiser and xfs and what have you is off topic. So I'm gonna butt out till we get back to ext3.

----------

## codergeek42

 *XenoTerraCide wrote:*   

> I agree and although these fs fragment less than say fat32 this is the ext3 tips discussion thread it's really not the place to compare and contrast reiserfs, xfs, jfs, blah blah blah... but then again this isn't my thread. maybe I should let codegeeker decide if he thinks talking about reiser and xfs and what have you is off topic. so I'm gonna but out till we get back to ext3.

 Thanks, Xeno. 

If you want to discuss this stuff, please do so in the filesystem comparison thread linked to earlier. Thanks.  :Smile: 

----------

## wrc1944

Sorry I went off on a tangent- I agree, and I'll try and keep on a specific topic for the thread I'm posting to. Hope I didn't cause any confusion  :Embarassed: 

----------

## XenoTerraCide

NP, codergeek42.

----------

## Drysh

codergeek42,

Something to add to your original tips. I just discovered this, and it's a very good tool.

Making your critical files safe (and setting other special cases)

Don't you hate it when you accidentally delete a critical file? The ext3 filesystem has a feature to prevent that: attributes. To see the current attributes of a file use "lsattr"; to change them use "chattr". To prevent a file from being deleted, set the "i" (immutable) attribute. Note that only the superuser may set (and unset) this attribute. Take a look at what the attributes can do:

```
man chattr

man lsattr
```

Notice that:

- To prevent a directory from being deleted, you shouldn't set the "+i" attribute on the directory itself (that would make all the files in it undeletable too). Instead, create a file inside the directory and set it undeletable; that will prevent the directory itself from being deleted:

```
touch .keep

chattr +i .keep
```

- If you mark a directory append-only (+a), you will be able to create new files, but won't be able to delete them. All files created in it will also be marked append-only (+a).

[end of my tip]

Since you understand the journal options much better than I do, please explain why someone would set only part of the filesystem as full journal (the "+j" attribute). You could also take a look at "+T", which sets the file to the top of the Orlov block allocator. I don't have the knowledge to comment on those.

Another nice application of the attributes is to use "+D" (dirsync) and "+S" (sync) combined with a modified sync rate for the whole fs (I was thinking about commit=30; up to 600 may be good — 10 minutes, more is insane). I'll have to do some tests; ideas for where to apply sync are welcome.  :Smile: 

BTW: The only other useful attribute is "+d", which excludes a file from "dump". The rest is either not implemented or is for internal use.
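For the basics, a quick sketch of the attribute commands (needs root; /etc/importantfile and /var/log/mylog are just example paths):

```
# Make a file undeletable/unmodifiable:
chattr +i /etc/importantfile
lsattr /etc/importantfile      # the 'i' flag should now appear

# Make a log file append-only:
chattr +a /var/log/mylog

# Remove the attributes again:
chattr -i /etc/importantfile
chattr -a /var/log/mylog
```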

----------

## codergeek42

Drysh: 

I've not played with the extended attribute and ACL features of Ext2/Ext3, so I don't think it would be wise of me to give advice on something of which I have very little knowledge. I just resumed classes, so hopefully I'll get a chance soon to learn more about these features and see if I can add some of them to my tips. 

Thanks for the suggestion, though!   :Very Happy: 

----------

## carpman

Hello, what would be the best settings for /var/tmp with ccache in /var/tmp/.ccache, or would it be better to use reiserfs?

This will be on lvm2.

cheers

----------

## codergeek42

 *carpman wrote:*   

> Hello, what would be best settings for /var/tmp with ccache in /var/tmp/.ccache or would it be better to use reiserfs?
> 
> This will be on lvm2.
> 
> cheers

 Considering that such a partition is generally read from and written to a lot, I think it would be wise to use ext3 with full journalling and dir_index, as my tips show.  :Smile: 

----------

## carpman

 *codergeek42 wrote:*   

>  *carpman wrote:*   Hello, what would be best settings for /var/tmp with ccache in /var/tmp/.ccache or would it be better to use reiserfs?
> 
> This will be on lvm2.
> 
> cheers Considering that such a partition is read from and written to a lot, generally, I think it would be wise to put Ext3 with full journalling and dir_index as my tips show. 

 

cheers, that is what i went with in the end.

----------

## wrc1944

codergeek42 wrote *Quote:*   

> Considering that such a partition is read from and written to a lot, generally, I think it would be wise to put Ext3 with full journalling and dir_index as my tips show.

  I may be misunderstanding this, so my conclusion might be wrong, but I thought the Dan Robbins article said data=journal was the performance king only where much reading and writing is done simultaneously. 

If that's true, wouldn't we need to know to which partitions this actually applies, and not just which partitions are simply "read from and written to a lot"? This seems critical to me.

When I first switched to ext3 and used all the tuning tweaks, I did indeed use data=journal, but after a week or two, since I didn't really know which partitions were actually engaged in a lot of simultaneous read/write activity, I decided data=writeback would probably be the better performance choice. Accordingly, I changed them all to data=writeback, and can't really say I've noticed any difference.

In other words, I guess we (at least I do) need to know which partitions and in what cases there is  much reading and writing being done simultaneously.  In my case, I generally do most of my compiling at night, with no other activity whatsoever- would this make a difference?

Another point- how do we actually define and/or know what activities are indeed consisting of so-called "simultaneous read/write" activity?  For example, would sequentially extracting lots of tar.bz2 files qualify? Would a "data only" storage partition where data is only being saved to (or read from) at any given time NOT qualify?

 I just don't really know, but it seems to me this knowledge would be an important factor to completely understand before we automatically set all our partitions to data_journal, especially considering most other info sources suggest data=writeback is best for performance.

Any thoughts or corrections on these matters is greatly appreciated.

----------

## XenoTerraCide

data=writeback, I think, is very similar to data=journal, except that it commits the information to the drive before the journal, whereas data=journal commits it to the journal first. Once again, I may be wrong... Using data=writeback could cause problems if you had a power loss or had to do a hard reset.
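For reference, the journalling mode is just a mount option, so it's easy to experiment with; e.g. in /etc/fstab (with the generic /dev/hdXY placeholder, mount points are only examples):

```
# /etc/fstab examples -- one data= mode per filesystem:
/dev/hdXY   /var/tmp   ext3   noatime,data=journal     0 2   # full data journalling
/dev/hdXY   /home      ext3   noatime,data=ordered     0 2   # the ext3 default
/dev/hdXY   /mnt/data  ext3   noatime,data=writeback   0 2   # metadata-only journalling
```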

----------

## carpman

 *XenoTerraCide wrote:*   

> data=writeback I think is very similar to data=journal.  accept that it commits the information to the drive before the journal. where as data=journal commit's it to the journal first. once again I may be wrong... using data writeback could 'cause problems if you had a power loss or had to do a hard reset.

 

I always hear this phrase: "could cause problems if you had a power loss or had to do a hard reset".

To be honest, I don't care if /usr/portage, /var/tmp, and other non-data-critical partitions suffer a power outage; what I would like is performance. So if we can decide which partitions are best suited to data=journal and which to data=writeback, that would be helpful in the context of this thread.

cheers

----------

## XenoTerraCide

You mean this part of the thread... Honestly, I couldn't tell you. I doubt there really is much of a difference. If you read back through this thread, the difference was discussed earlier; perhaps you should do that. Or you could test them both for us and bring us benchmarks?

----------

## wrc1944

Two days ago I switched my /mnt/portage partition from ext3 with data=writeback to ext2 with dir_index, as suggested by 6thpink on another related thread. I'm also considering it for my /var and /tmp partitions. The reasoning is that a journalling filesystem, with its inherent overhead, might be overkill and really unnecessary for such specialized partitions, and it also avoids the "data=journal or data=writeback" dilemma. Given that ext2/3 benchmarks seem to indicate essentially equal performance, this seems reasonable. I guess I'll know more after observing portage activity for a few more emerges. 

I have also placed the distfiles directory with larger files on another ext3 dir_index data=writeback partition, as it seems that in that directory files are either being written to, read from, or deleted, but not simultaneously. Of course we are talking about extremely fine tuning of the ext2/3 file system as related to Gentoo usage here, and shouldn't expect drastic performance differences, one way or the other, especially on a modern fast system. That being said, I like to think every little % of performance increase helps.

----------

## XenoTerraCide

I'm trying to put /usr/portage and /usr/portage/distfiles in their own partitions; however, I keep getting out-of-space errors. df shows otherwise... 

```
Filesystem            Size  Used Avail Use% Mounted on

/dev/sda3              21G  8.1G   12G  42% /

udev                  379M  236K  378M   1% /dev

/dev/sda6             942M  176M  719M  20% /usr/portage

/dev/sda7             1.9G   33M  1.8G   2% /usr/portage/distfiles

/dev/sda4              78G   52G   27G  66% /mnt/winntfs

/dev/hda1              38G  9.9G   28G  27% /mnt/windows

shm                   379M     0  379M   0% /dev/shm

```

 I made a separate thread in Portage & Programming... I don't get it... https://forums.gentoo.org/viewtopic-t-424151.html I'm hoping to find out why this is happening. I haven't deleted the old portage yet  :Very Happy: . I just mounted the new partitions over it. Oh, /usr/portage is ext2 with blocksize 1024 and distfiles is ext3 with blocksize 4096.

----------

## RuiP

see suggestion by ruben and my posts on previous page.

----------

## Drysh

 *wrc1944 wrote:*   

> I like to think every little % of performance increase helps.

 

One word: tmpfs

... for /tmp (and, if you are feeling lucky, /var/tmp). I never managed to make it work in /var/tmp when emerging huge packages, but for everyday use it works well. For /tmp it's perfect.

It will use your memory to mount the partition (RAM and swap). It's very fast, and it manages memory very well, so you won't run out of memory (make sure you have a lot of swap). My system here has 1GB RAM (+ 4GB swap), and I never use more than half of the RAM, even if I copy 3GB to /tmp. I limited /tmp to 3GB: with 4GB of swap, I'm guaranteed to have at least 2GB free (counting swap and RAM).
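In fstab that could look something like this (the size= cap is my 3GB example):

```
# /etc/fstab -- mount /tmp as tmpfs, capped at 3GB of RAM+swap:
tmpfs   /tmp   tmpfs   size=3G,noatime   0 0
```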

----------

## XenoTerraCide

yeah... my problem is inodes...

----------

## XenoTerraCide

k I got the problem fixed 

```
mke2fs -b 1024 -i 1024
```

 df -h

```
Filesystem            Size  Used Avail Use% Mounted on

/dev/sda3              21G  8.1G   12G  42% /

udev                  379M  236K  378M   1% /dev

/dev/sda4              78G   52G   27G  66% /mnt/winntfs

/dev/hda1              38G  9.9G   28G  27% /mnt/windows

shm                   379M     0  379M   0% /dev/shm

/dev/sda6             838M  197M  593M  25% /usr/portage

/dev/sda7             1.9G   33M  1.8G   2% /usr/portage/distfiles

```

df -hi

```
Filesystem            Inodes   IUsed   IFree IUse% Mounted on

/dev/sda3               2.7M    444K    2.2M   17% /

udev                     95K    5.2K     90K    6% /dev

/dev/sda4                72K     18K     55K   24% /mnt/winntfs

/dev/hda1                59K     59K      10  100% /mnt/windows

shm                      95K       1     95K    1% /dev/shm

/dev/sda6               958K    132K    826K   14% /usr/portage

/dev/sda7               239K      10    239K    1% /usr/portage/distfiles

```

 more info coming soon...

----------

## XenoTerraCide

One question: should I have used a bigger number for -i? Like 2048?
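The -i value is bytes-per-inode, so you can estimate the resulting inode count from the partition size; a quick back-of-the-envelope check (942MB, like the /usr/portage partition in the df output above):

```
# Estimate inodes created by mke2fs -i <bytes-per-inode> on a 942MB partition:
part_mb=942
for bpi in 1024 2048 4096; do
    inodes=$(( part_mb * 1024 * 1024 / bpi ))
    echo "-i $bpi  => ~$inodes inodes"
done
```

A bigger -i means fewer inodes but more usable data space, so the right value depends on how many tiny files the tree holds.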

----------

## XenoTerraCide

is there a way to resize ext2/3 partitions?

----------

## codergeek42

 *XenoTerraCide wrote:*   

> is there a way to resize ext2/3 partitions?

 Yes. You can add the option while creating the filesystem:

```
# /sbin/mkfs.ext3 -O dir_index -E resize=max_online_resize /dev/hdXY
```

This allows the filesystem to grow to a size of max_online_resize blocks. (Check the mke2fs(8) man page for more information.)

Alternatively, you can use something like LVM to manage the partitioning. I think parted can do it too, but I'm not certain.
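For actually growing a filesystem after the fact, e2fsprogs also ships resize2fs; roughly (untested sketch — the partition must be unmounted, and the enclosing partition must already have been enlarged with fdisk/parted):

```
umount /dev/hdXY
e2fsck -f /dev/hdXY      # resize2fs requires a clean, freshly-checked filesystem
resize2fs /dev/hdXY      # with no size given, grows to fill the partition
mount /dev/hdXY          # remount (assumes an fstab entry exists)
```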

----------

## wrc1944

Well, I've run into a strange bothersome ext3 mystery.

Suddenly (I haven't noticed this before), I get this on my ext3 tuned partitions (changed from reiserfs 3-4 weeks ago) at the end of dmesg:

```
JBD: barrier-based sync failed on hda7 - disabling barriers

JBD: barrier-based sync failed on hda5 - disabling barriers

JBD: barrier-based sync failed on hda3 - disabling barriers

spurious 8259A interrupt: IRQ7.
```

I searched the forum and googled everywhere for many hours trying to figure this out, and came up with some info, but no mention of what this really means in terms of "should I use the ext3 barrier=1 mount option in fstab to re-enable," or not. I need to find out if there are any serious consequences to enabling barrier again, because I feel my system must be disabling it for some important reason I can't seem to understand. I don't want to modprobe JBD, or add the fstab barrier option until I really know what's going on. lsmod has never shown JBD.
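For the record, the option I'm talking about would look like this in fstab (my hda5 line as the example):

```
# /etc/fstab -- re-enable write barriers on an ext3 partition:
/dev/hda5   /home   ext3   noatime,nodiratime,commit=600,barrier=1   0 2
```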

On a similar box set up the same way (except it's a gcc-4.1 rig), this doesn't happen. The only difference is that the commit interval is the default 5 with data=journal, while on the box with the above disabling-barriers lines it's commit=600 and data=writeback. On the disabling-barriers box there is also an ext2 /mnt/portage partition, and another old reiserfs data partition I haven't changed yet.

I guess I'm asking experienced ext3 users what all this actually means- can it be ignored, or is it something serious I need to contend with? My system seems to function normally, but I subjectively feel it has become less responsive than before I noticed this.

When I booted the next time, there was a forced check, and it corrected many inodes on hda3 (my / partition), and the "JBD: barrier-based sync failed on hda3 - disabling barriers" line wasn't in dmesg for 3-4 boots. Now it has returned on hda3, along with one inode fix on / at the last boot.

I've been finding things like this on the internet:

 *Quote:*   

> You are running a 2.6 kernel. Is this something new? Or did you 
> 
>  recently change the filesystem type on hdb1? This jbd message can be 
> 
>  considered a warning. Does an lsmod show jbd? If not, modprobing jbd 
> ...

 

At this point, I'm not sure I understand what this means  :Confused: 

----------

## wrc1944

Now I'm thinking this is kernel related. I currently have 5 kernels on this box, compiled in this order (first to last, date-wise).   

2.6.13-gvivid

2.6.15-mm3

2.6.15-ext3   (my own patched vanilla 2.6.15 with individually applied ext3 multiple block allocation patches)

2.6.15-nitro2

2.6.15-nitro3

My above problems were all when booted with nitro3 (maybe I didn't notice with nitro2).

Booted with  nitro2, only hda5 and hda7 had the barriers disabled.

Booted with 2.6.15-ext3: at boot, once again 1 inode needed fixing, and the message afterwards said a reboot was needed. After the reboot, dmesg reported no disabled barriers.

After the 2.6.15-ext3 boot (all OK), I rebooted with 2.6.15-mm3 (which also has the ext3 multi-block patches), all is also OK- no disabled barriers. (Of course the "OK" assumes any of this even matters).  :Confused: 

I booted in reverse order to test, last to first.

----------------------------------------------------------------------------------

Nitro3 has the:

# fix for the ext3 multiblock patch 

 27_ext3-get-blocks-maping-multiple-blocks-at-a-once-ext3_getblk-fix.patch 

Nitro2 has:

ext3-get-blocks-adjust-accounting-info-in-build-fix.patch 

    ext3-get-blocks-adjust-accounting-info-in.patch 

    ext3-get-blocks-adjust-reservation-window-size-for.patch 

    ext3-get-blocks-maping-multiple-blocks-at-a-once-vs-ext3_readdir-use-generic-readahead.patch 

    ext3-get-blocks-maping-multiple-blocks-at-a-once.patch 

    ext3-get-blocks-multiple-block-allocation.patch 

    ext3-get-blocks-support-multiple-blocks-allocation-in.patch 

    ext3_readdir-use-generic-readahead.patch 

My own ext3 multiblock patched kernel had all of ext3 patches included in  nitro2 (and IIRC one more from mm3, broken out), but I did that a day or two before nitro2 was posted.

----------------------------------------------------------------------------------------

Considering both of my last posts on this ext3 stuff, does anyone have opinions, conclusions, or advice on exactly why this would be happening? I built this box, and have compiled many hundreds of testing kernels, so I doubt it's any config anomaly I missed. I'm thinking the ext3 multiblock stuff is OK, and it's something in the nitro patch set that conflicts?

Just to be thorough, here's my dumpe2fs for hda3, /, and fstab. 

One thing I'm rethinking is the wisdom of disabling the ext3 boot checks with tune2fs -c 0 -i 0 /dev/hdXY. If it means slower boot times to ensure the apparently often-occurring ext3 inode problems are fixed immediately, so be it; I can live with that. (I hope this post is considered "on topic," as it is all about ext3 problems.)  :Smile: 
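Re-enabling the periodic checks is the same tune2fs flags with non-zero values, e.g.:

```
# Re-enable boot-time checks: every 30 mounts or every 6 months,
# whichever comes first:
tune2fs -c 30 -i 6m /dev/hdXY
```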

```
/dev/hda1		/boot		ext3		noauto,noatime		1 2
/dev/hda3		/		ext3    	noatime,nodiratime	0 1
/dev/hda2		none		swap		sw			0 0
/dev/hda5               /home           ext3            noatime,nodiratime,commit=600      0 2
/dev/hda6               /mnt/portage    ext2            noatime,nodiratime      0 2
/dev/hda7               /mnt/rwstorage  ext3            noatime,nodiratime,commit=600      0 2
/dev/hda8               /mnt/dump       ext3            noatime,nodiratime,commit=600      0 2
/dev/hda9               /mnt/data2      reiserfs        notail,noatime,user     0 2
/dev/sda1               /mnt/sda1       vfat            auto,rw,user          0 0
/dev/sda5               /mnt/sda5       vfat            auto,rw,user          0 0
/dev/sda6               /mnt/sda6       vfat            auto,rw,user          0 0
/dev/sda7               /mnt/sda7       vfat            auto,rw,user          0 0
```

-----------------------------------------------------------------

```
mymachine wrc # dumpe2fs /dev/hda3
dumpe2fs 1.38 (30-Jun-2005)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          484ed04b-a858-425c-a917-0a25a1b28990
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal dir_index filetype needs_recovery sparse_super
Default mount options:    journal_data_writeback
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              1221600
Block count:              2441376
Reserved block count:     122068
Free blocks:              1598676
Free inodes:              1000005
First block:              0
Block size:               4096
Fragment size:            4096
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16288
Inode blocks per group:   509
Filesystem created:       Sat Dec 10 09:35:31 2005
Last mount time:          Fri Jan 20 15:13:34 2006
Last write time:          Fri Jan 20 15:13:34 2006
Mount count:              2
Maximum mount count:      3
Last checked:             Fri Jan 20 14:28:45 2006
Check interval:           0 (<none>)
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal inode:            8
Default directory hash:   tea
Directory Hash Seed:      36681716-6609-4e78-a3f2-0699a3adb1e9
Journal backup:           inode blocks
```

 EDIT: (forgot)

```
mymachine wrc # df -hi
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/hda3               1.2M    217K    977K   19% /
udev                    111K     883    110K    1% /dev
cachedir                1.2M    217K    977K   19% /lib/splash/cache
/dev/hda5               597K    119K    478K   20% /home
/dev/hda6               1.4M    132K    1.3M   10% /mnt/portage
/dev/hda7               358K     65K    294K   18% /mnt/rwstorage
/dev/hda8               478K    131K    347K   28% /mnt/dump
/dev/hda9                  0       0       0    -  /mnt/data2
/dev/sda1                  0       0       0    -  /mnt/sda1
/dev/sda5                  0       0       0    -  /mnt/sda5
/dev/sda6                  0       0       0    -  /mnt/sda6
/dev/sda7                  0       0       0    -  /mnt/sda7
none                    111K       1    111K    1% /dev/shm
```

----------

## wrc1944

I just ran across this little gem: a disk allocation/fragmentation viewer for ext2/3. DAVL can collect fragmentation status information for a partition, a directory, or a file, regardless of whether the filesystem is mounted, and can output it as text data or visualize it.

Check out the screen shots- Look Great, with full info in a nice gui! 

http://davl.sourceforge.net/

I looked, but there is no ebuild in portage.

Perhaps somebody experienced in creating ebuilds could whip one up in no time. It requires GTK+1.2 or GTK+2 and looks pretty simple, but I've never created an ebuild before, or I'd do it myself. I'll give it a try (something I need to learn anyway, and this is a good excuse), but it might take me some time to succeed. I'm pretty sure ext2/3 users will likely want this before I can create one.

----------

## PaulBredbury

I've submitted an ebuild for davl to bugzilla  :Smile: 

----------

## wrc1944

Many thanks!

I just looked at your ebuild, and amazingly it seems like even I could have eventually figured it out. 

When I first extracted the davl tarball and looked at the Gentoo ebuild howto and wiki documentation, and at the davl makefiles and README, creating an ebuild appeared to be a pretty complicated procedure. I couldn't figure out how to decide which of the ebuild options davl actually needed; looking at yours, it appears davl needs virtually none of them and is essentially just a bare-bones ebuild. 

I don't want to go OT here (ext3), so I'll leave all my ebuild related questions except one for another thread. My one question is:

Is this ebuild also going to modify the kernel so it installs the optional "davl_liveinfo" module mentioned in the davl README, so that it also reports on mounted partitions?

----------

## PaulBredbury

 *wrc1944 wrote:*   

> Is this ebuild going to also modify the kernel so it installs the optional "davl_liveinfo" module

 

Not in its current state. I think it would need the linux-mod eclass.

----------

## wrc1944

Just tried your ebuild in my overlay; apparently it works great, and does report on a mounted partition, but I don't seem to have the kernel module (as described below, from the davl README) in /lib/modules/ for my running kernel. 

I build my kernels in /home/wrc/kern/linux-2.6.xx, and always have the /usr/src/linux symlink pointing to the running kernel. The davl path_list seemed OK to me, but since I have no kernel davl module, I must have done something wrong, or left out a step (beginning at #4 in the README below), or the ebuild needs more instructions? Also, since I have no davl module, I don't understand how gdavl is apparently working on mounted partitions, and I am worried about even running this on a mounted partition.

davl path_list:

```
KERN_DIR = /lib/modules/$(shell uname -r)/build
BIN_DIR = /usr/local/bin
DRV_DIR = /lib/modules/$(shell uname -r)/kernel/drivers/davl
MAN_DIR = /usr/local/man
GTK_VER = GTK2
# If you want to use GTK1.2 for the GUI tool, comment out the above line,
# and uncomment the below line.
#GTK_VER = GTK1
```

-------------------------------------------------------------------------------

From the davl README, source section (I figured that with Gentoo we aren't concerned with all of these steps, but possibly some of them, particularly the KERN_DIR line?):

```
B. Install from source code
---------------------------
1. Preparing kernel source code.
   (optional for preparing the davl_liveinfo build --- case of Fedora Core 2)
  a. Change to super-user
     $ su
     (input super-user password here)
  b. Download and install the kernel source package.
     # rpm -i kernel-2.6.5-1.358.src.rpm
  c. Extract the kernel source
     # rpmbuild -bp --target i686 /usr/src/redhat/SPECS/kernel-2.6.spec
  d. Move to the kernel source directory
     # cd /usr/src/redhat/BUILD/kernel-2.6.5/linux-2.6.5
  e. Edit Makefile, and modify "EXTRAVERSION" line
     # vi Makefile
       ("EXTRAVERSION = -1.358"
        '-1.358' is part of the "uname -r" command result)
  f. Prepare the kernel source for module build
     # make prepare-all
  g. Change to normal-user
     # exit
2. Download davl-XXX.tar.bz2 from the project page.
   (XXX is version number of davl)
3. Extract archive file.
  $ cd $(WHERE_YOUR_WORK_DIRECTORY)
  $ tar jxvf davl-XXX.tar.bz2
  $ cd davl
4. Modify the macro values in the "path_file" according to your system.
       macro     meaning
       --------  ---------------------------------
     * KERN_DIR  kernel source directory
     * BIN_DIR   executable file install directory
     * DRV_DIR   kernel module install directory
     * MAN_DIR   man page install directory
     * GTK_VER   GTK version (use "GTK1" or "GTK2")
5. Build executable file.
  $ make
  or If you want to build the kernel module too, do next
  $ make WITH_DRV=1
6. Install executable file.
  $ su
  (input super-user password here)
  # make install
  or If you want to install the kernel module too, do next
  # make WITH_DRV=1 install

III. How to use
===============
1. If you installed the kernel module, load the davl_liveinfo module before
   executing cdavl or gdavl.
  # modprobe davl_liveinfo
2. Use cdavl or gdavl. For displaying the usage, use -h option.
```

----------

## XenoTerraCide

Although you've all probably read this, I thought it would be good to post the link: http://www.gentoo.org/doc/en/articles/l-afig-p8.xml. It further explains our ext3 optimizations.

----------

## BeteNoire

I decided to try ext3 tips again. I've created one big partition and made ext3 on it and tuned it.  I've done what is written in the first post of this thread. 

```
Filesystem features:      has_journal dir_index filetype needs_recovery sparse_super

Default mount options:    journal_data
```

One thing is still unacceptable for me: the long delay when changing directory to that partition in MC/Krusader. It takes about 5-8 seconds!

The partition is 70GB and stores about 12 thousand files.

What would you advise me to do to decrease access time to this dir/partition?

----------

## XenoTerraCide

I got nothing for your MC problem. Wish I could help.

In further news, I am currently benchmarking putting /usr/portage and /usr/portage/distfiles on their own partitions vs. not doing so.

----------

## XenoTerraCide

All of these tests were done on a freshly wiped /usr/portage directory. The first date is the time the sync command started; the second is the time it finished.

In this one, /usr/portage is part of the / filesystem, which has a 4096 blocksize and (dir_index,has_journal,data_journal):

```

Thu Jan 26 06:34:01 EST 2006

Thu Jan 26 06:40:03 EST 2006

```

This one was done with a separate 1GB /usr/portage and a separate distfiles directory; blocksize 1024, (dir_index,has_journal,data_journal), created with -i 1024 ("don't want to run out of inodes"):

```

Thu Jan 26 06:42:05 EST 2006

Thu Jan 26 06:47:46 EST 2006

```

This is an ext2 test, blocksize 1024 (created with -i 1024), on the same 1GB partition:

```

Thu Jan 26 06:51:21 EST 2006

Thu Jan 26 06:57:00 EST 2006

```

I added dir_index to the same fs as the prior test.

```
 

Thu Jan 26 07:02:55 EST 2006

Thu Jan 26 07:08:36 EST 2006

```

One variable I am unable to control is the speed I get the files from the mirror. They all came out pretty close, but I think the second test won over the last by a hair. I'm tired and doing fuzzy math in my head, though, so I may be wrong; the numbers are there if anyone wants to do the math.
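Since nobody should have to do that math in their head, here's a quick sketch that computes the elapsed times from the timestamps above (assumes GNU date, as on any Linux box):

```
#!/bin/sh
# Print elapsed seconds between two timestamps using GNU date
elapsed() {
    start=$(date -u -d "$1" +%s)
    finish=$(date -u -d "$2" +%s)
    echo $(( finish - start ))
}

elapsed "2006-01-26 06:34:01" "2006-01-26 06:40:03"  # ext3 4096 on /
elapsed "2006-01-26 06:42:05" "2006-01-26 06:47:46"  # ext3 1024, separate partition
elapsed "2006-01-26 06:51:21" "2006-01-26 06:57:00"  # ext2 1024
elapsed "2006-01-26 07:02:55" "2006-01-26 07:08:36"  # ext2 1024 + dir_index
```

The four runs come out to 362, 341, 339, and 341 seconds, so they really are all within about 20 seconds of each other.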

----------

## RuiP

They all take 5 to 6 minutes; with average times like that, variations of seconds are irrelevant (as they usually are in any process that involves network connections).

The differences in time are not relevant either, considering that you did only one sync to measure something that has huge fluctuations and a lot of associated variables. You should have taken measurements over a certain period of time, like 1-2 weeks for each case (you can check times in emerge.log), and estimated averages.

Benchmarks are a fastidious and boring thing to do, and delicate cases take a lot of time if one wants some accuracy.

Anyway, the main purpose of separating distfiles from portage was to avoid fragmentation. The harm or benefit from that can only be measured on systems after a certain period of time.

Merely moving files to another partition won't immediately affect fragmentation on that partition.

----------

## carpman

 *XenoTerraCide wrote:*   

> all of these test's were done on a freshly wiped /usr/portage directory. The first date is the time the sync command executed the second is the time it finished
> 
> in this one  /usr/portage is part of the / filesystem which has a 4096 blocksize and (dir_index,has_journal,data_journal)
> 
> ```
> ...

 

You should set up a local sync server, as I already have; this means you can control the speed of the sync, since you know it is on the local network and will not have had any changes, provided you don't sync against the master.

----------

## XenoTerraCide

Problem is, the machine I would use as the server is the one I did the test on. The only other machine I own is a 64-bit laptop with a 4200rpm HD; it's not really the ideal system for a benchmark test.

----------

## wrc1944

Question for experienced ext3 users:

I just switched over from reiser about a month ago, and left my tuned ext3 / partition set at the default boot check count, and disabled the boot check on my four other ext3 partitions, as recommended on several websites I looked at.

I reboot at least once a day, and when the first check came up on /, it had to fix 20-30 inodes (like "inode 1110604 was 336, should be 328, fixed - reboot"). This had me concerned and led me to question the wisdom of disabling the boot check, so I enabled it on all partitions to check every 3 boots, to see what was going on.

Since then, at nearly every boot, at least one partition has had an inode fix or two to be done. Is this normal ext3 behavior, or is there something wrong with my system causing this? This never happened with reiserfs (or at least it was never on the bootup screen that I noticed).
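For reference, the check-every-N-boots interval is just a tune2fs setting; this is roughly how to set it (the device name is a placeholder, and it should be done on an unmounted filesystem):

```
# Force a full filesystem check every 3 mounts
tune2fs -c 3 /dev/hdXY

# Confirm the current mount-count settings
tune2fs -l /dev/hdXY | grep -i 'mount count'
```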

----------

## carpman

 *XenoTerraCide wrote:*   

> problem is the machine I would use as the server is the one I did the test on. the only other machine I own is a 64-bit laptop with a 4200 rpm HD. it's not really the ideal system for a benchmark test.

 

Use the 64-bit machine as a sync server; you then know that syncing will be consistent.

http://gentoo-wiki.com/HOWTO_Local_Rsync_Mirror

----------

## wrc1944

Hmmmm. No responses to my ext3 question posted above? I thought this would be an easy one for our ext3 gurus. :Wink:  Maybe some excerpts from my /var/log/boot.msg log will help. The problems on hda7 this boot are typical, and appear randomly on all the other ext3 partitions at nearly every boot. The i_blocks count is always exactly 8 blocks more than it should be, whatever the inode is and whichever partition needs fixing. I'm very curious as to what this means, and whether I need to figure out how to correct it. Any insight is much appreciated.  :Very Happy: 

(hda7 is mounted on /mnt/rwstorage, which contains /var and /tmp, and I just did one little emerge, so it appears activity stimulates the fixes. On the other hand, hda6 is /mnt/portage, ext2, and needed no fixes this time.)

```
* Checking root filesystem ...
/dev/hda3: clean, 221607/1221600 files, 842732/2441376 blocks (check after next mount)
* Checking all filesystems ...
/dev/hda1: clean, 352/24480 files, 23472/97744 blocks
/dev/hda5: clean, 121310/610432 files, 606374/1220680 blocks (check after next mount)
/dev/hda6 has been mounted 5 times without being checked, check forced.
/dev/hda6: 134299/1465920 files (0.1% non-contiguous), 255079/1464860 blocks
/dev/hda7 has been mounted 2 times without being checked, check forced.
/dev/hda7: Inode 136520, i_blocks is 232, should be 224.  FIXED.
/dev/hda7: Inode 132551, i_blocks is 216, should be 208.  FIXED.
/dev/hda7: Inode 132522, i_blocks is 456, should be 448.  FIXED.
/dev/hda7: Inode 132569, i_blocks is 144, should be 136.  FIXED.
/dev/hda8: 133997/488640 files (0.1% non-contiguous), 344067/976492 blocks
Reiserfs super block in block 16 on 0x309 of format 3.6 with standard journal
Blocks (total/free): 3173792/1405058 by 4096 bytes
Filesystem is clean
Replaying journal..
Reiserfs journal '/dev/hda9' in blocks [18..8211]: 0 transactions replayed
Checking internal tree..finished
 * Filesystem errors corrected.
  [ !! ]
 * Mounting local filesystems ...
  [ ok ]
```

----------

## PaulBredbury

 *wrc1944 wrote:*   

> Since then, at nearly every boot, at least one partition has an inode fix or two to be done.  Is this normal ext3 behavior, or is there something wrong with my system causing this?

 

I don't have a huge amount of insight to give. I've been using ext3 exclusively for about 4 years, but I expect that my comments apply to any filesystem:  If, after a clean reboot, the filesystem regularly has "problems", then of course something is wrong. As to exactly what is wrong, and how to fix it, those are extremely difficult questions. It could be hardware or software, or a combination of both  :Confused: 

Recommendation: Backup, reformat, restore. If your instincts suspect the hardware, then buy a new drive - the good news is, they're getting better and cheaper all the time.

----------

## codergeek42

@wrc1944: Did you put grub or another bootloader on the MBR of that partition? That's the only time I've personally ever seen such an error.

----------

## wrc1944

codergeek42 & PaulBredbury,

Thanks for the responses! All my drives are fairly new and have been rock solid with many distros, and I just recently set these partitions up on this box, so this box has just gone through "backup, reformat, restore."  They show no other problems, and run normally. I'm very experienced, and have built many computers and installed Linux and windows many hundreds of times, on countless systems, so I'm virtually certain this is not hardware related. 

There doesn't seem to be any actual problem in the normal operation of the computer, or the fine-tuned Gentoo installation, except that every 2-3 boots one partition or another apparently needs to fix that 8 block "oversized" detected thing.

I'm really becoming convinced this is perfectly normal ext3 behavior (in fact, very beneficial- like early prevention), and I just never noticed it before, since I hadn't used ext3 for years before last month, when I converted everything, and tuned it up.

Grub is on the MBR- I've always done it that way, as well as when I use Lilo, and/or dual-boot.  This particular box is not dual-booted.

Maybe a few other ext3 users wouldn't mind temporarily setting their partitions to force a boot check every 2-3 mounts and seeing if this occurs on their systems?  :Laughing: 

----------

## PaulBredbury

 *wrc1944 wrote:*   

> Maybe a few other ext3 users wouldn't mind temporarily setting their partitions to force a boot check every 2-3 mounts and see if this ocurrs on their systems? 

 

There's no need - my hard drives get checked every few (I think it's 25-ish) reboots by default (which happened a lot when I was experimenting with kernel options), and I didn't have any problems shown.

The cause could be something as trivial as a slight kink in the hard drive cable (i.e. a hardware problem).

----------

## codergeek42

I've recently noticed that the Gentoo Handbook has added a note about using -O dir_index when creating the filesystems if you wish to use Ext3. Thanks, docs team!   :Very Happy: 

----------

## wrc1944

PaulBredbury wrote:

 *Quote:*   

> There's no need - my hard drives get checked every few (I think it's 25-ish) reboots by default (which happened a lot when I was experimenting with kernel options), and I didn't have any problems shown. 
> 
> The cause could be something as trivial as a slight kink in the hard drive cable (i.e. a hardware problem).

 

I tried your suggestion about the cable and replaced it with a new one, but I still have the errors. I also removed the nodiratime option and reduced commit=600 to commit=60 on all partitions in fstab and grub.conf - still no improvement. I guess I'll try going back to the default 5-second commit for the next boot. These partitions are data=writeback - could that be a factor? I still have no clue as to why the i_blocks fix is always exactly 8 blocks for each seemingly random inode, and not some other number.

Actually, since noticing this, I haven't hooked up my other "backup" box to check my other similarly tuned ext3 drives (I only have one monitor). If that box doesn't exhibit this "problem," maybe it really is as you suggest: a problem with this specific hard drive, not related to ext3 or my tuning options at all.

One other point: if people have the default 25-ish mount check (mine seems to be 30), they probably would never realize these inode problems were being fixed, unless they happened to be watching the boot screen on the 25th mount of the particular partition. In my case that was the only reason I noticed, which prompted me to set the checks to every 2 mounts to see if this was happening regularly - and sure enough, it is.

----------

## Massimo B.

Even though I have / and /home as ext3, I decided to have a 4GB file, formatted with reiserfs and mounted as a loop device on /var. I also moved /usr/portage to /var/spool/portage. So I have things like portage, ccache, and the proxy cache, with lots of small files, on a reiserfs loop file. I preferred a file (on an ext3 fs) to be flexible. portage/ccache runs much quicker now.

A question about atime: is atime important for a desktop system? I've also tried it for / in fstab, but it doesn't get mounted with 'atime'.

----------

## mudrii

Does this ext3 optimisation work with RAID 0 and RAID 1 volumes?

----------

## Massimo B.

That is what I meant about ext3 with RAID 0. I think with mirrored RAID 1 this problem doesn't exist, because the write process is the same on each drive.

----------

## mudrii

With a stripe size of 64 it should not be a problem, though.

----------

## Sachankara

"journal_data" has worse performance on large files than ReiserFS. Don't use it unless you want worse performance. Try it for yourselves if you don't believe me.

----------

## Ainvar

I have switched to the ext3 with tweaks from reiserfs and have seen increases in a lot of areas except for unpacking large compressed files. This is the only place I have seen a performance hit.

----------

## Xk2c

 *Sachankara wrote:*   

> "journal_data" has worse performance on large files than ReiserFS. Don't use it unless you want worse performance. Try it for yourselves if you don't believe me.

 

Yes, you are right. I have tried it, and my system is slightly slower.

How can I undo:

```
tune2fs -O has_journal -o journal_data /dev/hdXY
```

and revert to "normal Mode"?

----------

## Xk2c

 *Xk2c wrote:*   

> How can i undo:
> 
> ```
> tune2fs -O has_journal -o journal_data /dev/hdXY
> ```
> ...

 

```
# tune2fs -o ^journal_data /dev/hdXY

# tune2fs -O has_journal -o journal_data_ordered /dev/hdXY
```

Apart from that: nice howto, codergeek42. Thanks.    :Very Happy:   :Cool: 

----------

## Massimo B.

Somewhere on the Red Hat site I read about benchmarks showing that data=journal should give better performance than writeback, especially for write-heavy workloads. I can't find the article anymore. Here is the only thing I've found about it.

Moreover, with data=journal I notice more CPU load from kjournald.

----------

## get sirius

I notice, from using tune2fs -l, that sparse_super is enabled by default. From man tune2fs, I see that sparse_super is primarily aimed at partitions filled with large files. From this, I infer that sparse_super would be most useful for /usr/portage/distfiles. Has anyone experimented with turning sparse_super off, thereby perhaps making the filesystem more suitable for small files? :Question:  Turning this option off might make up some of the apparent performance difference between ext3 and reiser 3.6 on partitions containing large numbers of small files (like /usr/portage, for example).  :Smile: 

If no one has come across tests measuring the performance difference, could anyone suggest a test protocol that would?

----------

## wrc1944

I'm not sure sparse_super has anything to do with the size of an actual file(s) on a partition.

Man tune2fs:

```
sparse_super
    Limit the number of backup superblocks to save space on large filesystems.

-s [0|1]
    Turn the sparse super feature off or on. Turning this feature on saves
    space on really big filesystems. This is the same as using the
    -O sparse_super option.
```

Unless I'm mistaken, I read this to mean it only limits the number of backup superblocks on a really large partition, whatever the size and number of files stored on that partition.

I don't know what they mean by "large filesystems," other than a filesystem on a very large partition, and they don't specify what that means. I guess it might be 4GB, or 20GB plus - who knows?

----------

## Drysh

sparse_super has nothing to do with file size, only with partition size. I tested turning it on and off, and I started to notice some change above 100 GB; I don't recall exactly how much. You should set it if you are partitioning the whole disk as a single ext3 filesystem (or using a huge disk array), but I don't think it will hurt to set sparse_super on anything larger than 5 GB.

----------

## get sirius

I see what you (both of you  :Smile:  ) mean - I read it wrong, forgetting the distinction between filesystem and file  :Embarassed: .  Thanks for setting me straight!

----------

## pointers

Hello,

Yesterday I shut down my Linux web server after 194 days. We hadn't rebooted the server since we bought and set it up, and when I rebooted I saw a 24.5% non-contiguous partition. Read speed was very low and I hadn't realised it until then. Shame on me... I have read some documents about fragmentation (Drobbins' document and others), but it is really bad if you have a web server, and the only way to avoid it is to archive the entire partition and then restore the files on the production server. In fact, on some partitions we have 1-2 million image files, and with a second level of subdirectories, archiving is a pain...

I would just like to ask: is there any way of avoiding fragmentation, or of keeping the ratio at acceptable levels? Is there any work on addressing this heavy fragmentation, such as dynamic relocation etc. (in the near future)?

Thanks a lot.

----------

## codergeek42

@pointers, none that I am aware of, aside from re-indexing the directory structure if it's unmounted:

```
# /sbin/e2fsck -Dy /dev/partition
```

----------

## pointers

Hello,

I think the root of the problem is putting files into non-contiguous blocks. If files are distributed into non-contiguous blocks, the index helps less, because seeks are not eliminated that much by the index, right? I hope one day dynamic relocation will be possible.

Thanks for your suggestion.

----------

## codergeek42

@pointers: Yes. From what I understand, directory indexing merely helps the VFS/Ext3 subsystem locate a file's directory entry on disk faster (logarithmic versus linear search time, if you're familiar with Big O notation and the like). The disk itself still requires the same seek time in most cases.

If I'm not mistaken, Extents and Dynamic-Relocation are in planning for the next incarnation of Ext3 (Ext4?).

----------

## XenoTerraCide

yes it's ext4. I wonder when it's going to be ready and how long it will take before it's stable...

----------

## Special K

When using ext3 on a server which will reboot only every 200 days or so, should I run fsck sometimes?

Is there anything which should be done so ext3 doesn't fragment, or collect errors until a crash?

Does reiserfs need any maintenance when running 24/7?

----------

## XenoTerraCide

lol, I switched away from reiserfs because I kept getting errors I couldn't recover the lost data from... I don't recommend reiserfs for anything that can't easily be replaced.

----------

## Q-collective

ext4? Did I miss something?

----------

## XenoTerraCide

ext4 is under development. It hasn't been released yet.

----------

## codergeek42

 *Special K wrote:*   

> when using ext3 on a server which will reboot let's say every 200days only, should I run fsck sometimes?
> 
> Is there anything wich sould be done so ext3 is not defragmenting? collecting errors until crash?

 As mentioned, Ext3 does not suffer any serious performance loss from fragmentation until the filesystem becomes nearly full in terms of block usage, so you should be fine (Ext3 reserves about 5% of the available blocks by default to help prevent this). ReiserFS I don't know much about, sorry...

----------

## pv

Hi, again!

I've just seen that, when creating a filesystem, e2fsprogs-1.38 sets the default journal size of ext3 partitions to 32K blocks (that's 128M) instead of the 8K blocks (32M) of e2fsprogs-1.37.

Is such a large journal intended for use with large partitions? What journal size would be best for my 8G partitions?
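In case anyone else wants to experiment, the journal can be removed and recreated at an explicit size (in megabytes) on an unmounted filesystem; roughly like this (the device name is a placeholder):

```
# Remove the old journal, force a clean check, then recreate it at 32MB
tune2fs -O ^has_journal /dev/hdXY
e2fsck -f /dev/hdXY
tune2fs -J size=32 /dev/hdXY
```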

----------

## wrex

 *codergeek42 wrote:*   

>  *XenoTerraCide wrote:*   is there a way to resize ext2/3 partitions? Yes. You can add the option while creating the fileystem:
> 
> ```
> # /sbin/mkfs.ext3 -O dir_index -E resize=max_online_resize /dev/hdXY
> ```
> ...

 

Better yet, try going to google, typing "ext2 resize", clicking "I'm Feeling Lucky", chuckling with pleasure, typing "emerge --search ext2resize", gasping with joy, then typing "emerge ext2resize" and smiling with smug satisfaction.  :-)

md, lvm2, and ext2resize make for a quite feature-rich storage management solution.

Good question, and great thread - thanks CoderGeek!

----------

## wrex

Has anyone experimented with the journal on a separate physical device?  I would expect this to provide a pretty big performance win at the cost of a little extra complexity.

I'm deploying a new server with four physical SATA drives configured as a pair of mirrors. I've got the resources to dedicate a pair of drives just to journalling my busiest filesystem (it's a database server). This will obviously waste a lot of space, but keeping all the sequential journal writes on one pair of spindles, with the random I/O on the other pair, should be a big win. I've never actually tried this with ext3, though.
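For what it's worth, ext3 does support an external journal directly; a rough sketch of the setup (the md device names are placeholders for the two mirror pairs, the filesystems must be unmounted, and the journal device's block size has to match the filesystem's - see mke2fs(8) and tune2fs(8)):

```
# Turn the dedicated mirror into an external journal device
mke2fs -O journal_dev -b 4096 /dev/md1

# Drop the internal journal from the data filesystem,
# then attach the external one
tune2fs -O ^has_journal /dev/md0
tune2fs -J device=/dev/md1 /dev/md0
```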

Has anyone else experimented with a separate journal device?

TIA

----------

## codergeek42

 *wrex wrote:*   

> Better yet, try going to google, typing "ext2 resize", clicking "I'm Feeling Lucky", chuckling with pleasure, typing "emerge --search ext2resize", gasping with joy, then typing "emerge ext2resize" and smiling with smug satisfaction.  

 That's cool!  Thanks.

 *wrex wrote:*   

> Good question, and great thread - thanks CoderGeek!

 Glad to be of help.  :Smile: 

 *pv wrote:*   

> Is such a large journal is intended for using with large partitions? What journal size may be the best for my 8G partitions?

 From what I've read in the man pages, a larger journal is supposed to help increase sustained throughput; but I'm not certain about this, nor have I actually played around with changing the journal size, sorry. I'll play with that later this week.  :Smile: 

----------

## XenoTerraCide

I ended up using gparted. It was easier for me to do the resizing that way because I could see it. However I was aware of the CLI option that ext3 supports natively, although I'm not sure I was at the time I asked that question... That was a while ago.

----------

## wrc1944

Here's a post on our subject that I made to the nesl247 guide for advanced gcc-4.1 Gentoo installs (I've migrated all my Gentoo boxes to this method; it works perfectly, no problems with the latest version, and nesl247 recommends tuned ext3). Their forum is new and very small, so perhaps I'll get more feedback here. I think this is on point for the ext3 discussion.

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

In the wiki text I quoted, they didn't mention it at all, so apparently they don't know about nodiratime either. It isn't even in the linux kernel Documentation (linux-2.6.16-beyond1/Documentation/filesystems/ext3.txt). I ran across it about 6 months ago while googling for filesystems info, after Codergeek42 on the Gentoo forum ext3 tips thread sold me on switching to ext3 tuned from reiserfs.

Previously, I had also played around with the ext3 commit= value (changing it from the default of every 5 seconds to a higher number). I became convinced that, to keep the journal reasonably current, something between 60 and 180 seconds was preferable to the wildly high numbers (like 9999) recommended by some. I hadn't done this on my last few installs, and will go back to it right away. (This is also good for saving power on laptops.)

Since this is a "performance" ext3 install guide, maybe this tweak could also be considered for those not absolutely requiring the default every 5 seconds journal commit.  

 *Quote:*   

>  From kernel Documentation files:
> 
> commit=nrsec	(*)	Ext3 can be told to sync all its data and metadata
> 
> 			every 'nrsec' seconds. The default value is 5 seconds.
> ...

 

This is a mount option added to the fstab partition lines, and IIRC you also have to add rootflags=commit=60 (or whatever number you choose) to the grub kernel line for the / partition. I guess /usr/portage, /var, and /tmp could safely be set much higher, or you could even just use ext2 with dir_index for those three.

EDIT: I had to refresh my memory. It seems /boot can't recognize the commit=n option, so don't use it on the fstab /boot partition. I've gone ahead and set / and /home to commit=60, and /var, /tmp, /mnt/data, /mnt/data2, and /mnt/portage to commit=600. I'll see how that goes.

Any feedback on commit=n settings, pro or con, regarding intervals for specific partitions and Gentoo usage is greatly appreciated. My own thinking is that / and /home need to be reliably journaled, /var and /tmp are OK with less frequent commits, and basically "storage" partitions like /mnt/data that are rarely touched need commits even less frequently. Apparently, /boot must remain at the ext3 default of 5 seconds.
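To make that concrete, here's roughly what the relevant fstab lines would look like (devices and values are purely illustrative, following the scheme described here):

```
# <fs>      <mountpoint>  <type>  <options>            <dump> <pass>
/dev/hda3   /             ext3    noatime,commit=60    0 1
/dev/hda5   /home         ext3    noatime,commit=60    0 2
/dev/hda7   /var          ext3    noatime,commit=600   0 2
```

plus rootflags=commit=60 appended to the kernel line in grub.conf so the / options take effect at boot.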

I'm also reconsidering using data=writeback instead of data=journal. The LinuxGazette filesystem tests author (Justin Piszcz) was kind enough to let me use his script, and I ran all his tests on my own system, comparing all ext3 modes to reiserfs. data=writeback came out ahead of all other ext3 modes across the board, and on most tests beat reiserfs (also reiser4, XFS, and JFS). One caveat: this is only on my system; YMMV. Of course, with writeback mode journalling is less reliable. I'm not sure how his battery of tests specifically relates to Gentoo usage, but they do give a good general idea of how the modes compare in performance.

To simplify, I'm also considering putting the portage directory in /var instead of /usr/portage or /mnt/portage (I already have the distfiles directory there). In other words, things that tend to fragment should go into /var, so one operation can defrag them when needed, and it keeps / relatively clean. This is probably specific to Gentoo (and other source distros), but in two weeks my ext3-tuned /var partition has gone from 0.0% non-contiguous to 21.0% non-contiguous: an alarming observation that convinced me this is a big problem, even with ext3. I had to copy all the /var directories to another partition and restore them in order to defrag. In only two days (and two emerge --syncs), /var has already gone from 0.0% to 0.4% fragmented. I think this really clarifies the importance of getting /var and portage off /, and dispels the myth that Linux filesystems don't fragment.   :Exclamation: 
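For anyone wanting to track this on their own partitions: the non-contiguous percentage comes from a forced filesystem check. A read-only sketch, with a placeholder device name (best run on an unmounted filesystem, since checking a mounted one can report spurious results):

```
# Forced read-only check; look for the "x.x% non-contiguous" figure
e2fsck -fn /dev/hdXY
```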

I recall running the Windows system defragmenter, which shows a graphical representation of clusters and fragmentation; even a 2-4% fragmentation level appears horribly fragmented, so much so that you wonder how the hard drive even functions. I can't imagine how 21% or higher would look, but that is what is happening on my Gentoo /var partition, and to a lesser extent /tmp and /home.

I've installed many hundreds of Linux systems in the last 5 years, and my conclusion is that for all future Gentoo installations I will have /var on its own partition (at least 10GB), with the distfiles and portage directories also on /var (this requires changing the /etc/make.profile symlink to point to /var/portage). I will also always have a spare /data partition to cp -a the /var (and other directory) contents to, and restore from, when defragging is needed.

After much hard experience, here's my recommended basic minimal default Gentoo specific partitioning scheme (I'll leave the filesystem choices to the user, but my current recommendation is ext3, tuned):

```
/boot   80MB       ext2
swap    512MB-1GB
/       10GB
/var    10-12GB    (put the portage and distfiles directories here; edit
                    /etc/make.conf appropriately, change the /etc/make.profile symlink)
/tmp    2-3GB
/home   5-10GB     (or more, user's choice)
/data   20-30GB    (or more, for backups, misc. temporary stuff, and
                    copy/restore (defragging) operations on /var and /tmp)
```

Free space: room for expanding /home, adding more /data partitions, or another distro installation.

I don't put /usr on its own partition, because without portage there, the fragmentation potential is much smaller.

/boot, /, and swap should be large enough for most users.

----------

## Massimo B.

I fully agree with your suggestion about where to place the portage tree, even though I preferred a loop file for /var. Especially /var runs better on reiserfs, in my opinion, but I can't prove that with benchmarks.

Nice information about the commit= option, didn't know that.

Concerning data=writeback versus data=journal, I think a determining factor is CPU power. For my 600MHz machine, data=journal was just too much load, whereas writeback is not.

----------

## ruben

 *wrc1944 wrote:*   

> I'm also reconsidering using data=writeback instead of data=journal. The LinuxGazette filesystems tests author (Justin Piszcz) was kind enough to let me use his script, and I ran all his tests on my own system, comparing all ext3 modes to reiserfs, and data=writeback values came out ahead of all other ext3 modes across the board, and on most tests beat reiserfs (also reiser4, XFS, and JFS). One caveat- this is only on my system- YMMV. Of course with writeback mode, journalling is less reliable. I'm not sure how his battery of tests specifically relates to Gentoo usage, but they do give a good general idea of how the modes compare in performance.

 

You're really doing serious investigation on those file systems  :Smile:  I think I've seen the LinuxGazette filesystem tests before, but those are quite old, aren't they? So I think it'd be great if you could post the results if it's not too much work. I'd love to see how the different configs compare now.

As for the high fragmentation in /var... one of the reasons is probably "/var/tmp/portage" where compiles are done and the ccache cache for emerge.

About the "commit=" value: on my desktop I've chosen 30 seconds, but I didn't do any real tests. I just figured it'd increase throughput when copying a lot of data, while I could still lose at most 30 seconds of work.

On my laptop, I want the disk to be spun down as much as possible, so I use laptop-mode-tools, which delays writes up to half an hour; for that I also let it remount my partitions with a commit of 1800 seconds. I'm comfortable with this on my laptop, since it is very stable and "can't" shut down from losing power: the battery works fine, and when the laptop starts to run out of battery, it simply remounts my partitions with a commit value of 5 seconds and goes to sleep when only 5 minutes of battery remain. So, in that sense, I consider it safe. The power has gone out on me before, though, and that makes me stay with 30 seconds on the desktop. I never measured the influence of the commit value on performance.
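Since the commit interval is just a mount option, trying different values is easy. An illustrative fstab line (the device and interval are examples only, not a recommendation):

```
/dev/hdXY  /home  ext3  noatime,commit=30  0 2
```

A longer interval batches journal flushes (fewer disk wake-ups, potentially better throughput) at the cost of losing up to that many seconds of data on a crash.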

----------

## wrc1944

ruben wrote:  *Quote:*   

>  I think I've seen the LinuxGazette filesystem tests before, but those are quite old, isn't it? So I think it'd be great if you could post the results if it's not too much work. I'd appreciate it to see how the different configs compare now. 

 Justin Piszcz recently did another run updating his tests using a 2.6.14 kernel, and they are also on LinuxGazette. After I saw them, I contacted him and he sent me his script, but he was very adamant about not releasing it on a public forum. IIRC, I emailed him back and asked if I could post any results here, but never heard back. In any case, I feel I must abide by his wishes, as he was extremely helpful when I had some questions about his script. I did send him my results, in hopes he would write another LinuxGazette article, complete with the charts.

What would really be interesting from a Gentoo perspective would be to devise a script that would run multiple tests of things like emerge --sync, revdep-rebuild, and other gentoo related operations, and compare the filesystem mode performance. 

I can say that data=writeback times were about 2-5% faster than data=journal, and both were considerably faster than the ext3 defaults. However, I really think that adding dir_index is the big factor and makes the most difference compared to the defaults. I ran the ext3 defaults, reiserfs, and then the other modes with dir_index, all on the same empty partition (reformatting for each run).

Very interesting comments on your commit= settings. I just switched back to using commit=60 and 600 as mentioned above. My subjective impression is that windows generally open slightly faster than they did, maybe because with the default 5 seconds there is a better chance that a commit is being written just as you open a window or do something else. I guess this could also be affected by the choice of I/O scheduler in the kernel config.

Hmmmm.... I just realized something about /boot, mentioned above in an edit. The reason it didn't recognize commit=n is that I forgot it's an ext2 partition.  :Embarassed: 

----------

## skyfolly

Too lazy to read the whole thread, but I am wondering: can a small block size help performance? I believe a small block size helps to save hard disk space, but other than that, I am not sure.

----------

## codergeek42

 *skyfolly wrote:*   

> Too lazy to read the whole thread, but I am wondering: can a small block size help performance? I believe a small block size helps to save hard disk space, but other than that, I am not sure.

 As I understand it, the block size is a trade-off between performance and space usage. A larger block size means the VFS layer can read or write data in large chunks instead of a series of smaller I/O calls; but it also means each used block is larger, so even if your file is only, say, 780 bytes, it still consumes a full 4 kilobytes of disk space, wasting the remainder of its reserved block.
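To put rough numbers on that trade-off, here's a quick shell sketch (illustrative figures only): on-disk usage is rounded up to whole blocks, so a tiny file eats a full block regardless.

```shell
# Bytes a file actually consumes on disk: size rounded up to whole blocks.
slack() {
    # $1 = file size in bytes, $2 = filesystem block size in bytes
    echo $(( ( ($1 + $2 - 1) / $2 ) * $2 ))
}

echo "780-byte file, 1024-byte blocks: $(slack 780 1024) bytes on disk"
echo "780-byte file, 4096-byte blocks: $(slack 780 4096) bytes on disk"
```

So with 1k blocks the same 780-byte file wastes only 244 bytes instead of 3316; multiplied across thousands of small files, that's the space saving skyfolly is thinking of.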

----------

## micr0c0sm

A friend of mine recommended using soft-updates but I am having trouble finding any information on the topic, can anyone explain to me what that is?

----------

## codergeek42

 *micr0c0sm wrote:*   

> A friend of mine recommended using soft-updates but I am having trouble finding any information on the topic, can anyone explain to me what that is?

 Soft updates differs from journalling in that it does not keep backup metadata/object copies of transactions, but simply ensures metadata writes happen in the proper order. As far as I know, this is really only implemented in the UFS incarnations (*BSD). 

As I've only played with FreeBSD in a qemu VM instance, I can't say with certainty how soft updates compares to journalling in terms of performance (though journalling, as I explained here, does guarantee consistent file data as well, which may be very important).

----------

## PaulBredbury

Here's another ext3 performance tip:  Upgrade to kernel 2.6.17  :Smile: 

----------

## codergeek42

 *PaulBredbury wrote:*   

> Here's another ext3 performance tip:  Upgrade to kernel 2.6.17 

 Cool! Thanks!   :Cool: 

----------

## DarkMind

 *codergeek42 wrote:*   

> Copyright (c) 2005 Peter Gordon
> 
> Permission is granted to copy, distribute and/or modify this document
> 
> under the terms of the GNU Free Documentation License, Version 1.2
> ...

 

or use reiserfs??   :Smile:   :Cool: 

----------

## synss

 *DarkMind wrote:*   

> or use reiserfs??   

 

Oh my God... Let us not have this rather interesting thread degenerate into a flame war, please... 

Just use reiserfs if it makes you happy for all I care.  :Evil or Very Mad: 

----------

## codergeek42

Thank you, synss.

DarkMind: This is meant to be a thread about Ext3 tweaking. If you use ReiserFS and want to encourage its use or discuss its tweaking capabilites, then by all means go ahead: But not in this thread. 

Thanks.

----------

## ppurka

<edited> (some stupid crap I had posted)Last edited by ppurka on Tue Jun 27, 2006 5:51 am; edited 1 time in total

----------

## Doogman

Just a heads-up on being careful with setting ext3 to full data journaling.  I've used it in the past without any noticeable speed problems, but I've noticed that lately I've been getting a big slowdown with journal_data.  It was most apparent this weekend when I installed a new 500GB SATA HD and I couldn't figure out why the drive benchmarks were so slow.  After spending a few hours going down the wrong path assuming it was a problem with libata, I switched to using the default ext3 settings with dir_index and drive performance returned to normal.

```
With full data journal

Version 1.93c       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
ghidorah         1G           19430  11 15519   8           74570  17 163.2   2
Latency                        1920ms    1555ms             39679us     281ms

And now without

Version 1.93c       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
ghidorah         1G           39035  18 24111   9           74431  16 203.2   4
Latency                        3253ms     211ms             36911us     509ms
```

Write performance was cut in half with full data journaling!

If you move beyond plain "mkfs.ext3 -O dir_index", I would recommend running a few benchmarks.

----------

## XenoTerraCide

On your data=journal partition, did you have dir_index turned on? It is recommended. And how did you benchmark it? Just out of curiosity.

----------

## Doogman

 *XenoTerraCide wrote:*   

> On your data=journal partition, did you have dir_index turned on? It is recommended. And how did you benchmark it? Just out of curiosity.

 

Yeah, I always use dir_index.  The benchmarks I posted were from bonnie++.

An alternative to bonnie++ was to simply use "mc" to copy files onto the data=journal partition.  On a big file (like a 700MB XviD), it gives you a MB/s reading, which backed up bonnie++'s results.

----------

## codergeek42

 *Doogman wrote:*   

> [...]
> 
> Write performance was cut in half with full data journaling!

 Those benchmarks are rather surprising. I'll have to try that out on my own.  :Surprised: 

Thanks.

----------

## XenoTerraCide

one other important thing. what version of the kernel was this done on?

----------

## enderandrew

I want to throw things right now.

I normally use Reiser4 all day long on all my boxes.  I keep hearing that ext3 is more stable, and offers just as good performance when properly configured.

My first two boxes I put ext3 on, I had crashes and lost data within 2 weeks.

So I decided to go ahead and give ext3 one more try.

Brand new install.  I just spent 4 days compiling everything.  Now it won't boot, and it says /dev/root is corrupted.  fsck spots the errors and attempts to fix them, but it doesn't do any good.  I know I am probably the exception to the rule, but 3 times out of 3, ext3 has taken a crap on me within 2 weeks of using it.  I've never lost data with Reiser4.

I'm *really* hoping I don't have to format and start over again, because my free time is next to nonexistent these days, but I thought ext3 was supposed to be absolutely stable as a rock.

----------

## Doogman

More info:

doug@ghidorah ~ $ uname -a

Linux ghidorah 2.6.16-ck12n #2 SMP Sat Jun 24 12:34:26 EDT 2006 i686 Dual Core AMD Opteron(tm) Processor 175 GNU/Linux

I seem to be having the same slow-write problem with a box running 2.6.16-gentoo-r7.  I would like to see some other people's benchmarks!  Bonnie++ isn't that hard to run.  :Smile: 

Question enderandrew... why move from Reiser when you weren't having problems?

Obviously people who use ext3 don't have to reformat partitions every other week, or I think you would see some red flags raised somewhere. 

<Testimonial> I've been using ext3 since its inception, and my filesystems survived numerous hard crashes and such while I was trying out new kernels and hardware.  Heck, my last system had a hardware problem where the motherboard, under high I/O, would occasionally corrupt disk writes.  Nice.  Even so, I never lost a filesystem, and in fact only noticed it when I saw the errors corrected during the monthly fsck.  Moving to a Tyan K8E fixed all those problems. </Testimonial>

----------

## enderandrew

I was doing new installs on two more laptops.

----------

## devsk

 *PaulBredbury wrote:*   

> Here's another ext3 performance tip:  Upgrade to kernel 2.6.17 

 

Not really; just look at these comparisons I made using bonnie++:

```
Version 1.93c       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
2.6.15-r8    1800M   331  99 97570  39 73378  25   844  98 253583  47  2728  36
2.6.17-r1    1800M   304  99 96719  35 61476  16   777  97 184444  22  2490  28
```

These were done on a dual-boot install of 2.6.15-r8 and 2.6.17-r1 on the same ext3 filesystem, which resides on nvraid (RAID0), using the same bonnie package and the same kernel config (make oldconfig using the 2.6.15-r8 .config in 2.6.17-r1). Sequential rewrite and read rates have fallen by as much as 15-25%. Rewrite may have suffered because of the reduced sequential read performance. Upgrades are not always what they are made out to be.

----------

## darklegion

 *devsk wrote:*   

>  *PaulBredbury wrote:*   Here's another ext3 performance tip:  Upgrade to kernel 2.6.17  
> 
> not really, just see these comparisons I made using bonnie++:
> 
> ```
> ...

 

Did you enable "CONFIG_ADAPTIVE_READAHEAD" while doing "make oldconfig"? I think this option is needed to get the performance increase (I haven't tested the kernel as of yet).

----------

## devsk

Where does this config parameter live? I can't find it.

PS: I found out why it's not there. It's in a patch for the -mm kernel, which is not present in the latest gentoo-sources that I am running (at least the string CONFIG_ADAPTIVE doesn't appear in any of the files under /usr/src/linux). But the poster (and the link he posted) actually said it was in the mainline kernel.

----------

## devsk

 *darklegion wrote:*   

> 
> 
> Did you enable "CONFIG_ADAPTIVE_READAHEAD" while doing "make oldconfig"? I think this option is needed to get the performance increase (I haven't tested the kernel as of yet).

 It seems that the article talks about a different patch than the one containing the new enhanced ADAPTIVE_READAHEAD. This commit is different from the patch in the -mm kernel:

http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=89747d369d34e333b9b60f10f333a0b727b4e4e2

----------

## devsk

If it makes anybody happy knowing that they are not missing out on the holy grail, here it is:

```
Version 1.93c       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
van-2.6.17   1800M   356  98 33834  13 28018  18   934  90 64460  23 186.2   6
mm6-2.6.17   1800M   495  98 33363  11 25761  13  1238  93 64946  18 276.4  13
```

No gains for adaptive readahead in mm6... :Sad: 

Both tests done under a VM using same install with dual boot setup.

----------

## Sedrik

How much improvement can one expect these tweaks (full journaling, indexing) to give?

Is it like a 0.5% increase? More? Less?

I'm an ext3 user already; I have been since a friend pointed me to a post somewhere detailing the faults of reiserfs (which I used before, since it was the popular choice).

Nice guide =)

*Edit

Please continue to add more tweaks and safe modes to run to the first post as time goes by =)

----------

## Sachankara

Full journaling on ext3 decreases performance. People should try running with the defaults to see the speed difference.

Personally, I've started using XFS because it feels faster on everything but really large files. People should try XFS for their /usr/portage partition; then they might notice how fast it stays over time.  :Smile: 

----------

## devsk

 *Sachankara wrote:*   

> People should try running with the default to see the speed difference.

  What is the default? I thought with -j, the default was full (metadata as well as data) journalling.

 *Sachankara wrote:*   

> 
> 
> Personally I've started to using XFS because it feels faster on everything but really large files. People should try XFS for their /usr/portage partition, then they might notice how fast it stays over time. 

 Tried it. It's freaky fast (mostly because of no data journalling) in benchmarks, but my boot times increased on the same hardware in a dual-boot setup, meaning real-life read performance was worse than ext3's. And this was with all the optimizations I could find. Moreover, the lack of data journalling kinda keeps me away from it.

----------

## adsmith

If you want real data, here are some good and generally accepted benchmarks:

http://linuxgazette.net/122/TWDT.html#piszcz

They are probably posted several pages back as well.

JFS and XFS are both excellent filesystems, but as the previous poster said, neither supports full data journalling.  With that fact and these benchmarks, ext3 really shines.

By the way, for a while it was true that data journalling in EXT3 was somehow *faster* than just metadata journalling.  See:

http://www-128.ibm.com/developerworks/linux/library/l-fs8.html

I'm not sure if this is still true.

----------

## codergeek42

 *Sedrik wrote:*   

> How much improvement can one expect these tweaks (full journaling, indexing) to give?
> 
> Is it like a 0.5% increase? More? Less?

 Honestly, I can't say for certain (though if you'd like, give me some time and I'm sure I could come up with some benchmarks). It certainly feels faster, or at least more responsive, though.

 *Sedrik wrote:*   

> Nice guide =)

 Thanks.  :Smile: 

 *Sedrik wrote:*   

> *Edit
> 
> Please continue to add more tweaks and safe modes to run to the first post as time goes by =)

 If I think of any or find out anything, I'll surely add it.  :Smile: 

----------

## mrcs

 *Sedrik wrote:*   

> Nice guide =)

 

I second this! Without this thread I wouldn't even have looked at ext3 after always hearing how slow and basically useless it was, but you showed me the light  :Smile: 

----------

## codergeek42

 *mrcs wrote:*   

>  *Sedrik wrote:*   Nice guide =) 
> 
> I second this! Without this thread I wouldn't even have looked at ext3 after always hearing how slow and basically useless it was, but you showed me the light 

 Wee. Another Ext3 convert!    :Very Happy: 

----------

## mrcs

 *codergeek42 wrote:*   

> Wee. Another Ext3 convert!   

 

 :Smile: 

Just one question, though. When I set the size of ccache to 1G or greater, it's just a matter of time before ext3 breaks down, spits out tons of errors, and remounts / read-only.

If I set ccache to 512M or less, that never happens. What's up with that?

----------

## neuron

 *mrcs wrote:*   

>  *codergeek42 wrote:*   Wee. Another Ext3 convert!    
> 
> Just one question, though. When I set the size of ccache to 1G or greater, it's just a matter of time before ext3 breaks down, spits out tons of errors, and remounts / read-only.
> 
> If I set ccache to 512M or less, that never happens. What's up with that?

 

Hardware error; you're simply not hitting it when using a smaller ccache.

----------

## mrcs

 *neuron wrote:*   

> Hardware error; you're simply not hitting it when using a smaller ccache.

 

Well, I have the exact same error on two machines, but I guess it could be the hardware. Shouldn't a hardware error pop up when doing other operations too, though? Like now, my laptop hard drive is 99% full and still no error like that.

EDIT: Um, actually it's nowhere near full, my bad, wrong drive, but the question remains.

----------

## neuron

 *mrcs wrote:*   

>  *neuron wrote:*   Hardware error; you're simply not hitting it when using a smaller ccache. 
> 
> Well, I have the exact same error on two machines, but I guess it could be the hardware. Shouldn't a hardware error pop up when doing other operations too, though? Like now, my laptop hard drive is 99% full and still no error like that.
> 
> EDIT: Um, actually it's nowhere near full, my bad, wrong drive, but the question remains.

 

It can be a lot of things, but all ccache does is stress the hardware. I very seriously doubt you'll find stability problems in ext3 (at least without experimental patches), as it's being used at the enterprise level all over the world.

----------

## mrcs

 *neuron wrote:*   

> It can be a lot of things, but all ccache does is stress the hardware. I very seriously doubt you'll find stability problems in ext3 (at least without experimental patches), as it's being used at the enterprise level all over the world.

 

Yeah, I suspect that my slow laptop hard drive can't cope with that many seeks when the cache is too big. It does work flawlessly when the cache is set to <512M, so you're probably right. The other one's a SCSI disk, but that one might actually be dying.

----------

## neuron

 *mrcs wrote:*   

>  *neuron wrote:*   It can be a lot of things, but all ccache does is stress the hardware. I very seriously doubt you'll find stability problems in ext3 (at least without experimental patches), as it's being used at the enterprise level all over the world. 
> 
> Yeah, I suspect that my slow laptop hard drive can't cope with that many seeks when the cache is too big. It does work flawlessly when the cache is set to <512M, so you're probably right. The other one's a SCSI disk, but that one might actually be dying.

 

You might wanna look into smartmontools, and check dmesg.

----------

## wrc1944

adsmith wrote:  *Quote:*   

> By the way, for a while it was true that data journalling in EXT3 was somehow *faster* than just metadata journalling. See: http://www-128.ibm.com/developerworks/linux/library/l-fs8.html I'm not sure if this is still true.

  Considering that the above article is from Dec 2001 and used 2.4 kernels, I've often wondered about this since I read page one of this thread, when I became an ext3 convert; although I went ahead and am still using data=journal to this day. 

Seeing as it was never entirely clear (at least to me) why this was the case, and it simply goes against all logic (and the kernel fs docs), I'm changing all my ext3 partitions to data=writeback. I also ran Justin's newer LinuxGazette tests (with his script and a 2.6.14 kernel) on my own system, comparing ext3 data=journal and data=writeback modes, and confirmed what logic predicts: writeback indeed produced better scores.

From the kernel Documentation/filesystems/ext3.txt file:

 *Quote:*   

> 
> 
> Data Mode
> 
> ---------
> ...

 

----------

## feld

But some of us are more worried about filesystem integrity than being a ricer.

it's not like full journaled mode annihilates our performance in any considerable way...   :Rolling Eyes: 

----------

## XenoTerraCide

Yep, integrity. Writes are noted to be slower, but reads should normally be just as fast, and faster than writeback while writes are happening. I don't know about others, but I read data more than I write it. The only thing I have on reiserfs is portage, because I don't need integrity there and it handles small files better; the rest I need integrity on. /var might be a good candidate for writeback, because logs are constantly being written and aren't read as much.

----------

## 1bitmemory

Has anyone here played with the ext3 patch set mentioned here ( http://lkml.org/lkml/2006/6/6/65 ) ? Some more explanations here http://lwn.net/Articles/80285/

1bm

----------

## neuron

 *1bitmemory wrote:*   

> Has anyone here played with the ext3 patch set mentioned here ( http://lkml.org/lkml/2006/6/6/65 ) ? Some more explanations here http://lwn.net/Articles/80285/
> 
> 1bm

 

Looks extremely interesting; found this as well:

http://www.ussg.iu.edu/hypermail/linux/kernel/0606.0/1580.html

//edit and this:

https://mail.clusterfs.com/pipermail/lustre-discuss/2006-June/001601.html

//edit and this showing performance gains:

http://www.bullopensource.org/ext4/lowmemory/index.html

----------

## onesandzeros

I looked through this thread for this issue, but I may have missed it in the 13 pages, heh.  When I applied the changes in the first post, I had some weirdness in KDE.  Specifically, I often had kio_slave crashes, and the occasional total Konqueror failure.  I was running 3.5.5 and 3.5.6 during that period.  Prior to the changes, I never had those troubles (across numerous KDE versions), and now that I've undone them, 3.5.6 is as stable as I'd expect.  Has anyone else had this trouble?

----------

## adsmith

That's interesting. 

I've had some sporadic  KDE crashing [Krashing?], but I just assumed it was KDE.

Have you done any experiments to determine which options are correlated with the wonkiness?

----------

## onesandzeros

No, unfortunately I didn't.  I might in the future, but my Gentoo box is my *only* box, and ext3's default settings are fine.  dir_index is the option that's supposed to speed things up, right?  I didn't notice much of a difference at all.  Maybe it's my hardware.

Also, I didn't mean to imply that I had other KDE problems (or any other problems at all).  Only konq (sometimes) and kio_slaves (often) were wacky.  I don't recall any other problems.

----------

## ppurka

 *onesandzeros wrote:*   

> No, unfortunately I didn't.  I might in the future, but my gentoo box is my *only* box, and ext3's default settings are fine.  The dir_index is the option that's supposed to speed things up, right?  I didn't notice much of a difference at all.  Maybe it's my hardware.
> 
> Also, I didn't mean to imply that I had other KDE problems (or any other problems at all).  Only konq (sometimes) and kio_slaves (often) were wacky.  I don't recall any other problems.

 Do you have the problem of kopete just not responding all of a sudden (even in the middle of a chat)? And that the problem does not go away until you kill kopete, and then killall -9 dcopserver?

----------

## onesandzeros

 *ppurka wrote:*   

>  *onesandzeros wrote:*   No, unfortunately I didn't.  I might in the future, but my gentoo box is my *only* box, and ext3's default settings are fine.  The dir_index is the option that's supposed to speed things up, right?  I didn't notice much of a difference at all.  Maybe it's my hardware.
> 
> Also, I didn't mean to imply that I had other KDE problems (or any other problems at all).  Only konq (sometimes) and kio_slaves (often) were wacky.  I don't recall any other problems. Do you have the problem of kopete just not responding all of a sudden (even in the middle of a chat)? And that the problem does not go away until you kill kopete, and then killall -9 dcopserver?

 

I don't use Kopete much (don't think I used it all while I had the ext3 mods in place), but I've got it going now.  Seems ok.  Feel free to contact me at my jabber address, heheh.

----------

## carpman

Hello, ok i have 200 GB partition on my backup server that is only going to be used for backups.

I thinking that ext3 is going to be best for this but was wondering if that is correct and if so what are best settings to use on it?

cheers

----------

## vanten

Thanks there codergeek42, Your first post worked like a charm now.

( And thanks to all people thats brought up those things that made it that way. )

----------

## XenoTerraCide

Hmm... I had a filesystem consistency problem on /home: two directories had lost their permissions. So I rebooted and of course ext3 put things right, but I decided to run further checks, and I found out journal_data was no longer set on any of my filesystems. This is the first thing I do after making ext3 filesystems on a new install, and I remember doing it, so at some point it disappeared. No idea why, how, or when, but I would be interested to know if others have had this problem. Secondly, I don't really want to burn a recovery disk right now, and I'm not sure I actually have a good one. Would enabling it before I reboot and then forcing an fsck on the next boot fix it? How do I force an fsck on 2 filesystems on reboot?

----------

## onesandzeros

 *XenoTerraCide wrote:*   

> how do I force an fsck on 2 filesytems on reboot?

 

Well, I'm not sure about your other problems (it was good to post them; someone will probably know), but as for the fsck: shut down from a console with 'shutdown -rF now'.  The -r will reboot; the -F will force a fsck of any partition that mounts automatically at boot (that is, a partition with 'auto' in its fs_mntops field in fstab).

As I wrote in an earlier post in this thread, I was getting some real oddball behavior from konqueror and the kio daemon.  Undoing the fs modifications that were listed in the first post of this thread seems to have taken care of those issues.  Why those problems came about is still beyond me.

----------

## Maf

Guys, how can I check whether the filesystem is currently mounted with the 'journal' option, just to make certain?

----------

## devsk

 *Maf wrote:*   

> Guys, how can I check whether the system is mounted at the moment with 'journal' option to make me certain?

 

```
tune2fs -l <device_with_ext3_filesystem>

grep <device_with_ext3_filesystem> /etc/mtab

dmesg | grep "EXT3-fs: mounted filesystem"

```

PS: I forgot one.... :Wink: Last edited by devsk on Wed Feb 28, 2007 6:32 pm; edited 1 time in total

----------

## XenoTerraCide

It's also in part V (?) of the tutorial on page 1. I had codergeek42 add it ages ago.

----------

## Maf

 *devsk wrote:*   

>  *Maf wrote:*   Guys, how can I check whether the system is mounted at the moment with 'journal' option to make me certain? 
> 
> ```
> tune2fs -l <device_with_ext3_filesystem>
> 
> ...

 

Well what I mean is, my mount says:

```

/dev/sda3 on /home/maf type ext3 (rw,noatime,data=journal)

```

And my kernel options in grub.conf:

```

kernel /boot/linux-2.6.20 root=/dev/sda1 rootflags=data=journal vga=792 video=vesafb:mtrr,ywrap verbose mce

```

But I kinda can't believe "journal mode" is on  :Wink:  And unfortunately tune2fs -l doesn't contain this kind of information. Is there any other way?

----------

## XenoTerraCide

Part of the tune2fs -l output should say 

```
Filesystem features:      has_journal dir_index filetype needs_recovery sparse_super

Default mount options:    journal_data

```

 if you followed the tutorial. It may not show journal_data if you're enabling data journalling a different way than the tutorial does (I've never done it that way), but has_journal will be in there regardless. dmesg should tell you how each filesystem is mounted at boot; that's how I caught that somehow a bunch of my filesystems on a box were in ordered mode.

----------

## devsk

 *Maf wrote:*   

> But I kinda' can't believe the "journal-mode" is on  And unfortunatelly tune2fs -l doesn't contain this kind of information. Is there any other way?

 

```
dmesg | grep "EXT3-fs: mounted filesystem"
```

----------

## Maf

 *devsk wrote:*   

> 
> 
> ```
> dmesg | grep "EXT3-fs: mounted filesystem"
> ```
> ...

 

Sweet, thx  :Smile: 

----------

## devsk

One of the most ignored facts about ext[23] is that its default inode-count calculation is pretty conservative. It assumes that the larger the partition, the larger the number of files the user will store on it; e.g. for a 150GB partition, it formats the partition with 23 million inodes. Now, I know I am never going to create that many files/dirs on this partition. The result of a large number of inodes is that fragmentation of large files increases drastically, because a large file's blocks cannot use the blocks reserved for inodes and have to jump over them.

So, what is the cure? Use the -N option to specify roughly how many inodes you want on the partition. For example, if I am storing my mp3s on this 150GB partition, and the average size of an mp3 is 5MB, I would store about 30,000 files on it. To account for some small files and dirs, I multiply by 16 and pass "-N 480000" to mke2fs. Even 1 million inodes (as opposed to the default 23 million in this case) will reduce your fragmentation by leaps and bounds.

Now, the disadvantage of doing this: because the inode blocks are marked during formatting, you cannot change the number of inodes later. So make a good guess for your situation and multiply it by a factor of 2, 4, 8, or 16 to make sure you're able to grow, while still keeping the number much below the default used by mke2fs.
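That arithmetic can be sketched in a few lines of shell (the numbers and the final command line are illustrative, using the generic /dev/hdXY from the first post):

```shell
# Rough inode-count estimate for a partition holding mostly large files.
part_gb=150        # partition size in GB
avg_file_mb=5      # expected average file size in MB
headroom=16        # growth multiplier suggested above

files=$(( part_gb * 1024 / avg_file_mb ))   # expected number of files
inodes=$(( files * headroom ))              # with headroom for small files/dirs

# -N must be chosen at format time; it cannot be changed later.
echo "mke2fs -j -N ${inodes} /dev/hdXY"
```

That lands near the ~480,000 figure above, and is still a tiny fraction of the ~23 million inodes the defaults would create.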

----------

## XenoTerraCide

In reply to devsk's observation: does anyone know a good way to find the average size and number of files in a directory?

----------

## devsk

 *XenoTerraCide wrote:*   

> in reply to devsk observation.  anyone know a good way to find the average size and number of files in a directory?

 You mean in an already populated and mounted file system? If so, the following should do:

```
# change dir to where you want the count and average size
sum=0
count=0
# file sizes straight from find; safe with spaces in filenames
while IFS= read -r sz; do
    sum=$((sum + sz))
    count=$((count + 1))
done < <(find . -type f -printf '%s\n')
echo "Count = $count  Average Size = $((sum / count)) bytes"
```

----------

## XenoTerraCide

Yeah, I meant populated. The best way to make a guess is to make it an educated one. In my case I want to use my existing systems to get an idea of what size and kind of files I have in each directory off /. I use reiserfs for things like ccache and portage.

----------

## rickrick

I am using the tuned ext3 and am curious about the speed of computing directory sizes. When I bring up the properties of, say, /usr, it takes a while (a couple of minutes) for ext3 to count up its size. On some other filesystems it was a lot quicker (I think it was xfs or ntfs). Anyway, I was just wondering if I was missing a setting; I did everything from the first post. I have a journal size of 128M and commit=60 (I've used the defaults too). This isn't meant to be a "this fs is better than that one" thing; I'm just curious if anyone else notices this.

----------

## mno

Sorry, maybe this was discussed in the 13+ pages, but before I play around with my partitions, wanted to confirm:

I set up my partitions several years ago and formatted everything to use ext3 (except /boot). If I switch the journaling mode now, is it safe? I am running a RAID1 setup using a 3ware 8506 card. 

Thanks,

Max

----------

## XenoTerraCide

Yes, but you'll need to run fsck afterwards.

----------

## mno

ugh then i'd rather not. it's a server, and if fsck hangs..... :/

----------

## likewhoa

 *devsk wrote:*   

>  *Maf wrote:*   But I kinda' can't believe the "journal-mode" is on  And unfortunately tune2fs -l doesn't contain this kind of information. Is there any other way? 
> 
> ```
> dmesg | grep "EXT3-fs: mounted filesystem"
> ```
> ...

 

A more detailed view of any ext filesystem is available with dumpe2fs; here's how to show only the superblock info:

```

dumpe2fs -h /dev/sda1

```

enjoy.   :Cool: 

----------

## codergeek42

Nifty tip! Thanks, likewhoa.  :Smile: 

----------

## XenoTerraCide

Not saying that dumpe2fs isn't useful, but what does dumpe2fs -h show that tune2fs -l doesn't? It seems to me they show the same thing.

----------

## likewhoa

 *XenoTerraCide wrote:*   

> Not saying that dumpe2fs isn't useful, but what does dumpe2fs -h show that tune2fs -l doesn't? It seems to me they show the same thing.

 

They show almost the same output, except that dumpe2fs also shows the journal size.

----------

## satanskin

How might one recover from running the following:

```
# tune2fs -O dir_index /dev/hdXY
# e2fsck -D /dev/hdXY
```

It seems to have pretty much fucked most things up, especially portage and python

----------

## i92guboj

 *satanskin wrote:*   

> 
> 
> # e2fsck -D /dev/hdXY 
> 
> 

 

You didn't run fsck while the filesystem was mounted, did you?

If so, remember never to do that again: unmount it, then run fsck on it again to fix it. If some files have been corrupted, there is not much you can do. fsck can seriously damage things if you run it on mounted filesystems; you should never do that. 

If the filesystem wasn't mounted, then forget about my post.

----------

## satanskin

 *i92guboj wrote:*   

>  *satanskin wrote:*   
> 
> # e2fsck -D /dev/hdXY 
> 
>  
> ...

 

It was indeed mounted. I will surely give your suggestion a shot. Thank you.

----------

## i92guboj

 *satanskin wrote:*   

> 
> 
> It was indeed mounted. I will surely give your suggestion a shot. Thank you.

 

Then I hope it did not damage anything critical. You might need to re-emerge some packages if something fails. Python might be a problem if it is broken enough that portage can't work; in that case, you will have to rescue your Gentoo system using a prebuilt python package.

----------

## padoor

I am happy to see this thread contains a lot of helpful information.

----------

## purpler

Somebody has to mention that enabling the noatime option (which disables updating the last-access time) in fstab can noticeably improve filesystem responsiveness too:

 *Quote:*   

> daemon% cat /etc/fstab|grep noatime
> 
> /dev/hdb1       /      ext3    noatime,data=journal     0 1

 

I converted from XFS too and can't say anything except: excellent  :Smile: 

----------

## azp

Maybe it's time to add this guide to the gentoo-wiki? It's a bit hard to read through 14 pages of answers to find out what changes and tips have been reported. I just managed to read through the first two pages, and the first post seems to be updated according to the reported errors!

Good guide to have, I was looking for the dir_index when I found it =)

----------

## steveL

One minor point: tune2fs -O has_journal is not needed if the filesystem is made with mke2fs -j as outlined in the handbook.

Also, resize_inode (which is a default) is not necessary if it's a fixed-size partition (which is handy for some purposes).

```
mke2fs -O dir_index,^resize_inode -j /dev/blah

tune2fs -o journal_data /dev/blah
```

tune2fs -l showed it as has_journal correctly on my box.

Thanks for an excellent HowTo :-)

I was thinking (after looking at man mke2fs.conf) that it'd be nice to have some defaults specifically for Gentoo purposes: i.e. /usr/portage (which might be reiser), /usr/portage/distfiles and /var/tmp/portage. These could be set as portage, distfiles or tmp so that we'd run, for example, mke2fs -T distfiles. Any suggestions on what those defaults could entail?

Personally I'm thinking of setting a similar one for home, which I imagine would be similar to distfiles, but it'd be cool to have, say video, audio and multimedia (for generic use) settings as well as usr, var (and tmp).
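Something along those lines might look like the following in /etc/mke2fs.conf. This is only a hypothetical sketch: the type names (distfiles, portage) and the blocksize/inode_ratio values are illustrative guesses, not tested defaults.

```
[fs_types]
	distfiles = {
		blocksize = 4096
		inode_ratio = 1048576
	}
	portage = {
		blocksize = 1024
		inode_ratio = 4096
	}
```

With such stanzas in place, mke2fs -T distfiles would create few inodes (one per MiB) for a partition of large tarballs, while mke2fs -T portage would allow one inode per 4KB for the many tiny files in the tree.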

----------

## azp

I don't think you want /usr/portage as ReiserFS. I once thought I wanted it, until I learned that ReiserFS fragments like a sonofabitch, and the only way to defrag it is to tar everything, move it off, delete it all, and untar it back. Sure, you could just recreate your whole portage partition every once in a while; it takes about a year for ReiserFS to become unusable on a filesystem with rather high I/O.

----------

## steveL

Reiser has excellent performance, and even better space usage with small files. It's just not 100% reliable IMO. As such it's perfect for the portage tree, since all files have md5 sums and can easily be resynced. If you do this, it's advisable to keep distfiles separate; although those files are also checksummed, they tend to be larger (a pain to download again) and less frequently accessed, so reiser doesn't gain much. This also helps with the defrag issue, since the file types and access patterns are so different.

I guess I should reformat the portage partition at some point, and see if it speeds up though :)

----------

## likewhoa

 *steveL wrote:*   

> 
> 
> I was thinking (after looking at man mke2fs.conf) that it'd be nice to have some defaults specifically for Gentoo purposes: ie /usr/portage (which might be reiser) /usr/portage/distfiles and /var/tmp/portage. These could be set as portage, distfiles or tmp so that we run mke2fs -T distfiles for example. Any suggestions on what those defaults could entail?

 

For me, I break everything up into several partitions under lvm2, e.g.:

/usr/portage <- mke2fs -b 1024 -N 200000

/usr/portage/distfiles <- mke2fs -b 4096 -T largefile

I used to keep /usr/portage on reiserfs, but that eventually became slow; I get better performance over time with extfs. 

The best thing would be to have /usr/portage on a RAID0 2x CF array; read times on that would be insane.

that's the plan for me.

----------

## nowshining

I did it all from the Gutsy LiveCD (I have Feisty updated to Gutsy, by the way), and it did make things a bit faster; at least a noticeable difference. Thanks..  :Smile:  And no, I am NOT a Gentoo user...

edit: if anyone is wondering, yes, I made the changes to my MAIN hard drive / boot drive from the LiveCD via sudo in the terminal.  :Smile: 

----------

## likewhoa

For those of you currently using RAID, you can optimize ext2/3 for RAID use with the extended stride option. The way I calculate this value: first you multiply the number of drives in the array by the chunk value of the array, then you divide that by the block size of your ext3 filesystem. For example:

Remember that stride values only matter for striped levels like raid0, 5, and 6; they are not needed with raid1.

for example:

You set up an array out of 4 drives like so:

```
# mdadm --create /dev/md0 /dev/sd[abcd]1 --level=0 --chunk=256 --raid-devices=4
```

Note the chunk value; you will need it to calculate the final stride value.

Now that we have created our raid0 array, it's time to plug the chunk value and the block size of the ext2/3 filesystem into the formula. For this example we will use a block size of 4096, which is the default.

4 = number of drives.

256 = chunk value.

4 = block value (4096 bytes = 4 KiB).

results = stride value.

```
# a=$((4*256/4)); echo $a
```

OK, our stride value is 256, so now let's create the filesystem:

```
# mke2fs -b 4096 -E stride=256 /dev/md0
```

that's all folks.

----------

## neuron

 *likewhoa wrote:*   

> For those of you currently using RAID, you can optimize ext2/3 for RAID use with the extended stride option. The way I calculate this value: first you multiply the number of drives in the array by the chunk value of the array, then you divide that by the block size of your ext3 filesystem. For example:

 

That's not the information I've found about raid5/ext3 chunk size.  The algorithm I found everywhere was simply:

stride = chunk / blocksize

So with your 256k chunk and 4k blocks, that's 256/4 = 64.
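That simpler formula is easy to check with shell arithmetic, using the chunk and block sizes from the example above:

```
chunk_kb=256   # mdadm --chunk value, in KiB
block_kb=4     # ext3 block size (mke2fs -b 4096), in KiB
echo "stride=$((chunk_kb / block_kb))"
```

which prints stride=64, i.e. the number of filesystem blocks that fit in one RAID chunk.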

----------

## Cyker

Are you sure?

The calculation I got was [Optimum Stride]=[Array Stripe Chunk Size] / [FS Inode Blocksize]

There wasn't anything about multiplying by the no' of drives...!

If this is the case, then the stride I set for my RAID array should be 64, and not 16 as I have set it!

But to my knowledge, the RAID chunk size is not dependent on the no. of disks in the system, and neither is the inode blocksize or stride.

My understanding of the stride, is this:

The RAID chunk/stripe size is how many bytes of contiguous data will be on one disk (the default being 64k). No matter how many disks there are, each one will still get this same-sized chunk for each stripe.

The stride value tells mkfs how many filesystem blocks (default 4k) will fit into one of those chunks; in my case, you can fit 16 4k blocks into a single RAID chunk (64/4=16).

Going on this, the stride value in your example should be 64, and not 256...

Edit: NB: Was replying to likewhoa, but was checking Google/TLDP/genwiki to make sure I wasn't being stupid, and that cunning knave neuron cut in!  :Razz: 

----------

## likewhoa

Well, I'm in the process of doing some benchmarks to show the difference between various chunk and stride values with RAID. My first understanding was that the calculation "chunk/block size" alone gives the stride size, but given the number of drives in a RAID array, it makes some sense to put the number of drives into the equation. The only way to find out is by running benchmarks, which I should be doing at benchmarks.gentooexperimental.org soon.  :Smile: 

So far I'm getting good results with the "raid drives*chunk/block size" equation, but I can't draw a conclusion until the benchmarks are fully done.

----------

## Cyker

Well, using the common calculation, the size of 1 stride = size of 1 stripe.

In yours, the stride would be spread over all 4 disks, so 1 stride = 4 stripes...

I must admit, I do hope yours is not the optimal one; it would mean you can't add another disk to the array, because that would change the 'optimal' stride, and you can't change the stride size without reformatting the array.

The problem with this sort of thing is it tends to be dependent on access patterns and file size/spread.

For large contiguous files, setting the array chunk, inode and stride sizes to numbers much bigger than the norm (e.g. 512 vs 64, 32 vs 4 and 512 vs 16) would give very good performance, but as soon as random-access, fragmentation and small files are thrown in, the performance drops like a lead balloon.

But I look forward to seeing the benchmarks; I've not found any decent ones so far so it'd be interesting to see what results you get!  :Smile: 

----------

## likewhoa

Yeah, I agree with you that this method is optimal only until the number of drives increases, and this is the first time I have come across such a method. Anyway, the performance can be seen in the numbers: I managed to put some time into running a few benchmarks on an 8-drive raid0 array, 3.6TB in size, to see what the numbers say. I myself still used the old "chunk/block size" method on all previous arrays, and I have only been trying this method because it seemed to give good performance on large arrays. Anyway, here are some dd, mount, and mke2fs benchmarks in raw form. The full benchmarks will be run on smaller arrays (50, 75, 125, 150GB) using various chunk values from the default 64k up to 8096, and will include numbers from bonnie++, tiobench, dd, and hdparm. P.S. the benchmarks below were run with 8GB of RAM.

/dev/md0 512 chunk & ext3fs stride=512 (8*512/4) 8 being number of drives in array.

filesystem mounted with journal data ordered.

```

# time mke2fs -j -b 4096 -E stride=512 /dev/md0

real    7m34.245s

user    0m0.530s

sys     0m45.233s

# time mount /dev/md0 /mnt/gentoo/;time sync

real    0m0.914s

user    0m0.000s

sys     0m0.009s

real    0m0.001s

user    0m0.000s

sys     0m0.001s

# time dd if=/dev/zero of=/mnt/gentoo/1g bs=1024k count=1k;time sync

1024+0 records in

1024+0 records out

1073741824 bytes (1.1 GB) copied, 2.04643 s, 525 MB/s

real    0m2.781s

user    0m0.003s

sys     0m1.548s

real    0m2.604s

user    0m0.001s

sys     0m0.000s

# time dd if=/dev/zero of=/mnt/gentoo/4g bs=1024k count=4k;time sync

4096+0 records in

4096+0 records out

4294967296 bytes (4.3 GB) copied, 13.0373 s, 329 MB/s

real    0m13.039s

user    0m0.002s

sys     0m6.794s

real    0m4.153s

user    0m0.000s

sys     0m0.345s

# hdparm -Tt /dev/md0

/dev/md0:

 Timing cached reads:   8514 MB in  1.99 seconds = 4268.86 MB/sec

 Timing buffered disk reads:  1188 MB in  3.01 seconds = 395.32 MB/sec

# time umount /mnt/gentoo

real    0m4.785s

user    0m0.001s

sys     0m0.734s

```

/dev/md0 512 chunk & ext3fs stride=128 (512/4) old method.

```

# time mke2fs -j -b 4096 -E stride=128 /dev/md0

real    6m28.650s

user    0m0.540s

sys     0m45.361s

# time mount /dev/md0 /mnt/gentoo/;time sync

real    0m0.216s

user    0m0.001s

sys     0m0.010s

real    0m0.001s

user    0m0.001s

sys     0m0.000s

# time dd if=/dev/zero of=/mnt/gentoo/1g bs=1024k count=1k;time sync

1024+0 records in

1024+0 records out

1073741824 bytes (1.1 GB) copied, 1.91196 s, 562 MB/s

real    0m3.597s

user    0m0.000s

sys     0m1.637s

real    0m2.761s

user    0m0.001s

sys     0m0.075s

# time dd if=/dev/zero of=/mnt/gentoo/4g bs=1024k count=4k;time sync

4096+0 records in

4096+0 records out

4294967296 bytes (4.3 GB) copied, 13.7545 s, 312 MB/s

real    0m20.004s

user    0m0.000s

sys     0m7.374s

real    0m3.352s

user    0m0.000s

sys     0m0.167s

# hdparm -Tt /dev/md0

/dev/md0:

 Timing cached reads:   8630 MB in  1.99 seconds = 4327.93 MB/sec

 Timing buffered disk reads:  1100 MB in  3.01 seconds = 364.93 MB/sec

# time umount /mnt/gentoo

real    0m4.775s

user    0m0.000s

sys     0m0.735s

```

I still believe that "chunk/block size" is optimal; let's just hope the other benchmarks go in its favor.

EDIT: did some early runs today here are the results.

```

Raid Level : raid0

Array Size : 3125665792 (2980.87 GiB 3200.68 GB)

Raid Devices : 8

Chunk Size : 1024K

# time mkfs.ext3 -b 4096 -j -E stride=2048 /dev/md0;time sync

mke2fs 1.40.2 (12-Jul-2007)

real    12m0.932s

user    0m0.485s

sys     0m39.654s

real    0m4.624s

user    0m0.001s

sys     0m0.005s

# time mount /dev/md0 /mnt/gentoo/;time sync

real    0m0.127s

user    0m0.002s

sys     0m0.006s

real    0m0.001s

user    0m0.000s

sys     0m0.001s

# time dd if=/dev/zero of=/mnt/gentoo/test bs=1024k count=16k;time sync

16384+0 records in

16384+0 records out

17179869184 bytes (17 GB) copied, 65.4599 s, 262 MB/s

real    1m10.561s

user    0m0.009s

sys     0m31.235s

real    0m4.611s

user    0m0.000s

sys     0m0.053s

# time umount /mnt/gentoo/;time sync

real    0m4.656s

user    0m0.000s

sys     0m0.587s

real    0m0.001s

user    0m0.002s

sys     0m0.000s

.: Stride Value Set To 256 :.

# time mkfs.ext3 -b 4096 -j -E stride=256 /dev/md0;time sync

mke2fs 1.40.2 (12-Jul-2007)

real    11m45.907s

user    0m0.495s

sys     0m39.765s

real    0m4.065s

user    0m0.000s

sys     0m0.001s

# time mount /dev/md0 /mnt/gentoo/

real    0m0.594s

user    0m0.001s

sys     0m0.005s

# time dd if=/dev/zero of=/mnt/gentoo/test bs=1024k count=16k;time sync

16384+0 records in

16384+0 records out

17179869184 bytes (17 GB) copied, 64.8825 s, 265 MB/s

real    1m5.374s

user    0m0.013s

sys     0m30.310s

real    0m4.733s

user    0m0.000s

sys     0m0.055s

# time umount /mnt/gentoo/;time sync

real    0m4.661s

user    0m0.000s

sys     0m0.594s

real    0m0.001s

user    0m0.001s

sys     0m0.000s

-- Reiser3.6 --

# time mkfs.reiserfs -q /dev/md0

mkfs.reiserfs 3.6.19 (2003 www.namesys.com)

real    1m43.844s

user    0m0.074s

sys     0m0.250s

# time mount /dev/md0 /mnt/gentoo

real    0m4.939s

user    0m0.001s

sys     0m0.027s

# time dd if=/dev/zero of=/mnt/gentoo/test bs=1024k count=16k;time sync

16384+0 records in

16384+0 records out

17179869184 bytes (17 GB) copied, 63.5959 s, 270 MB/s

real    1m4.085s

user    0m0.021s

sys     0m19.515s

real    0m5.133s

user    0m0.000s

sys     0m0.066s

# time umount /mnt/gentoo

real    0m5.938s

user    0m0.000s

sys     0m0.674s

-- JFS --

# time mkfs.jfs -q /dev/md0

mkfs.jfs version 1.1.12, 24-Aug-2007

real    0m1.791s

user    0m0.022s

sys     0m0.346s

# time mount /dev/md0 /mnt/gentoo

real    0m4.633s

user    0m0.000s

sys     0m0.001s

# time dd if=/dev/zero of=/mnt/gentoo/test bs=1024k count=16k;time sync

16384+0 records in

16384+0 records out

17179869184 bytes (17 GB) copied, 63.1783 s, 272 MB/s

real    1m3.876s

user    0m0.006s

sys     0m15.049s

real    0m5.094s

user    0m0.000s

sys     0m0.002s

# time umount /mnt/gentoo/

real    0m5.199s

user    0m0.000s

sys     0m0.296s

-- XFS --

# time mkfs.xfs -q -f /dev/md0

real    0m1.496s

user    0m0.003s

sys     0m0.033s

# time mount /dev/md0 /mnt/gentoo

real    0m4.451s

user    0m0.000s

sys     0m0.004s

# time dd if=/dev/zero of=/mnt/gentoo/test bs=1024k count=16k;time sync

16384+0 records in

16384+0 records out

17179869184 bytes (17 GB) copied, 65.2181 s, 263 MB/s

real    1m5.710s

user    0m0.007s

sys     0m20.209s

real    0m5.064s

user    0m0.000s

sys     0m0.008s

# time umount /mnt/gentoo/

real    0m5.458s

user    0m0.000s

sys     0m0.583s

```

----------

## StarDragon

I implemented this method on my laptop, and it worked like a charm. I have an older model and it ussualy chugs when doing a lot of tasks at once. But now, it seems to hum along just fine.  :Smile: 

----------

## Schizoid

I have switched a few of my partitions from XFS to ext3. I was wondering: is it safe to delete the lost+found directories that it creates in every partition? I would think that if there was some lost data found, it would recreate that directory as needed?

----------

## i92guboj

 *Schizoid wrote:*   

> I have switched a few of my partitions from XFS to ext3. I was wondering if it is safe to delete the lost+found directories that it creates in every partition? I would think that if there was some lost data it found that it would recreate that directory as needed?

 

I am not 100% sure, but I think that directory is re-created anyway each time you run fsck on the partition. In other words: if that is true, the directory would be recreated each time the partition is checked, which might be on each startup, after a number of mounts, or after a given time; it all depends on how you formatted your partition. tune2fs can be used to change those parameters without a reformat.

EDIT: you can also create it by hand using "mklost+found". I don't know why anyone would want to delete that directory, though...

----------

## JeliJami

 *Schizoid wrote:*   

> I would think that if there was some lost data found, it would recreate that directory as needed?

 

No, it doesn't. If you're lucky, it will restore a complete file into the lost+found directory, but with some predefined name, FSCK00001 for example; I don't remember exactly. (DOS's checkdisk utility did something similar, with CHCKDSK.001, I think.)

But most of the time, you will only get partial files, without any clue to the original filename, or its original path.

----------

## pactoo

> I don't know why anyone would want to delete that directory, though...

Probably because nobody (except those few die-hard ubergeeks with 20+ years of Unix experience) knows how to actually recover the files that fsck puts in there.

----------

## number_nine

Are these Ext3 tips still relevant?

I've done some testing on my computer with gentoo-sources-2.6.23-r9 and bonnie++ (v1.03 compiled from source).

At this point, I'm most concerned with iowait associated with write performance:

```

                      Version  1.03      ------Sequential Output------

                                         -Per Chr- --Block-- -Rewrite-

                            Machine Size K/sec %CP K/sec %CP K/sec %CP

ext3-defaults               gentoo    8G 54120  78 62062  15 28364   5

ext3-noatime,journal_data   gentoo    8G 22792  34 30598   8 19971   4

ext3-noatime,writeback_data gentoo    8G 59236  85 60514  13 27127   4

ext2-noatime                gentoo    8G 63030  84 65542   6 27931   4

```

Notice how when I use data=journal (as suggested in this thread), I get the lowest write performance, but also decreased CPU usage.

The best performance appears to be with ext2, followed by data=writeback with ext3.

Thoughts?

----------

## XenoTerraCide

Write is slower with data=journal than with data=writeback; however, data=journal is faster than writeback at reading while writing. And of course ext2 is faster: it doesn't have journaling, which means less overhead. FAT is probably faster too, but I would not use it, because I love my data.

----------

## XenoTerraCide

http://www.linuxplanet.com/linuxplanet/tutorials/6480/1/

Interesting updates to ext3 that are causing problems with grub. Perhaps something should be added to our tips? Also, does anyone know if these updates otherwise affect our tips (such as whether data=journal is still the king of reads)?

----------

## XenoTerraCide

Does anyone know if data=journal offers the same benefits in ext4 (faster reads while writing)? Or if there are any new enhancements and tweaks we can make?

----------

## arnuld

Hey, what about the noatime and relatime options? I searched in some places and found that people were praising these two mount options:

http://kerneltrap.org/node/14148

http://www.pervasivecode.com/blog/2008/05/15/recommended-mount-options-for-ext3/

----------

## XenoTerraCide

Yeah... but noatime isn't really ext(x)-specific; it applies to every fs, and as far as I can tell there is never a reason not to use it. I've never had one.

----------

## BlackB1rd

Is it correct to assume that ext3 with data=journal performs better than the default (ordered) on a server platform with multiple databases and "normal files" accessed by many concurrent users? The articles found on the internet were all written many years ago and I'm not sure if those results are still valid. And my biggest question would be: why has the default changed from journal to ordered, when the former performs better on servers? Shouldn't journal be the default setting when performing a server installation?

----------

## Cyker

 *BlackB1rd wrote:*   

> Is it correct to assume that ext3 with data=journal performs better than the default (ordered) on a server platform with multiple databases and "normal files" accessed by many concurrent users? The articles found on the internet were all written many years ago and I'm not sure if those results are still valid. And my biggest question would be: why has the default changed from journal to ordered, when the former performs better on servers? Shouldn't journal be the default setting when performing a server installation?

 

AFAIK, the default journal mode has always been data=ordered.

data=journal is only faster when the filesystem is having to do lots of reads AND writes at the same time.

For mostly-reads, the other two are faster.

I used to use data=journal, but had to go back to data=ordered, as data=journal makes ext4 throw out warnings about disabling some of its features.

Note, though, that 'faster' is not an order-of-magnitude thing; it's the sort of 'faster' that is only really noticeable in benchmarks  :Smile: 

----------

## XenoTerraCide

Delayed allocation hasn't been implemented for data=journal yet in ext4; this is only a temporary limitation. As far as I know, the rest of the ext4 advantages still work.

----------

## Strowi

hi,

I've been looking into this thread from time to time... does anyone have a clue about the stride size for dmraid raid0? I am using an nvidia dmraid with 2 HDDs and one 350GB ext3 partition. I chose 64kb as the stripe size in the controller BIOS, and "dmraid -s" reports a stride size of 128; should I use 128 as the stride option for mkfs?

greetings and thx for all the tips,

----------

