# XFS on steroids

## jsosic

Hi guys! I think it's about time to start another "lovers"-like thread  :Smile:  The subject is, of course, SGI's brightest open source star: XFS. It has been around for a while, it has been used and tested, and people generally love it. It doesn't have a bad history of unexplainable bugs and unrecoverable partition corruption like ReiserFS did (except this 2.6.17.1 bug...). It's designed with speed in mind, the primary focus being large-file performance. AND it has a long history on IRIX servers and workstations. The Linux port is as strong as the original, which is not the case with some other filesystems (JFS, for example), and as far as I know they are fully compatible.

Things aren't all green, though: XFS is human-made and thus not perfect. Its flaws in terms of desktop performance are deletion speed and small files. But, some good news  :Smile:  The flaws are only present in the default configuration; things can easily be fixed with some mount and/or mkfs options. XFS is a very flexible and tunable FS. Enough talk... Let's get down to business.

Why may stock XFS not be a good choice for the root partition of a desktop OS? Because of the aforementioned deletion penalties and small-file performance (which is not as good as JFS's or ReiserFS's). I'll mention a few things I've come across that explain how to overcome these flaws. If anyone has any other XFS tips, they're welcome to speak out loud  :Smile: 

Note: I'll refer to XFS with the default settings you get when running "mkfs.xfs /dev/hd?" as "stock XFS".

1. Block size

This option can be set only when formatting a partition. The default size is 4096 bytes. The general rule here is that larger allocation blocks improve performance: they reduce the number of operations needed to retrieve a file, and they reduce fragmentation. XFS supports larger values, but currently it's impossible to use them because of a Linux kernel limitation (the block size can't exceed the architecture's page size).

It's worth mentioning block size because some people might think formatting a partition for small files with a smaller block size would increase performance, but that isn't the case. It's actually the opposite! The only benefit of smaller block sizes is in terms of disk space, and these days that really isn't a big problem.

2. Inodes

Bigger or smaller inode size? Stock XFS sets the inode size to its minimum value, 256 bytes. This is not the full inode size, but the size of the inode's variable part. So what is it for? XFS stores all kinds of data in it: attribute sets, symbolic link data, the extent list, or the root of a tree describing the location of extents. There are very rare cases in which you would benefit from a larger inode, and if you do choose to increase it, don't go over 512 bytes; anything more is real overkill. I'd advise staying with the defaults here too, because if you increase the inode size, the HDD will waste much of its precious time fetching filesystem structures instead of real user data... Be careful when increasing this! Some sites do advise it, and even I thought it was a good thing, but because XFS really doesn't pack smaller files into the inode, it isn't necessary. The inode only stores FS data, and 256 bytes seems to be big enough.

3. Allocation groups

The XFS filesystem is divided into subgroups. These groups are something like smaller partitions inside a bigger one. This allows the kernel to use parallelism: it can write to several parts of the filesystem at the same time. Of course, the disk head will still write data in one place after another, because disks have only one head, so this technique gives its benefits before data is sent to the disk. If you have too many allocation groups, your FS will be divided into many sections, and then it's very likely files will get fragmented across two or even three sections. The next bad thing is that when you fill up your FS, it'll start to use too much CPU. Those things slow things down dramatically... It used to be thought that at least 1 allocation group is needed per 4GB, but some XFS developers denied that and marked it as obsolete on LKML recently. So, what to choose here? It depends on how much parallelism you really need. This has its benefits in server usage, but for a desktop, 2 allocation groups per CPU seems quite OK (# mkfs.xfs -d agcount=2). If you're a lucky one and have a dual core, then choose 4 allocation groups and be done with it. You can set this option either as -d agsize=, which tells mkfs how large you want the allocation groups to be, or as -d agcount=, which tells it how many groups you want. I prefer the second option. Note 2: this option doesn't give much of a performance boost if your FS isn't filled up, so you can safely leave it out and let XFS choose its own value if you want.
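As a toy illustration of the "2 allocation groups per CPU" rule of thumb, here's a small shell sketch; it only prints the mkfs command rather than running it, and /dev/sdXN is a placeholder device, not anything from this thread:

```shell
# "2 allocation groups per CPU core" heuristic for a desktop box.
# nproc (coreutils) reports the number of available cores.
cpus=$(nproc)
agcount=$((cpus * 2))
# Print the command instead of executing it; /dev/sdXN is a placeholder.
echo "mkfs.xfs -d agcount=${agcount} /dev/sdXN"
```

On a single-core machine this prints the agcount=2 command from the post; on a dual core, agcount=4.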

4. Blogs everywhere, errm, ... , I mean journals  :Wink: 

XFS is a journaling FS, so it has a journal. You can set it to reside on another partition, or even use one partition as a shared journal for several other partitions. The interesting thing, though, is journal size. It has quite a big impact on FS performance when you're doing lots of I/O. A bigger journal means more room for metadata transactions, and that can improve performance. The drawback of a larger journal is that less space is available for data. Stock XFS sets the journal size quite low, so it's a really good investment to increase it; 128MB seems a fine tradeoff between space and performance. Also note that the journal is only used when you write/delete data on the disk, so this does not increase read performance.

```
# mkfs.xfs -l size=128m /dev/hd?
```

Another cool thing is the logbufs mount option. Logbufs tells XFS how many 32KB blocks of journal information to keep in RAM. The default (and minimum) is 2, and the maximum is 8. We'll choose 8, which will use 8x32KB of our precious memory  :Smile:  These two tweaks solve one XFS issue: poor deletion performance. Now we're getting somewhere!  :Smile: 
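Putting the two tweaks together: the journal size is set once at mkfs time (as above), while logbufs is a mount option, so it belongs in /etc/fstab. A hypothetical entry might look like this (device and mount point are placeholders, not taken from the post):

```
/dev/hdXY   /data   xfs   noatime,logbufs=8   0 2
```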

5. Fragmentation issues

XFS really shines in this area. It only starts fragmenting madly when a partition is more than ~85% full, and even that can be solved. To check your XFS partition's fragmentation level online (while it's mounted), use the following:

```
# xfs_db -c frag -r /dev/hdXY 
```

If you want to lower the fragmentation level, simply use:

```
# xfs_fsr /dev/hdXY 
```


Conclusion

Already? Yes  :Sad:  Let's see what stock XFS sets for all these fancy options:

```
# mkfs.xfs -isize=2k -dfile,name=/dev/null,size=214670562b -N

meta-data=/dev/null    isize=2048  agcount=32, agsize=6708455 blks
```

Seems nice? Now, let's take a look at what we have so far:

```
# mkfs.xfs -l internal,size=128m -d agcount=2
```

Cool  :Smile:  And don't forget the noatime and logbufs=8 mount options!!

I'm sorry that I haven't provided any real-life tests yet, but I'll get my hands on a Maxtor 120GB hard drive soon, and I'll need some help with suggestions for filesystem tests. My primary interest is to see how JFS stands up against XFS on steroids, although I'm willing to try Reiser3.6 and ext3+dir_index too.

XFS lovers, please join!

Last edited by jsosic on Mon Nov 06, 2006 11:12 pm; edited 2 times in total

----------

## Enlight

Well known lover in here ;o)

as for the allocation group stuff, you can also alter the behaviour via the fs.xfs.rotorstep sysctl. But beware! One file can't be fragmented between different AGs; in fact, all files belonging to a directory will go into the same AG. Generally you only switch AGs when entering another directory or a subdirectory; that's what the sysctl is for. As for the 4gb limitation becoming obsolete, could you please point me to the link?

concerning jfs, it now seems to lag really far behind other filesystems when it comes to performance.

Also, something I love about xfs: creating a 4gb image takes 3 extents, while with ext3 it falls somewhere between 350 and 550 extents...

edit: just verified, on linux you still can't mount a partition with a block size larger than the arch's page size, i.e. 4k!

----------

## feld

stock XFS keeps auto-unmounting on me and dmesg is spitting out an error about corrupted memory. My memory is 100% fine, I've tested it. This happens on the 2.6.18 RCs.

Anyway, XFS is nice, but I like ext3 a lot too.

----------

## jsosic

 *Enlight wrote:*   

> Well known lover in here ;o)
> 
> as for the allocation group stuff, you can also alter the behaviour via the fs.xfs.rotorstep sysctl. But beware! One file can't be fragmented between different AGs; in fact, all files belonging to a directory will go into the same AG. Generally you only switch AGs when entering another directory or a subdirectory; that's what the sysctl is for.

I'll try it!

 *Quote:*   

> As for the 4gb limitation becoming obsolete, could you please point me to the link?

 

http://marc.theaimsgroup.com/?l=linux-kernel&m=114843765813339&w=2

```
On Tue, May 23, 2006 at 06:41:36PM -0700, fitzboy wrote:

> I read online in multiple places that the largest allocation groups 

> should get is 4g,

Thats not correct (for a few years now).
```

That was written by Nathan Scott, an SGI XFS developer, on LKML...

 *Quote:*   

> concerning jfs, it now seems to lag really far behind other filesystems when it comes to performance.

 

Well, I don't know... I've had it on one machine and it seemed pretty fast to me...

 *Quote:*   

> edit: just verified, on linux you still can't mount a partition with a block size larger than the arch's page size, i.e. 4k!

 

Thanx!

----------

## brazzmonkey

this should go in "documentation, tips & tricks" !! thanks for this !

----------

## pactoo

Just some Sidenotes, hopefully not too off topic:

```

(except this 2.6.17.1 bug...)

```

```

"To add insult to injury, xfs_repair(8) is currently not correcting these directories on detection of this corrupt state either. This xfs_repair issue is actively being worked on, and a fixed version will be available shortly.

Update: a fixed xfs_repair is now available; version 2.8.10 or later of the xfsprogs package contains the fixed version"

```

http://oss.sgi.com/projects/xfs/faq.html#dir2

Unfortunately, not yet in portage

```

If you want, you can set it to reside on another partition

```

```

"In fact using an external log, will disable XFS' write barrier support"

```

http://oss.sgi.com/projects/xfs/faq.html#wcache_fix

...just in case write barriers are desired

----------

## jsosic

pactoo, thanx for your help! Could you please explain what those barriers do? Is it worth mounting the FS with the nobarrier option?

I'm trying to figure out a way to test the reading speed of filesystems... Like copying to another hard drive, but where the destination is /dev/null. The only way I've figured out so far is

```
time tar cvf - . | cat > /dev/null
```

Is this OK for this kind of test? Thanx.

----------

## Enlight

you can create files of a given size using dd if=/dev/zero of=$my_file bs=$my_size count=1, then cat them to /dev/null with something like:

for i in $file_1 $file_2 ... $file_n; do cat $i > /dev/null; done

Other than this, my reference tests were extracting a stage3 tarball or moving a portage subtree (without distfiles, for example), then the time needed to delete all of it. I think both give a general idea of the FS's performance.
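A minimal sketch of the read test described above, using only coreutils. The file count, 4MB size and /tmp location are arbitrary choices for illustration; a real benchmark would also drop the page cache between the write and read phases, e.g. by unmounting and remounting:

```shell
# Create a few test files with dd, then time reading them back
# through cat to /dev/null, as suggested above.
dir=$(mktemp -d)
for i in 1 2 3; do
    dd if=/dev/zero of="$dir/file_$i" bs=1M count=4 status=none
done

start=$(date +%s)
for f in "$dir"/file_*; do
    cat "$f" > /dev/null
done
end=$(date +%s)
echo "read $(ls "$dir" | wc -l) files in $((end - start))s"
```

On a freshly written (still cached) set of files this measures the cache, not the disk, which is exactly why the remount step matters on a real run.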

----------

## jsosic

I've tested this tar method, and it seems that it really works... 8 minutes for reading out my 3.5 GB /usr partition, which seems OK.

----------

## jsosic

I've begun a very intense FS performance test, and after testing only a few filesystems I've encountered really weird results... I was planning to publish it in the Documentation, Tips & Tricks section of the Gentoo Forums. My first test is a writing speed test -> "time cp -a /usr /partition". /usr and /partition are on different HDDs. The second test is a reading speed test: "time tar -cf - /partition | cat > /dev/null". One of the last tests is bonnie++, and from what I can see, bonnie++ gives me totally opposite results to the two previous tests. For example, copying the 4.1GB /usr dir takes 11 min on stock XFS and 14 min on JFS, but JFS gets 5-10% better results in the bonnie++ test. What do you think, why is this happening?

I've opened new thread with discussion about this:

https://forums.gentoo.org/viewtopic-t-489408.html

----------

## all-inc.

hi,

i just bought an amd64 notebook and i want to test xfs on it! up to now i've just used reiserfs and ext3... i'll give xfs a try.

my question now is: what could be the best setting for the -s option (sector size)? and what about the -n naming options? i'll post some benchmarks of an xfs filesystem created with these optimisations, compared to an optimised ext3 fs, soon.  :Wink: 

if u have any other performance recommendations, just say!

thank you, all-inc.

----------

## jsosic

I think the maximum block size is 8192 if you use a 64bit kernel... Anyway, try it yourself: format the partition with 8k or 16k blocks and try to mount it. If you can mount it, then that block size is supported. Under a 32bit kernel the maximum block size is 4096 bytes, so I presume 8k is the maximum under a 64bit kernel.

I would suggest an 8k naming (directory block) size - two 4kB blocks on a 32bit system...

Try something like this:

```
# mkfs.xfs -l version=1,size=64M -n size=8k -i size=1024 -f /dev/hdX 
```

Mount it with the nodiratime,noatime,logbufs=8 options. AFAIK version 1 of the log is faster than version 2, and a 64M log size is a good choice. The other options are an 8kB directory block size (naming option) and 1024-byte inodes.

And for ext3 try this:

```
mkfs.ext3 -J size=100 -m 1 -O dir_index,filetype,has_journal /dev/hdc1

tune2fs -o journal_data_writeback /dev/hdc1
```

This makes a journal of 25600 blocks (100MB on a system with 4kB blocks).

Please post your results afterwards. In my tests, this ext3 setup slams the door on XFS  :Sad: 

----------

## all-inc.

hi,

thank you, i think i'll add -b size=8192 -d agcount={size/8} as mentioned in the initial post ^^

and like i wrote, i have a 64-bit system and i'll use a kernel with 8kB paging...

unfortunately you didn't answer my important question  :Wink:  -> what about the sector size (-s)?

it would be nice if u could tell me more precise information and results about your test/benchmark. did u use bonnie++? or how did u test? what was the size of the hd you used?

other question: if v1 of logging is faster, what are the advantages of v2? who uses it?!

thanks again, all-inc.

----------

## jsosic

To tell you the truth, I don't know about sector size... I can test it though  :Smile:  Yes, I was using bonnie++ for testing the filesystems.

If you use version 2 logging you can set bigger buffers (logbsize=64k, logbsize=128k or even 256k), but in my tests that didn't give me any significant performance advantage...

----------

## all-inc.

hi, i just ran bonnie++ ^^

the results aren't nice when you look at the create/deletion times...

if someone can tell me whether i'm doing something wrong with xfs, that would be great  :Smile: 

partition size is about 24GB, cpu AMD64 Turion 1,6GHz, 1GB DDR2 RAM, live-cd (!) kernel 2.6.15-gentoo-r5

xfs mount options: noatime,nodiratime,logbufs=8

xfs_info output:

```
meta-data=/dev/hdc7              isize=2048   agcount=3, agsize=2020841 blks

         =                       sectsz=512

data     =                       bsize=4096   blocks=6062521, imaxpct=25

         =                       sunit=0      swidth=0 blks, unwritten=1

naming   =version 2              bsize=4096

log      =internal log           bsize=4096   blocks=16384, version=2

         =                       sectsz=512   sunit=0 blks

realtime =none                   extsz=65536  blocks=0, rtextents=0
```

tune2fs -l output (important things):

```
Filesystem OS type:       Linux

Inode count:              3031040

Block count:              6060513

Reserved block count:     60605

Free blocks:              5939758

Free inodes:              3031029

First block:              0

Block size:               4096

Fragment size:            4096

Blocks per group:         32768

Fragments per group:      32768

Inodes per group:         16384

Inode blocks per group:   512

Mount count:              1

Maximum mount count:      26

First inode:              11

Inode size:        128

Journal inode:            8

Default directory hash:   tea

Journal backup:           inode blocks
```

reiserfs 3.6 is used, i'm too lazy to write out the info right now ^^ if someone wants to know, feel free to ask and i'll post it

at last but not least, the bonnie results:

```
---xfs---

Version 1.93c       ------Sequential Output------ --Sequential Input- --Random-

Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--

Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP

ksjuscha      2000M   450  98 30239   8 14232   4   774  97 31358   5 170.9   2

Latency             20650us    9148ms     185ms     122ms   27909us     285ms

Version 1.93c       ------Sequential Create------ --------Random Create--------

ksjuscha            -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--

              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP

                 16  1901  14 +++++ +++  2289  15  1436  13 +++++ +++   511   3

Latency               163ms      75us     229ms     116ms      42us    1766ms

---reiserfs---

Version 1.93c       ------Sequential Output------ --Sequential Input- --Random-

Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--

Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP

ksjuscha      2000M   136  99 28853  16 13939   5  1023  99 29960   7 179.3   4

Latency               160ms    4121ms    1781ms   19500us   57976us    2839ms

Version 1.93c       ------Sequential Create------ --------Random Create--------

ksjuscha            -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--

              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP

                 16 11049  89 +++++ +++ 11669  99 11334  92 +++++ +++ 11034  99

Latency             13572us    1791us    1934us     308us      42us    1579us

---ext3---

Version 1.93c       ------Sequential Output------ --Sequential Input- --Random-

Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--

Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP

ksjuscha      2000M   205  98 11841   9  8356   5  1013  96 26845   5 171.9   2

Latency               127ms    2067ms    2350ms   55418us     103ms    1042ms

Version 1.93c       ------Sequential Create------ --------Random Create--------

ksjuscha            -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--

              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP

                 16 22648  80 +++++ +++ 23784  79 22316  78 +++++ +++ 24884  83

Latency             13549us     339us     423us   21742us      41us      74us
```

have fun with it... this is a new system, as u can see; no hdparm tuning or anything has been taken into account here  :Wink:  i just wanted to show a rough result.

good night, all-inc.

----------

## jsosic

What parameters did you run bonnie with? Here's what I've tested:

```
# bonnie++ -u root:root -b -x 1 -s 2016m -n 16:100000:16:64 -d /bonnie/
```

After writing 2GB of data, bonnie++ creates 16x1024 files distributed over 64 directories, sized between 16 bytes and 100kB. It approximates the situation an FS encounters when loading programs on a Unix system (/bin, /usr, /opt, /etc ....)
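To get a feel for what that small-file phase does, here's a coreutils-only imitation, scaled way down (4 directories x 5 files instead of bonnie's 16x1024 files over 64 directories, and a deterministic size formula standing in for bonnie's random 16-byte-to-100kB range):

```shell
# Scatter small files of varying size across a few directories,
# mimicking (at tiny scale) bonnie++'s small-file create phase.
root=$(mktemp -d)
for d in 0 1 2 3; do
    mkdir -p "$root/dir_$d"
    for f in 0 1 2 3 4; do
        # Deterministic pseudo-varied size, 16 bytes and up (a stand-in
        # for bonnie's random 16:100000 size range).
        size=$(( ((d * 5 + f) * 977 % 32768) + 16 ))
        head -c "$size" /dev/zero > "$root/dir_$d/file_$f"
    done
done
find "$root" -type f | wc -l   # prints 20
```

Creating, reading back and deleting a tree like this (at full scale) is exactly the workload where the journal and logbufs tweaks from the first post pay off.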

I'll upload the results later, but from what I can see, ReiserFS and ext3 perform very, very poorly against even badly optimized XFS... JFS showed up as the fastest FS in the test, which may have something to do with its low CPU usage.....

----------

## jsosic

Here are my tests:

http://adria.fesb.hr/~jsosic/mojbench.html

If you prefer OO.org 2.0, just change extension from ".html" to ".ods".

I'm awaiting comments  :Smile:  JFS rocks!

----------

## zAfi

this may be a little off-topic, but what do you need the mount options nodiratime and logbufs for?? Do they improve anything? 'Cause I didn't find anything in the man page or anywhere else, so plz a brief explanation!!  :Very Happy: 

thx....

----------

## jsosic

It's not off-topic  :Smile: 

"nodiratime" is the same as "noatime", but it affects directories (as opposed to noatime, which affects files). A brief explanation is in mount(8). I don't know whether noatime implies nodiratime or not, so I always mount with both of them.

Logbufs is an XFS-specific feature, and the explanation is in mount(8).

 *Quote:*   

> logbufs=value  
> 
>               Set the number of in-memory log buffers.   Valid  numbers  range
> 
>               from 2-8 inclusive.  The default value is 8 buffers for filesys-
> ...

 

More log buffers mean better throughput when the filesystem is doing lots of writes... Every write to the journal costs time: the HDD head must be relocated to the journal blocks and then back to the data blocks. So, as you can see from my tests, to tune XFS it's enough to format it with a larger journal and mount it with logbufs=8.

----------

## zAfi

thx for the explanation, and yes, I did find the answers in man mount now as well!  :Wink: 

I have an external HDD with XFS on it. This is the output of xfs_info:

```
meta-data=/dev/seagate           isize=256    agcount=16, agsize=2441879 blks

         =                       sectsz=512   attr=0

data     =                       bsize=4096   blocks=39070064, imaxpct=25

         =                       sunit=0      swidth=0 blks, unwritten=1

naming   =version 2              bsize=4096

log      =internal               bsize=4096   blocks=19077, version=1

         =                       sectsz=512   sunit=0 blks

realtime =none                   extsz=65536  blocks=0, rtextents=0

```

It was formatted under SuSE Linux 10.0, I think with default values. To make it xfs_on_steroids, is this all I have to format it with?

```
mkfs.xfs -l internal,size=64m -i size=2048 -d agcount=19
```

thx...

----------

## jsosic

You'll lose all your data if you format the disk, you know that? Also, you can set a 128m journal, even bigger than 64. I was mistaken in my first post: as these tests show, bigger inode sizes don't increase the speed of the FS, they degrade it instead  :Sad:  I misunderstood the man page and followed advice I saw on one page explaining XFS; in fact a bigger inode slows it down rather than speeding it up.

If you use your external disk only for large files and don't write/delete lots of files often (like compiling, installing programs, emerge sync...), then there's no need to reformat your drive. These tweaks are only to speed up XFS with small files.....

----------

## all-inc.

puh, these benchmarks confuse me...

(i just wanted to know which FS has the best performance for /, so it has to handle lots of small files and compiling well. my /home dir will be on a separate partition. i'm also wondering what's best for that one; it has to handle bigger files...)

why didn't you use -b size=8192 in your tests, jsosic? not supported by your kernel?

and why did you set the nointegrity mount option for jfs? that disables writing to the journal, isn't that bad? *g*

i ran bonnie with defaults, without options (only -s 2000m). why did you use 2016 instead of 2000?? i will rerun bonnie with your options soon and present my results. did you use any batch tool for creating this nice table? or did you manually reformat and rerun bonnie for every test (and then manually put it in a table)?

ok, results will follow soon   :Cool: 

EDIT: won't you change the inode size part in your initial post, so nobody reads only the first post and creates a slow fs ^^ ?

----------

## zAfi

 *jsosic wrote:*   

> You'll loose all your data if you format disk, you know that? 

 

Yes yes, I know!   :Very Happy: 

 *jsosic wrote:*   

> ... your external disk only for large files and don't write/delete lots of files often (like compiling, installing programs, emerge sync...), than there's no need to reformat your drive. These tweaks are only to speed up XFS with small files.....

 

What is a "small file" for you? Some small text files with some KB or standard mp3 with 3-5 MB?

----------

## jsosic

zAfi, a small file is everything <50kB  :Smile:  2-5MB is a big file!

all-inc, I too wanted to know what the fastest FS for / is. But note this: the root partition FS should have fast reading, not fast writing! Because you emerge a new program now and then, but you use it every day, so reading is the key to the best performance.

Also, to minimize fragmentation, you should move the portage tree, /usr/src and tmp directories off your root partition. This is the scheme I came up with after these tests:

```
/dev/hda1      /boot      ext2   defaults,noauto         1 2

/dev/hda2      /      jfs   defaults,noatime,nodiratime   0 1

/dev/hda3      /var      xfs   logbufs=8,noatime,nodiratime   0 2

/dev/hda6      /home      xfs   logbufs=8,noatime,nodiratime   0 2
```

Also, I've relocated the portage tree and distfiles from /usr to /var, I've relocated the kernel sources from /usr/src to /var/src, and all compilation happens on this /var partition. Root (/) is only for programs, nothing more. To make all this a reality, I've used symlinks and make.conf variables to connect the system with the original file locations. Now you may get the idea why I tested JFS with nointegrity: because there's no evil in using an FS without a journal for a partition that has no valuable data (portage tree, kernel sources, tmp files...), and to measure it against ext2 (which I've used so far for this purpose). After all, I decided to format this partition (hda3) as XFS, because its utils include a defragmenter (xfs_fsr), and it will be needed on that partition now and then  :Smile: 
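The relocation described above can be sketched like this. To keep the sketch safe to run as-is, it works under a throwaway prefix instead of the real filesystem root; on a real system you'd operate on / directly, with nothing using those directories at the time:

```shell
# Move build-heavy trees (portage, kernel sources) from /usr to /var
# and leave symlinks behind, as described in the post.
ROOT=$(mktemp -d)                       # stand-in for the real /
mkdir -p "$ROOT/usr/portage" "$ROOT/usr/src" "$ROOT/var"

mv "$ROOT/usr/portage" "$ROOT/var/portage"
mv "$ROOT/usr/src"     "$ROOT/var/src"

# Relative links: /usr/portage -> ../var/portage resolves to /var/portage.
ln -s ../var/portage "$ROOT/usr/portage"
ln -s ../var/src     "$ROOT/usr/src"

ls -l "$ROOT/usr"
```

Anything that still opens /usr/portage or /usr/src follows the symlink onto the /var partition, which is the whole point of the scheme.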

I'm on an x86 kernel (AthlonXP Barton Mobile @ 2000MHz, 1GB of RAM), so that's why I couldn't test 8kB blocks  :Sad:  I used 2016 because my system reported 1006MB of RAM (or something like that), and for the bonnie tests to work you need to pass -s [double your RAM]. As far as the tables are concerned, I did them by hand... Yes, I manually reformatted, remounted with the specified options, and reran the bonnie tests. I had a free day with nothing to do except learn, so I was just running tests and copying results into a txt file, which I later formatted into OpenOffice spreadsheet tables and exported to HTML. I'll edit my original post and remove the inode info, and when I get time I'll rewrite the complete "XFS on steroids" and post it under Documentation, Tips & Tricks, with parts of the man pages included to explain all the options.

I'm kinda confused by the results too, because I thought Reiser handled small files much better. I wanted to test Reiser4, and I have it in my kernel, but for some strange reason the bonnie test was still running after half an hour, while on the other FSes it was over in less than 15 minutes, so I decided to spare the HDD  :Smile: 

Here's my box:

AthlonXP Barton mobile 2000mhz

Kernel 2.6.16 (beyond 4.1 patch)

1024MB of ram, Maxtor 120GB (30GB partition).

If you want to rerun the tests, please do so, but you only need to do it for a few filesystems; there's no point in repeating all this XFS testing.....

1. ext3 with dir_index and data=writeback

2. jfs with double journal size (-s [0.8% of your partition size])

3. ReiserFS v3.6

4. ReiserFS v4

5. XFS with 128mb log and logbufs=8 option

----------

## zAfi

hvala! (thanks!)

I'll leave it as is 'cause it works great, and even better now with those 3 new mount options!! thx again...

----------

## Cinquero

Well, I really cannot reproduce some of the results here. To be honest, I don't like synthetic benchmarks at all and usually run more realistic tasks. For example, I compared tarring two different portage trees -- each three times -- in parallel on ext3 and xfs. That is, 6 parallel "tar cf" commands tarring the two portage trees from the benchmark disk to the same disk.

On ext3, it took 847 seconds. On xfs, only 40% of the full tar file size had been reached after the same amount of time.

I don't think I will try the extra mount options as everyone should really know by now that XFS is not suited for desktop use.

Use XFS for your DV scratch disk. ext3 should still be the best solution for the usual desktop use.

----------

## jsosic

Wanna talk synthetic? Well, untarring 3 tars at the same time on the FS is more synthetic than bonnie... That's an action that will very rarely, and I would dare to say never, occur on a desktop system... Also, the filesystem you use for your desktop should have fast reading, low latencies and fast seeks, and bonnie tests just that. It creates files with names in sequential order (1234, 1235, 1236, 1237, 1238), but writes them randomly on the disk (e.g. { [1237] [1234] [1238] [1236] [1235] }), and then reads them in numerical order, so the seek times of the FS really come into play. If the files are of random size, you've got a real-life test. Untarring is pretty straightforward and happens once in the lifetime of a program. You emerge a program once and run it X times after that! I would trade all these filesystems for one that writes files 5 times slower but organizes them ideally, so that read times are 1.5x to 2x faster than XFS/JFS/ext3!!! A desktop would benefit from such an approach. Also, you're forgetting the importance of partition layout and placement of files on the partition(s); sometimes it's far more important than the FS...

----------

## Cinquero

 *jsosic wrote:*   

> Wanna talk syntetic? Well, untaring 3 tars at the same time on the FS is more syntetic than bonnie...

 

I haven't seen any concurrent-access timings in your results. That's mainly why I call it synthetic. I often untar/tar large archives, sync the portage tree, run updatedb, run some checksum operations, and edit large images in gimp at the same time (well, maybe not ALL of it at the same time, but some), and THAT bogs down my system extremely. And that is what I personally feel is the most critical situation for a desktop, because desktop latency then goes up to 1-2 minutes... even though I am using ionice, CFQ, and the big kernel lock (and I have 1280 MB RAM).

----------

## jsosic

Well, if you run that many concurrent operations on the same partition, then ext3 with the data=journal mount option is the way to go....

----------

## jsosic

Bonnie has an option for defining the number of concurrent requests  :Smile:  And as I said earlier, I would give up all the write performance on some partitions just for a few MB/s faster reads....

----------

## brazzmonkey

so, in short, for an xfs 32-bit desktop system and a 16 gb partition, you would recommend:

```

mkfs.xfs -l version=1,size=64M -n size=8k -i size=1024 -d agcount=2
```

then i could mount my partition using the following options:

```

noatime,nodiratime,logbufs=8
```

would that be ok ?

----------

## all-inc.

DON'T use mkfs.xfs -i size=1024. this suggestion is wrong, as described in this thread; it was just a mistake in the initial post... use -d agcount=n, where n is the size of your partition in GB divided by 4 to 8. and you don't have to use the nodiratime mount option, it is implied by noatime.

BTW which way you all convert your root filesystems? i booted a livecd(2006.1) and run 

```
rsync -e rsh -aSq
```

to put it all onto free space on another box. i just had to type the rsh and rsync commands with their full paths, because the gentoo minimal livecd doesn't provide them (/mnt/gentoo/usr/bin/{rsync,rsh}). i first tried tar clafS, which also seems ok but doesn't handle sockets... my partition layout now (of course everything mounted with noatime):

```
/dev/hda8             6,8G  4,1G  2,8G  61% /   jfs (journal size=0.8%)

udev                  252M  296K  251M   1% /dev

/dev/hda9             3,8G  1,8G  2,1G  47% /var  xfs (logbufs=8,nobarrier(since 2.6.17, my hd doesn't support...))

/dev/hda10            3,8G  1,2G  2,7G  30% /home xfs

/dev/hda7              33G   17G   16G  53% /mnt/media  ntfs-3g(so win can access them...sometimes unfortunately neccesairy :( but the new ntfs-3g performance is nice)

none                  252M  4,0K  252M   1% /dev/shm

/dev/hda6              24G   22G  1,9G  93% /mnt/data  ntfs-3g
```

----------

## Don-DiZzLe

OK, first of all: do I need to make an XFS partition first with, for example, gparted, and then enter the following code

```
mkfs.xfs -b size=8192 -l internal,size=128m -d agcount=20 /dev/sdb1
```

to get an XFS-on-steroids partition, or do I just input the code directly in the terminal without creating an XFS partition first?

----------

## brot

Thank you for your tips. I have been using XFS for 3 years now, and its first use was on my router. From time to time its power cord got pulled out while it was running, but XFS has survived until now, and I think it will for the next 3 years  :Wink: 

----------

## jsosic

 *Don-DiZzLe wrote:*   

> Ok, first of all do I first need to make an XFS partition with for example gparted and then enter the following code

 

You can enter the code in the shell directly. I presume you're new to Linux, so think of it like typing "format d:" in DOS  :Smile: 

BTW, for all of you who didn't know, the xfs utilities include xfs_fsr, a defragmenter (filesystem reorganizer). So, to check the current state of your XFS partition (online), type this as root:

```
xfs_db -c frag -r /dev/hdXY
```

That will tell you the fragmentation percentage of that partition. After that, simply run:

```
xfs_fsr
```

to reorganize all XFS partitions defined in fstab.

Good luck!  :Smile: 
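Those two commands script together nicely. As a small sketch (the sample line and the 20% threshold are made up for illustration), the fragmentation factor can be pulled out of the xfs_db output to decide whether a defrag pass is worth running:

```shell
# Sample line in the format `xfs_db -c frag -r` prints (values are invented).
LINE="actual 321901, ideal 1129, fragmentation factor 99.65%"

# Strip the trailing % from the last field to get the bare number.
FRAG=$(echo "$LINE" | awk '{gsub(/%/,"",$NF); print $NF}')

# Only bother defragmenting above an arbitrary threshold.
THRESHOLD=20
if awk -v f="$FRAG" -v t="$THRESHOLD" 'BEGIN{exit !(f > t)}'; then
    echo "fragmentation ${FRAG}%: would run xfs_fsr"
fi
```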

----------

## sloppy

About allocation groups in XFS...  When I was reading a paper (warning: 8-megabyte PDF!  See the chapter called "Exploring high bandwidth filesystems on large systems" by the SGI employees) about XFS scalability, it became apparent just what enormous workloads XFS was intended for.  The default number of allocation groups that mkfs.xfs creates, and that whole business about megabytes per allocation group, were probably intended for ridiculously large scales.  If you're putting together a gigantic hundred-disk enterprise server used by thousands of concurrent users, then maybe the defaults make sense (but on the other hand, you're probably not using Gentoo).

For a desktop system or even a medium-business server, the defaults are way too high.  When choosing an allocation group count, the thing to think about is how many parallel writes you're going to have going on at once, and I mean writes that extend files, not writes into the middle of files as you would have with relational databases.  The size of your volume does not matter, so don't choose it by dividing gigabytes by some constant.  Choose it by thinking about file-extending writes.

On desktops and light servers, I've never used an agcount higher than 4, and I've been pretty happy so far.
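Following that reasoning, a desktop-style format would look something like the line below. This is a sketch rather than a tested recommendation: the log size is illustrative, and the sparse file stands in for a real /dev node so that mkfs.xfs -N (dry run) can preview the geometry without touching a disk.

```shell
# Create a 1 GiB sparse file as a stand-in for a real partition.
truncate -s 1G /tmp/xfs-demo.img

# -N prints the resulting geometry without writing anything;
# drop it (and point at your real device) to actually format.
mkfs.xfs -f -N -l internal,size=64m -d agcount=4 /tmp/xfs-demo.img
```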

----------

## jsosic

Excellent point! We're all going to keep that in mind...

----------

## sloppy

Sorry about the link to the 8MB PDF.  I found a much smaller PDF that just includes the XFS scalability paper all by itself.

----------

## stahlsau

heya

thanks for the post. I've been running XFS for some years now and never had a problem; it's perfectly stable, it doesn't get corrupted when you pull the power, and it's faster than any other fs I've tried before. Well, maybe jfs could be faster, but it bombed my hd and made me restore a 3-month-old backup  :Smile: 

The logbufs mount option seems to help a lot - no more painfully slow deletes  :Wink: 

Anyway, maybe you could update your first post with the new insights you got in this thread. That would help people a lot by not forcing them to reformat after reading the second page  :Wink: 

kthxbye

----------

## jsosic

I've fixed original post, and even included xfs_db & xfs_fsr hints.

----------

## vipernicus

Can you defrag XFS while it is online/mounted?

----------

## brot

yes you can  :Smile: 

(with xfs_fsr as root)

----------

## kos

This is strange, but xfs_fsr is missing from my system.

 *Quote:*   

> 
> 
> root@kos ~ $ equery f xfsprogs | grep bin
> 
> /sbin
> ...

 

Is there any other way to defragment XFS?

----------

## brot

I forgot: you have to emerge xfsdump first  :Wink: 

----------

## vipernicus

Do you guys use CFQ or Deadline with XFS?  And why?

----------

## kos

 *brot wrote:*   

> I forgot: you have to emerge xfsdump first 

 

thanks  :Smile: 

----------

## stahlsau

 *Quote:*   

> Do you guys use CFQ or Deadline with XFS? And why?

 

I use CFQ. Why? I like the name.

Seriously, I never noticed a difference when I switched schedulers, so I stayed with the default.

----------

## Don-DiZzLe

Hello,

I would like to make a 20GB partition using the XFS-on-steroids command:

ubuntu@ubuntu:~$ sudo mkfs.xfs -l internal,size=128m -d agcount=2 /dev/sda1

Cannot stat /dev/sda1: No such file or directory

How do I go about it?

----------

## stahlsau

 *Quote:*   

> ubuntu@ubuntu:~$ sudo mkfs.xfs -l internal,size=128m -d agcount=2 /dev/sda1
> 
> Cannot stat /dev/sda1: No such file or directory 

 

1st: use a REAL(tm) distro...for example gentoo  :Wink: 

2nd: try "ls /dev". I bet /dev/sda1 isn't there, so you've probably got something misconfigured in your kernel. Or maybe the drive is detected as IDE, like /dev/hda or /dev/hdb? It's not xfs's fault that the device isn't there; it could be some udev thing or something.
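A slightly narrower version of that "ls /dev" check (the sd/hd patterns are just the common IDE/SATA naming conventions, not guaranteed to match every setup):

```shell
# List only disk-like device nodes instead of eyeballing all of /dev.
ls /dev | grep -E '^(sd|hd)[a-z][0-9]*$' || echo "no sd*/hd* nodes found"
```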

----------

## irondog

I'm using XFS on LVM2. Is this a problem?

```
Filesystem "dm-1": Disabling barriers, not supported by the underlying device

XFS mounting filesystem dm-1

Ending clean XFS mount for filesystem: dm-1
```

----------

## Sachankara

 *irondog wrote:*   

> I'm using XFS on LVM2. Is this a problem?
> 
> ```
> Filesystem "dm-1": Disabling barriers, not supported by the underlying device
> 
> ...

 Nope.  :Smile: 

----------

## irondog

 *Sachankara wrote:*   

> Nope. 

 Nope? Wouldn't I have barriers otherwise, i.e. when not using LVM? Or are barriers worthless?

----------

## whiskas

I'm using XFS on top of some EVMS volumes.

I'm getting the same barriers-not-supported message when mounting, but otherwise everything works all right.

I'm a little bit curious about the in-kernel device-mapper implementation and why it doesn't know how to implement write barriers on top of the real hardware. Perhaps someone with more knowledge could provide more information on this...?

----------

## Thaidog

What is the status on Linux and XFS realtime subvolumes? (GIO... etc)

Does 2.6 support any of that and if so how do you implement it?

----------

## irondog

What could be the possible consequence of not having write barriers enabled? Anyone?

I have to say there is very little information to be found about JFS. So I decided to find out myself. I'm testing a combination of JFS / XFS now. The overall feeling is quite good, but that's also the case whenever I start using a freshly formatted ext3 filesystem. So let's see what happens as time goes by.

----------

## Thaidog

 *irondog wrote:*   

> What could be the possible consequence of not having write barriers enabled? Anyone?
> 
> I have to say there is very little information to find about JFS. So I decided to find it out myself. I'm testing a combination of JFS / XFS now. The overall feeling is quite good, but that's also the case whenever I start using a newly formatted ext3 filesystem. So lets see what happens when time goes by.

 

In some situations the log data will be kept in cache instead of being written to disk first. If your system crashes or loses power, your disk could lose serious data or metadata, to the point of not being recoverable.

----------

## biggyL

Hello All,

I have a Pentium II (Deschutes) with a first 10GB disk (/dev/hda) and a second 60GB disk (/dev/hdc).

After reading this thread and some SGI docs and FAQs I came up with these options for creating the FS and mounting the disks:

1) To create XFS on hda:

```
# mkfs.xfs -l internal,size=128m -d agcount=2 /dev/hda
```

I've also seen "d unwritten=0" option:

mkfs  Unwritten Extents

 Unwritten extents are used to support pre-allocation.

 Default is enabled.

 To disable unwritten extents you would use:

# mkfs d unwritten=0 device

Filesystem write performance may be negatively affected for unwritten file extents,

since extra filesystem transactions are required to convert extent flags for the

range of the file written.

So my question:

Is it safe to add the -d unwritten=0 option to increase performance, like this (or will I lose some essential functionality)?

```
# mkfs.xfs -l internal,size=128m -d agcount=2,unwritten=0 /dev/hda
```

2) To prevent data loss in case of a power outage (disabling the write-back cache):

Add the following to local.start:

```
# hdparm -W0 /dev/hda

# hdparm -W0 /dev/hdc 

# blktool /dev/hda wcache off

# blktool /dev/hdc wcache off
```

Right?

3) Mount options:

On this thread it's suggested that the mount options should be "noatime,logbufs=8"

But what about the "osyncisdsync" mount option?

 *Quote:*   

> osyncisdsync
> 
> Writes to files opened with the O_SYNC flag set will behave as if the O_DSYNC flag had been used instead. This can result in better performance without compromising data safety. However, timestamp updates from O_SYNC writes can be lost if the system crashes. Use osyncisosync to disable this setting.

So do you think it is safe to add "osyncisdsync" mount option to fstab?

I'd appreciate any comments/answers.

----------

## jsosic

 *biggyL wrote:*   

> 
> 
> I've also seen "d unwritten=0" option:
> 
> mkfs  Unwritten Extents
> ...

 

I think the performance gains from disabling unwritten extents are negligible, so I would rather leave that option out. Write performance is affected because the FS needs to convert extents from unwritten to written across the extent of the file being written. If someone could run a few tests, that would be nice, but I don't think it's some magic option.

 *biggyL wrote:*   

> 
> 
> 2) To prevent data lost in case of power outage(Disabling the write back cache):
> 
> Add the following to local.start:
> ...

 

Disabling the write-back cache also significantly reduces performance, but then again, test it first; maybe it's not such a big deal.

 *biggyL wrote:*   

> 3) Mount options:
> 
> On this thread it's suggested that the mount options should be "noatime,logbufs=8"
> 
> But what about "osyncisdsync" mount option.
> ...

 

I haven't tested this option either... Could someone run Bonnie and post results?

----------

## octoploid

 *jsosic wrote:*   

> 
> 
>  *biggyL wrote:*   3) Mount options:
> 
> On this thread it's suggested that the mount options should be "noatime,logbufs=8"
> ...

 

There is no need to add "osyncisdsync" to fstab, because it has been enabled by default since 2002...

----------

## biggyL

octoploid

Right:

```
[esandeen@neon linux-2.6.20]$ grep -r osyncisdsync fs/xfs/xfs_vfsops.c
                } else if (!strcmp(this_char, "osyncisdsync")) {
        "XFS: osyncisdsync is now the default, option is deprecated.");
[esandeen@neon linux-2.6.20]$
```

jsosic

Here is a response from Timothy Shimmin of SGI (at least his e-mail address has @sgi.com    :Smile: ) to my "unwritten=0" query:

My understanding (although I'm not familiar with that code), is that unwritten extents are used in space preallocation.

So unless you reserve space for a file it will not have an effect.

And if you do, then setting "unwritten=0" will speed up writes because it doesn't need to flag the unwritten extents and write out the extra transactions for this.

If the unwritten extents aren't flagged as such then there can be a security issue where one can read old data (other's data) for these unwritten parts.

In fact, the security issue on preallocation (1997-98 sgi-pv#705217) was what motivated the idea of flagging extents as unwritten in the first place.

----------

So my choice is to set "unwritten=0" on this particular machine (a PII with only one console-access user: root)   :Very Happy: 

----------

## prymitive

I tried xfs but it was so slooooooooow. Good thing I figured it out: mount Your xfs partition with nobarrier. Barriers are the default since 2.6.17 and they flush the write cache too often.

----------

## dentharg

I am really interested in data journaling in XFS. Is it in there somewhere? I have had quite a few lost files on XFS when the power goes down (those files were filled with garbage). Has anything changed in that regard?

----------

## fik

 *prymitive wrote:*   

> I tried xfs but it was so slooooooooow, good thing I figure it out, mount Your xfs partition with nobarrier, it is default since 2.6.17 and it flushes write cache to often.

 

You are right, "nobarrier" speeds up the write performance of xfs significantly  :Very Happy: , as evidenced by my small benchmark:

```
time emerge bash

nobarrier,logbufs=8 : 2:48.237
logbufs=8           : 3:24.496
nobarrier           : 2:50.432
default             : 3:23.414
```

(I added code to fill the cache with other data before each emerge, and I did 4 runs; the real time of the quickest is reported.)

But is the "nobarrier" safe?  :Confused: 

----------

## blkdragon

hey, i've been trying out xfs, but don't know what options to put in fstab for it...

I'm running a P4 1.6Ghz with a 20Gb, and a 6.5G HDD, and xfs is on the 6.5Gb hdd...

any suggestions?

----------

## biggyL

blkdragon

Below is my fstab as an example:

```
/dev/hda1               /                       xfs             noatime,nodiratime,logbufs=8    1 1

/dev/hda2               none                    swap            sw                              0 0

/dev/hdc3               /var/tmp/portage        xfs             noatime,nodiratime,logbufs=8    0 0

/dev/hdc2               /data                   xfs             noatime,nodiratime,logbufs=8    0 0

/dev/hdc1               none                    swap            sw                              0 0

/dev/cdrom              /mnt/cdrom              iso9660         noauto,ro                       0 0

/dev/fd0                /mnt/floppy             auto            noauto                          0 0

# NOTE: The next line is critical for boot!

proc                    /proc           proc            defaults        0 0

# glibc 2.2 and above expects tmpfs to be mounted at /dev/shm for

# POSIX shared memory (shm_open, shm_unlink).

# (tmpfs is a dynamically expandable/shrinkable ramdisk, and will

#  use almost no memory if not populated with files)

shm                     /dev/shm        tmpfs           nodev,nosuid,noexec     0 0
```

Last edited by biggyL on Tue May 01, 2007 8:31 am; edited 1 time in total

----------

## biggyL

Hello All,

I'd like to share an xfsdump script (which I wrote a while ago) that I'm using to rotate 1 full and 6 incremental backups during the week.

I'm using xfsdump to make the dumps and xfsinvutil to prune (manage) sessions from the date of the xfsdump.

```
# cat /scripts/xfsdump.sh

#!/bin/bash

# Prune the inventory entry for this weekday's media label before reusing it.
DATE=`/usr/bin/date +%m/%d/%Y`
/usr/bin/xfsinvutil -n -m "file`/usr/bin/date +%w`" -M "`/usr/bin/uname -n`:/" $DATE

# Remove last week's dump file for this weekday.
/usr/bin/rm /data/backups/backup`/usr/bin/date +%w`.file

# Dump / at today's weekday number: level 0 (full) on Sunday, 1-6 (incremental) otherwise.
/usr/bin/xfsdump -e -l `/usr/bin/date +%w` \
    -L "dump hda1(/) of `/usr/bin/uname -n`.`/bin/dnsdomainname` at `/usr/bin/date +%F` level `/usr/bin/date +%w`" \
    -M "file`/usr/bin/date +%w`" \
    -f /data/backups/backup`/usr/bin/date +%w`.file /

# Keep a copy of the xfsdump inventory alongside the backups.
/usr/bin/cp -R /var/lib/xfsdump/. /data/backups/xfs_inventory_backup/
```

This is a sample cronjob:

```
10 1 * * * (/scripts/xfsdump.sh | /bin/mailx -s "`/usr/bin/uname -n`.`/bin/dnsdomainname` daily xfsdump level `/usr/bin/date +\%w` status" leonk@mydomain.com)
```

Any comments are very appreciated.

Enjoy
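The rotation packed into those "date +%w" backquotes can be unrolled like this (a sketch, not part of the original script). Weekday 0 is Sunday, so Sunday's run is the level-0 full dump and Mon-Sat produce incrementals at levels 1-6:

```shell
# Show which dump level and file each weekday number maps to.
for w in 0 1 2 3 4 5 6; do
    echo "weekday $w -> xfsdump level $w -> /data/backups/backup$w.file"
done
```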

----------

## timbo

Is this bad?...

```

dinglemouse ~ # /usr/bin/xfs_db -c frag -r /dev/hda4

actual 321901, ideal 1129, fragmentation factor 99.65%

dinglemouse ~ # xfs_fsr /dev/hda4

/media start inode=0

insufficient freespace for: ino=1107: size=8175333596: ignoring

dinglemouse ~ # /usr/bin/xfs_db -c frag -r /dev/hda4

actual 246182, ideal 1126, fragmentation factor 99.54%

dinglemouse ~ #

```

After something like four hours of thrashing the HDD for a 0.09% improvement... this is on my mythtv videos partition.

```

dinglemouse ~ # df -h

Filesystem            Size  Used Avail Use% Mounted on

/dev/hda3             9.2G  5.4G  3.5G  61% /

udev                  252M  2.7M  249M   2% /dev

/dev/hda4             223G  216G  7.1G  97% /media

/dev/sda1             459G  192G  244G  45% /media/TheStore

shm                   252M     0  252M   0% /dev/shm

dinglemouse ~ #      

```

Regards

Tim

 :Cool: 

----------

## fik

 *timbo wrote:*   

> Is this bad?...
> 
> ```
> 
> dinglemouse ~ # /usr/bin/xfs_db -c frag -r /dev/hda4
> ...

 

I think your partition is heavily fragmented, and you cannot defragment it because there is not enough free space. First move some files off to make space and then re-run xfs_fsr. It seems you need at least 8175333596 bytes, i.e. 7.6 GB instead of the 7.1 GB actually free, but if you have much more free space, xfs_fsr will be faster.

Or you may try shake http://vleu.net/shake/
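As a sanity check on those numbers (integer shell arithmetic, so it rounds down), the size from the xfs_fsr error message converts like this:

```shell
# The extent size xfs_fsr refused to move, taken from the error message above.
BYTES=8175333596
echo "$BYTES bytes is about $(( BYTES / 1024 / 1024 / 1024 )) GiB (plus change)"
```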

----------

## vipernicus

I don't see how XFS is faster than other filesystems.  

On ext3:

time tar -xvjf linux-2.6.21.tar.bz2

takes ~31s

On XFS:

time tar -xvjf linux-2.6.21.tar.bz2

takes ~58s

This is on a SATA-II drive.  For my everyday workload, it seems that XFS is the SLOWEST filesystem (next to NTFS, that is).  

I tried standard XFS and then tried following this guide as well, but I get about the same performance either way.

The only thing my SATA controller does not support is NCQ.  Does XFS perform drastically better with NCQ support?

----------

## prymitive

 *vipernicus wrote:*   

> I don't see how XFS is faster than other filesystems.  
> 
> On ext3:
> 
> time tar -xvjf linux-2.6.21.tar.bz2
> ...

 

Did You try it with mount -o nobarrier?

----------

## vipernicus

 *prymitive wrote:*   

>  *vipernicus wrote:*   I don't see how XFS is faster than other filesystems.  
> 
> On ext3:
> 
> time tar -xvjf linux-2.6.21.tar.bz2
> ...

 

It wasn't in the guide; what does it do?

----------

## a7thson

 *vipernicus wrote:*   

>  *prymitive wrote:*    *vipernicus wrote:*   I don't see how XFS is faster than other filesystems.   
> 
> Did You tried with mount -o nobarrier ? 
> 
> Wasn't in the guide, what is its use?

 

This became an issue around kernel 2.6.17, where write barriers became a default option; some kind soul pointed me in this direction (http://lkml.org/lkml/2006/5/19/33), which is a thread on LKML where an XFS developer discusses the merits/tradeoffs of write barriers and comments on some benchmark results.  They (the SGI XFS devs) later addressed this in the XFS FAQ, where the official word from (http://oss.sgi.com/projects/xfs/faq.html) is:

```

Write barrier support.

Write barrier support is enabled by default in XFS since 2.6.17. It is disabled by mounting the filesystem with "nobarrier". Barrier support will flush the write back cache at the appropriate times (such as on XFS log writes). This is generally the recommended solution, however, you should check the system logs to ensure it was successful. Barriers will be disabled and reported in the log if any of the 3 scenarios occurs:

    * "Disabling barriers, not supported with external log device"

    * "Disabling barriers, not supported by the underlying device"

    * "Disabling barriers, trial barrier write failed" 

If the filesystem is mounted with an external log device then we currently don't support flushing to the data and log devices (this may change in the future). If the driver tells the block layer that the device does not support write cache flushing with the write cache enabled then it will report that the device doesn't support it. And finally we will actually test out a barrier write on the superblock and test its error state afterwards, reporting if it fails.

Q. Should barriers be enabled with storage which has a persistent write cache?

Many hardware RAID have a persistent write cache which preserves it across power failure, interface resets, system crashes, etc. Using write barriers in this instance is not warranted and will in fact lower performance. Therefore, it is recommended to turn off the barrier support and mount the filesystem with "nobarrier". 
```
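Per that FAQ excerpt, the way to verify your own mount is to look for those messages in the kernel log; a quick sketch (the message text is the one the FAQ lists):

```shell
# Report whether the kernel logged any of the barrier-disable messages.
if dmesg 2>/dev/null | grep -i "Disabling barriers"; then
    echo "write barriers are OFF for at least one XFS filesystem"
else
    echo "no barrier-disable messages found"
fi
```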

Hope that's (somewhat) helpful and gives some context.  Not sure it's quite the answer you're looking for, however.

----------

## vipernicus

 *a7thson wrote:*   

> Not sure it's quite the answer you're looking for, however.

 

No, that's great information; I just needed to know if removing it would cause more harm than good.  

I mounted the partition with nobarrier, and wow, big difference.

real    0m31.367s

Edit: 

Although, ext3 w/ -o noatime,commit=60,data=writeback I get:

real    0m26.831s

----------

## a7thson

LOL - glad it helped.  I'm laughing because I nearly junked XFS as well back then after upgrading to a >2.6.16 kernel, not realizing that the new feature had become the default.  FYI - as you can read, it's not recommended to disable write barriers on a system with no RAID/provisions for recovery, but honestly I've run a laptop happily with most of the filesystem under XFS for a long time with little trouble and no recovery issues to speak of, despite lockups, kernel oopses, and booting unstable/testing kernels [such as viper-sources  :Very Happy: ] on that machine.

 *vipernicus wrote:*   

>  *a7thson wrote:*   Not sure it's quite the answer you're looking for, however. 
> 
> No, that's great information, I just needed to know if removing it would cause more harm than good.  
> 
> I mounted the partition with nobarrier, and wow, big difference.
> ...

 

 *Quote:*   

> 
> 
> Edit:
> 
> Although, ext3 w/ -o noatime,commit=60,data=writeback I get: 
> ...

 

 :Very Happy:  I run / on ext3 with similar options, actually.  Mostly I entrust larger files, torrent/p2p downloads, media and raw media, plus distfiles, packages and /home to XFS.

----------

## vinboy

WHAT THE HELL!!! I thought my HDD was going to die.

The HDD is brand new.

I formatted my external HDD (500GB), connected through USB 2.0.

With XFS (used settings in the first post):

-When writing to the HDD, the max speed was 20MB/s

-The HDD sounded like it was going to explode! The head was moving here and there all the time!

With EXT2:

-Max speed 29MB/s <---- 50% improvement over XFS.

-The writing operation is so quiet, hardly notice anything.

Please advise: what was going on?


----------

## Pse

 *vinboy wrote:*   

> WAT THE HELL!!! I thought my HDD was going to die.
> 
> The HDD is brand new.
> 
> I formatted my exterhal HDD (500GB) connected through USB2.0.
> ...

 

You could maybe try the "nobarrier" mount option. Be aware that doing so may put your data at risk in the event of a crash. Also, upgrading to the latest kernel may improve performance a bit.

----------

## pathfinder

I did the test of untarring the kernel (over ssh, though).

Well, first result, with reiserfs in /:

```
real    3m41.672s
user    0m53.231s
sys     0m7.997s
```

Then, copying the tarball to the xfs partition mounted with barriers, logbufs and noatime:

```
real    3m58.457s
user    0m53.663s
sys     0m7.356s
```

Then the same thing with nobarrier (which is not safe on power failure):

```
real    3m26.725s
user    0m49.515s
sys     0m6.824s
```

Is this test relevant? Or are the kernel files small ones?

I expected something much more significant...

----------

## pilla

Moved from Other Things Gentoo to Documentation, Tips & Tricks.

----------

## rada

"Increasing the number of allocation groups will decrease the space available in each group. For most workloads, filesystem configurations with a very small or very large number of allocation groups should be avoided."

http://oss.sgi.com/projects/xfs/training/xfs_lab_02_mkfs.pdf

From this I gather that on an SMP system, more allocation groups are good. And according to it, too few is not good either. Maybe the default of 16 is just fine?

----------

## kanaric1

On my next system I'm leaning towards using XFS due to this topic.

Someone earlier said that they are using, on their 64-bit system,

```
mkfs.xfs -b size=8192 -l internal,size=128m -d agcount=20 /dev/sdb1
```

Would this be good on a 750GB HD? I also have a quad-core processor. Or could I maybe tweak it another way? Any suggestions?

----------

## rada

A bigger blocksize will increase performance, but you will never be able to use that filesystem on a non-64-bit system.  Also, if you have many small files (<8KiB), a lot of space will be wasted.  For allocation groups, too many will increase cpu usage when the filesystem is really full; too few won't optimize usage across processors.  20 seems fine.
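The small-file waste is easy to estimate; here is a sketch of the per-file slack for the two block sizes (the 3000-byte file size is a made-up example):

```shell
# Space allocated vs wasted for one 3000-byte file at 4 KiB and 8 KiB blocks.
FILE=3000
for BS in 4096 8192; do
    # Round up to whole blocks.
    BLOCKS=$(( (FILE + BS - 1) / BS ))
    echo "bs=$BS: allocates $(( BLOCKS * BS )) bytes, wastes $(( BLOCKS * BS - FILE ))"
done
```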

----------

## kanaric1

 *rada wrote:*   

> Bigger blocksize will increase performance but you will not be able to use that file system on a non-64bit system, ever.  Also if you have many small files (<8KiB) a lot of space will be wasted.  For allocation groups, too many will increase cpu usage when the file system is really full. Too few wont optimize usage across processors.  20 seems fine.

 

Well, for the block size, I will only ever be using a 64-bit OS with it.

Is the small-files issue enough of a problem that I should use the default setting or something lower? What would you recommend?

----------

## rada

It depends what this partition is for.  If it's for /, use a 4k blocksize.  If it's mostly for large data files, an 8k blocksize is fine.

----------

## Taily

Great thread, I love XFS and seeing this thread warms my heart  :Smile: .

I was really into optimizing XFS a while back and found some good pointers from this article.

I'm going to play around with some ideas from this thread now, I live for optimization!

Cheers.

----------

## pdw_hu

Just a small addendum: I set up my XFS partitions with

```
mkfs.xfs -l size=64m /dev/xyz
```

, so no further options, and it defaulted to agcount=4, in contrast to what the initial post said. Otherwise it works just fine :)

I'm gonna experiment a bit with the nobarrier option. I've been using xfs on lvm2 (which means no barriers) for a few months now, before gentoo on slackware, and I did experience a few crashes here and there, but no corruption ever occurred. (Or I just didn't notice :D)

----------

## rada

Are you able to set write barriers on LVM partitions, or Device-Mapper partitions in general?

----------

## pdw_hu

 *rada wrote:*   

> Are you able to set write barriers on LVM partitions, or Device-Mapper partitions in general?

 

Nope. I meant that when I used that LVM setup it didn't have barriers, but I didn't lose any data either.

----------

## brfsa

this is how I have my XFS partition in fstab:

```
/dev/sdd1   /mnt/backup     xfs     logbufs=8,logbsize=262144,biosize=16,noatime,nodiratime 0 1
```

some info: 

```
# xfs_info /mnt/backup/
meta-data=/dev/sdd1            isize=256    agcount=4, agsize=19535700 blks
         =                     sectsz=512   attr=2
data     =                     bsize=4096   blocks=78142797, imaxpct=25
         =                     sunit=0      swidth=0 blks
naming   =version 2            bsize=4096
log      =internal             bsize=4096   blocks=32768, version=2
         =                     sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                 extsz=4096   blocks=0, rtextents=0
```

----------

## kernelOfTruth

xfs really seems to offer nice performance (efficient space usage is another story   :Laughing:  ; with reiser4 only half of the space would be used)

is it normal that it needs around 100 minutes to extract a 4.1 GB stage4 tarball (2 different kernel directories; everything else is openoffice, kde3, kde4, gnome, xfce4)?   :Shocked: 

the options I used were 

```
noatime,nodiratime,biosize=16,logbufs=8
```

 (during mount)

and 

```
-l size=128m -b size=4096 -i size=512
```

 during creation, 

 *Quote:*   

> rootfs                 20G   14G  5.7G  71% /

 

kernel used during extract was 2.6.25

thanks

----------

## rada

What's the cpu usage? Is the 'rootfs 20G 14G 5.7G 71% /' from before or after extraction?

----------

## kernelOfTruth

 *rada wrote:*   

> Whats the cpu usage? Is the 'rootfs 20G 14G 5.7G 71% /' before or after extraction?

 

cpu usage is around 100% for 1 core (core2 duo), bzip2 + tar,

the space usage is after extraction, and pretty much on par with reiserfs

----------

## rada

If cpu usage is at 100% then you need a faster cpu  :Razz: . It seems tar is only single-threaded. XFS will start to use lots of cpu when it gets really full and fragments a lot, usually at >85% usage.

----------

## kernelOfTruth

 *rada wrote:*   

> If cpu usage is at 100% then you need a faster cpu . It seems tar is only single threaded. XFS will start to use lots of cpu when it gets really full and fragments a lot, usually >85% usage.

 

naa, my cpu is fast enough for that kind of task   :Wink: 

yeah, it's a shame that tar + bzip2 aren't multi-threaded by default.

I did the same extraction yesterday with reiserfs and it only took 20 minutes vs. 100 minutes (xfs); is there something I could tweak?

since I often play back a stage4 tarball, waiting 1.5 hours for the partition isn't really satisfying...

thanks

----------

## brfsa

are you saying that on the same machine it takes 20 minutes using reiserfs? 

same hard disk and hardware?

bzip2 takes a long time to compress and decompress if you use a high compression level... 

(try using "lzma -2": very fast and still high compression)

what is the agcount of your FS?

it should be 4 if you have a dual-core cpu (CPU cores x 2).

it can be set at fs creation only, I believe. 

does anyone know if you can change agcount after creation?

----------

## brfsa

I also get very high CPU usage when I use XFS as the root partition during emerges (MAKEOPTS="-j2").

maybe because agcount=4 will use both CPU cores, thus using all the cpu power...

It might actually be a good thing...

when setting MAKEOPTS="-j4" the load reaches 4.45   :Shocked: 

----------

## Enlight

 *brfsa wrote:*   

> ...
> 
> what is agcount of your FS?
> 
> it should be 4 if you have dual core cpu. (CPU cores x 2).
> ...

 

WTH????????????????????   :Shocked: 

----------

## exif

@brfsa: The CPU spike during emerges is probably due to the compiling more than the FS.

----------

## prymitive

 *kernelOfTruth wrote:*   

> xfs really seems to offer nice performance (efficient space-usage is another story   , with reiser4 only half of the space would be used)
> 
> is it normal that it needs around 100 minutes extracting an 4.1 GB stage4-tarball (2 different kernels directories, everything else is openoffice, kde3, kde4, gnome, xfce4)  ?  
> 
> the options I used where 
> ...

 

You get write barriers by default; use mount -o nobarrier and You will see the difference. With barriers on (the default), xfs doesn't really use much RAM for write buffers, in order to be more power-off safe.

----------

## jsosic

I'm glad to see that this thread is still alive and kickin' !  :Smile: 

----------

## TSP__

I've been using XFS for a while now... I'm thinking of tweaking my fstab a bit. I used mkfs.xfs without options to make my / and also my /home. Is it safe to add

```

logbufs=8

```

right now? I only use noatime in fstab for xfs on both partitions. Any other hints?

Cheers!

----------

## prymitive

 *TSP__ wrote:*   

> I been using XFS for a while, now...i am thinking in tweak a bit my fstab. i used mkfs.xfs without options to make my / and also for my /home. it's is safe to add
> 
> ```
> 
> logbufs=8
> ...

 

Yes, it's safe. Putting nobarrier there will speed up writes a lot, but in case of power loss You may lose more data, as it will still be in RAM; if You've got a laptop, You've got backup power already included, so it's safe then.

----------

## rada

Using XFS for my /home partition, I have problems whenever there is a crash: with aMule and uTorrent, the .part.met files for aMule [sources for a partially downloaded file] and the resume.dat for uTorrent [all the loaded torrents] are truncated to 0 bytes. Any way around this? No, eh? I changed back to ext3 for that partition because of this issue.

----------

## kernelOfTruth

 *rada wrote:*   

> Using XFS for my /home partition, I have problems whenever there is a crash: with aMule and uTorrent, the .part.met files for aMule [sources for a partially downloaded file] and the resume.dat for uTorrent [all the loaded torrents] are written out as 0 bytes. Any way around this? No, eh? I changed back to ext3 for that partition because of this issue.

 

that's NOT an issue  :Wink: 

it's a specific feature of XFS (delayed allocation: file data that was never flushed to disk comes back as zero-length files after a crash, unless the application fsyncs it)

----------

## rada

figured as much. oh well.

----------

## TSP__

 *prymitive wrote:*   

>  *TSP__ wrote:*   I've been using XFS for a while now... I'm thinking of tweaking my fstab a bit. I used mkfs.xfs without options to make my / and also my /home. Is it safe to add
> 
> ```
> 
> logbufs=8
> ...

 

Thanks for this info. BTW: nobarrier doesn't get coloured in fstab when using vim with syntax on, which is what caught my attention. Since I am running Gentoo on a laptop, this option seems good for me.

Cheers!

----------

## DigitalCorpus

I'm using Reiser4 and XFS for my partitions in my setup. I originally formatted my primary partition for server use with this:

```
mkfs.xfs -f -d agcount=24 -l internal,size=128m -L MediaServer /dev/sda7
```

The partition is a 500GiB chunk of disk space on my 640GB drive. I ran into a problem when I was making use of the partition and copying files over to it. Throughput performance was good, but whether I was scp-ing a few large (4 to 7 GiB) files to the disk or actively making a mirror of a website, the latency I experienced was horrible! I have a Q6700 and the disk is a brand new Seagate Barracuda 7200.11 SATA 3Gb/s 640-GB. Since I've had no problems with Reiser4 under disk activity, I did some research into XFS. I'm using the Anticipatory scheduler, btw. Mount options remain the same in my fstab at:

```
logbufs=8,noatime
```

It is only stated once or twice in this thread, but agcount and agsize are more important when it comes to interactivity of the filesystem. Splitting up a 500 GiB partition 24 ways means it is separated into ~21 GiB logical chunks, each of which queues requests from active processes. For a server with a bunch of little files, I'd imagine having multiple pending requests in a 21 GiB region can take a while to process. XFS docs suggest having at least one allocation group per 4 GiB of disk space used. The minimum agsize is 16 MiB. I considered the specs of my drive: it has a 32 MB cache, and at the beginning of the partition I get read speeds of 115 MiB/sec according to hdparm -t.
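The agsize/agcount relation itself is simple division; here is a quick sketch with hypothetical numbers (a 100 GiB partition, not the poster's exact layout; mkfs.xfs trims the final AG to fit):

```shell
# agcount follows from partition size and agsize (round up for a partial AG)
part_mib=$(( 100 * 1024 ))   # hypothetical 100 GiB partition, in MiB
agsize_mib=128               # as with -d agsize=128m
agcount=$(( (part_mib + agsize_mib - 1) / agsize_mib ))
echo "$agcount"              # -> 800 allocation groups
```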

Taking all of this into account I backed up my partition and reformatted with the following settings:

```
mkfs.xfs -f -d agsize=128m -l size=32m -L MediaServer /dev/sda7
```

Note I did not specify internal for the journal since I only have one drive in the system. I realize that this may be a bit of overkill as it is equivalent to setting agcount=16016, but instead of taking 6 to 10 seconds to load my phpsysinfo page under disk activity, I get my usual 0.5-0.6 seconds initial load time. I had a bunch of friends load the page as well with the same results. I also have noticed a lot less disk thrashing on large files.

I have not used bonnie to test throughput, but the anecdotal evidence is strong. From this whole experience, my opinion is to set agcount/agsize based on disk size, not on the number of CPUs/cores in your system.

Edited disk activity, scheduler, and agcount=16016.

----------

## arth1

 *biggyL wrote:*   

> Hello All,
> 
> I'd like to share xfsdump script (I wrote awhile ago) I'm using to route 1 full and 6 incremental backups during the week.
> 
> I'm using xfsdump to make dumps and xfsinvutil to prune (manage) sessions on the date of xfsdump.
> ...

 

Well, I have written a script myself, that does an automated dump of all xfs volumes on a system that has the dump flag set in /etc/fstab (that's the 0 or 1 in the second to last field), and handles incremental backups and automated pruning of the dump inventory.

It also calls /usr/local/sbin/xfsbackup.local, which is a user-provided script that can be used to set extended attributes for files and directories that should NOT be dumped.

("chattr -R +d /var/tmp" is an example of what you might put there.)

Anyhow, you can find "xfsbackup" at http://www.broomstick.com/tech/xfsbackup

Place it in /usr/local/sbin (or location of choice), edit the commented defaults at the top (like destination for the backup, which I have set to /var/backup/ which is an NFS mount on my host, but obviously should be changed to suit your use), and set up a cron job for it.

I use a staggered Tower Of Hanoi approach for backups, which is a reasonable compromise between backup/restore time and disk usage.

```

1   3   1-31/16   *   *   /usr/local/sbin/xfsbackup -l 0

1   3   9-31/16   *   *   /usr/local/sbin/xfsbackup -l 2

1   3   5-31/8   *   *   /usr/local/sbin/xfsbackup -l 4

1   3   3-31/4   *   *   /usr/local/sbin/xfsbackup -l 6

1   3   2-31/2   *   *   /usr/local/sbin/xfsbackup -l 8

```

This makes a full copy at 3:01 AM on the 1st and 17th of every month, and a staggered incremental backup otherwise (levels 0, 8, 6, 8, 4, 8, 6, 8, 2, 8, 6, 8, 4, 8, 6, 8).
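As a sanity check on those day patterns, here is a small shell sketch (not part of the xfsbackup script itself) that reproduces which level fires on each day of the month; the five patterns never overlap, so exactly one entry runs per day:

```shell
# Map a day of month (1-31) to the dump level the cron table above schedules
level_for_day() {
  d=$1
  if   [ $(( (d - 1) % 16 )) -eq 0 ]; then echo 0   # 1st and 17th: full dump
  elif [ "$d" -ge 9 ] && [ $(( (d - 9) % 16 )) -eq 0 ]; then echo 2
  elif [ "$d" -ge 5 ] && [ $(( (d - 5) % 8 )) -eq 0 ]; then echo 4
  elif [ "$d" -ge 3 ] && [ $(( (d - 3) % 4 )) -eq 0 ]; then echo 6
  else echo 8                                       # every remaining (even) day
  fi
}
for d in 1 2 3 4 5 6 7 8 9; do printf '%s ' "$(level_for_day "$d")"; done; echo
# -> 0 8 6 8 4 8 6 8 2
```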

For now, there's no corresponding restore script, but the procedure is as follows:  If using compression, uncompress the backup files first.  Then look at the xfsrestore man page -- it's not that hard to figure out.  It's saved my neck a couple of times, when hard drives died.
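To sketch that restore procedure with example paths (a gzip-compressed level-0 dump; adjust the names to your own backup destination):

```
gunzip /var/backup/rootvol.0.dump.gz              # uncompress first, if compressed
xfsrestore -f /var/backup/rootvol.0.dump /mnt/restore
```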

----------

## arth1

 *vinboy wrote:*   

> WAT THE HELL!!! I thought my HDD was going to die.
> 
> The HDD is brand new.
> 
> I formatted my external HDD (500GB) connected through USB2.0.
> ...

 

This is almost certainly due to allocation groups (AGs).  XFS divides each partition into several "sub-partitions", which improves speed on RAID systems and machines with multiple CPUs, makes it less likely that you'll lose the entire volume in case of disk corruption (though only up to a fraction corresponding to the number of allocation groups), and also reduces the chance of disk failure, because the load is spread out over the entire disk.

So far, so good.  But here's the problem you likely see:

A hard drive is much faster near the start of the disk than it is near the end.  The drive platters read much like an LP (remember those?), starting at the outside, and moving inwards.  The outermost tracks thus move much faster past the drive head, and can contain more data per rotation, which leads to faster speeds.  A drive being 3x as fast near the start of the disk as the end is not uncommon.

With lots of allocation groups, the load will be spread out over the entire disk.  So you write not only to the faster outer tracks, but also to the slower inner tracks.  This means that the drive will be slower when empty or near empty.  EXT2/3 will start at the start of the disk, and write inwards.  Thus, on an empty disk, EXT2/3 will often be faster, simply because it writes to the faster outer sectors first.

However, as the disk fills up, that advantage will disappear.  At around 1/2-2/3 full, the advantage is negated, and when close to full, XFS has a distinct advantage.

Speed tests should really not be performed on empty drives, unless you plan to keep the drive almost empty at all times.  Fill it up with as much random data as you expect having during normal operations, and then test the speed.  Then you get a far more realistic test of the speed.

Speaking of allocation groups...  I strongly recommend against lowering the number to 2 like the OP says.  The speed advantage is really only there for empty disks (because you write to the start and the middle of the disk with two AGs, and never near the slower end), but you risk losing up to half the drive if there is irrecoverable corruption to the b-tree.  For a drive that's 50-80% full, like most drives become over time, there is really no speed advantage to speak of, and even a speed slowdown as you get closer to 100%.  Plus, with a quad core (or dual core with hyperthreading), you lose the advantage of multiple operations being prepared in parallel.  At the very least, don't go below 4 AGs (or 8 with quad core with HT), and set it even higher if you think the disk will become very near full.
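To make that advice concrete, a hypothetical invocation (device name and count are examples, not a universal recommendation):

```
mkfs.xfs -d agcount=8 /dev/sdb1    # quad core with HT; treat agcount=4 as the floor
```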

----------

## cpu

I've chosen XFS for my /home partition on my server too, but I have some problems with it:

1. Every power down causes some data loss - I found this out when I used xfs_repair - does anyone have a solution for this? How do I examine XFS?

2. I often have data corruption when I transfer large files (15-20GB) from my local network to the server via SAMBA. Yesterday I even had a hard lockup of my server at the end of a file transfer...

Thanks in advance for help

----------

## arth1

 *cpu wrote:*   

> I've chosen XFS for my /home partition on my server too, but I have some problems with it:
> 
> 1. Every power down causes some data loss - I found this out when I used xfs_repair - does anyone have a solution for this? How do I examine XFS?
> 
> 2. I often have data corruption when I transfer large files (15-20GB) from my local network to the server via SAMBA. Yesterday I even had a hard lockup of my server at the end of a file transfer...
> ...

 

Try asking in one of the appropriate fora -- the description for this one clearly states:

 *Quote:*   

> Unofficial documentation for various parts of Gentoo Linux. Note: This is not a support forum.

 

When you ask, don't forget to post the relevant /etc/fstab entry as well as the output from "xfs_info /".

----------

## DigitalCorpus

I switched to CFQ for my IO scheduler to test things out. Much better responsiveness. So I thought I'd decrease the number of allocation groups to see if the advice given here was applicable. I set agsize to 1GiB on reformat and ran the same scenario I will be coming across several times a week: copying 6 to 8 GiB files from one partition to another while serving files via apache. Well, even with CFQ, with the reduced allocation group count (down to about 1/4 of what I had for the 500GiB partition) I saw a visible increase in the latency of reading small files from the disk, regardless of filesystem type. I'm on amd64, gentoo-sources patched with Reiser4, on a SATA II 640GB Seagate disk. Given that I'm on amd64, this might just be the whole-system responsiveness issue many have complained of, though since it was easy to get rid of, I think not. Does anyone have any suggestions? I'm thinking of switching these two large partitions over to Ext3, but the amount of space the superblocks take up puts me off.

In my anecdotal observations I see that, if the goal is interactivity and responsiveness of a system, agcount/agsize should be set not based on the number of CPUs/cores a user has or the theoretical number of simultaneous reads and writes, but based on a multiple of the read/write speed of the disk mechanism itself. I'm still new to Linux and Gentoo, so I haven't gotten around to using bonnie to test the various scenarios.

With 6 SATA ports on my current motherboard and continuing plans of recording HDTV, XFS (over LVM when I get there) seems to be the most logical solution, so I'd like to learn to utilize this filesystem properly for both throughput and responsiveness. I guess it is analogous to a regular kernel versus a fully preemptible kernel, if I'm not mistaken.

----------

## snIP3r

hi all!

I have a question about the xfs_fsr tool. I use a partition that is encrypted via dm-crypt, and I have this fragmentation:

```

area52 ~ # xfs_db -c frag -r /dev/mapper/stuff

actual 128404, ideal 70103, fragmentation factor 45.40%

```
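For reference, xfs_db derives that factor from the excess of actual over ideal extents, which is easy to verify by hand:

```shell
# fragmentation factor = (actual - ideal) / actual
actual=128404
ideal=70103
pct=$(( (actual - ideal) * 10000 / actual ))   # scaled to get two decimal places
echo "$pct"   # -> 4540, i.e. 45.40%
```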

I want to know if it is safe to defrag the whole encrypted partition. Has anyone done something like this with a setup like mine successfully?

thx in advance

snIP3r

----------

## kowal

Good read

http://everything2.com/index.pl?node_id=1479435

----------

## snIP3r

thx for the interesting page but this does not answer my questions...

----------

## rada

Using xfs_fsr should be fine on the encrypted XFS partition, as long as there is no corruption in the filesystem.
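A minimal sketch of such a run, assuming the encrypted volume is mounted at an example mount point:

```
xfs_fsr -v /mnt/stuff                  # defragment just this filesystem, verbosely
xfs_db -c frag -r /dev/mapper/stuff    # re-check the fragmentation factor afterwards
```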

----------

## Master One

I've always stuck with ext3, but since XFS was again advised in the workaround.org ISPmail tutorial, I am curious about it again:

1. Does XFS make any sense when it comes down to a lot of small files (like /var/vmail with all the emails in maildir format)?

2. Anybody tried XFS on a netbook with an Atom N270, 1 or 2 GB RAM and a sloppy 16GB SSD?

BTW, nowadays I always use the filesystem on top of LVM, which often sits on top of a LUKS-encrypted partition, which in turn is on either a software or hardware RAID1. Does that have any influence? Wasn't it a typical recommendation in the past NOT to use XFS on a software RAID?

For me, it's either ext3 or XFS for a general-purpose filesystem: what's good for your server should also be good for your workstation / laptop / netbook. Or can it?

EDIT: Just played around a little, with the following conclusions:

- Still no "barriers" on LVM ("Disabling barriers, not supported by the underlying device")

- It is still not possible to change the log size after fs creation ("~# xfs_growfs -L 16384 MOUNTPOINT" -> "xfs_growfs: log growth not supported yet")

The first issue seems bad, because it was mentioned that with nobarrier there is larger data loss to be expected in case of an unclean shutdown / power-off; the second issue is bad if you use a distro installer in which you cannot edit the filesystem creation options (I was thinking of the Debian Installer). Something like "not supported yet" does not really befit such an old and mature filesystem that is billed as stable and production-ready...

EDIT2: Did some read-up in the XFS FAQ on xfs.org, and found some interesting info concerning disk write cache and the barrier/nobarrier mountoption:

- If you have a single drive, it's good to leave barriers & disk write cache on.

- If you have a hardware-raid-controller with battery backed controller cache and cache in write back mode, it is advised to use the nobarrier mountoption, and to disable the individual disks' write caches.

Now what should one do concerning the barrier/nobarrier and disk write cache options, if using 

- a Software-RAID1?

- LVM on top of a Software-RAID1?

- LVM on top of a luks-encrypted Software-RAID1?

- LVM on a Hardware-RAID1 with battery backed controller cache and cache in write back mode?

- LVM on top of a luks-encrypted Hardware-RAID1 with battery backed controller cache and cache in write back mode?

As mentioned, XFS on top of LVM leads to disabled barriers anyway, so are you supposed to disable the disks' write caches in every case where nobarrier is used?

It gets even more confusing if virtualization is used, which makes me believe that one is better and safer off sticking with good old ext3 instead...   :Rolling Eyes: 

----------

## erm67

Has anyone experimented with the lazy-counters feature of XFS? It looks promising, and it's quite a recent addition.

 *Quote:*   

>    [XFS] Lazy Superblock Counters
> 
>     When we have a couple of hundred transactions on the fly at once, they all
> 
>     typically modify the on disk superblock in some way.
> ...

 

 *mkfs.xfs wrote:*   

>     lazy-count=value 
> 
> This changes the method of logging various persistent counters in the superblock. Under metadata intensive workloads, these counters are updated and logged frequently enough that the superblock updates become a serialisation point in the filesystem. The value can be either 0 or 1.
> 
> With lazy-count=1, the superblock is not modified or logged on every change of the persistent counters. Instead, enough information is kept in other parts of the filesystem to be able to maintain the persistent counter values without needed to keep them in the superblock. This gives significant improvements in performance on some configurations. The default value is 0 (off) so you must specify lazy-count=1 if you want to make use of this feature. 

 

 *xfs_admin wrote:*   

>        -c 0|1 Enable (1) or disable (0) lazy-counters in the filesystem.  This
> 
>               operation  may  take quite a bit of time on large filesystems as
> 
>               the entire filesystem needs to be scanned when  this  option  is
> ...
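Per that excerpt, converting an existing filesystem would look something like this (device name is an example; the filesystem must be unmounted first):

```
umount /dev/sdb1
xfs_admin -c 1 /dev/sdb1    # may take a while on large filesystems
```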

 

----------

## lightvhawk0

I turned on lazy counters and restored my backup to my xfs drive

```
meta-data=/dev/md0               isize=256    agcount=16, agsize=7599984 blks

         =                       sectsz=512   attr=2

data     =                       bsize=4096   blocks=121599744, imaxpct=25

         =                       sunit=16     swidth=32 blks

naming   =version 2              bsize=4096   ascii-ci=0

log      =external               bsize=4096   blocks=59367, version=2

         =                       sectsz=512   sunit=0 blks, lazy-count=1

realtime =none                   extsz=131072 blocks=0, rtextents=0

```

Restoring took about nine minutes before I turned on lazy-count=1.

After I reformatted with it and restored my backup again, it knocked an entire minute off.

EDIT: just a note - I moved the log to an external device, and now my system is much quieter.
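For anyone wanting to reproduce the external-log setup, a sketch with example device names (the log device should be a small, fast partition):

```
mkfs.xfs -l logdev=/dev/sdb1,size=64m,lazy-count=1 /dev/md0
mount -o logdev=/dev/sdb1 /dev/md0 /mnt/data
```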

----------

## kernelOfTruth

for those folks using one (or more) of the new hard drives with "Advanced Format" (4 KiB physical sectors):

make sure you have set:

mkfs.xfs -s size=4096

when creating the filesystem, and

before that:

that your partitions are aligned to 1 MiB boundaries (e.g. via gparted) or at least to multiples of 4 KiB
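A quick way to sanity-check alignment: partition start sectors are given in 512-byte units, so 4 KiB alignment means divisibility by 8 (the sample values below are just illustrations):

```shell
# A start sector (in 512-byte units) is 4 KiB-aligned when divisible by 8
aligned() { [ $(( $1 % 8 )) -eq 0 ] && echo yes || echo no; }
aligned 2048   # -> yes (the usual 1 MiB-aligned first partition)
aligned 63     # -> no  (legacy DOS alignment)
```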

----------

## Enlight

Hi folks,

For those of you who don't know it: since 2.6.35, XFS has had a new mount option, '-o delaylog', which improves metadata operations a lot. From 2.6.39 this option is on by default, and basically the default setup is probably the best you can get for non-specific usage (even noatime is useless, because all filesystems use relatime by default now).
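On 2.6.35-2.6.38 you have to ask for it explicitly (mount point is an example); from 2.6.39 onwards nothing needs to be done:

```
mount -o remount,delaylog /home
```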

Results of decent benchmarks can be seen on slides 24 and onwards of this paper: http://www.redhat.com/summit/2011/presentations/summit/decoding_the_code/thursday/wheeler_t_0310_billion_files_2011.pdf

Basically, XFS now competes with btrfs and ext4 at file creation, and we mean small files here (investigations are under way to make it even better), and XFS is now the fastest filesystem for deletion

(no, I made no typo).

edit: btw, XFS was already 3 times faster than ext4 and 6 times faster than btrfs at iterating through the created files (see slide 20), so if you consider that each created file is presumably read at least once, the conclusions should be obvious!

Hope you will enjoy your xfs filesystem even more than before!

----------

## rada

This link has some interesting ideas for optimizing XFS for a raid partition. https://raid.wiki.kernel.org/index.php/RAID_setup#XFS

----------

## JeffBlair

OK, I'm about to re-do my RAID5 array, and want to tweak my drive.

This will be for serving out Blu-rays/DVDs to a couple of PCs. So, here's what I'm going to run so far.

It's on a dual-core Intel, by the way, running x64, and the array will be about 10T at the largest... unless I get a new server case.  :Wink: 

mkfs.xfs -l internal,size=128m,lazy-count=1 -d agcount=20 -b size=8192 /dev/sdc1

and, of course the normal "noatime,logbufs=8,nobarrier,nodiratime" in fstab

So, does that look right for serving out 5 GB files? 

Also, would it be better to move the journal to another drive? And if so, how would I do that?

Thanks for all the help guys.

----------

