# New filesystem: ZFS on Linux

## devsk

KQ Infotech had the GA on Jan 14, so I decided to give native ZFS a go. After a few tweaks to genkernel, I have 2.6.35.10 booted on a ZFS rootfs inside a VM from a USB drive. I am going to play around with it and then move my spare root on the desktop to it.

Let's see how it goes. I will be posting the ebuild and genkernel patches once it stabilizes. This is a very exciting time!

----------

## devsk

As soon as I booted into KDE, kinks began to appear... :Smile:  Every KDE app is crashing. Enlightenment seems to have no issues with ZFS, so I will use that for now.

Weird!

----------

## aCOSwt

Good initiative indeed.

BTW, did you download a binary from http://zfs.kqinfotech.com/download.php ?

Which one did you select ?

Or did you get the sources from somewhere else ?

----------

## devsk

 *aCOSwt wrote:*   

> Good initiative indeed.
> 
> BTW, did you download a binary from http://zfs.kqinfotech.com/download.php ?
> 
> Which one did you select ?
> ...

Source from git. I tarred it up and created an ebuild around the tar.

----------

## devsk

Looks like the KDE issue may have to do with the OOM kills I am getting because my VM is running out of memory.

----------

## Etal

 *devsk wrote:*   

>  *aCOSwt wrote:*   Good initiative indeed.
> 
> BTW, did you download a binary from http://zfs.kqinfotech.com/download.php ?
> 
> Which one did you select ?
> ...

 

Mind sharing the git location and the ebuild?  :Smile: 

----------

## devsk

Ohh... I thought... never mind. The instructions for the build etc. are there on the KQ pages. But anyway:

```
Getting and Building from source

The ZFS on Linux functionality is provided by three modules which are maintained in
separate source trees. These are:

  spl  (Solaris porting layer)
  zfs  (core DMU/DSL functionality)
  lzfs (Linux POSIX layer)

You need to retrieve the sources for all three and compile them. If any one of them is
missing, ZFS won't function. The three repositories can be accessed at the following
url: https://github.com/zfs-linux

The commands and procedures required to build fresh modules from source are listed below.
Please note that some of the tools used in the procedure might not be installed on your
machine, and the errors that result don't always clearly indicate that a package was missing.

/tmp$ git clone git://github.com/zfs-linux/spl.git
Initialized empty Git repository in /tmp/spl/.git/
remote: Counting objects: 4266, done.
remote: Compressing objects: 100% (1144/1144), done.
remote: Total 4266 (delta 3155), reused 4162 (delta 3078)
Receiving objects: 100% (4266/4266), 1.70 MiB | 123 KiB/s, done.
Resolving deltas: 100% (3155/3155), done.

/tmp$ git clone git://github.com/zfs-linux/zfs.git
Initialized empty Git repository in /tmp/zfs/.git/
remote: Counting objects: 68496, done.
remote: Compressing objects:   3% (631/21029)
....

/tmp$ git clone git://github.com/zfs-linux/lzfs.git
Initialized empty Git repository in /tmp/lzfs/.git/
remote: Counting objects: 173, done.
remote: Compressing objects: 100% (152/152), done.
remote: Total 173 (delta 92), reused 38 (delta 16)
Receiving objects: 100% (173/173), 199.19 KiB | 103 KiB/s, done.
Resolving deltas: 100% (92/92), done.

/tmp$ cd spl
/tmp/spl$ ./configure --with-linux=/lib/modules/2.6.32-24-server/build
checking metadata... yes
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking target system type... x86_64-unknown-linux-gnu
checking whether to enable maintainer-specific portions of Makefiles... no
checking for a BSD-compatible install... /usr/bin/install -c
....

/tmp/spl$ make
make all-recursive
make[1]: Entering directory `/tmp/spl'
Making all in lib
make[2]: Entering directory `/tmp/spl/lib'
/bin/bash ../libtool --tag=CC --silent --mode=compile gcc -DHAVE_CONFIG_H -include ../spl_config.h
    -Wall -Wshadow -Wstrict-prototypes -fno-strict-aliasing
    -D__USE_LARGEFILE64 -DNDEBUG -g -O
....

/tmp/spl$ cd ../zfs/
/tmp/zfs$ ./configure --with-linux=/lib/modules/2.6.32-24-server/build --with-spl=/tmp/spl/
checking metadata... yes
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking target system type... x86_64-unknown-linux-gnu
checking whether to enable maintainer-specific portions of Makefiles... no
....

/tmp/zfs$ make
make all-recursive
make[1]: Entering directory `/tmp/zfs'
Making all in etc
make[2]: Entering directory `/tmp/zfs/etc'
make[2]: Nothing to be done for `all'.
make[2]: Leaving directory `/tmp/zfs/etc'
....

/tmp/zfs$ cd ../lzfs/
/tmp/lzfs$ ./configure --with-linux=/lib/modules/2.6.32-24-server/build
checking target system type... x86_64-unknown-linux-gnu
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
....

/tmp/lzfs$ make
make all-recursive
make[1]: Entering directory `/tmp/lzfs'
Making all in module
make[2]: Entering directory `/tmp/lzfs/module'
make -C /lib/modules/2.6.32-24-server/build SUBDIRS=`pwd` V=1 modules
....

/tmp/lzfs$ cd ../zfs/scripts/
/tmp/zfs/scripts$ ./zfs.sh -v
Loading zlib_deflate (/lib/modules/2.6.32-24-server/kernel/lib/zlib_deflate/zlib_deflate.ko)
Loading spl (/tmp/spl//module/spl/spl.ko)
Loading splat (/tmp/spl//module/splat/splat.ko)
Loading zavl (/tmp/zfs/module/avl/zavl.ko)
Loading znvpair (/tmp/zfs/module/nvpair/znvpair.ko)
....

/tmp/zfs/scripts$ insmod /tmp/lzfs/module/lzfs.ko

/tmp/zfs/scripts$ cd /tmp/spl/
/tmp/spl$ make install
Making install in lib
make[1]: Entering directory `/tmp/spl/lib'
make[2]: Entering directory `/tmp/spl/lib'
make[2]: Nothing to be done for `install-exec-am'.
make[2]: Nothing to be done for `install-data-am'.
....

/tmp/spl$ cd ../zfs/
/tmp/zfs$ make install
Making install in etc
make[1]: Entering directory `/tmp/zfs/etc'
make[2]: Entering directory `/tmp/zfs/etc'
test -z "/etc" || /bin/mkdir -p "/etc"
/bin/mkdir -p '/etc/../etc/udev/rules.d'
....

/tmp/zfs$ cd ../lzfs/
/tmp/lzfs$ make install
Making install in module
make[1]: Entering directory `/tmp/lzfs/module'
make -C /lib/modules/2.6.32-24-server/build SUBDIRS=`pwd` \
    INSTALL_MOD_PATH= \
    INSTALL_MOD_DIR=addon/lzfs modules_install
....

/tmp/lzfs$ lsmod | grep lzfs
lzfs                   28371  0
zfs                   964150  1 lzfs
spl                   120247  7 lzfs,zfs,zcommon,zunicode,znvpair,zavl,splat

Installing Startup Scripts

The binaries have been installed. Currently the make system does not install the startup
scripts; these have to be installed manually.

Follow this procedure for Fedora:

/tmp$ chkconfig --add zfsload

Follow this procedure for Ubuntu:

/tmp$ cp lzfs/scripts/zfsload-ubuntu /etc/init.d/zfsload
/tmp$ chown root /etc/init.d/zfsload
/tmp$ chmod +x /etc/init.d/zfsload
/tmp$ update-rc.d zfsload defaults
/tmp$ service zfsload start

```

I know it's not very readable, but I cut & pasted it from the PDF. Sorry!

Download the tar for 2.6.35. There is a user guide PDF in there, which has a link to the developer PDF; that is what I pasted above.

The ebuild is pretty rough. You can imagine I started on this just this morning and have been doing everything just to get it to work. But here it is:

```

# Copyright 1999-2010 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
# $Header: /var/cvsroot/gentoo-x86/sys-fs/zfs-fuse/zfs-fuse-0.6.9-r1.ebuild,v 1.3 2010/06/24 11:18:02 ssuominen Exp $

EAPI=2

inherit bash-completion

DESCRIPTION="An implementation of the ZFS filesystem for FUSE/Linux"
HOMEPAGE="http://zfs-fuse.net/"
SRC_URI="http://zfs-fuse.net/releases/${PV}/source-tar-ball -> ${P}.tar.bz2"

LICENSE="CDDL"
SLOT="0"
KEYWORDS="~amd64 ~x86"
IUSE="debug"

RDEPEND="
    sys-libs/zlib
    dev-libs/libaio
    dev-libs/openssl"

S=${WORKDIR}
linux_ver=linux-2.6.35.10

src_compile() {
    unset ARCH
    cd ${S}/spl && ./configure --prefix=/usr --with-linux=/usr/src/${linux_ver} && emake || die "Failed spl compile"
    cd ${S}/zfs && ./configure --prefix=/usr --with-linux=/usr/src/${linux_ver} --with-spl=${S}/spl && emake || die "Failed zfs compile"
    cd ${S}/lzfs && ./configure --prefix=/usr --with-linux=/usr/src/${linux_ver} --with-spl=${S}/spl --with-zfs=${S}/zfs && emake || die "Failed lzfs compile"
}

src_install() {
    cd ${S}/spl && make DESTDIR="${D}" install || die "make install failed"
    cd ${S}/zfs && make DESTDIR="${D}" install || die "make install failed"
    cd ${S}/lzfs && make DESTDIR="${D}" install || die "make install failed"
    /bin/rm -rf ${D}/usr/src
    #dodoc ../{BUGS,CHANGES,HACKING,README*,STATUS,TESTING,TODO}
}

```

See, I told you. It barely installs the damn thing! It assumes that you have created a tar after you git-cloned the sources. The tar.bz2 should contain the three folders from the clones: spl, zfs and lzfs. I named the ebuild nzfs-0.5.2.ebuild and the tar nzfs-0.5.2.tar.bz2.
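In case the layout isn't obvious, here is a sketch of how I bundle the checkouts into the tarball the ebuild expects (the placeholder `mkdir` stands in for the real git clones so the sketch runs end to end):

```shell
#!/bin/sh
set -e
# Work in a scratch directory so nothing in the real tree is touched.
mkdir -p /tmp/nzfs-build && cd /tmp/nzfs-build

# Assumes the three trees have been cloned already, e.g.:
#   git clone git://github.com/zfs-linux/spl.git
#   git clone git://github.com/zfs-linux/zfs.git
#   git clone git://github.com/zfs-linux/lzfs.git
# (empty placeholder dirs here so the sketch runs end to end)
mkdir -p spl zfs lzfs

# The ebuild sets S=${WORKDIR}, so the three directories must sit at
# the top level of the tarball.
tar cjf nzfs-0.5.2.tar.bz2 spl zfs lzfs
tar tjf nzfs-0.5.2.tar.bz2   # lists spl/ zfs/ lzfs/
```

Then drop nzfs-0.5.2.tar.bz2 into your DISTDIR and digest the ebuild.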

It also assumes that you have installed, compiled and module_install'ed vanilla-source-2.6.35.10 (which is not in portage, so just copy over the 2.6.35.9 ebuild and digest it).

Things are still getting cooked at this time...so everything is raw (see zfs-fuse reference above... :Very Happy:  that's what I use at this time on my home server).

----------

## devsk

The KDE issues went away after I started with a new profile instead of the copied one. I got a null pointer dereference in the kernel during shutdown; looks like the code has some stability issues.

This looks very promising. I will wait for update for 2.6.37 and then I will move my server to this.

----------

## NeddySeagoon

Moved from Gentoo Chat to Unsupported Software.

Just to make it clear that ZFS is not supported by Gentoo - yet

----------

## devsk

 *NeddySeagoon wrote:*   

> Moved from Gentoo Chat to Unsupported Software.
> 
> Just to make it clear that ZFS is not supported by Gentoo - yet

 Not even zfs-fuse?

IMO, the steps to get ZFS working on the rootfs are the same for both, although zfs-fuse is sort of tougher because of the additional process. genkernel support will be added as part of https://bugs.gentoo.org/show_bug.cgi?id=351861

We will just need an initramfs overlay for genkernel with the basic tools zpool and zfs (plus the zfs-fuse daemon in the zfs-fuse case) and we are done!
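Roughly, the init fragment would look like this (module list from the KQ build above; 'rpool/ROOT' is a made-up pool/dataset name, and this is just a sketch, not the actual genkernel patch):

```shell
#!/bin/sh
# Sketch of an initramfs init fragment for a ZFS rootfs.
# The overlay must also carry the zpool/zfs binaries and their libs.

# Load the stack bottom-up (KQ's lzfs provides the POSIX layer).
for m in spl zavl znvpair zunicode zcommon zfs lzfs; do
    modprobe "$m"
done

# 'rpool' / 'rpool/ROOT' are placeholders for your own pool/dataset.
zpool import -f rpool
mount -t zfs rpool/ROOT /newroot

exec switch_root /newroot /sbin/init
```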

----------

## nadir-san

Hey, I was using FreeBSD 8.1 and ZFS. However, I recently replaced my computer, and my new rig (which I'm still building) has Gentoo installed. I was wondering: can FreeBSD zpools be detected by native Linux ZFS?

This is going to be tricky. I guess my biggest fear is that Linux ZFS will corrupt my data.

----------

## xyzzyx

 *nadir-san wrote:*   

> Hey, I was using freebsd 8.1 and zfs, I however replaced my computer recently, and my new rig (which im still building) has gentoo installed, I was wondering if freebsd zpools can be detected by native linux zfs ?
> 
> This is going to be tricky, I guess my biggest fear is that linux zfs will corrupt my data.

 

You won't get data corruption, but all newly created files/directories will get 000 permissions. So if you want to keep your old pools, wait until they fix this issue (last I heard, they were working on a fix).
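A quick way to see whether a pool is affected — the mountpoint below is a stand-in demo directory, so the bug is just simulated with chmod:

```shell
#!/bin/sh
# Demo: find entries with 000 permissions under a mountpoint.
# /tmp/demo-pool stands in for your real pool's mountpoint.
POOL_MNT=/tmp/demo-pool
mkdir -p "$POOL_MNT"
touch "$POOL_MNT/broken" "$POOL_MNT/fine"
chmod 000 "$POOL_MNT/broken"     # simulate the import bug

# -perm 000 matches entries whose mode is exactly 000
find "$POOL_MNT" -perm 000

# Bulk repair for regular files, once the underlying bug is fixed:
find "$POOL_MNT" -type f -perm 000 -exec chmod 644 {} +
```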

----------

## psycho_driver

Is Raid-Z fully functional in the current GA release?

----------

## devsk

 *psycho_driver wrote:*   

> Is Raid-Z fully functional in the current GA release?

 Haven't tried RAID-Z, but I don't see why it won't work. The RAID-Z code won't even be touched by the KQ folks, because it's a pool-level concept and not an FS-level one.

----------

## psycho_driver

 *devsk wrote:*   

>  *psycho_driver wrote:*   Is Raid-Z fully functional in the current GA release? Haven't tried RAID-Z but I don't see why it won't work. RAID-Z code won't even be touched by KQ folks because its a pool level concept and not the FS level.

 

Well, I've got 4 2TB drives on the way that I intend to try it out with, so I guess I'll be the guinea pig since I haven't found anyone else who is using it yet.

----------

## psycho_driver

For the curious googler out there, I do have a working 5.75TB native zfs raidz array up and running.  Seamless compression works, everything is retained between restarts, etc.  As far as I know, everything is working normally (this is my first foray into zfs territory).

Sadly the write performance seems to be pretty horrific.  It's managed to copy about 265GB of data in the past 16 hours or so :\  I'm not sure how much having compression on hurts this, but it shouldn't be much since my cpu usage is peaking at around 30% on one core and 10% on the other.  Not much has been done toward read testing yet.  I might run some benchmarks on it once the file transfers finish.

----------

## devsk

 *psycho_driver wrote:*   

> For the curious googler out there, I do have a working 5.75TB native zfs raidz array up and running.  Seamless compression works, everything is retained between restarts, etc.  As far as I know, everything is working normally (this is my first foray into zfs territory).
> 
> Sadly the write performance seems to be pretty horrific.  It's managed to copy about 265GB of data in the past 16 hours or so :\  I'm not sure how much having compression on hurts this, but it shouldn't be much since my cpu usage is peaking at around 30% on one core and 10% on the other.  Not much has been done toward read testing yet.  I might run some benchmarks on it once the file transfers finish.

 Did you enable dedup by any chance? I tested a single-disk configuration and the write performance was almost as good as any native FS. Something is definitely wrong!

How much RAM do you have on this machine? ZFS is a RAM-hungry beast. Feed it RAM and it will perform like a beast!

Also, note that ZFS spawns a lot of threads for concurrent IO. So, enable NCQ (check the queue length in /sys/block/<drive>/queue/nr_requests) on the drives if it is not set already. Invest in some SSD drive(s) as an L2 cache, which will speed up random IO ops tremendously.

PS: my single-disk tests did not have an L2 cache, and the experiment was mostly sequential IO and large-directory (thousands of files) copies ('cp -a' kind).
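Something like this for checking — the drive names and the 'tank' pool are placeholders, not my setup:

```shell
#!/bin/sh
# Print the request queue depth for each SATA drive, if present.
for d in /sys/block/sd?/queue/nr_requests; do
    if [ -r "$d" ]; then
        echo "$d: $(cat "$d")"
    fi
done

# Deepen the queue on a specific drive (example device):
#   echo 128 > /sys/block/sdb/queue/nr_requests
# Add an SSD as L2ARC ('tank' and /dev/sdf are placeholders):
#   zpool add tank cache /dev/sdf
```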

----------

## psycho_driver

 *devsk wrote:*   

> Did u enable dedup by any chance? I tested a single disk configuration and the write performance was almost as good as any native FS. Something is definitely wrong!
> 
> How much RAM do u have  on this machine? ZFS is a RAM hungry beast. Feed it RAM and it will perform like a beast!
> 
> Also, note that ZFS spawns a lot of threads for concurrent IO. So, enable NCQ (check queue length in /sys/block/<drive>/queue/nr_requests) on the drives if it is not set already. Invest into some SSD drive(s) as the L2 cache, which will speed up random IO ops tremendously.
> ...

 

I did not explicitly enable dedup (I had to google to see what it was).  I'll check when I get home to make sure it wasn't done by default.

The machine is a core2-based Celeron (E3400) @ 3.5GHz w/ 2GB DDR2/800, all running on a GF9300/730i board. It's a dual-purpose HTPC/network file server. Memory use during normal operation hovers around 1GB. In addition to the 4 2TB Samsung F4 drives which comprise the raidz array, the machine also has a 64GB SSD as the OS drive, a SATA BD drive, and a 250GB 7200rpm IDE drive.

I'll also check the current NCQ status after work.  Thanks for the tips.

----------

## psycho_driver

The NCQ parameter was 128.  Dedup is not set.  I was able to increase performance quite a bit by switching from "AHCI Linux" to "AHCI" in the BIOS.  Now I'm seeing write speeds up to 60MB/s, which is still a lot less than what one of those drives is capable of by itself, and not quite enough to saturate my GbE network (67.1MB/s sustained), but good enough to keep me from banging my head against the wall while I transfer over some of my larger archives.  I do believe the compression setting affects overall write speed by a fair margin.  I'll try to benchmark with compression on and off in between the next set of file transfers.
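For the on/off comparison I have in mind, something like this should do it ('tank/bench' is a placeholder dataset; note that /dev/urandom input is incompressible, so this mostly measures the codec's CPU overhead rather than any space savings):

```shell
#!/bin/sh
# Sketch: time a 1GB write with compression off, then on.
# 'tank/bench' is a placeholder dataset name.
for c in off on; do
    zfs set compression=$c tank/bench
    echo "compression=$c:"
    dd if=/dev/urandom of=/tank/bench/test.bin bs=1M count=1024 conv=fsync
    rm -f /tank/bench/test.bin
done
```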

----------

## psycho_driver

I'm getting some pretty wildly varying results from iozone on the filesystem:

Run 1:

```
File size set to 102400 KB
Command line used: iozone -s 102400 test.io
Output is in Kbytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.

                                               random  random    bkwd  record  stride
    KB  reclen  write  rewrite    read  reread   read   write    read rewrite    read  fwrite frewrite   fread freread
102400       4  12907    38077  920862  931475 655620   19268  903658  423158  824727  154482   151084  965109  966638
```

Run 2:

```
File size set to 102400 KB
Command line used: iozone -s 102400 test.io
Output is in Kbytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.

                                               random  random    bkwd  record  stride
    KB  reclen  write  rewrite    read  reread   read   write    read rewrite    read  fwrite frewrite   fread freread
102400       4 161253   141186  932936  932800 705676   17092  995528  464007  892958  149212   154774  968650  966930
```

So far the data integrity of the FS has been fine, even after numerous hard locks and power cycles (new system, finding the max stable overclock).  I do believe there may be some conflicts between the zfs linux modules and some other parts of the system, but time will tell as I tweak the system more.  I'm running a 2.6.35-gentoo-r8 kernel with a couple of modifications for my HTPC and some additional external modules pulled in.

----------

## psycho_driver

. . . just completed my first full filesystem backup.  ~17GB from the root filesystem on the SSD tarred directly to the raidz array.  The tarball ended up being ~26GB (~15GB actually used with compression=on).  The process took 87 minutes and 16 seconds.  Doing a byte->second computation reveals a write speed of just over 5MB/s.  Not exactly setting the world on fire, but again, as long as the reliability is there I'll live with low write speeds.
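The arithmetic, for anyone checking (sizes as quoted above):

```shell
#!/bin/sh
# Back-of-the-envelope: ~26GB tarball written in 87m16s.
secs=$((87 * 60 + 16))      # 5236 seconds
mb=$((26 * 1024))           # 26624 MB
awk -v mb="$mb" -v s="$secs" 'BEGIN { printf "%.1f MB/s\n", mb / s }'
# prints: 5.1 MB/s
```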

The htpc was in use with live tv for a bit and a game being played for a little while as well.  The backup happening in the background caused no hiccups in either activity.

----------

## psycho_driver

Bonnie w/ a 1GB file

29.4 MB/s write w/ putc

74.6 MB/s block write

59.5 MB/s rewrite

59.8 MB/s read w/ getc

412.3 MB/s block read

137.8 seeks per second

----------

## psycho_driver

Human readable iozone output:

iozone -s 512000 test.io

109 MB/s write

87 MB/s re-write

439.3 MB/s read

633.4 MB/s re-read

7.4 MB/s random read (ouch)

4.1 MB/s random write (double ouch)

855.7 MB/s bkwd read

440 MB/s record re-write

7 MB/s stride read

26.2 MB/s fwrite

132.9 MB/s frewrite

941.3 MB/s fread

935.2 MB/s freread

----------

## psycho_driver

Testing a 256MB file in iozone results in random reads/writes of 677.1 MB/s and 338.7 MB/s, respectively.  The problem must crop up as larger files are used (which may also be my machine exceeding its physical RAM capacity).

----------

## kernelOfTruth

Is zfsonlinux the same as the one from KQ Infotech?

 issue tracker  (it even seems to support up to 2.6.36 or 2.6.37 - and pool version 28!)

----------

## aCOSwt

 *kernelOfTruth wrote:*   

> is  zfsonline  the same like the one from KQ Infotech ?

 

My understanding was that zfsonlinux is the original port from the Lawrence Livermore National Laboratory, and that it was lacking some important features, such as support for a mountable filesystem, while KQ's product was based on the LLNL code plus the missing features.

But this could well be an outdated understanding, as I now read that the missing functionality has been added for the upcoming LLNL 0.6 release.

----------

## devsk

 *aCOSwt wrote:*   

>  *kernelOfTruth wrote:*   is  zfsonline  the same like the one from KQ Infotech ? 
> 
> My understanding was that zfsonline is the original product from the Lawrence Livermore National Laboratory
> 
> That this product was lacking some important features such as supporting a mountable filesystem.
> ...

 That is the correct understanding.

----------

## devsk

I just built LLNL's version of native ZFS and I am about to boot it. The good thing is it builds fine on 2.6.37.1 after a little bit of patching.

----------

## aCOSwt

 *devsk wrote:*   

> I just built LLNL's version of native ZFS and I am about to boot it.

 

Did you go with 0.5.2 or 0.6.0-rc1 ?

----------

## kernelOfTruth

 *aCOSwt wrote:*   

>  *devsk wrote:*   I just built LLNL's version of native ZFS and I am about to boot it. 
> 
> Did you go with 0.5.2 or 0.6.0-rc1 ?

 

I'd guess 0.6.0-rc1  :Wink: 

because 0.5.2 is rather feature incomplete

@devsk:

could you please post the steps if it was successful ?

I can hardly wait to use zfs with 2.6.37 on more partitions (natively !)   :Mr. Green: 

----------

## devsk

 *kernelOfTruth wrote:*   

>  *aCOSwt wrote:*    *devsk wrote:*   I just built LLNL's version of native ZFS and I am about to boot it. 
> 
> Did you go with 0.5.2 or 0.6.0-rc1 ? 
> 
> I'd guess 0.6.0-rc1 
> ...

 Yes, of course! I am running into an issue right now. Once I get past that, I will post here what I did.

----------

## devsk

It's not going very well. Follow the events as they happen at http://groups.google.com/a/zfsonlinux.org/group/zfs-discuss... :Very Happy: 

----------

## Shining Arcanine

 *devsk wrote:*   

> Looks like the KDE issue may be to do with OOM kills I am getting because my VM is running out of memory.

 

Do you have an update on this?

 *devsk wrote:*   

> Its not going very well. Follow the events as they happen at http://groups.google.com/a/zfsonlinux.org/group/zfs-discuss...

 

I imagine that the ZFS issues are caused by ZFS' desire for a permanent physical memory allocation for use as a dedicated cache, while other filesystems use free RAM as cache while it is available but give it back the moment a userland application needs it.

A while back I tried asking in ##freebsd on FreeNode about how to make its memory usage behave similarly to that of other filesystems. People seemed to think that letting ZFS hoard RAM indefinitely was okay and would not tell me how it could be made to share RAM. I would need to do more research to be sure, but I suspect that having ZFS use only unallocated memory until it is needed by something else is not possible.

----------

## aCOSwt

 *Shining Arcanine wrote:*   

> ...but I suspect that having ZFS use only unallocated memory until it is needed by something else is not possible.

 

It is actually possible under Solaris.

It is nevertheless true that I did not succeed in tuning my FreeBSDs accordingly. (I gave up searching after tuning in accordance with http://wiki.freebsd.org/ZFSTuningGuide )

----------

## devsk

 *Shining Arcanine wrote:*   

>  *devsk wrote:*   Looks like the KDE issue may be to do with OOM kills I am getting because my VM is running out of memory. 
> 
> Do you have an update on this?

 I don't remember how I resolved that, but it was a runaway process and the fix was in userspace. This was with the KQI code, which I got bored with because it kept me stuck at 2.6.35, and I really wanted to move to 2.6.37 and beyond.

 *Shining Arcanine wrote:*   

> 
> 
>  *devsk wrote:*   Its not going very well. Follow the events as they happen at http://groups.google.com/a/zfsonlinux.org/group/zfs-discuss... 
> 
> I imagine that ZFS issues are caused by ZFS' desire for a permanent physical memory allocation for use as dedicated cache while other filesystems use free RAM as cache while it is available, but give it back the moment that a userland application needs it.
> ...

 The ZFS ARC and the page cache are going to duplicate stuff in memory; Brian (the author of native ZFS) is aware of it and has it on his agenda. zfs-fuse also has this issue, but there it appears to the kernel as though an application is hoarding memory. With native ZFS, it's all memory inside the kernel.

----------

## psycho_driver

 *kernelOfTruth wrote:*   

> is  zfsonline  the same like the one from KQ Infotech ?
> 
>  issue tracker  (it even seems to support up to 2.6.36 or 2.6.37 - and pool version 28 !)

 

That is the underlying zfs infrastructure upon which the KQ Infotech compatibility module works.

-Edit-

Oops, didn't realize there was half a page of responses after this one.

----------

## zefrer

@psycho - I would bet that the low write performance you're seeing is because the zpool creation used 512-byte sectors on your 2TB drives. Can you check? If my memory serves me right, there is an option in zpool to force the sector size to be something else.

It should use 4k instead.

Are you also able to post results from the Phoronix suite? We could then compare results; I have a similar setup.
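If the port in use supports it (I believe the LLNL tree has an ashift pool option; I can't vouch for KQ's build), the 4k alignment has to be forced at creation time — 'tank' and the devices are placeholder names:

```shell
# Force 4k-sector alignment (ashift=12, i.e. 2^12-byte sectors) when
# creating the pool; pool and device names are placeholders.
zpool create -o ashift=12 tank raidz /dev/sdb /dev/sdc /dev/sdd /dev/sde

# ashift is fixed per vdev once set, so an existing pool has to be
# backed up, destroyed and re-created to change it.
```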

----------

## psycho_driver

Unfortunately I don't have an SSD to devote solely to being a ZIL device (or a partition thereof, if that's possible).

I have determined that my really poor write speeds come from bumping up against physical memory limitations.  When copying a large file I get 40-50MB/s until physical memory is exhausted, at which point it drops to 4-5MB/s.  2GB really isn't enough.  I have 4GB on the way, but the USPS is taking their sweet time with it.  KQI recommends a minimum of 4GB; we'll see if that alleviates my memory issues.

I did a lot of tweaking with zfs_vdev_min_pending and zfs_vdev_max_pending and found that values of 8 and 18 respectively work the best overall for my particular drive/controller combination (4x Samsung F4's on an nforce 730 controller).  There's a thread over on hardforums where a guy claims the F4's don't work well with NCQ enabled, but I believe he was experiencing a controller issue.  These min_pending and max_pending values work substantially better for me than values of 1/1.

I did raw dd testing tonight after tweaking as much as I plan to.  It achieved write speeds of 490MB/s and read speeds of 894MB/s.  Not too shabby.

iozone is now consistently reporting writes in the 380MB/s range and reads around 1GB/s.  If I can get the memory issue under control I will be content with the setup (zfs_arc_max doesn't seem to do much of anything for me unfortunately).
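For anyone else poking at the same knob, here are the ways I know of to set it (the 512MB value is just an example):

```shell
# Cap the ARC at module load time (value in bytes; 512MB here):
echo "options zfs zfs_arc_max=536870912" > /etc/modprobe.d/zfs.conf

# Or at runtime, if the module exposes the parameter:
echo 536870912 > /sys/module/zfs/parameters/zfs_arc_max

# Watch what the ARC is actually doing (current size and ceiling):
grep -E '^(size|c_max)' /proc/spl/kstat/zfs/arcstats
```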

----------

## Truzzone

 *psycho_driver wrote:*   

> ...
> 
> I did raw dd testing tonight after tweaking as much as I plan to.  It achieved write speeds of 490MB/s and read speeds of 894MB/s.  Not too shabby....

 

What is your setup?

Have you achieved this result with 4x Samsung F4 + zfs-fuse + raid-z (2)?

Best regards,

Truzzone

----------

## psycho_driver

 *Truzzone wrote:*   

>  *psycho_driver wrote:*   ...
> 
> I did raw dd testing tonight after tweaking as much as I plan to.  It achieved write speeds of 490MB/s and read speeds of 894MB/s.  Not too shabby.... 
> 
> What is your setup?
> ...

 

I think pretty much all my relevant hardware specs are listed in prior posts in this thread.  It's a core2-based Celeron HTPC, currently with 2GB RAM, 4x 2TB Samsung F4's, a 64GB SSD system drive, a SATA BD-ROM drive, and a 250GB 7200rpm IDE drive.  All hooked up to a Zotac 9300 board.

The results are raidz(1) through the native filesystem port from KQ Infotech (which still uses the zfs innards of the zfs-fuse project).

The synthetic results do not really match real-world performance.  As I was saying, when copying a big file from the 7200rpm ide drive to the raidz array, it's only going at 40-50MB/s until the memory issue crops up, and then it slows down drastically.

----------

## Truzzone

@psycho_driver: Sorry, I thought I had already read that, but I mixed it up with another thread XD

Thank you for your reply.

My newbie questions:

What is the difference between zfs (zfsonlinux.org) and zfs-fuse (zfs-fuse.net)?

Can ZFS use an SSD as a "speedy cache"?

Does it need an entire SSD, or is it possible to partition one: a small partition for the system and the rest for the ZFS cache?

Best regards,

Truzzone   :Smile: 

----------

## zefrer

Hmm, if you're getting higher rates when reading until memory is exhausted, then the real transfer rate is only what you see _after_ memory is exhausted. Anything prior to that includes caching, which inflates the transfer rate.

If this also happens when writing then something else is wrong. There's no conceivable reason for zfs, or any filesystem, to be slower at writing when memory is full. 

Unless of course the filesystem tries to grab _all_ memory and leaves none for actually loading the data you want to write to disk into memory in the first place  :Smile: 

Have you checked what sector size was used for the zpool?

----------

## psycho_driver

 *zefrer wrote:*   

> Hmm if you're getting higher rates when reading until memory is exhausted then the real transfer rate is only _after_ memory is exhausted. Anything prior to that includes caching which inflates the transfer rate.
> 
> If this also happens when writing then something else is wrong. There's no conceivable reason for zfs, or any filesystem, to be slower at writing when memory is full. 
> 
> Unless of course the filesystem tries to grab _all_ memory and leaves none for actually loading the data you want to write to disk into memory in the first place 
> ...

 

 Sector size is 512b, which is emulated on the F4's.  Recordsize is 128k.  Data is being written to disk because I'm transferring files > 1GB and I'm watching the destination dir with ls -l every couple of seconds.  My system idles at around 900MB of memory usage (HTPC with lots of stuff going on).  When transferring, say, a 1.1GB file, it gets about 1GB transferred relatively quickly, but then it's at somewhere around 1.97/2GB memory used and the slowdown happens.  Also, after that file has finished transferring, the memory that was allocated is not freed in a reasonable time frame, and if I initiate another large file transfer, it starts off at the lower speed.

I agree that something isn't quite right.  I've tried it with the caches disabled, only with metadata being cached, and with various zfs_arc_max values, but none of them seem to change the behaviour.

----------

## Truzzone

 *psycho_driver wrote:*   

> ...
> 
> Sector size is 512b, which is emulated on the F4's.  Recordsize is 128k.  Data is being written to disk because I'm transfering files > 1GB and I'm watching the destination dir with ls -l every couple of seconds.
> 
> ...

 

While you copy files, open two screens, one with iotop and another with htop; it is useful for checking what the system load is doing  :Wink: 

Best regards,

Truzzone   :Smile: 

----------

## devsk

If the slowdown is with the LLNL code, then it's understandable. There is an open issue which you can track here: https://github.com/behlendorf/zfs/issues#issue/130

If the slowdown is with the KQI code, then it's not understandable and should probably be filed as a bug.

There is no known slowdown with zfs-fuse as such. It performs well within the parameters of a user-space FS, which is known to be slower because of the FUSE layer.

----------

## psycho_driver

Looks like I'm the latest horror story on EggSaver shipping.  Ordered the RAM on 2/22; it was finally processed and 'out for delivery' yesterday . . . which was the last status update, and there's no sign of it yet.

----------

## zefrer

 *psycho_driver wrote:*   

> 
> 
> Sector size is 512b, which is emulated on the F4's.  Recordsize is 128k.  

 

Then the slow transfer rate is definitely the result (at least partly) of the sector size. 512-byte emulation mode is very costly for these advanced-format drives. There are lots of benchmarks on the net that show this, with transfer rates similar to yours.

Can you re-create the zpool with sector size set to 4k and test again?
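On the native port, a sketch of what that recreation might look like; the pool and device names are placeholders, and `ashift=12` requests a 2^12 = 4096-byte minimum I/O size to match the drive's physical sectors:

```shell
# WARNING: destroying the pool loses all data on it -- back up first.
zpool destroy tank

# Recreate with 4 KiB alignment; ashift=12 means 2^12-byte sectors.
zpool create -o ashift=12 tank raidz /dev/sdb /dev/sdc /dev/sdd

# Verify what the pool actually got:
zdb tank | grep ashift
```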

----------

## psycho_driver

 *zefrer wrote:*   

>  *psycho_driver wrote:*   
> 
> Sector size is 512b, which is emulated on the F4's.  Recordsize is 128k.   
> 
> Then the slow transfer rate is definitely the result (at least partly) of the sector size. Sector aligned emulation mode is very costly for these advanced format drives. There are lots of benchmarks on the net that show this with transfer rates similar to yours.
> ...

 

As far as I know there is no way to specify the sector size with zpool.  FreeBSD users can create virtual devices with 4k sectors using gnop.  If I'm missing something obvious, let me know.  I'm not sure I would go to the trouble regardless, since I already have quite a bit of data on them.

I received the memory today and it seems to have alleviated the main problem I was experiencing.  I can now transfer a 4.5GB file at 55MB/s.  It uses memory up to the 4GB limit, but then it seems to efficiently recycle/reuse memory and keeps on chugging along at a decent clip.  So, I recommend avoiding the KQ Infotech solution if you have less than 4GB (maybe 3GB) of RAM.  A 1GB file transferred at 65MB/s, which is close enough to saturating a gigabit Ethernet line that I'm content with it.

----------

## zefrer

I haven't used ZFS in a while, but if I remember right there is an option to change the sector size. I'll have a look once I have KQ zfs installed.

You're wasting a lot of performance as it is, though. As far as the drive is concerned, you might as well be using Win XP. Not to mention you'd get better performance by not using zfs in the first place and going with mdraid and any other filesystem.

Personally, I don't consider it acceptable to require at least 3GB of RAM just to get reasonable performance. ZFS with lots of RAM to play with should be screaming fast, not 60MB/s. 60MB/s should be the minimum transfer rate after memory is exhausted, not the max.

My 3-year-old 2.5" drive can do that right now with no caching whatsoever. But hey, it's your hw  :Smile: 

----------

## devsk

As of March 4, I have started using LLNL's ZFS as my root filesystem on my laptop. Let's see how far it goes! It looks rock solid so far.

----------

## psycho_driver

A full system backup from the SSD via tar now runs at just under 40MB/s.  A vast improvement.  Also, a large file copy from SSD->raidz went at 78MB/s.  I think the earlier transfers may have been partially limited by the IDE drive's read speed.

Speed was never what I was going for with this setup.  I want reliable software RAID 5, and the built-in compression made it a no-brainer.  Since I'm confident I can come pretty close to keeping a gigabit network saturated with it, I'm happy.

----------

## spielc

 *devsk wrote:*   

> As of March 4, I have started using LLNL's ZFS as my root filesystem on my laptop. Let's see how far it goes! It looks rock solid so far.

 

Good luck and let us know about your findings! I'm sure a lot of ppl (myself included) will be interested in your results.

I do have another question, though: I use LLNL's zfs implementation to store the Portage-related stuff (portage, distfiles and layman) in a zpool. I wrote a small init script that essentially just loads the spl and zfs modules and calls zfs mount -a. The first step works without any problems, but the second fails with the following errors:

```
cannot mount 'gentoo/distfiles': No such device or address
cannot mount 'gentoo/overlays': No such device or address
cannot mount 'gentoo/portage': No such device or address
```

The interesting thing is that if I execute zfs mount -a manually as root, everything works as expected. It only fails when I call it from the init script. Do you have any idea why?

----------

## devsk

 *spielc wrote:*   

>  *devsk wrote:*   As of March 4, I have started using LLNL's ZFS as my root filesystem on my laptop. Let's see how far it goes! It looks rock solid so far. 
> 
> Good luck and let us know about your findings! I'm sure a lot of ppl (myself included) will be interested in your results.
> 
> I do have another question tho: I use llnl's zfs-implementation to store the portage related stuff (portage, distfiles and layman) in a zpool. I wrote a small initskript that esentially just loads the spl and zfs modules and calls zfs mount -a. The first step works without any problems but the second fails with the following errors:
> ...

 Try putting in some sleep after the module loading. It may be racing with the mount.
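Rather than a fixed sleep, a sketch of waiting for the /dev/zfs control node, which udev may create asynchronously after modprobe, before mounting (the 10-second limit is arbitrary):

```shell
# Load the module, then poll for the control node udev creates.
modprobe zfs
for i in 1 2 3 4 5 6 7 8 9 10; do
    [ -c /dev/zfs ] && break   # node exists: stop waiting
    sleep 1
done

# Now the mount should no longer race with device creation.
zfs mount -a
```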

----------

## spielc

 *devsk wrote:*   

> Try putting in some sleep after the module loading. It may be racing with the mount.

 

Thanks for the tip, but that doesn't seem to be the issue. I let it sleep for 30 seconds and I still get the same result.

Here's the init script. Maybe you'll see something that I've missed, but to my eyes it looks okay...

```
#!/sbin/runscript
# Copyright 1999-2006 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2

opts="start stop restart"

depend() {
        need modules
}

start() {
        ebegin "Modprobing zfs module"
        modprobe zfs
        eend $?
        sleep 30
        ebegin "Mounting zfs filesystem(s)"
        zfs mount -a
        eend $?
}

stop() {
        ebegin "Unmounting zfs filesystems"
        zfs umount -a
        eend $?
}
```

----------

## devsk

Have a look at mount behavior issues: https://github.com/behlendorf/zfs/issues/107#comment_844398

----------

## spielc

 *devsk wrote:*   

> Have a look at mount behavior issues: https://github.com/behlendorf/zfs/issues/107#comment_844398

 

Ah thanks for the info, this really looks promising... I'm going to have a look at it

----------

## John R. Graham

Looks like the ebuilds for the LLNL ZFS implementation would be relatively trivial, since they're standard autoconf / automake / configure-based projects, but I thought I'd ask before I went & did 'em: do they already exist in an overlay somewhere?

- John

----------

## devsk

 *John R. Graham wrote:*   

> Looks like the ebuilds for the LLNL ZFS implementation would be relatively trivial, being standard autoconf / automake / configure based projects, but I thought I'd ask before I went & did 'em if they exist in an overlay somewhere already.
> 
> - John

 The science overlay.

----------

## John R. Graham

Thanks, devsk!   :Smile: 

- John

----------

## John R. Graham

I'm working through this new guide: Migrating Bootable Gentoo on ZFS Root. It makes, to me, the rather startling claim that native ZFS will not work right (it won't even build) unless you have a non-preemptible kernel (CONFIG_PREEMPT_NONE=y in the kernel .config file). For those of you who are experimenting with native ZFS, have you found this to be the case?
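For anyone checking their own setup, the preemption model can be read from the kernel config, e.g. (assuming /usr/src/linux points at the running kernel's sources, or that CONFIG_IKCONFIG_PROC is enabled):

```shell
# Which preemption model was this kernel built with?
grep 'CONFIG_PREEMPT' /usr/src/linux/.config

# Or read it from the running kernel, if CONFIG_IKCONFIG_PROC is set:
zcat /proc/config.gz | grep 'CONFIG_PREEMPT'
```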

- John

----------

## spielc

 *John R. Graham wrote:*   

> I'm working through this new guide: Migrating Bootable Gentoo on ZFS Root. It makes, to me, the rather startling claim that native ZFS will not work right (neither compile nor build) unless you have a non-preemptable kernel (CONFIG_PREEMPT_NONE=y in kernel .config file). For those of you that are experimenting with native ZFS, have you found this to be the case?
> 
> - John

 

I use ZFS for the Gentoo-related stuff (distfiles, portage tree and overlays) on my laptop. I have a preemptible kernel and it works. I do get lots of kernel BUG messages in the kernel logs, but I have found no evidence that it doesn't work. You do have to patch the configure script, though, as it bails out as-is when CONFIG_PREEMPT is enabled. At the moment I wouldn't use zfs as a root fs, though, as it is known to have performance issues.

----------

## kernelOfTruth

*bump*

Any news in the meantime?

I'm currently trying to use ZFS for some of my smaller backup drives.

The main issue is that I can't work 100% productively (for me meaning: not at all) while it's transferring files.

Every 2 seconds or so it syncs to the hard drive, and this interrupts keyboard and mouse.

It's annoying, to say the least.

Any ideas how to solve this?

I remember having seen this start with zfs-fuse 0.7+; 0.6.9 was fine.

Could this be a nasty regression in ZFS?

----------

## spielc

I abandoned my tests with ZFS on Linux some time ago, as it got worse and worse for me, up to the point where I wasn't able to mount the fs at all. That's when I gave up. Furthermore, I still follow the bug report tracking the CONFIG_PREEMPTIBLE issues, and from what I've seen it is still not resolved, so this is still a no-go for desktop systems...

----------

## ryao

 *kernelOfTruth wrote:*   

> the main issue is that I can't 100% (for me meaning: not at all) work productively while it's transferring files
> 
> every 2 seconds or so it's syncing to the harddrive and this interrupts keyboard and mouse
> 
> it's annoying to say the least
> ...

 

This is a regression caused by the following commit:

https://github.com/zfsonlinux/zfs/commit/302f753f1657c05a4287226eeda1f53ae431b8a7

Adding swap should help. I also have some patches in an upstream pull request that might also help:

https://github.com/zfsonlinux/zfs/pull/726

 *spielc wrote:*   

> I left the tests with zfs on linux some time ago, as it got worse and worse for me. Till the point when i wasn't able to mount the fs at all. That was the point for me to give up. Furthermore i still follow the bugreport tracking the CONFIG_PREEMPTIBLE-issues and from what i've seen that this is still not resolved and thus this is still a no-go for desktop systems...

 

See Gentoo on ZFS.

----------

## kernelOfTruth

 *ryao wrote:*   

>  *kernelOfTruth wrote:*   the main issue is that I can't 100% (for me meaning: not at all) work productively while it's transferring files
> 
> every 2 seconds or so it's syncing to the harddrive and this interrupts keyboard and mouse
> 
> it's annoying to say the least
> ...

 

thanks a lot for your work on ZFS ryao  :Smile: 

By "adding swap" do you mean a swap partition? Or the swap branch?

Because I'm already using a fairly big one (9 GB), albeit not on ZFS.

Is it supposed to run on ZFS?

I'll give ZFS another try in July when exams are over - right now btrfs seems to work stably enough (though I can't be sure when it'll next be inaccessible   :Laughing:  )

----------

## ryao

 *kernelOfTruth wrote:*   

> by "adding swap" you mean a swap partition ? or the swap branch ?
> 
> cause I'm already using a fairly big one (9 GB) albeit not on ZFS
> 
> is it supposed to run on ZFS ?

 

Linux swap on ZFS zvols works with my latest patches. They are in a pull request at upstream. ZFS is meant to eliminate the need for multiple partitions, traditional RAID and logical volume managers. That includes separate partitions for swap.

Currently, Solaris puts swap on a zvol. Swap on zvols was perfectly stable in my testing of FreeBSD's 9.0 release, although I believe they will not recommend it until their 9.1 release. With these patches, Linux swap on zvols is possible. There is currently one lingering issue with it; you can read a description of it in the Gentoo on ZFS guide.
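For reference, a sketch of what swap on a zvol might look like; the dataset name and size are placeholders, and the dataset options follow the style of the Gentoo on ZFS guide's recommendations, with the blocksize matched to the system page size:

```shell
# Create a 4G zvol with page-sized blocks and cache settings suited
# to swap, then use it as Linux swap:
zfs create -V 4G -b $(getconf PAGESIZE) \
    -o sync=always -o primarycache=metadata rpool/swap

mkswap /dev/zvol/rpool/swap
swapon /dev/zvol/rpool/swap
```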

----------

## ryao

I have keyworded sys-kernel/spl-0.6.0_rc9 and sys-fs/zfs-0.6.0_rc9 on ~amd64. They include patches for hardened support and deadlock fixes that make swap work. The preemption support patches have been omitted pending some additional changes. The 9999 versions apply no patches.
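For anyone on a stable system, the packages could be keyworded along these lines (the file name under package.accept_keywords is arbitrary):

```shell
# Accept the ~amd64-keyworded spl and zfs packages, then install:
mkdir -p /etc/portage/package.accept_keywords
echo "sys-kernel/spl ~amd64" >> /etc/portage/package.accept_keywords/zfs
echo "sys-fs/zfs ~amd64"     >> /etc/portage/package.accept_keywords/zfs
emerge -av sys-fs/zfs
```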

By the way, we might want to re-evaluate the decision to put this thread in the "Unsupported Software" forum.

----------

## John R. Graham

Moved from Unsupported Software to Kernel & Hardware in honor of Ryao joining us to support (among other things) ZFS.

- John

----------

