# ext4 disk bloat

## Gentree

Hi,

I have been holding back on kernel updates in order to keep several partitions that use Reiser4, but this means holding back other packages too and is no longer really practicable.

So I decided to migrate those partitions to ext4. However, having converted a couple of them, I notice a huge bloat in the used disk space for the same data.

I'm using /one as temporary storage for a copy of the filesystem contents:

```
pwd
/tmpd

time cp -a * /one

real    4m50.358s
user    0m0.696s
sys     0m22.923s

df -h
/dev/hda16     reiser4  3.8G  3.5G  277M  93% /tmpd
/dev/hda11     ext4      13G  6.3G  5.7G  53% /one
```

So what reiser4 managed to store in 3.5GB, ext4 is managing to bloat to 6.3GB.

WTF? That's not far off DOUBLE!

Do I need to tune the default options or something?

TIA, Gentree.   :Cool: 

----------

## morpheus2051

Hi,

Do you have many small files on the new ext4 filesystem? My first guess would be to look at its block size.
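A rough way to test the small-files guess is to add up how much space the files would occupy once every file is rounded up to whole blocks. The sketch below is only an estimate: `block_slack` is a made-up helper name, it assumes GNU find (for `-printf`) and awk, and it ignores directories, hard links, sparse files and filesystem metadata.

```shell
# block_slack DIR [BLOCK_SIZE]
# Sum the real file sizes under DIR and the space they would take
# if each file were rounded up to a whole number of blocks.
block_slack() {
    dir=$1
    bs=${2:-4096}   # assumed block size in bytes, 4096 if not given
    find "$dir" -xdev -type f -printf '%s\n' |
        awk -v bs="$bs" '
            { data += $1; alloc += int(($1 + bs - 1) / bs) * bs; n++ }
            END { printf "%d files, %.0f data bytes, %.0f bytes allocated\n", n, data, alloc }'
}
```

Running it as `block_slack /tmpd 4096` and again as `block_slack /tmpd 1024` would show how much of the difference could come from block rounding alone.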

See: http://kernelnewbies.org/Linux_3.2#head-6587eeb43f3afa2b24306c24b679cfe77f6da357.

Hope this helps.

Greetings 

morpheus

----------

## Gentree

Thanks,

The -m 1 option may reduce the wastage a bit, but I can't find the -C option mentioned in that link.

 *Quote:*   

> using the mkfs -C option (requires e2fsprogs 1.42)

 

I have 1.42.4, yet I don't find -C documented in man mkfs.ext4.

However, I'm running 2.6.32-hh1; maybe that kernel is too old.

 :Confused: 

----------

## morpheus2051

I have sys-fs/e2fsprogs-1.42.7 and the -C option is documented. But I am running a fairly new kernel, vanilla-sources-3.10.5.

----------

## Gentree

I'm a bit confused here. I doubt the man page is sniffing your kernel version, so this does not make sense. Maybe an omission somewhere?

Could you paste me the relevant paragraph? I'll try the option to see whether it's active or not.

thx.

----------

## morpheus2051

```
-C  cluster-size
              Specify the size of cluster in bytes for filesystems using the bigalloc feature.  Valid cluster-size values are from 2048 to
              256M bytes per cluster.  By default (if bigalloc is enabled and no cluster size is otherwise specified using this option), the
              cluster size will be 16 times the block size.
```

```
-b block-size
              Specify the size of blocks in bytes.  Valid block-size values are 1024, 2048 and 4096 bytes per block.  If omitted, block-size
              is heuristically determined by the filesystem size and the expected usage of the filesystem (see the -T option).  If block-size
              is preceded by a negative sign ('-'), then mke2fs will use heuristics to determine the appropriate block size, with the
              constraint that the block size will be at least block-size bytes.  This is useful for certain hardware devices which require that
              the blocksize be a multiple of 2k.
```

----------

## morpheus2051

What linux-headers do you have?  I have 3.7. Perhaps this option is enabled at build time of e2fsprogs.

----------

## Gentree

I tried -C and it did not complain.

I think I'll wait until I have the system backed up and updated, since this drive is getting a bit flaky anyway. I just need to convert all these partitions to universally available formats.

This is going to cost so much time now that I think I'll go for a fresh installation. It's been about ten years since I first installed Gentoo on this one.

Just as well, installing Gentoo is so frigging complicated, you don't want to do it more than every ten years.   :Razz: 

----------

## morpheus2051

My oldest Gentoo installation is eight years old and has seen different mobos and CPUs. I never felt the need to do a new install. When I moved my installation to ext4, I used tar to back up my system. That goes much faster than a new install. If you worry about cruft in your system, I found this tool works quite well: http://www.genoetigt.de/site/projects/gcruft.

----------

## NeddySeagoon

Gentree,

The data does not occupy twice the space. The filesystems operate differently.

ext4 reserves 5% for the superuser by default. It also allocates all of the space for its metadata (i-nodes) at filesystem create time.

The 5% (650MB) can be reclaimed with tune2fs. Changing the metadata allocation requires remaking the filesystem.
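As a rough check of that figure, here is a small sketch of the arithmetic. It assumes the 13G shown by df means about 13 GiB (13312 MiB), so the result is only approximate; `reserved_mib` is a made-up helper name.

```shell
# reserved_mib SIZE_MIB [PERCENT]
# MiB held back for the superuser at a given reserve percentage.
reserved_mib() {
    size_mib=$1
    pct=${2:-5}     # mke2fs default reserve is 5%
    echo $(( size_mib * pct / 100 ))
}

reserved_mib 13312 5   # assumed 13 GiB volume at the default 5%
reserved_mib 13312 1   # same volume with the reserve lowered to 1%

# Lowering the reserve on an existing filesystem needs root, for example:
#   tune2fs -m 1 /dev/hda11
```

The `tune2fs -m` line is left as a comment because it changes a live filesystem; the device name is the one from the df output above.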

Look at 

```
df -i
```

This tells you how many i-nodes you have. Each i-node needs 128B or 256B, depending on your options, and you need one i-node per file. Running out of i-nodes produces disk full messages.

mke2fs -t ext4 tries to guess from the size of the volume what a good i-node to disk space ratio will be.

For the portage tree, use 1k disk blocks and one i-node per block. For DVD rips, one i-node per GB may be generous.
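To put numbers on the i-node trade-off, here is a sketch of the arithmetic. The assumptions are mine: a 13 GiB volume, 256-byte i-nodes, and a default ratio of one i-node per 16384 bytes (the usual mke2fs.conf default, which may differ on your system). The helper names are made up, and mke2fs rounds per block group, so real counts will differ slightly.

```shell
# inode_count FS_BYTES BYTES_PER_INODE: approximate number of i-nodes created
inode_count() { echo $(( $1 / $2 )); }

# inode_table_mib INODES INODE_SIZE: space taken by the i-node tables, in MiB
inode_table_mib() { echo $(( $1 * $2 / 1024 / 1024 )); }

fs_bytes=$(( 13 * 1024 * 1024 * 1024 ))        # assumed 13 GiB volume

n_default=$(inode_count "$fs_bytes" 16384)     # assumed default ratio
n_dense=$(inode_count "$fs_bytes" 1024)        # one i-node per 1k block

echo "default: $n_default i-nodes, $(inode_table_mib "$n_default" 256) MiB of tables"
echo "dense:   $n_dense i-nodes, $(inode_table_mib "$n_dense" 256) MiB of tables"

# The dense layout would be requested at create time (needs root, destroys data):
#   mke2fs -t ext4 -b 1024 -i 1024 /dev/hdaN   # hdaN is a placeholder
```

The point of the comparison is that one i-node per 1k block suits a tree of tiny files but spends a large share of the volume on i-node tables, so it only makes sense where the file count really is that high.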

Hmm /dev/hda ... that's going to be a really painful update.

----------

## Gentree

```
df -i 
```

Thanks, I didn't know that option.

 *Quote:*   

> Hmm /dev/hda ... that's going to be a really painful update

 

Yes, that was the other reason I'm still on an older kernel. I'm pissed off about them pulling the PATA driver. It is a lot easier to identify and manage hardware if an IDE drive is identified as such and is not pretending to be a SATA device.

Apart from that I have a clone of the root partition on sda anyway as a fallback in case a portage update does something silly.

The main problem is that I've had to block some packages like udev that no longer recognise /dev/hda*, and the knock-on effect is that I ended up with a chain of other updates I had to block, so the whole system has not had a proper update for nearly a year.

That is likely to produce such a rat's nest of blockages once I update the portage tree that I think it's probably going to be easier to start afresh.

 :Evil or Very Mad: 

----------

## dmpogo

 *Gentree wrote:*   

> 
> 
> ```
> df -i 
> ```
> ...

 

As a first step, why not just convert /dev/hda to the new driver using your old kernel, and then proceed from there?

----------

## Gentree

Thanks. Pretending IDE drives are SATA is not the real problem; it's just a PITA from a hardware maintenance standpoint.

The trouble is the mess I'm going to hit when I update the portage tree. Either I spend a day trying to unravel a rat's nest of dependency blockages or I do a fresh installation. 

Either way, it looks like a day's work at least. The update will probably involve about 48 hours' worth of compile time in itself, since just about every package will be out of date. It's probably going to be like a stage 1 tarball installation plus the hassle with the tree.

Plus I've just had to deal with a mobo going flaky that needed swapping out. 

I think I'd rather be at the beach.

 :Rolling Eyes: 

----------

