# umount or force flush of busy ext4 filesystem

## mv

I am using a 32-bit chroot on a 64-bit system; it is a separate ext4 partition and has several directories bind-mounted into it (e.g. /dev -> /gentoo32/dev).

For quite a while now it has happened rather frequently that the chroot cannot be umounted, because the kernel claims that the partition is busy. This happens even if I systematically kill all processes except for my current shell (even if I temporarily kill init's waiting gettys). Quite often the bind mount /dev -> /gentoo32/dev cannot be umounted, but even when it can, /gentoo32 still cannot be umounted.

Of course, I tried remounting read-only or umounting lazily (or several such combinations), but none of this helped: I am not able to shut down the system cleanly.
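Concretely, the attempts looked like this (the /gentoo32 paths are from my setup; none of it helped):

```shell
# Try to stop further writes by remounting the chroot read-only
mount -o remount,ro /gentoo32

# Lazy umount: detach the mount point now, clean up once it is no longer busy
umount -l /gentoo32/dev
umount -l /gentoo32
```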

After restarting, some journal information is replayed (or sometimes not replayed due to journal checksum errors), usually ending in a severely broken filesystem which even fsck can only repair partially (interlinked directories pointing to identical inodes, etc.).

So my question is: can I force ext4 to at least write all data it currently has in cache, so that the filesystem is left in a non-broken state? So far, I tried

```
sync
sync
sync
echo 1 >/proc/sys/vm/drop_caches
sleep 2
sync
sync
sync
```

(and/or lots of other such combinations, waited 24 hours) - no success yet: after booting, usually part of the journal is replayed (which IMHO should not happen after the above commands, but I might be wrong), causing filesystem corruption. I guess there should be some command which really forces writing?

----------

## eccerr0r

Linux won't let you unmount a disk if there is an open write file descriptor to it.  Sync()ing does not help: the file is open and the controlling process may have left it in an inconsistent state; at the very least, the process and the OS have not finalized how big the file really is and which sectors it occupies.  You'll need to find which processes are still running and did not close their file handles properly.

Things that have confused me in the past so that I couldn't unmount a partition:

- NFSD/Automounter 

- Swap files on the partition

Of course processes stuck in the 'D' state can be culprits for this issue...
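A sketch of how I usually hunt for these (fuser is from psmisc, lsof may or may not be installed; the /gentoo32 path is taken from your post):

```shell
# Show every process holding an open file, cwd or mmap on the mount point
fuser -vm /gentoo32

# Alternative, if lsof is installed
lsof /gentoo32

# Check whether a swap file lives on the partition
cat /proc/swaps
```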

----------

## mv

 *eccerr0r wrote:*   

> You'll need to find what processes are still running and did not close the file handle properly.

 

Thanks for your reply. However, as I said, I had killed all processes (except init and kernel threads). Probably the kernel falsely believes that a file is still open.

I had a similar problem with kernel versions somewhere in the range 2.6.2?-3.?: after mount --share / (which is very convenient when mounting / into a chroot subdirectory) and then mounting some squashfs+aufs devices, these could not be umounted if in between I had merely mounted and umounted some dm-crypted devices. (I first blamed aufs, but the same happened with unionfs-fuse.) This was finally fixed in some recent kernel version, so that I could work with the convenient mount --share / again. However, now the problem persists even if I avoid mount --share /.

Finally (after some hours of experimenting) I found a reliable way to reproduce the problem: if I run keychain in the chroot (which essentially starts ssh-agent and gpg-agent, which certainly open some sockets in /tmp within the chroot) and then kill the spawned processes from *outside* the chroot, umounting is impossible afterwards.
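In case anybody wants to try it, the reproduction is essentially this (keychain and the agent names are from my setup):

```shell
# Inside the chroot: start the agents, which open sockets under /tmp
chroot /gentoo32 /bin/bash -c 'keychain'

# From OUTSIDE the chroot: kill the spawned agents
pkill ssh-agent
pkill gpg-agent

# Now the umount fails with "device is busy"
umount /gentoo32/dev
umount /gentoo32
```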

So I am rather sure that it is a kernel bug. (I am using hardened-sources with some chroot restrictions, so this may be related - I have no time now for lots of experiments to find out.)

My point is not to track down this particular bug. As the example of the mentioned mount --share / bug shows, such bugs can come and go with kernel versions. I would like to know how to prevent bugs of such a "trivial" kind from causing severe data loss.

I remember that I had seen some tool/command, back then specifically for ext2, to force writing the current state. This would at least fix the severe data loss which I have now: in this case, some blocks for the unknown "open" file might of course remain reserved and need to be fixed with e2fsck, but I should not end up with blocks claimed by multiple files.
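Perhaps fsfreeze from util-linux is the modern counterpart of that old tool - as far as I know it flushes all dirty data including the journal and then blocks further writes, though I have not verified that it still works on a filesystem stuck in this "busy" state:

```shell
# Flush all dirty data and the journal; the on-disk state should be
# consistent afterwards, and new writes are blocked
fsfreeze -f /gentoo32

# Unfreeze again, otherwise all writers stay blocked
fsfreeze -u /gentoo32
```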

 *Quote:*   

> - NFSD/Automounter
> 
> - Swap files on the partition

 

I use neither of these (only a swap partition). For a while I blamed aufs+squashfs and used overlayfs patches instead of aufs patches (without success concerning the bug). However, the above way to reproduce the bug works even if I do not mount any squashfs or overlayfs.

----------

## eccerr0r

The point I'm unfortunately making is that since you see corruption, there's a process, whether in the kernel or not, that's still holding data in RAM.  That data needs to be dumped to the VFS/cache before it can be committed to disk.  Until that happens, there's no way to know what needs to be sync()'ed to disk.

I'm afraid the only real solution is to help debug why the kernel is leaving descriptors open in a half-baked state so that they cannot be written back to disk.  This is a fairly serious bug and should be debugged, or at least submitted to lkml or something.  It sounds like the kernel isn't cleaning up properly when dealing with a merged filesystem outside a chroot.  I've seen tons of chroot issues in the past, so this isn't surprising...

Did aufs or unionfs get merged into the stable kernel yet?  I forget.  Issues such as this are one of the major reasons why these are still patches and not part of the kernel...

----------

## mv

 *eccerr0r wrote:*   

> This is a fairly serious bug and should be debugged, or at least submitted to lkml or something.

 

Yes, but I have no time to do this now.

 *Quote:*   

> I've seen tons of chroot issues in the past, this isn't surprising...

 

Yes, actually I am already used to this kind of error. But since I recently suffered enormous data loss from this problem several times in a row - several gigabytes of freshly installed packages were corrupted after the fsck - I hoped that there is a quick hack to avoid the worst consequences.

 *Quote:*   

> Did aufs or unionfs get merged into stable kernel yet?  I forget.

 

unionfs seems to be dead. Merge requests for aufs have been ignored for years. overlayfs seemed to have a chance of being included, but now its merge requests also seem to be ignored.

 *Quote:*   

> Issues such as this are one of the major reasons why these are patches and not part of the kernel yet...

 

Neither aufs nor overlayfs is related to the problem; in fact, I am rather sure that neither ever caused any issues. And if they really have bugs, these should be fixed rather than ignored by mainline, since they are really needed on many live/embedded/small-harddisk systems.

IMHO something else is conceptually broken if gigabytes of data loss cannot be avoided just because the kernel does not properly flag some open fifo as closed under certain circumstances.

----------

## BitJam

I've had similar problems being unable to umount after using a chroot, but I never experienced the data loss you have been getting.  Sometimes my system goes down (power failures or stupid things on my part).  My ext3 journals play back and everything seems fine afterward.  If umounting gets jammed, then there is very little journal playback on the next boot.

My guess is that you have two different problems that gang up to create your data loss disasters.  I'm not very surprised to hear that PGP-related programs behave strangely when killed from the outside.  It might be a feature, not a bug, because they try very hard to protect your secret keys from prying eyes.  In any case it makes sense that this is why you can't umount.

I think this is combined with a 2nd problem, and the 2nd problem is what causes your large data losses after you are unable to cleanly umount.  I think the combination of these 2 problems is rare, which would explain why it's not fixed yet.  It seems that somehow the journaling gets screwed up.

You could try journaling data as well as meta-data; then the filesystem should almost always be in a consistent state.  The drawback is that data journaling slows things down.  But I would bet that you would still get data loss because the journaling is not working right for some reason, although I can't even guess what that reason would be.  It is almost like one side of the chroot is journaling and the other side is not, but I can't imagine how that could possibly happen.
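If you want to try it: as far as I know ext4 refuses to switch the data mode on a plain remount, so either set it in /etc/fstab or store it as a default mount option in the superblock (the device name below is just a placeholder, of course):

```shell
# Store data journaling as a default mount option in the superblock
tune2fs -o journal_data /dev/sdXN

# Or set it explicitly in /etc/fstab:
# /dev/sdXN  /gentoo32  ext4  defaults,data=journal  0 2
```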

BTW: you might have more luck with:

```
# echo 3 > /proc/sys/vm/drop_caches
```

which I am told frees dentries and inodes in addition to just the page cache.

----------

## szczerb

There seems to be a bug in current kernels concerning the ext4 journal: http://www.phoronix.com/scan.php?page=news_item&px=MTIxNDQ.

----------

## mv

 *szczerb wrote:*   

> There seems to be a bug in current kernels concerning ext4 journal

 

The umount problem is older, but maybe it is related to the huge data loss. I cannot reconstruct when it first happened.

----------

## mv

 *BitJam wrote:*   

> If umounting gets jammed then there is very little journal playback on the next boot.

 

The journal playback is usually not huge. But afterwards, sometimes even booting is impossible without manually executing e2fsck, and whenever this is necessary, it "repairs" thousands of blocks and broken file entries. Fortunately, usually only those written since the last shutdown, but practically all the data written between the two shutdowns is completely messed up.
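(For the record, the manual run is essentially the following, with /dev/sdXN standing for the chroot partition:)

```shell
# Force a full check even if the filesystem is marked clean,
# answering "yes" to all repair questions
e2fsck -f -y /dev/sdXN
```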

 *Quote:*   

> I'm not very surprised to hear that PGP related programs behave strangely when killed from the outside.  It might be a feature, not a bug

 

No program should be able to exit and leave file handles open - it is certainly a kernel problem.

 *Quote:*   

> In either case it makes sense that this is why you can't umount.

 

It is certainly not the only cause. It is just the only cause which I was able to track down by experimenting, but I still have the same problems sometimes even without starting any daemons in the chroot.

 *Quote:*   

> You could try journaling data as well as meta-data then the filesystem should almost always be in a consistent state.

 

Duplicate blocks mean that the metadata itself gets screwed up, so journaling file data should not make a difference. However, maybe I will try this now anyway.

 *Quote:*   

> 
> 
> ```
> # echo 3 > /proc/sys/vm/drop_caches
> ```
> ...

 

Thanks. I will certainly try this.

----------

