# If EXT4, don't use 3.5.7 / 3.6.2 kernels [OBSOLETE]

## aCOSwt

ext4 data corruption regression

----------

## gerard27

Thanks aCOSwt.

I installed 3.5.7 two days ago.

No problems so far but to be on the safe side I switched back to 3.4.9.

Gerard.

----------

## bandreabis

Affected kernels have been Hard Masked!

At least yesterday they were.

----------

## aCOSwt

It could be the fault of some specific, poorly tested option:

 *Quote:*   

> > the full set of options for all my ext4 filesystems are:
> >
> > rw,nosuid,nodev,relatime,journal_checksum,journal_async_commit,nobarrier,quota,
> ...

 

----------

## c00l.wave

Note that there's a null-pointer dereference occurring when large files are deleted from ext4 filesystems in 3.4.9, which was my main reason to upgrade to and stay at 3.5.7 on the systems I maintain (I actually hit the null-pointer bug when moving backups of many tens of gigabytes across disks). You are much more likely to hit that bug than the journal bug that led to the panic masking. From what I read, the current bug is considered to occur only under very specific circumstances that require having created and mounted the filesystem with uncommon options.

The null-pointer bug seems to have been missed by Gentoo devs but now has a bug report as well (at least it reads like the same one I encountered).

So I would add "don't use 3.4.9 either" or "but run 3.5.7/3.6.2 anyway if you run default filesystems" (without warranty, you'd better have backups either way).

----------

## platojones

Mask has been lifted for 3.6.2.

----------

## ppurka

 *platojones wrote:*   

> Mask has been lifted for 3.6.2.

 Not surprised. It was an esoteric bug reproducible only on an esoteric configuration.

----------

## platojones

 *ppurka wrote:*   

> *platojones wrote:*
>
> > Mask has been lifted for 3.6.2.
>
> Not surprised. It was an esoteric bug reproducible only on an esoteric configuration.

 

Not sure it's been reproduced at all. So far there is only the original reporter on the thread and supposedly one other (a second-hand report). Ts'o has yet to reproduce it.

----------

## Tony0945

As a user, I'm thoroughly confused. I run a stable system as much as possible. I was on 3.4.9 when a routine update installed 3.5.7. I rebuilt the kernel and removed 3.4.9. Then 3.5.7 was masked, so I re-emerged 3.4.9, rebuilt the kernel, and removed 3.5.7. Now `emerge -auvND world` wants to re-install 3.5.7.

I don't understand the mount option problem. My /etc/fstab has the following line:

```
/dev/sda2               /               ext4            noatime         0 1
```

What is the latest stable safe kernel to run? Should I mask both 3.4.9 and 3.5.7 ?

What mount options are safe to use? I don't remember the full line when I created the file system years ago. How can I display this?

Should I tar off the system and reformat the drive with ext3? Random data loss is a scary thing. I applaud wholeheartedly those individuals who take the risk to test these kernels, but I don't want risk on my personal system.

----------

## c00l.wave

I don't think the 3.4.9 bug causes random data loss. Loss should happen only to files that were being written or deleted at that time. After the null-pointer dereference occurred, I found that some of the backup files I had copied were 0 bytes in size on either the target or the source, and one file was cut off. Having upgraded to 3.5.7, I compared file sizes and copied the larger file to the destination drive, but I haven't checked whether the data is still OK (those were only old backups moved to make space for newer ones). I'm not a kernel developer, but the effect of the 3.4.9 bug does not appear to be worse than simply cutting power while writing to disk: the journal will revert any pending transactions and fsck will check for structural consistency.

If you don't remember having set any fancy options for your ext4 partitions, I wouldn't worry about the bug in 3.5.7. However, it would be much more severe if it struck. It's your own choice, but I have stayed with 3.5.7 so far.

To be completely safe, you could also choose a kernel older than 3.4. I wouldn't want to "downgrade" to ext3, though.

(Your "noatime" mount option is nothing special, it just disables the usually unnecessary "access time" logging.)
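As for how to display the options: the kernel lists the options each filesystem is currently mounted with in `/proc/mounts`, and `tune2fs -l` on the device prints the features and default mount options stored in the superblock. Below is a minimal sketch in Python that pulls the ext4 entries out of `/proc/mounts`. The function name `ext4_mount_options` is only my own illustration, not part of any tool.

```python
from pathlib import Path


def ext4_mount_options(mounts_text):
    """Map each ext4 mount point to its list of active mount options.

    Expects text in the /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not ext4.
        if len(fields) < 4 or fields[2] != "ext4":
            continue
        result[fields[1]] = fields[3].split(",")
    return result


if __name__ == "__main__":
    mounts = Path("/proc/mounts")
    if mounts.exists():
        for mountpoint, options in ext4_mount_options(mounts.read_text()).items():
            print(mountpoint, ",".join(options))
```

Run it as a normal user. It only reads `/proc/mounts` and prints one line per ext4 mount point, so you can compare the result with what you have in `/etc/fstab`.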

----------

## aCOSwt

 *Tony0945 wrote:*   

> As a user, I'm thoroughly confused. I run a stable system as much as possible. I was on 3.4.9, a routine update installed 3.5.7. I rebuilt the kernel and removed 3.4.9, then 3.5.7 was masked and I re-emerged 3.4.9, rebuilt the kernel, and removed 3.5.7, now emerge -auvND world wants to re-install 3.5.7
> 
> I don't understand the mount option problem. My /etc/fstab has the following line:
> 
> ...

 

You observed the 3.5.7 -> 3.4.9 -> 3.5.7 flip-flop because:

1/ 3.5.7 was flagged stable.

2/ As a precaution following the problem discussed in this thread, 3.5.7 was reflagged ~arch => 3.4.9 became the latest stable.

3/ The problem discussed in this thread is now believed to be marginal => 3.5.7 is back to stable.

The latest stable kernel on x86_64 Gentoo today is 3.5.7.

You do not have to worry about the mount options that probably triggered this problem as long as you use the default mount options.

Safe mount options are default mount options, that is why... they are default...   :Twisted Evil: 

For example, this is the entry I have for one ext4 filesystem on my system.

```
LABEL=M_1_G64_VAR       /var                            ext4    defaults,noatime,nodiratime             0 2
```

The user having the problem was *not* using default mount options.

(BTW, there is no problem with noatime, nodiratime either, even if they are not default)
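If you want to double-check your own `/etc/fstab`, here is a rough sketch in Python. Big assumption: the set of "watched" options is simply lifted from the reporter's list quoted earlier in this thread. Nobody here has confirmed which of them, if any, actually triggers the bug, and the names `WATCHED_OPTIONS` and `watched_options_in_fstab` are made up for this example.

```python
# Assumption: options taken from the reporter's list quoted earlier in this
# thread. It is NOT a confirmed list of options that trigger the bug.
WATCHED_OPTIONS = {"journal_checksum", "journal_async_commit", "nobarrier"}


def watched_options_in_fstab(fstab_text, watched=WATCHED_OPTIONS):
    """Return {mountpoint: [watched options]} for ext4 entries in fstab text."""
    hits = {}
    for line in fstab_text.splitlines():
        # Drop anything after '#', which also skips comment-only lines.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 4 or fields[2] != "ext4":
            continue
        found = [opt for opt in fields[3].split(",") if opt in watched]
        if found:
            hits[fields[1]] = found
    return hits


if __name__ == "__main__":
    # The two fstab lines posted in this thread.
    thread_lines = (
        "/dev/sda2 / ext4 noatime 0 1\n"
        "LABEL=M_1_G64_VAR /var ext4 defaults,noatime,nodiratime 0 2\n"
    )
    print(watched_options_in_fstab(thread_lines))
```

Neither of the two fstab lines posted in this thread contains any of the watched options, so the result for them is an empty dictionary.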

----------

## Tony0945

Thanks for the prompt response.

----------

