# [SOLVED (workaround)] 2.6.28+ corrupts my ext3 partitions

## javaJake

This is the first time I've had a "serious" kernel issue. I upgraded to 2.6.28 from 2.6.27 first by hand, then by oldconfig, both of which produced nearly identical results. The only difference was oldconfig translated changes over I'd forgotten to make myself.

Now, however, whenever I try to boot 2.6.28, fsck screams that the real superblock count of a partition doesn't match what the partition claims, or something like that. The two numbers it shows are off by thousands. When I reboot back into 2.6.27, fsck detects file system errors, fixes them, and the boot continues as normal. Note that it only detects these errors after attempting to boot with 2.6.28.

If I boot with init=/bin/bb, I can see that it hasn't confused partitions. For example, / is still /  :Wink: 

So, all this put together makes me think that some code in the ext3 module for 2.6.28 is doing something horribly wrong.

Any ideas?

Last edited by javaJake on Sun May 24, 2009 12:43 pm; edited 1 time in total

----------

## MaximeG

Hi,

I used ext3 with 2.6.28 for a long time and never had such issues.

However, I can't rule out that fancier options are broken in 2.6.28, as I only used the default mkfs and mount options for ext3.

Regards,

Maxime

----------

## javaJake

I use whatever defaults were recommended by the Handbook. I never step beyond what's officially sanctioned and marked as stable for stability and compatibility reasons, obviously. Well, I try not to.  :Razz: 

Edit: Here's some information for my system:

```
# /etc/fstab

/dev/sda2      /boot      ext3      noauto,noatime   1 2

/dev/sda4      /      ext3      noatime      0 1

/dev/sda1      none      swap      sw      0 0

# sudo fdisk -l

Disk /dev/sda: 80.0 GB, 80000000000 bytes

255 heads, 63 sectors/track, 9726 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Disk identifier: 0xe4651a0a

   Device Boot      Start         End      Blocks   Id  System

/dev/sda1               1          62      497983+  82  Linux swap / Solaris

/dev/sda2   *          63          75      104422+  83  Linux

/dev/sda3              76        1321    10008495   83  Linux

/dev/sda4            1322        9729    67537260   83  Linux

```

Kernel .config: http://pastebin.com/d9e17e7e

----------

## MaximeG

That's strange then.

Can you check that ext3 support is configured correctly in your kernel, and whether it's built into the kernel or built as a module?

Also, check that you're using the latest e2fsprogs package.

What does fsck say when you run it on the ext3 partitions?

I can't see any other reason why ext3 would not work properly.

Regards,

Maxime

----------

## i92guboj

Can you run fsck on the offending partition from a livecd?

The fact that it happened now maybe has nothing to do with the update. FS's fail sometimes, even the most solid ones. Hardware can also fail from time to time leaving an fs inconsistent, no matter how good it is.

----------

## javaJake

I can certainly run it from the LiveCD, but fsck on boot already did a scan, after which it has not given me any problems. Again, this only occurs after every attempt to boot into 2.6.28.

----------

## i92guboj

 *javaJake wrote:*   

> I can certainly run it from the LiveCD, but fsck on boot already did a scan, after which it has not given me any problems. Again, this only occurs after every attempt to boot into 2.6.28.

 

Oh, I understand. Unfortunately I have no idea how to start debugging this. It is my understanding that you are using either a vanilla kernel or gentoo-sources, with no fancy patches or anything, right?

----------

## javaJake

Correct. I use straight gentoo-sources stuff. I did have a unionfs patch, but when I got these troubles, I reinstalled 2.6.28, and reconfigured the kernel from scratch without the patch.

Edit: And, since that didn't work, and I use unionfs on a daily basis (read-only, btw) I put my unionfs patch back in.

----------

## Clad in Sky

I had nearly the same problem, all sorts of weird stuff about bad superblocks and suchlike.

It began with my first handmade 2.6.28(-gentoo-r5) panicking on boot.

I got a working one later on, which gave me some errors (dmesg | grep ata → softreset failed). Most prominent was an error claiming that /dev/sdaX was a file system with errors. But there were more severe issues as well.

This is when I for the first time resorted to asking someone (pappy_mcfae) about my kernel config.

I dunno if HE fixed it or if it was the new HDD I bought or both, but it seems to work now.

Since none of my self-made kernels since 2.6.22 (my first one) gave me any trouble before, there seems to be something wrong in 2.6.28 itself.

You might want to try a kernel seed by pappy (google kernel seeds).

Good luck.

----------

## MaximeG

Hi,

I don't believe it's 2.6.28 in itself that's causing the issue, but rather some interaction between your hardware and 2.6.28.

I use 2.6.28 on 2 different machines and it works flawlessly.

Perhaps it's not directly linked to ext3 either, but rather to the way the 2.6.28 drivers for your controllers (MB, HDD ...) deal with your hardware.

I'd recommend asking pappy for help as well, as with anything kernel related. You could also search based on your MB/controller names instead of ext3.

If that doesn't help, perhaps you can try 2.6.29? Or even keep using the good ole 2.6.27, since I don't think you really need .28 (as you can still boot with .27).

Regards,

Maxime

----------

## javaJake

lspci output:

```
00:00.0 Host bridge: Intel Corporation 82845G/GL[Brookdale-G]/GE/PE DRAM Controller/Host-Hub Interface (rev 01)

00:02.0 VGA compatible controller: Intel Corporation 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device (rev 01)

00:1d.0 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #1 (rev 01)

00:1d.1 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #2 (rev 01)

00:1d.2 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #3 (rev 01)

00:1d.7 USB Controller: Intel Corporation 82801DB/DBM (ICH4/ICH4-M) USB2 EHCI Controller (rev 01)

00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 81)

00:1f.0 ISA bridge: Intel Corporation 82801DB/DBL (ICH4/ICH4-L) LPC Interface Bridge (rev 01)

00:1f.1 IDE interface: Intel Corporation 82801DB (ICH4) IDE Controller (rev 01)

00:1f.3 SMBus: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller (rev 01)

00:1f.5 Multimedia audio controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Audio Controller (rev 01)

01:07.0 Multimedia video controller: Conexant CX23880/1/2/3 PCI Video and Audio Decoder (rev 05)

01:08.0 Network controller: Broadcom Corporation BCM4306 802.11b/g Wireless LAN Controller (rev 03)
```

I've tried Googling the IDE and PCI controllers (in succession) with 2.6.28 as an additional keyword for each, but only a bug about XSCALE came up, and that only affects users very early on in the boot process.

Seeing that XSCALE bug report gives me courage to file a bug report against the kernel 2.6.28, and see what comes of it. I'll keep you guys posted.

Edit: Actually, going to use a vanilla 2.6.28 kernel source first downloaded directly from kernel.org and see how that goes.

----------

## chris.c.hogan

I have this same problem. I upgraded from 2.6.23 to 2.6.29 because the xorg-x11 upgrade said I needed a newer kernel for udev. While I was playing around with the configuration, I noticed I hadn't moved over to the libata drivers yet. So I went ahead and disabled the old hard drive controller drivers and enabled the new ones. I also updated all of my settings from /dev/hda to /dev/sda. When I booted this kernel, I was met with all sorts of errors about the file system being too big for the partition. I couldn't get the system to boot. I changed all of my settings back and rebooted to 2.6.23. fsck corrected some corruption, and the system booted normally. The problem only appeared when I booted with 2.6.29.

There is one difference. I'm using ReiserFS. So it is not the file system. lspci shows:

```
00:1f.1 IDE interface: Intel Corporation 82801DB (ICH4) IDE Controller (rev 02)
```

I went back into the kernel config and disabled the libata drivers. I reverted back to the old /dev/hda drivers. This fixed the problem. There must be a bug in libata for this chip set. By the sound of your message, I'd say it was introduced in 2.6.28.
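
For anyone wanting to check which of the two driver stacks their own kernel config enables for this controller, here is a rough sketch. The symbol names are assumptions based on 2.6.2x-era kernels (`CONFIG_ATA_PIIX` for the libata driver, `CONFIG_BLK_DEV_PIIX` for the old IDE driver), and `ich_driver_stack` is just a made-up helper name, not something from the thread.

```shell
#!/bin/sh
# Prints which driver stack a kernel .config (read from stdin) enables
# for Intel PIIX/ICH controllers.
# Assumed symbol names (2.6.2x-era kernels, not confirmed in the thread):
#   CONFIG_ATA_PIIX      libata driver for Intel PIIX/ICH
#   CONFIG_BLK_DEV_PIIX  legacy IDE driver for the same chips
ich_driver_stack() {
    awk '
        /^CONFIG_ATA_PIIX=[ym]/     { libata = 1 }
        /^CONFIG_BLK_DEV_PIIX=[ym]/ { ide = 1 }
        END {
            if (libata && ide) print "both"
            else if (libata)   print "libata"
            else if (ide)      print "ide"
            else               print "neither"
        }'
}
```

Typical use would be `ich_driver_stack < /usr/src/linux/.config`. If it prints "both", which driver actually claims the controller depends on build and load order, so it is probably cleaner to enable only one of them.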

Please let us know if you find anything else out.

Thanks,

Chris

----------

## javaJake

2.6.29.4 did not work either, so I've made bug reports:

Gentoo: https://bugs.gentoo.org/show_bug.cgi?id=270883

Upstream: http://bugzilla.kernel.org/show_bug.cgi?id=13365

I'm not sure if double-posting like this is a bad thing, so I leaned towards the safe side and did it.  :Wink: 

I'll try the libata workaround mentioned above.

----------

## Clad in Sky

If you're not sure what kernel driver your HW needs, copy the output of lspci -n and post it here http://kmuto.jp/debian/hcl/

----------

## javaJake

Unfortunately, that won't help, because the modules I'm using should already work (and do in 2.6.27).

----------

## javaJake

Good news! Enabling the older ATA drivers and disabling the Serial ATA drivers allows 2.6.29.4 to boot. I'm about to try 2.6.28, but I'm pretty confident that'll work. The Gentoo team also didn't mind my double-posting, so it appears I did a good thing.

See the kernel bug for all the action. I'll update this thread if a true solution is found that'll allow me to go back to Serial ATA drivers again.

----------

## tld

Reading this had me concerned about upgrading.  I have three machines with various 82801xx IDE controllers, though not the same as yours.

Am I reading the bug correctly, that this is only an issue if your disk has an HPA area?  If so I know I don't have any.  However I believe that a lot of Dells used to ship with such things for recovery etc, so if you didn't actually blow away all the existing partitions before installing you could have that sort of stuff at the end of the disk.

Tom

----------

## javaJake

As far as I know, you are correct. Somehow I was supposed to know not to format my disk so harshly as to remove the HPA part. The old way of doing things was to allow this to happen.

However, the old implementation was technically wrong, so it's been changed now so that the kernel will simply refuse to read/write the HPA part, and thus the issues I have above. If you want to use Serial ATA drivers from here on, you can allow reading and writing from/to the HPA part with the following kernel parameter: 

```
libata.ignore_hpa=1
```
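
After rebooting, it can be worth confirming that the parameter really reached the kernel. A minimal sketch, assuming the boot loader passes it on the kernel command line (`has_ignore_hpa` is a hypothetical helper name):

```shell
#!/bin/sh
# Succeeds if the kernel command line text on stdin contains
# the exact token libata.ignore_hpa=1.
has_ignore_hpa() {
    tr ' ' '\n' | grep -qx 'libata\.ignore_hpa=1'
}
```

Use it as `has_ignore_hpa < /proc/cmdline && echo "parameter was passed"`. As far as I know, the command-line form is meant for kernels with libata built in; if libata is a module, the option would have to be set as a module option instead.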

You can also check your dmesg logs for "HPA" to see if you're affected, since both ide and libata drivers detect and log it.
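
A small sketch of that dmesg check. The two search patterns are an assumption, since the exact message wording differs between the ide and libata drivers and between kernel versions; `log_mentions_hpa` is a made-up helper name.

```shell
#!/bin/sh
# Succeeds if the kernel log text on stdin mentions a Host Protected Area.
# Both patterns are guesses at the wording; matching is case-insensitive.
log_mentions_hpa() {
    grep -qi -e 'hpa' -e 'host protected area'
}
```

Run it as `dmesg | log_mentions_hpa && echo "HPA reported"`. To read the actual messages rather than just get a yes/no, plain `dmesg | grep -i hpa` does the job.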

----------

## chris.c.hogan

The Host Protected Area is created using the SET MAX ADDRESS ATA command to tell the hard drive to lie to the operating system about its size. The operating system should believe the results of the IDENTIFY DEVICE command that tells it how big the hard drive is. The IDE drivers used READ NATIVE MAX ADDRESS to bypass this lie. Libata correctly believes IDENTIFY DEVICE.
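
The fdisk output javaJake posted earlier in the thread looks consistent with this: the disk is reported as 80000000000 bytes, yet /dev/sda4 ends at cylinder 9729. A quick arithmetic sketch using only the numbers from that output (the variable and function names are mine):

```shell
#!/bin/sh
# Numbers copied from the fdisk -l output earlier in the thread.
disk_bytes=80000000000   # "Disk /dev/sda: 80.0 GB, 80000000000 bytes"
cyl_bytes=8225280        # "Units = cylinders of 16065 * 512 = 8225280 bytes"
sda4_end_cyl=9729        # End column of /dev/sda4

# Whole cylinders that fit in the size the drive reports (integer division).
# Needs a shell with 64-bit arithmetic, which is an assumption here.
disk_cyls=$((disk_bytes / cyl_bytes))

# Prints how many cylinders the last partition runs past the reported end.
overshoot_cyls() {
    echo $((sda4_end_cyl - disk_cyls))
}

echo "reported cylinders: $disk_cyls"
echo "sda4 ends at:       $sda4_end_cyl"
echo "overshoot:          $(overshoot_cyls) cylinders"
```

The last partition runs a few cylinders past the end of the disk as reported, which is what you would expect if the partition table was written against the native size while the drive now reports the clipped size. That would also fit the "file system too big for the partition" style of errors described above.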

The HPA can be removed using hdparm -N. After that, the IDENTIFY DEVICE command returns whatever you set it to and fdisk can set partitions using the maximum size of the disk. man hdparm details its use. Given the warnings in the man page, I'd not try to make changes unless you can restore the data on the drive from backup. I'm now using the libata.ignore_hpa=1 parameter to work around the issue.
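
Before changing anything, it is safer to only read the current setting. The sketch below works out how many sectors are hidden from the output of `hdparm -N`. It assumes the output contains a line of the form `max sectors = <visible>/<native>, ...`, which is what the hdparm versions I have seen print; treat that format as an assumption. The helper names are made up.

```shell
#!/bin/sh
# Reads `hdparm -N` output on stdin and prints "<visible> <native>".
# Assumes a "max sectors = <visible>/<native>" line (format not guaranteed).
hpa_sizes() {
    sed -n 's/^[[:space:]]*max sectors[[:space:]]*=[[:space:]]*\([0-9][0-9]*\)\/\([0-9][0-9]*\).*/\1 \2/p'
}

# Prints the number of sectors hidden by the HPA (0 if there is none).
# Returns 1 if no "max sectors" line was found in the input.
hpa_hidden_sectors() {
    sizes=$(hpa_sizes)
    [ -n "$sizes" ] || return 1
    visible=${sizes% *}   # text before the space
    native=${sizes#* }    # text after the space
    echo $((native - visible))
}
```

As root, `hdparm -N /dev/sda | hpa_hidden_sectors` would then print the hidden sector count for that drive. This only reads the setting; actually changing the limit is the risky part the man page warns about.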

Personally, I always thought this was a useless feature to hide restore partitions from clueless users. So I had no problem with the IDE drivers ignoring it, though I did think it was being disabled. You might want to keep it if you dual boot. I guess Dell has some preOS media players loaded on some laptops. It also looks like some RAID controllers might make use of it. I suppose you could keep a security key there for an encrypted drive, though it would be easy to find.

Anyway, use hdparm to remove it when setting up a new install or the kernel parameter if you are concerned about possible data loss when upgrading. Don't partition your drives using the old IDE drivers if you want to keep the HPA.

----------

