# Udev / kernel 2.6.9-gentoo-r4 problem

## jgluckca

Hi

Sorry to repost this but I think I may have posted it to the wrong forum originally... (Usually someone answers my posts within a couple of hours. This time no answers at all)

Apologies if I'm breaking etiquette, but I think this is an important problem.

I have an interesting problem with udev and/or the kernel.

First some info:

2.6.9-gentoo-r4 kernel

udev 046

Hardware:

Pentium 4 with 1 GB RAM on an Asus P4PE motherboard. 2 x 120 GB IDE hard drives on hda and hdb. Some other stuff, but I don't think it's really relevant.

I have several partitions, one of which is a spare. I use the spare mainly for experimenting. It contains all the system software and can also be mounted as a root partition.

My normal setup has mounts as follows:

/dev/hdb3 on / type reiserfs (rw)

none on /proc type proc (rw)

none on /sys type sysfs (rw)

none on /dev type ramfs (rw)

none on /dev/pts type devpts (rw)

/dev/hda2 on /boot type ext2 (rw)

/dev/hdb1 on /home type reiserfs (rw)

/dev/hda1 on /win type ntfs (rw,noexec,nosuid,nodev)

none on /dev/shm type tmpfs (rw)

none on /proc/bus/usb type usbfs (rw)

If I want to run the alternate system, everything remains the same except the root partition, which is:

/dev/hda3

Now on the "normal" root (/dev/hdb3) the /dev directory was created when I installed the system originally from the live CD. In other words the /dev directory contains a bunch of device nodes. Normally, in the boot process, a ramfs gets mounted on /dev so the original device nodes are hidden.

On my "alternate" system, the /dev directory is empty; after all, udev is supposed to create the necessary device nodes. When I boot this alternate system, it finds and mounts the root partition, then there's a message saying it can't open an initial virtual console. After that, the screen is dead for a while. Then I get messages about my home partition having errors. Finally the boot is done, but the home partition is trashed. I can recover it with reiserfsck, so it's not terribly serious.

It looks like udev is not started early enough in the boot process, so the device nodes for the virtual consoles and the hard disks are not created before the mounts are done, or at least not before my home partition is mounted.

I can't say for sure, but it looks like a timing issue or a race condition. This is probably a kernel problem, since udev responds to the kernel's detection of devices. It is also possible that the sequence of the init scripts has a problem.

Whatever the cause, creating the device nodes in the on-disk /dev directory works around the problem, but that's not how it should work.

This is the output from dmesg (sorry for the length). You can see at least part of the problem in it:
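As a rough sketch of how one might check whether an on-disk /dev has the nodes needed before udev takes over: the helper below lists the character devices that are missing from a given directory. The name `missing_nodes` is made up for illustration, and the choice of `console` and `null` as the nodes that matter is an assumption, not something confirmed for this setup.

```shell
# Hypothetical helper: list early-boot character devices missing from a dev directory.
# Assumption: console and null are the nodes needed before udev populates /dev.
missing_nodes() {
    devdir=$1
    for node in console null; do
        # -c is true only for an existing character device
        [ -c "$devdir/$node" ] || echo "$node"
    done
}
```

It would be pointed at the /dev directory of the partition itself (mounted somewhere else), not at the ramfs that hides it on a running system.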

Linux version 2.6.9-gentoo-r4 (root@tachyon) (gcc version 3.3.4 20040623 (Gentoo Linux 3.3.4-r1, ssp-3.3.2-2, pie-8.7.6)) #1 Fri Nov 19 21:37:29 AST 2004

BIOS-provided physical RAM map:

BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)

BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)

BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)

BIOS-e820: 0000000000100000 - 000000003ffec000 (usable)

BIOS-e820: 000000003ffec000 - 000000003ffef000 (ACPI data)

BIOS-e820: 000000003ffef000 - 000000003ffff000 (reserved)

BIOS-e820: 000000003ffff000 - 0000000040000000 (ACPI NVS)

BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)

BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)

BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)

127MB HIGHMEM available.

896MB LOWMEM available.

On node 0 totalpages: 262124

DMA zone: 4096 pages, LIFO batch:1

Normal zone: 225280 pages, LIFO batch:16

HighMem zone: 32748 pages, LIFO batch:7

DMI 2.3 present.

ACPI: RSDP (v000 ASUS ) @ 0x000f5340

ACPI: RSDT (v001 ASUS P4PE 0x42302e31 MSFT 0x31313031) @ 0x3ffec000

ACPI: FADT (v001 ASUS P4PE 0x42302e31 MSFT 0x31313031) @ 0x3ffec0c0

ACPI: BOOT (v001 ASUS P4PE 0x42302e31 MSFT 0x31313031) @ 0x3ffec030

ACPI: MADT (v001 ASUS P4PE 0x42302e31 MSFT 0x31313031) @ 0x3ffec058

ACPI: DSDT (v001 ASUS P4PE 0x00001000 MSFT 0x0100000b) @ 0x00000000

ACPI: PM-Timer IO Port: 0xe408

ACPI: Local APIC address 0xfee00000

ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)

Processor #0 15:2 APIC version 20

ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])

ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])

IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23

ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl edge)

ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 22 low level)

ACPI: IRQ0 used by override.

ACPI: IRQ2 used by override.

Enabling APIC mode: Flat. Using 1 I/O APICs

Using ACPI (MADT) for SMP configuration information

Built 1 zonelists

Kernel command line: BOOT_IMAGE=2.6.9-r4-new ro root=303 init 3

Initializing CPU#0

PID hash table entries: 4096 (order: 12, 65536 bytes)

Detected 2700.248 MHz processor.

Using pmtmr for high-res timesource

Console: colour VGA+ 80x25

Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)

Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)

Memory: 1035856k/1048496k available (1817k kernel code, 12068k reserved, 806k data, 148k init, 130992k highmem)

Checking if this processor honours the WP bit even in supervisor mode... Ok.

Calibrating delay loop... 5357.56 BogoMIPS (lpj=2678784)

Security Scaffold v1.0.0 initialized

Mount-cache hash table entries: 512 (order: 0, 4096 bytes)

CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000

CPU: After vendor identify, caps: bfebfbff 00000000 00000000 00000000

CPU: Trace cache: 12K uops, L1 D cache: 8K

CPU: L2 cache: 512K

CPU: After all inits, caps: bfebfbff 00000000 00000000 00000080

Intel machine check architecture supported.

Intel machine check reporting enabled on CPU#0.

CPU0: Intel P4/Xeon Extended MCE MSRs (12) available

CPU0: Thermal monitoring enabled

CPU: Intel(R) Pentium(R) 4 CPU 2.40GHz stepping 07

Enabling fast FPU save and restore... done.

Enabling unmasked SIMD FPU exception support... done.

Checking 'hlt' instruction... OK.

ENABLING IO-APIC IRQs

..TIMER: vector=0x31 pin1=2 pin2=-1

NET: Registered protocol family 16

PCI: PCI BIOS revision 2.10 entry at 0xf1e50, last bus=2

PCI: Using configuration type 1

mtrr: v2.0 (20020519)

ACPI: Subsystem revision 20040816

ACPI: Interpreter enabled

ACPI: Using IOAPIC for interrupt routing

ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 *5 6 7 9 10 11 12 14 15)

ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.

ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 *9 10 11 12 14 15)

ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 *9 10 11 12 14 15)

ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.

ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.

ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.

ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.

ACPI: PCI Root Bridge [PCI0] (00:00)

PCI: Probing PCI hardware (bus 00)

PCI: Enabled i801 SMBus device

PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.1

PCI: Transparent bridge - 0000:00:1e.0

ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]

ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCI1._PRT]

ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCI2._PRT]

Linux Plug and Play Support v0.97 (c) Adam Belay

PnPBIOS: Scanning system for PnP BIOS support...

PnPBIOS: Found PnP BIOS installation structure at 0xc00f9180

PnPBIOS: PnP BIOS version 1.0, entry 0xf0000:0x91b0, dseg 0xf0000

PnPBIOS: 15 nodes reported by PnP BIOS; 15 recorded by driver

PCI: Using ACPI for IRQ routing

ACPI: PCI interrupt 0000:00:1d.0[A] -> GSI 16 (level, low) -> IRQ 177

ACPI: PCI interrupt 0000:00:1d.1[B] -> GSI 19 (level, low) -> IRQ 185

ACPI: PCI interrupt 0000:00:1d.2[C] -> GSI 18 (level, low) -> IRQ 193

ACPI: PCI interrupt 0000:00:1d.7[D] -> GSI 23 (level, low) -> IRQ 201

ACPI: PCI interrupt 0000:00:1f.1[A] -> GSI 18 (level, low) -> IRQ 193

ACPI: PCI interrupt 0000:00:1f.3[B] -> GSI 17 (level, low) -> IRQ 209

ACPI: PCI interrupt 0000:00:1f.5[B] -> GSI 17 (level, low) -> IRQ 209

ACPI: PCI interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 177

ACPI: PCI interrupt 0000:02:05.0[A] -> GSI 20 (level, low) -> IRQ 217

ACPI: PCI interrupt 0000:02:0b.0[A] -> GSI 23 (level, low) -> IRQ 201

ACPI: PCI interrupt 0000:02:0b.2[B] -> GSI 20 (level, low) -> IRQ 217

pnp: the driver 'system' has been registered

pnp: match found with the PnP device '00:09' and the driver 'system'

pnp: match found with the PnP device '00:12' and the driver 'system'

pnp: 00:12: ioport range 0x290-0x297 has been reserved

pnp: 00:12: ioport range 0x3f0-0x3f1 has been reserved

pnp: 00:12: ioport range 0xe400-0xe47f could not be reserved

pnp: 00:12: ioport range 0xec00-0xec3f has been reserved

Simple Boot Flag at 0x3a set to 0x1

Machine check exception polling timer started.

audit: initializing netlink socket (disabled)

audit(1101855513.4294966773:0): initialized

highmem bounce pool size: 64 pages

Total HugeTLB memory allocated, 0

Initializing Cryptographic API

inotify init: minor=63

ACPI: Power Button (FF) [PWRF]

ACPI: Processor [CPU0] (supports C1)

isapnp: Scanning for PnP cards...

isapnp: No Plug & Play device found

serio: i8042 AUX port at 0x60,0x64 irq 12

serio: i8042 KBD port at 0x60,0x64 irq 1

Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing disabled

ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A

ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A

pnp: the driver 'serial' has been registered

pnp: match found with the PnP device '00:02' and the driver 'serial'

pnp: match found with the PnP device '00:03' and the driver 'serial'

mice: PS/2 mouse device common for all mice

input: AT Translated Set 2 keyboard on isa0060/serio0

input: ImExPS/2 Generic Explorer Mouse on isa0060/serio1

Using anticipatory io scheduler

Floppy drive(s): fd0 is 1.44M

FDC 0 is a post-1991 82077

Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2

ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx

ICH4: IDE controller at PCI slot 0000:00:1f.1

ACPI: PCI interrupt 0000:00:1f.1[A] -> GSI 18 (level, low) -> IRQ 193

ICH4: chipset revision 2

ICH4: not 100% native mode: will probe irqs later

ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:DMA

ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:DMA, hdd:DMA

Probing IDE interface ide0...

hda: WDC WD1200JB-00FUA0, ATA DISK drive

hdb: Maxtor 6Y120P0, ATA DISK drive

ide0 at 0x1f0-0x1f7,0x3f6 on irq 14

Probing IDE interface ide1...

hdc: HL-DT-ST GCE-8523B, ATAPI CD/DVD-ROM drive

hdd: HL-DT-ST DVDRAM GSA-4040B, ATAPI CD/DVD-ROM drive

ide1 at 0x170-0x177,0x376 on irq 15

Probing IDE interface ide2...

ide2: Wait for ready failed before probe !

Probing IDE interface ide3...

ide3: Wait for ready failed before probe !

Probing IDE interface ide4...

ide4: Wait for ready failed before probe !

Probing IDE interface ide5...

ide5: Wait for ready failed before probe !

hda: max request size: 1024KiB

hda: 234441648 sectors (120034 MB) w/8192KiB Cache, CHS=16383/255/63, UDMA(100)

hda: cache flushes supported

hda: hda1 hda2 hda3

hdb: max request size: 128KiB

hdb: 240121728 sectors (122942 MB) w/7936KiB Cache, CHS=65535/16/63, UDMA(100)

hdb: cache flushes supported

hdb: hdb1 hdb2 hdb3

NET: Registered protocol family 2

IP: routing cache hash table of 8192 buckets, 64Kbytes

TCP: Hash tables configured (established 262144 bind 65536)

PM: Reading pmdisk image.

PM: Resume from disk failed.

ACPI: (supports S0 S1 S4 S5)

ACPI wakeup devices:

PCI0 PCI1 PCI2 UAR1 USB0 USB1 USB2 US20 AC97

BIOS EDD facility v0.16 2004-Jun-25, 2 devices found

ReiserFS: hda3: found reiserfs format "3.6" with standard journal

ReiserFS: hda3: using ordered data mode

ReiserFS: hda3: journal params: device hda3, size 8192, journal first block 18,max trans len 1024, max batch 900, max commit age 30, max trans age 30

ReiserFS: hda3: checking transaction log (hda3)

ReiserFS: hda3: Using r5 hash to sort names

VFS: Mounted root (reiserfs filesystem) readonly.

Freeing unused kernel memory: 148k freed

Warning: unable to open an initial console.

NET: Registered protocol family 1

Adding 2899724k swap on /dev/hdb2. Priority:-1 extents:1

i2c_adapter i2c-0: registered as adapter #0

pnp: the driver 'parport_pc' has been registered

pnp: match found with the PnP device '00:01' and the driver 'parport_pc'

parport: PnPBIOS parport detected.

parport0: PC-style at 0x378 (0x778), irq 7, dma 3 [PCSPP,TRISTATE,COMPAT,ECP,DMA]

lp0: using parport0 (interrupt-driven).

hdc: ATAPI 52X CD-ROM CD-R/RW drive, 2048kB Cache, UDMA(33)

Uniform CD-ROM driver Revision: 3.20

hdd: ATAPI 32X DVD-ROM DVD-R-RAM CD-R/RW drive, 2048kB Cache, UDMA(33)

EXT2-fs warning (device hda2): ext2_fill_super: mounting ext3 filesystem as ext2

ReiserFS: hdb1: found reiserfs format "3.6" with standard journal

ReiserFS: hdb1: using ordered data mode

ReiserFS: hdb1: journal params: device hdb1, size 8192, journal first block 18,max trans len 1024, max batch 900, max commit age 30, max trans age 30

ReiserFS: hdb1: checking transaction log (hdb1)

ReiserFS: warning: is_tree_node: node level 25938 does not match to the expected one 2

ReiserFS: hdb1: warning: vs-5150: search_by_key: invalid format found in block 8211. Fsck?

ReiserFS: hdb1: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1 2 0x0 SD]

ReiserFS: hdb1: Using r5 hash to sort names

NTFS driver 2.1.20 [Flags: R/W DEBUG MODULE].

NTFS volume version 3.1.

usbcore: registered new driver usbfs

usbcore: registered new driver hub

Real Time Clock Driver v1.12
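The relevant lines are easy to miss in a log this long. A small filter along these lines would pull out just the console warning and the ReiserFS warnings; `boot_warnings` is a hypothetical name and the pattern is only tuned to the messages visible in this particular log.

```shell
# Hypothetical filter: read a kernel log on stdin and keep only the
# "initial console" warning and any ReiserFS warning lines.
boot_warnings() {
    grep -E 'unable to open an initial console|ReiserFS: .*warning'
}
```

Typical use would be `dmesg | boot_warnings`.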

----------

## dsd

http://www.gentoo.org/doc/en/udev-guide.xml#doc_chap3

----------

## jgluckca

OK, I went to:

http://www.gentoo.org/doc/en/udev-guide.xml#doc_chap3

And checked the baselayout. I'm running a baselayout that is later than the one suggested.

Then I ran this set of commands:

    # mkdir test
    # mount --bind / test
    # cd test/dev
    # ls

Now my system is screwed up!!! Luckily I also have my laptop so I can post this.

After running the commands I couldn't shut the system down. I tried to unmount the test thing, then I tried the three-finger salute. Nothing would get my system to shut down. Finally I rebooted.

Now when I reboot I get complaints about no initial console and no init, followed by a kernel panic.

Now how do I restore my system to working order? By the way, if you post something like this (I'm talking about the document), please also list the commands to undo what has been done, and add a warning that it can damage the system.
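For reference, a minimal sketch of what undoing the bind mount could look like, assuming the mount point is given as an absolute path. The function names `is_mounted` and `undo_bind` are hypothetical, and the optional second argument (an alternative mounts table) exists only so the check can be tried without touching real mounts.

```shell
# Hypothetical helper: succeed if the given mount point appears in a mounts
# table (default: /proc/mounts, where field 2 is the mount point).
is_mounted() {
    target=$1
    table=${2:-/proc/mounts}
    awk -v t="$target" '$2 == t { found = 1 } END { exit !found }' "$table"
}

# Hypothetical undo sequence for "mkdir test; mount --bind / test".
undo_bind() {
    mnt=$1
    table=${2:-/proc/mounts}
    # Leave the mount first: umount reports "busy" while a shell sits inside it.
    cd / || return 1
    if is_mounted "$mnt" "$table"; then
        umount "$mnt" || return 1
    fi
    # rmdir only removes an empty directory, so it cannot delete real files.
    rmdir "$mnt"
}
```

The order matters: step out of the directory, unmount, and only then remove the now-empty mount point.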

----------

## dsd

i dont really see what would have gone wrong if you followed the instructions (its a perfectly safe operation and many users have to do it). but you say you tried shutdown before unmounting? thats not what the instructions said...

what happened when you tried to unmount? what happened when you tried to shut down?

the message you get on bootup is the one you were suffering from anyway, right?

----------

## jgluckca

I tried a shutdown before unmounting. Nothing happened; it said it couldn't find shutdown.

Then I tried the unmount. It said it couldn't find unmount.

Then I tried ctrl-alt-del. Nothing happened.

After waiting a minute I tried all this again. Still zilch, nothing.

The only option seemed to be the reset button.

Then on reboot:

****  Kernel Panic -couldn't find init try appending init= to the bootline   ***

Might not be the exact wording but the idea is there. I have **never ever** had a kernel panic.

No matter what I do I can't get the system to boot that partition. The kernel always panics.

I am using baselayout 1.11.6-r1

John

----------

## dsd

strange .. most likely situation that i can think of is that you got the arguments in the wrong order and mounted something over your root partition

(and you do know that it is "umount" and not "unmount" right?)

what happens when you boot from a livecd and mount your partition? all you should need to do is create the missing nodes in dev from there..
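A sketch of what creating the missing nodes from a livecd might look like. The helper name `make_min_nodes` and the `DRY_RUN` switch are invented for illustration; the device numbers are the standard ones for /dev/console (5,1) and /dev/null (1,3), and mode 660 is an assumption chosen to match a typical listing. Actually running it needs root, so the default advice is to try it with `DRY_RUN=1` first.

```shell
# Hypothetical helper: create the two minimal nodes in an on-disk dev directory.
# Assumption: only console (char 5,1) and null (char 1,3) are needed, mode 660.
# Set DRY_RUN=1 to print the commands instead of running them.
make_min_nodes() {
    devdir=$1
    run=
    if [ "${DRY_RUN:-0}" = 1 ]; then
        run=echo
    fi
    # Skip nodes that already exist so nothing is overwritten.
    [ -e "$devdir/console" ] || $run mknod -m 660 "$devdir/console" c 5 1
    [ -e "$devdir/null" ] || $run mknod -m 660 "$devdir/null" c 1 3
}
```

With the partition mounted at, say, /mnt/gentoo (a made-up path), the call would be `make_min_nodes /mnt/gentoo/dev`.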

----------

## jgluckca

Hi

Yep I know umount unmounts but it's good you asked anyway.

I checked the nodes in the dev directory on the disk (not the ramfs mount on /dev) and they are:

crw-rw----  1 root root 5, 1 Dec  1 15:51 console

crw-rw----  1 root root 1, 3 Dec  1 15:14 null

Which looks ok to me...

I also checked /etc/fstab just to make sure it's OK, and it is.

Then I checked /etc/runlevels/boot, /etc/runlevels/default and /etc/init.d; all of that looks normal.

John

----------

## jgluckca

Hi

I figured out what happened to cause the kernel panic.

Several directories have been erased, gone, went poof!! But there was no filesystem corruption. This is really, really strange.

/etc, /bin, /mnt, /lib and /opt all mysteriously disappeared. No wonder the system won't start.
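For anyone wanting to check their own partition from a livecd, here is a rough sketch that reports which top-level directories are absent under a mounted root. `missing_dirs` is a hypothetical name and the directory list is just a plausible selection, not an authoritative one.

```shell
# Hypothetical helper: print the standard top-level directories that are
# absent under a mounted root filesystem.
missing_dirs() {
    root=$1
    for d in bin dev etc lib mnt opt sbin usr var; do
        [ -d "$root/$d" ] || echo "/$d"
    done
}
```

Run against the mount point of the damaged partition, it prints one line per missing directory and nothing if all of them are present.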

I wonder if there may be something wrong with Reiserfs that causes strange corruptions and deletions. I've used ext2 and ext3, both seem more robust. At least I've never had corruptions when the system failed to boot.

I did a reiserfsck on the partition and it showed no errors.

John

----------

## opm8

Just wanted to add my $.02 here. This mysterious disappearance of /etc and /bin was exactly what happened to me. I had my root partition as reiserfs, too. Once, after a reboot, it kernel panicked. With a livecd I found out some critical directories had magically disappeared. At first I thought I got cracked in a major way, but this happened across a reboot, with the laptop being shut down overnight.

I never did figure out what happened so I had to rebuild the thing from scratch.

 *jgluckca wrote:*   

> Hi
> 
> I figured out what happened to cause the kernel panic.
> 
> There are several directories erased, gone, went poof!! but there was no filesystem corruption. This is really really strange.
> ...

 

----------

## dsd

so what caused it? booting with no devfs and an empty /dev? or just bind mounting the partition?

----------

## kcsduke

Exactly the same thing happened to me as happened to jgluckca.  My system was totally screwed.  Having been a user and advocate of Gentoo for more than 2.5 years I hate to say this, but this is the end of the line for Gentoo on my machine.  These sorts of unexplained, random problems occur too often and I simply do not have time to deal with them anymore.

----------

## Blacky_of_Skye

Hello ! 

Just an idea, prompted by a problem I had some years ago. Could the disappearance of directories/files be a reiserfs problem?

Some years ago the reiser filesystem screwed up some of my development files; almost all were gone, thanks to reiserfs.

Since then I've been using ext2 and now ext3 with no problems at all, and have now tried jfs for fun.

Just a thought . . . 

Greetings !

Roland

----------

