# ReiserFS 3.6 & Software RAID 0 Problem [SOLVED]

## Alphanos

Note that if you are reading this message in the future because you think you have a problem similar to mine, or especially if you suspect that you actually do have a hardware problem, do NOT do ANYTHING to your hard drive(s) until you have done some reading on what your options are.  If you must use the same computer they are connected to, power down the machine, make sure the switch on the back of your power supply is set to "o" (not "-"), and disconnect the power and data cables from your hard drive(s).  After this is done, restore the power supply switch to "-" and boot your machine with a linux livecd like Knoppix in order to do some research on the internet.  Do not download the Knoppix image to a drive that may be bad, however, as this could make the problem worse.  If you think you need to do what is described in the previous few sentences, do it NOW, and come back to read the rest of this when you aren't risking your drive(s).

This thread will hopefully serve two purposes: helping to fix my problem (I hope  :Smile:  !), and serving as a reference for people with similar problems in the future.

EDIT: This problem is now considered solved.  This line is added so that people without problems of their own don't waste time reading through an already-solved problem.

Okay, this seems (to me at least) to be a pretty complicated problem.  I am putting a lot of information in this very long post, so if you don't want to read detailed output listings of various commands, scroll to the bottom of this post where I'll summarize my current progress.  Even if it does turn out that I'm merely missing a simple solution, I think the amount of stuff in here will be useful to people in the future.

Although this account will show me as a n00b, I've been using Gentoo now for two years.  I'm no expert by any means, since I still usually end up needing to check the forums once or twice a month to solve various emerge errors, but one thing I've learned quite well is how to search for and find answers to my problems, usually here on the Gentoo forums, and sometimes with Google.  This is the first time I've encountered a serious problem that I've been unable to find an answer for and needed to post a question of my own.

Two nights ago, for no apparent reason, my computer, which has been running Gentoo for two years, crashed.  I noticed that beep media player stopped playing the song it was on, so I moved to that virtual desktop (in KDE) and tried to move it along.  No go, so I tried to "killall beep-media-player", followed by "killall -9 beep-media-player".  At this point X.org froze, so I tried to switch back to virtual terminal 1 with Ctrl+Alt+F1.  This caused video corruption on the screen, and I was unable to reach any usable terminal, so after trying Ctrl+Alt+Del a couple of times, in case it was responsive enough to respond by running reboot, I hit the power/reset button on my case.  Although rare, lockups like this have happened to me once or twice before, so I wasn't overly worried.

I was mistaken.  On reboot I encountered a kernel panic, as my main partition was unable to mount: ReiserFS did not recognize the partition.  I run two drives in software RAID 0, and in case one or both of them was seriously borked (as in some of the horror stories I've read), the first thing I did was write down everything on the screen exactly as it appeared, in the hope that it would help debug/solve the problem, and in the fear that I wouldn't be able to get error messages with the same level of detail in the future if a serious hardware problem existed.  I'm running with framebuffer and bootsplash, so I was unable to Shift+PageUp to check for more information (although since I've never had a kernel panic before, I don't even know if you can scroll back after a kernel panic).  One of the few things I can still think of to do, and will try after posting this message, is to reboot into an older kernel without bootsplash and framebuffer and see if I can get any more details from the scrollback (I keep copies of the previous 4 or 5 major kernel versions I've used on /boot in case of accidentally compiling a dud kernel).  After writing everything down, I rebooted in case it was some weird fluke from a power glitch or something.  No such luck.  I also tried rebooting into an older kernel in case my kernel or filesystem drivers had somehow become corrupted, but I got the same (or very similar) error messages.  Here is the kernel boot output I wrote down:

```

devfs_mk_dev: could not append to parent for md/0  <<<=========================================== NOTE HERE

md: Autodetecting RAID arrays.

md: autorun ...

md: considering sdb3 ...

md:    adding sdb3 ...

md: sdb1 has different UUID to sdb3

md:    adding sda3 ...

md: sda1 has different UUID to sdb3

md: created md0

md: bind <sda3>

md: bind <sdb3>

md: running: <sdb3><sda3>

md0: setting max_sectors to 64, segment boundary to 16383

raid0: looking at sdb3

raid0:    comparing sdb3(154240000) with sdb3(154240000)

raid0:    END

raid0:    ==> UNIQUE

raid0: 1 zones

raid0: looking at sda3

raid0:    comparing sda3(154240000) with sdb3(154240000)

raid0:    EQUAL

raid0: FINAL 1 zones

raid0: done.

raid0: md_size is 308480000 blocks

raid0: conf->hash_spacing is 308480000 blocks

raid0: nb_zone is 1.

raid0: Allocating 4 bytes for hash.

md: considering sdb1 ...

md:    adding sdb1 ...

md:    adding sda1 ...

devfs_mk_dev: could not append to parent for md/1 <<<=========================================== NOTE HERE

md: created md1

md: bind <sda1>

md: bind <sdb1>

md: running: <sdb1><sda1>

raid1: raid set md1 active with 2 out of 2 mirrors

md: ...autorun DONE

ReiserFS: md0: found reiserfs format "3.6" with standard journal

ReiserFS: md0: using ordered data mode

ReiserFS: md0: journal params: device md0, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30

ReiserFS: md0: checking transaction log (md0)

attempt to access beyond end of device

md0: rw=0, want=3963883712, limit=616960000

ReiserFS: md0: replayed 6 transactions in 0 seconds

attempt to access beyond end of device

md0: rw=0, want=1594617672, limit=616960000

ReiserFS: md0: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occured trying to find statdata of [1 2 0x0 SD]

UDF-fs: No partition found (1)

XFS: bad magic number

XFS: SB validate failed

Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(9,0)

```

The arrow markers are added; I will refer back to these two marked lines later (EDIT: I found out bold tags don't work inside code tags, so I changed the markers to big arrows  :Smile:  ).  My two drives are Seagate Barracuda 7200.7 SATA 160GB drives.  Each has 3 partitions: the first a 50MB ext2 partition for the RAID 1 /boot, the second a 2GB swap partition (for a total of 4GB of swap), and the third filling the remainder of the drive for the main RAID 0 reiserfs root filesystem (which is currently non-functional).  Here is detailed partition information for the disks:

As reported by Seagate SeaTools:

```

Partition   Cylinder Start   Cylinder End   Size(MB)
a           0                5              49
b           6                254            2048
c           255              19456          157951

Disk Current Capacity: 16514064 sectors

Disk Total Capacity: 312581808

```

As reported by Knoppix's qtparted 0.4.4:

```

Drive Capacity: 152628 MB

Sectors: 312581808

                   a             b             c
Partition Number   01            02            03
Device             /dev/sda1     /dev/sda2     /dev/sda3
Type               ext2          swap          unknown
Size (MB)          47.03         1953.21       150625.06
Start (MB)         0.0307617     47.0654000    2000.2800000
End (MB)           47.0649000    2000.2800000  152625

```

Sorry for the format change here, I'm trying to avoid creating a horizontal scroll bar on this page.

And lastly, as reported by cfdisk:

```

Name   Flags   Part Type   FS Type                      Size (MB)
------------------------------------------------------------------
sda1   Boot    Primary     Linux raid autodetect (FD)       49.36
sda2           Primary     Linux swap / Solaris  (82)     2048.10
sda3           Primary     Linux raid autodetect (FD)   157941.83

```

I know that these three tools appear to report some minor differences from each other, but all three tools agree that the partition tables are identical on both of the two drives, and I haven't made any changes to any partition tables since two years ago when this raid was set up just prior to my gentoo install.
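As a further sanity check, the partition sizes above agree with the numbers in the original kernel log (a quick back-of-the-envelope sketch in Python; the 1-KiB-block and 512-byte-sector units are my assumption about how md reports sizes):

```python
# Each RAID 0 member (sda3/sdb3) is 154240000 blocks per the raid0 log lines.
member_blocks = 154_240_000          # 1-KiB blocks (assumed unit)
md_blocks = 2 * member_blocks        # RAID 0 concatenates the two stripes
md_sectors = md_blocks * 2           # 1-KiB blocks -> 512-byte sectors

print(md_blocks)                     # 308480000, matches "md_size is 308480000 blocks"
print(md_sectors)                    # 616960000, matches "limit=616960000" in the panic

# Both failed reads asked for sectors far beyond the end of the device:
print(3_963_883_712 > md_sectors)    # True
print(1_594_617_672 > md_sectors)    # True
```

If those units are right, the array was assembled at its full size during the original panic, and the failing reads are simply wild addresses rather than an off-by-half device.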

Next, here's my /etc/raidtab:

```

# /boot (RAID 1)

raiddev                 /dev/md1

raid-level              1

nr-raid-disks           2

chunk-size              32

persistent-superblock   1

device                  /dev/hde1

raid-disk               0

device                  /dev/hdg1

raid-disk               1

# / (RAID 0)

raiddev                 /dev/md0

raid-level              0

nr-raid-disks           2

chunk-size              32

persistent-superblock   1

device                  /dev/hde3

raid-disk               0

device                  /dev/hdg3

raid-disk               1

```

Now there are a few potential sources of confusion here.  I cannot access my actual /etc/raidtab on my main partition, so this is the backed up version of what I used to set up my machine two years ago.  This is one of the extremely few vital system files I copied onto my raid 1 /boot partition, since if the raid ever failed I would still be able to access the copy of this file on either of the two drives individually in a raid 1 setup, and not have to recall my raid settings from memory!  My machine's first two ide channels are standard parallel ata ide channels, and use devices /dev/hda through /dev/hdd.  These are used for optical drives or left blank.  My hard drives are connected to the motherboard's serial ata ports.  Originally, linux treated these drives as being masters on ide channels 3 and 4, with no slaves, thus /dev/hde and /dev/hdg.  At some point since then, linux handling of sata devices changed, and the drives became /dev/sda and /dev/sdb, now being treated as scsi devices.  This happened a long time ago, and probably isn't the cause of my problem.  I'm fairly sure that I never actually changed my raidtab file to reference the new scsi devices, because I remember being surprised when I found out my drives weren't hde and hdg anymore, and wondering why my software raid had still worked upon reboot when my raidtab referred to old devices.  However, since I cannot access my /etc directory, I cannot verify this, so I might have changed the file to reference sda and sdb, but forgot.
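For reference, if it turns out the raidtab does need to track the new device names, the sda/sdb version would presumably just substitute the devices and leave everything else alone (this is my guess at the edit, not a verified config):

```
# / (RAID 0) -- hypothetical sda/sdb version of the same entry
raiddev                 /dev/md0
raid-level              0
nr-raid-disks           2
chunk-size              32
persistent-superblock   1
device                  /dev/sda3
raid-disk               0
device                  /dev/sdb3
raid-disk               1
```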

I'm sure that someone will ask for my /etc/fstab, but I know for a fact that the copy on my /boot raid 1 which I still have access to is very outdated and wrong, and I can't read the real one right now.  Suffice it to say that while it is very vaguely possible that an fstab error is somehow involved, the chance is very slim since I've used the same fstab for months without trouble.

The file listings above have interrupted the chronology, but after the first couple of attempts to reboot, I tried booting up a Knoppix CD I had with the noswap option and, after "modprobe md" and editing the above raidtab to use sda and sdb (required for knoppix at least), I tried to start md0 (my main raid 0 device with the reiserfs filesystem) and mount it, only to get the half-expected but disappointing "mount: wrong fs type, bad option, bad superblock on /dev/sda3, missing codepage or other error."  I tried reiserfsck, and got:

```

root@1[dev]# reiserfsck /dev/md0

reiserfsck 3.6.19 (2003 www.namesys.com)

*************************************************************

** If you are using the latest reiserfsprogs and  it fails **

** please  email bug reports to reiserfs-list@namesys.com, **

** providing  as  much  information  as  possible --  your **

** hardware,  kernel,  patches,  settings,  all reiserfsck **

** messages  (including version),  the reiserfsck logfile, **

** check  the  syslog file  for  any  related information. **

** If you would like advice on using this program, support **

** is available  for $25 at  www.namesys.com/support.html. **

*************************************************************

Will read-only check consistency of the filesystem on /dev/md0

Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes

bread: Cannot read the block (2): (Invalid argument).

reiserfs_open: bread failed reading block 2

bread: Cannot read the block (16): (Invalid argument).

reiserfs_open: bread failed reading block 16

reiserfs_open: the reiserfs superblock cannot be found on /dev/md0.

Failed to open the filesystem.

If the partition table has not been changed, and the partition is

valid  and  it really  contains  a reiserfs  partition,  then the

superblock  is corrupted and you need to run this utility with

--rebuild-sb.

```

 :Shocked: !  That sounds bad!  Oddly enough, even after using gentoo linux for two years, prior to this I'd only ever used fsck once on my ext2 /boot partition when the init scripts told me to, and even then it fixed everything automatically.  Well, I knew enough about damaged disks and data recovery to know that running something like --rebuild-sb would be very dangerous until I knew more about the problem and the potential consequences of running the command.  I had made sure to start knoppix with the noswap option in case writing to my swap partitions damaged something further (like if my partition tables were corrupted; this was before I was able to get the three partition table program outputs listed above).

After reading some man pages and some web pages, I learned that even having the hard drives turned on had the potential to damage them further if there were hardware problems.  So I shut down the system, turned off the power, and unplugged the data and power cables from the drives before restarting knoppix to do some more research.

The most useful sites I found are listed later for people trying to fix their own problems.  Note that if you're following along trying to troubleshoot your own problem, make sure that, to be safe, each time you disconnect or reconnect a drive you fully switch off power to your system, using the power switch on the back of your power supply after the power button on the front of your case.  Modern power supplies and motherboards still transmit some power even when turned off, which is how things like wake-on-lan work.  I know from personal experience that plugging/unplugging the main atx power connector on a motherboard while the power supply is on can fry the board  :Embarassed:  .  Such is the result of forgetting important things you already knew while frantically trying to make a broken system work again  :Wink:  .  In case you don't know, "-" means on and "o" means off on the power supply's power switch.

Next I tried to get an extremely rough estimate of the extent of the (possible) damage by connecting one hard drive at a time and attempting to boot (i.e. first connecting drive 1, then disconnecting it and connecting drive 2).  I knew this was potentially a bad idea, since it could damage a hard drive further if there was physical damage, but I wanted to find out what ballpark I was playing in.  Both drives were able to load grub and the kernel individually, and got to roughly the same point in the boot process as having both drives connected before bombing out.  So I was able to rule out some kind of catastrophic disk failure that rendered an entire drive inert.

At this point I once again disconnected both drives and loaded knoppix's memtest86 boot module, since it occurred to me that there was always a possibility that it was a RAM/CPU problem.  A full pass of memtest86 completed successfully, and I'd run longer memtest runs in the past, so I was satisfied that no major reliability change had occurred due to hardware age.  When I first set up my system I ran 10 simultaneous mprime torture tests for a while, so my machine is pretty stable.

The major remaining test, which I was both anticipating and dreading, was to check my drives for bad blocks with the utility provided by my hard drive manufacturer (in this case, Seagate SeaTools).  So I booted from the Seagate iso and first ran the quick test, thinking that if error messages filled the screen on a quick test then I shouldn't spend 1.5 hours per drive trying to read everything, since it might be the case that vital data near currently bad blocks could only be read once.  No errors showed up though, to my relief, meaning that any hardware errors must be relatively minor (i.e. I probably wasn't going to lose half my data or anything (hopefully  :Smile:  )).  3 hours later, the full surface scans had completed for both drives and reported that I had no errors!  This was met with a mix of relief and suspicion, since I had thought that my initial error messages looked like they were indicating hardware errors.  To get a second opinion, I booted back into Knoppix and used the /sbin/badblocks program I had found reference to at namesys's website (links listed later on).  It agreed that everything on my drives could be read.  Great  :Very Happy:  !  Note that if you are trying to fix your own system and are certain that you have hardware errors, you should still at least check out the links at the bottom of this post, since I originally thought I had hardware problems, and so researched in detail how to recover as much as possible from physically damaged hard drives.
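For the curious, a read-only scan like the one badblocks performs just attempts to read every block and records the offsets that fail.  A toy Python equivalent of the idea (illustrative only; the real badblocks works on the raw device node and does far more):

```python
def scan_readable(path, block_size=4096):
    """Try to read every block of a file or device; return offsets that fail."""
    bad_offsets = []
    with open(path, "rb") as f:
        offset = 0
        while True:
            try:
                f.seek(offset)
                chunk = f.read(block_size)
            except OSError:                  # an unreadable (bad) block
                bad_offsets.append(offset)
                offset += block_size
                continue
            if not chunk:                    # end of file/device reached
                break
            offset += len(chunk)
    return bad_offsets
```

Under Knoppix, something like `scan_readable('/dev/sdb')` run as root would be the moral equivalent of a read-only `badblocks` pass over the whole disk.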

So, since I knew that my hard drives weren't physically broken, I tried to rebuild the superblock.  I considered waiting until I could back everything up, but decided that I would go ahead with --rebuild-sb and stop to wait for a backup if it told me I needed to use --rebuild-tree.  Trying "reiserfsck --fix-fixable /dev/md0" gave me the same error messages as --check (bread failing to read blocks, reiserfs superblock not found), and attempting to mount the device still told me I had the wrong filesystem type or a bad superblock.  So, mentally crossing my fingers, I ran "reiserfsck --rebuild-sb /dev/md0" :

```

root@1[dev]# reiserfsck --rebuild-sb /dev/md0

reiserfsck 3.6.19 (2003 www.namesys.com)

*************************************************************

** If you are using the latest reiserfsprogs and  it fails **

** please  email bug reports to reiserfs-list@namesys.com, **

** providing  as  much  information  as  possible --  your **

** hardware,  kernel,  patches,  settings,  all reiserfsck **

** messages  (including version),  the reiserfsck logfile, **

** check  the  syslog file  for  any  related information. **

** If you would like advice on using this program, support **

** is available  for $25 at  www.namesys.com/support.html. **

*************************************************************

Will check superblock and rebuild it if needed

Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes

bread: Cannot read the block (2): (Invalid argument).

reiserfs_open: bread failed reading block 2

bread: Cannot read the block (16): (Invalid argument).

reiserfs_open: bread failed reading block 16

reiserfs_open: the reiserfs superblock cannot be found on /dev/md0.

what the version of ReiserFS do you use[1-4]

        (1)   3.6.x

        (2) >=3.5.9 (introduced in the middle of 1999) (if you use linux 2.2, choose this one)

        (3) < 3.5.9 converted to new format (don't choose if unsure)

        (4) < 3.5.9 (this is very old format, don't choose if unsure)

        (X)   exit

1

Enter block size [4096]:

4096

root@1[dev]#      

```

Huh?  Why did it quit?  I was surprised it was so non-verbose, but thought maybe I had entered the wrong block size.  I tried running it again, and it still gave the messages about not finding a superblock prior to asking me my reiserfs version, so I knew it hadn't silently succeeded.  I tried a few different block sizes, and ran it again, but it did the same thing every time.  I still don't know how much data (if any) it has actually written to my disks.  But I had the idea to check dmesg to see what was going on, and spotted a huge bunch of useful error messages!  I thought I had checked the dmesg output earlier at some point, but I don't know how I could have missed these!  If someone wants me to post the entire dmesg output I will, but I'm fairly sure I've got the relevant bits.  Non sequitur, but also remember, this is inside knoppix.

Messages about initializing the drives:

```

libata version 1.10 loaded.

ata_piix version 1.03

ACPI: PCI Interrupt 0000:00:1f.2[A] -> GSI 18 (level, low) -> IRQ 18

PCI: Setting latency timer of device 0000:00:1f.2 to 64

ata1: SATA max UDMA/133 cmd 0xEFF0 ctl 0xEFE6 bmdma 0xEF60 irq 18

ata2: SATA max UDMA/133 cmd 0xEFA8 ctl 0xEFE2 bmdma 0xEF68 irq 18

ata1: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4003 85:3469 86:3c01 87:4003 88:207f

ata1: dev 0 ATA, max UDMA/133, 312581808 sectors: lba48

ata1: dev 0 configured for UDMA/133

scsi0 : ata_piix

ata2: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4003 85:3469 86:3c01 87:4003 88:207f

ata2: dev 0 ATA, max UDMA/133, 312581808 sectors: lba48

ata2: dev 0 configured for UDMA/133

scsi1 : ata_piix

  Vendor: ATA       Model: ST3160023AS       Rev: 3.05

  Type:   Direct-Access                      ANSI SCSI revision: 05

  Vendor: ATA       Model: ST3160023AS       Rev: 3.05

  Type:   Direct-Access                      ANSI SCSI revision: 05

SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB)

SCSI device sda: drive cache: write back

SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB)

SCSI device sda: drive cache: write back

 sda: sda1 sda2 sda3

Attached scsi disk sda at scsi0, channel 0, id 0, lun 0

SCSI device sdb: 312581808 512-byte hdwr sectors (160042 MB)

SCSI device sdb: drive cache: write back

SCSI device sdb: 312581808 512-byte hdwr sectors (160042 MB)

SCSI device sdb: drive cache: write back

 sdb: sdb1 sdb2 sdb3

Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0

```

I think these are just saying that the kernel can't find odd pieces of hardware I don't have, but if I'm wrong then they're definitely relevant:

```

seagate: ST0x/TMC-8xx not detected.

Failed initialization of WD-7000 SCSI card!

```

I think this message is unrelated, but just in case:  (two other lines included for context)

```

ieee1394: Host added: ID:BUS[0-00:1023]  GUID[00023c009106acec]

Warning: /proc/ide/hd?/settings interface is obsolete, and will be removed soon!

ISO 9660 Extensions: Microsoft Joliet Level 3

```

This might be referring to something else, but the ICH5R is the intel southbridge that my motherboard's serial ata ports go through:

```

ichxrom: ichxrom_init_one(): Unable to register resource 0xffb80000-0xffffffff - kernel bug?

CFI: Found no ichxrom @fff80000 device at location zero

JEDEC: Found no ichxrom @fff80000 device at location zero

CFI: Found no ichxrom @fff80000 device at location zero

JEDEC: Found no ichxrom @fff80000 device at location zero

CFI: Found no ichxrom @fff80000 device at location zero

Found: PMC Pm49FL004

ichxrom @fff80000: Found 1 x8 devices at 0x0 in 8-bit bank

number of JEDEC chips: 1

cfi_cmdset_0002: Disabling erase-suspend-program due to code brokenness.

hw_random hardware driver 1.0.0 loaded

```

And now the REALLY important stuff:

```

md: md driver 0.90.1 MAX_MD_DEVS=256, MD_SB_DISKS=27

md: raidstart(pid 4362) used deprecated START_ARRAY ioctl. This will not be supported beyond 2.6

md: could not open unknown-block(33,3).

md: could not open unknown-block(34,3).

md: autorun ...

md: considering sda3 ...

md:  adding sda3 ...

md: created md0

md: bind<sda3>

md: running: <sda3>

md: raid0 personality registered as nr 2

md0: setting max_sectors to 64, segment boundary to 16383

raid0: looking at sda3

raid0:   comparing sda3(154240000) with sda3(154240000)

raid0:   END

raid0:   ==> UNIQUE

raid0: 1 zones

raid0: FINAL 1 zones

raid0: too few disks (1 of 2) - aborting!

md: pers->run() failed ...

md: do_md_run() returned -22

md: md0 stopped.

md: unbind<sda3>

md: export_rdev(sda3)

md: ... autorun DONE.

md: md0 stopped.

```

The first line shows up when I modprobe md, the big middle section shows up each time I use "raidstart /dev/md0", and the last line when I use "raidstop /dev/md0".  Now the whole problem is starting to make more sense, such as reiserfsck suddenly exiting instead of rebuilding the superblock and reporting that it can't read certain blocks on the raid device (since half of it is missing!), and, although this is only a guess, possibly the root problem.  This is where I recall your attention to the two arrow-marked lines in the original error message posting back up at the top.  The first error messages I got were certainly different from the ones I get in knoppix, so I can't even be sure that the problems in my gentoo system and knoppix are the same thing, but the whole "attempt to access beyond end of device" would make a lot more sense if half of the device is missing!  I realize that depending on the exact behaviour of "reiserfsck --rebuild-sb" on a half-assembled RAID 0, I could have corrupted my superblock by running --rebuild-sb when there was nothing wrong with it.  Hopefully this isn't the case, but I would have more success rebuilding it in the future if I can at least put my device back together.

On the other hand, the "md: could not open unknown-block(33,3)." and "md: could not open unknown-block(34,3).", which are presumably preventing sdb3 from being added to the array, do kind of sound like hardware problems?  I don't know what to make of that.  And if for some reason other tools can read from my second drive, but md can't, this would partially explain the original problem, but not why it appeared so suddenly.

On yet another hand, maybe the messages indicate that blocks 33 and 34 which previously stored raid-level superblock data were overwritten by reiserfsck --rebuild-sb on the second drive.
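For what it's worth, those unknown-block numbers look like (major,minor) device numbers rather than block numbers.  If I'm remembering Documentation/devices.txt correctly, block major 33 is hde/hdf and major 34 is hdg/hdh, so (33,3) and (34,3) would be hde3 and hdg3, the old device names from my raidtab, which would suggest those two lines came from a raidstart run before the raidtab was edited, not from the hardware.  A quick decoding sketch (the major-number table is from memory, so double-check it):

```python
# Partial Linux block-device major-number table (from memory of
# Documentation/devices.txt -- verify before relying on it).
BLOCK_MAJORS = {3: "hda", 22: "hdc", 33: "hde", 34: "hdg", 9: "md"}

def decode(major, minor):
    """Turn the (M,m) pair from 'unknown-block(M,m)' into a device name."""
    name = BLOCK_MAJORS.get(major)
    if name is None:
        return None
    if name == "md":
        return f"md{minor}"            # md arrays: the minor is the array number
    # IDE majors: minors 0-63 are the first drive, 64-127 the second;
    # within a drive, minor 0 is the whole disk and 1-63 are partitions.
    letter = chr(ord(name[2]) + minor // 64)
    part = minor % 64
    return f"hd{letter}" + (str(part) if part else "")

print(decode(33, 3))   # hde3
print(decode(34, 3))   # hdg3
print(decode(9, 0))    # md0 -- the "unknown-block(9,0)" in the kernel panic
```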

This is getting very close to the point where I'm stuck and am asking for help.  I'm just going to list one or two other things I tried and then summarize what I know.  One gentoo forums post recommended using raid0run, which I tried, but as my /etc/raidtab shows, I use a persistent superblock for my raid devices.  At some earlier point, before I had read about this RAID-level superblock (which even the filesystem can't see), I tried reversing the order that the devices appeared in my /etc/raidtab, thinking maybe I had somehow swapped the cables during the disconnecting and reconnecting phase.  I've always been careful about making sure I don't do such a thing, but now I know that I was worrying for nothing, since if you use a persistent superblock in your raidtab, each drive stores a record of which member it is and what order they should be in.  It has occurred to me, however, that if this raid-level superblock somehow became corrupted, it would explain how both gentoo and knoppix can no longer figure out my main partition.
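On the corrupted-superblock theory: with the old 0.90 on-disk format, the persistent superblock lives in a 64-KiB-aligned slot 64 KiB before the end of each member partition (this layout is my recollection of the md driver, so treat it as an assumption).  A sketch of where it would sit on my members:

```python
# md 0.90 superblock location: round the device size down to a 64-KiB
# boundary, then back off one 64-KiB reserved slot (assumed layout).
RESERVED = 64 * 1024

def sb_offset_bytes(device_bytes):
    return (device_bytes & ~(RESERVED - 1)) - RESERVED

# Each RAID 0 member is 154240000 1-KiB blocks (from the kernel log):
member_bytes = 154_240_000 * 1024
print(sb_offset_bytes(member_bytes))   # byte offset of the superblock on sda3/sdb3
```

If that layout is right, the RAID superblock sits way out near the end of each ~150GB partition, nowhere near the start of /dev/md0 where reiserfsck --rebuild-sb would have been writing.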

In case it's relevant, here's the dmesg output produced by starting and stopping /dev/md1, the raid 1 /boot partition (sda1 and sdb1):

```

md: considering sdb1 ...

md:  adding sdb1 ...

md:  adding sda1 ...

md: created md1

md: bind<sda1>

md: bind<sdb1>

md: running: <sdb1><sda1>

md: raid1 personality registered as nr 3

raid1: raid set md1 active with 2 out of 2 mirrors

md: ... autorun DONE.

md: md1 stopped.

md: unbind<sdb1>

md: export_rdev(sdb1)

md: unbind<sda1>

md: export_rdev(sda1)

```

The more detailed stop message gives further confirmation that /dev/md0 isn't actually starting at all, since stopping it doesn't unbind or export anything, but starting it does.

Random other information about my system that didn't go anywhere else: I have a Pentium 4 2.8GHz processor with hyperthreading enabled (not a Prescott; I think this revision was the one called Northwood), 1GB of RAM (2x512MB sticks), and an Asus P4P800SE motherboard.  It might be worth mentioning that when I first built this system 2 years ago, I used an Abit IC7-G motherboard, but that is the one I mentioned frying earlier  :Embarassed:  , and I replaced it with the Asus motherboard, since at the time it was urgent that I get my machine running again asap and I didn't have time to find an identical motherboard.  I have now been using the new Asus motherboard for 10 of the 24 months this machine has been running, however, so this shouldn't be the cause of the problem.  I have a 480W Antec power supply, so there's no reason this should be caused by lack of power or anything.  On gentoo, I'm fairly sure I'm running the 2.6.11-gentoo kernel, but I'll check the exact version in grub and come back here to post it.

So, finally, a summary of what I know:

-The drives' master boot records are fine, since I can boot from either one and have grub and then the kernel load.

-The drives' partition tables are fine, or at least both changed absolutely identically in such a minute way that I can't notice a difference from the original setup.

-It isn't a hardware problem (at least of the bad-sectors-on-disk kind).

-ReiserFS can't read my main (3rd) partition, nor can any of its tools (ie. reiserfsck or debugreiserfs).

-Knoppix can't properly rebuild my raid device from my /etc/raidtab. (explaining at least in part why reiserfs can't read it)

Things I think are probably not the problem:

-Hardware failure.  Two different tools, one by Seagate and one by linux developers, both report that all sectors on both of my disks are good.  I can't imagine a processor/RAM/motherboard/power supply hardware error that would cause such specific and non-changing errors.

-Loose cables.  I can boot from either hard drive, and two tools say they could read any sector on either disk.

-Overheating.  My case is very well ventilated normally, with five case fans aside from the two in my power supply and my CPU fan.  Yes, that is a lot, and no, my machine isn't loud, because I use the "low" Panaflo 80mm fans  :Smile:  .  One of the case fans causes air to blow directly over the hard drives, and ever since the problem occurred I've had the side of the case open with a room fan blowing in.

-Electrostatic discharge.  I use a grounding wrist strap when I do fiddle with the insides, which I hadn't done recently prior to the problem anyway.

-Weird emerges.  I can't remember anything I've emerged recently that would cause something like this.

-Changed partitions/raid setup.  I haven't changed either of these for two years.

-Kernel/module problems with raid options.  I've always been wary of making changes to my kernel config when upgrading kernels, so even when the layout of menuconfig has changed I've examined everything carefully.  Anyway, I'm not sure exactly when I last upgraded my kernel, but I know it hasn't been in the last few weeks.  I have all of the required raid options compiled inside my gentoo kernels, not as modules, and yes, I ran "modprobe md" once I got into knoppix.

-Outdated reiserfsprogs.  I'm currently running on Knoppix 3.9, which reports it has reiserfsprogs 3.6.19, the latest version.  I don't know what version I have on gentoo, but it's the latest non-masked package from portage.

Things I think could be the (current   :Question:  ) problem (guesses only, I'm still trying to figure this out):

-Corrupted RAID-level superblock.  Maybe one of the drives now thinks it is supposed to be the 37th drive out of 2, or something, and so the RAID 0 /dev/md0 device can't be constructed properly.

-Bad configuration.  Although I know that my raid setup hasn't changed in two years, and knoppix shows sda1, sda3, sdb1, and sdb3 as the four available hard disk partitions, maybe the /etc/raidtab needs to reference something else in order to get the device running in knoppix.  If this is the case, then the problem that appears when I try to boot Gentoo is probably different.

Things I think could possibly have caused the problem:

-Random one-time power variation.  I have a surge protector and a good quality power supply, but you never know.

-Temporary overheating.  I already mentioned that my ventilation is quite good, but it's always a possibility.

-Minor glibc version upgrade.  I think this unlikely, as it was only an upgrade from some version n to n-r1, but I mention it because glibc is pretty central to any system.

-A program I wrote.  I definitely consider this a very remote possibility, but I can conceive of a very small C++ program I wrote recently somehow managing to corrupt something vital.  All it did was accept a filename argument from the command line and run a bit mask on 512 bytes about 30k into the file.  The bit mask was "byte = (byte & 244) | 14".  I actually wrote it for the purposes of making some changes to a zsnes save state for Star Ocean which were beyond the capabilities of its internal memory editing.  However, I am 100% certain that I only ever passed the right argument to the program, and unless this somehow combined with the possible glibc thing mentioned above I can't imagine how it could write to a RAID-level superblock or the like.

Things I haven't done yet, but should (maybe) try:

-I'm going to try rebooting into gentoo again without framebuffer or bootsplash to see if I can get anything more useful using Ctrl+PageUp from my Gentoo kernel's error messages.  At the same time I'll write down the versions of all of the kernels I have saved on my /boot partition and try booting into them all.  I did try booting into 2.6.8 (I think that was the one I tried), but I have some older kernels on there like 2.6.3, so it's worth a shot to see if they give different error messages which are useful.

-I'll probably try fiddling with the raidtab in knoppix.  Maybe sda and sdb are the same as hde and hdg, but the raidtab can't use scsi devices?  That makes zero sense to me, since RAID is used more often with SCSI devices, and I've already tried using the hd devices in knoppix, but I'll try a few changes like that.

-I found an odd package available for download at namesys's ftp site.  Although the latest reiserfsprogs is 3.6.19, reiserutils 3.6.25 is listed there with a date three years older.  Barring a bunch of weird circumstances, this is probably a useless package, but I may download it to take a look.

-mkraid.  This is my last resort for the time being, and I'll only try it if I can back everything up and/or get assurances from some people here that it will work.  I know that using mkraid on partitions that already have data on them can destroy important stuff, but I think I remember during the initial setup of my system I had to reboot a few times, and didn't know about raidstart, so I used mkraid again without problem.  So if the raidtab is the same as it was in the original raid setup, I might end up doing something similar to reiserfsck --rebuild-sb on a lower level, and rewriting the raid-level superblock the same as it was before without damaging any data, since it would only write over top of exactly the same stuff it had written in the past.  If I do have a corrupted raid-level superblock, then this might be one of the few things that would work.
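On the backup point: imaging a partition with dd before trying anything destructive is cheap insurance.  A sketch on a scratch file (on the real system the input would be /dev/sda3 and the output a file on an external drive; the paths here are made up):

```
# Scratch file standing in for /dev/sda3; a real run would use
# if=/dev/sda3 and an output path on the backup drive.
dd if=/dev/zero of=member.img bs=1024 count=16 2>/dev/null

# Image the "partition", then verify the copy byte-for-byte.
dd if=member.img of=member.backup bs=1024 2>/dev/null
cmp -s member.img member.backup && echo "backup verified"
```

With a verified image on hand, even a botched mkraid or --rebuild-sb can be rolled back by dd'ing the image back over the partition.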

I know this post has been very long, so in advance I profusely thank any potential helpers who have even just read it  :Smile:  .  As mentioned above, I have a couple of things left that I want to try, but I'm pretty stuck now, so if you can think of something that would help I'd really appreciate it.  Also let me know if there's any other information about my system that you think would be useful.  If you started reading thinking you might be able to help, but now think you can't, I'd even appreciate a brief message like that so that I know people are reading this  :Smile:  .

----------

## Alphanos

These are the pages that I found while researching what to do with my bad ReiserFS that I found most helpful.  Although it appears as though I ended up not having a hardware problem after all, many of these pages deal with hardware problems because I feared I had one.  If you have a problem and are looking for more information to solve it, try reading all of these pages, as most of them are incomplete.  In other words, combining the information found on these pages is probably best.

Moderators, I'm not posting this as a separate post to spam, but because it seems to me to make the most organizational sense, making both posts clearer.  This way I don't interrupt my problem post with a bunch of links, and it can end with a summary of what I know about the problem.

Gentoo Forums post: Recovering ReiserFS data from a physically damaged hard drive.

Slightly different, possibly more complete page on some gentoo user's blog. (If the page number changes for some reason, the title of this entry is "ReiserFS Disk Recovery")

A Debian user's failed attempt to recover data from a damaged drive. (Although he failed, this page is useful for the referral to dd_rescue, a dd-like program specifically made to recover data from damaged disks, and dd_rhelp, a script to help automate dd_rescue.  Also useful for a rough/preliminary guess at whether the errors you're getting are due to bad hardware, as his were, since he posts some of his error messages.)

And, of course the vital resource if you use ReiserFS, namesys, developers of ReiserFS.  Especially notable subpages include the ReiserFSCK man page, with an example of what options to use with the program and in what order, the debugreiserfs man page, a possibly useful tool to help get some information on your filesystem, and the very useful but obscure page describing how ReiserFS handles bad blocks on hard drives.  I would like to note that the purpose of this last page seems to be to describe how to keep using a disk with only a few bad blocks, rather than evacuating data off of a badly damaged disk as most of the previous links intend.  However, if you primarily want to find out whether or not your disk has a hardware or filesystem problem, as I did, then the "/sbin/badblocks" program described on the page can be used without knowing your block size just to see whether or not all of the parts of your disk can be read from.
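If badblocks isn't handy, a plain read sweep with dd gives a rough equivalent: read every block and let dd surface any I/O errors.  A sketch on a scratch file (on a real disk the input would be something like /dev/sda; it's read-only, so it can't make things worse, but it takes a long while):

```
# Scratch file standing in for a disk; a real sweep would use
# if=/dev/sda instead.
dd if=/dev/zero of=scratch.img bs=1024 count=128 2>/dev/null

# Read every block; dd exits non-zero if it hits an I/O error.
if dd if=scratch.img of=/dev/null bs=4096 2>/dev/null; then
  echo "all blocks readable"
else
  echo "read errors encountered"
fi
```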

Also a couple of useful bootable cds:

Knoppix, the standard bootable cd for most tasks, featuring a full linux system.

UBCD (Ultimate Boot CD), a boot cd specifically designed to help recover borked systems.  Features some useful tools not found in Knoppix, such as the software tools written by all of the main hard drive manufacturers to check for problems with their drives.  Although I burned a copy of this, I ended up not actually using it (yet), because their version of Seagate SeaTools was slightly outdated.  At bare minimum though, the project page is a great resource to help you find out what kinds of recovery tools exist.  Going to your drive manufacturer's site and checking out what tools you can download from them is a GREAT idea.  Usually you can also find information there about commercial data recovery services, which I'm glad I don't need, because they're darn expensive  :Shocked:  , and I couldn't afford them if I did need them!

----------

## Alphanos

I'm trying a few more things and making note of them here, since I can't really save anything to my hard drives any more, writing it all down by hand would take too long, and it may be of use to someone.  I may continue to edit this post if I find something else useful.

```

root@0[md]# cat /proc/mdstat

Personalities : [raid0] [raid1]

unused devices: <none>

root@0[md]# raidstart /dev/md1

root@0[md]# cat /proc/mdstat

Personalities : [raid0] [raid1]

md1 : active raid1 sdb1[1] sda1[0]

      48064 blocks [2/2] [UU]

unused devices: <none>

root@0[md]# raidstart /dev/md0

root@0[md]# cat /proc/mdstat

Personalities : [raid0] [raid1]

md1 : active raid1 sdb1[1] sda1[0]

      48064 blocks [2/2] [UU]

unused devices: <none>

```

So raid 1 /boot is working, raid 0 / is definitely not.

```

root@0[proc]# cat /proc/scsi/scsi

Attached devices:

Host: scsi0 Channel: 00 Id: 00 Lun: 00

  Vendor: ATA      Model: ST3160023AS      Rev: 3.05

  Type:   Direct-Access                    ANSI SCSI revision: 05

Host: scsi1 Channel: 00 Id: 00 Lun: 00

  Vendor: ATA      Model: ST3160023AS      Rev: 3.05

  Type:   Direct-Access                    ANSI SCSI revision: 05

```

The first three devices below are optical drives.  This and the above confirm that, at least under knoppix, none of the hd? devices reference the hard disks.

```

root@0[md]# cat /proc/diskstats

   3    0 hda 2672 24591 109052 92868 0 0 0 0 0 89505 92868

  22    0 hdc 0 0 0 0 0 0 0 0 0 0 0

  22   64 hdd 0 0 0 0 0 0 0 0 0 0 0

   8    0 sda 28 816 1116 164 5 1 12 6 0 166 170

   8    1 sda1 346 356 6 12

   8    2 sda2 321 328 0 0

   8    3 sda3 164 328 0 0

   8   16 sdb 14 802 1064 69 0 0 0 0 0 66 69

   8   17 sdb1 320 320 0 0

   8   18 sdb2 321 328 0 0

   8   19 sdb3 164 328 0 0

   9    0 md0 0 0 0 0 0 0 0 0 0 0 0

   9    1 md1 0 0 0 0 0 0 0 0 0 0 0

```

```

root@0[md]# cat /proc/partitions

major minor  #blocks  name

   8     0  156290904 sda

   8     1      48163 sda1

   8     2    2000092 sda2

   8     3  154240065 sda3

   8    16  156290904 sdb

   8    17      48163 sdb1

   8    18    2000092 sdb2

   8    19  154240065 sdb3

 240     0    1966656 cloop0

```

I don't know what this below means, if anything, but in case it does I'm posting it  :Smile:  .

```

root@0[proc]# cat /proc/dma

 4: cascade

```

A few lines from /proc/modules:

```

raid1 18176 1 - Live 0xf94d3000

raid0 11136 0 - Live 0xf928a000

md 42576 2 raid1,raid0, Live 0xf94da000

md5 7680 1 - Live 0xf94af000

mtdpart 12288 1 cfi_cmdset_0002, Live 0xf8d5f000

ichxrom 8576 0 - Live 0xf8d54000

mtdcore 9800 3 mtdpart,ichxrom, Live 0xf8d50000

```

```

root@0[proc]# cat /proc/mtd

dev:    size   erasesize  name

mtd0: 00080000 00001000 "ichxrom @fff80000"

```

```

root@0[proc]# lsmod

Module                  Size  Used by

raid1                  18176  1

raid0                  11136  0

md                     42576  2 raid1,raid0

md5                     7680  1

ipv6                  234912  8

snd_mixer_oss          18688  0

snd                    46308  1 snd_mixer_oss

autofs4                18820  1

af_packet              20104  0

audio                  46080  1

emu10k1                72196  1

sound                  71812  1 emu10k1

soundcore              11104  6 snd,audio,emu10k1,sound

ac97_codec             20108  1 emu10k1

sk98lin               156128  0

via_rhine              23172  0

mii                     7808  1 via_rhine

intel_agp              22044  1

agpgart                30512  1 intel_agp

hw_random               8468  0

cfi_cmdset_0002        23296  1

cfi_util                7168  1 cfi_cmdset_0002

mtdpart                12288  1 cfi_cmdset_0002

jedec_probe            20352  0

cfi_probe               9984  0

gen_probe               6912  2 jedec_probe,cfi_probe

ichxrom                 8576  0

mtdcore                 9800  3 mtdpart,ichxrom

chipreg                 6656  3 jedec_probe,cfi_probe,ichxrom

map_funcs               5632  1 ichxrom

i2c_i801               11276  0

i2c_core               21248  1 i2c_i801

emu10k1_gp              6912  0

gameport                7552  1 emu10k1_gp

parport_pc             38596  0

parport                33480  1 parport_pc

8250                   41692  0

serial_core            21120  1 8250

tsdev                   9664  0

evdev                  11008  0

joydev                 11840  0

usbhid                 42176  0

pcmcia                 21776  0

yenta_socket           21896  0

rsrc_nonstatic         12160  1 yenta_socket

pcmcia_core            42272  3 pcmcia,yenta_socket,rsrc_nonstatic

video                  18308  0

thermal                14984  0

processor              24552  1 thermal

fan                     7300  0

container               7296  0

button                  7168  0

battery                12420  0

ac                      7556  0

genrtc                 12060  0

unionfs               109944  1

cloop                  18848  1

sbp2                   24456  0

ohci1394               33028  0

ieee1394              300600  2 sbp2,ohci1394

usb_storage            63296  0

ub                     18332  0

ohci_hcd               21896  0

uhci_hcd               31376  0

ehci_hcd               31752  0

usbcore               101496  8 audio,usbhid,usb_storage,ub,ohci_hcd,uhci_hcd,ehci_hcd

```

EDIT:

Highly useful info!  From several gentoo forum posts I got the impression that mdadm was an older tool which was replaced by mkraid/raidstart/raidstop.  Definitely not the case!

```

root@1[knoppix]# mdadm -v --query /dev/sda3

/dev/sda3: is not an md array

/dev/sda3: device 0 in 2 device undetected raid0 md0.  Use mdadm --examine for more detail.

root@1[knoppix]# mdadm -v --query /dev/sdb3

/dev/sdb3: is not an md array

/dev/sdb3: device 1 in 2 device undetected raid0 md0.  Use mdadm --examine for more detail.

root@1[knoppix]# mdadm -v --examine /dev/sda3

/dev/sda3:

          Magic : a92b4efc

        Version : 00.90.00

           UUID : 2b5cfa4a:a8c28794:81f5f15f:b7117a7e

  Creation Time : Fri Aug 22 05:51:45 2003

     Raid Level : raid0

    Device Size : 154240000 (147.09 GiB 157.94 GB)

   Raid Devices : 2

  Total Devices : 2

Preferred Minor : 0

    Update Time : Tue Aug 16 21:43:06 2005

          State : active

 Active Devices : 2

Working Devices : 2

 Failed Devices : 0

  Spare Devices : 0

       Checksum : 41ccbf7b - correct

         Events : 0.3

     Chunk Size : 32K

      Number   Major   Minor   RaidDevice State

this     0      33        3        0      active sync

   0     0      33        3        0      active sync

   1     1      34        3        1      active sync

root@1[knoppix]# mdadm -v --examine /dev/sdb3

/dev/sdb3:

          Magic : a92b4efc

        Version : 00.90.00

           UUID : 2b5cfa4a:a8c28794:81f5f15f:b7117a7e

  Creation Time : Fri Aug 22 05:51:45 2003

     Raid Level : raid0

    Device Size : 154240000 (147.09 GiB 157.94 GB)

   Raid Devices : 2

  Total Devices : 2

Preferred Minor : 0

    Update Time : Tue Aug 16 21:43:06 2005

          State : active

 Active Devices : 2

Working Devices : 2

 Failed Devices : 0

  Spare Devices : 0

       Checksum : 41ccbf7e - correct

         Events : 0.3

     Chunk Size : 32K

      Number   Major   Minor   RaidDevice State

this     1      34        3        1      active sync

   0     0      33        3        0      active sync

   1     1      34        3        1      active sync

```

This seems to indicate that the raid superblocks are intact on both drives, which (I think) is a good thing.

EDIT AGAIN:

/etc/mdadm/mdadm.conf:

```

DEVICE partitions

ARRAY /dev/md1

 level=1

 num-devices=2

 devices=/dev/?d?1

ARRAY /dev/md0

 level=0

 num-devices=2

 devices=/dev/?d?3

```
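Since the superblocks carry a UUID (shown in the --examine output above), an alternative to the ?d? globs, as a sketch I haven't tested here, is to identify the array by that UUID so mdadm matches the members regardless of device naming:

```
# /etc/mdadm/mdadm.conf -- identify md0 by superblock UUID (taken
# from the mdadm --examine output above) instead of name globs.
DEVICE partitions
ARRAY /dev/md0 level=raid0 num-devices=2 UUID=2b5cfa4a:a8c28794:81f5f15f:b7117a7e
```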

```

root@1[knoppix]# mdadm -v --assemble --scan

mdadm: looking for devices for /dev/md1

mdadm: /dev/sdb3 is not one of /dev/?d?1

mdadm: /dev/sdb2 is not one of /dev/?d?1

mdadm: /dev/sdb1 is identified as a member of /dev/md1, slot 1.

mdadm: /dev/sdb is not one of /dev/?d?1

mdadm: /dev/sda3 is not one of /dev/?d?1

mdadm: /dev/sda2 is not one of /dev/?d?1

mdadm: /dev/sda1 is identified as a member of /dev/md1, slot 0.

mdadm: /dev/sda is not one of /dev/?d?1

mdadm: added /dev/sdb1 to /dev/md1 as 1

mdadm: added /dev/sda1 to /dev/md1 as 0

mdadm: /dev/md1 has been started with 2 drives.

mdadm: looking for devices for /dev/md0

mdadm: /dev/sdb3 is identified as a member of /dev/md0, slot 1.

mdadm: /dev/sdb2 is not one of /dev/?d?3

mdadm: /dev/sdb1 is not one of /dev/?d?3

mdadm: /dev/sdb is not one of /dev/?d?3

mdadm: /dev/sda3 is identified as a member of /dev/md0, slot 0.

mdadm: /dev/sda2 is not one of /dev/?d?3

mdadm: /dev/sda1 is not one of /dev/?d?3

mdadm: /dev/sda is not one of /dev/?d?3

mdadm: added /dev/sdb3 to /dev/md0 as 1

mdadm: added /dev/sda3 to /dev/md0 as 0

mdadm: /dev/md0 has been started with 2 drives.

```

dmesg output:

```

md: md1 stopped.

md: bind<sdb1>

md: bind<sda1>

raid1: raid set md1 active with 2 out of 2 mirrors

md: md0 stopped.

md: bind<sdb3>

md: bind<sda3>

md0: setting max_sectors to 64, segment boundary to 16383

raid0: looking at sda3

raid0:   comparing sda3(154240000) with sda3(154240000)

raid0:   END

raid0:   ==> UNIQUE

raid0: 1 zones

raid0: looking at sdb3

raid0:   comparing sdb3(154240000) with sda3(154240000)

raid0:   EQUAL

raid0: FINAL 1 zones

raid0: done.

raid0 : md_size is 308480000 blocks.

raid0 : conf->hash_spacing is 308480000 blocks.

raid0 : nb_zone is 1.

raid0 : Allocating 4 bytes for hash.

```

Looks good  :Very Happy:  !

```

root@1[knoppix]# cat /proc/mdstat

Personalities : [raid0] [raid1]

md1 : active raid1 sda1[0] sdb1[1]

      48064 blocks [2/2] [UU]

md0 : active raid0 sda3[0] sdb3[1]

      308480000 blocks 32k chunks

unused devices: <none>

```

Excellent  :Very Happy:  !

```

root@1[knoppix]# reiserfsck --check /dev/md0

reiserfsck 3.6.19 (2003 www.namesys.com)

*************************************************************

** If you are using the latest reiserfsprogs and  it fails **

** please  email bug reports to reiserfs-list@namesys.com, **

** providing  as  much  information  as  possible --  your **

** hardware,  kernel,  patches,  settings,  all reiserfsck **

** messages  (including version),  the reiserfsck logfile, **

** check  the  syslog file  for  any  related information. **

** If you would like advice on using this program, support **

** is available  for $25 at  www.namesys.com/support.html. **

*************************************************************

Will read-only check consistency of the filesystem on /dev/md0

Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes

reiserfs_open: the reiserfs superblock cannot be found on /dev/md0.

Failed to open the filesystem.

If the partition table has not been changed, and the partition is

valid  and  it really  contains  a reiserfs  partition,  then the

superblock  is corrupted and you need to run this utility with

--rebuild-sb.

```

Somewhat promising: it still can't find the superblock, but it no longer gives messages saying it can't read certain blocks!

----------

## Alphanos

```

root@1[knoppix]# reiserfsck --rebuild-sb /dev/md0

reiserfsck 3.6.19 (2003 www.namesys.com)

*************************************************************

** If you are using the latest reiserfsprogs and  it fails **

** please  email bug reports to reiserfs-list@namesys.com, **

** providing  as  much  information  as  possible --  your **

** hardware,  kernel,  patches,  settings,  all reiserfsck **

** messages  (including version),  the reiserfsck logfile, **

** check  the  syslog file  for  any  related information. **

** If you would like advice on using this program, support **

** is available  for $25 at  www.namesys.com/support.html. **

*************************************************************

Will check superblock and rebuild it if needed

Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes

reiserfs_open: the reiserfs superblock cannot be found on /dev/md0.

what the version of ReiserFS do you use[1-4]

        (1)   3.6.x

        (2) >=3.5.9 (introduced in the middle of 1999) (if you use linux 2.2, choose this one)

        (3) < 3.5.9 converted to new format (don't choose if unsure)

        (4) < 3.5.9 (this is very old format, don't choose if unsure)

        (X)   exit

1

Enter block size [4096]:

4096

No journal device was specified. (If journal is not available, re-run with --no-journal-available option specified).

Is journal default? (y/n)[y]: y

Did you use resizer(y/n)[n]: n

rebuild-sb: no uuid found, a new uuid was generated (762f3bdf-acb7-43f3-808b-8d111062222f)

rebuild-sb: You either have a corrupted journal or have just changed

the start of the partition with some partition table editor. If you are

sure that the start of the partition is ok, rebuild the journal header.

Do you want to rebuild the journal header? (y/n)[n]: n

root@1[knoppix]#

```

Almost there! (I hope  :Smile:  )

Can anyone give some advice?  I know that I didn't change the partition table with an editor, but should I buy an external hard drive to back up to before doing this?

----------

## Ma3oxuct

OK, I'll be honest, I only read 25% of your post and skimmed the rest.

Have you attempted to rebuild the superblocks/partitions with mdadm? I have done these commands without data loss in the past.

```
mdadm --create --verbose /dev/md# --level=1 --raid-devices=2 /dev/sda# /dev/sdb#

or, if a raid0 device:

mdadm --build /dev/md# --level=0 --raid-devices=2 /dev/sda# /dev/sdb#

```

Oh, and considering that your raid0 partitions have messed-up superblocks, just:

```
mdadm --create /dev/md# --level=0 --raid-devices=2 /dev/sda# /dev/sdb#
```

This probably won't help, but it's all I can suggest.

----------

## Alphanos

Thanks for the reply, I was just able to do something similar a minute or two ago, so I think that your advice was right  :Very Happy:  .  mdadm was able to successfully assemble both my raid 1 /boot and my raid 0 root arrays, although I still don't understand why raidstart/raidtab couldn't get my raid 0 array running.  mdadm -v --examine reported that they did have valid raid superblocks, so unless mdadm silently fixed corruption in them, raidstart/raidtab shouldn't have had trouble with them.  I still can't mount or reiserfsck --check md0, however, as it still reports that it can't find the reiserfs superblock.

Looking back at the original kernel panic error message I got, it seems like it was able to read the superblock just prior to the first trouble, since it starts to go through the journal before dying.  At this point there's no way to tell whether the superblock was already corrupt, or if reiserfs broke it, causing the attempts to read past the end of device.  On some more recent attempts to boot Gentoo, ReiserFS is not able to read the superblock or the journal at all, but I don't know whether that is because the original kernel panic caused further corruption, or if it is a result of starting to run reiserfsck --rebuild-sb on half the raid 0 array.  It only got to the stage where it asks for the block size before dying, but I don't know when it starts to write.  Maybe if/when I can get the partition working again I'll have a better idea of whether this is something that namesys would be interested in as a bug report.

If they ever do read this post, something that I think would be a great idea is being able to save an undo file for --rebuild-sb, so that if you didn't change the partition table but are concerned something might have changed it, you can try rebuilding the superblock to see if it works and undo the changes if it doesn't.

----------

## Alphanos

Does anyone know enough about ReiserFS to be able to figure out a fairly rare bit pattern that would likely occur in my superblock?  I started to look through the first 400k of my root partition with "lde --read-only -s 4096 -N 100 -B 0 /dev/md0 | less", but I don't know what I'm looking for.  Alternatively, are there specific locations that ReiserFS usually stores the superblock in? If I could look at those locations, maybe I could figure out whether there's a partially corrupted superblock there (in which case I could use reiserfsck --rebuild-sb), and if there's nothing there, that increases the chances that somehow my partitions got changed.

EDIT:

Okay, the first 64k of my ReiserFS filesystem on /dev/md0 is all zeros, from 0x00000000 to 0x0000FFFF.  Data starts appearing at 0x00010000.  Is this normal?
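To reproduce this check, the start of the device can be compared against /dev/zero, which supplies endless zero bytes.  A sketch on a scratch file standing in for /dev/md0:

```
# Build a scratch image: 64 KiB of zeros, then some data, mimicking
# what I'm seeing on /dev/md0.
dd if=/dev/zero of=fs.img bs=1024 count=64 2>/dev/null
printf 'data starts here' >> fs.img

# Limiting cmp (GNU diffutils) to the first 65536 bytes checks
# whether that region is entirely zeroed.
cmp -s -n 65536 fs.img /dev/zero && echo "first 64k is all zeros"
```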

EDIT AGAIN:

Most of the data from 0x00010000 to 0x00018FEF seems like it consists of repeated patterns, except for some mozilla/firefox data  :Question:  , and then at 0x00018FF4 the data 52 65 49 73 45 72 4C 42 appears, which translates to the ascii "ReIsErLB".  I'm sure this is somehow significant, but I don't really know what it means.

EDIT AGAIN 2:

 *Archangel1 wrote:*   

> 
> 
> As I understand it, reiserfs' main advantage is in small files; this is because it's capable of storing small quantities data in the tree itself, rather than just a pointer to it.
> 
> Hence when you want to read bunches of small files, it can simply read through the tree, rather than reading part of the tree, then the file it points to, then the next part of the tree, etc. Given hdd seek times this results in a significant improvement.
> ...

 


I found this over here.  Maybe this means that the data I'm seeing with little bits of mozilla and firefox config files mixed in is part of the ReiserFS tree?  Can anyone confirm this?

EDIT AGAIN 3:

The 52 65 49 73 45 72 4C 42 = "ReIsErLB" appears again at 0x0001BFF4, and at 0x0001C034 the data 52 65 49 73 45 72 32 46 73 = "ReIsEr2Fs" appears.

EDIT AGAIN 4:

According to grep, 52 65 49 73 45 72 4C 42 = "ReIsErLB" appears 8 times in the first 400k of the filesystem, at 0x00018FF4, 0x0001BFF4, 0x00025FF4, 0x00039FF4, 0x00042FF4, 0x00049FF4, 0x00054FF4, and 0x0005CFF4.  52 65 49 73 45 72 32 46 73 = "ReIsEr2Fs" appears 7 times, at 0x0001C034, 0x00029034, 0x0003A034, 0x00045034, 0x0004A034, 0x00057034 and 0x0005D034.  I think searching for more would be redundant, but I'm taking this as a good sign until someone can explain further.
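For reference, that search can be reproduced with GNU grep alone: -a treats the binary as text, -b prints each match's byte offset, and -o prints just the match.  A sketch with the magic word planted at the offset found above (on the real system the input would be /dev/md0):

```
# Plant "ReIsErLB" at byte 102388 (0x00018FF4) of a 100 KiB scratch
# image, mimicking the first journal transaction header.
dd if=/dev/zero of=md0.img bs=1024 count=100 2>/dev/null
printf 'ReIsErLB' | dd of=md0.img bs=1 seek=102388 conv=notrunc 2>/dev/null

# -a: treat binary as text; -b: byte offset; -o: only the match.
grep -abo 'ReIsErLB' md0.img
# prints: 102388:ReIsErLB
```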

EDIT AGAIN 5:

Looking at the mathematical patterns between these numbers, "ReIsEr2Fs" appears either 64 (four times), 12352 (once), or 8256 (twice) bytes after "ReIsErLB".  I need to stop looking at this for the moment though, because at this point my interest isn't due to a belief that it will help recover my system, but rather because it's kind of cool to look at some of the things filesystems do  :Smile:  .

----------

## Ma3oxuct

What is your status concerning

```
reiserfsck --rebuild-sb /dev/md0
```

Did you try rebuilding with that yet?

----------

## Alphanos

The furthest I went with it so far was this posting:

```

root@1[knoppix]# reiserfsck --rebuild-sb /dev/md0

reiserfsck 3.6.19 (2003 www.namesys.com)

*************************************************************

** If you are using the latest reiserfsprogs and  it fails **

** please  email bug reports to reiserfs-list@namesys.com, **

** providing  as  much  information  as  possible --  your **

** hardware,  kernel,  patches,  settings,  all reiserfsck **

** messages  (including version),  the reiserfsck logfile, **

** check  the  syslog file  for  any  related information. **

** If you would like advice on using this program, support **

** is available  for $25 at  www.namesys.com/support.html. **

*************************************************************

Will check superblock and rebuild it if needed

Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes

reiserfs_open: the reiserfs superblock cannot be found on /dev/md0.

what the version of ReiserFS do you use[1-4]

        (1)   3.6.x

        (2) >=3.5.9 (introduced in the middle of 1999) (if you use linux 2.2, choose this one)

        (3) < 3.5.9 converted to new format (don't choose if unsure)

        (4) < 3.5.9 (this is very old format, don't choose if unsure)

        (X)   exit

1

Enter block size [4096]:

4096

No journal device was specified. (If journal is not available, re-run with --no-journal-available option specified).

Is journal default? (y/n)[y]: y

Did you use resizer(y/n)[n]: n

rebuild-sb: no uuid found, a new uuid was generated (762f3bdf-acb7-43f3-808b-8d111062222f)

rebuild-sb: You either have a corrupted journal or have just changed

the start of the partition with some partition table editor. If you are

sure that the start of the partition is ok, rebuild the journal header.

Do you want to rebuild the journal header? (y/n)[n]: n

root@1[knoppix]# 

```

I was mucking around with viewing the raw data on my filesystem because one of the pages at namesys mentions that if reiserfsck says you need to rebuild the superblock, it's possible that your partition slightly shifted and the superblock could be moved forward or back.  I was hoping to find some clear indication of a corrupted superblock in the right place (in which case I should certainly --rebuild-sb), or an uncorrupted superblock in the wrong place (in which case I should figure out how to fix my partitions).  Without being sure which is the problem, I'm concerned that going ahead with rebuilding the reiser superblock might end up causing more problems.  Also, I'm not 100% sure that my reiserfs block size is the default 4096 size, and was hoping to find a way to check whether that is the right block size.  If I can't get some kind of clear indications of answers for the above two questions, then I may wait until I can back up everything before going ahead with the superblock rebuild.

----------

## Ma3oxuct

```
rebuild-sb: You either have a corrupted journal or have just changed

the start of the partition with some partition table editor. If you are

sure that the start of the partition is ok, rebuild the journal header.

Do you want to rebuild the journal header? (y/n)[n]: n

root@1[knoppix]# 

```

Why don't you say y there?

----------

## Alphanos

My reasoning is that after this whole experience, I think I'm going to want to get a backup system set up anyway, so even if there's only a 0.1% chance that the partition table has been changed, I may just wait until I have the backup set up in order to be safe.  Also, if my block size isn't supposed to be 4096 then it would be good to have a backup so that I can try again.

----------

## Alphanos

Just a short time ago I was able to find this page describing the data structure of a ReiserFS partition in a great deal of detail  :Very Happy:  !  This is exactly what I was looking for, and produces a number of useful pieces of information:

-As I previously mentioned, the first 64k of my filesystem is all zeroed out, and data starts at byte 65536.  This is the exact case described as normal in the document, since I don't have a bootloader on this partition, and basically confirms that my partition tables have not been altered.

-Using the description of the superblock location and format, I can confirm that my reiserfs superblock is currently junk.  None of the data there appears to be correct, based on what I know about the filesystem and what was printed to the console just prior to my original kernel panic.

-It turns out that the searching I did with lde to view the raw data in my filesystem was useful after all.  "ReIsEr2Fs" is apparently the "magic number" which signifies a specific position within the filesystem superblock!  "ReIsErLB" is the "magic number" that signifies a specific position at the end of a transaction header in the journal.

-The document says that there should only be one superblock, whereas I found the superblock's magic number 7 times in disk space that should contain the journal.  Furthermore, 4 of those times, as I previously calculated in what I thought was just for fun, occurred 64 bytes after the journal transaction header's magic number.  What I think this signifies is that of the most recent journal transactions that occurred just before my original kernel panic, 4 of them involved writing to the superblock  :Question:   :Exclamation:   :Idea:  .  Based on the description of the contents of the superblock, it appears as though this in itself is probably not cause for concern, since the superblock contains data such as the number of free blocks in the filesystem and the height of the filesystem tree, which presumably change as data is written to disk.  This would explain in part how the original problem could have occurred, as a random power variation or vibration could conceivably cause a piece of bad data to be written to the superblock.  However, I know that at least some of the data in the superblock was still good just prior to the time of the original kernel panic, so I am still unclear on how the entire superblock was overwritten with wrong data, as it is now.

-I am now almost certain that my ReiserFS filesystem does NOT use the default 4096 block size.  Carefully interpreting one of the superblock copies I found in the journal suggested that the filesystem block size was 16(k), or 16384.  This triggered my memory.  I had thought I had changed either the block size of my filesystem or the chunk size of my software raid from the defaults when I first set up my machine; however, I couldn't remember which until now.  It was the ReiserFS block size.  I am 99% certain that my block size is one of 16384, 32768, or 65536.  I could check the other superblock copies in the journal to see if they all say that my block size is 16, or if they disagree.  I should also be able to get some indication of which it is from the facts that the filesystem's first bitmap occurs at byte 65536 + block size, and the journal starts at byte 65536 + 2×block size.  While I don't really understand yet what the bitmap should look like, I do know that the journal starts with a transaction header, which contains the magic word "ReIsErLB" shortly in.  This should make the block size fairly clear.  And ... the first transaction header's magic word shows up at byte 102388; subtract 65536 to get 36852.  This number is supposed to be slightly more than twice the block size, so this calculation method agrees with the superblock copy I looked at earlier that the block size is 16384.  Hurrah!
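
The magic-number searching described in the findings above can be sketched in a few lines. This is a hypothetical reconstruction, not a tool from the thread: it scans a disk image (or a block device opened read-only) for the two magic strings and reports the absolute byte offset of each hit, which is all the offset arithmetic above needs.

```python
# Hypothetical helper: find every occurrence of the ReiserFS magic strings
# in a raw image and return (absolute_offset, magic) pairs.

MAGICS = [b"ReIsEr2Fs",  # ReiserFS 3.6 superblock magic
          b"ReIsErLB"]   # journal transaction header magic

def find_magics(path, chunk_size=1 << 20):
    """Return sorted (offset, magic) pairs for every magic string in the file."""
    overlap = max(len(m) for m in MAGICS) - 1  # so hits spanning chunks aren't missed
    hits = set()                               # a set de-duplicates overlap re-scans
    with open(path, "rb") as f:
        offset = 0   # bytes consumed from the file so far
        tail = b""   # last `overlap` bytes of the previous buffer
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf = tail + chunk
            base = offset - len(tail)  # absolute offset of buf[0]
            for magic in MAGICS:
                i = buf.find(magic)
                while i != -1:
                    hits.add((base + i, magic.decode()))
                    i = buf.find(magic, i + 1)
            tail = buf[-overlap:]
            offset += len(chunk)
    return sorted(hits)
```

Running this against a dd'd copy of the partition would give the same offsets that lde turned up by hand, without any paging through hex dumps.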

So now that I have been able to confirm both my correct block size and the fact that my partitions are not out of alignment (through the statistical unlikelihood of my partitions moving into a new alignment that shows all of this relevant reiserfs data in the right spot), I will attempt to rebuild my reiser superblock.  I'll be back to post the results soon  :Smile:  .

----------

## Alphanos

```

root@0[knoppix]# reiserfsck --rebuild-sb /dev/md0

reiserfsck 3.6.19 (2003 www.namesys.com)

*************************************************************

** If you are using the latest reiserfsprogs and  it fails **

** please  email bug reports to reiserfs-list@namesys.com, **

** providing  as  much  information  as  possible --  your **

** hardware,  kernel,  patches,  settings,  all reiserfsck **

** messages  (including version),  the reiserfsck logfile, **

** check  the  syslog file  for  any  related information. **

** If you would like advice on using this program, support **

** is available  for $25 at  www.namesys.com/support.html. **

*************************************************************

Will check superblock and rebuild it if needed

Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes

reiserfs_open: the reiserfs superblock cannot be found on /dev/md0.

what the version of ReiserFS do you use[1-4]

        (1)   3.6.x

        (2) >=3.5.9 (introduced in the middle of 1999) (if you use linux 2.2, choose this one)

        (3) < 3.5.9 converted to new format (don't choose if unsure)

        (4) < 3.5.9 (this is very old format, don't choose if unsure)

        (X)   exit

1

Enter block size [4096]:

16384

rebuild_sb: wrong block size specified, only divisible by 1024 are supported currently

root@0[knoppix]#

```

 :Shocked:  Hmm, interesting math that reiserfsck uses there.  And here I thought that 16384 was divisible by 1024.  I also tried 16, which gave the same error message before quitting.  This last bit here, I think, is definitely something I should email namesys about.  I don't know much about reiserfsck's internal workings, but I can be pretty sure it's intended to divide correctly  :Smile: .

As a side note, it's probably a good thing that I didn't try to rebuild my filesystem's journal earlier, because reiserfsck was looking for it in the wrong place, 24 kilobytes too early (although there could still be corruption inside my journal, I don't know yet).  It seems to me that it would be a good idea for reiserfsck --rebuild-sb to be able to figure out your block size the way I did, by comparing the position of the first journal transaction header magic number to the superblock.  I guess they can't really do that though, since there are so many things you can do with the journal, like having it stored externally.  Also, I suppose it can't assume that your journal is uncorrupted.  Oh well.

----------

## Ma3oxuct

 :Shocked:  WoW.

I thought 16384 would be the magic number according to the post before your last!

Let us know what namesys says. 16384/1024=16!

Very nice read by the way, I've learned a lot from your post.

----------

## Alphanos

Well, although I found several pieces of evidence to support the idea of my block size being 16k, it turns out it isn't after all, so it's a good thing reiserfsck didn't let me rebuild the superblock with that block size  :Embarassed:  .  I emailed the namesys mailing list, and was told that the error message isn't quite correct; block sizes from 512 bytes to 8192 bytes can be used, and sizes larger than 512 must be divisible by 1024.  I don't know whether something like 3072 can be used, or if it must be "even".
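
Translated into code, the rule from the mailing-list reply would look something like the sketch below. This is my paraphrase of the stated rule, not the actual reiserfsprogs source, and whether a non-power-of-two size like 3072 really works in practice is left open here, just as in the post.

```python
# Block-size rule as relayed by namesys (paraphrased): 512 bytes up to
# 8192 bytes, and any size above 512 must be divisible by 1024.
def reiserfs_block_size_ok(size):
    if size == 512:
        return True
    return 512 < size <= 8192 and size % 1024 == 0
```

Under this rule 4096 passes and 16384 fails, which matches the behaviour reiserfsck actually showed, even though its error message described the condition wrongly.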

At first when I read the email I thought there must be some mistake, but I thought further and rechecked the superblock copies in my journal.  "lde --read-only -s 4096 -N 256 -B 16 /dev/md0 | grep -B 1 "ReIsEr2Fs"" prints out a big list of pieces of my superblock.  Here's one of them to demonstrate:

```

0x000F4020  84 03 00 00 1E 00 00 00 : 00 00 00 00 00 10 CC 03  ................

0x000F4030  06 03 02 00 52 65 49 73 : 45 72 32 46 73 00 00 00  ....ReIsEr2Fs...

```

I only printed out two lines: the one with the magic number (which I was searching for), and the previous line, which contains the block size.  In the first line, the last two bytes, CC 03, are the "OID Max Size".  I have no idea what that means.  But the previous two bytes, "00 10", are the block size.  All copies of the superblock I could find in my journal agreed that the block size was 00 10.  Last night, I interpreted this as 16, with the kilobyte inferred (i.e., the block size could range from a minimum of 1k to a maximum of 65535k).  The location of the first journal transaction header supported this interpretation.  However, namesys told me that the block size actually ranges from 512 bytes, or 0.5k, to 8k.  Well, my previous interpretation of the block size wouldn't allow for a block size of 512 bytes, so I was doing something wrong.   :Idea:  Aha, the least significant byte must be listed first!  This means that "00 10" would be "10 00", or 4096.  Since this is supposed to be the default block size, that's a very reasonable number.

But what about the first journal transaction header that appeared at the right spot for a block size of 16384?  I mentioned offhandedly at the end of my previous post that anything designed to fix a corrupt superblock can't assume that the journal, which appears so soon afterwards, is uncorrupted.  It turns out that was something I should have thought more about, and I also should have recalled that a block size of 4096 put into reiserfsck --rebuild-sb told me that I had a corrupt journal!  Since 16384 is evenly divisible by 4096, it seems as though what occurred is that the first six journal blocks are corrupted, which is why the magic number "ReIsErLB" doesn't appear.  If this is correct, then the journal transaction which I thought was the first is actually the 7th.
I should have realized that something was off last night based on the fact that some readable text from some config file appeared after the superblock but before where I thought the journal started.
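
The byte-order mistake above boils down to one line of interpretation; a quick sketch:

```python
# The block-size field is a little-endian 16-bit integer, so the on-disk
# bytes "00 10" mean 0x1000 = 4096, not 0x0010 = 16.
raw = bytes([0x00, 0x10])                            # as shown in the lde dump
late_night_reading = int.from_bytes(raw, "big")      # 16  -- the wrong guess
actual_block_size = int.from_bytes(raw, "little")    # 4096 -- the real value
print(late_night_reading, actual_block_size)         # prints: 16 4096
```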

So, I think the moral of the story is that if you discover some amazing way to fix your computer in the middle of the night, you should go to sleep first and consider in the morning whether it's really correct  :Embarassed:   :Smile: .  Hypotheses, even with substantial evidence to support them, may be incorrect if there are other interpretations of the evidence  :Smile: .

Now that I have determined the correct block size (for a second time?), I will attempt to rebuild the superblock, and post back here when I get some kind of result.

----------

## Alphanos

Superblock rebuild output:

```

root@0[Desktop]# reiserfsck --rebuild-sb /dev/md0

reiserfsck 3.6.19 (2003 www.namesys.com)

*************************************************************

** If you are using the latest reiserfsprogs and  it fails **

** please  email bug reports to reiserfs-list@namesys.com, **

** providing  as  much  information  as  possible --  your **

** hardware,  kernel,  patches,  settings,  all reiserfsck **

** messages  (including version),  the reiserfsck logfile, **

** check  the  syslog file  for  any  related information. **

** If you would like advice on using this program, support **

** is available  for $25 at  www.namesys.com/support.html. **

*************************************************************

Will check superblock and rebuild it if needed

Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes

reiserfs_open: the reiserfs superblock cannot be found on /dev/md0.

what the version of ReiserFS do you use[1-4]

        (1)   3.6.x

        (2) >=3.5.9 (introduced in the middle of 1999) (if you use linux 2.2, choose this one)

        (3) < 3.5.9 converted to new format (don't choose if unsure)

        (4) < 3.5.9 (this is very old format, don't choose if unsure)

        (X)   exit

1

Enter block size [4096]:

4096

No journal device was specified. (If journal is not available, re-run with --no-journal-available option specified).

Is journal default? (y/n)[y]: y

Did you use resizer(y/n)[n]: n

rebuild-sb: no uuid found, a new uuid was generated (3f1e85f7-b611-437e-a145-0560b0e9350d)

rebuild-sb: You either have a corrupted journal or have just changed

the start of the partition with some partition table editor. If you are

sure that the start of the partition is ok, rebuild the journal header.

Do you want to rebuild the journal header? (y/n)[n]: y

Reiserfs super block in block 16 on 0x900 of format 3.6 with standard journal

Count of blocks on the device: 77120000

Number of bitmaps: 2354

Blocksize: 4096

Free blocks (count of blocks - used [journal, bitmaps, data, reserved] blocks): 0

Root block: 0

Filesystem is NOT clean

Tree height: 0

Hash function used to sort names: not set

Objectid map size 0, max 972

Journal parameters:

        Device [0x0]

        Magic [0x0]

        Size 8193 blocks (including 1 for journal header) (first block 18)

        Max transaction length 1024 blocks

        Max batch size 900 blocks

        Max commit age 30

Blocks reserved by journal: 0

Fs state field: 0x1:

         some corruptions exist.

sb_version: 2

inode generation number: 0

UUID: 3f1e85f7-b611-437e-a145-0560b0e9350d

LABEL:

Set flags in SB:

Is this ok ? (y/n)[n]: y

The fs may still be unconsistent. Run reiserfsck --check.

root@0[Desktop]#

```

----------

## Alphanos

Filesystem check:

```

root@0[Desktop]# reiserfsck --check /dev/md0

reiserfsck 3.6.19 (2003 www.namesys.com)

*************************************************************

** If you are using the latest reiserfsprogs and  it fails **

** please  email bug reports to reiserfs-list@namesys.com, **

** providing  as  much  information  as  possible --  your **

** hardware,  kernel,  patches,  settings,  all reiserfsck **

** messages  (including version),  the reiserfsck logfile, **

** check  the  syslog file  for  any  related information. **

** If you would like advice on using this program, support **

** is available  for $25 at  www.namesys.com/support.html. **

*************************************************************

Will read-only check consistency of the filesystem on /dev/md0

Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes

###########

reiserfsck --check started at Fri Aug 19 18:38:30 2005

###########

Replaying journal..

Reiserfs journal '/dev/md0' in blocks [18..8211]: 0 transactions replayed

Checking internal tree..

Bad root block 0. (--rebuild-tree did not complete)

Aborted

root@0[Desktop]# reiserfsck --fix-fixable /dev/md0

reiserfsck 3.6.19 (2003 www.namesys.com)

*************************************************************

** If you are using the latest reiserfsprogs and  it fails **

** please  email bug reports to reiserfs-list@namesys.com, **

** providing  as  much  information  as  possible --  your **

** hardware,  kernel,  patches,  settings,  all reiserfsck **

** messages  (including version),  the reiserfsck logfile, **

** check  the  syslog file  for  any  related information. **

** If you would like advice on using this program, support **

** is available  for $25 at  www.namesys.com/support.html. **

*************************************************************

Will check consistency of the filesystem on /dev/md0

and will fix what can be fixed without --rebuild-tree

Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes

###########

reiserfsck --fix-fixable started at Fri Aug 19 18:39:15 2005

###########

Replaying journal..

Reiserfs journal '/dev/md0' in blocks [18..8211]: 0 transactions replayed

Checking internal tree..

Bad root block 0. (--rebuild-tree did not complete)

Aborted

root@0[Desktop]# 

```

So, it looks like I have to rebuild the tree.  :Confused:   :Sad: 

Wish me luck  :Smile:  .

----------

## Alphanos

I had an idea; I'll post back if it works, but I need to store this data somewhere:

```

0x00010000  00 C2 98 04 00 C2 98 04 : 00 00 00 00 12 00 00 00  ................

0x00010010  00 00 00 00 00 20 00 00 : 00 04 00 00 00 00 00 00  ..... ..........

0x00010020  84 03 00 00 1E 00 00 00 : 00 00 00 00 00 10 CC 03  ................

0x00010030  00 00 02 00 52 65 49 73 : 45 72 32 46 73 00 03 00  ....ReIsEr2Fs...

0x00010040  00 00 00 00 FF FF 32 09 : 02 00 00 00 00 00 00 00  ......2.........

0x00010050  00 00 00 00 3F 1E 85 F7 : B6 11 43 7E A1 45 05 60  ....?.....C~.E.`

0x00010060  B0 E9 35 0D 00 00 00 00 : 00 00 00 00 00 00 00 00  ..5.............

```

EDIT: Looks like it may pay off.

```

0x003B5000  00 C2 98 04 7F CC 39 00 : B7 93 0A 00 12 00 00 00  ......9.........

0x003B5010  00 00 00 00 00 20 00 00 : 00 04 00 00 4F 65 11 7D  ..... ......Oe.}

0x003B5020  84 03 00 00 1E 00 00 00 : 00 00 00 00 00 10 CC 03  ................

0x003B5030  12 03 02 00 52 65 49 73 : 45 72 32 46 73 00 00 00  ....ReIsEr2Fs...

0x003B5040  03 00 00 00 05 00 32 09 : 02 00 00 00 D1 EE 97 02  ......2.........

0x003B5050  01 00 00 00 27 6F 0B 2F : CC 26 4E 10 B0 66 1F 6A  ....'o./.&N..f.j

0x003B5060  80 D0 93 6D 00 00 00 00 : 00 00 00 00 00 00 00 00  ...m............

```

The first code listing is my new, rebuilt superblock.  This second code listing is the most recent superblock from my journal prior to the filesystem's failure.  I know this because the 4 bytes at the end of the line just after the magic number form a number which is only incremented, never decremented, as time passes (the "Inode Generation").  Now, it should be the case that all attempts to write to the superblock show up in the journal, and disappear only if other transactions overwrite them.  Since there are a bunch of superblock copies in my journal, I know that many attempts to write to it were made relatively recently.

There is potential for concern since the first 6 or 7 journal entries are apparently corrupt, but the ReiserFS structure document I posted the link to a while back says that the journal operates on a wraparound principle (i.e., it writes sequentially, but when it reaches the end it returns to the beginning and starts to overwrite transactions there).  Since the superblock copy I identified as most recent occurs in the middle of the journal, and the superblock copy after it has the lowest inode generation, I can verify that this is the most recent superblock prior to my filesystem crashing.  Since it isn't anywhere near the actual superblock or the start of the journal, the chances of it being corrupted aren't too high, and after a cursory inspection by me, it looks okay (even though I didn't know anything about this stuff 24 hours ago  :Smile:  ).

So I'll try to replace the superblock with this one, and see what reiserfsck --check thinks about it.  If it doesn't help, I'll probably have to use the rebuilt superblock and run --rebuild-tree.  But if using this superblock shows the filesystem as clean, then I'll be done  :Smile: .
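
The hunt for the newest superblock copy can also be sketched in code. Everything here is a hypothetical reconstruction: the offsets are inferred from the two hex dumps above (the magic starts at byte 0x34 of a superblock copy and the 32-bit little-endian inode generation sits at byte 0x4C, i.e. 24 bytes past the start of the magic), and reading the whole file at once assumes a dd'd copy of the journal region rather than the full device.

```python
# Hypothetical sketch: locate every superblock copy in an image and pick
# the one with the highest inode generation (i.e. the most recent one).
MAGIC = b"ReIsEr2Fs"
GEN_OFFSET = 0x4C - 0x34   # inode generation: 24 bytes past the magic's start

def newest_superblock_copy(path):
    """Return (magic_offset, inode_generation) for the superblock copy
    with the highest inode generation found anywhere in the image."""
    with open(path, "rb") as f:
        data = f.read()
    best = None
    i = data.find(MAGIC)
    while i != -1:
        gen = int.from_bytes(data[i + GEN_OFFSET:i + GEN_OFFSET + 4], "little")
        if best is None or gen > best[1]:
            best = (i, gen)
        i = data.find(MAGIC, i + 1)
    return best
```

On the dumps above this would prefer the copy whose generation bytes are D1 EE 97 02 (0x0297EED1) over the rebuilt superblock's generation of 0.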

----------

## Alphanos

Well, it didn't work, and I still have to rebuild the tree, but I think it was a good (if dangerous) idea.  I was offline for a while after I found out it didn't work because I had to shut down the computer and wait for a lightning storm to pass  :Smile: .

Ma3oxuct, I appreciate your compliment on my earlier post  :Smile:  .  I just hope that given the mistakes I've made, I'm not teaching people incorrect things  :Confused:  .

----------

## Alphanos

I can't delete this post any more.  I think too much time has elapsed.  Any moderator or administrator who reads this, feel free to delete this post  :Smile:  .

----------

## Alphanos

```

####### Pass 0 #######

block 1073051: The number of items (9) is incorrect, should be (1) - corrected

block 1073051: The free space (68) is incorrect, should be (4048) - corrected

pass0: vpf-10110: block 1073051, item (0): Unknown item type found [65546 1638427 0x5004001c ??? (15)] - deleted

block 1235434: The number of items (9) is incorrect, should be (1) - corrected

block 1235434: The free space (68) is incorrect, should be (4048) - corrected

pass0: vpf-10110: block 1235434, item (0): Unknown item type found [65546 1638427 0x5004001c ??? (15)] - deleted

block 2287679: The number of items (33807) is incorrect, should be (1) - corrected

block 2287679: The free space (2936) is incorrect, should be (3792) - corrected

pass0: vpf-10110: block 2287679, item (0): Unknown item type found [65538 189891599 0x1000014d9e90000 ??? (8)] - deleted

block 4843699: The number of items (61440) is incorrect, should be (1) - corrected

block 4843699: The free space (256) is incorrect, should be (208) - corrected

pass0: vpf-10110: block 4843699, item (0): Unknown item type found [1048817 1052416 0x1000001f000010f ??? (14)] - deleted

block 5404484: The number of items (3856) is incorrect, should be (1) - corrected

block 5404484: The free space (0) is incorrect, should be (3792) - corrected

pass0: vpf-10150: block 5404484: item 0: Wrong key [0 15728656 0x0 SD (0)], deleted

block 5406189: The number of items (4080) is incorrect, should be (1) - corrected

block 5406189: The free space (3841) is incorrect, should be (4048) - corrected

pass0: vpf-10110: block 5406189, item (0): Unknown item type found [284164143 65536 0xf1000 ??? (15)] - deleted

block 5425509: The number of items (256) is incorrect, should be (1) - corrected

block 5425509: The free space (0) is incorrect, should be (3792) - corrected

pass0: vpf-10150: block 5425509: item 0: Wrong key [0 1 0xf0 SD (0)], deleted

block 5427901: The number of items (61440) is incorrect, should be (1) - corrected

block 5427901: The free space (0) is incorrect, should be (4048) - corrected

pass0: vpf-10110: block 5427901, item (0): Unknown item type found [1114367 1048831 0xf00010 ??? (15)] - deleted

block 5459665: The number of items (8192) is incorrect, should be (1) - corrected

block 5459665: The free space (256) is incorrect, should be (4048) - corrected

pass0: vpf-10110: block 5459665, item (0): Unknown item type found [16839152 15728641 0xf0f ??? (15)] - deleted

block 5459726: The number of items (61456) is incorrect, should be (1) - corrected

block 5459726: The free space (271) is incorrect, should be (4048) - corrected

pass0: vpf-10110: block 5459726, item (0): Unknown item type found [271 983536 0xf0f ??? (15)] - deleted

block 7284674: The number of items (5) is incorrect, should be (1) - corrected

block 7284674: The free space (961) is incorrect, should be (4048) - corrected

pass0: vpf-10110: block 7284674, item (0): Unknown item type found [16456 28 0x30001 ??? (15)] - deleted

block 24768434: The number of items (1) is incorrect, should be (0) - corrected

block 24768434: The free space (0) is incorrect, should be (4072) - corrected

block 25053195: The number of items (1) is incorrect, should be (0) - corrected

block 25053195: The free space (14) is incorrect, should be (4072) - corrected

block 30475375: The number of items (41888) is incorrect, should be (1) - corrected

block 30475375: The free space (26535) is incorrect, should be (4048) - corrected

pass0: vpf-10110: block 30475375, item (0): Unknown item type found [676000 4294439920 0x55554057560101b ??? (5)] - deleted

block 37693071: The number of items (2) is incorrect, should be (1) - corrected

block 37693071: The free space (0) is incorrect, should be (4048) - corrected

pass0: vpf-10110: block 37693071, item (0): Unknown item type found [257 202752 0x1600000 ??? (15)] - deleted

block 39380515: The number of items (18) is incorrect, should be (1) - corrected

block 39380515: The free space (41) is incorrect, should be (3792) - corrected

pass0: vpf-10110: block 39380515, item (0): Unknown item type found [45118 134217984 0x6000000 ??? (15)] - deleted

block 65672089: The number of items (9) is incorrect, should be (0) - corrected

block 65672089: The free space (2) is incorrect, should be (4072) - corrected

1591976 directory entries were hashed with "r5" hash.

####### Pass 1 #######

####### Pass 2 #######

####### Pass 3 #########

rebuild_semantic_pass: The entry [9124750 9124751] ("CC") in directory [11235377 9124750] points to nowhere - is removed

rebuild_semantic_pass: The entry [9124750 7575012] ("COUNTER") in directory [11235377 9124750] points to nowhere - is removed

vpf-10650: The directory [11235377 9124750] has the wrong size in the StatData (912) - corrected to (864)

rebuild_semantic_pass: The entry [9124680 9124681] ("temp") in directory [67643 9124680] points to nowhere - is removed

vpf-10650: The directory [67643 9124680] has the wrong size in the StatData (72) - corrected to (48)

rebuild_semantic_pass: The entry [9124701 9124726] ("gconfd-root") in directory [9124597 9124701] points to nowhere - is removed

rebuild_semantic_pass: The entry [9124701 583949] ("orbit-root") in directory [9124597 9124701] points to nowhere - is removed

rebuild_semantic_pass: The entry [9124701 9124706] ("environment") in directory [9124597 9124701] points to nowhere - is removed

rebuild_semantic_pass: The entry [9124701 9124705] ("eclass-debug.log") in directory [9124597 9124701] points to nowhere - is removed

vpf-10650: The directory [9124597 9124701] has the wrong size in the StatData (176) - corrected to (48)

rebuild_semantic_pass: The entry [9124676 9124677] ("temp") in directory [67643 9124676] points to nowhere - is removed

vpf-10650: The directory [67643 9124676] has the wrong size in the StatData (72) - corrected to (48)

rebuild_semantic_pass: The entry [9124600 9124603] ("environment") in directory [9124595 9124600] points to nowhere - is removed

vpf-10650: The directory [9124595 9124600] has the wrong size in the StatData (112) - corrected to (80)

rebuild_semantic_pass: The entry [9124615 9124617] ("temp") in directory [67643 9124615] points to nowhere - is removed

vpf-10650: The directory [67643 9124615] has the wrong size in the StatData (72) - corrected to (48)

rebuild_semantic_pass: The entry [9124725 9124737] ("htop") in directory [368 1394] points to nowhere - is removed

vpf-10650: The directory [368 1394] has the wrong size in the StatData (77368) - corrected to (77344)

rebuild_semantic_pass: The entry [9124725 9124744] ("README.gz") in directory [1587 9124793] points to nowhere - is removed

rebuild_semantic_pass: The entry [9124725 9124747] ("TODO.gz") in directory [1587 9124793] points to nowhere - is removed

rebuild_semantic_pass: The entry [9124725 9124746] ("ChangeLog.gz") in directory [1587 9124793] points to nowhere - is removed

vpf-10650: The directory [1587 9124793] has the wrong size in the StatData (136) - corrected to (48)

rebuild_semantic_pass: The entry [9124725 9124743] ("htop.1.gz") in directory [1585 1229] points to nowhere - is removed

vpf-10650: The directory [1585 1229] has the wrong size in the StatData (69256) - corrected to (69224)

rebuild_semantic_pass: The entry [9222 9124568] (".htoprc") in directory [9222 2532740] points to nowhere - is removed

rebuild_semantic_pass: The entry [9222 8509170] (".DCOPserver_DeepThought_:0") in directory [9222 2532740] points to nowhere - is removed

rebuild_semantic_pass: The entry [9222 8509169] (".DCOPserver_DeepThought__0") in directory [9222 2532740] points to nowhere - is removed

rebuild_semantic_pass: The entry [9222 8510256] (".mcoprc") in directory [9222 2532740] points to nowhere - is removed

rebuild_semantic_pass: The entry [9222 8509490] (".ICEauthority") in directory [9222 2532740] points to nowhere - is removed

vpf-10670: The file [6510800 8851992] has the wrong size in the StatData (3014656) - corrected to (3145728)

rebuild_semantic_pass: The entry [9222 9124610] (".Azureus") in directory [9222 2532740] points to nowhere - is removed

rebuild_semantic_pass: The entry [9222 8509277] (".fonts.cache-1") in directory [9222 2532740] points to nowhere - is removed

vpf-10680: The directory [9222 2532740] has the wrong block count in the StatData (11) - corrected to (10)

vpf-10650: The directory [9222 2532740] has the wrong size in the StatData (5296) - corrected to (5064)

####### Pass 3a (lost+found pass) #########

```

So it finished rebuilding the tree.  I need to run reiserfsck --check now, but I'm posting the entire log file in case of a power outage, and I'll delete my previous post with the partial log file.

Personally, I find it slightly unsettling that it looks like when it finds something and doesn't know what it is, it deletes it  :Confused:  . Hopefully that isn't actually as bad as it sounds.

EDIT: 

```

root@0[Desktop]# reiserfsck --check --logfile ./check.log /dev/md0

reiserfsck 3.6.19 (2003 www.namesys.com)

*************************************************************

** If you are using the latest reiserfsprogs and  it fails **

** please  email bug reports to reiserfs-list@namesys.com, **

** providing  as  much  information  as  possible --  your **

** hardware,  kernel,  patches,  settings,  all reiserfsck **

** messages  (including version),  the reiserfsck logfile, **

** check  the  syslog file  for  any  related information. **

** If you would like advice on using this program, support **

** is available  for $25 at  www.namesys.com/support.html. **

*************************************************************

Will read-only check consistency of the filesystem on /dev/md0

Will put log info to './check.log'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes

###########

reiserfsck --check started at Fri Aug 19 21:59:25 2005

###########

Replaying journal..

Reiserfs journal '/dev/md0' in blocks [18..8211]: 0 transactions replayed

Checking internal tree..finished

Comparing bitmaps..finished

Checking Semantic tree:

finished

No corruptions found

There are on the filesystem:

        Leaves 134980

        Internal nodes 914

        Directories 119560

        Other files 1472420

        Data block pointers 73816245 (632538 of them are zero)

        Safe links 0

###########

reiserfsck finished at Fri Aug 19 22:11:51 2005

###########

root@0[Desktop]# mount -t reiserfs /dev/md0 /mnt/md0

root@0[Desktop]#

```

reiserfsck --check finished with no errors, and the filesystem mounted.  There are 24 items (2.1MB) in my lost+found directory.  At this point I'm going to consider the problem solved, and I won't post back to this thread unless I see someone post a question I think I can help answer  :Very Happy:  .

----------

## Ma3oxuct

 *Alphanos2 wrote:*   

> I can't delete this post any more.  I think too much time has elapsed.  Any moderator or administrator who reads this, feel free to delete this post  .

 

Why do that? I think that it is excellent for people to see potential mistakes (even if they seem innocent) when they are going around doing this type of stuff. I'll tell you one thing Alphanos, if I ever have problems with my superblock, I'll be heading for this thread first  :Smile: .

BTW, I'm glad that you got your partition fixed. At least you did not do all of that work for nothing.

----------

## Alphanos

Oh, sorry for the confusion.  The post that I was trying to delete originally contained about half of the log of "reiserfsck --rebuild-tree".  I posted the first bit of the log part-way through in case there was a power failure and I needed to know what was in the log.  Since I was rebuilding the tree from Knoppix, the log was being saved to ramdisk, which would have lost everything if I lost power.  Earlier in the afternoon there had been a lightning storm which caused some brief blackouts, and I waited until it passed before starting to rebuild the tree, but I later heard some far-off thunder and began worrying.  After the tree rebuild finished and I had checked the newly corrected filesystem, I wanted to use a new post to say that I was considering the problem solved, so I posted the completed log in that post and intended to delete the post with only the first half of the log  :Smile:  .

I've tried to post everything I've done in this thread, even the more embarrassing mistakes (like the whole 16k block size issue  :Embarassed:  ), in the hope that it will help people in the future who have similar filesystem problems  :Smile:  .

Well, my system has been back up and running since just after my previous post last night.  Konqueror thinks that about half of the files in lost+found are libraries, so I've started recompiling everything on my system, hoping that it will replace whatever those were.  I haven't gotten any errors though, other than having to redo the configuration on Azureus (2 of the lost+found files were bits of its configuration).  All in all, though I wouldn't want to repeat this experience, it was interesting to learn some of the details of the Reiser filesystem.  Another benefit is that now just about every possible detail on my partitions, RAID, and filesystem is posted here so I can reference it if I ever do have another problem  :Very Happy:  .  Finally, the scare from this has definitely taught me the true importance of backups, even though I was able to recover everything this time.

Thanks for sticking around Ma3oxuct, it's definitely nice to know that at least one person has found this thread somewhat useful  :Smile:  .

----------

