# MAJOR problem with RAID

## Unik

Well, I have two identical ATA/100 WD2000JB disks with 15 partitions each, all merged into a software RAID0 array. After staying powered on for 2 or 3 weeks, I installed a new kernel and therefore needed to reboot. But imagine my surprise when init told me that my /dev/md8 (/dev/hda14 + /dev/hdb14 as RAID0 w/ persistent superblocks, 180GB total) was completely broken and dropped me to the service console (which I wasn't able to type into - all 'coz the HID driver for my USB keyboard was compiled as a module and wasn't reachable).

So I booted from a LiveCD, copied my raidtab from a backup and issued raidstart --all. All was fine until it got to /dev/md8 - the program threw a heap of garbage (something like hda: SeekComplete DriveError - most probably from the IDE subsystem), and the device started to scream and buzz strangely  :Laughing: . Almost crying at the thought of a disk covered in bad blocks, I started to play with mdadm, losing the last remnants of hope with each command typed in. After an hour or so, totally sad and hopeless, I typed "mkraid -R /dev/md8" as a last resort, hoping that the data (VERY valuable data, one proggie there took me about 3 years to write) would not be erased... And voila! Everything seemed to work like normal! The data was even untouched! But the keyword here is *seemed*  :Crying or Very sad: . After I rebooted my PC into my normal OS, the problem repeated exactly as before (well, later I tweaked /etc/init.d/checkfs to get past the sulogin, but the problem with an unreachable /dev/md8 still remains). Even worse - I couldn't get access to the data the way I did before from the LiveCD. People, PLEASE HELP ME, or I'm gonna drink vodka till I'm unconscious (or die - whichever comes first  :Smile: ). I'll be very grateful for ANY advice!

PS: sorry for my English, just look at my location  :Smile: 

----------

## Unik

Still nobody willing to help... I've managed to get the data out by mounting the ext3 partition as ext2 (I realized that the journal may be corrupted). But I didn't find a way to restore my partition to an untouched state... aaah, gonna go to the shop and buy another couple of bottles...

----------

## taskara

mate.. I wish I could help...

unfortunately you made a raid 0 array.

if it was raid 1 u can just mount the partition directly.

silly boy where was your backup!!!?  :Confused: 

didn't you keep your old kernel? so u can boot back to it?

always keep your old kernel around, and leave it in your boot loader so u can try and fix things like this  :Very Happy: 

----------

## Unik

 *taskara wrote:*   

> silly boy where was your backup!!!? 
> 
> didn't you keep your old kernel? so u can boot back to it?
> 
> always keep your old kernel around, and leave it in your boot loader so u can try and fix things like this 

 

The problem is not an inability to boot, but the inability to mount a raid0 partition without rebuilding a superblock (the raid superblock, not the ext3 one) each time. Currently I've marked out all bad blocks in that partition, but I don't know how to move the raid superblock to an undamaged place  :Crying or Very sad: . I don't know how long the PC will stay in an operable state, but, even though SMART is crying out loud, the situation is starting to stabilize. Still the problem remains.

PS. I have no problems in booting my PC, coz' the damaged partition mounts in /opt, and that's the place that has nothing to do with kernel boot up and OS operation  :Smile: 

----------

## taskara

can't you just copy all your data off and start again?

is it a physical hard disk problem you think?

----------

## Unik

 *taskara wrote:*   

> can't you just copy all your data off and start again?
> 
> is it a physical hard disk problem you think?

 

1. I copied it already, and reformatted the partition several times with different options. As I've said before, mke2fs -c marked all bad blocks in the partition itself, but not in the RAID superblock (it has nothing to do with the file system; it is written separately onto both physical partitions that form the array), which sits on a damaged area. Also I have rebuilt the array using mkraid -R (normal mkraid just fails, same for mdadm). After the array is rebuilt, it becomes accessible, but only until reboot. So the question is "can I somehow MOVE the RAID superblocks inside the physical partitions?".

2. Well, I suppose it's a physical disk problem, because SMART tells me so, and because of the unhealthy noise coming out of the disk when it tries reading from the damaged area.
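
To make the superblock question in point 1 more concrete, here is a rough sketch of where a persistent superblock would sit, ASSUMING the array uses the old md 0.90 superblock format (which is what raidtools-era persistent superblocks normally were). In that format the superblock lives in the last 64 KiB-aligned 64 KiB chunk of each member partition. The function name and the sizes below are made up for illustration only.

```python
# Assumption: md 0.90 superblock format. The superblock is stored at the
# start of the last 64 KiB-aligned, 64 KiB-sized chunk of the member device.
MD_RESERVED_BYTES = 64 * 1024


def md090_superblock_offset(device_size_bytes: int) -> int:
    """Return the byte offset of a 0.90 md superblock on a member device.

    Hypothetical helper for illustration; it only does the arithmetic and
    never touches a real disk.
    """
    if device_size_bytes < MD_RESERVED_BYTES:
        raise ValueError("device is too small to hold a 0.90 superblock")
    # Round the size down to a 64 KiB boundary, then step back one chunk.
    aligned = device_size_bytes & ~(MD_RESERVED_BYTES - 1)
    return aligned - MD_RESERVED_BYTES


if __name__ == "__main__":
    # Illustrative size only (not the real size of hda14/hdb14).
    size = 90 * 10**9
    offset = md090_superblock_offset(size)
    print("superblock would start", size - offset, "bytes before the end")
```

If that assumption holds, the position depends only on the partition size, so the superblock can't be moved around inside the partition as such. The only way to shift it would be to change where the partition ends (for example, recreating both member partitions a bit smaller so that the tail lands before the damaged area), and that of course means rebuilding the array.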

PS. Will never EVER use RAID0  :Smile: 

----------

## taskara

so if it's a hard disk problem you just need a new drive.. I don't think it's a problem with raid0, or that it's a problem that the raid array caused.

seems like you just have a bung drive, so just copy the data off, get a new drive and re-build the array?

----------

## Unik

 *taskara wrote:*   

> seems like you just have a bung drive, so just copy the data off, get a new drive and re-build the array?

 

too bad... in fact I too am starting to think that my beloved HDD is beyond repair  :Sad:  . But the worst thing is finding a place for nearly 380GB of data to reside...

Thanx Taskara, but honestly, I thought more people would respond... Ok, vodka is finished, same with money... Oh that f**king life...

PS. expression of my current state in russian:

ЕЛЫ ПАЛЫ! КАК ХРЕНОВО ТО! (roughly: "Damn it! This feels so awful!")

----------

## taskara

yeah.. they are new drives, so u should be able to get a free replacement under warranty, no?

sorry I can't give u happy news  :Sad: 

One thing I WILL suggest is to check your power supply.

most hard drive problems are caused by cheap power supplies that spike the hard disks, causing them to have problems.

the power supply is the most important part of a computer - it delivers the life juice for things to work! you wouldn't put muddy petrol in your car would you!?

so if u have a cheap power supply, I would spend some $ and get a decent one.

good luck!

----------

## ben_h

There's no reason not to use RAID0, but it MUST be paired with a good backup setup.

But that's not unique; any RAID setup (moreover, any filesystem at all) needs separate backups. The only thing redundant RAID arrays protect against is the sudden and complete failure of a drive. They do nothing to prevent the effects of gradual errors caused by bad blocks, because the array is effectively a single device. Once a block goes bad, the data there will be corrupt, and unless it is noticed immediately and the disk is replaced, these errors will start to affect the filesystem.

Doing regular backups is the only sure-fire way to protect your data. There's a couple of nice rsync-based scripts around the place that make this easy, and I've made one myself. Shout if you'd like a look.
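
To give an idea of what such an rsync-based backup can look like, here is a minimal sketch (not the script mentioned above, just an illustration). It only builds the rsync command line for a dated snapshot; the function name, the paths and the snapshot names are all hypothetical. The rsync options used are `-a` (archive mode), `--delete` (drop files that vanished from the source) and `--link-dest` (hard-link unchanged files against a previous snapshot to save space).

```python
import os
import subprocess
from typing import List, Optional


def build_rsync_command(source: str, backup_root: str, snapshot: str,
                        previous: Optional[str] = None) -> List[str]:
    """Build an rsync command that copies `source` into a snapshot directory.

    Hypothetical helper for illustration. `previous` is the name of an
    earlier snapshot under `backup_root`; unchanged files are hard-linked
    against it instead of being copied again.
    """
    cmd = ["rsync", "-a", "--delete"]
    if previous is not None:
        cmd.append("--link-dest=" + os.path.join(backup_root, previous))
    # A trailing slash makes rsync copy the directory contents, not the
    # directory itself.
    cmd.append(source.rstrip("/") + "/")
    cmd.append(os.path.join(backup_root, snapshot) + "/")
    return cmd


def run_backup(cmd: List[str]) -> None:
    """Run the command and raise if rsync exits with an error."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Example paths are assumptions; this only prints the command.
    print(" ".join(build_rsync_command("/opt", "/mnt/backup",
                                       "2024-01-02", "2024-01-01")))
```

Run from cron once a day with a fresh snapshot name, something like this keeps a set of dated copies where only the changed files take up extra space.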

----------

## Unik

Backup is good if you have space, large removable media or network storage for a backup  :Smile:  . I don't  :Smile:  .

----------

