# raid 0 xfs issue

## lifeform

hi all,

on a fateful day, i was emerging world and was compiling glibc-2.3.5-r2 when my system crashed out.

it didn't just hang, the entire machine turned off by itself in the midst of the compilation!

i had to restart the machine from the wall socket (as the power switch on the casing no longer worked).

i re-ran the emerge process and it happened again!!!

the next time i ran it, it could not mount my raid devices anymore

i upgraded my kernel to 2.6.13-gentoo-r3, thinking that there might be a slim possibility of "better" raid detection....

but no....

tested out with the previous kernel again.. but to no avail.

i have raid 0 running on md0 and md4

md0 consisting of hde1 and hdg1

md4 consisting of hde6 and hdg6

both are on SATA drives.

they contain all my invaluable data

on boot up,

there's some "DriveReady SeekComplete Error" messages for hde1

when starting up RAID devices (mdadm)...

mdadm: /dev/hde1 has no superblock - assembly aborted

mount: /dev/md0: can't read superblock

mount: /dev/md4: can't read superblock

/proc/mdstat gives

Personalities : [raid0]

md4 : active raid0 hdg6[1] hde6[0]

          115105408 blocks 32k chunks

unused devices: <none>

however, when i attempt to mount md4,

it says

mount: /dev/md4: can't read superblock

dmesg gives a bunch of "DriveReady SeekComplete Error" messages.

does anyone have any clue or solution, or where can i start troubleshooting?

please help.

----------

## NeddySeagoon

lifeform,

It sounds like one of your drives has died. However, it may just be the controller or even the data cable.

It's worth swapping over the SATA cables and seeing if the problem moves to the other drive, or swapping in a known good cable.

It's also worth trying the drives in another PC.

The IDE SATA driver has long been deprecated. You should move to the SCSI driver for your chipset and have your drives appear as /dev/sd... That's a job for when you get your raid back.

You can also try to read the raw unmounted devices with dd. Do get this right, there is no safety net.

```
dd if=/dev/hde of=/dev/null
```

will read the whole of input file (if=) to output file (of=).

Errors are a bad sign especially if repeated tests make them occur in the same place.
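As a rough sketch of that read test (not part of the original advice), the same `dd` command can be wrapped so the pass/fail result is spelled out. The function name `read_test` is made up for illustration; the only real command used is `dd`, exactly as above.

```shell
# read_test PATH: read all of PATH, discard the data, and report the result.
# dd exits non-zero when it hits a read error, so its exit status is the verdict.
read_test() {
    if dd if="$1" of=/dev/null; then
        echo "OK: $1 read without errors"
    else
        echo "FAILED: $1 returned a read error"
        return 1
    fi
}

# Example (run as root against the raw, unmounted device):
#   read_test /dev/hde
```

Running it two or three times and comparing the record counts dd prints shows whether the failure is in the same place each time.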

----------

## lifeform

thanks neddy,

i'll give that physical solution a go when i get back.

was wondering... md4 was listed under /proc/mdstat, so does that mean hde6 and hdg6 were up and running?

which also means that one drive is not "fully down" since hde6 is one of the constituents.

is it likely that bad physical stuff can affect drives partially (as in only some partitions of a drive are down and some others up)?

that said, it's funny why i'm not able to mount md4 due to an unreadable superblock

i remember back when these drives were working, there wasn't a superblock either.

what's the purpose of having a superblock?

will post results soon...

----------

## lifeform

swapped the sata cables.

errors still appear on the same drive - hde

doing "dd if=/dev/hde of=/dev/null" now...

like u mentioned, errors would be a bad sign..

what would no errors indicate?

----------

## lifeform

okie..

with hde, the dd thing gave 

"

76339272+0 records in

76339272+0 records out

Input/output error"

with hdg, no errors..

simply

"

312581808+0 records in

312581808+0 records out"

looks like my hde is the troublemaker..

is there any way to salvage the situation?

any methods at all?

----------

## NeddySeagoon

lifeform,

No errors from dd would indicate it read the entire surface of the drive and the drive is physically OK.

dd will list the number of blocks read and that should be the same as the number of blocks on the drive when it completes. If it terminates with an error, it will say so.
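To make that comparison concrete, here is a small sketch using the record counts posted earlier in the thread. It assumes dd ran with its default 512-byte block size (no `bs=` was given), and the helper name `records_to_bytes` is made up for illustration. Whether hde is the same size as hdg is not stated anywhere in the thread, so treat the comparison as indicative only.

```shell
# dd's default block size is 512 bytes, so each "record" is one 512-byte block.
records_to_bytes() {
    echo $(( $1 * 512 ))
}

records_to_bytes 76339272     # hde: where dd stopped with the I/O error
records_to_bytes 312581808    # hdg: complete read

# Assumption: blockdev (util-linux) is installed. It prints the device size in
# 512-byte sectors, which can be compared directly with dd's record count
# (run as root):
#   blockdev --getsz /dev/hdg
```

That works out to roughly 39.1 GB read from hde before the error, against roughly 160.0 GB for the full read of hdg.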

Provided the drive is physically OK, the implication is that the data it contains has become corrupt. It's worth trying fsck in interactive or read-only mode. Do not let it fix anything to start with, as it can make matters worse rather than better.

Every filesystem has a superblock, most have several. They describe the filesystem to the kernel. Normally, only the first one is used, the others are backups.

You say your filesystem is xfs. Read man fsck and man fsck.xfs for specific help and be sure you understand what fsck will do for you before you run it. I'm not an xfs user.

To be able to use fsck, you need to get the raid set formed but it must not be mounted.

----------

## lifeform

read that fsck.xfs is a no-op.

did what i think is the equivalent of a fsck with xfs_repair as well as xfs_check

xfs_check returned:

/dev/md0 is invalid (cannot read first 512 bytes).

xfs_repair returned:

Phase 1 - find and verify superblock...

superblock read failed, offset 0, size 524288, ag 0, rval 0

----------

## NeddySeagoon

lifeform,

Check your dmesg to see if your raid device was created. If /dev/md0 is not attached to the raid volume, then nothing works since there is nothing there to read.

If the dd of both disks works then the disks are OK.

It's quite valid to do

```
dd if=/dev/mdX of=/dev/null
```

Put the right number in for X.

If the kernel did not form your raid set, it will fail very quickly.

----------

## lifeform

dd of hde gave me error

dd of hdg was error free

should i be running fsck on the individual disks or on the raid set?

if there are entries in /proc/mdstat, what can i tell from that?

"dd if=/dev/mdX of=/dev/null"

indeed died very quickly

dmesg gave me

"Buffer I/O error, dev hdX, logical block n"

when i ran it

----------

## NeddySeagoon

lifeform,

You run fsck on the raid set. Neither of the individual partitions holds a complete filesystem, so they cannot be checked on their own.

In raid0, all the odd-numbered chunks are on one drive and all the even-numbered chunks are on the other.

Hence when data on one drive is damaged, you lose the data on the entire affected partition.
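A minimal sketch of that chunk layout, assuming two equal-sized members and the 32k chunk size that /proc/mdstat reported for md4. The function name `locate_chunk` is made up for illustration, and this is simplified arithmetic, not the kernel's actual code.

```shell
# locate_chunk OFFSET: for a byte offset into the raid0 device, print which
# member holds it (0 = first member, e.g. hde6[0]; 1 = second, e.g. hdg6[1])
# and the byte offset within that member.
locate_chunk() {
    chunk=$(( 32 * 1024 ))   # 32k chunks, as reported for md4
    members=2                # two-disk raid0
    n=$(( $1 / chunk ))      # index of the chunk containing the byte
    member=$(( n % members ))
    member_off=$(( (n / members) * chunk + $1 % chunk ))
    echo "member $member, offset $member_off"
}

locate_chunk 0        # first chunk lives on member 0
locate_chunk 32768    # second chunk lives on member 1
locate_chunk 65536    # third chunk is back on member 0
```

Any file larger than one chunk is spread over both drives, which is why a single bad drive takes the whole filesystem with it.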

It may be time to visit the drive manufacturer's web site and get their test utility to run against the bad drive.

These things normally have several tests, some of which are non-destructive. Many will even produce an RMA when they detect a failure.

Ideally, do not run fsck on the corrupt data unless you have a copy of it. fsck can make things worse, and you may wish to go back. Look at dd_rescue and its wrapper (whose name I forget) to copy as much as you can from /dev/mdX.

You need space to store the output as a file, which can then be mounted with mount -o loop ...
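As a rough sketch of that imaging step, the following uses plain GNU `dd` with `conv=noerror,sync` rather than dd_rescue, since dd is what has been used in this thread so far. It is a stand-in, not the recommended tool. The function name `image_device` and the paths in the usage comment are hypothetical.

```shell
# image_device SRC DST: copy SRC to the file DST, continuing past read errors.
# conv=noerror keeps dd going after a failed read; conv=sync pads each short or
# failed block with zeros so later data stays at the same offset in the image.
image_device() {
    dd if="$1" of="$2" bs=512 conv=noerror,sync
}

# Hypothetical usage (run as root, with enough free space for the image):
#   image_device /dev/mdX /mnt/spare/mdX.img
#   mount -o loop /mnt/spare/mdX.img /mnt/recovery
```

A small block size keeps the zero-filled hole for each bad spot small, at the cost of a slower copy.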

----------

## lifeform

neddy,

what's RMA?

am i correct to say that if the raid entry (eg.md0) appears in /proc/mdstat

but not under mount, the raid set is formed but not mounted?

it seems that i'm able to dd the drive by itself, but not in its raid set.

when i dd md4, which was "formed but not mounted", it gave an Input/output error

with 0+0 records in and out

am i at the end of the road?

or will i be able to salvage stuff from drives individually and then reform the data from 2 such individual "rips" to form consolidated data like in a raid0 set?

----------

## NeddySeagoon

lifeform,

RMA is Returned Materials Authorisation. It's the first step in returning your drive for a warranty repair, if it's still under warranty.

dd doesn't care if the raid is mounted or not, it does raw device access. It must be formed though.

You are correct in what you say about mounts and mdstat.

Well, the good drive is still good. You can get some data off the bad drive but the tricky bit is getting back into a raid set so you can use it.

You can try to recover as much as possible with dd_rhelp.

This can be used on whole drives or just partitions. It will make an image file of the recovered data.

Hmm interesting thought for the day ... can you mount two files or a file and a partition as a raid0 set?

Everything in Linux is a file, so it may be worth a try.

dd_rhelp is not in portage but it's easy to build. Copy what you can before you let fsck loose. fsck will work on both drives, so image the partition on the good drive too. You may want an undo function.

----------

## lifeform

okie.. looks like it's time to let it go..

my bios isn't even picking up the drive now..

and there are clicking sounds during the attempts to access the disk..

----------

## lifeform

thanks for your help neddy...

appreciate it.

----------

