# XFS corruption

## big_gie

Hi all,

I had a system failure on a machine running a hardware RAID1 for the OS and a RAID60 for our simulation data.

Booting from SystemRescueCd, I was able to recover the OS (even though the system no longer boots; GRUB doesn't even show up).

But now I'm trying to recover the RAID60 data, or at least part of it. First, I tried mounting the single XFS partition read-only but got a strange error. The error showed up in dmesg and it looked like the kernel crashed. To prevent a kernel problem from further affecting the data, I rebooted the live CD before checking the filesystem. But I can't check it:

 *Quote:*   

> # xfs_check /dev/sdb1
> 
> ERROR: The filesystem has valuable metadata changes in a log which needs to
> 
> be replayed.  Mount the filesystem to replay the log, and unmount it before
> ...

 

Trying to mount the filesystem gave (probably) the same error as before; this time I saved it. Here it is:

 *Quote:*   

> 
> 
> # dmesg > dmesg1.txt
> 
> # mount -o ro /dev/sdb1 raid/                                                                                                                                                                                
> ...

 

What's wrong? Is there a bug in the kernel, or is it just its own way of telling me the filesystem is really broken?

Could mounting with "-o ro,norecovery" cause more trouble? There are a couple of files I really need to restore, but I don't want to break things even more for not much.
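In other words, what I have in mind is something like the following (the device and mount point are from my setup; the file path is just a placeholder for the files I need):

```shell
# Mount read-only and skip log replay entirely (norecovery requires ro).
# Device and mount point match my setup; adjust for yours.
mount -t xfs -o ro,norecovery /dev/sdb1 raid/

# Copy out only the urgently needed files (placeholder path)
cp -a raid/path/to/needed/files /safe/location/

umount raid/
```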

Thanks for your help!

----------

## madchaz

I would suggest you make a copy of your disk to offline media first (using dd).

That way, you can always go back.

However, as long as you stay read-only, it "should" be OK, if it works.

You might want to look at the mount.xfs options to force the journal replay as well. (Again, make a backup first.)
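A minimal sketch of the dd image, assuming a backup target with enough free space (the output path here is just an example):

```shell
# Image the partition to offline media before any repair attempt.
# conv=noerror,sync keeps going past read errors, padding bad blocks
# with zeros so offsets in the image stay aligned with the device.
dd if=/dev/sdb1 of=/mnt/backup/sdb1.img bs=1M conv=noerror,sync
```

You can then point xfs_repair (or a loopback mount) at the image instead of the live device, so any mistake only costs you the copy.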

And when this is all over, I strongly suggest you schedule daily backups  :Wink: 

----------

## big_gie

Thanks for the suggestion.

This is exactly what I've done for the OS partitions. Unfortunately, I just can't do this, at least not yet. The XFS partition is... 40TB.

----------

## madchaz

Yikes. And no backup? You like living dangerously.

----------

## big_gie

No, no backup...

We hoped we would be fast enough replacing bad drives that the RAID could be rebuilt, but we suspect the controller itself is bad... Around 75% of the drives failed within 20 minutes, which crashed the OS and made things a lot more complicated than they should have been...

Backing up 40TB is quite hard. We'll probably explore the different possibilities after this  :Wink: 

For now, it's only a couple hundred megs that I need to salvage. Everything else is luxury.

----------

## madchaz

I suggest you take a good look at the differential backup solutions out there  :Wink: 

----------

## trippels

In your case I would try the "-L" option of xfs_repair.

You may lose a few seconds of data from shortly before the system failure, because it zeroes the filesystem log, but most of the data should still be OK.

You could also contact the friendly XFS developers on their mailing list and describe your problem: linux-xfs@oss.sgi.com

----------

