# XFS repair question

## i0

Hey

Suddenly, at night, I saw:

```
Filesystem "sdb1": corrupt dinode 24658384, (btree extents).  Unmount and run xfs_repair.
```

So I unmounted the filesystem and started xfs_repair.

That was 24 hours ago.

The machine load is stable at 1.00.

sdb1 is 8 TB.

xfs_repair has not produced any output in those 24 hours.

Is that normal, or should xfs_repair have produced some output by now?

----------

## frostschutz

It should produce output, especially if you started it with -v.

Any disk activity?

Hope you have a good backup...
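One way to answer the disk-activity question is to watch the repair process's per-process I/O counters in /proc. A rough sketch, assuming Linux task I/O accounting is enabled; the fallback to the current shell's PID is only there so the snippet doesn't error out if xfs_repair isn't running:

```shell
# Sample xfs_repair's I/O counters twice; if the read_bytes/write_bytes
# numbers grow between samples, it is still making progress even
# though it prints nothing to the console.
pid=$(pgrep -x xfs_repair || echo $$)   # fall back to this shell's PID
grep -E '^(read_bytes|write_bytes)' "/proc/$pid/io"
sleep 5
grep -E '^(read_bytes|write_bytes)' "/proc/$pid/io"
```

`iostat -x 5` on the underlying device tells a similar story from the block-layer side.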

----------

## i0

I did not start xfs_repair with -v.

There is some disk activity, but not much.

Can I kill xfs_repair and start over, or will that make things worse?

----------

## kimmie

You'll have to kill it eventually if it doesn't finish, whether that makes things worse or not. Sure, it's never good to kill anything doing low-level disk work, but I'm sure the XFS folks have done their best to make it as painless as possible. Anyway, it's not as if you have a choice.

Have you looked in the system log for hard disk errors? Perhaps hardware errors are causing xfs_repair to choke.

If you don't already have backups, at this point I'd make one to a different physical disk. Then you can try running xfs_repair again.
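A sweep like this would surface most low-level errors (a sketch — the device name sdb comes from this thread, and the syslog path varies by distro):

```shell
# Kernel ring buffer: I/O errors, bad sense data, ATA resets.
dmesg | grep -iE 'i/o error|sense key|ata[0-9]+' | tail -n 20
# System log (path varies: /var/log/syslog or /var/log/messages).
grep -i 'sdb' /var/log/syslog 2>/dev/null | grep -iE 'error|fail' | tail -n 20
# SMART overall health verdict, if smartmontools is installed.
command -v smartctl >/dev/null && smartctl -H /dev/sdb || true
```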

----------

## i0

Here's the thing: there are no disk errors.

The disks are connected to a 3ware SATA controller and configured as RAID6, with a hot spare available.

A RAID verify completed successfully, smartctl did not report any errors on any of the disks, and there are no disk errors in syslog.

I read on another forum about a similar case where an xfs_repair of a 6 TB filesystem took 56 hours.

Since mine is 8 TB, I think I'll wait a couple of days and see what happens.

If I have to kill xfs_repair and start over, what would be the correct procedure?

Somewhere I read that xfs_repair should first be run in no-modify mode.

Is that so, or doesn't it matter?
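The invocation I saw for that looked roughly like this (a sketch, using my /dev/sdb1):

```shell
# No-modify mode: scan and report what xfs_repair *would* fix,
# without writing anything to the device.
umount /dev/sdb1          # the filesystem must not be mounted
xfs_repair -n /dev/sdb1   # -n = no modify; prints the problems found
```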

----------

## kimmie

My experience stops at RAID0 and 250MB filesystems; the one time I had to run xfs_repair, it took about 10 minutes.

However, the xfs_repair man page says this:

```
-P     Disable prefetching of inode and directory blocks. Use this
       option if you find xfs_repair gets stuck and stops proceeding.
       Interrupting a stuck xfs_repair is safe.
```

So maybe try killing it, run xfs_check to see if it tells you more about the problem, then run xfs_repair again. Sorry, this isn't from experience; I'm just saying what I would do.

For more certainty, I'd head over to xfs.org and look at the mailing lists, and maybe post a question there.
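Concretely, the sequence might look like this (a sketch, not from experience; device name taken from your earlier posts):

```shell
# 1. Interrupt the stuck repair (per the man page, this is safe).
pkill -x xfs_repair
# 2. Read-only consistency check; may say more about the corruption.
xfs_check /dev/sdb1
# 3. Retry the repair with inode/directory prefetch disabled.
xfs_repair -P /dev/sdb1
```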

----------

## i0

Thanks, I will try xfs_check first and see what it says.

Do you think I should disable the RAID controller cache for this, or not?

Can it make any difference while running xfs_repair?

----------

## kimmie

I don't really have an answer to that. http://xfs.org/index.php/XFS_FAQ recommends turning the cache off for 3ware RAID controllers, although it doesn't really say why. It can't hurt while running the repair, anyway.
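For a 3ware controller that would be something like this via tw_cli (a sketch — the controller/unit numbers /c0 and /u0 are assumptions on my part; check yours with `tw_cli show` first):

```shell
tw_cli /c0/u0 show           # current unit settings, including cache
tw_cli /c0/u0 set cache=off  # disable the unit's write cache
# ...run the repair, then turn it back on:
tw_cli /c0/u0 set cache=on
```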

Unmount the filesystem, run xfs_check, and then, if it says to repair, run xfs_repair with the -P option.

Forgive me if I'm telling you stuff you already know, but the situation you're in is one where "RAID is not backup" kicks in. I didn't believe this myself until I lost a set of mirrored drives to a dud ATA cable. So much for my mirror = backup; both drives got trashed. So unless you really need ultra-high availability, you can probably use RAID0 and keep a backup with not many extra drives, which is a much safer proposition. A backup protects you against non-disk hardware failure (e.g. controller, cable) as well as software failure or a system crash. For backups, xfsdump is exceptionally fast, does incremental backups, and is a really good way to back up XFS filesystems.

BTW, you might find http://hep.kbfi.ee/index.php/IT/KernelTuning interesting too; I found it by googling "3ware xfs raid cache".

----------

## i0

Thanks.

I'll let you know how things went once it's done.

For now, I managed to copy everything I needed by mounting the filesystem read-only.
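For anyone finding this later, a read-only salvage mount can be sketched like this (the mount point, paths, and the norecovery option are illustrative assumptions; norecovery skips journal replay, which even a plain ro mount may otherwise attempt):

```shell
mkdir -p /mnt/rescue
# ro = no writes; norecovery = don't replay the (possibly bad) journal
mount -t xfs -o ro,norecovery /dev/sdb1 /mnt/rescue
cp -a /mnt/rescue/important-data /some/other/disk/   # paths are examples
umount /mnt/rescue
```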

I'll first try to repair it; if that doesn't succeed, I'll check the disks and create a new filesystem.

----------

