# [solved] mount cannot read superblock of raid device

## majoron

Hello,

I have a problem with a RAID 6 array (hardware RAID). After replacing one disk, which had been showing a lot of SMART warnings, the controller rebuilt the array and, according to it, everything is fine.

However, when I try to mount the partition it complains:

```
# mount /dev/sdc 

mount: /dev/sdc: can't read superblock
```

and kern.log says:

```
Mar 28 15:20:14 localhost kernel: [19771.776731] XFS mounting filesystem sdc

Mar 28 15:20:14 localhost kernel: [19771.805883] Starting XFS recovery on filesystem: sdc (logdev: internal)

Mar 28 15:20:14 localhost kernel: [19771.850390] sd 8:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK

Mar 28 15:20:14 localhost kernel: [19771.850395] sd 8:0:0:0: [sdc] Sense Key : Hardware Error [current] 

Mar 28 15:20:14 localhost kernel: [19771.850400] sd 8:0:0:0: [sdc] Add. Sense: Internal target failure

Mar 28 15:20:14 localhost kernel: [19771.850406] end_request: I/O error, dev sdc, sector 873024

Mar 28 15:20:14 localhost kernel: [19771.850426] I/O error in filesystem ("sdc") meta-data dev sdc block 0xd5240       ("xlog_recover_do..(read#2)") error 5 buf count 8192

Mar 28 15:20:14 localhost kernel: [19771.850440] XFS: log mount/recovery failed: error 5

Mar 28 15:20:14 localhost kernel: [19771.850586] XFS: log mount failed
```
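If I read the log right, the two errors point at the same place: the block number in the XFS message is (as far as I understand) in 512-byte basic blocks, so block 0xd5240 is exactly the failing sector from the I/O error line:

```shell
# XFS reports the metadata block in hex (0xd5240); the kernel's I/O
# error reports the sector in decimal. In 512-byte units they match:
xfs_block=$(( 0xd5240 ))
echo "XFS block 0xd5240 = sector $xfs_block"   # prints 873024, the failing sector
```

So the XFS failure looks like nothing more than the controller's read error surfacing through the filesystem.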

Some things I tried:

1) 

```
# xfs_check /dev/sdc

ERROR: The filesystem has valuable metadata changes in a log which needs to

be replayed.  Mount the filesystem to replay the log, and unmount it before

re-running xfs_check.  If you are unable to mount the filesystem, then use

the xfs_repair -L option to destroy the log and attempt a repair.

Note that destroying the log may cause corruption -- please attempt a mount

of the filesystem before doing this.
```

of course mount doesn't work, by definition of the problem  :Wink: 

2) So I tried what that message suggests (adding -n first, just to see what it would do):

```
# xfs_repair -L -n /dev/sdc

Phase 1 - find and verify superblock...

Phase 2 - using internal log

        - scan filesystem freespace and inode maps...

xfs_repair: read failed: Input/output error

fatal error -- can't read btree block 0/108822
```

No luck yet  :Sad: 
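Before trying anything more drastic, I suppose a raw read probe (bypassing XFS completely) would show whether it is the controller itself failing the read. A sketch, with the device and sector taken from my logs above:

```shell
# probe_sectors DEVICE START COUNT
# Reads COUNT 512-byte sectors starting at sector START and discards
# them; a nonzero exit status means the raw read fails, independent
# of any filesystem.
probe_sectors() {
    dd if="$1" of=/dev/null bs=512 skip="$2" count="$3" 2>/dev/null
}

# against the region from kern.log (needs read access to the device):
# probe_sectors /dev/sdc 873024 16 && echo readable || echo "I/O error"
```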

3) I opened it with parted:

```
# parted /dev/sdc

Warning: Could not determine physical sector size for /dev/sdc.           

Using the logical sector size (512).

GNU Parted 2.3

Using /dev/sdc

Welcome to GNU Parted! Type 'help' to view a list of commands.

(parted) print                                                            

Error: partition length of 42953867264 sectors exceeds the loop-partition-table-imposed maximum of 4294967295
```

It's using a 512-byte sector size, but the disk is supposed to be GPT!!
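At least the size in that error message is consistent: 42953867264 sectors is about 20 TiB (roughly 22 TB, i.e. the 22 data disks of a 24-disk RAID 6), which is far beyond the 32-bit sector count (4294967295) that an MS-DOS-style partition table can describe, so the disk would have needed GPT anyway:

```shell
# The size in parted's error, in human terms:
sectors=42953867264                  # partition length from the error
max32=4294967295                     # 2^32 - 1, 32-bit sector limit

bytes=$(( sectors * 512 ))
tib=$(( bytes / (1024 * 1024 * 1024 * 1024) ))
echo "$bytes bytes, ~$tib TiB"       # ~20 TiB of usable space
[ "$sectors" -gt "$max32" ] && echo "exceeds the 32-bit limit: GPT required"
```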

I thought that perhaps I should set the label back to gpt with parted (it was gpt before, but obviously that is no longer the case).

However, parted's documentation says that this command (mklabel gpt) will make my data unusable and that I will need to rescue it afterwards, with no guarantees about that...

So, I'm a bit lost.

Does anyone have an idea about how to mount the disk without data loss??

TIA,

Regards

Some specs:

24x1TB disks

Adaptec 52445

xfs file system

----------

## majoron

No one?   :Sad: 

----------

## Goverp

FWIW, and it's not much, it looks like your hardware RAID was lying when it said it rebuilt the array.  Personally, I've suffered neither the failure of a normal drive nor of a RAID array, so I can't help you with recovering data.  

Your kernel log says there are hardware I/O errors.  Are they on the disk that was rebuilt, or on another disk?  One well-known possibility is that you bought all your drives at the same time, so they came from the same manufacturer's batch and have had the same usage.  That makes them likely to start failing at the same time.  If that happens, when one disk develops an unreadable area and causes you to rebuild the array, you discover that rarely-used areas of other disks are also unreadable, and the recovery dies a painful death.

I'd hope that RAID 6, with two parity areas, would survive, but perhaps you've had three bad disks.  I note that you say you have 24x1TB drives - are they all in the same array?  If so, you could be heading for a very painful time   :Sad: 

I hesitate to ask, but do you have backups?  With that many disks, I guess you may have been hoping to use the array as backup.  (The books say "RAID is not a substitute for a backup strategy", but that would be a bit late now.)  Otherwise, I guess you're down to disk forensics.  You'll find help on that in the forums.

----------

## majoron

 *Goverp wrote:*   

> FWIW, and it's not much, it looks like your hardware RAID was lying when it said it rebuilt the array.  Personally, I've suffered neither the failure of a normal drive nor of a RAID array, so I can't help you with recovering data.  
> 
> Your kernel log says there are hardware I/O errors.  Are they on the disk that was rebuilt, or on another disk?  One well-known possibility is that you bought all your drives at the same time, so they came from the same manufacturer's batch and have had the same usage.  That makes them likely to start failing at the same time.  If that happens, when one disk develops an unreadable area and causes you to rebuild the array, you discover that rarely-used areas of other disks are also unreadable, and the recovery dies a painful death.
> 
> I'd hope that RAID 6, with two parity areas, would survive, but perhaps you've had three bad disks.  I note that you say you have 24x1TB drives - are they all in the same array?  If so, you could be heading for a very painful time  
> ...

 

Thank you for the reply.

The IO errors correspond to the whole raid. I'm not totally sure of this, but I think the kernel cannot see the individual disks, only what the raid controller "shows" (all the disks are directly connected to the controller). Am I wrong?

Anyway, we didn't replace the disk because it broke. We had just been seeing a lot of SMART warnings for some weeks and decided to replace it preventively. Only one disk was behaving like that.

All disks are in the same array, yes.

Concerning backups: actually this is a kind of central storage + backup system for one project we are working on. I believe all the data on that system is replicated somewhere else, but I'm not totally sure. This is precisely my concern. At least I'd like to have a look at the file and directory names before destroying anything (else(?)). If I could confirm that we have a copy of all the data, then I would probably forget about the data in the raid and start from scratch. Probably with another type of raid, or two RAID 5 arrays.

Best

----------

## Goverp

 *Quote:*   

> The IO errors correspond to the whole raid. I'm not totally sure of this, but I think the kernel cannot see the individual disks, only what the raid controller "shows" (all the disks are directly connected to the controller). Am I wrong?

 I've not worked with hardware RAID, so I don't know.  I was guessing there would be some driver software associated with it that might be issuing diagnostics.  This may be the distinction between true hardware RAID and "fake RAID".  At some level I think there's a connection between Linux and the underlying drives, or you would not have seen the SMART warnings.  If not, then I'd hope the RAID hardware has some diagnostic or test software, but I guess you have already looked for things like that.

If you can't approach the problem from the hardware end, that only leaves disk forensic software that works with the unmounted block device.  That should at least tell you whether the array contains something that looks like data, albeit unreachable, or random noise, in which case I'd suspect the RAID controller is broken.  If the array still looks to contain data, then it's presumably a case of fixing partition tables and the like until you can get the array mounted.  Otherwise it might be worth substituting a new RAID controller.
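One more thought before anything destructive: if I remember the XFS documentation right, you can mount read-only and skip log replay entirely, which should at least let you list the file names without writing anything to the array. I haven't needed this myself, so treat it as something to check against mount(8) on your system:

```shell
# Read-only XFS mount that skips log replay; nothing is written to
# the device. (As I understand it, norecovery is only valid
# together with ro.)
opts="ro,norecovery"
echo "as root: mount -o $opts /dev/sdc /mnt/rescue"
```

If that mounts, you could confirm which files are replicated elsewhere before deciding about xfs_repair -L.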

Out of interest, do the 24 drives comprise 22 data and 2 parity components, or is there some smaller grouping?  If it is 22+2, and if all the drives are same batch/age, it does seem possible that you might be discovering that three drives are going bad simultaneously.  Are you still able to get SMART data for all the components?  Not that it would help much if it is 3 bad drives.

One other approach occurs to me.  First, your hardware arrangements might allow attaching the 24 drives natively, not as a RAID array.  Disk forensics should let you see whether they are readable and contain data rather than noise.  Next, and I'm way out of my depth here, I remember reading that Linux's software RAID can use super-blocks similar to hardware RAID - perhaps that means the kernel could handle the array in software rather than going via the RAID hardware.  After all, the RAID algorithm is pretty simple!

----------

## majoron

Hello,

After waiting long enough, we decided we could live without the data in the raid. So I forced things.

I did

```
# xfs_repair -L /dev/sdc 
```

It started, but crashed. I rebooted and entered the controller BIOS, which complained that the situation was not OK and apparently did something on its own after I continued. After rebooting, I could finally mount the partition. There was corruption in some files, but not everywhere.

So, thank you for the answers.

Regards.

PS: marked as solved, as a solution of sorts was found...

----------

