# SOLVED: RAID5: compute_blocknr: map not correct

## eccerr0r

Problem turned out to be HARDWARE and not software.

----------

Wondering if anyone has been getting these on 2TB disks lately (MDRAID)?

I figure it may be a hardware issue, but I just wanted to make sure.

I'm trying to run two 2TB disks.  I tried these two configurations:

1. Two-disk RAID5 with one missing disk (degraded mode)

2. One-disk "degenerate" RAID5 (degraded mode) to which I then added the second disk (so it becomes a degenerate RAID5, or actually more like a RAID1)

In both cases I used a 1.2 superblock.

Both of these configurations are getting the error

```
compute_blocknr: map not correct
```

in dmesg, and the machine hangs on disk I/O to the RAID array.  I guess I have to use these disks as JBOD for now until I root-cause this; perhaps this is ultimately a hardware issue too... ugh.  Memtest86 passes on this machine for at least one full pass.  I suspect the SATA controllers may have issues.

I'll need to incorporate the third disk but not until I get a backup onto the degraded or degenerate array...  The third disk is currently the backup disk and I don't want to sacrifice its contents just yet.

----------

## Keruskerfuerst

1. Try to use smartmontools

2. Try to use the two disks separately: format them with ext4 and do a write check (e.g. with dd)

----------

## frostschutz

Which kernel version?

```
/* drivers/md/raid5.c */
	chunk_number = stripe * data_disks + i;
	r_sector = chunk_number * sectors_per_chunk + chunk_offset;

	check = raid5_compute_sector(conf, r_sector,
				     previous, &dummy1, &sh2);
	if (check != sh->sector || dummy1 != dd_idx || sh2.pd_idx != sh->pd_idx
		|| sh2.qd_idx != sh->qd_idx) {
		printk(KERN_ERR "md/raid:%s: compute_blocknr: map not correct\n",
		       mdname(conf->mddev));
		return 0;
	}
	return r_sector;
```

I'd be worried if I got that message from regular RAID usage (degraded or not). Can you show `mdadm --examine /dev/sd[xyz]6` for the RAID members? Are you trying to access beyond the end of the device?
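For context on what the message means: `compute_blocknr()` inverts the sector mapping (stripe plus device index back to an array sector), then re-runs the forward map as a sanity check, and "map not correct" fires when the round trip doesn't land where it started. Here's a rough Python sketch of the left-symmetric layout arithmetic - my own simplification for illustration, not the kernel code:

```python
# Sketch of md's RAID5 sector mapping for the left-symmetric layout.
# Simplified from drivers/md/raid5.c; variable names loosely mirror the kernel's.

def compute_sector(r_sector, raid_disks, sectors_per_chunk):
    """Array sector -> (stripe sector, data disk index, parity disk index)."""
    data_disks = raid_disks - 1
    chunk_offset = r_sector % sectors_per_chunk
    chunk_number = r_sector // sectors_per_chunk
    stripe = chunk_number // data_disks
    i = chunk_number % data_disks               # logical data chunk within the stripe
    pd_idx = data_disks - stripe % raid_disks   # parity rotates "left" stripe by stripe
    dd_idx = (pd_idx + 1 + i) % raid_disks      # data starts right after parity ("symmetric")
    return stripe * sectors_per_chunk + chunk_offset, dd_idx, pd_idx

def compute_blocknr(stripe_sector, dd_idx, pd_idx, raid_disks, sectors_per_chunk):
    """Inverse map, with the same round-trip check the kernel performs."""
    data_disks = raid_disks - 1
    stripe = stripe_sector // sectors_per_chunk
    chunk_offset = stripe_sector % sectors_per_chunk
    i = (dd_idx - pd_idx - 1) % raid_disks
    chunk_number = stripe * data_disks + i
    r_sector = chunk_number * sectors_per_chunk + chunk_offset
    # Forward-map the result again; any disagreement is "map not correct".
    check, dummy1, check_pd = compute_sector(r_sector, raid_disks, sectors_per_chunk)
    if check != stripe_sector or dummy1 != dd_idx or check_pd != pd_idx:
        raise RuntimeError("compute_blocknr: map not correct")
    return r_sector

# Round-trip every 512-sector slice of the first 64 chunks of a 3-disk,
# 512K-chunk array (1024 sectors per chunk):
for r in range(0, 64 * 1024, 512):
    s, dd, pd = compute_sector(r, 3, 1024)
    assert compute_blocknr(s, dd, pd, 3, 1024) == r
```

With correct arithmetic the two directions always agree, so the kernel message means either the layout parameters the two paths see differ, or the numbers got mangled somewhere in between.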

----------

## eccerr0r

This was seen on 3.17.8-gentoo-r1.  Yes indeed this is a dangerous error.

```
/dev/sdh2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : feed:face:dead:beef (faked)
           Name : seagate750G:0  (local to host seagate750G)
  Creation Time : Thu Jul  9 22:03:00 2015
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 3906267136 (1862.65 GiB 2000.01 GB)
     Array Size : 3906267136 (3725.31 GiB 4000.02 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : active
    Device UUID : dead:beef:feed:cafe (faked)

Internal Bitmap : 8 sectors from superblock
    Update Time : Thu Jul  9 23:41:12 2015
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 1f46a62b - correct
         Events : 3751

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 1
    Array State : AA. ('A' == active, '.' == missing, 'R' == replacing)
```

Again, I can't discount hardware problems (as in SATA, motherboard), but the hard drives themselves report no errors (they're fairly young - less than 200 hours; no reallocated sectors, no pending sectors, looks clean).

I guess I have to test them individually, but I need a good test based on block numbers or some other method...  Something like memtest86 but for hard drives...
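One way to get a memtest86-style test based on block numbers: write a pattern derived from each block's own number across the whole device, then read everything back and compare, so a misdirected, dropped, or corrupted write shows up as a block whose contents disagree with its address. A rough, destructive sketch (my own illustration; `target` is whatever device or file you point it at, and running it on a disk wipes it):

```python
import hashlib
import os

BLOCK = 4096

def pattern(blkno):
    # Derive a unique 4K pattern from the block number itself, so a block
    # written to the wrong address can never verify successfully.
    seed = hashlib.sha256(blkno.to_bytes(8, "little")).digest()
    return (seed * (BLOCK // len(seed) + 1))[:BLOCK]

def write_patterns(target, nblocks):
    """Destructively stamp every block of target with its own pattern."""
    with open(target, "r+b") as f:
        for b in range(nblocks):
            f.seek(b * BLOCK)
            f.write(pattern(b))
        f.flush()
        os.fsync(f.fileno())

def verify_patterns(target, nblocks):
    """Return the list of block numbers whose contents don't match."""
    bad = []
    with open(target, "rb") as f:
        for b in range(nblocks):
            f.seek(b * BLOCK)
            if f.read(BLOCK) != pattern(b):
                bad.append(b)
    return bad
```

Pointed at a raw device (size // 4096 blocks), a non-empty list from `verify_patterns` would finger specific block addresses, which is exactly what you'd want for narrowing down a flaky controller versus a flaky drive.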

----------

## frostschutz

Is it a 32bit system without large block device support?

Doubt it's a hardware issue, this is a software calculation going wrong somehow.

----------

## eccerr0r

That's a good point - yes, this is a 32-bit kernel and userland system.  I was worried that there was a hardware issue reading incorrect values from the hardware, but yes, this may very well be an overflow issue somehow, and a real kernel bug...

I think this machine can run 64-bit code; it might be worth trying to get a 64-bit userland to test...  As this machine has only 1GB of RAM, I tried the 32-bit kernel to save on pointer memory.  (I don't have any extra DDR2 lying around - all of the big DDR2 DIMMs are in my server (running a 64-bit kernel), which needs the memory.  Actually, these 2TB disks will be put in the server once I validate them!)

Large block support is enabled as far as I can tell:

```
CONFIG_BLOCK=y
CONFIG_LBDAF=y
CONFIG_BLK_DEV_BSG=y
# CONFIG_BLK_DEV_BSGLIB is not set
# CONFIG_BLK_DEV_INTEGRITY is not set
# CONFIG_BLK_DEV_THROTTLING is not set
# CONFIG_BLK_CMDLINE_PARSER is not set
```

EDIT

Adding a few more debugging printks shows that `dummy1 != dd_idx` is the failing comparison in that code.

Hmm.  Need to study this some more.  The actual weirdness is that when I dump a whole bunch of stuff to the array, the array hangs and no more forward progress happens - livelocked on writing.  Probably due to barriers of some sort; I can no longer sync(1) and need to do an unclean reboot of the machine.

I just noticed that this appears to happen after writing 1TB or so to the array.  Unsure if this has anything to do with the issue or just so happens to be the point where something else got triggered.
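The ~1TB mark is suggestive: 1 TiB is exactly 2^31 sectors of 512 bytes, so if any step of the sector arithmetic (e.g. `chunk_number * sectors_per_chunk` above) were carried out in 32-bit signed integers instead of a 64-bit `sector_t`, it would wrap at precisely that point. Purely as an illustration of the suspicion - the mainline code casts to `sector_t` exactly to avoid this:

```python
def mul_i32(a, b):
    # Emulate a 32-bit two's-complement multiply, wrapping modulo 2**32.
    r = (a * b) & 0xFFFFFFFF
    return r - 0x100000000 if r >= 0x80000000 else r

SECTORS_PER_CHUNK = 1024                        # 512K chunks, per --examine
tib_chunk = (1 << 31) // SECTORS_PER_CHUNK      # chunk_number at the 1 TiB mark

print(mul_i32(tib_chunk, SECTORS_PER_CHUNK))    # -2147483648: wrapped negative
print(tib_chunk * SECTORS_PER_CHUNK)            # 2147483648: correct 64-bit value
```

A sector number that suddenly goes negative (or aliases a low sector) past 1 TiB would produce exactly the kind of forward/inverse mapping disagreement that trips the `compute_blocknr` check.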

----------

## eccerr0r

I just temporarily moved these disks to another (true) 32-bit machine.  So far so good, but it's not quite done yet; this machine does not have Gbit Ethernet, so I can't copy nearly as fast.  (Perhaps I should have also tried sticking this SiL3114 PCI board in the other machine too; alas, I think that board is kind of broken anyway since it was a hardware swapout already.)

Done now, looks like in this case we have a hardware problem.  Oh well.

----------

