# BitRot, silent corruption, how should one deal with it?

## DingbatCA

My backups keep getting corrupted.  Ya, the sob story you hear from everyone.  Go to unzip some massive backup image and it fails. Your data is forever locked in a corrupted zip, and your primary copy is gone due to your new pet's bathroom habits and your laptop...

The above situation, or things similar, have happened to me more times than I would like to count.

I back up to my large home NAS.  Large in my case is 6x 2TB disks in a RAID 6 configuration.  The problem I keep getting is silent corruption (bit rot, bad blocks...). After a large piece of content sits cold for a few years, it always seems to get corrupted.  The correct way to deal with this is a file system that can scrub for errors, like ZFS or Btrfs.  Sad to say, ZFS can't grow in a style that works for me, and Btrfs is still too beta for my tastes.  I would love to use WAFL, but I can't afford that for home use.  What in the heck should I do?

As a short term solution I cobbled together the below script to generate a par2 archive for every single file.  I don't like it.  I wish there was a better way. Any ideas?

```
#!/bin/bash
#Version 0.5
#
#The goal of this script is to add an extra layer of protection on top of
#my current storage. It does not matter what type of storage this is: single
#disk, RAID, read-only media, NFS/CIFS mount... whatever.
#
#Why? Because I have had too many times where my data was corrupted because
#of some kind of failure.  This includes bad drives, cables, backplanes,
#drivers, controller cards, funky networking, human stupidity, FS problems,
#BitRot, RAID and more. But I have never had data corruption due to bad RAM,
#go figure.
#
#How? par2. https://en.wikipedia.org/wiki/Parchive - basically file-level RAID.
#
#Setup. Two directories are needed: a place for all the parchive stuff and
#the source media, which is treated as read-only. I used AuFS to simply
#overlay the two file systems because par2, in its current form, does not
#support putting the archives somewhere else. It turned out much cleaner
#than I expected.  Example mount command:
# mount -t aufs -o br=/backup/pararchive=rw:/media/=ro none /archive
#If you choose to run this directly on your data, it will make a MESS!
#
#In a perfect world, the pararchive directory is on different physical
#storage than the primary media. Why?  Because a bad RAID stripe could
#damage both the primary data and the par files intended to keep it safe.
#
#ZFS: can't grow one disk at a time.
#Btrfs: too beta, but there is hope.
#WAFL: hell yes, but I can't afford it.
#
#Warranty? Nope, none, nothing.  This script will delete all your data, then
#scrub the disks, then give them a bath in salt water, then roast them over
#an open flame ensuring your data is unrecoverable.  Use at your own risk.

#Enable job control
set -m

#Go to the unified file system and start the magic
cd /archive/ || exit 1

#Got to deal with spaces in the names.
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")

buildit(){
  #Hash the file once, feeding both checksummers from the same read.
  SUMS=$(cat "$1" | tee >(md5sum) >(sha512sum) > /dev/null)
  #The two process substitutions can finish in either order, so pick each
  #hash out by its length (md5 = 32 hex chars, sha512 = 128).
  MD5SUM=$(echo $SUMS | awk '{for(i=1;i<=NF;i++) if(length($i)==32) print $i}')
  SHA512SUM=$(echo $SUMS | awk '{for(i=1;i<=NF;i++) if(length($i)==128) print $i}')
  echo "sha512sum: $SHA512SUM" > "$1.extattribs"
  echo "md5sum: $MD5SUM" >> "$1.extattribs"
  par2create -n1 -r5 -qq "$1"
  stat --format "stat: %Y,%Z" "$1" >> "$1.extattribs"
}

fast_check(){
  SHA512SUM=$(grep "sha512sum: " "$1.extattribs" | awk '{print $2}')
  MD5SUM=$(grep "md5sum: " "$1.extattribs" | awk '{print $2}')
  SUMS=$(cat "$1" | tee >(md5sum) >(sha512sum) > /dev/null)
  MD5SUM_NEW=$(echo $SUMS | awk '{for(i=1;i<=NF;i++) if(length($i)==32) print $i}')
  SHA512SUM_NEW=$(echo $SUMS | awk '{for(i=1;i<=NF;i++) if(length($i)==128) print $i}')
  if [ "$MD5SUM" != "$MD5SUM_NEW" ];then
    echo "ERROR, bitrot?: md5sum and md5sum_new did not match: $MD5SUM, $MD5SUM_NEW, $1"
    return 1
  fi
  if [ "$SHA512SUM" != "$SHA512SUM_NEW" ];then
    echo "ERROR, bitrot?: sha512sum and sha512sum_new did not match: $SHA512SUM, $SHA512SUM_NEW, $1"
    return 1
  fi
  #echo "Healthy, Fast Check: $1"
}

deep_check() {
  par2verify -qq "$1.par2"
  PAR2VERIFY_EXIT="$?"
  if [ "$PAR2VERIFY_EXIT" != "0" ];then
    echo "ERROR, bitrot?: par2verify exited with: $PAR2VERIFY_EXIT on $1"
    return 1
  fi
  #echo "Healthy, Deep Check: $1"
}

#Find takes a while.  Not sure how to speed this up.
#Par2 does NOT like zero byte files. Well duh!
find . -type f -size +1c | egrep -v '\.(par2|extattribs)$' > /dev/shm/list_o_files.txt

#Hmmm, can you say dash shell! Re-code with dash support, later.
for f in `cat /dev/shm/list_o_files.txt`
do
  STAT=""
  SHA512SUM=""
  MD5SUM=""
  if [ "$f" = "" ];then
    echo "ERROR WTF?! $f"
    continue
  fi
  #Never more than 4 jobs at once
  while [ 4 -le $(jobs | wc -l) ]
  do
    sleep 0.01s
  done
  if [[ -e "$f.extattribs" && -e "$f.par2" ]];then
    #Check for changed files
    STAT=$(grep "stat: " "$f.extattribs")
    STAT_NEW=$(stat --format "stat: %Y,%Z" "$f")
    if [ "$STAT" != "$STAT_NEW" ];then
      echo "Modified file found: $f"
      #All checksum data is garbage, need to rebuild:
      rm -f "$f.extattribs"
      rm -f "$f"*.par2
      echo "Rebuilding for: $f"
      buildit "$f" &
      continue
    else
      if [[ "$1" = "-f" || "$1" = "-d" ]];then
        echo "Fast check of: $f"
        fast_check "$f" &
        if [ "$1" = "-d" ];then
          echo "Deep check of: $f"
          deep_check "$f" &
        fi
      fi
    fi
  else
    #New file found.  Build everything!
    echo "Building for: $f"
    buildit "$f" &
  fi
done
rm -f /dev/shm/list_o_files.txt

echo "Finding and deleting dangling extattribs/par2"
find . -type f -name '*.extattribs' > /dev/shm/list_o_filesX.txt
for f in `cat /dev/shm/list_o_filesX.txt`
do
  #Strip the .extattribs suffix to get the name of the file it describes.
  base="${f%.extattribs}"
  if [[ ! -e "$base" ]];then
    #Found extattribs/par2 with no matching file.  Clean up time.
    #This still does not solve the dangling dir problem.
    rm -f "$base.extattribs"
    rm -f "$base"*.par2
  fi
done
rm -f /dev/shm/list_o_filesX.txt

IFS=$SAVEIFS
cd - > /dev/null
```
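The fast-check idea in the script can be exercised standalone with nothing but coreutils. A minimal sketch (all file names here are throwaway demo names): record a hash, flip one byte, and confirm the mismatch is caught.

```shell
#!/bin/bash
# Standalone sketch of the fast-check idea: record a hash, corrupt one
# byte, and confirm verification fails. Runs entirely in a temp directory.
cd "$(mktemp -d)" || exit 1

# Create a predictable test file and record its hash, like buildit does.
head -c 65536 /dev/zero > sample.bin
sha512sum sample.bin > sample.bin.sums

# Verify while intact: the stored hash must match.
sha512sum -c --quiet sample.bin.sums && echo "intact: OK"

# Simulate bitrot: overwrite one byte in the middle of the file.
printf '\xff' | dd of=sample.bin bs=1 seek=32768 conv=notrunc 2>/dev/null

# Re-verify: the same check must now fail.
if ! sha512sum -c --quiet sample.bin.sums 2>/dev/null; then
  echo "bitrot detected"
fi
```

The same round trip works with md5sum or any of the coreutils hashers; par2verify adds the ability to repair, not just detect.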

----------

## frostschutz

 *DingbatCA wrote:*   

> The problem I keep getting is silent corruption (Bit Rot, bad blocks...).

 

Is it really silent, as in: no errors in dmesg, smartctl -a looks clean, smartctl -t long passes for all disks, mdadm --examine shows up fine for all disks, no bad blocks, etc.?

 *DingbatCA wrote:*   

> The correct way to deal with this is a file system that can scrub for errors, like ZFS or BTRFS.

 

The question is, where exactly does this corruption happen?

If you actually have a program running haywire and corrupting files, ZFS/Btrfs might not help you either. They will see the corruption as regular write accesses and change the files exactly as instructed.

Hard disks do their own checksumming so if some outside influence (moonlight, pixies, and such) changes bits on the disk, the disk itself would notice and report read errors.

Most network protocols have their own ways of detecting data corruption in transfers, or you would see a lot of corruption when doing regular downloads over congested lines...

While RAID does not do checksums, it has parity and can check parity (mismatch_cnt after running a raid check). If one disk somehow flipped its bits, you'd get mismatches; so if you run raid checks regularly and look at mismatch_cnt afterwards, you'd notice that something is amiss. I've been checking my own RAID5 (7 disks) like that for a long time and mismatch_cnt was always 0. So RAID is able to detect bit flips on single disks; however, it does not know which side is correct.

Personally, I've never even heard of bitrot issues before. Disks going bad, yes, but not silently; they reported errors properly. I have had bitrot in images, not because of any fault in disks, filesystems, or RAM, but because some fancy image viewer thought it was a great idea to modify each image it touched. That is a case of buggy software, and no manner of "bitrot protection" will help you there, because until you manually notice that something went wrong, it looks to everyone like a change that was supposed to happen...

----------

## NeddySeagoon

DingbatCA,

RAID6 with errors, ouch!

Examine the smart data from each drive.

Save it somewhere.

Run a repair on the raid.  Do this monthly in a cron job.

Check the smart data and dmesg after the repair.

The repair checks that the redundant data is self consistent across all the drives and rewrites any blocks on drives that disagree.

Note I said 'self consistent' - that's not the same as correct.

Have you been validating the backups as soon as they are written?

If not, you don't know that it's bit rot.  The backups could have been faulty on write.

----------

## DingbatCA

I always cover the basics and check the drive logs (SMART) plus system logs.  I also run extensive tests (SMART long test, SMART secure wipe, mkfs.ext3 -cc) against any questionable drives.  The cases where I have good errors are nice.  Most of the time I have no errors to go on. I feel like I am chasing ghosts.

I know having a single sector go bad (NOT silent corruption) on a single disk can still cause corruption in RAID6. Ya, this should not be! I have run into this problem dozens of times in the past few years. Take a simple example: RAID5, 4 disks.  One sector gets wiped out in a stripe. How does the RAID know if the "bad" sector is because of bad parity, or bad data?  What gets overwritten/updated?  Does the RAID assume the data is all good and the parity is broken, or that the parity is good and the data is broken? This problem is even more pronounced in a mirror.
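The ambiguity is easy to see with plain XOR parity and a few shell integers; this toy stripe stands in for a 4-disk RAID5 (three data strips plus one parity strip):

```shell
#!/bin/bash
# Toy RAID5 stripe: three data strips plus one XOR parity strip, as plain
# integers. A parity mismatch is detectable, but with single parity it is
# not locatable: nothing says which strip went bad.
d1=165; d2=60; d3=126
p=$(( d1 ^ d2 ^ d3 ))        # parity written when the stripe was created

d2=61                        # one strip silently rots; nothing records this

# A scrub recomputes parity and notices the stripe is inconsistent...
if [ $(( d1 ^ d2 ^ d3 )) -ne "$p" ]; then
  echo "mismatch detected"
fi

# ...but four different single-strip "repairs" would each make the stripe
# self-consistent again, and the RAID has no way to pick the right one:
echo "if d1 is bad, d1 should be $(( p ^ d2 ^ d3 ))"
echo "if d2 is bad, d2 should be $(( p ^ d1 ^ d3 ))"   # the true repair
echo "if d3 is bad, d3 should be $(( p ^ d1 ^ d2 ))"
echo "if p  is bad, p  should be $(( d1 ^ d2 ^ d3 ))"  # rewrite parity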

I know in one of my cases I made a full backup, tested the backup from the NAS, then let it sit for a few weeks.  During that time there was a power outage.  The array came up and did the normal re-sync.  After that, the restore of the backup failed. The 33GB archive (tar.xz) was corrupted. I spent weeks trying to figure out what went wrong.  All the disks were/are healthy. I was never able to track down a root cause...  I blame silent corruption for this one.

I have my array check set for once a month.

I also know a few of my problems were caused by bugs in the MD stack...  Again, data corruption.

I have had a bad SAS backplane, sas933el1, that happened to work 99.9% of the time.  The bug is well known.  The fix... burn the SAS expander backplane and go back to direct attached.

My only constant has been the Marvell (mvsas) based cards. As of a few weeks ago I started moving all my NASes over to LSI. I am tired of the problems with the mvsas driver stack.

The script was built as a generic fix-it.  Kind of a CYA in case of something like FS, RAID, driver, or hardware level errors.  I was just hoping there was a better way?

----------

## frostschutz

 *DingbatCA wrote:*   

> I know having a single sector go bad (NOT silent corruption) on a single disk can still cause corruption in RAID6. Ya, this should not be!

 

Kernel bugs aside, this happens only if the disk gives bogus data instead of reporting read errors. If you ever come across a disk that does that, there is no choice but to kick it out. RAID or no RAID, everything relies on disks reporting errors properly.

 *DingbatCA wrote:*   

> One sector gets wiped out in a stripe. How does the RAID know if the "bad" sector is because of bad parity, or bad data?

 

The RAID does not know, but there are no processes that wipe out things this way either. If you're talking about filesystem corruption after a power loss, you get that without RAID, too. I would not count power loss as "silent" corruption; the filesystem will say quite clearly that it's corrupt and in need of fsck, and fsck is not magic that fixes everything; in many cases it causes even more damage. If you have a power loss, you know things could have happened and you can check files.

If you run RAID checks and also watch mismatch_cnt (have it mailed to you after every check), and mismatch_cnt is not 0, you have cause to investigate.

I have something like this in cron:

```

echo Sync Action Check for /dev/md$i
mdadm --wait /dev/md$i # in case not idle already
echo check > /sys/block/md$i/md/sync_action
time mdadm --wait /dev/md$i
cmp <(echo 0) /sys/block/md$i/md/mismatch_cnt && echo OK || echo FAIL
echo mismatch_cnt is $(cat /sys/block/md$i/md/mismatch_cnt)

```

 *Quote:*   

> Kind of a CYA incase of something like a FS,RAID,Drivers or hardware level errors.  I was just hoping there was a better way?

 

There is not. You always trust some hardware which may be faulty or some software which may be buggy.

If paranoid, set up more systems (different hardware, different software [distribution/kernel versions], different filesystems). You get a kernel bug that kills one filesystem; the others survive.

I do this on a (very) small scale. I use XFS for everything, but my backup partition is ext4. Not because I (dis)like either filesystem; just to have different ones in case one of them goes south in a new kernel release. It has happened before.

----------

## Akkara

Memory issues can be a very likely culprit.  I've seen bits get silently flipped just from copying from one hard drive to another: the first drive will read fine, the second drive has a flipped bit, no errors reported.  Whether the RAM loses a bit, or the bus timing is marginal and it gets mis-interpreted, or the SATA chip mis-reads it, I don't know, but it happens.  It seems to be around one in 10TB transferred, more or less, on consumer-level hardware.  I've never seen a bit get flipped just sitting there on an offline hard drive, after I had filled it _and_ verified it.
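One cheap defense against flips introduced in flight is to hash the source before the copy and the destination after it. A minimal sketch using sha256sum (any strong hash works):

```shell
#!/bin/bash
# Verify a copy end to end: hash the source, copy, re-hash the destination,
# and compare. This catches a flip picked up in RAM, the bus, or the
# controller during the transfer (not one that predates the first hash).
src=$(mktemp) ; dst=$(mktemp)
head -c 1048576 /dev/urandom > "$src"

before=$(sha256sum < "$src" | awk '{print $1}')
cp "$src" "$dst"
after=$(sha256sum < "$dst" | awk '{print $1}')

if [ "$before" = "$after" ]; then
  echo "copy verified: $before"
else
  echo "copy corrupted: $before vs $after" >&2
  exit 1
fi
```

For bulk transfers the same idea is available as `rsync --checksum` for re-runs, at the cost of reading everything twice.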

Another thing I also noticed: bit-flips seem more likely to happen when the disk(s) and ethernet are both being used heavily, such as during a high-speed local-network copy.  Again, I think (but do not know for sure) it's something to do with marginal bus timing, because the net protocols themselves have error checking and would report a problem if it happened on the wire.

A few years ago I moved to server-class hardware with ECC memory.  I haven't seen a problem since.  This is probably not what you want to hear, since it is pricey.  You might try underclocking what you have now; that might improve the margins enough that flips happen much less often.  It's an exponential thing once you get too close to critical timing.

A bad or marginal power supply can also cause this.

 *frostschutz wrote:*   

> I've had bitrot in images not because of any fault in disks or filesystems or ram but because some fancy image viewer thought it was a great idea to modify each image it touched, which is a case of buggy software, and no manner of "bitrot protection" will help you here, because until you notice manually that this went somehow wrong, to everyone it looks like a change that was supposed to happen...

 

There's easy protection against this sort of thing: make your images (and the rest of your media) owned by a different user, such as media.  Give read but not write access to the main user and everyone else.  That keeps overly "smart" programs from messing up tags and generally screwing things up.  When you do need to change something, sudo -u media ...
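A minimal sketch of that setup. The `media` user and the commented chown/sudo lines are illustrative and need root; `edit_tags` is a placeholder for whatever tool you actually use. The chmod part works anywhere:

```shell
#!/bin/bash
# Sketch of the ownership trick: media files owned by a dedicated user and
# read-only for everyone else. The commented chown/sudo lines assume a
# 'media' user exists and require root; the chmod part is generic.
d=$(mktemp -d) ; f="$d/photo.jpg"
echo "original pixels" > "$f"

# chown media: "$f"              # hand the file to the media user (root)
chmod 444 "$f"                   # read-only for owner, group, and world
echo "mode now: $(stat -c %a "$f")"

# A non-root "smart" image viewer now gets EACCES instead of silently
# rewriting tags. When a change really is wanted, make it deliberate
# (edit_tags is a stand-in, not a real tool):
# sudo -u media sh -c "chmod u+w '$f' && edit_tags '$f' && chmod 444 '$f'"
```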

----------

## frostschutz

Aye, I use chattr +i to make it read-only.

 *Quote:*   

> 
> 
>        A file with the 'i' attribute cannot be modified: it cannot be  deleted
> 
>        or  renamed,  no  link  can  be created to this file and no data can be
> ...

 

And of course, backups, backups, backups. Read-only backups. For photos in particular you can still use DVD-R, which is one of the few media that survive short circuits that may kill many drives at once...

----------

## DingbatCA

I don't think I can make my current hardware much better.

Intel Xeon X3470

SuperMicro X8SI6-F, with LSI2008 controller

24GB of Registered ECC, HMT351R7CFR8A-H9, from the list of memory modules approved for this board.

SuperMicro case with 3X hot swap power supplies.  (http://www.supermicro.com/products/chassis/3U/932/SC932T-R760.cfm)

@Akkara: this affects my big production arrays that are under constant load.  Mind you, at work, WAFL cleans up these errors without fuss.

I am not here to argue about whether this is happening or not.  It happens, even with best-in-class hardware. My big question is how to deal with it on a home Linux NAS? RAID2?

----------

## Buffoon

I know you said ZFS is not an option for you. I faced the same problem and, after figuring out there is no perfect solution, I went for ZFS with RAIDZ2 and ECC memory.

----------

## DingbatCA

Sad to say, I am backing myself into a corner.  ZFS or Btrfs...

----------

## davidm

 *DingbatCA wrote:*   

> Sad to say I am backing my self into a corner.  ZFS or BTRFS...

 

Btrfs works in theory and has support native to the kernel.  The problem is there are a lot of bugs, and honestly you are more likely to lose data due to a bug in btrfs than to bit rot.  It doesn't seem as if stability is a priority in the project at the moment, as serious regressions are routine and seem to occur with almost every major kernel version.  The only thing I will give them is that in my case I haven't lost data with RAID1.  However, it feels as if I simply got lucky and won a game of Russian roulette.  I've migrated away from btrfs for everything other than non-essential storage of things such as movies and torrents.

I'm not sure how things are with ZFS.  From what I understand it can be a pain if you like to use the newest kernel versions, as you often get stuck waiting for support due to the licensing issues.  Otherwise it offers most of the same features as btrfs but with considerably more stability.

----------

## DingbatCA

@davidm.  I have lost data with btrfs.  It was a sad day. Btrfs is just too beta for me.

ZFS has one simple problem: it can't grow.  Yes, you can add a single disk, and it comes in as a single vdev.  But you lose the whole point of RAID when adding single disks.  My normal growth plan, for home, is to buy one-off disks as needed.  At work I have the luxury of buying a shelf of disks at a time.

Am I back to my dumb little script?

----------

## szatox

What about simply scrubbing your raid?

RAID5 (at least md-raid) will replace parity in case of a mismatch. RAID 6 has double parity, which means you can recover from losing 2 strips (when you know which strips) or from corruption of a single strip (when you don't know in advance which one is corrupted); the second parity allows the drives to vote on the end result and determine which single strip is broken.

----------

## frostschutz

 *szatox wrote:*   

> the second parity allows the drives to vote for end result and determine which single strip is broken

 

Can you confirm it's actually implemented that way? To my knowledge no such voting occurs.

Voting can also be the wrong thing. For example, a quite common kind of damage is blocks getting zeroed for some reason. If two zeroed parity blocks out-vote the (valid!) data block, you've caused more damage instead.

RAID scrubbing can tell you that there is a mismatch, but it has no notion of the correct way to fix it, so if you do tell it to fix, you have to expect it will fix it the wrong way.

It can be done manually with a lot of effort... locate the mismatch, get the different versions depending on which disk(s) are involved in serving that sector, see which file was stored there, see which version of the file is correct, and write that back.

----------

## DingbatCA

Just trying to find a good long term solution.

I do have one array in bad shape currently.  When the new controllers come in, I am going to do a full wipe of each disk, then rebuild the array and restore from backup.

My normal order of operations for testing a disk:

1) smartctl -a /dev/sdX and save the output

2) Secure enhanced wipe https://ata.wiki.kernel.org/index.php/ATA_Secure_Erase

3) Long SMART test

4) mkfs.ext3 -cc, which just runs badblocks across the drive 4 times.

5) smartctl -a /dev/sdX and check the differences

If a drive passes all those tests, I call it good.  Anyone else have any other drive checks they like to use?
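Wrapped as a script, the procedure above looks roughly like this. It defaults to a dry run that only prints the commands, since every step is destructive; the hdparm secure-erase flags follow the kernel.org ATA Secure Erase guide linked above, but drives differ, so double-check them against your hardware before trusting this.

```shell
#!/bin/bash
# Outline of the burn-in procedure as a script. DRY_RUN=1 (the default)
# only prints each command; set DRY_RUN=0 to actually destroy the target
# disk. The hdparm flags are per the ATA Secure Erase wiki page; verify
# against your own hardware first.
DRY_RUN=${DRY_RUN:-1}
disk=${1:-/dev/sdX}   # placeholder target; pass the real device as $1

run() {
  if [ "$DRY_RUN" = 1 ]; then echo "WOULD RUN: $*"; else "$@"; fi
}

run smartctl -a "$disk"                                  # 1) baseline SMART
run hdparm --user-master u --security-set-pass p "$disk" # 2) enhanced
run hdparm --user-master u --security-erase-enhanced p "$disk" #  secure erase
run smartctl -t long "$disk"                             # 3) long self-test
run badblocks -wsv "$disk"                               # 4) 4-pattern write test
run smartctl -a "$disk"                                  # 5) diff against baseline
```

Note that `badblocks -wsv` is the direct form of step 4; `mkfs.ext3 -cc` runs the same read-write badblocks pass underneath.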

----------

## szatox

 *frostschutz wrote:*   

>  *szatox wrote:*   the second parity allows the drives to vote for end result and determine which single strip is broken 
> 
> Can you confirm it's actually implemented that way? To my knowledge no such voting occurs..

 

Well, it seems that mdadm just overwrites parity, which is a shame, as it's really doing more damage than leaving the array inconsistent. I've run a few tests on a VM (RAID6, 4x500MB, plus an old 750MB movie vs. dd; md5sum decided that dd won). Overwriting ~150MB somewhere in the middle of one drive changed the checksum. Repeating it a few times (followed by repairing the raid) rendered the filesystem unusable.

Anyone feel like doing that scrubbing in a sane way? Unfortunately I don't know nearly enough C to even attempt messing with the kernel.

I wonder how well LVM would handle it.

----------

## DingbatCA

So I have found something....  I need to do extensive testing before I am willing to put data I care about on it.

http://www.snapraid.it/

@szatox: I have been fighting these problems with RAID, both hardware and MD raid, for a long time.  I think we need someone with amazing kernel-level programming skills to implement Reed-Solomon style parity into MD RAID. RAID-RS?

Doing a lot of testing to better understand the problem.  It looks like MD RAID just guesses.  I think there is an assumption in the RAID world that disks are either perfect or failed.  No middle ground.  This whole thread is just making me more depressed.

----------

## frostschutz

 *DingbatCA wrote:*   

> I think there is an assumption in the RAID world that disks are perfect, or failed.  No middle ground.

 

The assumption is that the disk reports errors instead of returning false data.

----------

## Akkara

 *DingbatCA wrote:*   

> I dont think I can make my current hardware much better.
> 
> Intel Xeon X3470
> 
> SuperMicro X8SI6-F ...

 I'm afraid I'm at a loss, then.  Can't get much better than what you have.

Do you think it might be the drives themselves?  There have been some models within some brands that are reported to have much higher failure rates than usual.  Maybe they have quiet errors too?

 *szatox wrote:*   

>  *frostschutz wrote:*    *szatox wrote:*   the second parity allows the drives to vote for end result and determine which single strip is broken 
> 
> Can you confirm it's actually implemented that way? To my knowledge no such voting occurs.. 
> 
> Well, it seems that mdadm just overwrites parity, which is a shame, as it's really doing more damage than leaving that array inconsistent. I've ran a few tests on a VM (RAID6 4x500MB + an old 750 MB movie vs dd. Md5sum decided that dd won. ) overwriting ~150 MB somewhere in the middle of one drive changed the checksum. Repeating it a few times (followed by repairing raid) rendered filesystem unusable.
> ...

 

I find this surprising, and troubling to hear.  Why wouldn't one be checking parity at all times, when it is available?  Maybe speed?  Regardless, during a rebuild, I'd expect there would be a best-effort attempt thrown at it, checking everything regardless of what the disks say.  I'm very surprised to read this might not be happening.

 *DingbatCA wrote:*   

> @szatox.  I have been fighting these problems with RAID from both hardware and MD raid for a long time.  I think we need some one with amazing kernel level programming stills to implement a Reed-Solomon style parity into MD RAID. RAID-RS?
> 
> Doing a lot of testing to better understand the problem.  It looks like MD RAID just guesses.  I think there is an assumption in the RAID world that disks are perfect, or failed.  No middle ground.  This whole thread is just making me more depressed.

 

I have the skills required to help out with the Galois-field parity-matrix code, if that is needed.  But I haven't done any linux kernel programming.

I just did a bit of searching.  It seems most of what you need is already in the kernel, in the form of the parity logic for the btrfs filesystem.  In fact, this article describes some of the newer additions and seems to have everything that's needed.  Interestingly, the code is by the same person as the snapraid link two posts above.

----------

## DingbatCA

 *Akkara wrote:*   

> Do you think it might be the drives themselves?  There's been some models within some brands that are reported to have much higher failures than usual.  Maybe they have quiet errors too?

 

I am happy to blame my drives.  I almost always use low-quality SATA drives at home (WD Green).  At work I am using a little over 500 10K SAS drives and have the same issues, just at much lower rates, and WAFL cleans up that mess.

 *Akkara wrote:*   

> I just did a bit of searching.  It seems most of what you need is already in the kernel, in the form of the party logic for the btrfs filesystem.  In fact, this article describes some of the newer additions, seems to have everything that's needed.  Interestingly the code is by the same person as the snapraid link two posts above.

 

So I am trying to stay away from btrfs because it is too beta, and here I am suggesting we write our own version of raid...  :roll: 

I just need to get a hold of some of those perfect drives that frostschutz has.

----------

## frostschutz

 *DingbatCA wrote:*   

> I just need to get a hold of some of those perfect drives that frostschutz has.

 

Why, they're WD Greens.  :Laughing: 

Best drives I ever had, I would not call them low quality. They certainly don't have bitrot issues.

----------

## NeddySeagoon

frostschutz,

Heh WD Greens.  Two of mine in a raid5 set failed within 15 min of one another.

Still, it was only my DVD collection.  I got it all back except one 4k disk block.

The other three and the two warranty replacements are still running, although, one of the replacements had a pending sector the other day.

A repair 'fixed' that.

I should probably start migrating the drives to bigger drives to make room for more DVDs.

----------

## frostschutz

Mine range between ~15k and ~25k power-on hours and going... although that number no longer reflects spin time, since I added an SSD to the system and send the HDDs to standby while I don't need them.

 *Quote:*   

> WD Greens. Two of mine in a raid5 set failed within 15 min of one another.

 

Regardless which brand or model, you'll find people who had theirs fail. They all do, eventually...

The question in this thread was whether they do so silently, without reporting any errors, returning bad data instead. Mine don't do that, and I'm checking for such things by validating RAID parity regularly.

 *Quote:*   

> one of the replacements had a pending sector the other day

 

I count disks as failures starting from the first reallocated/pending/uncorrectable sector. A disk with a pending sector has already lost you data, which you had to recover from your other disks. Losing data is not an acceptable condition for any disk, particularly in a RAID set, so it should not be trusted with important tasks/data anymore.
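That policy is easy to automate by parsing `smartctl -A`. The SMART output below is a canned excerpt so the parsing can be shown without hardware; for a real disk you would capture `smart=$(smartctl -A /dev/sdX)` instead.

```shell
#!/bin/bash
# Flag a disk as untrustworthy at the first reallocated, pending, or
# offline-uncorrectable sector (SMART attributes 5, 197, 198). The canned
# text below mimics smartctl's attribute table; for a real disk use:
#   smart=$(smartctl -A /dev/sdX)
smart='  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0'

# Column 10 is the raw value; sum it across the three sector-health attrs.
bad=$(echo "$smart" | awk '$1==5 || $1==197 || $1==198 { sum += $10 } END { print sum+0 }')

if [ "$bad" -gt 0 ]; then
  echo "FAIL: $bad suspect sector(s), stop trusting this disk"
else
  echo "OK: no reallocated/pending/uncorrectable sectors"
fi
```

Run from cron and mailed alongside the mismatch_cnt report, this catches a dying disk at the first recovered-from sector rather than the first unrecoverable one.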

----------

## NeddySeagoon

 *Quote:*   

> The question in this thread was whether they do so silently, without reporting any errors, returning bad data instead. 

 

Getting back to that topic, I have never seen that, nor do I expect to.  It requires the data and CRC read from the disk (both of which could be in error) to match after the data stream has been through the HDD error recovery process. The probability of that is very small, but still finite.

----------

## kernelOfTruth

*subscribing* - this is interesting

++

to the occurring bugs with Btrfs and new kernel releases

ECC memory (with a processor and motherboard that support it), ZFS, and good hardware are the basic guarantee that bitrot and silent corruption should not occur.

----------

## szatox

 *Quote:*   

> I have the skills required to help out with the Galois-field parity-matrix code, if that is needed. But I haven't done any linux kernel programming.
> 
> I just did a bit of searching. It seems most of what you need is already in the kernel, in the form of the party logic for the btrfs filesystem. In fact, this article describes some of the newer additions, seems to have everything that's needed. Interestingly the code is by the same person as the snapraid link two posts above.

 

Perhaps it would be a good idea to merge this logic into MD. Unfortunately I still can't do that myself.

I'm going to test how LVM (device mapper) would behave though, as those things seem to be loosely related to each other.

----------

## Ant P.

Still living on a single six-year-old WD Green here, though I fixed the factory misconfiguration early on so that probably helped a bunch.

----------

## DingbatCA

I am just going to add a bit more info.  All good reads.

If you have never seen these metrics from backblaze, this is a must read.

https://www.backblaze.com/blog/hard-drive-reliability-stats-for-q2-2015/

Sorry for using a Microsoft reference here. Just some interesting info about error rates and sources.

http://research.microsoft.com/pubs/64599/tr-2005-166.pdf

And of course, the Wikipedia link:

https://en.wikipedia.org/wiki/Data_corruption#Silent_data_corruption

----------

## kernelOfTruth

Google: Failure Trends in a Large Disk Drive Population (2007)

http://static.googleusercontent.com/media/research.google.com/de//archive/disk_failures.pdf

Hardware.FR harddrive stats

http://www.hardware.fr/articles/927-6/disques-durs.html

backblaze.com stats

https://www.backblaze.com/blog/hard-drive-reliability-stats-for-q2-2015/

Also:

 *Quote:*   

> The data from Backblaze should not influence a purchasing decision by any consumer, regardless of what type of drive they are purchasing. The innumerable variables, and lack of documentation, ensures the results are unreliable. Even for the winners, the results aren't good; the failure rates are exponentially higher than those observed in the real-world. One should question whether these companies could survive financially with the massive warranty return rates in real-world scenarios.

 

http://www.tweaktown.com/articles/6028/dispelling-backblaze-s-hdd-reliability-myth-the-real-story-covered/index5.html

----------

## coldlight

 *Ant P. wrote:*   

> Still living on a single six-year-old WD Green here, though I fixed the factory misconfiguration early on so that probably helped a bunch.

 I've heard many times of WD Green drives failing in less than half a year, but this is the first time I've seen your statement. I'd really like to know what the factory misconfiguration was/is and how you fixed it.

----------

## frostschutz

Empirical Measurements of Disk Failure Rates and Error Rates 

http://arxiv.org/pdf/cs/0701166.pdf

 *Quote:*   

>  I'd really like to know what the factory misconfiguration was/is and how you fixed it. 

 

wdidle, hdparm -J, ...

I cranked up my timeout from the start as well. It has not caused any harm; whether it has helped any, who knows? I think the people who do that are a minority, yet there are no reports of massive early failures for the WD Green series (or we would have a nickname for it like the deathstar HDD).

----------

## Ant P.

 *coldlight wrote:*   

> I'd really like to know what the factory misconfiguration was/is and how you fixed it.

 

hdparm explains all:

```
-J     Get/set the Western Digital (WD) Green Drive's "idle3" timeout value.  This  timeout  controls  how
       often  the  drive parks its heads and enters a low power consumption state.  The factory default is
       eight (8) seconds, which is a very poor choice for use with Linux.  Leaving it at the default  will
       result  in  hundreds  of  thousands of head load/unload cycles in a very short period of time.  The
       drive mechanism is only rated for 300,000 to 1,000,000 cycles, so leaving it at the  default  could
       result  in  premature  failure,  not to mention the performance impact of the drive often having to
       wake-up before doing routine I/O.

       WD supply a WDIDLE3.EXE DOS utility for tweaking this setting, and  you  should  use  that  program
       instead  of  hdparm  if at all possible.  The reverse-engineered implementation in hdparm is not as
       complete as the original official program, even though it does seem to work on at  a  least  a  few
       drives.   A  full  power  cycle is required for any change in setting to take effect, regardless of
       which program is used to tweak things.

       A setting of 30 seconds is recommended for Linux use.  Permitted values are from 8 to  12  seconds,
       and  from 30 to 300 seconds in 30-second increments.  Specify a value of zero (0) to disable the WD
       idle3 timer completely (NOT RECOMMENDED!).
```

I went and used the DOS program and no longer get the delayed click of death.
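If you want to check whether your own drive is affected before flashing anything, SMART already counts the parks. A sketch, assuming smartmontools is installed and the drive is /dev/sdX (needs root):

```shell
# Attribute 9 is Power_On_Hours, attribute 193 is Load_Cycle_Count.
# On an affected Green the cycle count can climb by hundreds per day;
# against the 300,000-cycle rating quoted above, that exhausts the
# rating within a couple of years of uptime.
smartctl -A /dev/sdX | grep -E 'Power_On_Hours|Load_Cycle_Count'
```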

----------

## coldlight

Thanks Ant P. and frostschutz.

One more question: will running WDIDLE3.EXE from XP in a VM give the same results as booting into, say, FreeDOS and running it?

----------

## frostschutz

hdparm works fine, and there's also sys-apps/idle3-tools. If you insist on the EXE, run it from FreeDOS only.
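For completeness, idle3-tools does the same job natively; a sketch assuming /dev/sdX and the package's idle3ctl tool (the same full-power-cycle caveat applies before a change takes effect):

```shell
idle3ctl -g /dev/sdX   # read the current idle3 timer value
idle3ctl -d /dev/sdX   # disable the idle3 timer entirely
```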

----------

## krinn

This is why Green HDDs suck balls for reliability.

A Green HDD parks its heads to get into a sleep state, because that also lets it safely reduce its RPM.

Green HDDs consume less power and generate less heat and noise because they spend time in that sleep state.

This is what bugs everyone: park, RPM reduced; then you need the drive again, the RPM comes back to "normal" (it is always hard to tell what normal RPM even is for these drives) and the heads move from the park area to where the data is, making a huge delay before the disk is ready to read.

And because of this RPM variation and mechanical movement, the disk's reliability is lower than a classic disk's.

There's also another issue when you disable that feature (because everyone gets bored fast with the performance): you run into a different problem.

Green HDD components are designed to run at low heat, because most of the time the disk should be in its sleep state.

But as soon as you disable this, the heat gets closer to other disks' heat (still a bit less, as the max RPM of Green drives is below a classic drive's; look at their RPM spec: "IntelliPower", better to give a word than the number they really use) http://www.wdc.com/wdproducts/library/SpecSheet/ENG/2879-701229.pdf

So as soon as you disable the feature, the heat goes up, and the cheap components of Green drives will fail earlier than their given MTBF, because that MTBF is based on a certain temperature, and your drive is now running above it.

As the MTBF is effectively reduced, the AFR goes up, and certainly way over the 0.8% that is an acceptable value for an HDD. (Note also how they hide the AFR in the specs.)

You can also note that they carry a base warranty of only 2 years, giving you an idea how much WD trusts the components used in these drives...

Green HDDs are slow cows with a short lifetime, but what did you expect? Nobody can beat their price!

----------

## frostschutz

 *Quote:*   

> the cheap components of green drives

 

Getting far off topic now...

Does the Green actually use different components than the Red, Purple, ... drives?

From the outside the drives look virtually identical. Same case, same PCB with same PCB layout.

From the inside (not opening mine, thank you very much, but product pictures show them) they look virtually identical too. I've been googling for HDD teardowns, whether anyone was crazy enough to open the things up and confirm differences in hardware/components (or lack thereof) but couldn't find any.

Do they even actually use different components anywhere, or are they just building one type of disk and shipping it with different labels, firmwares, ...? A single hardware type would probably be much better for mass production: lower costs, higher profits...?

Does an extra year of warranty really mean better quality or isn't it just that WD can afford it due to the more profitable price of that label? Is a year of warranty worth a +20-30€ (or whatever it is) per disk to you?

Once you wdidle-patch a Green's firmware to behave as a Red (as far as head parking is concerned) what difference left is there?

----------

## kernelOfTruth

 *frostschutz wrote:*   

> 
> 
> Once you wdidle-patch a Green's firmware to behave as a Red (as far as head parking is concerned) what difference left is there?

 

it's probably mostly the (much) better firmware and some improved components:

https://www.pugetsystems.com/labs/articles/Western-Digital-Green-vs-Red-Hard-Drives-602/

(TLER -> firmware ?, 3D Active Balance Plus -> better mass balancing, other components ?)

well, I had at least one of the newer WD Reds (a WD40EFRX) with much more frequent head parking too, similar to a WD Green,

so it's mostly firmware, I guess ...

Then there's these models:

 *Quote:*   

> WD Red Pro
> 
> Storage for 8 to 16 bay NAS solutions
> 
> Joining the original color of NAS, WD Red Pro continues the formula of success that has led the WD Red product family by adding support beyond consumer, SOHO, and small business markets into medium and large 8-16 bay business storage systems. WD Red Pro hard drives integrate WD’s exclusive technology, NASware™ 3.0, to provide unparalleled support for drive compatibility, reliability, and performance.
> ...

 

interesting ...

----------

## krinn

Let's compare with the Red series then, if you wish.

Reds also use IntelliPower, and as such go, for me, straight into the "sucks balls" category of drives.

So why would a Red be better value than a Green? http://www.wdc.com/wdproducts/library/SpecSheet/ENG/2879-800002.pdf

- Red MTBF is given, at 1,000,000 hours, showing they are confident enough to publish it

- AFR is not given, but you can easily derive it: 0.87%, and that's poor

- the warranty is one year longer; again, WD trusts this one more than the Green

- the load/unload cycle rating is given, and it is twice the Green's: better mechanical components

These are OK drives for a NAS if you want security over performance. They should be reliable if you disable the park feature and keep them cool.
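The 0.87% figure falls straight out of the quoted MTBF. Assuming a constant failure rate (the usual exponential model), AFR = 1 - exp(-8760 / MTBF), with 8760 hours in a year; a quick shell check:

```shell
#!/bin/sh
# Annualized failure rate (%) implied by a quoted MTBF (in hours),
# assuming a constant failure rate: AFR = 1 - exp(-8760 / MTBF).
afr_from_mtbf() {
  awk -v mtbf="$1" 'BEGIN { printf "%.2f", (1 - exp(-8760 / mtbf)) * 100 }'
}

echo "$(afr_from_mtbf 1000000)%"   # Red's 1,000,000 h MTBF -> 0.87%
echo "$(afr_from_mtbf 1200000)%"   # RE's  1,200,000 h MTBF -> 0.73%
```

The second number matching the 0.73% the RE spec sheet publishes is a decent sanity check on the model.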

Look at the RE specs: http://www.wdc.com/wdproducts/library/SpecSheet/ENG/2879-800044.pdf

- 7200 RPM this time, no lame IntelliPower

- 1 unrecoverable error in 10^15 bits read vs 1 in 10^14 (just one digit, but a whole order of magnitude better!)

- MTBF is 1,200,000 hours, and WD even tells you the operating conditions: 40°C and a 550 TB/year workload. If you run these drives above 40°C, the MTBF falls. Better to cool them and keep them at that temperature if you want to stay out of trouble.

- AFR is given (you don't have to calculate it); again, if it were a weakness they would hide it. 0.73% is a good figure.

- again the 600,000 load/unload cycles

- warranty is 5 years. Again, if you are confident in your components, you show it to the customers who don't read specs: the warranty gives anyone a clue about the maker's level of trust in the components and quality used.

Those are serious NAS drives.
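That one power of ten matters more than it looks. The spec-sheet rate is unrecoverable read errors per bits read, so the chance of hitting at least one while reading a whole drive is roughly 1 - exp(-bits_read / 10^N). A quick shell estimate (the 6 TB drive size is just an illustrative assumption):

```shell
#!/bin/sh
# Probability (%) of at least one unrecoverable read error while reading
# `tb` terabytes from a drive rated at 1 error per 10^e bits.
ure_risk() {
  awk -v tb="$1" -v e="$2" \
    'BEGIN { bits = tb * 1e12 * 8; printf "%.1f", (1 - exp(-bits / (10 ^ e))) * 100 }'
}

echo "$(ure_risk 6 14)%"   # full read of a 6 TB drive at 1 per 10^14 -> 38.1%
echo "$(ure_risk 6 15)%"   # same read at 1 per 10^15                 ->  4.7%
```

This is also why the better rating matters for RAID: a rebuild is exactly such a full-drive read.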

Now the Blue series: http://www.wdc.com/wdproducts/library/?id=371&type=8&cn=2879-771436

- The first thing to note about Blue is this little note on WD's site: http://www.wdc.com/en/products/products.aspx?id=770#Tab3

 *Quote:*   

> Disclaimer
> 
>     WD is making it easier for its customers to choose the correct drive for their desktop and laptop. Over the next several months, WD Green 5400 RPM-class 3.5-inch hard drives (with up to 6TB capacity) will be merging with WD Blue 7200 RPM-class 3.5-inch hard drives (with up to 1 TB capacity) to be collectively sold under the WD Blue brand. Product availability is dependent upon each retailer. Please see the model specifications above for more information on the expanded lineup.
> 
> 

 

If you read it carefully, WD is bullshitting everyone: its Blue series will soon absorb this crappy Green series in order to make it "easier for its customers to choose...".

When everyone knows why they are really doing it: now that Greens are known to be shit, they will hide their shitty product inside the Blue line, so customers who refuse to buy Greens anymore will buy Greens rebranded as Blues.

So you had better really look at the specs of any Blue you would buy, as you might be buying a rebranded Green!!!

From the specs, you can see Blues are Greens without IntelliPower. They are just "classic" drives.

Because these drives are still cheap and have some good specs (the 7200 RPM version), I put them in the "OK drive" category.

Even the 5400 RPM version is still OK for me (for backup, not NAS); you know what you're buying.

But any Blue that is a rebranded Green falls right back into the "sucks balls" category for me.

The Black specs: http://www.wdc.com/wdproducts/library/SpecSheet/ENG/2879-771434.pdf

- 1 error in 10^14 bits read

- 5 years warranty (we jump from 2 years to 5 years for this class)... Blacks are no pussies for WD, so even without a published MTBF/AFR they look good.

Those should be good drives.

And the VelociRaptor: http://www.wdc.com/wdproducts/library/SpecSheet/ENG/2879-701284.pdf

- 1 error in 10^16 bits read (!)

- 5 years warranty

That's the badass category. But better to use them in a well-cooled case (+RPM == +heat == -MTBF).

edit: I didn't look at the Red Pro until I saw kernelOfTruth's message.

Same as the Red, but no IntelliPower, the error rate improved to 1 in 10^15, and that 5-year warranty WD offers with its quality products.

Now that's what I would call real NAS drives.

----------

## Buffoon

I have Reds in my NAS. They run cool (less than 36°C without extra cooling) and the load cycle count is acceptable (lower than the hours powered on), all on factory settings.

----------

## coldlight

 *frostschutz wrote:*   

> hdparm works fine, plus there's sys-apps/idle3-tools, if you insist on the exe do it in freedos only.

 

Thank you. I've got much to learn.

----------

