# Samba + sleeping drives. Can we wake up faster? Solved

## DingbatCA

Success!! 

Posted this to the beginning of the thread in order to guide all those who follow.

Why?

I wanted to put my mdadm software RAID 6 to sleep when not in use. At 10 watts per drive, it adds up! I did not want to wait 10 seconds per drive, in series, for the array to come to life. I was tired of my Windows desktop hanging while waiting for a simple directory lookup on my NAS.

Disclaimer:

Do not come crying to me when you destroy a hard drive, lose all your data, fry a power supply, or cause a small country to be erased from the face of the Earth.

The key points covered below:

- Drive Controller
- Bcache
- Inotify

### Drive Controller

My server/NAS was running 3X LSI SAS 1068e controllers for my 7-drive RAID 6. Turns out the cards are hard coded to spin drives up in series. No way to get around it, it just is. This applies to ANY card running the LSI 1068e chipset, such as a Dell Perc 6/i or HP P400, and may even apply to all LSI based cards. To make matters worse, the cards are smart and will only spin up one drive at a time across all 3 cards. My 7 disk RAID 6 was taking 50 seconds to spin up (10 seconds per drive). That dropped to 40 seconds when I moved one drive to the on-board SATA controller. That was my first clue. Thanks to the Linux-Raid mailing list for the help isolating this one.

So I was on the Internets looking for a new, cheap, 12~16 port SATA II controller card. I found a very strange card on eBay: a "Ciprico Inc. RAIDCore" 16-port card. I can't even find any good pictures or links to add to this post so you can see it. It basically has 4 Marvell controllers and a PCIe bridge strapped onto a single card. No brains, no nothing. Just a pure, dumb controller without any spin-up stupidity. Same chipset (88SE6445) found on some RocketRAID cards. It was EXACTLY what I was looking for. At a cost of $60 I was thrilled. In Linux it shows up as a bridge + controller chips:

```
07:00.0 PCI bridge: Integrated Device Technology, Inc. PES24T6 PCI Express Switch (rev 0d)
08:02.0 PCI bridge: Integrated Device Technology, Inc. PES24T6 PCI Express Switch (rev 0d)
08:03.0 PCI bridge: Integrated Device Technology, Inc. PES24T6 PCI Express Switch (rev 0d)
08:04.0 PCI bridge: Integrated Device Technology, Inc. PES24T6 PCI Express Switch (rev 0d)
08:05.0 PCI bridge: Integrated Device Technology, Inc. PES24T6 PCI Express Switch (rev 0d)
09:00.0 SCSI storage controller: Marvell Technology Group Ltd. 88SE6440 SAS/SATA PCIe controller (rev 02)
0a:00.0 SCSI storage controller: Marvell Technology Group Ltd. 88SE6440 SAS/SATA PCIe controller (rev 02)
0b:00.0 SCSI storage controller: Marvell Technology Group Ltd. 88SE6440 SAS/SATA PCIe controller (rev 02)
0c:00.0 SCSI storage controller: Marvell Technology Group Ltd. 88SE6440 SAS/SATA PCIe controller (rev 02)
```

### Bcache

Kernel documentation: https://www.kernel.org/doc/Documentation/bcache.txt

Now that I had the total spin-up time down from 50 seconds ((number_of_drives - 2) * 10) to 10 seconds, I was able to address the remaining 10 seconds using caching. In this case I am using bcache. My operating system disks, 2X OCZ Deneva 240GB SSDs, are set up in a basic mirror. I partitioned these drives out and used 24GB as a caching device for my RAID. I quickly found out that bcache is unstable on the 3.16 kernel and was forced back to the 3.14 LTS kernel. After I landed on the 3.14.15 kernel everything ran great. The basic bcache settings work, but I wanted more:

```
#Setup bcache just the way I like it, hun-hun, hun-hun
#Get involved in read and write activities
echo "writeback" > /sys/block/bcache0/bcache/cache_mode
#Allow the bcache to put data in the cache, but get it out as fast as possible
echo "0" > /sys/block/bcache0/bcache/writeback_percent
echo "0" > /sys/block/bcache0/bcache/writeback_delay
echo $((16*1024)) > /sys/block/bcache0/bcache/writeback_rate
#Clean up jerky read performance on files that have never been cached.
echo "16M" > /sys/block/bcache0/bcache/readahead
```

I put all the above code in rc.local so my system picks them up on boot.  Writes still need to wake the array, but reads from cache don't even wake up the drives.
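For anyone following along: before any of that tuning, the backing and cache devices have to be created and attached. The kernel doc linked above covers it; roughly like this (the device names here are examples, not my exact layout):

```
# Format the array as a bcache backing device, and an SSD partition as the cache
make-bcache -B /dev/md125
make-bcache -C /dev/sda5
# Attach the cache set to the backing device, using the UUID printed by
# "make-bcache -C" (also visible under /sys/fs/bcache/)
echo <CSET-UUID> > /sys/block/bcache0/bcache/attach
```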

```
root@nas:/data# time (dd if=/dev/zero of=foo.dd bs=4096k count=16 ; sync)
16+0 records in
16+0 records out
67108864 bytes (67 MB) copied, 0.0963405 s, 697 MB/s

real    0m10.656s  #######Array spin up time#########
user    0m0.000s
sys     0m0.128s

root@nas:~# ./sleeping_raid_status.sh
/dev/sdc standby
...
/dev/sdd standby

root@nas:/data# time (dd if=foo.dd of=/dev/null iflag=direct)
131072+0 records in
131072+0 records out
67108864 bytes (67 MB) copied, 0.118975 s, 564 MB/s

real    0m0.121s  ########Array never even woke up#########
user    0m0.024s
sys     0m0.096s

root@nas:~# ./sleeping_raid_status.sh
/dev/sdc standby
/dev/sdj standby
...
```
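The sleeping_raid_status.sh helper never made it into this post. It's nothing fancy; a stand-in that does the same job might look like this (a reconstruction guessed from the daemon script below, not the original):

```
#!/bin/bash
#Reconstructed sleeping_raid_status.sh: print the power state of every
#member disk of the md array (guesswork, not the original script)
ARRAY=$(basename "$(readlink -f /dev/md/data)")
for slave in /sys/block/"$ARRAY"/slaves/*; do
  [ -e "$slave" ] || continue
  disk=/dev/$(basename "$slave" | sed 's/[0-9]*$//')   #sdb1 -> sdb
  echo -n "$disk "
  hdparm -C "$disk" | awk '/drive state/{print $4}'
done
```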

### Inotify

Wait...  The array did not spin up because the read came from cache?!  Working exactly as expected, but not good: I have the file metadata in cache, but what happens when I actually want to read the file...  10 seconds later...  Normally when I find a media file, I want to read/watch/listen to it right away.  I accessed the metadata, so why not a preemptive spin-up?  Time for a fun script using inotify.

I actually took this script one step further than just preemptive spin-up and have it do all drive power management.  Turns out different drive manufacturers interpret `hdparm -S 84 $DRIVE` (go to sleep in 7m) differently. This whole NAS was built on the cheap and I have 4 different types of drives in my array.

```
#!/bin/bash
WATCH_PATH="/data"
ARRAY_NAME="data"
SLEEPING_TIME_S="600"

ARRAY=`ls -la /dev/md/$ARRAY_NAME | awk -F"../" '{print $5}'`
PARTS=`ls /sys/block/$ARRAY/slaves | sed 's/[^a-z]*//g'`

set -m
while [ 1 ]; do
  inotifywait $WATCH_PATH -qq -t $SLEEPING_TIME_S
  if [ $? = "0" ]; then
    #echo -n "Start waking: "
    for i in $PARTS; do
      (hdparm -S 0 /dev/$i) &
    done
    #echo "Done"
  else
    #echo -n "Make go sleep: "
    for i in $PARTS; do
      STATE=`hdparm -C /dev/$i | grep "drive state is" | awk '{print $4}'`
      #Really should check that the array is not doing something block related, like a check or rebuild
      if [ "$STATE" != "standby" ]; then
        hdparm -y /dev/$i > /dev/null 2>&1
      fi
    done
    #echo "Done"
  fi
  sleep 1s
done
```

A few other key points have been addressed in this thread.  There is much greater detail in the posts below:

- Spinning drives up/down puts wear on them, but it is more cost effective to sleep the drives and wear them out than it is to pay for the power.
- Spinning up X drives at once puts a huge load on the PSU (Power Supply Unit).  According to Western Digital, their 7200RPM drives spike at 30 watts during spin-up. You have been warned.
- Warning: formatting a drive for bcache will remove ALL your data.  There is no way to remove bcache without reformatting the device.
- 5400RPM drives take about 10 seconds to spin up.  7200RPM drives take about 14 seconds.

#### Original starting post

Everything is working as expected, which is really frustrating.  

I have a home NAS with 6X 2TB drives in a software RAID 6 configuration. The array is formatted with XFS and holds all my media, such as movies and music.  After 7m all my drives fall asleep. I don't want to run 6X drives at 10W each 24/7.

```
#Sleep time in increments of 5s. (84*5)/60=7m
for disks in `ls -1 /dev/sd?`
do
  hdparm -S 84 $disks
done
```
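For reference, the -S value encoding is easy to get wrong: per the hdparm man page, values 1-240 mean N*5 seconds, and 241-251 mean (N-240)*30 minutes. A little helper to compute the value from a timeout in minutes (my own sketch, not part of the original setup):

```shell
#Convert an idle timeout in minutes to an hdparm -S argument.
#1..240 encode multiples of 5 seconds; 241..251 encode (N-240)*30 minutes.
spindown_value() {
  local minutes=$1
  if [ "$minutes" -le 20 ]; then
    echo $(( minutes * 60 / 5 ))           #e.g. 7 minutes -> 84
  else
    echo $(( 240 + (minutes + 29) / 30 ))  #round up to 30-minute units
  fi
}
```

So `hdparm -S $(spindown_value 7) /dev/sdb` is the same as the `-S 84` above.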

I share all my media out to my Windows and Android systems with Samba.  When I first go to access my media, everything hangs (in Windows or Android) for about 15~30s while the drives spin up.

```
# Global parameters
[global]
log file = /var/log/samba/log.%m
server string = nas
workgroup = lan
max log size = 50
read raw = yes
write raw = yes
#Showing up on the network
local master = yes
os level = 255
preferred master = yes

[Media]
path = /home/public/Media
mangled names = no
read only = no
```

Is there any way to mitigate, mask, or cache the drives so the spin up time does not seem so painful?

*Last edited by DingbatCA on Thu Aug 28, 2014 8:55 pm; edited 1 time in total*

----------

## eccerr0r

I think you're pretty much asking for two conflicting desires.  If the data you want is not in cache, the system has to spin up the disks, which means you wait.  So pretty much: if you don't want to wait, keep the disks spinning, or keep the data you access frequently on a disk that remains spinning.

(If your PSU is very hefty, I don't know if there's a way to get mdraid to simultaneously spin up all disks, as currently it will stagger spin - which is much less wear and tear on your system.)

I end up having to run my 4x500GB RAID5 spun up 24/7 since it's being used so randomly, albeit lightly - the spin up/down would get annoying as well as start eating into the lifetime of the disks.  Which may or may not be the case for you...

----------

## DingbatCA

I very much do have two conflicting desires.

My PSU has the power.  I am running an Ablecom SP762-TS, which is a 3-way redundant power supply.  My whole system is a re-purposed server.

I was not aware that mdraid did a staggered spin up, by default.  I will hunt around and see if I can find out how to disable/adjust that.

----------

## DingbatCA

This is just strange.  So I wrote a simple script to look at the state of my drives as they spin up:

```
while [ 1 ]; do
  date
  hdparm -C /dev/sdb1 | grep "drive state"
  hdparm -C /dev/sdc1 | grep "drive state"
  hdparm -C /dev/sdd1 | grep "drive state"
  hdparm -C /dev/sde1 | grep "drive state"
  hdparm -C /dev/sdf1 | grep "drive state"
  hdparm -C /dev/sdj1 | grep "drive state"
  hdparm -C /dev/sdi1 | grep "drive state"
  sleep 0.1
done
```

But it looks like hdparm freezes while the drives spin up.  Nothing in the logs about it.

```
Sun Aug 10 14:34:34 PDT 2014
 drive state is:  standby
 drive state is:  standby
 drive state is:  standby
 drive state is:  standby
 drive state is:  standby
 drive state is:  standby
 drive state is:  standby

Sun Aug 10 14:34:34 PDT 2014
 drive state is:  standby
 drive state is:  active/idle
 drive state is:  standby
 drive state is:  standby
 drive state is:  standby
 drive state is:  standby
 drive state is:  standby

Sun Aug 10 14:34:44 PDT 2014
 drive state is:  standby
 drive state is:  active/idle
 drive state is:  standby
 drive state is:  standby
 drive state is:  standby
 drive state is:  standby
 drive state is:  active/idle

Sun Aug 10 14:35:01 PDT 2014
 drive state is:  active/idle
 drive state is:  active/idle
 drive state is:  active/idle
 drive state is:  active/idle
 drive state is:  active/idle
 drive state is:  active/idle
 drive state is:  active/idle

Sun Aug 10 14:35:20 PDT 2014
 drive state is:  active/idle
 drive state is:  active/idle
 drive state is:  active/idle
 drive state is:  active/idle
 drive state is:  active/idle
 drive state is:  active/idle
 drive state is:  active/idle
```

I might have to take this question over to the mdraid guys for help.

----------

## eccerr0r

I think the IDE commands are serialized, so yes, they will stop when there's an outstanding request to spin up the disk...

Also it is possible for two disks to spin up at once, but eventually all of them need to be spun up.

I have to say it's not "staggered" but rather "serialized" - it will fetch from the disks as needed, but this has the effect of a staggered startup, as getting all the requests out at the same time isn't likely...

Also keep in mind "server quality" means "24/7 99.999% availability", not "spin up spin down as needed" - so you are still using it in an unintended manner :D

----------

## DingbatCA

Wow.  If it is truly "serialized" then this is going to become a big problem as I add more drives to grow the array.  Any way of caching the file system's metadata? Trying to give the drives time to spin up in the background without completely hanging the client's request.

----------

## Cyker

The problem I found was that even if you had gigantic caches, everything would still hang as soon as you requested something outside the cache, as requests that hit the cache don't necessarily wake the disks up.

I have yet to find a nice way around what you describe.

In the end I just got some low-RPM WD Greens and let them stay spinning! They automatically park the heads when not in use but keep the platters spinning, so you save some power while idling, albeit not as much as with a fully sleeping drive, but obviously the recovery is a lot faster!

(On a slight tangent, I recently switched to the newer Reds; They run 15C cooler and draw slightly less power vs the 1st gen Greens!)

----------

## DingbatCA

Time to have some fun!  This is Linux, we can solve this.

My main array runs 7X 2TB Western Digital Caviar Green drives.  I have two other arrays in the same system.  The OS array is a mirror of 2X OCZ Deneva 240GB SSDs. The archive array is a mirror of 2X Hitachi Deskstar 7K500s with btrfs and compression.

What type of gigantic caches were you able to put in place? Here is my idea: set up inotify to watch the cache.  When it is accessed, start all disks in the array.  This falls apart if the cache can't be watched by inotify, like the generic system cache, or if the cache is global and not per array.  In a worst case scenario, this trick might be employed against the array itself to start all drives up in parallel, but that would only save a few seconds.

----------

## Cyker

That's the spirit!  :Very Happy: 

Well 'gigantic' was about 2GB on my old server  :Laughing: . I haven't played with it much on my new one (Currently the cache is 12GB  :Laughing:  ) since all the disks just spin perpetually (I find running a torrent server with 400+ seeds keeps it busy and random enough that it never gets to sleep!)

One thing to watch out for is that the IO system tends to block while it waits for the disk to spin up. I know the Explorer threads on my Windows machines would lock up until any sleeping disks woke up and started doing Samba's bidding.

I just had a thought tho' - IIRC Linux 'recently' added the ability to use other devices as an intermediary cache. I wonder if you could set up a small, fast SSD as that intermediary cache (theoretically it would be easier to monitor for access than the cache in RAM?) and then use it to trigger the disk wakeup?

----------

## DingbatCA

I have my OS on 2X 240GB SSD's.  There are lots of ways I can cut a chunk of SSD out for an intermediary cache. I think, in this case, you are referring to bcache (http://en.wikipedia.org/wiki/Bcache).

A RAM based cache also works, as long as it is treated as read only. Don't want a power outage causing data loss or corruption.

I have the RAM, or the SSD storage.  I would rather use a non-persistent RAM cache.  Something like a cache in tmpfs.

Using the SSD mirror works, but kinda defeats the point of my RAID 6. 

Ideas?

----------

## eccerr0r

I'd just say keep the disks spinning and at least allow the heads to unload; you'll get some savings there.  The I/O blocking is indeed very annoying during interactive use.

No matter how big your cache, chances are, you'll always be fetching something that's not in cache (why would you be reading the same thing over and over again?)...

(As a side issue, I hate my raid5, IOPS is awful for some reason or another... the drives I have are not blacks or reds, I have two WD "blue" and three Hitachi disks in my 4+1 hotspare system and it bogs down badly during nfs use...)

----------

## DingbatCA

Don't want to be burning 70 watts of power 24/7.  Or at least that is the power draw of my 7 disks when spinning, according to my power strip.

The cache would only be in place to serve read requests.  I am really only after caching the FS metadata.  This comes into play when I walk the file system from Windows: I need to get to the correct directory before I can watch a movie or listen to music.  I am tired of Windows Explorer hanging until the drives spin up.  In this case I think a cache of 16MB would be plenty!  But I can fling GBs at it.
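One cheap way to keep that metadata warm, without any special cache layer, is to just walk the tree periodically so the dentries and inodes stay in the VFS cache. A sketch (my idea only; nothing is pinned, so memory pressure can still evict it, and if the metadata has already been evicted the walk itself will wake the drives):

```shell
#Re-stat everything under a mount point so directory metadata stays in
#the kernel's dentry/inode cache (sketch only; file contents untouched)
warm_metadata() {
  [ -d "$1" ] || return 1
  find "$1" -xdev >/dev/null 2>&1
}
#e.g. call "warm_metadata /data" from cron every few minutes
```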

RAID5 perf: in the case of Linux software RAID, you really need 4 drives to get the equivalent of 1 solo drive's speed.  This is due to the fact that there are no write-through cache capabilities.  RAID 6 requires 5 drives before you will get the equivalent performance of 1 stand-alone drive.  Most of the time the drives are not even the problem; people like to run too many drives on a slow PCI interface.  There is also one basic tweak that most n00b's forget to set: stripe cache size.  If it is left at the default, your system will run like trash.

```
root@nas:~# cd /sys/block/md125/md
root@nas:/sys/block/md125/md# cat stripe_cache_size
256
root@nas:/sys/block/md125/md# echo $((16*1024)) > stripe_cache_size
root@nas:/sys/block/md125/md# cat stripe_cache_size
16384
```

And if you really want to have fun, watch the cache during a big write.

```
root@nas:~# cd /sys/block/md125/md
root@nas:/sys/block/md125/md# watch -n 0.1 cat stripe_cache_active
```

But Linux software RAID performance is a very large subject that should be on a different thread.

----------

## Cyker

Yea, I remember messing around with a bunch of settings to try and speed up my old mdadm RAID5.

I had stuff like this in my local.start for a while  :Laughing: 

```
blockdev --setra 8192 /dev/md0
blockdev --setra 2048 /dev/sda /dev/sdb /dev/sdc /dev/sdd
echo 8192 > /sys/block/md0/md/stripe_cache_size
```

btrfs RAID5 speed seems to be pretty good; I can hit 100MB/s (!!?!) on each RAID element, whereas before I'd be lucky to get 150MB/s off the whole array! A beefier CPU and faster bus probably help, but I also suspect btrfs isn't actually doing real RAID5 at the moment...  :Sad: 

I forgot about tmpfs; That should work!

I wonder if caching the metadata will be enough tho', if this is to avoid pausing in Windows. Windows doesn't just pull directory table data; like the bloatier Linux DEs, it reads the contents of a lot of the files it touches to generate previews and thumbnails.

That said, I think they split that off into a worker thread in Vista+ so you might be able to get away with it...

Come to think of it, doesn't Linux already prioritise caching the file tables?

Maybe it'd be easier to just set the spindown to, like, an hour or two; then it'll spin down when you aren't using it, but stay spinning when you are?

----------

## DingbatCA

As a rule, when I am using my array it does not spin down.  The primary job of the array is media (music and movies).  In the case of a movie there is almost always disk IO going on.  If I set the spin down for 7m or 2 hours it won't really help.

I am good with tmpfs and building the inotify script, but I don't know how to build the metadata cache.  Can you point me in the right direction?

I wish btrfs RAID5/6 was more stable. :-(

----------

## DingbatCA

Just adding some more info.  Spin up takes about 9.6 seconds.  Need at least 5 of the 7 drives spinning to access data.  9.6 x 5 = 48 seconds. I need to find a fix for this...  When I fill my drive cage with 15 2TB drives the spin up time will be ~125s.  OUCH!!!

```
root@nas:/data# smartctl -a /dev/sdd | grep Spin_Up
  3 Spin_Up_Time            0x0027   150   137   021    Pre-fail  Always       -       9608

root@nas:/data# time (touch foo ; sync)

real    0m49.004s
user    0m0.000s
sys     0m0.004s

root@nas:/data# time (touch foo ; sync)

real    0m50.647s
user    0m0.000s
sys     0m0.008s

root@nas:/data# df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/md125      9.1T  3.8T  5.4T  42% /data

root@nas:/data# mdadm -D /dev/md125
/dev/md125:
        Version : 1.2
  Creation Time : Wed Jun 18 07:54:38 2014
     Raid Level : raid6
     Array Size : 9766909440 (9314.45 GiB 10001.32 GB)
  Used Dev Size : 1953381888 (1862.89 GiB 2000.26 GB)
   Raid Devices : 7
  Total Devices : 7
    Persistence : Superblock is persistent

    Update Time : Mon Aug 11 16:30:16 2014
          State : clean
 Active Devices : 7
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : nas:data  (local to host nas)
           UUID : 74f9ce7a:df1c2698:c8ec7259:5fdb2618
         Events : 1038642

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       3       8       49        2      active sync   /dev/sdd1
       4       8       65        3      active sync   /dev/sde1
       5       8       81        4      active sync   /dev/sdf1
       7       8      145        5      active sync   /dev/sdj1
       6       8      129        6      active sync   /dev/sdi1
```

----------

## eccerr0r

I don't find the power draw a big deal, but then again I only have four disks, and service requests are not only local, so I can't control who powers the disks up.  I've been running RAID5s for quite a while now, though I was running an Athlon as the server CPU; now I'm running a Core2 Quad, mostly as this machine is a shell box/VM server/webserver/mailserver.  I have another machine that far exceeds the power draw of these disks... and another machine whose GPU alone eats more power than the HDDs.

The problem with any cache is that it's still LRU, and if you use the cache enough, what you want will eventually be discarded.  I don't think there is a metadata-only cache available... that would be interesting, but potentially wasteful.

Perhaps something easier is to just monitor the network: if you see an SMB packet come by and the disks are sleeping, go ahead and try to spin all of the disks up?
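Something like this might do it (untested speculation on my part; the interface and drive names are placeholders, and it reuses the hdparm -S 0 wake trick from the script at the top of the thread):

```
#Wake every array member as soon as any SMB traffic shows up
tcpdump -l -n -i eth0 'tcp port 445 or tcp port 139' 2>/dev/null |
while read -r _; do
  for d in /dev/sd[b-h]; do
    hdparm -S 0 "$d" &
  done
  sleep 600   #settle time so every packet doesn't re-trigger the loop
done
```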

Maybe another way is to break up your raid so you don't have to pay the penalty for spinning up all disks when you only need to use one volume?  Then again this complicates other things...

All of my RAID members are on an ICH10 onboard PCIe SATA 3Gbit controller.  Disk sequential read is fine on the server, on the order of 2-3x a single disk's speed (around 150 MB/sec), but random I/O over NFS is awful, even NFS to a VM on the same machine.  And yeah, I was setting the readahead and stripe cache larger.  The readahead and stripe size (64K) may actually be hurting small-file performance; I recall my 32K stripe system being marginally better than the 64K setup, but the larger stripe definitely helped hdparm -t /dev/md1 speeds...

----------

## DingbatCA

Still trying to find a good solution for getting access to the data faster.  I think I am getting close to an acceptable solution.  I asked for help from the linux-raid mailing list and Larkin was kind enough to give me the idea of writing a daemon that controls all sleeping/waking of the array.

So I am currently just playing with ideas.

```
root@nas:~# inotifywait /data/
Setting up watches.
Watches established.
/data/ OPEN foo
```

This worked.  The second I touch a file on the array it responds, even though the array itself does not respond for 50 seconds.  I will roll this into a script/daemon in the morning that will keep track of the array's activity and, most importantly, issue sleep/wake commands in parallel, NOT serial.

----------

## DingbatCA

As far as wear and tear on the disks goes: yes, starting and stopping the drives shortens their life span. I don't trust my disks regardless of starting/stopping; that is why I run RAID 6. 

Let's say I use my NAS with its 7 disks for 2 hours a day, 7 days a week, at 10 watts per drive.  The current price for power in my area is $0.11 per kilowatt-hour. That comes out to $5.62 per year to run my drives 2 hours daily.  But if I ran my drives 24/7 it would cost me $67.45/year.  Basically it would cost me an extra $61.83/year to run the drives 24/7.  The 2TB 5400RPM SATA drives I have been picking up from local surplus and auction websites cost me $40~$50, including shipping and tax.  In other words, I could buy a new disk every 8~10 months to replace failures and it would be the same cost. Drives don't fail that fast, even if I were starting/stopping them 10 times daily. This also completely ignores the fact that drive prices are falling.  Sorry to disappoint, but I am going to spin down my array and save some money.

----------

## eccerr0r

But is it worth ripping your hairs out getting annoyed at waiting for the disks? :D

It's a quality of life issue really then.  Replace a disk every year or not have to be annoyed at disk spinup - always available.

I think it's the same cost either way really.  Well, for me at least as I don't have as many disks.

----------

## Cyker

Well it's definitely not worth the zots required to do this, but it is a fun little experiment  :Smile: 

Who knows, we might see a paper on The DingbatCA Early Pre-emptive Midline-Storage Wakeup Algorithm in the future  :Very Happy: 

It'll be cool to see what you come up with and how well it performs!

The inotify thingy looks to be a good start; the next tricky bit will be caching enough stuff to give the disks time to spin up.

I wonder if you could cache the filesystem metadata entirely, but also have some sort of learning predictor cache that tries to spot access patterns, in order to cache enough relevant stuff to give the array time to spin up.

This really is the sort of thing that a Linux hacker should be doing for a final year project or something  :Laughing: 

----------

## DingbatCA

I am good with waiting 10 seconds. With a little bit of caching I could mitigate that, if I can get the array to spin up as one unit.  But I agree with eccerr0r that my quality of life is not worth waiting a minute every single time I want to use the array.  Most of the media devices I have connected to the array will give up before waiting that long.

So, back to hacking, and my latest problem.  Inotify works perfectly and responds within 0.01 seconds of my array being accessed (watching the mount point /data).  But I cannot get the disks to spin up in parallel.

```
root@nas:~# hdparm -C /dev/sdh /dev/sdg

/dev/sdh:
 drive state is:  standby

/dev/sdg:
 drive state is:  standby

#Two terminal windows dd'ing sdg and sdh.
root@nas:~/dm_drive_sleeper# time dd if=/dev/sdh of=/dev/null bs=4096 count=1 iflag=direct
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 14.371 s, 0.3 kB/s

real   0m28.139s  ############# WHY?! ################
user   0m0.000s
sys   0m0.000s

#A single drive spin-up
root@nas:~/dm_drive_sleeper# time dd if=/dev/sdh of=/dev/null bs=4096 count=1 iflag=direct
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 14.4212 s, 0.3 kB/s

real   0m14.424s
user   0m0.000s
sys   0m0.000s
```

I need a way to spin up the drives with an ATA command, not through the Linux block layer.  This is starting to feel like I am running into a problem with the kernel itself?!
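One candidate worth trying: sg_start from sg3_utils sends a SCSI START STOP UNIT directly, and with --immed set the command should return before the spin-up completes (untested on my end, and behavior may vary by controller):

```
#Fire START UNIT at each drive without waiting for the platters
for d in /dev/sd[b-h]; do
  sg_start --start --immed "$d"
done
```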

----------

## Cyker

Possibly relevant?

http://linux.slashdot.org/story/14/04/12/1833244/linux-315-will-suspend-resume-much-faster

Also, what's your PSU like? HDD spinups, esp. 3.5" disks, have a surprisingly high amp draw and I'm slightly concerned your PSU might blow if it gets repeatedly spiked like that...!

----------

## DingbatCA

Well. Thanks for the tip Cyker!

> The Linux 3.15 kernel ... ensured the kernel is no longer blocked by waiting for ATA devices to resume.

Power is not an issue.

> I am running a Ablecom SP762-TS which is a 3 way redundant power supply. My whole system is a re-purposed server.

Off to play with a new kernel. I will report back soon.

----------

## DingbatCA

Running the shiny new 3.16 kernel.

```
root@nas:~# time dd if=/dev/sdg of=/dev/null bs=4096 count=1 iflag=direct
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 13.8612 s, 0.3 kB/s

real   0m27.819s  #################Still blocking###########
user   0m0.000s
sys   0m0.000s
```

I also tried this same test against my 7 disk array.

```
root@nas:~# time dd if=/dev/md125 of=/dev/null bs=512k count=7 iflag=direct
7+0 records in
7+0 records out
3670016 bytes (3.7 MB) copied, 47.8668 s, 76.7 kB/s

real   0m47.869s
user   0m0.004s
sys   0m0.000s
```

Failure is always an option? :-(

----------

## John R. Graham

There should be a non-blocking ioctl that could be issued against all drives to spin them up. Let me do some experimentation on my 4-drive RAID5 setup.

- John

----------

## DingbatCA

John, I hope you are faring better than I am.

From the dd man page.

>        nonblock
>               use non-blocking I/O

And my normal test:

```
#Two terminal windows dd'ing sdg and sdh.
root@nas:~# time dd if=/dev/sdh of=/dev/null bs=4096 count=1 iflag=direct,nonblock
hdparm -C /dev/sdh /dev/sdg
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 28.1493 s, 0.1 kB/s

real    0m28.151s #################Still blocking, single drive = 14s ###########
user    0m0.000s
sys     0m0.000s
```

----------

## DingbatCA

Victory will be mine!!!!!!

```
#Normal two drive spin up test.
root@nas:~# time hdparm --dco-identify /dev/sdh

/dev/sdh:
DCO Revision: 0x0001
The following features can be selectively disabled via DCO:
   Transfer modes:
       mdma0 mdma1 mdma2
       udma0 udma1 udma2 udma3 udma4 udma5 udma6(?)
   Real max sectors: 976773168
   ATA command/feature sets:
       SMART self_test error_log security PUIS AAM HPA 48_bit
       (?): streaming FUA selective_test
   SATA command/feature sets:
       (?): NCQ NZ_buffer_offsets interface_power_management SSP

real   0m14.246s  ##########MOHAHAHAHAHAHAHA##############
user   0m0.000s
sys   0m0.000s
```

I have found my non-blocking IO command.  Now to just finish out my script.

----------

## DingbatCA

Awww crap...  :-(    The system cached the --dco-identify response and no longer fetches the information fresh, so the drives don't wake up.

I also found the following option in hdparm, but it too is blocking.  Back to the drawing board.

```
hdparm --read-sector
```

----------

## DingbatCA

PROBLEM FOUND!!!!!

Hummm... A controller issue?

```
lspci | grep LSI
07:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 02)
09:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 08)
0b:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 02)

lspci | grep -i sata  (On-board)
00:1f.2 IDE interface: Intel Corporation 631xESB/632xESB/3100 Chipset SATA IDE Controller (rev 09)
```

All but 1 of my drives run through my 3X 4-port LSI cards; /dev/sdb runs through the onboard Intel SATA controller. Each drive takes 10 seconds to spin up. With a 7 disk RAID 6, I would expect a read/write to succeed 50 seconds (5 drives) after the request.  But on my system it always takes 40 seconds?!

Quick test. 

```
sdb & sdc at the same time (Intel + LSI):

root@nas:~/dm_drive_sleeper# time (dd if=/dev/sdc of=/dev/null bs=512k count=16 iflag=direct)
16+0 records in
16+0 records out
8388608 bytes (8.4 MB) copied, 10.2006 s, 822 kB/s

real 0m10.202s   #####Expected#####
user 0m0.000s
sys 0m0.000s

sdf & sde at the same time (LSI + LSI):

root@nas:~/dm_drive_sleeper# time (dd if=/dev/sdf of=/dev/null bs=512k count=16 iflag=direct)
16+0 records in
16+0 records out
8388608 bytes (8.4 MB) copied, 10.2417 s, 819 kB/s

real 0m20.208s  ######Blocked######
user 0m0.000s
sys 0m0.000s
```

I can blame the LSI cards!??!?   I have been looking for an excuse to upgrade, and now I have it!

In other news, I owe Larkin from the linux-raid mailing list a beer/coffee/tea for pointing me in the right direction.

----------

## DingbatCA

Found a spin up delay setting in the LSI firmware.  It was set to 2; I set it to 0.  It does not look like it has any effect on the spin up problem, and I have the latest firmware on all 3 cards. At this point I think I need new, non-LSI controller cards. :-(

I am stuck with the spin up problem for now.  I will find a new card some time in the next few weeks.

Still hunting for a good caching solution.

----------

## madchaz

You might want to move your OS to the array and have a look at this. 

https://github.com/facebook/flashcache/

----------

## DingbatCA

Good link Madchaz.  After poking around a bit I found this:

https://wiki.archlinux.org/index.php/EnhanceIO

Looks like I might need to shuffle around my OS and main storage array to make some partitions for testing.

----------

## DingbatCA

Updated the first post.

----------

## Cyker

Woah, major kudos dude!  :Very Happy: 

Glad your perseverance paid off!

----------

## DingbatCA

Well this sucks:

```
root@nas:~# inotifywait /data
Setting up watches.
Watches established.

#From a different window:
touch /data/home/adam/foo
```

Turns out that inotifywait watches only the top directory itself, not the files and subdirectories beneath it.  Now what?
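One possible direction: inotifywait has a -r flag that sets up watches recursively over the whole tree. On a big media tree that takes a while to establish and burns one inotify watch per directory (fs.inotify.max_user_watches may need raising), but it should catch the access above:

```
inotifywait -r -t 600 /data
```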

----------

