# File system freezes

## Niethi

Hello

For some time I have been encountering file system lags on my /home/* directories, resulting in application freezes that can last from several seconds to minutes.

And it seems to get worse the longer I use the system ...

My current configuration uses the 3.0.6-gentoo Linux kernel.

/home is an ext4 file system inside a dm-crypt container located on a 1.5TB WD Caviar Green HDD.

The file system is mounted with the options "noatime,nodev".

Neither top nor iotop showed any suspicious user applications.

But there is a process "jbd2/dm-5-8" coming up every 2-5 seconds. As I found out, this belongs to the ext4 journaling system.

That would not bother me, as metalog also shows up frequently, but dumping a list of kernel tasks that are in uninterruptible (blocked) state also shows an entry for jbd2.

To me it now seems that something very bad must be going on in the kernel (modules), as I see a lot of "sleep_on_buffer", "wait_on_buffer" and "io_schedule" calls in the call stacks.

Unfortunately I am totally stumped now.

So I would be very happy if anyone had some hints on how to analyze the problem in more detail, or could come up with a solution.
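
Besides the kernel's own task dump, one way to see which tasks are in uninterruptible sleep is to scan /proc for processes whose state is D. The following is only a minimal Python sketch under that assumption, not the method used for the dump mentioned above; the function names `parse_stat` and `blocked_tasks` are made up for illustration, and the sketch reports task names only, not call stacks.

```python
import os


def parse_stat(line):
    """Split the start of a /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left])
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc="/proc"):
    """Return (pid, comm) for every process currently in state D."""
    result = []
    if not os.path.isdir(proc):
        return result  # no procfs available (not a Linux system)
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError):
            continue  # process exited while scanning, or unexpected format
        if state == "D":
            result.append((pid, comm))
    return result
```

Calling `blocked_tasks()` repeatedly during a freeze and printing the result should show whether it is always the same tasks (for example the jbd2 thread) that are stuck.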

----------

## Hu

Some WD Caviar Green drives are 4K sector drives.  Is your drive one of those?  Are your partitions laid out to handle this?  Have you disabled the drive's 8 second spindown?

----------

## Geizeskrank

Hi Niethi,

take a look at the "%wa" value in top.

I have a similar fault and my %wa goes up to 100%.
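
For readers who want to log this value instead of watching top: %wa corresponds to the share of CPU time spent in iowait, which the kernel exposes as the fifth counter on the aggregate "cpu" line of /proc/stat. The following Python sketch computes it from two samples; the function names are hypothetical, and it is an approximation of what top displays, not top's actual code.

```python
import time


def cpu_times(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of ints."""
    fields = stat_line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Order: user nice system idle iowait irq softirq steal.
    # The guest columns that follow are already counted in user/nice,
    # so they are left out of the total.
    return [int(value) for value in fields[1:9]]


def iowait_percent(before, after):
    """Percentage of elapsed CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(cpu_times(before), cpu_times(after))]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def sample_iowait(interval=1.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart (Linux only)."""
    with open(path) as fh:
        first = fh.readline()
    time.sleep(interval)
    with open(path) as fh:
        second = fh.readline()
    return iowait_percent(first, second)
```

Running `sample_iowait()` in a loop during a freeze gives a number that can be compared against what top shows.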

----------

## Niethi

Hi and thanks a lot for the replies so far.

I already checked most of this, but double checking can't hurt anyway. So here's all the info.

As far as I see my 1.5TB WD has 512B sectors and therefore should be aligned without any additional effort:

```
# hdparm -I /dev/sdb | grep size
        Logical/Physical Sector size:           512 bytes
        device size with M = 1024*1024:     1430799 MBytes
        device size with M = 1000*1000:     1500301 MBytes (1500 GB)
        cache/buffer size  = unknown
#
# fdisk -l /dev/sdb
Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders, total 2930277168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x939bc474
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1              63  1953503999   976751968+  83  Linux
/dev/sdb2      1953504000  2930272064   488384032+  83  Linux
```

I also tried to disable the disk spindown using

```
# hdparm -S 0 /dev/sdb
/dev/sdb:
 setting standby to 0 (off)
```

But unfortunately this does not seem to work.

The lags are as bad as before. Starting a file system sync during a freeze regularly results in something like this:

```
$ time sync
real    1m27.163s
user    0m0.000s
sys     0m0.037s
```

To complete this post: I also see 80% to 98% wa in top. I had not looked at this value until now. Really good hint!

Looking at the output of iotop at the same time shows:

```
Total DISK READ: 0.00 B/s | Total DISK WRITE: 61.10 K/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
 3220 be/3 root        0.00 B/s    0.00 B/s  0.00 % 99.99 % [jbd2/dm-5-8]
```

So what could I try now to track down the cause of these annoying lags?

----------

## Hu

 *Niethi wrote:*   

> As far as I see my 1.5TB WD has 512B sectors and therefore should be aligned without any additional effort:

 Unfortunately, this is not a valid assumption.  WD shipped some green drives with 4K sectors, but firmware that lied and claimed to have 512B sectors.  Use hdparm or smartctl to look up the drive model and research online whether that model has 4K sectors.

----------

## Geizeskrank

Hello,

your partition /dev/sdb1 starts at sector 63, so I think it is not aligned.

You can also check this with parted's align-check "opt".

Your wa of 80% to 100% looks like my problem.

https://forums.gentoo.org/viewtopic-t-902410-highlight-.html

edit:

Hello Niethi,

I just saw that you speak German.

I have the problem with both of my SATA hard drives that they also keep stalling under heavy file access.

I am currently busy in the thread mentioned above trying to find a solution to the problem; maybe it comes down to the same thing for both of us.

Take a look at dmesg at that moment, or shortly afterwards, to see what it reports.

That your partition starts at 63 is already a bad sign, unless you have set the jumper on the drive for the firmware hack.

But I don't think it is down to the partition alignment...

----------

## Hu

Starting at 63 is fine if the OP has a 512B sector drive.  If he has a 4K sector drive that claims to be 512B, then starting at 63 will cause performance problems.
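
The arithmetic behind this can be sketched in a few lines of Python. The start sectors are taken from the fdisk output earlier in the thread; the 4096-byte physical sector size is an assumption (the "4K drive that claims to be 512B" case), and the helper name `is_aligned` is made up for illustration.

```python
LOGICAL_SECTOR = 512    # bytes, as reported by fdisk in this thread
PHYSICAL_SECTOR = 4096  # bytes, ASSUMED: the drive really has 4K sectors


def is_aligned(start_sector, logical=LOGICAL_SECTOR, physical=PHYSICAL_SECTOR):
    """True if the partition's byte offset is a multiple of the physical sector."""
    return (start_sector * logical) % physical == 0


# Start sectors of /dev/sdb1 and /dev/sdb2 from the fdisk listing.
for name, start in (("sdb1", 63), ("sdb2", 1953504000)):
    status = "aligned" if is_aligned(start) else "misaligned"
    print(f"{name}: start sector {start} is {status} for 4K physical sectors")
```

Under that assumption, sector 63 sits at byte offset 32256, which is not a multiple of 4096, while the start of sdb2 is. With true 512B physical sectors (`physical=512`), both partitions are aligned. Note that this checks only the partition start; any offset added by the dm-crypt container would need the same check.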

----------

## Geizeskrank

It was my understanding that all drives of the Green series have 4K sectors.

----------

## Niethi

Thanks a lot for the response.

At the moment I am preparing some more things for further tests.

* Bonnie++ tests for - hopefully - reproducibility and comparability of results

* Modifications in ext4 journal sync times

* Switching off ext4 write barriers

* Switching back to ext3 

* Re-partitioning the HDD with proper alignment

* Try different dm-crypt payload offsets

If you have any more suggestions please let me know.

----------

## Niethi

Bonnie++ tests:

I ran Bonnie++ 3 times using the following command

```
/usr/sbin/bonnie++ -b -q
```

on 3 different file system / HDD combinations:

(a) dm-crypt ext4 on WDC WD15EADS-00P8B0  (the lagging partition)

(b) dm-crypt ext3 on WDC WD5000AAKS-00TMA0

(c) ext3 on WDC WD5000AAKS-00YGA0

Here are the results:

```
Version      1.96   ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
(a)              8G   358  98 72682  13  9222   2  1247  99 98350   9 105.8   3
Latency             23131us    3951ms    7248ms   22196us   21951us    1093ms
(a)              8G   363  98 74140  13 36779   8  1325  99 95791   9 103.1   3
Latency             22876us    3488ms    1619ms   20139us   17166us     955ms
(a)              8G   350  98 74259  13 36183   7  1348  99 94467   9 102.7   1
Latency             24612us    2913ms    1977ms   17115us     118ms     831ms
(b)              8G   350  98 61572  20 33352   7  1472  99 91161   9 140.0   2
Latency             23082us    3892ms    1860ms   16392us     134ms     270ms
(b)              8G   346  98 61344  20 33475   7  1444  99 91608   9 137.9   2
Latency             23375us    3673ms    1962ms   19063us   86970us     304ms
(b)              8G   373  98 62351  19 33265   7  1615  99 91616   9 135.0   4
Latency             22152us    3191ms    2292ms   16349us   86163us     254ms
(c)           3528M   370  99 62589  40 32069  16  1292  91 89881  20 234.6   8
Latency             46937us     514ms     455ms     130ms     218ms     507ms
(c)           3528M   405  99 62859  35 31422  14  1307  89 93087  20 236.3   8
Latency             44193us     592ms     799ms     169ms     119ms     480ms
(c)           3528M   394  99 63360  36 32003  15  1201  86 93577  21 239.2   8
Latency             43818us     558ms     440ms     188ms     126ms     471ms
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
(a)              16    16   0 +++++ +++    25   0    17   0 +++++ +++    25   0
Latency              3204ms     799us     528ms     588ms      62us     660ms
(a)              16    16   0 +++++ +++    25   0    17   0 +++++ +++    25   0
Latency              1800ms     864us     564ms     564ms      62us     576ms
(a)              16     8   0 +++++ +++    25   0    17   0 +++++ +++    25   0
Latency              7272ms     825us    3360ms     564ms      26us     744ms
(b)              16    39   0 +++++ +++    58   0    39   0 +++++ +++    58   0
Latency               425ms     897us     175ms     125ms      72us     169ms
(b)              16    39   0 +++++ +++    58   0    39   0 +++++ +++    58   0
Latency               376ms     899us     169ms     116ms      15us     175ms
(b)              16    39   0 +++++ +++    58   0    39   0 +++++ +++    58   0
Latency               426ms     849us     200ms     133ms      16us     167ms
(c)              16   417  36 +++++ +++   755   4   433  34 +++++ +++   884  23
Latency             45773us     491us   52527us   33194us     118us   36470us
(c)              16   412  35 +++++ +++   751   4   439  35 +++++ +++   828  22
Latency             44680us     185us   49911us   39973us     159us     120ms
(c)              16   413  35 +++++ +++   740   4   434  34 +++++ +++   883  23
Latency             29546us     263us   54785us   75700us     123us   17323us
```

While the sequential output and input look nearly the same on all tested combinations, random seek performance seems to drop on the dm-crypt devices by a factor of two compared to the non-dm-crypt device, which should be okay.

More interesting are the create tests: they show around 16/sec for (a), 40/sec for (b) and 410/sec for (c). Results for delete look similarly bad.
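
To put numbers on these comparisons, the three runs per setup can be averaged and expressed relative to the plain ext3 disk (c). The Python sketch below uses only the random seek and sequential create columns copied from the table above; the helper names are made up for illustration. Keep in mind that (c) ran with a smaller test size (3528M instead of 8G), so the comparison is rough.

```python
from statistics import mean

# Values copied from the bonnie++ runs above (three runs per setup).
random_seeks = {
    "a": [105.8, 103.1, 102.7],  # dm-crypt ext4, WD15EADS
    "b": [140.0, 137.9, 135.0],  # dm-crypt ext3, WD5000AAKS
    "c": [234.6, 236.3, 239.2],  # plain ext3, WD5000AAKS
}
sequential_create = {
    "a": [16, 16, 8],
    "b": [39, 39, 39],
    "c": [417, 412, 413],
}


def averages(runs):
    """Mean of the runs for each setup."""
    return {setup: mean(values) for setup, values in runs.items()}


def slowdown_vs(baseline, avgs):
    """How many times slower each setup is than the baseline setup."""
    return {setup: avgs[baseline] / value for setup, value in avgs.items()}


for label, runs in (("random seeks", random_seeks),
                    ("sequential create", sequential_create)):
    avgs = averages(runs)
    for setup, factor in slowdown_vs("c", avgs).items():
        print(f"{label} ({setup}): {avgs[setup]:.1f}/sec, "
              f"{factor:.2f}x slower than (c)")
```

With these figures the seek rate differs by roughly a factor of two, while the create rate differs by a factor of about ten for (b) and about thirty for (a).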

----------

## Niethi

Modifications in ext4 journal sync times:

I have now added the mount option commit=60, increasing the journal commit interval from 5 to 60 seconds.

So my mount options in /etc/fstab now look as follows:

rw,nodev,noatime,commit=60

From my experience over the last few days the system has been more responsive. I would say it is quite usable again, though not perfect.

However, my bonnie++ tests don't show big improvements:

```
Version      1.96   ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
(a)-sync         8G   440  98 75463  12 37465   7  1995  98 89838   9  95.5   3
Latency             18712us    2018ms    1455ms   12641us     119ms     976ms
(a)-sync         8G   441  97 74380  12 36726   7  2005  98 87541   8  93.8   3
Latency             19371us    2378ms    1039ms   21790us     143ms     990ms
(a)-sync         8G   442  98 75414  12 37261   7  2313  98 85032   8  98.0   2
Latency             20115us    1902ms    1204ms   20354us   38658us    1122ms
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
(a)-sync         16    16   0 +++++ +++    25   0    17   0 +++++ +++    25   0
Latency              2256ms     797us     588ms     648ms      92us     624ms
(a)-sync         16    16   0 +++++ +++    25   0    17   0 +++++ +++    25   0
Latency              2268ms     691us     684ms    1561ms      14us    1800ms
(a)-sync         16    16   0 +++++ +++    25   0    17   0 +++++ +++    25   0
Latency              1836ms     727us     456ms     612ms      72us     660ms
```

----------

## bourane

Hi!

I've got a Fitpc2 with an internal 2.5" WD Caviar, and noise is a problem for me: I use it as a multimedia station powered 24/7.

I had the exact same problem and believed it was because of JBD2 and ext4...

During my tests I discovered the following.

hdparm -B values (power management level):

* 255: power management disabled, so the disk was always spinning and working fine

* 65 to 254: the disk kept spinning forever, but the heads parked after 8 seconds of inactivity (due to the "IntelliPark" pseudo technology); I figured this out from the sound it makes

* below 65: the disk spun down as soon as the heads were parked

hdparm states that values between 1 and 127 allow spindown, while values between 128 and 254 do not allow spindown but keep power management on. 255 means no PM. The drive's behavior does not conform to this.

hdparm -S values (spindown timeout):

Whatever I set, the drive accepted it but did not behave as specified. It seems that this value is discarded by the drive.

To make the drive spin down, a -B value below 65 works but is very aggressive (8s timeout).

Finally I found the solution thanks to this note: http://idefix.net/~koos/newsitem.cgi/1265644111

So the problem seems to be caused by hdparm or by the hard drive's IntelliPark technology, which discards the spindown timeout value.

It can be "fixed", as described in the link, by using "hdparm -y" and the spindownd service. The only drawback is that you have to activate PM (-B 254), which enables head parking every 8 seconds (the noise is annoying).

I hope it helps!

----------

## Niethi

Hi!

Thanks a lot for pointing this out.

It really seems as if there is (also) a problem concerning spin downs. smartctl reports a very high number of load cycles:

```
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   182   178   021    Pre-fail  Always       -       5883
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       452
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   078   078   000    Old_age   Always       -       16105
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       254
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       124
193 Load_Cycle_Count        0x0032   131   131   000    Old_age   Always       -       209902
194 Temperature_Celsius     0x0022   114   102   000    Old_age   Always       -       36
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   051    Old_age   Offline      -       0
```

But unfortunately my HDD revision (WDC WD15EADS-00P8B0) does not support the advanced power management feature:

```
# hdparm -B /dev/sdb
/dev/sdb:
 APM_level      = not supported
```
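
To make "very high" concrete, the load cycle count can be related to the power-on hours from the same SMART table. This is a small Python sketch of that arithmetic; the function name is made up for illustration, and the result is an average over the whole power-on time, so the rate during idle periods is presumably higher.

```python
LOAD_CYCLES = 209902    # SMART attribute 193, raw value from the table above
POWER_ON_HOURS = 16105  # SMART attribute 9, raw value from the table above


def load_cycle_rate(cycles, hours):
    """Return (load cycles per hour, average seconds between load cycles)."""
    per_hour = cycles / hours
    seconds_between = 3600 / per_hour
    return per_hour, seconds_between


per_hour, seconds_between = load_cycle_rate(LOAD_CYCLES, POWER_ON_HOURS)
print(f"{per_hour:.1f} load cycles per power-on hour, "
      f"one every {seconds_between:.0f} s on average")
```

That works out to roughly 13 head load cycles per power-on hour, or one about every four and a half minutes.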

----------

## Niethi

Switching off ext4 write barriers:

For the last week I have been running with the additional mount option barrier=0, which switches off write barriers.

Mount options in /etc/fstab are now:

rw,nodev,noatime,commit=60,barrier=0

The system is responsive again. But from time to time it still freezes and is unusable for some minutes due to an unresponsive /home.

The bonnie++ results show tremendous improvements with respect to random seek and create operations:

```
Version      1.96   ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
(a)              8G   371  98 75960  13 37184   8  1221  98 83394   9 143.9   4
Latency             21906us    1753ms    1224ms   54852us   24917us    1961ms
(a)              8G   373  99 75902  14 36985   8  1500  98 87602   9 152.1   4
Latency             21907us    2653ms    1116ms   16136us   49953us    1193ms
(a)              8G   379  98 75052  13 36818   8  1484  98 88407   9 147.4   4
Latency             22915us    2329ms    1042ms   36364us   98653us    1341ms
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
(a)              16   720   3 +++++ +++   689   2    19   0 +++++ +++   598   2
Latency               273ms     847us   83810us    7203ms      72us     259ms
(a)              16   711   3 +++++ +++   651   2   683   3 +++++ +++   536   2
Latency               250ms     822us     241ms     120ms      72us     332ms
(a)              16   688   3 +++++ +++   715   2   574   2 +++++ +++   646   2
Latency               148ms     818us   98531us     249ms      72us     209ms
```

In the next few days I will try to get a call stack of the blocked state (using SysRq), which I have missed so far.

I would be very happy about hints on what I could do to get, e.g., more helpful statistics or information from the kernel before reverting to ext3.

----------

## HeissFuss

Do you have CONFIG_TRANSPARENT_HUGEPAGE set in your kernel config?  I had similar issues until I disabled that.

----------

