# IO hang for 60-90 seconds deleting many small files [solved]

## lyz

Can't seem to fix this.  Bear with me as this will be a lot of information.

The bottom line is that when deleting a unpacked kernel with rm -rf the system just hangs for 60-90 seconds.

I have two xfs filesystems that this has been tested on.  Both of them are inside of LVM.

One is on an ssd:

/dev/mapper/ssdvg0-usr on /usr type xfs (rw,noatime,nodiratime,discard,delaylog,logbufs=8,logbsize=256k,nobarrier)

One is on software raid consisting of two hard drives: 

/dev/mapper/vg-home on /home type xfs (rw,noatime,nodiratime,delaylog,logbufs=8,logbsize=32k)

Most of the mount options listed are from reading up on this for a few hours.   

I have changed the scheduler on the ssd to deadline.

The filesystem on the ssd is newer than the one on the disks:

```
core24 ~ # xfs_info /home

meta-data=/dev/mapper/vg-home    isize=256    agcount=108, agsize=1638400 blks

         =                       sectsz=512   attr=1

data     =                       bsize=4096   blocks=176783360, imaxpct=25

         =                       sunit=0      swidth=0 blks

naming   =version 2              bsize=4096   ascii-ci=0

log      =internal               bsize=4096   blocks=12800, version=1

         =                       sectsz=512   sunit=0 blks, lazy-count=0

realtime =none                   extsz=65536  blocks=0, rtextents=0

```

```

core24 ~ # xfs_info /usr 

meta-data=/dev/mapper/ssdvg0-usr isize=256    agcount=4, agsize=2621440 blks

         =                       sectsz=512   attr=2

data     =                       bsize=4096   blocks=10485760, imaxpct=25

         =                       sunit=0      swidth=0 blks

naming   =version 2              bsize=4096   ascii-ci=0

log      =internal               bsize=4096   blocks=5120, version=2

         =                       sectsz=512   sunit=0 blks, lazy-count=1

realtime =none                   extsz=4096   blocks=0, rtextents=0

```

There was a suggestion to run echo w > /proc/sysrq-trigger during a lockup .  Here is the output from that.

```
SysRq : Show Blocked State

  task                        PC stack   pid father

kswapd0         D 0000000100ef3f4f     0   530      2 0x00000000

 ffff8802150998a0 0000000000000046 0000000000000000 ffff880200000000

 0000000000010b00 ffff880216be6900 0000000000010b00 ffff880215099fd8

 0000000000004000 ffff880215098000 0000000000010b00 ffff880215099fd8

Call Trace:

 [<ffffffff8143e273>] ? ata_scsi_queuecmd+0xb3/0x280

 [<ffffffff813194f7>] ? kobject_put+0x27/0x60

 [<ffffffff813ddf52>] ? put_device+0x12/0x20

 [<ffffffff81413aca>] ? scsi_request_fn+0xaa/0x3c0

 [<ffffffff81632c2a>] schedule+0x3a/0x50

 [<ffffffff81633115>] schedule_timeout+0x1a5/0x200

 [<ffffffff81301d6f>] ? generic_make_request+0xbf/0x100

 [<ffffffff81301e25>] ? submit_bio+0x75/0xf0

 [<ffffffff81631fb0>] wait_for_common+0xc0/0x150

 [<ffffffff81043180>] ? try_to_wake_up+0x260/0x260

 [<ffffffff816320e8>] wait_for_completion+0x18/0x20

 [<ffffffff8126d698>] xfs_buf_iowait+0x18/0x30

 [<ffffffff8126e12d>] xfs_bwrite+0x2d/0x60

 [<ffffffff812ae1ff>] xfs_iflush+0x1ff/0x250

 [<ffffffff8127c8eb>] xfs_reclaim_inode+0x1cb/0x2a0

 [<ffffffff8127d347>] xfs_reclaim_inodes_ag+0x207/0x340

 [<ffffffff8127d5fe>] xfs_reclaim_inodes_nr+0x2e/0x40

 [<ffffffff81279ff0>] xfs_fs_free_cached_objects+0x10/0x20

 [<ffffffff810f2c81>] prune_super+0x101/0x1b0

 [<ffffffff810bd685>] shrink_slab+0x135/0x1f0

 [<ffffffff810bfdb2>] kswapd+0x662/0x970

 [<ffffffff8103b6a9>] ? __wake_up_common+0x59/0x90

 [<ffffffff81068e60>] ? wake_up_bit+0x40/0x40

 [<ffffffff810bf750>] ? shrink_zone+0x4e0/0x4e0

 [<ffffffff810689a6>] kthread+0x96/0xa0

 [<ffffffff81637574>] kernel_thread_helper+0x4/0x10

 [<ffffffff81068910>] ? kthread_worker_fn+0x190/0x190

 [<ffffffff81637570>] ? gs_change+0xb/0xb

xfsaild/dm-1    D 0000000000000001     0  4797      2 0x00000000

 ffff880215561d30 0000000000000046 ffff880215561d40 ffff880214afbda8

 0000000000010b00 ffff8802151f8d20 0000000000010b00 ffff880215561fd8

 0000000000004000 ffff880215560000 0000000000010b00 ffff880215561fd8

Call Trace:

 [<ffffffff8103c67d>] ? cpuacct_charge+0x5d/0x70

 [<ffffffff812be78e>] ? xlog_cil_force_lsn+0x8e/0x110

 [<ffffffff81632c2a>] schedule+0x3a/0x50

 [<ffffffff812bcf2f>] _xfs_log_force+0x21f/0x280

 [<ffffffff81043180>] ? try_to_wake_up+0x260/0x260

 [<ffffffff81058a20>] ? run_timer_softirq+0x240/0x240

 [<ffffffff812bcfa3>] xfs_log_force+0x13/0x40

 [<ffffffff812c0861>] xfsaild+0x131/0x550

 [<ffffffff8103b6a9>] ? __wake_up_common+0x59/0x90

 [<ffffffff812c0730>] ? xfs_trans_ail_cursor_first+0xb0/0xb0

 [<ffffffff810689a6>] kthread+0x96/0xa0

 [<ffffffff81637574>] kernel_thread_helper+0x4/0x10

 [<ffffffff81068910>] ? kthread_worker_fn+0x190/0x190

 [<ffffffff81637570>] ? gs_change+0xb/0xb

gnome-terminal  D ffff880216be6270     0  4192      1 0x00000000

 ffff88016b20f8f8 0000000000000086 ffff8801b6b1a780 0000000100ef4356

 0000000000010b00 ffff880216be6270 0000000000010b00 ffff88016b20ffd8

 0000000000004000 ffff88016b20e000 0000000000010b00 ffff88016b20ffd8

Call Trace:

 [<ffffffff8103d4c4>] ? dequeue_entity+0x104/0x290

 [<ffffffff81632c2a>] schedule+0x3a/0x50

 [<ffffffff81633115>] schedule_timeout+0x1a5/0x200

 [<ffffffff810b7b4d>] ? __alloc_pages_nodemask+0x10d/0x700

 [<ffffffff810430dc>] ? try_to_wake_up+0x1bc/0x260

 [<ffffffff81631fb0>] wait_for_common+0xc0/0x150

 [<ffffffff81043180>] ? try_to_wake_up+0x260/0x260

 [<ffffffff816320e8>] wait_for_completion+0x18/0x20

 [<ffffffff81062d9b>] flush_work+0x2b/0x40

 [<ffffffff810617c0>] ? do_work_for_cpu+0x30/0x30

 [<ffffffff81100101>] ? set_close_on_exec+0x41/0xc0

 [<ffffffff81350ca0>] tty_flush_to_ldisc+0x10/0x20

 [<ffffffff8134b3b2>] n_tty_poll+0x62/0x1d0

 [<ffffffff81347c7a>] tty_poll+0x8a/0xa0

 [<ffffffff81103488>] do_sys_poll+0x268/0x4f0

 [<ffffffff81102210>] ? poll_freewait+0xb0/0xb0

 [<ffffffff81102300>] ? __pollwait+0xf0/0xf0

 [<ffffffff81102300>] ? __pollwait+0xf0/0xf0

 [<ffffffff81102300>] ? __pollwait+0xf0/0xf0

 [<ffffffff81102300>] ? __pollwait+0xf0/0xf0

 [<ffffffff81102300>] ? __pollwait+0xf0/0xf0

 [<ffffffff81102300>] ? __pollwait+0xf0/0xf0

 [<ffffffff81102300>] ? __pollwait+0xf0/0xf0

 [<ffffffff81102300>] ? __pollwait+0xf0/0xf0

 [<ffffffff81102300>] ? __pollwait+0xf0/0xf0

 [<ffffffff81071328>] ? ktime_get_ts+0xa8/0xe0

 [<ffffffff811024f2>] ? poll_select_set_timeout+0x82/0x90

 [<ffffffff811037e7>] sys_poll+0x77/0x100

 [<ffffffff816356fb>] system_call_fastpath+0x16/0x1b

kworker/1:0     D 0000000100ef43a7     0 21546      2 0x00000000

 ffff8801bab3ba80 0000000000000046 ffff8801bab3b980 ffffffff00000000

 0000000000010b00 ffff880001e26270 0000000000010b00 ffff8801bab3bfd8

 0000000000004000 ffff8801bab3a000 0000000000010b00 ffff8801bab3bfd8

Call Trace:

 [<ffffffff8143cab0>] ? ata_scsi_rw_xlat+0x1d0/0x1d0

 [<ffffffff8143ad61>] ? ata_scsi_translate+0xa1/0x180

 [<ffffffff8143e273>] ? ata_scsi_queuecmd+0xb3/0x280

 [<ffffffff81632c2a>] schedule+0x3a/0x50

 [<ffffffff81633115>] schedule_timeout+0x1a5/0x200

 [<ffffffff8130484b>] ? blk_queue_bio+0x22b/0x2d0

 [<ffffffff81631fb0>] wait_for_common+0xc0/0x150

 [<ffffffff81043180>] ? try_to_wake_up+0x260/0x260

 [<ffffffff816320e8>] wait_for_completion+0x18/0x20

 [<ffffffff813097ba>] blkdev_issue_discard+0x1aa/0x1f0

 [<ffffffff812b5b42>] ? xfs_perag_get+0x42/0x50

 [<ffffffff8126ef82>] xfs_discard_extents+0x72/0xc0

 [<ffffffff812bdc8d>] xlog_cil_committed+0xed/0x120

 [<ffffffff812bb648>] xlog_state_do_callback+0x178/0x2e0

 [<ffffffff812bb827>] xlog_state_done_syncing+0x77/0xa0

 [<ffffffff8126d320>] ? xfs_buf_delwri_split+0x140/0x140

 [<ffffffff812bb89c>] xlog_iodone+0x4c/0xa0

 [<ffffffff8126d341>] xfs_buf_iodone_work+0x21/0x50

 [<ffffffff81061c92>] process_one_work+0x112/0x390

 [<ffffffff81063ef1>] worker_thread+0x161/0x340

 [<ffffffff8103b6a9>] ? __wake_up_common+0x59/0x90

 [<ffffffff81063d90>] ? manage_workers.clone.26+0x240/0x240

 [<ffffffff810689a6>] kthread+0x96/0xa0

 [<ffffffff81637574>] kernel_thread_helper+0x4/0x10

 [<ffffffff81068910>] ? kthread_worker_fn+0x190/0x190

 [<ffffffff81637570>] ? gs_change+0xb/0xb

kworker/0:6     D ffff880001e22df0     0  1758      2 0x00000000

 ffff880054829980 0000000000000046 ffff880054829990 ffffffff81632413

 0000000000010b00 ffff880001e22df0 0000000000010b00 ffff880054829fd8

 0000000000004000 ffff880054828000 0000000000010b00 ffff880054829fd8

Call Trace:

 [<ffffffff81632413>] ? __schedule+0x323/0x9b0

 [<ffffffff810430dc>] ? try_to_wake_up+0x1bc/0x260

 [<ffffffff81632c2a>] schedule+0x3a/0x50

 [<ffffffff812bc0ee>] xlog_state_get_iclog_space+0xde/0x2f0

 [<ffffffff81043180>] ? try_to_wake_up+0x260/0x260

 [<ffffffff8104555e>] ? find_busiest_group+0x1ae/0xc50

 [<ffffffff812bc61f>] xlog_write+0x1bf/0x730

 [<ffffffff81282245>] ? kmem_zone_zalloc+0x35/0x50

 [<ffffffff812bd4e1>] ? xlog_ticket_alloc+0x101/0x160

 [<ffffffff812bdeef>] xlog_cil_push+0x22f/0x410

 [<ffffffff8103acff>] ? walk_tg_tree_from+0x6f/0xd0

 [<ffffffff812be804>] xlog_cil_force_lsn+0x104/0x110

 [<ffffffff810e8f65>] ? kmem_cache_free+0x15/0xb0

 [<ffffffff812b7f9e>] ? xfs_trans_free_item_desc+0x2e/0x30

 [<ffffffff812b9317>] ? xfs_trans_free_items+0x87/0xb0

 [<ffffffff812bd020>] _xfs_log_force_lsn+0x50/0x2d0

 [<ffffffff812be698>] ? xfs_log_commit_cil+0x348/0x3b0

 [<ffffffff810e8f65>] ? kmem_cache_free+0x15/0xb0

 [<ffffffff812b9c2b>] xfs_trans_commit+0x27b/0x290

 [<ffffffff8127c3a0>] ? xfs_sync_inode_attr+0xf0/0xf0

 [<ffffffff812727fd>] xfs_fs_log_dummy+0x5d/0x90

 [<ffffffff8127c424>] xfs_sync_worker+0x84/0x90

 [<ffffffff81061c92>] process_one_work+0x112/0x390

 [<ffffffff81063ef1>] worker_thread+0x161/0x340

 [<ffffffff8103b6a9>] ? __wake_up_common+0x59/0x90

 [<ffffffff81063d90>] ? manage_workers.clone.26+0x240/0x240

 [<ffffffff810689a6>] kthread+0x96/0xa0

 [<ffffffff81637574>] kernel_thread_helper+0x4/0x10

 [<ffffffff81068910>] ? kthread_worker_fn+0x190/0x190

 [<ffffffff81637570>] ? gs_change+0xb/0xb

emerge          D 0000000100ef4044     0  2613   2609 0x00000004

 ffff88020c4f99e8 0000000000000086 ffff88020c4f98d8 ffff880100000000

 0000000000010b00 ffff88020fcdc830 0000000000010b00 ffff88020c4f9fd8

 0000000000004000 ffff88020c4f8000 0000000000010b00 ffff88020c4f9fd8

Call Trace:

 [<ffffffff81301e25>] ? submit_bio+0x75/0xf0

 [<ffffffff8126c20d>] ? _xfs_buf_ioapply+0x17d/0x1f0

 [<ffffffff81632c2a>] schedule+0x3a/0x50

 [<ffffffff812bc0ee>] xlog_state_get_iclog_space+0xde/0x2f0

 [<ffffffff81043180>] ? try_to_wake_up+0x260/0x260

 [<ffffffff812bc008>] ? xlog_state_release_iclog+0xd8/0xe0

 [<ffffffff812bc61f>] xlog_write+0x1bf/0x730

 [<ffffffff81282245>] ? kmem_zone_zalloc+0x35/0x50

 [<ffffffff812bdeef>] xlog_cil_push+0x22f/0x410

 [<ffffffff810e8f65>] ? kmem_cache_free+0x15/0xb0

 [<ffffffff812be6ba>] xfs_log_commit_cil+0x36a/0x3b0

 [<ffffffff812b9b89>] xfs_trans_commit+0x1d9/0x290

 [<ffffffff8127790b>] xfs_setattr_nonsize+0x37b/0x580

 [<ffffffff81277eeb>] xfs_vn_setattr+0x1b/0x40

 [<ffffffff8110a68a>] notify_change+0x19a/0x360

 [<ffffffff810ee10c>] chmod_common+0x6c/0xa0

 [<ffffffff810eeb24>] sys_fchmodat+0x34/0x50

 [<ffffffff810eeb53>] sys_chmod+0x13/0x20

 [<ffffffff816356fb>] system_call_fastpath+0x16/0x1b
```

uname -a

```
Linux core24 3.2.5-gentoo #11 SMP PREEMPT Sat Feb 11 15:46:22 EST 2012 x86_64 Intel(R) Core(TM)2 Quad CPU Q6700 @ 2.66GHz GenuineIntel GNU/Linux
```

Motherboard is a GA-EP45-UD3P.  All drives are connected to the jmicron sata controller.

```

00:00.0 Host bridge: Intel Corporation 4 Series Chipset DRAM Controller (rev 03)

00:01.0 PCI bridge: Intel Corporation 4 Series Chipset PCI Express Root Port (rev 03)

00:1a.0 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4

00:1a.1 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5

00:1a.2 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6

00:1a.7 USB controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2

00:1b.0 Audio device: Intel Corporation 82801JI (ICH10 Family) HD Audio Controller

00:1c.0 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 1

00:1c.3 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 4

00:1c.4 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 5

00:1c.5 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 6

00:1d.0 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1

00:1d.1 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2

00:1d.2 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3

00:1d.7 USB controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1

00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)

00:1f.0 ISA bridge: Intel Corporation 82801JIR (ICH10R) LPC Interface Controller

00:1f.2 SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA AHCI Controller

00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller

01:00.0 VGA compatible controller: nVidia Corporation Device 0e23 (rev a1)

01:00.1 Audio device: nVidia Corporation GF104 High Definition Audio Controller (rev a1)

03:00.0 SATA controller: JMicron Technology Corp. JMB363 SATA/IDE Controller (rev 02)

03:00.1 IDE interface: JMicron Technology Corp. JMB363 SATA/IDE Controller (rev 02)

04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)

05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)

06:07.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link)

```

BIOS set to AHCI.

Logs and dmesg show nada.

I am flat out stuck, just like the i/o on my system.

Any other things to look at?Last edited by lyz on Mon Mar 12, 2012 7:25 pm; edited 1 time in total

----------

## roarinelk

can you test each disk individually (i.e. without LVM) ?  So we could narrow it down

to either the SSD firmware being buggy (NCQ), a bug in LVM, or buggy SATA controller (many are

known to have subtle problems with NCQ).

 I'm no expert but judging from the backtraces it seems that command submission to

one of the drives is stuck. So either buggy firmware on one of the disks or buggy

SATA controller/driver or both).

----------

## lyz

I am able to test the two hard disks without LVM, but not the SSD.  When just mounted alone, the disks fly.

/dev/sda2 on /media/temp/sda type xfs (rw,noatime,nodiratime,delaylog,logbufs=8,logbsize=256k)

/dev/sdb2 on /media/temp/sdb type xfs (rw,noatime,nodiratime,delaylog,logbufs=8,logbsize=256k)

core24 sda # time rm -rf linux-3.2.6-gentoo/

real	0m1.452s

user	0m0.000s

sys	0m1.420s

core24 sdb # time rm -rf linux-3.2.6-gentoo/

real	0m1.690s

user	0m0.060s

sys	0m1.490s

My home mount is much slower.  It is striped raid and LVM.

core24 lyz # pwd  

/home/lyz

core24 lyz # time rm -rf linux-3.2.6-gentoo/

real	0m6.631s

user	0m0.000s

sys	0m1.670s

The ssd time is terrible and has the freezes.  There isn't a way I can test this outside of LVM without a few resize commands, I'm hoping that I don't have to do that.

/dev/mapper/ssdvg0-usr on /usr type xfs (rw,noatime,nodiratime,discard,delaylog,logbufs=8,logbsize=256k,nobarrier)

core24 usr # time rm -rf linux-3.2.6-gentoo/

real	4m5.474s

user	0m0.040s

sys	0m2.000s

When a freeze occurs, I cannot save a document to /home.   That's one of the things that is throwing me off.  Gonna reboot and try without the discard option.

----------

## lyz

It was discard!

/dev/mapper/ssdvg0-usr on /usr type xfs (rw,noatime,nodiratime,delaylog,logbufs=8,logbsize=256k,nobarrier)

core24 src # time rm -rf linux-3.2.6-gentoo2/

real	0m1.702s

user	0m0.050s

sys	0m1.570s

I just the question now what is the impact of turning the discard mount option off?   I read a few things on it.  Isn't is supposed to make things faster?

----------

## invasivenorman

From reading around a bit, it isn't well-tested at this point.  Also, your SSD has to support TRIM for discard to make any sense.

----------

## lyz

Thanks for looking it up.  I didn't mention the type of SSD.  It is a PP120GS25SSDR.  It does come with trim support.

----------

## lyz

I did a test on ext4.  discard works fine.  By process of elimination, that would mean the problem is xfs combined with discard.  It looks like 

```

/dev/mapper/ssdvg0-testLV on /media/temp type ext4 (rw,noatime,nodiratime,discard)

core24 temp # cp -R /usr/src/linux-3.2.6-gentoo/ .                  

core24 temp # time rm -rf linux-3.2.6-gentoo/

real   0m0.943s

user   0m0.050s

sys   0m0.830s

```

Reformatted as xfs

```

/dev/mapper/ssdvg0-testLV on /media/temp type xfs (rw,noatime,nodiratime,discard)

core24 temp # time rm -rf linux-3.2.6-gentoo/

real   5m7.139s

user   0m0.080s

sys   0m1.580s

```

I am going to post something to their mailing list to see what is up.

----------

## lyz

The result from a discussion on their mailing list is, essentially, don't mount a drive using discard.  Run fstrim using cron instead.

----------

