# High Overhead and Kernel (mis?)Configuration Issues

## Mousee

Sorry if the topic name is a bit ambiguous/nondescript. This is a problem I've been trying to work out for over two weeks now without any real success, and I think I've narrowed it down to an issue with my kernel configuration - I'm just not sure where exactly.

Let me explain and I'll post the relevant information following it in hopes that someone can find where my fault lies. I apologize in advance if I'm too verbose or descriptive - I'm just trying to go over every detail in case someone catches something I may have missed. See the very end of this post for system information also.

In short: This post is about me trying to get rtorrent to stop crashing while hashing torrents over 8 gigs in size, and why I associate it with a possible kernel (mis-)configuration issue.

Problem:

Intro: I recently migrated to a dedicated server hosted by OVH/Kimsufi. In doing so I decided I wanted to start torrenting some of the programs I use and normally download via bittorrent (e.g. OpenOffice). I therefore installed rtorrent and its dependencies, modified the port number, home directory, and session directory in .rtorrent.rc, and fired rtorrent up all ready to start downloading and seeding. This worked really well until I closed rtorrent and reopened it. Once re-opened, rtorrent naturally started to re-hash each fully and partially downloaded torrent. At random percentages, usually less than half-way through, larger torrents of 8 gigs or more would completely crash my system. By completely crash I mean I had to perform a hard reboot on the system. I did later find out that after a long period of time rtorrent would actually crash on its own, but that was an hour or two later and my server was completely unusable until then.

Troubleshooting:

Filesystem: My first thought was that the file system was being overloaded. I'd set up a separate partition mounted as /storage and formatted it with ext4 initially. Now I'd read great things about using ext4 with torrent programs, especially where high speed downloads (100mbps/gigabit) and a lot of reading/writing to the disk are involved. But I decided to try a vanilla ext3 next and got the same results, only faster. I then did some tweaking with journaling and such on the ext3 partition hoping that would help, and again same result. I tried XFS as well, to no avail.

Libraries and System Packages: My next idea was, and mind you this is after a lot of Googling and reading the rtorrent bug posts, to completely re-emerge all of the packages on my system. Thus emerge -e world && sleep time (Zzz). Ran dispatch-conf and made sure all of my config files were up-to-date and correct, verified all normal packages were running fine (e.g. vim), ran revdep-rebuild and depclean to make sure the system was normal, and fired up rtorrent again. Same crashing/hanging/lockup issue once more.

Rtorrent Config: I already suspected a kernel issue by now but I figured I'd try tweaking the .rtorrent.rc file a bit to see if that helped any. I modified several values such as hash_read_ahead, hash_interval, max_memory_usage, etc. to try and reduce either memory or CPU overhead, as I'd noticed I was completely maxing out the 4 gigs of memory I had and CPU usage skyrocketed as soon as the hashing began. By skyrocketed I mean the load average was hitting 5.0-10.0 in a matter of a minute or so. All of that produced either the same or worse results.

Kernel Config: OVH offers several flavors of netboot kernels compiled by them for Gentoo. I tried a couple of these with the same results as my own compiled kernel. For my own kernel I disabled SMP and nearly all forms of debugging within the kernel, disabled any extra features/options enabled for the file systems I used, disabled all of the ATA drivers and any SATA drivers I didn't use, and I disabled all of the security modules/features I had compiled in. From reading several posts by Gentoo users with a similar problem I also set Deadline as my main I/O scheduler (via grub though, not in the kernel). Deadline actually produced semi-positive results in the sense that now my system didn't completely hang, but rtorrent hung and then crashed after only several minutes. Unfortunately the error I've gotten back from rtorrent isn't helpful in the slightest, as my hard drive has over 480 gigs free. The crash report is listed below at the end of this post along with my system information.
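
For anyone wanting to double-check which I/O scheduler actually ended up active after setting it via grub, here is a minimal sketch. It assumes the kernel lists the schedulers in /sys/block/<device>/queue/scheduler with the active one in square brackets; the helper name active_scheduler and the device name sda are only for illustration.

```shell
# Print the active I/O scheduler from a sysfs-style list read on stdin.
# The active entry is the one in square brackets, e.g. "noop deadline [cfq]".
# active_scheduler is a hypothetical helper name, not an existing tool.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Example (sda is an assumption; substitute the real disk):
#   active_scheduler < /sys/block/sda/queue/scheduler
```

If that prints cfq even though deadline was requested at boot, the grub setting probably didn't take.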

Memory: Just thought I should add this, but I did indeed use ulimit -n to increase the number of open files and such to something insane - but to no avail.

Operating System: This was a completely desperate idea and move on my part, but I thought perhaps I had configured my Gentoo system wrong, and I'd heard reports from others that rtorrent was working without these issues on Debian and Ubuntu. I therefore built a 32-bit chroot environment (my Gentoo system is x86_64) for Debian Lenny, built rtorrent from source, and attempted to hash one of my problem torrents with the same crashing/hanging results as before. I'm almost considering making it a bootable partition just to see if I get the same results.

Conclusion: Based on my troubleshooting steps above and information I've gathered from others, I'm led to believe that this *has* to be a kernel issue - unless I'm missing something obvious that just isn't hitting me in the face enough(?). My lack of knowledge of how the kernel handles times of such heavy overhead, together with the resulting rtorrent crash report, has left me without further ideas as to where to turn other than here. If someone could at least let me know if my kernel config *is* sane or not, that would be a great start. Let me know if any further info is needed.

Many thanks for any input anyone can give on this matter  :Smile: 

The following is a list of my system's information, including the rtorrent crash report.

Rtorrent Crash Report:

 *Quote:*   

> (21:30:47) Using 'epoll' based polling.
> 
> (21:30:47) XMLRPC initialized with 517 functions.
> ...

 

uname -a

```
Linux cheddar 2.6.30-gentoo-r6-Calvin09 #6 SMP Sat Sep 19 02:04:58 CEST 2009 x86_64 Intel(R) Xeon(R) CPU X3220 @ 2.40GHz GenuineIntel GNU/Linux
```

Storage Filesystem

```
Filesystem (currently) = ext4
/dev/sda7             606G   58G  542G  10% /storage
```

lspci

```
00:00.0 Host bridge: Intel Corporation 4 Series Chipset DRAM Controller (rev 03)
00:01.0 PCI bridge: Intel Corporation 4 Series Chipset PCI Express Root Port (rev 03)
00:02.0 VGA compatible controller: Intel Corporation 4 Series Chipset Integrated Graphics Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01)
00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01)
00:1f.2 IDE interface: Intel Corporation 82801GB/GR/GH (ICH7 Family) SATA IDE Controller (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 03)
```

emerge --info

```
Portage 2.1.6.13 (default/linux/amd64/10.0, gcc-4.3.4, glibc-2.10.1-r0, 2.6.30-gentoo-r6-Calvin09 x86_64)
=================================================================
System uname: Linux-2.6.30-gentoo-r6-Calvin09-x86_64-Intel-R-_Xeon-R-_CPU_X3220_@_2.40GHz-with-gentoo-1.12.12
Timestamp of tree: Mon, 21 Sep 2009 14:30:01 +0000
distcc 3.1 x86_64-pc-linux-gnu [disabled]
ccache version 2.4 [enabled]
app-shells/bash:     4.0_p28
dev-lang/python:     2.6.2-r1
dev-python/pycrypto: 2.0.1-r8
dev-util/ccache:     2.4-r8
sys-apps/baselayout: 1.12.12
sys-apps/sandbox:    1.6-r2
sys-devel/autoconf:  2.63-r1
sys-devel/automake:  1.5, 1.7.9-r1, 1.9.6-r2, 1.10.2
sys-devel/binutils:  2.19.1-r1
sys-devel/gcc-config: 1.4.1
sys-devel/libtool:   2.2.6a
virtual/os-headers:  2.6.27-r2
ACCEPT_KEYWORDS="amd64"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -march=nocona -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/udev/rules.d"
CXXFLAGS="-O2 -march=nocona -pipe"
DISTDIR="/opt/portage/distfiles"
EMERGE_DEFAULT_OPTS="--with-bdeps y --load-average=5 --quiet"
FEATURES="autoconfig candy ccache collision-protect distlocks fixpackages metadata-transfer nodoc noinfo noman parallel-fetch protect-owned sandbox sfperms strict unmerge-orphans userfetch userpriv usersandbox"
GENTOO_MIRRORS="http://gentoo.virginmedia.com/ http://mirror.qubenet.net/mirror/gentoo/ http://mirror.ovh.net/gentoo-distfiles/ ftp://mirror.ovh.net/gentoo-distfiles/ ftp://mirror.bytemark.co.uk/gentoo/"
LANG="us"
LC_ALL="en_US.utf8"
LDFLAGS=""
LINGUAS="en"
MAKEOPTS="-j5"
PKGDIR="/opt/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_EXTRA_OPTS="--exclude-from=/etc/portage/rsync_excludes"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage/layman/php-testing /usr/local/portage/layman/perl-experimental /usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="acl amd64 apache2 bash-completion bzip2 cli cracklib crypt erandom fastbuild fortran gdbm gif gpm hddtemp iconv jpeg logrotate mmx modules mudflap multilib ncurses netboot nls nptl nptlonly openmp pcre perl png pppd python readline reflection session spell spl sse sse2 ssl sysfs tcpd truetype unicode userlocales xml xorg zlib"   ELIBC="glibc" KERNEL="linux" LINGUAS="en" USERLAND="GNU"
Unset:  CPPFLAGS, CTARGET, FFLAGS, INSTALL_MASK, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS
```

Kernel Config: Link to Pastebin

EDIT: Additions below as requested by Pappy

lspci -n

lsusb

cat /etc/fstab

cat /proc/cpuinfo

Last edited by Mousee on Wed Sep 23, 2009 10:45 am; edited 1 time in total

----------

## hyena

Considering you have tested other kernels, I am reluctant to think it's a kernel config issue.

Before you get into system troubleshooting you could try to take the steps that rtorrent is doing yourself to see where the problem exists.  Also try another torrenting program to see how it survives the startup phase.

The bittorrent spec takes a SHA1 hash over chunks ("pieces") of the files, where the piece length varies from torrent to torrent, so hit some big files with sha1sum, or for giggles:

```
cd /usr/portage/distfiles
sha1sum *
```
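
To get a bit closer to what a client does when it re-checks a download, something like the following could read one big file piece by piece and hash each piece separately. This is only a sketch: hash_pieces is a made-up helper name, and the 256 KiB default piece size is an assumption (a real torrent records its own piece length).

```shell
# hash_pieces FILE [PIECE_KB]
# Print one SHA1 digest per fixed-size piece of FILE.
# hash_pieces is a hypothetical helper; 256 KiB is an assumed default.
hash_pieces() {
    file=$1
    piece_kb=${2:-256}
    piece_bytes=$((piece_kb * 1024))
    size=$(wc -c < "$file")
    # Round up so a short final piece is hashed too.
    pieces=$(( (size + piece_bytes - 1) / piece_bytes ))
    i=0
    while [ "$i" -lt "$pieces" ]; do
        # Read exactly one piece starting at piece index $i and hash it.
        dd if="$file" bs="$piece_bytes" skip="$i" count=1 2>/dev/null \
            | sha1sum | cut -d' ' -f1
        i=$((i + 1))
    done
}
```

Running it over one of the problem files on /storage while watching dmesg in another terminal should show whether plain reads plus hashing are enough to trigger the hang, with no torrent client involved.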

You may have to get creative on some IO stress tests before you can satisfy yourself that the issue is systemic.  Personally, my money is on a 64bit issue with rtorrent, rtorrent and libcurl not playing nice, or maybe a bit of both.

http://libtorrent.rakshasa.no/ticket/1648

http://libtorrent.rakshasa.no/ticket/1666

http://libtorrent.rakshasa.no/ticket/1463

In fact just traversing http://libtorrent.rakshasa.no/report/1 worries me some about the stability of said setup.

----------

## pappy_mcfae

There are definitely kernel issues. In addition to what you have posted, I'll need the results of lspci -n, lsusb, and cat /proc/cpuinfo as well as your /etc/fstab file.

Blessed be!

Pappy

----------

## Mousee

 *pappy_mcfae wrote:*   

> There are definitely kernel issues. In addition to what you have posted, I'll need the results of lspci -n, lsusb, and cat /proc/cpuinfo as well as your /etc/fstab file.
> 
> Blessed be!
> 
> Pappy

 

Attached em to the very end of my first post. Many thanks for looking into it Pappy  :Very Happy: 

@hyena - I actually forgot to include that I've tried several other bittorrent programs (which would have been smart, doh!). Namely bittornado and transmission, and I tried the torrentflux interface on both of those as well. I also installed, against my better judgment, XFCE4 with Wine and Utorrent, with much better results than with any of the other clients - but eventually it still crashed while hashing torrents larger than 8 gigs in size. It just seemed to take longer to crash than rtorrent or the others. Very odd. While I agree with you that it does indeed seem like a 64bit issue in general, I find it highly unlikely, as someone would surely have found a fix by now. There are far too many people using 64bit OS's these days for it to just go unpatched - assuming that was the issue. Thus I suspect it to be a kernel issue, as my research above hopefully indicates.

----------

## hyena

 *Quote:*   

> OVH offers several flavors of netboot kernels compiled by them for Gentoo. I tried a couple of these with the same results as my own compiled kernel. 

 

Well that is the line that draws my attention.  Why would so many fail at the same thing?  I'm not thinking that you are caught in some kinda grossly unloved area of 64bit development; I'm just stressing (as experience has led me) to cut the problem into chunks and think small, first.

Is it possible to see the kernel configs from the tests so that one could compare settings, and perhaps, versions?  You gotta love problems like this, they really teach you stuff.

----------

## Mousee

 *hyena wrote:*   

>  *Quote:*   OVH offers several flavors of netboot kernels compiled by them for Gentoo. I tried a couple of these with the same results as my own compiled kernel.  
> 
> Well that is the line that draws my attention.  Why would so many fail at the same thing?  I'm not thinking that you are caught in some kinda grossly unloved area of 64bit development; I'm just stressing (as experience has led me) to cut the problem into chunks and think small, first.
> 
> Is it possible to see the kernel configs from the tests so that one could compare settings, and perhaps, versions?  You gotta love problems like this, they really teach you stuff.

 

Heh, ya I've definitely learned a lot more about troubleshooting things I'd never thought I'd have to before.

I only wish I could get the kernel configs for OVH's netboot kernels. Unfortunately they don't provide them as far as I'm aware. I'm fairly certain they use genkernel anyways, so it wouldn't be of much help.  :Sad: 

----------

## pappy_mcfae

I have you set and ready. I dropped your kernel in favor of something a little more stable. 

Click here for your new .config. Compile as is.

For the best results, please do the following:

1) Move your .config file out of your kernel source directory (/usr/src/linux-2.6.30-gentoo-r6).

2) Issue the command make mrproper. This is a destructive step. It returns the source to pristine condition. Unmoved .config files will be deleted!

3) Copy my .config into your source directory.

4) Issue the command make && make modules_install.

5) Install the kernel as you normally would, and reboot.

6) Once it boots, please post /var/log/dmesg so I can see how things loaded.

7) After you start X, post /var/log/Xorg.0.log as well, and we'll go from there.
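
Steps 1 to 4 above could be scripted roughly like this. It's only a sketch: rebuild_kernel is a made-up helper name, the default backup location is an assumption, and installing the kernel image (step 5) is left out because that part depends on how you normally do it.

```shell
# rebuild_kernel SRC_DIR NEW_CONFIG [BACKUP_PATH]
# Hypothetical helper that follows steps 1-4: save the old .config,
# run mrproper, drop in the new .config, then build and install modules.
rebuild_kernel() {
    src=$1
    new_config=$2
    backup=${3:-$HOME/kernel-config.old}   # assumed backup location

    # Step 1: move the existing .config out of the tree so mrproper can't delete it.
    if [ -f "$src/.config" ]; then
        mv "$src/.config" "$backup" || return 1
    fi

    # Step 2: return the source tree to pristine condition (destructive).
    make -C "$src" mrproper || return 1

    # Step 3: copy the new .config into the source directory.
    cp "$new_config" "$src/.config" || return 1

    # Step 4: build the kernel and install its modules.
    make -C "$src" && make -C "$src" modules_install
}

# Example, using the source path from step 1 (new.config is a placeholder name):
#   rebuild_kernel /usr/src/linux-2.6.30-gentoo-r6 ~/new.config
```

Each step bails out early if the previous one failed, so a botched copy shouldn't end in a build with no .config in place.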

Blessed be!

Pappy

----------

## Mousee

DMESG Log:  http://chipsncheese.pastebin.com/m1a496ad4

Errr and being that it's a dedicated server I never had any intentions of leaving Xorg/XFCE4 on there, it was just for testing Utorrent, so ya... no Xorg or the like on there any longer  :Razz: 

Many thanks Pappy! Hopefully we can figure this out lol

I tried rtorrent once again after I pasted the dmesg log and definitely the same results, only rtorrent crashed a lot quicker this time and didn't completely hang the system, which was not my previous experience when using the CFQ scheduler. I'm going to give you a small snippet of the errors that rtorrent generates in both dmesg and /var/log/messages when it does hang... perhaps it'll be useful? Unfortunately I don't quite understand the output of it. Oh and the error continues to repeat itself over and over - thus only a snippet of the log, 'cause it's rather large and seems to be the same data.

/var/log/messages output: http://chipsncheese.pastebin.com/m341d21f0

----------

## DirtyHairy

That's not generated by rtorrent at all but by the kernel SATA driver; rtorrent is merely triggering it. Looks like a bad drive / controller / cabling / power supply to me...

----------

## Mousee

 *DirtyHairy wrote:*   

> That's not generated by rtorrent at all but by the kernel SATA driver; rtorrent is merely triggering it. Looks like a bad drive / controller / cabling / power supply to me...

 

It's being generated by the usage of rtorrent, is what I meant to say. And ya I'm beginning to think it could be bad hardware based on that. I'm just hoping/praying that's not the case as OVH has a really crappy backup system (aka FTP isn't a backup system to me) and that means paying more to offload my services elsewhere until they (hopefully) fix it.

----------

## DirtyHairy

I remember having a similar issue with an IDE RAID. Under heavy load, the controller would freak out, stop responding for some 60 seconds and then reset the port. After a lot of fiddling with power supply, cables etc. I replaced the controller and the problem was gone for good. If your problem only happens under heavy load, then it might not be related to the drive going bad, meaning that at least you won't be losing data  :Smile: 

----------

## pappy_mcfae

Good eyes, Dirty Hairy. Now is the time to check out Good ol' Bugzilla! Either write a bug for this, or see if someone else has beaten you to the punch. 

There is also the possibility that the error Dirty Hairy noted might be a hard drive or other hardware going bad.

Blessed be!

Pappy

----------

## hyena

Well I may have lost money on this one but something looks queer here.  Maybe because it doesn't look like an answer and so I feel cheated.

Someone is gonna ask and I'm sure at this point you're wanting more as well so try compiling all

The suspense it kills me.

```
Sep 24 12:21:51 cheddar [  457.548892] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 24 12:21:51 cheddar [  457.548894] ata3.00: BMDMA stat 0x25
Sep 24 12:21:51 cheddar [  457.548901] ata3.00: cmd 25/00:00:69:37:e7/00:04:2e:00:00/e0 tag 0 dma 524288 in
Sep 24 12:21:51 cheddar [  457.548902]          res 51/40:09:60:3a:e7/40:01:2e:00:00/ee Emask 0x9 (media error)
Sep 24 12:21:51 cheddar [  457.548906] ata3.00: status: { DRDY ERR }
Sep 24 12:21:51 cheddar [  457.548908] ata3.00: error: { UNC }
```

```
Action 0x0 counts re-attempts made.
```

if the bus were buggin' it would probably be climbing.

```
ata3.00: status: { DRDY ERR }
```

means it reinitialized with the quickness.  all good here

```
ata3.00: error: { UNC }
```

Uh oh you prolly got a bad disk.

Read more into the libata errata => http://ata.wiki.kernel.org/index.php/Libata_error_messages
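
Since the log keeps repeating, it might help to count how many of those UNC lines there are and on which ATA device. A rough sketch, assuming the log lines look like the snippet above; unc_by_device is just a name I made up:

```shell
# Count uncorrectable-media (UNC) error lines per ATA device in a kernel log
# read from stdin. unc_by_device is a hypothetical helper name.
unc_by_device() {
    awk '/ata[0-9]+\.[0-9]+: error: [{].*UNC/ {
        # Find the "ataN.MM:" field and tally it without the trailing colon.
        for (i = 1; i <= NF; i++)
            if ($i ~ /^ata[0-9]+\.[0-9]+:$/) {
                dev = $i
                sub(/:$/, "", dev)
                count[dev]++
            }
    }
    END { for (d in count) print d, count[d] }' | sort
}

# Example:
#   unc_by_device < /var/log/messages
```

If every hit lands on the same device, that points at one disk (or its port) and not at the whole bus.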

I just noticed both configs use SLUB, and since this might be a disk swapping issue (since this happens late in the game, like when swapping would be used) you can try a kernel with good ol' SLAB.  You can also try a fsck -vn to run a mock yet talkative file system check.  Otherwise create a new swap in, say, a file and place it somewhere other than the one you formatted earlier and not the present partition either, [dev/sda2] according to your logs.

Hey, it could be worse; you could be that guy who can't reproduce the errors and no one believes him.  

 *Quote:*   

> "But it's real" he says. 

 

I believe you Mousee.

----------

