# 2.6.25-gentoo-r9 is VERY slow [Solved]

## bfdi533

I just upgraded my kernel from 2.6.23-gentoo-r9 to 2.6.25-gentoo-r9.

Now that I have done this, every time a program starts, it has a 20-second pause before the program runs, longer if it is an X app.

What information do I need to share to help debug this slowdown?

Any ideas on why this would happen would be GREATLY appreciated.

Last edited by bfdi533 on Wed Dec 24, 2008 7:17 pm; edited 1 time in total

----------

## mgrela

Run the slow-starting program under "strace", like this:

```
strace bash
```

You may be able to spot the syscall that causes the wait and thus locate the problem.
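If the trace is captured with strace's `-T` option, each completed syscall line ends with the time spent in it in angle brackets (for example `<0.000021>`), so the slow call can be picked out mechanically. A minimal sketch, assuming the trace was written to a file with something like `strace -T -o /tmp/trace.txt bash`; the script and the `slowest_syscalls` name are hypothetical.

```python
import re
import sys

# Matches the "<seconds>" suffix that strace -T appends to each completed syscall.
DURATION = re.compile(r"<(\d+\.\d+)>\s*$")


def slowest_syscalls(lines, top=5):
    """Return the `top` slowest (seconds, line) pairs from strace -T output."""
    timed = []
    for line in lines:
        match = DURATION.search(line)
        if match:  # lines such as "exit_group(0) = ?" carry no duration
            timed.append((float(match.group(1)), line.rstrip()))
    # Largest durations first.
    timed.sort(key=lambda pair: pair[0], reverse=True)
    return timed[:top]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python slowest.py /tmp/trace.txt
    with open(sys.argv[1]) as trace:
        for seconds, line in slowest_syscalls(trace):
            print(f"{seconds:10.6f}  {line}")
```

A single `read()` or `open()` taking seconds would point at the disk rather than at the program itself.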

----------

## NeddySeagoon

bfdi533,

You are probably missing DMA for your hard drive.

Please report what `hdparm /dev/...` shows.

If it shows DMA is off, also post your `lspci` output so we can describe how to fix it.
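The line to look at in the `hdparm` output is `using_dma`. A minimal sketch that runs `hdparm` on a device and reports that flag; it assumes the older output format that appears in this thread (` using_dma     =  1 (on)`), running it normally needs root, and the function names are hypothetical.

```python
import re
import subprocess
import sys

# Matches lines such as " using_dma     =  1 (on)" from older hdparm versions.
USING_DMA = re.compile(r"^\s*using_dma\s*=\s*(\d+)", re.MULTILINE)


def dma_enabled(hdparm_output):
    """Return True/False for the using_dma flag, or None if it is absent."""
    match = USING_DMA.search(hdparm_output)
    if match is None:
        return None
    return match.group(1) != "0"


def check_device(device):
    """Run hdparm on `device` (usually requires root) and parse using_dma."""
    result = subprocess.run(["hdparm", device], capture_output=True, text=True)
    return dma_enabled(result.stdout)


if __name__ == "__main__":
    # Usage: python dmacheck.py /dev/hda /dev/hdb
    for dev in sys.argv[1:]:
        print(dev, "DMA on" if check_device(dev) else "DMA off or unknown")
```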

----------

## bfdi533

mgrela, strace is not showing anything significant that I can tell. top shows 80-95% "id", though I'm not sure what "id" is.

NeddySeagoon, here is the data requested:

```
localhost ~ # hdparm /dev/hda

/dev/hda:
 multcount     = 16 (on)
 IO_support    =  1 (32-bit)
 unmaskirq     =  1 (on)
 using_dma     =  1 (on)
 keepsettings  =  0 (off)
 readonly      =  0 (off)
 readahead     = 256 (on)
 geometry      = 65535/16/63, sectors = 78165360, start = 0
localhost ~ # hdparm /dev/hdb

/dev/hdb:
 multcount     = 16 (on)
 IO_support    =  1 (32-bit)
 unmaskirq     =  1 (on)
 using_dma     =  1 (on)
 keepsettings  =  0 (off)
 readonly      =  0 (off)
 readahead     = 256 (on)
 geometry      = 65535/16/63, sectors = 78198750, start = 0
localhost ~ # lspci
00:00.0 Host bridge: Intel Corporation 82875P/E7210 Memory Controller Hub (rev 02)
00:01.0 PCI bridge: Intel Corporation 82875P Processor to AGP Controller (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
00:1f.5 Multimedia audio controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) AC'97 Audio Controller (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation NV18GL [Quadro NVS with AGP8X] (rev a2)
02:0c.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02)
localhost ~ #
```

----------

## NeddySeagoon

bfdi533,

"id" in top is idle time.
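top's `id` (idle) and `wa` (I/O wait) percentages come from the cumulative `cpu` line in `/proc/stat`, whose first fields are user, nice, system, idle and iowait jiffies. A minimal sketch that samples that line twice and computes both percentages; it assumes a Linux `/proc` with the iowait field (2.6 kernels have it), and the function names are hypothetical. A high iowait share with little CPU use suggests processes blocked on the disk rather than anything burning CPU.

```python
import time


def read_cpu_times(stat_text):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of ints."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            # user nice system idle iowait irq softirq steal [guest ...];
            # guest time is already included in user, so keep the first 8 only.
            return [int(field) for field in line.split()[1:9]]
    raise ValueError("no aggregate cpu line found")


def idle_and_iowait_percent(before, after):
    """Percentage of elapsed jiffies spent idle and in iowait between samples."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas) or 1  # avoid division by zero on identical samples
    idle, iowait = deltas[3], deltas[4]
    return 100.0 * idle / total, 100.0 * iowait / total


def sample(interval=1.0):
    """Take two /proc/stat snapshots `interval` seconds apart (Linux only)."""
    with open("/proc/stat") as f:
        first = read_cpu_times(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        second = read_cpu_times(f.read())
    return idle_and_iowait_percent(first, second)
```

Calling `sample()` while a slow program is starting should show whether the "idle" time is really iowait.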

You have two drive controllers there:

00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)

00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02) 

With your hardware and that kernel, I would move to the libata driver, like this

It's not clear whether you have two IDE drives on the IDE controller, in which case things look OK, or two SATA drives on the SATA controller running with the old, deprecated IDE SATA driver.
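One way to see which driver a kernel was built with is to read its `.config`, in the same spirit as the `grep NOTIFY .config` used in this thread. A minimal sketch, assuming the ICH5 controllers are served by libata's `ata_piix` driver (`CONFIG_ATA_PIIX`) or by the old IDE layer's `piix` driver (`CONFIG_BLK_DEV_PIIX`), with libata disks appearing through the SCSI disk driver (`CONFIG_BLK_DEV_SD`); the helper names and the config path are assumptions, and the option list is illustrative rather than a complete recipe.

```python
import os
import re

# Matches "CONFIG_FOO=y", "CONFIG_FOO=m", 'CONFIG_FOO="str"' and the
# "# CONFIG_FOO is not set" form written by the kernel config tools.
SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Return {option: value}; explicitly unset options map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = SET.match(line) or UNSET.match(line)
        if match:
            options[match.group(1)] = match.group(2) if match.re is SET else "n"
    return options


def disk_drivers(options):
    """Report which ICH5-related disk drivers are built in (y) or modular (m)."""
    def enabled(name):
        return options.get(name, "n") in ("y", "m")

    return {
        "libata ata_piix": enabled("CONFIG_ATA_PIIX"),
        "old IDE piix": enabled("CONFIG_BLK_DEV_PIIX"),
        "SCSI disk (sd)": enabled("CONFIG_BLK_DEV_SD"),
    }


if __name__ == "__main__":
    path = "/usr/src/linux/.config"  # assumed location of the current config
    if os.path.exists(path):
        with open(path) as f:
            for name, on in disk_drivers(parse_config(f.read())).items():
                print(f"{name:16} {'yes' if on else 'no'}")
```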

----------

## bfdi533

 *NeddySeagoon wrote:*   

> With your hardware and that kernel, I would move to the libata driver, like this
> 
> It's not clear whether you have two IDE drives on the IDE controller, in which case things look OK, or two SATA drives on the SATA controller running with the old, deprecated IDE SATA driver.

 

I followed those directions and that seems a lot cleaner to me in the long run.

However, the system is now slower than before as you can see here:

```
user@localhost ~ $ time ls /var/log
XFree86.0.log      cups                 messages.1.gz  ntpd.log
XFree86.0.log.old  dmesg                messages.2.gz  portage
XFree86.20.log     dmesg.20050729       messages.3.gz  python-updater.log
XFree86.8.log      emerge-sync.log      messages.4.gz  remote
XFree86.8.log.old  emerge.log           messages.5.gz  samba
Xorg.0.log         emerge.log.20080427  messages.6.gz  sandbox
Xorg.0.log.old     emerge_fix-db.log    messages.7.gz  scrollkeeper.log.1.gz
Xorg.8.log         faillog              messages.8.gz  smsclient.log
Xorg.8.log.old     g-cpan               messages.9.gz  tor
apache2            galleon              mysql          wtmp
boinc.log          gdm                  mythtv         wtmp.1.gz
boinc.log.old      genkernel.log        news           xdm.log
boot.dmesg         lastlog              nmap-out.log
btmp               messages             ntp.log

real    0m9.206s
user    0m0.000s
sys     0m0.000s
user@localhost ~ $ time ls /var/log
XFree86.0.log      cups                 messages.1.gz  ntpd.log
XFree86.0.log.old  dmesg                messages.2.gz  portage
XFree86.20.log     dmesg.20050729       messages.3.gz  python-updater.log
XFree86.8.log      emerge-sync.log      messages.4.gz  remote
XFree86.8.log.old  emerge.log           messages.5.gz  samba
Xorg.0.log         emerge.log.20080427  messages.6.gz  sandbox
Xorg.0.log.old     emerge_fix-db.log    messages.7.gz  scrollkeeper.log.1.gz
Xorg.8.log         faillog              messages.8.gz  smsclient.log
Xorg.8.log.old     g-cpan               messages.9.gz  tor
apache2            galleon              mysql          wtmp
boinc.log          gdm                  mythtv         wtmp.1.gz
boinc.log.old      genkernel.log        news           xdm.log
boot.dmesg         lastlog              nmap-out.log
btmp               messages             ntp.log

real    0m0.003s
user    0m0.000s
sys     0m0.010s
user@localhost ~ $
```

However, I have managed to isolate one factor. When I access a file or "area of disk" that I have not used before, it is VERY slow. But if I do the same or a similar thing again, it is "normal" the next and subsequent times; see the second ls "run" above. Don't even think about something like emerge as things stand: it takes 30 minutes or so just to read the portage dependencies.

As it is now, my system has been up for 13 minutes but the startup/init scripts are still running.

Seems like some sort of cache issue.  Does any of that make sense?

----------

## eccerr0r

Is your hard drive making strange noises or otherwise failing?  Any SMART issues?

Are you -sure- there are no background tasks running, and does the old kernel exhibit proper behavior?

I'm having a hard time believing that any kernel change would cause a 9 second directory listing.

----------

## bfdi533

 *eccerr0r wrote:*   

> Is your hard drive making strange noises or otherwise failing?  Any SMART issues?
> 
> Are you -sure- there are no background tasks running, and does the old kernel exhibit proper behavior?
> 
> I'm having a hard time believing that any kernel change would cause a 9 second directory listing.

 

The reason I switched to the new kernel was that I was having trouble getting modules to load properly and could not track down the problem. Even after recompiling the kernel and the modules, I still had issues. So I switched to a newer kernel, and it seemed to go okay until I started using it more and realized there was a huge lag on disk access. I did not notice it at first and thought it was just services starting and X being slow. Turning off a few things like ossec seemed to fix the problem, but it only seemed that way.

There are no signs of this with the old kernel, and no background tasks that I can account for. The GNOME process monitor shows the CPU near 80-90% most of the time, but top shows it at about 5-10%, with 80% idle or 80% wait. No idea what the GNOME process monitor thinks is eating the CPU.

As for SMART, there are no issues that I am aware of:

```
localhost~ # smartctl --all /dev/sda
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar family
Device Model:     WDC WD400BB-75AUA1
Serial Number:    WD-WMA6R3065709
Firmware Version: 18.20D18
User Capacity:    40,020,664,320 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   5
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed Dec 17 12:22:31 2008 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (2040) seconds.
Offline data collection
capabilities:                    (0x1b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  32) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   200   198   051    Pre-fail  Always       -       1
  3 Spin_Up_Time            0x0007   111   104   021    Pre-fail  Always       -       3275
  4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always       -       611
  5 Reallocated_Sector_Ct   0x0032   198   198   112    Old_age   Always       -       7
  7 Seek_Error_Rate         0x000b   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   034   034   000    Old_age   Always       -       48372
 10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0013   100   100   051    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       467
196 Reallocated_Event_Count 0x0032   197   197   000    Old_age   Always       -       3
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       1
200 Multi_Zone_Error_Rate   0x0009   200   198   051    Pre-fail  Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%         0         -

Device does not support Selective Self Tests/Logging
localhost~ # smartctl --all /dev/sdb
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda ATA IV family
Device Model:     ST340016A
Serial Number:    3HS2Q6ZG
Firmware Version: 3.10
User Capacity:    40,037,760,000 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   5
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed Dec 17 12:26:45 2008 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 422) seconds.
Offline data collection
capabilities:                    (0x1b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  31) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   070   065   034    Pre-fail  Always       -       232989971
  3 Spin_Up_Time            0x0003   072   070   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       44
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   088   060   030    Pre-fail  Always       -       791011282
  9 Power_On_Hours          0x0032   060   060   000    Old_age   Always       -       35623
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       289
194 Temperature_Celsius     0x0022   042   056   000    Old_age   Always       -       42
195 Hardware_ECC_Recovered  0x001a   070   064   000    Old_age   Always       -       232989971
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         8         -

Device does not support Selective Self Tests/Logging
localhost ~ #
```

Here is the output from strace.  Maybe it will make sense to someone who can say why this is happening:

```
localhost ~ # cat /tmp/strace.ls
execve("/usr/bin/ls", ["ls", "/usr/src/linux"], [/* 51 vars */]) = 0
brk(0)                                  = 0x8063000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=187402, ...}) = 0
mmap2(NULL, 187402, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f7f000
close(3)                                = 0
open("/lib/librt.so.1", O_RDONLY)       = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340\30\0\0004\0\0\0\250"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=30632, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f7e000
mmap2(NULL, 33356, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7f75000
mmap2(0xb7f7c000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6) = 0xb7f7c000
close(3)                                = 0
open("/lib/libc.so.6", O_RDONLY)        = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0@a\1\0004\0\0\0\314"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1237356, ...}) = 0
mmap2(NULL, 1242576, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7e45000
mmap2(0xb7f6f000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x12a) = 0xb7f6f000
mmap2(0xb7f72000, 9680, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f72000
close(3)                                = 0
open("/lib/libpthread.so.0", O_RDONLY)  = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0 H\0\0004\0\0\0\320"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=84256, ...}) = 0
mmap2(NULL, 90592, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7e2e000
mmap2(0xb7e41000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x13) = 0xb7e41000
mmap2(0xb7e43000, 4576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7e43000
close(3)                                = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7e2d000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb7e2d6c0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xb7e41000, 4096, PROT_READ)   = 0
mprotect(0xb7f6f000, 8192, PROT_READ)   = 0
mprotect(0xb7f7c000, 4096, PROT_READ)   = 0
mprotect(0x8061000, 4096, PROT_READ)    = 0
mprotect(0xb7fc8000, 4096, PROT_READ)   = 0
munmap(0xb7f7f000, 187402)              = 0
set_tid_address(0xb7e2d708)             = 16951
set_robust_list(0xb7e2d710, 0xc)        = 0
rt_sigaction(SIGRTMIN, {0xb7e32320, [], SA_SIGINFO}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0xb7e323a0, [], SA_RESTART|SA_SIGINFO}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM_INFINITY}) = 0
uname({sys="Linux", node="ebdhome", ...}) = 0
brk(0)                                  = 0x8063000
brk(0x8084000)                          = 0x8084000
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, TIOCGWINSZ, {ws_row=25, ws_col=80, ws_xpixel=0, ws_ypixel=0}) = 0
stat64("/usr/src/linux", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
open("/usr/src/linux", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
fstat64(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
fcntl64(3, F_SETFD, FD_CLOEXEC)         = 0
getdents64(3, /* 42 entries */, 4096)   = 1312
getdents64(3, /* 0 entries */, 4096)    = 0
close(3)                                = 0
fstat64(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(4, 1), ...}) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fac000
write(1, "COPYING        Module.symvers  cr"..., 71) = 71
write(1, "CREDITS        README\t       driv"..., 57) = 57
write(1, "Documentation  REPORTING-BUGS  fs"..., 50) = 50
write(1, "Kbuild\t       System.map      inc"..., 57) = 57
write(1, "MAINTAINERS    arch\t       init\tn"..., 48) = 48
write(1, "Makefile       block\t       ipc\ts"..., 55) = 55
close(1)                                = 0
munmap(0xb7fac000, 4096)                = 0
close(2)                                = 0
exit_group(0)                           = ?
localhost ~ #
```

----------

## NeddySeagoon

bfdi533,

The two different times you posted are due to the kernel buffering disc reads in case the data is needed again.

Your first ls forces the kernel to read from the drive; the second one only reads the in-RAM cache.
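The effect is easy to reproduce by timing the same directory walk twice in a row. A minimal sketch; dropping the page, dentry and inode caches first (writing `3` to `/proc/sys/vm/drop_caches`) needs root, so the sketch only attempts it as root and otherwise measures whatever cache state already exists. The function names are hypothetical, and stat'ing every entry makes this closer to `ls -l` than to the plain `ls` traced above.

```python
import os
import time


def drop_caches():
    """Ask the kernel to drop clean caches; returns False if not permitted."""
    if not hasattr(os, "geteuid") or os.geteuid() != 0:
        return False
    try:
        os.sync()  # write dirty pages first so they can be dropped too
        with open("/proc/sys/vm/drop_caches", "w") as f:
            f.write("3\n")  # 3 = page cache plus dentries and inodes
    except OSError:
        return False  # e.g. read-only /proc/sys inside a container
    return True


def time_listing(path):
    """Seconds taken to list `path` and lstat every entry."""
    start = time.perf_counter()
    with os.scandir(path) as entries:
        for entry in entries:
            entry.stat(follow_symlinks=False)
    return time.perf_counter() - start


def cold_then_warm(path):
    """Return (caches_dropped, first_seconds, second_seconds)."""
    dropped = drop_caches()
    first = time_listing(path)   # may have to go to the disk
    second = time_listing(path)  # should be served from RAM
    return dropped, first, second
```

Run as root, `cold_then_warm("/var/log")` mirrors the two `time ls /var/log` runs: on a healthy disk the first figure is larger but still small, whereas several seconds points at the drive.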

I'm not sure what the data in the RAW_VALUE fields indicates, but /dev/sdb is clearly in a poor state.

Seek errors cause retries to read the data. A retry costs a single revolution of the disk at minimum, sometimes several.

If it also needs the head to be recalibrated, the retry process will take a lot longer.
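To put rough numbers on that: one revolution takes 60/rpm seconds. A worked sketch, assuming 7200 rpm drives (the spindle speed is not stated in the thread) and counting only the minimum one-revolution cost per retry; the function names are hypothetical.

```python
def revolution_seconds(rpm):
    """Time for one full platter revolution, in seconds."""
    return 60.0 / rpm


def retries_for_delay(delay_seconds, rpm, revolutions_per_retry=1):
    """How many retries at the given per-retry cost add up to `delay_seconds`."""
    return delay_seconds / (revolution_seconds(rpm) * revolutions_per_retry)


if __name__ == "__main__":
    rev = revolution_seconds(7200)  # assumed spindle speed
    print(f"one revolution at 7200 rpm: {rev * 1000:.2f} ms")
    # The 9.206 s first `ls /var/log` in this thread, if it were all retries:
    print(f"single-revolution retries in 9.206 s: {retries_for_delay(9.206, 7200):.0f}")
```

At 7200 rpm a revolution is about 8.3 ms, so a 9-second delay would correspond to roughly 1,100 single-revolution retries, or fewer if recalibration makes each retry more expensive.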

Hardware_ECC_Recovered errors mean the data was read from the platter incorrectly, but the drive electronics were subsequently able to correct the errors.

I would suggest that sdb is dying. It's been operating for 35623 hours, which is over 4 years nonstop. It's working hard to return valid data, using both error correction and retries. What the SMART data does not tell us is whether the errors occur all over the drive surface or in a small area that is read repeatedly. I'm inclined to think it's the former, as kernel caching should minimise the latter.

For a more thorough test, get the manufacturer's test software from their website. However, it will need to write all over the drive, so you will need to move your data off first.
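For reading the attribute table itself, the usual rule is that the normalized VALUE (and WORST, the lowest value seen) is compared against THRESH: an attribute is failing when it drops to or below its threshold, while RAW_VALUE is vendor-specific and not directly comparable between manufacturers. A minimal sketch that parses `smartctl --all` attribute rows in the layout posted in this thread and flags them; the set of raw counters to watch is an assumption, and the function names are hypothetical.

```python
# Raw counters that usually deserve attention when non-zero (an assumption,
# not an exhaustive or vendor-specific list).
WATCHED_RAW = {"Reallocated_Sector_Ct", "Current_Pending_Sector",
               "Offline_Uncorrectable", "Reallocated_Event_Count"}


def parse_attributes(smartctl_output):
    """Yield a dict for each row of the vendor-specific attributes table."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            yield {
                "id": int(fields[0]),
                "name": fields[1],
                "value": int(fields[3]),
                "worst": int(fields[4]),
                "thresh": int(fields[5]),
                "raw": fields[9],
            }


def concerns(smartctl_output):
    """Return human-readable warnings for a smartctl --all dump."""
    warnings = []
    for a in parse_attributes(smartctl_output):
        if a["thresh"] > 0 and a["value"] <= a["thresh"]:
            warnings.append(f"{a['name']}: value {a['value']} <= thresh {a['thresh']}")
        elif a["thresh"] > 0 and a["worst"] <= a["thresh"]:
            warnings.append(f"{a['name']}: worst {a['worst']} <= thresh {a['thresh']}")
        if a["name"] in WATCHED_RAW and a["raw"].isdigit() and int(a["raw"]) > 0:
            warnings.append(f"{a['name']}: raw count {a['raw']}")
    return warnings
```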

----------

## bfdi533

 *NeddySeagoon wrote:*   

> bfdi533,
> 
> The two different times you posted are due to the kernel buffering disc reads in case the data is needed again.
> 
> Your first ls forces the kernel to read from the drive; the second one only reads the in-RAM cache.

 

I can see that. It definitely explains why subsequent runs are faster.

 *NeddySeagoon wrote:*   

> I would suggest that sdb is dying. It's been operating for 35623 hours, which is over 4 years nonstop. It's working hard to return valid data, using both error correction and retries. What the SMART data does not tell us is whether the errors occur all over the drive surface or in a small area that is read repeatedly. I'm inclined to think it's the former, as kernel caching should minimise the latter.

 

Obviously I am not in a position to deny that. However, sda is where my root is, and /bin/ls and /var/log are both on sda. So, setting aside the failing sdb (which I see I need to fix), that does not explain why there is so much delay in the ls execution (and in other disk reads).

----------

## eccerr0r

 *bfdi533 wrote:*   

> The gnome process monitor shows that the CPU stays near 80-90% most of the time but top shows this to be about 5-10%, with 80% idle or 80% wait.
> 
> 

 

Just to confirm, what process is in iowait?  Is it zombied?  For sure something is consuming disk bandwidth.

Is your HDD LED always on?  Are there messages constantly being added to your log files?  Any mess in your dmesg(1)?

Is udevd being io-waited?  Is it in poll mode to look for new devices?  Are CONFIG_DNOTIFY and CONFIG_INOTIFY turned on?

----------

## bfdi533

 *eccerr0r wrote:*   

>  *bfdi533 wrote:*   The gnome process monitor shows that the CPU stays near 80-90% most of the time but top shows this to be about 5-10%, with 80% idle or 80% wait.
> 
>  
> 
> Just to confirm, what process is in iowait?  Is it zombied?  For sure something is consuming disk bandwidth.
> ...

 

I must admit that although I understand most of what you are asking, I do not know how to get you all of that info.

Kernel config NOTIFY options:

```
# grep NOTIFY .config
# CONFIG_I2O_LCT_NOTIFY_ON_CHANGES is not set
CONFIG_DNOTIFY=y
CONFIG_INOTIFY=y
CONFIG_INOTIFY_USER=y
```

udev:

```
# ps axl | grep udevd
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
5     0  8457     1  16  -4   2624  1376 -      S<s  ?          0:00 /sbin/udevd --daemon
```

```
# udev.conf
# The initial syslog(3) priority: "err", "info", "debug" or its
# numerical equivalent. For runtime debugging, the daemons internal
# state can be changed with: "udevcontrol log_priority=<value>".
udev_log="err"
# If you need to change mount-options, do it in /etc/fstab
```

dmesg does not show any problems. The log files are written to somewhat regularly but not constantly, just the normal stuff every couple of minutes like on any other Linux system.

How do I determine which process is in iowait and causing this wait state?

----------

## eccerr0r

Run `ps ax` and look for any processes whose STAT column shows "Z" or "D".
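The same check can be done straight from `/proc`: the third field of `/proc/<pid>/stat` is the process state, where `D` is uninterruptible sleep (usually waiting on disk I/O) and `Z` is a zombie. A minimal sketch; because the command name sits in parentheses and may itself contain spaces or `)`, the line is split at the last `)`. The function names are hypothetical.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from a /proc/<pid>/stat line."""
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren])
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 2]  # single character after ") "
    return pid, comm, state


def stuck_processes(proc="/proc", states=("D", "Z")):
    """List (pid, comm, state) for processes currently in the given states."""
    found = []
    for name in os.listdir(proc):
        if not name.isdigit():
            continue  # skip entries such as "self", "meminfo", ...
        try:
            with open(os.path.join(proc, name, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # process exited between listdir() and open(), or unreadable
        if state in states:
            found.append((pid, comm, state))
    return found


if __name__ == "__main__" and os.path.isdir("/proc"):
    for pid, comm, state in stuck_processes():
        print(pid, state, comm)
```

This is a single snapshot; a process that keeps turning up in `D` across several runs is the more telling sign.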

Also, `cat /proc/interrupts` and see whether any interrupts are "ringing off the hook". A screwed-up USB controller?
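A single `cat` only shows totals since boot, so a misbehaving interrupt is easier to spot by comparing two snapshots taken a second apart. A minimal sketch; the first line of `/proc/interrupts` lists the CPU columns and each following line starts with an IRQ label and one counter per CPU (summary lines such as `ERR:` carry a single number). The function names are hypothetical.

```python
import time


def parse_interrupts(text):
    """Map IRQ label -> total count across CPUs from /proc/interrupts text."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        label, rest = line.split(":", 1)
        counts = []
        for token in rest.split()[:ncpu]:
            if not token.isdigit():
                break  # reached the controller/device description
            counts.append(int(token))
        if counts:
            totals[label.strip()] = sum(counts)
    return totals


def busiest(before, after, top=5):
    """IRQs sorted by how many interrupts arrived between the two snapshots."""
    rates = {irq: after[irq] - before.get(irq, 0) for irq in after}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)[:top]


def sample(interval=1.0):
    """Snapshot /proc/interrupts twice (Linux only) and rank the IRQs."""
    with open("/proc/interrupts") as f:
        first = parse_interrupts(f.read())
    time.sleep(interval)
    with open("/proc/interrupts") as f:
        second = parse_interrupts(f.read())
    return busiest(first, second)
```

The timer will always show a steady rate; a USB or disk IRQ climbing by tens of thousands per second on an idle box is the kind of outlier worth chasing.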

----------

## bfdi533

It turned out to be just the hard drive. I copied everything to a new drive, replaced the old one, and the system is now zippy again. My guess is that the hard drive started having issues around the time of one of the kernel builds and reboots; I KNOW the problem coincided with the new kernel and reboot.

Thanks to all for the helpful tips and insight.

----------

