# SOLVED: "bash invoked oom-killer" and "out of memory" errors

## saffi

Hey guys. After a system update and compiling a new kernel (2.6.22-gentoo-r5), I started having some problems. After the machine boots, when I try to log in from the console (text mode), it hangs after printing the "Last login: Tue Oct  9 09:11:51 2007 from tty1" message. The shell only becomes available after about 2 minutes, and when I try to enter ANY command, it takes every letter I type as a command. So things happen like this, suppose I want to input "exit":

 *Quote:*   

> eagle ~ $ e
> 
> -bash: e: command not found
> 
> eagle ~ $ x
> 
> -bash: x: command not found
> 
> eagle ~ $
> 
> eagle ~ $
> 
> eagle ~ $

I then thought of checking the metalog info. I rebooted and used init=/bin/bash at the grub menu, then read the /var/log/everything/current file, where I found the problem. I just don't know what to do to get rid of it. Any ideas? Re-compile my kernel without some option? Which one?

Below I post the contents of some files (also available here: http://www.las.ic.unicamp.br/saffi/gentoo/):

/etc/make.conf

 *Quote:*   

> # These settings were set by the catalyst build script that automatically
> 
> # built this stage.
> 
> # Please consult /etc/make.conf.example for a more detailed example.
> ...

 

2.6.22-gentoo-r5 kernel .config

 *Quote:*   

> #
> 
> # Automatically generated make config: don't edit
> 
> # Linux kernel version: 2.6.22-gentoo-r5
> ...

 

/etc/fstab

 *Quote:*   

> /etc/fstab:
> 
>     # /etc/fstab: static file system information.
> 
>     #
> ...

 

/var/log/everything/current on the next post

----------

## saffi

/var/log/everything/current

 *Quote:*   

> 
> 
> Oct  6 07:32:14 [kernel] [    0.000000] Linux version 2.6.22-gentoo-r5 (root@(none)) (gcc version 4.1.1 (Gentoo 4.1.1-r3)) #3 SMP Fri Oct 5 17:49:01 BRT 2007
> 
> Oct  6 07:32:14 [kernel] [    0.000000] BIOS-provided physical RAM map:
> ...

 

----------

## ferg

Hi,

I'm getting the same error.  I first noticed it during an emerge -e system following a GCC upgrade (> 4.2.1).  Following this, X died and kept restarting.

I'm not sure what's causing it, but later today I plan some investigation!

Cheers

Ferg

----------

## eccerr0r

1.  You should make a small swapfile, maybe 256MB or so to help the debug.

2.  What on earth are you doing with your memory?  Are you running something in the background?  Does 'top' report anything consuming all your RAM?  Are you using the RAMdisk?

3.  Last resort is a kernel memory leak...

----------

## ferg

I use a 3GB ramdisk (mounted on /tmp).  Your comment made me think about this.  It was only 10% full.  However, I ran rm -rf /tmp/, and reran emerge -e system.  Everything now works OK.

Nice one!

Cheers

Ferg

----------

## saffi

@ferg: I actually did make an "emerge -DNuva system". And later, as a suggestion of a friend trying to help me solve this, I made an "emerge -DeNv system". Anyway...

@eccerr0r:

1) I have 2GB of RAM, so I saw no reason to have a swap partition. Thing is, I won't shrink a partition to create a new one. Maybe I'll use a file as swap space.

2) What on earth am I doing with my memory? That's what I'm talking about. I'm doing NOTHING. This is happening on a fresh Gentoo installation. There is not even a graphical environment yet. The system comes up already complaining of a memory leak. And before anyone asks: my 2x1GB of RAM are fine. I dual boot with Windows and have already run the Gentoo LiveCD without problems. If that was a hardware issue, I guess Windows AND the LiveCD would have complained.

I don't remember about "top". When I get home I'll check it out. Thing is that with the system up, there's no way I can even run top. Only when I crack into it using the grub init=/bin/bash like I said.  :Sad: 

3) I'm pretty sure it's something I've chosen in my kernel configuration. Just don't know what.

@ferg (again): I will try to check my /tmp, though I doubt there's something related to that. When my system boots up, it cleans the /tmp folder.

Thanks for now ferg and eccerr0r.

Hugs!

----------

## ferg

I think you can rule out a swapfile as the problem: before I deleted the contents of /tmp, I was getting the error both with 5 GB of swap space and without any at all.

Swap is a good idea though, even if you have a lot of RAM.  If you get a runaway process, having swap space will allow it to be swapped out, rather than the kernel potentially running out of memory and starting to kill processes.  It may make the difference between a crash and surviving the problem. I use a swap file even though I have 6GB of RAM on my workstation.

Cheers

Ferg

----------

## ferg

I think you can safely discount anything I said before.  Clearing out the ramdisk was a red herring.  I removed the ramdisk and pointed my tmp directory to the hard drive.  I am still getting those error messages:

```
Total swap = 4891784kB
Free swap:       4891784kB
1835008 pages of RAM
1638400 pages of HIGHMEM
278729 reserved pages
7257 pages shared
0 pages swap cached
703 pages dirty
0 pages writeback
1445 pages mapped
4183 pages slab
149 pages pagetables
Out of memory: kill process 25278 (ebuild.sh) score 125 or a child
Killed process 31614 (make)
1835008 pages of RAM
1638400 pages of HIGHMEM
278729 reserved pages
7025 pages shared
0 pages swap cached
703 pages dirty
0 pages writeback
1445 pages mapped
4183 pages slab
149 pages pagetables
Out of memory: kill process 4966 (sh) score 124 or a child
Killed process 4968 (tic)
```

Further investigation methinks.

Cheers

ferg

----------

## eccerr0r

The OOM messages in dmesg are actually of marginal value in debug.

Running 'top' and hitting 'M' (shift-M to sort by memory) and also the 'free' command would be useful information.

Then /proc/slabinfo may help out finding kernel memory leaks.
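
For a box where an interactive 'top' is hard to keep alive, the same information can be grabbed in one non-interactive pass. This is only a rough sketch: `mem_summary` is a made-up helper name, the `ps` options assume the procps version of `ps`, and /proc/slabinfo is normally readable by root only.

```shell
#!/bin/bash
# Sketch: one-shot memory snapshot, as an alternative to interactive 'top'.

# Hypothetical helper: print a few key fields from a meminfo-style file.
# Defaults to /proc/meminfo; the argument lets it read a saved copy instead.
mem_summary() {
    awk '/^(MemTotal|MemFree|Slab|SwapTotal|SwapFree):/ { printf "%-10s %10d kB\n", $1, $2 }' "${1:-/proc/meminfo}"
}

mem_summary

# Ten largest processes by resident memory (assumes procps 'ps').
ps -eo pid,rss,comm --sort=-rss | head -n 11

# Largest kernel slab caches by object count (third column of slabinfo).
if [ -r /proc/slabinfo ]; then
    sort -k3,3nr /proc/slabinfo | head -n 10
fi
```

Redirecting the whole thing to a file on disk means the result survives even if the OOM killer takes out the shell right afterwards.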

----------

## saffi

@eccerr0r: Ok, I'll try. But I insist that it's gonna be difficult since I can't even type one single command (not even top) after the system goes up and I login. I will be able to do that if I crack into it by the grub init thing, but it won't be the same.

----------

## saffi

Ok, I'm at home now and just tried what you suggested:

Under the root user, I can't do anything. Every letter I type on the shell it takes as "exit" and it won't let me do anything.

I then logged in as "rsaffi", my username. Here's what happens:

 *Quote:*   

> This is eagle.unknown_domain (Linux i686 2.6.22-gentoo-r5) 17:53:15
> 
> eagle login: rsaffi
> 
> Password:
> ...

 

I'll now try to run "top" under the cracked-via-grub system and post the results (if possible) here. Though I still say that top under the cracked system is different than the one with the whole system up. BRB.

----------

## saffi

Under the cracked-via-grub system I get the expected top: both CPUs 100% idle, with bash being the process that takes the largest amount of RAM: 0.1%.

 *Quote:*   

> Mem: 2072368k total,     17608k used,     2054760k free,     300k buffers
> 
> Swap:      0k total,     0k used,     0k free,     2572 cached

 

Man, I'm pissed for not being able to use my Gentoo at home!

----------

## eccerr0r

You'll have to get the result from the malfunctioning setup.

Try creating the swapfile in single-user mode or hack-init mode and set /etc/conf.d/local.start to swapon to that swapfile.

Then try logging in and seeing if you can run top.  Try 256M to 512M or so, but probably hopeless past that.

And yes, swap is a good idea.  It will give you some breathing space when your machine spirals out of control.  As it requires more and more RAM it will take longer and longer to get more RAM (because HDD is slow) - and during that time you may get a command in before it gives up and OOM kills every process you start.
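
The swapfile steps can be sketched roughly as below. The path /swapfile, the 256 MB size and the helper name `swapfile_commands` are all assumptions for illustration. The function only prints the commands so they can be checked first; they need to be run as root.

```shell
#!/bin/bash
# Sketch: print the commands that create and enable a swap file.
# The default path and size are placeholders; adjust before running.
swapfile_commands() {
    local file=${1:-/swapfile} size_mb=${2:-256}
    # dd writes real zero-filled blocks rather than a sparse file,
    # since swapon does not support files with holes.
    cat <<EOF
dd if=/dev/zero of=$file bs=1M count=$size_mb
chmod 600 $file
mkswap $file
swapon $file
EOF
}

swapfile_commands /swapfile 256
```

Once the output looks right, piping it to `sh` as root runs it, and a `swapon /swapfile` line in /etc/conf.d/local.start would turn the swap back on at the next boot.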

----------

## erik258

 *Quote:*   

> And yes, swap is a good idea....

 

They're right.  At least that way it's so slow to keep swapping everything around on context changes that virtually nothing gets done -- usually, a quick but patient operator can eke out a 'top' before all hell breaks loose.  I would recommend you do put it in a file on disk -- I think, perhaps, you can use a sparse file for this.  At any rate, I bet it can be done 'well' or 'poorly'; as always, consult Google for more information.  This advice comes from someone who learned the hard way, that is to say, myself. 

```
emerge -DNuva system
```

A time-consuming effort, but one that usually isn't very helpful,  unless you change around the configuration for portage or some other relevant detail beforehand.  

 *Quote:*   

> Rebooted and used init=/bin/bash at grub menu. 

 

Nice trick!  I think the fact that it works probably proves your kernel is not the source of the problem.  What in init.d is starting on boot? 

Why not start some init scripts manually, and see when things bomb?
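
Something along these lines could help with starting the scripts one by one. It is a sketch only: it assumes the usual Gentoo layout where /etc/runlevels/default holds one entry per service, `list_services` is a made-up helper name, and it prints the commands instead of running them. From an init=/bin/bash shell some services may also refuse to start until their dependencies are up.

```shell
#!/bin/bash
# Sketch: list the services of a runlevel so they can be started by hand.
# RUNLEVEL_DIR assumes Gentoo's /etc/runlevels layout.
RUNLEVEL_DIR=${RUNLEVEL_DIR:-/etc/runlevels/default}

# Hypothetical helper: print one service name per line.
list_services() {
    local dir=$1 entry
    for entry in "$dir"/*; do
        # Skip the unexpanded pattern when the directory is empty or missing.
        [ -e "$entry" ] || continue
        basename "$entry"
    done
}

# Print one start command per service, each followed by a memory check,
# so the step where free memory collapses is easy to spot.
for svc in $(list_services "$RUNLEVEL_DIR"); do
    echo "/etc/init.d/$svc start && free -m"
done
```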

----------

## saffi

Ok, I solved the problem. I followed the suggestion I received at gentoo-user@gentoo.org and removed the call to bash_completion. Here's the whole story:

When I tried to turn off bash_completion by not invoking it when a shell opens (and it didn't work) I had an insight.  :Smile: 

My $HOME/.bash_profile makes these two calls:

    [[ -f $HOME/.bashrc ]] && . $HOME/.bashrc

    [[ -f /etc/bash_completion ]] && source /etc/bash_completion

My $HOME/.bashrc, among other things (regular ones), calls a script I have with many (many many) aliases and mods for my shells. So this is my .bashrc:

    [[ -f $HOME/.loginscript ]] && source $HOME/.loginscript

When I decided to comment the above call to $HOME/.loginscript, everything worked ok. No more memory leak problems, no more weird shell command interpretation. Everything started to work Super (as Al Pacino would say in Scent of a Woman).  :Smile: 

What do I have inside .loginscript? Here it is:

 *Quote:*   

>     #! /bin/bash
> 
>     #PS1='\[\033[01;32m\]\u@\h\[\033[01;34m\] \w \$\[\033[00m\] '
> 
>     DATE=$(date +"%Y-%m-%d")
> ...

 

Again: I already do this on other computers I work with and never had a problem with it. Two funny things: the very same sequence of scripts runs when I log in with my local user (rsaffi). No problem there, just with root.

Another funny thing: after commenting the "[[ -f $HOME/.loginscript ]] && source $HOME/.loginscript" from the .bashrc file, the problem was gone. Then, after I log in, I ran: "source ~/.loginscript" and had no problem. That's weird.
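
For anyone chasing a similar problem, one way to see what the login startup files do without actually logging in is to trace them under a memory cap. This is a sketch under assumptions, not what was done in this thread: the 256 MB limit and the trace log location are arbitrary picks, and it relies on bash's `-l` and `-x` options plus the `ulimit -v` builtin.

```shell
#!/bin/bash
# Sketch: trace what a login shell executes at startup, with a memory cap.
# TRACE_LOG is a placeholder location; any writable path will do.
TRACE_LOG=${TRACE_LOG:-$(mktemp)}

(
    # Cap virtual memory for this subshell (value in kB, here 256 MB) so a
    # runaway startup script fails fast instead of waking the OOM killer.
    ulimit -v 262144
    # -l: act as a login shell (reads /etc/profile and ~/.bash_profile)
    # -x: print every command to stderr as it is executed
    bash -l -x -c 'true' 2> "$TRACE_LOG"
)
echo "login shell exit status: $?"

# The last traced commands show where startup stopped or started looping.
tail -n 20 "$TRACE_LOG"
```

Running it from a root shell (for example the init=/bin/bash one) traces root's own startup files. If the tail of the log keeps repeating the same few lines, that part of the startup scripts is the place to look.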

Anyway, I appreciate your help, guys. Thanks for all the patience and tips.

Regards,

Saffi

----------

