# WebCVS + SMP + Sandbox Problem -> CRASH + DEAD SYSTEM

## Ondrej

Hi everyone,

We run Gentoo Linux on a dual-CPU (Pentium II 350) system. The kernel is 2.4.19-gentoo-r5 with SMP enabled and with the preemption and low-latency patches disabled.

On a newly installed system, `emerge webcvs` crashes during the configure step:

```
./conf.sh: configure has_map_fd, has_mmap, has_madvise, mmap_signal ... kernel BUG at filemap.c:130!
invalid operand: 0000
CPU:    0
EIP:    0010:[<c012b35c>]    Not tainted
EFLAGS: 00010282
eax: 0000002a    ebx: c11289a8   ecx: c039c640   edx: c797bcb0
esi: c11289a8    edi: c797bd6c   ebp: 00000000   esp: c797bd08
ds: 0018   es: 0018    ss: 0018
Process a.out (pid: 22757, stackpage=c797b000)
... more ...
```

The system's dead.

After a restart, Portage is broken. Emerge still rsyncs, but no package will emerge; instead, emerge freezes after it calculates the dependencies. NOTE: this has already been discussed at https://forums.gentoo.org/viewtopic.php?t=2529&start=0&postdays=0&postorder=asc&highlight=

Changing FEATURES in /etc/make.conf from sandbox to -sandbox works around the problem.

However, even after re-emerging Portage, sandbox is still broken. This appears to be a kernel issue: refer to /usr/src/linux/mm/filemap.c:130 to see the problem for yourselves.

PLEASE MAKE PORTAGE USABLE IN THE EVENT THAT SOMETHING LIKE THIS HAPPENS!!!

----------

## delta407

Alternatively, Gentoo could refrain from officially recommending a kernel prepatch, which (though more stable than 2.5) is not nearly as stable as the latest kernel release (vanilla-sources, 2.4.18).

----------

## Ondrej

We are aware of the risks of using a non-official kernel; however, the point of the message was to illustrate the need for Portage to improve. How is it possible for a kernel bug to effectively and permanently disable part of Portage's functionality (sandbox)?

----------

## delta407

Well, when the kernel gets toasted, you really can't expect user-mode apps to stand a chance. When your system went up in flames, there wasn't much Portage could have done to recover. So if Portage was killed at an inopportune moment (while sandboxing), strange and horrible things can result.

----------

## Ondrej

Well, Portage works for the most part, except for sandbox. What is the danger of not using sandbox? Nowhere is it explained in depth what exactly it does.

Also, interestingly, even after re-emerging Portage it is still broken. (One guy from the original thread even re-emerged his entire system!) If you look again at the EXEC process trace, it just dies when it runs `rm -rf ...`. I'm wondering whether XFS is the problem, since the crash was in the __remove_inode_page function after all.

----------

## bcressey

I encountered a very similar problem, I think. My system is a dual Athlon MP 1600+ running kernel-2.4.18-xfs; all of my partitions (except boot/swap) are XFS.

Last night, while I was emerging sox, my system died with a kernel crash. Having never seen one before, I wasn't sure what to look for, although I believe it mentioned something about swap.

There is an error message in my kernel logs which seems related:

```
Jun  8 00:02:00 [kernel] swap_free: Bad swap file entry 20747562
Jun  8 06:15:01 [kernel] swap_free: Bad swap file entry 07200720
```

I am inclined to doubt that this is caused by a faulty hard disk, since I'm running a mirrored RAID setup and haven't seen any warnings from either drive. 

Anyhow, after a reboot I couldn't emerge anything; sandbox would hang and begin devouring system resources. Following a suggestion in the other thread and in README.RESCUE, I unpacked the portage bz2 file, which had the nasty side effect of rendering my system unbootable.

At that point I booted off the rescue CD, xfs_repair'd each partition, and restored my backups of /usr, /bin, /sbin, /lib, /etc, and /var. They are only a few days old, and I know sandbox worked at the time I made the backups. 

However, sandbox still fails. Can anyone out there hazard a guess as to why? More importantly, how do I fix it, if restoring all the related files on top of the broken ones doesn't solve it?

Ben

----------

## Ondrej

Hi,

I've been trying to figure out what happened, and finally found the problem.

Once executed, sandbox stores its PIDs in /tmp/sandboxpids.tmp. Apparently the kernel crashed while sandbox was still active, after the last PID it ran as had been stored. I assume the file was never properly closed by the kernel, so the last PID line (the first line in the file) was corrupted. In my case, it looked something like "@@@@1234".
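A half-written line like this is what you can end up with when a file is being updated as the machine dies. As a rough illustration only (this is a generic technique, not something sandbox is known to do), a tool can rewrite a PID file crash-safely by writing a temporary file, calling fsync, and renaming it over the original. The function name `write_pids_atomically` and the one-PID-per-line format are assumptions:

```python
import os
import tempfile


def write_pids_atomically(path, pids):
    """Replace `path` with one decimal PID per line (assumed format), crash-safely.

    Generic write-temp/fsync/rename pattern: a reader sees either the old
    file or the complete new one, never a half-written line.
    POSIX-only: fsync on a directory is not supported on Windows.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so os.replace() is a
    # same-filesystem rename, which is atomic on POSIX.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".pids-")
    try:
        with os.fdopen(fd, "w") as f:
            for pid in pids:
                f.write(f"{int(pid)}\n")
            f.flush()
            os.fsync(f.fileno())  # force the data to disk before renaming
        os.replace(tmp_path, path)  # atomically swap the directory entry
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Make the rename itself durable by syncing the containing directory.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The temporary file must live on the same filesystem as the target, otherwise the rename is not atomic; creating it in the target's own directory guarantees that.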

So, if the kernel crashes while something is being emerged within sandbox and Portage no longer works after the reboot, all you need to do to fix things is `rm /tmp/sandboxpids.tmp`.
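Before deleting the file, it may be worth confirming the diagnosis. The Python sketch below assumes, based only on the description above, that /tmp/sandboxpids.tmp holds one decimal PID per line; `corrupt_lines` and `check_pidfile` are hypothetical helpers, not part of Portage. It reads the file as bytes, since crash debris (for example NUL bytes, which vim displays as `^@`) may not be valid text:

```python
import os
import re
import sys

# Path reported in this thread; the one-PID-per-line format is an assumption.
PIDFILE = "/tmp/sandboxpids.tmp"
PID_LINE = re.compile(rb"[0-9]+")


def corrupt_lines(data):
    """Return (line_number, raw_line) pairs that are not plain decimal PIDs."""
    bad = []
    for number, line in enumerate(data.splitlines(), start=1):
        stripped = line.strip()
        # Blank lines are ignored; anything else must be ASCII digits only.
        if stripped and not PID_LINE.fullmatch(stripped):
            bad.append((number, line))
    return bad


def check_pidfile(path=PIDFILE, remove=False):
    """Report malformed lines in `path`; delete the file if asked and any exist."""
    try:
        with open(path, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        return []  # no PID file, nothing to clean up
    bad = corrupt_lines(data)
    if bad and remove:
        os.remove(path)
    return bad


if __name__ == "__main__":
    for number, line in check_pidfile(remove="--remove" in sys.argv):
        print(f"line {number}: {line!r}")
```

Run without arguments, it only prints the offending lines; with `--remove` it deletes the file, and only when at least one malformed line was found.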

Why I didn't bother to search for all '*sandbox*' files and look at their contents earlier is beyond me. It only cost my friend and me a whole reinstall and countless reboots. I hate my life.

Hope this helps!

Ondrej

P.S. In my case, the kernel crashed during the configure step of rcs (Revision Control System). Has anyone else had problems emerging this package? I run an SMP-enabled 2.4.19-gentoo-r7 kernel without the preemption and low-latency patches. Thanks!

----------

## bcressey

Ahh, thanks! That's very useful to know.

As it happens, /tmp was the only partition I didn't restore. Go figure.

I also decided to reinstall. Whoops. At least if this ever happens again I'll know what to do.

Thanks again.

Ben

----------

