# MLdonkey 2.6.4 keeps crashing without feedback [SOLVED]

## Master One

Yesterday I upgraded from 2.5.16 to 2.6.4, because I recently had some troubles with the core crashing all the time. Unfortunately the situation did not get any better.

I spent the whole afternoon investigating the matter, but it did not get me any further.

There seem to be absolutely no hints in the system-log (/var/log/messages) or in the mldonkey-log (/var/log/mldonkey.log).

mlnet generally works, but it always crashes suddenly after a seemingly random period (sometimes after half an hour, sometimes after several hours). It runs on a server with gentoo-sources-2.6.12-r10 / gcc 3.4.4-r1 / glibc 2.3.5-r1 / nptl, and mldonkey was emerged with the USE flags "gd" and "threads". When it crashes, sancho-gui (0.9.4-47, running on a WinXP workstation) disconnects, and the mlnet process on the server just disappears without further notice. Strangely, it can only be restarted after I delete the mlnet.pid file in the mldonkey home dir. I have no idea why: I didn't have to do this with mldonkey 2.5.16, and I can't even remember the mlnet.pid file being stored in the home dir at all. Isn't that what /var/run is there for?
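The leftover mlnet.pid described above can be checked before a restart instead of being deleted by hand. This is only a minimal Python sketch: it assumes the file holds a single decimal PID (not verified against the mldonkey source), and the helper names `pid_is_running` and `remove_stale_pidfile` are made up for illustration.

```python
import os
from pathlib import Path


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence/permission check
    except ProcessLookupError:
        return False     # no such process
    except PermissionError:
        return True      # process exists but belongs to another user
    return True


def remove_stale_pidfile(pidfile: Path) -> bool:
    """Delete pidfile if the PID recorded in it is no longer running.

    Returns True if the file was removed. A missing or unparsable file is
    left alone (assumption: the file contains one decimal PID).
    """
    try:
        pid = int(pidfile.read_text().strip())
    except (FileNotFoundError, ValueError):
        return False
    if pid_is_running(pid):
        return False
    pidfile.unlink()
    return True
```

Calling `remove_stale_pidfile(Path.home() / ".mldonkey" / "mlnet.pid")` from a start script would be one way to use it; that path is an example and depends on where the mldonkey home dir actually is.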

I emerged mldonkey 2.6.4 normally, since it is in portage now. Before that, I upgraded ocaml to 3.08.3 the same way (emerged from portage, without the "batch" USE flag).

As already mentioned, there is no hint as to why mlnet crashes or what exactly happens at that moment. The only abnormal messages in mldonkey.log are the repeated lines of "[BT] Unknown BT client found please report the next line to the dev team: BTUC:.....", but I do not expect this to be causing the problem.

I already searched bugs.gentoo.org and found bug #103411, but that one is about a memory problem, which does not occur here (mlnet stays at a memory usage of about 6%, and that machine has 1 GB RAM).

The only change I made recently was playing around with the NICE setting in /etc/conf.d/mldonkey, which was set to "19" by default. At first I lowered it to "3" and then to "0", because I thought the crashes might have something to do with CPU usage. That machine has a P4 2.4 GHz, but I let the ondemand CPU governor scale it down to 300 MHz on low load. It could be a coincidence, but I think lowering the NICE value really helped: I have the feeling that the periods between crashes have become longer.

I use mldonkey only for BitTorrent at the moment; all other protocols are deactivated. Could it be that mldonkey can be killed by "fake" data packets, "hostile" uploaders or "hostile" client software?

Those crashes did not appear in the past. When I started with 2.5.16, the core ran stable for days without any problem. It only got worse within the past few months, which is why I thought it might be an influence from outside (changes in the BT protocol, or problems with the client software of other uploaders). Upgrading ocaml and mldonkey itself did not help at all.

On the mldonkey forums it was suggested, that it could be a Gentoo problem, because such an issue is not known on other Linux or *BSD distributions.

Isn't there any way to analyse the problem further, to find out why the core crashes without any hint in the logs and after a seemingly random period? I would expect traces to remain somewhere in the system when a process disappears.
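One way to make a silently vanishing process leave a trace is to start it from a small wrapper that records how it ended. The following Python sketch is only an illustration: it assumes mlnet is started in the foreground by the wrapper (not as a daemon), and the names `describe_exit` and `run_and_report` are hypothetical. On POSIX systems `subprocess` reports a negative return code `-N` when the child was terminated by signal `N`.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a readable reason."""
    if returncode < 0:
        # Negative value: the child was terminated by signal -returncode.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_and_report(argv) -> str:
    """Run argv, wait until it ends, and describe how it ended."""
    proc = subprocess.run(argv)
    return describe_exit(proc.returncode)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example call (script name and path are assumptions):
    #   python wrapper.py /usr/bin/mlnet
    print(run_and_report(sys.argv[1:]))
```

A result such as "killed by SIGSEGV" or "killed by SIGKILL" would at least show whether the core crashed by itself or was terminated from outside.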

Hopefully someone has an idea concerning this matter, or is fighting with the same problem, so that this issue can be solved with collective thinking. The current situation is very depressing: I use Gentoo on all my machines, and the mentioned server also handles some other services, so switching to another distribution (or even FreeBSD) is not possible. I can't accept that the answer is "mldonkey simply does not work on Gentoo Linux".

P.S. For anyone, who cares: I just submitted a report to bugs.gentoo.org, it can be found here.

----------

## dmvianna

I get the same thing. I am only hooked to the Edonkey network. The same problem appeared in both 2.5.16-r9 and 2.6.0. And yes, I never had that problem when using mldonkey on a Fedora 3 Celeron box, which had some 500 MB RAM and could barely run a naked X.

----------

## Master One

 *dmvianna wrote:*   

> I get the same thing. I am only hooked to the Edonkey network. The same problem appeared in both 2.5.16-r9 and 2.6.0. And yes, I never had that problem when using mldonkey on a Fedora 3 Celeron box, which had some 500 MB RAM and could barely run a naked X.

 

Those strange mlnet crashes were also reported by others, but especially by Gentoo users. I already tried some special settings provided by TripleM (from the German mldonkey forums) concerning /etc/security/limits.conf & /etc/sysctl.conf, but those didn't change anything...

I am confident that there has to be something else that can be done to investigate this problem.

----------

## dmvianna

I unmerged mldonkey (emerge -C mldonkey) and installed the precompiled core from

http://download.berlios.de/pub/mldonkey/spiralvoice/cores/Linux/mldonkey-2.6.4.static.i386-Linux.tar.bz2. 

It crashes the same way. The precompiled 2.6.4 core logs

```
2005/09/13 22:53:39 [cF] Checksum computation failed: Exception: os_read failed: Input/output error
```

before dying.

----------

## Master One

@dmvianna

That has to be another problem, because my issue does not result in any error message.

In the meantime, I tried some different things:

- Recompiled ocaml 3.08.3 and mldonkey 2.6.4 with the following settings:

```
CFLAGS="-O1 -march=pentium4 -pipe -fomit-frame-pointer"
MAKEOPTS="-j1"
FEATURES="-ccache -distcc"
```

- Added the following system settings:

/etc/security/limits.conf

```
*               soft    nproc           4096
*               hard    nproc           16384
*               soft    nofile          4096
*               hard    nofile          65536
```

/etc/sysctl.conf

```
kernel.shmall = 2097152
kernel.shmmax = 2147483648
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
fs.file-max = 65536
```
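Whether settings like these actually reach a process can be checked from inside that process. /etc/security/limits.conf is applied by pam_limits at session start, so a daemon launched from an init script may not see the new values. A minimal Python sketch using the standard `resource` module; the helper names `current_limits` and `fmt_limit` are made up for illustration, and the script has to be run in the same environment that starts mlnet for the result to be meaningful.

```python
import resource


def current_limits() -> dict:
    """Return the (soft, hard) limits for open files and processes."""
    wanted = {
        "nofile": resource.RLIMIT_NOFILE,  # max open file descriptors
        "nproc": resource.RLIMIT_NPROC,    # max processes for this user
    }
    return {name: resource.getrlimit(res) for name, res in wanted.items()}


def fmt_limit(value: int) -> str:
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name}: soft={fmt_limit(soft)} hard={fmt_limit(hard)}")
```

If the printed values do not match the limits.conf entries above, the change never took effect for that session, which would also explain why it made no difference.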

I don't know if any of these measures helped, but it seems to be more stable again. The current uptime of the core is one day; before that it was about 9 hours (then it crashed again after I added some new torrents).

BTW, since the upgrade to 2.6.4, I (again) have the problem with those phantom commits. When a file download is finished, committed and moved from the incoming folder to its final destination, files with the same name and a size of 0 KB keep showing up in the incoming folder. No idea what that's all about...
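The phantom files described above are easy to list with a few lines of Python. This is only a sketch: the function name `zero_byte_files` is hypothetical, the directory is whatever the incoming folder is on the machine in question, and it only reports the files; it does not explain why mldonkey creates them.

```python
from pathlib import Path


def zero_byte_files(directory) -> list:
    """Return the regular files in directory (non-recursive) with size 0."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.stat().st_size == 0
    )
```

Printing `zero_byte_files(incoming_dir)` after each commit would show whether the empty files appear immediately or only later.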

----------

## dmvianna

Tried more stuff, no luck. Now I'm on 2.6.4-r1 (Gentoo ebuild), and it lasts longer than a couple of minutes again. Now I get

```
2005/09/15 18:16:46 [cWeb] contact.dat loading from http://download.overnet.org/contact.dat
2005/09/15 18:16:46 [cWeb] guarding.p2p loading from http://www.bluetack.co.uk/config/antip2p.txt
2005/09/15 18:16:46 [cWeb] server.met loading from http://www.gruk.org/server.met.gz
2005/09/15 18:16:47 [DNS] Resolving [download.overnet.org] ...
2005/09/15 18:16:48 [DNS] Resolving [www.bluetack.co.uk] ...
2005/09/15 18:16:48 [DNS] Resolving [www.gruk.org] ...
2005/09/15 18:16:53 [DNS] Resolving [dialspace.dial.pipex.com] ...
2005/09/15 18:16:59 [EDK] server.met loaded from http://www.gruk.org/server.met.gz, 56 servers found, 3 new ones inserted
2005/09/15 18:17:05 [Overnet] contact.dat loaded from http://download.overnet.org/contact.dat, added 500 peers
2005/09/15 18:18:42 [HTTPsv]: Exception write failed: Broken pipe in request_handler
2005/09/15 18:18:49 [HTTPsv]: Exception write failed: Broken pipe in request_handler
2005/09/15 18:19:00 [HTTPsv]: Exception write failed: Broken pipe in request_handler
------------- End of log
```

But it still works, so it really should be something else. Let's see what happens tomorrow. I still haven't seen it run for more than 24 h, and yes, mine crashes without logging too. Usually when it logs, it means it will stay up for some time. If it doesn't, then it goes packing before I count to 60.

----------

## katharsis

I installed 2.6.4 when it was released, and it has been running continuously since then.

No crashes whatsoever, and no weird settings in /etc.

The 0 KB file issue happens here as well, most often with files downloaded from BitTorrent.

That's how stuff looks here:

katharsis mldonkey # emerge -pv net-p2p/mldonkey dev-lang/ocaml

These are the packages that I would merge, in order:

Calculating dependencies ...done!

[ebuild     U ] net-p2p/mldonkey-2.6.4-r1 [2.6.4] -batch -doc -gd -gtk +gtk2 -guionly -mozilla -threads 0 kB 

[ebuild   R   ] dev-lang/ocaml-3.08.3  -latex +tcltk 0 kB

----------

## Master One

I forgot to mention that I have set the cpufreq governor to "performance" since the last crash, so maybe all the other settings have no influence at all, and it was all about the P4 frequency throttling. I will do some more tests with the ondemand governor as soon as I find the time. I would really like to have that working: the ondemand governor works really well for everything else, and why let that machine run at 2.4 GHz 24/7 if it can also operate at only 300 MHz when load is low?

----------

## dmvianna

By the way, what's ocaml for? I don't have it installed.

----------

## dmvianna

24 h have passed and... It's alive! There are odd things happening, like the web interface is unusually slow -- I mean it -- and the up/download rates haven't been impressive lately... But it's alive!

----------

## Master One

I think the problem is solved:

It was indeed the "ondemand" CPU governor!

I have reverted the mentioned system changes, updated to ocaml 3.08.4 and mldonkey 2.6.4-r1 (both compiled with my system-wide standard settings), and switched to the "performance" CPU governor. Since then, mlnet has been running for days without a crash.

Because I had used the "ondemand" CPU governor for quite some time and it did not cause any problems at the beginning, I think that something changed with one of the last kernel upgrades.
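For anyone who wants to repeat the comparison between "ondemand" and "performance", the active governor can be read and changed through sysfs. A small Python sketch, assuming the usual cpufreq layout under /sys/devices/system/cpu (one `cpufreq/scaling_governor` file per CPU); the function names are made up for illustration, and writing requires root.

```python
from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")
GOVERNOR_GLOB = "cpu[0-9]*/cpufreq/scaling_governor"


def read_governors(base: Path = SYSFS_CPU) -> dict:
    """Map each CPU directory name (e.g. 'cpu0') to its current governor."""
    return {
        f.parent.parent.name: f.read_text().strip()
        for f in sorted(base.glob(GOVERNOR_GLOB))
    }


def set_governor(name: str, base: Path = SYSFS_CPU) -> None:
    """Write the governor name for every CPU found under base (needs root)."""
    for f in base.glob(GOVERNOR_GLOB):
        f.write_text(name + "\n")


if __name__ == "__main__":
    print(read_governors())
```

Logging `read_governors()` together with the time of each crash would show whether the crashes really only happen while "ondemand" is active.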

The only remaining problem is that I still get phantom files with a size of 0 KB in the incoming folder after a commit. That's not really tragic, but annoying nevertheless.

----------

## chiwi

I have the same issues here....!

----------

## dmvianna

Try out the most recent unstable version. Mine is 2.6.4-r1. Before that one, all ebuilds died without reason. That one works. Salud!

Carajo. 2.6.4-r2 is already out and stable. I should upgrade...

----------

## chiwi

2.6.4-r2 is the version I currently have, and it is the one that is failing... I re-emerged it with the batch flag set... not so sure what it does, it says "enable internaly ocaml build"... let's try it.

saludos

----------

## dmvianna

Are you using symbolic links by any chance? I did not use ocaml at all... check ~/.mldonkey/mlnet.log

----------

## spiralvoice

 *dmvianna wrote:*   

> By the way, what's ocaml for?

 

It's the language mldonkey is written in.

----------

## hobbes27

 *chiwi wrote:*   

> 2.6.4-r2 is the version I currently have, and it is the one that is failing... I re-emerged it with the batch flag set... not so sure what it does, it says "enable internaly ocaml build"...

 

Even 2.6.7 is failing here.

But mldonkey-mulus https://bugs.gentoo.org/show_bug.cgi?id=100060 works perfectly without any crashes.

----------

## dmvianna

Could you please check this? Do you get any strange entries in /var/log/messages?

I've been having system hangs after about one day of mldonkey running. Essentially, the main HD goes crazy, working without rest, and the system hangs. mldonkey is not saving anything to this HD, though, but to an external one connected to a slow USB 1.1 port. And /var/log/messages gives me

```
Nov  7 08:27:00 thinkpad ASC=0x11 ASCQ=0x0
Nov  7 08:27:00 thinkpad end_request: I/O error, dev sda, sector 101193343
Nov  7 08:27:07 thinkpad SCSI error : <0 0 0 0> return code = 0x8000002
Nov  7 08:27:07 thinkpad sda: Current: sense key=0x3
Nov  7 08:27:07 thinkpad ASC=0x11 ASCQ=0x0
Nov  7 08:27:07 thinkpad end_request: I/O error, dev sda, sector 101193343
Nov  7 08:27:14 thinkpad SCSI error : <0 0 0 0> return code = 0x8000002
Nov  7 08:27:14 thinkpad sda: Current: sense key=0x3
Nov  7 08:27:14 thinkpad ASC=0x11 ASCQ=0x0
Nov  7 08:27:14 thinkpad end_request: I/O error, dev sda, sector 101193343
```
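The excerpt above repeats the same device and sector every few seconds, and sense key 0x3 with ASC 0x11 is the SCSI code for a medium error (unrecovered read). To see at a glance whether one sector or many are affected, the log can be summarised with a short Python sketch. The helper name `failing_sectors` is hypothetical, and the pattern only matches the "end_request: I/O error" format shown in this log.

```python
import re
from collections import Counter

# Matches e.g. "end_request: I/O error, dev sda, sector 101193343"
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def failing_sectors(lines) -> Counter:
    """Count I/O errors per (device, sector) pair in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts
```

Feeding it the lines of /var/log/messages (for example via `open("/var/log/messages")`) gives one counter entry per failing device and sector.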

Now what I find intriguing is that I had that HD connected to a slower CPU before, running Fedora 3, over another USB 1.1 port, and mldonkey never ever crashed it. How can Gentoo possibly be less stable than Fedora? I can't get this.

----------

## dmvianna

Solved it. Data corruption, of course. Mldonkey's issue is solved, plus the HD is much faster now, and so are the interfaces to the program. E2fsck rulez!

----------

