# Athlon MP DualCPU system crashes

## kybber

I've had some inexplicable errors on a dual Athlon system that I've installed Gentoo on. It used to run Windows XP and was unstable at times, something I hoped would be fixed by switching to Linux. However, I am now almost convinced there's a hardware error related to the CPU and/or cooling. 

Some basic info:

Once I caught an error flying across the screen saying something about an APIC error on CPU X (I didn't catch the number). The computer reset itself right after.

Another time, ping/ssh suddenly stopped working. When I hooked up a monitor, the console was active (flashing cursor at the login prompt), but when I tried to log in I got the following message: INIT: Id "c2" respawning too fast: disabled for 5 minutes

I've run memtest86 for half an hour without any errors.

CPU 1 is typically 6-10 degrees Celsius warmer than CPU 0 (59.5 vs. 51.5 at the moment), though they use the same type of heatsink/fan combo.

CPUinfo is different for the two chips:

 # cat /proc/cpuinfo |grep Ath

model name      : AMD Athlon(tm) MP 2400+

model name      : AMD Athlon(tm) MP

The other lines in /proc/cpuinfo are identical. Is there a possibility that I have a faulty MP or a modded XP for CPU 1? How can I verify that the chips aren't modded (they were privately purchased)?
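One way to back up the claim that only the model name differs is to compare the per-CPU blocks of /proc/cpuinfo mechanically. The Python sketch below is only an illustration: the helper names `parse_cpuinfo` and `differing_fields` are made up here, and it assumes the usual layout of one blank-line-separated block per processor. Fields such as `bogomips` can legitimately differ slightly between CPUs, so a hit there is not by itself a sign of a problem.

```python
def parse_cpuinfo(text):
    """Split /proc/cpuinfo-style text into one dict of fields per processor block."""
    cpus = []
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
        if fields:
            cpus.append(fields)
    return cpus


def differing_fields(cpus, ignore=("processor",)):
    """Return {field: [value per CPU]} for every field that is not identical.

    The "processor" index is ignored by default because it always differs.
    """
    keys = []
    for cpu in cpus:
        for key in cpu:
            if key not in keys and key not in ignore:
                keys.append(key)
    diffs = {}
    for key in keys:
        values = [cpu.get(key) for cpu in cpus]
        if len(set(values)) > 1:
            diffs[key] = values
    return diffs
```

Calling `differing_fields(parse_cpuinfo(open("/proc/cpuinfo").read()))` prints nothing by itself; it returns a dict you can inspect. It only shows what the kernel reports, so it cannot prove how a chip was manufactured.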

Does anyone have any insight that could help?

----------

## pem725

I can't help you with the question of whether your chips were modified but I have run a dual MP-2000+ system for over a year with mixed results like yours and can comment on what makes my system unstable.  First, the power supply is almost always the point of failure now with these systems.  I went through four different 450W PS until I finally decided to cough up some bucks for an Antec 520 TruePower and will never go back to the generics again.  Since upgrading my PS, my system runs stable as can be with no errors.  Second, I added memory because like you I thought memory was the issue but memtest found nothing.  The increased memory has added a bit of stability for my big tasks that take cpu cycles but alone did not have the effect I anticipated (NB - I did this before my PS upgrade).  Finally, I replaced both CPU fans and added a few other fans to my rackmount case.  These changed the internal system temp dramatically - almost 15C!  I attribute my now rock-solid stable dualie to the new PS and cooling.  YMMV.

----------

## palminator

I have a dual Athlon system with two fast SCSI hard disks (power consumers). Until now I have burned out every power supply I used. At the moment I have two PSUs, one for the board and one for the drives, but the one for the board is still dying. Thanks to the winter it is holding steady for now, but I think I'll need a new one soon. So I think I'll try that Antec 520 TruePower too.

Think about giving it a try. IMHO a good indicator of whether you're burning out the PSU is how hot the air coming out of it is.

----------

## Radheya

I have had some strange experiences with a dual Athlon MP.

The mobo is an ASUS A7M266-D.

Randomly it stops all services: SSH, IMAP, POP3, crontab, HTTP, etc.

But when I go and look, the console is up.

I don't know what the problem is, but I'm sure it is not temperature related: CPU1 and CPU0 are both at 35 °C.

Could it be a PSU-related problem?

----------

## ronmon

I would look first at my CPU's if I were you. The disparity in temperatures and /proc/cpuinfo are a good clue. Also, what does your BIOS screen report? Something is amiss and I think that a teardown may be in order.

For background, I've been running SMP on Linux for a long time. A BP6 w/ 2x Celeron366a at 550, then a VP6 w/ 2x PIII800 at 1GHz, and now an A7M266-D w/ 2x XP1800+ that I have "defogged" and run as MPs, but are not overclocked. The method I used to convert my CPUs (which I bought new) can be seen here. If you suspect that yours are converted XPs you will need to at least remove the heatsinks and take a close look. A magnifying glass will help.

Some people made this mod with a pencil, which can work, but I used a window defogger repair kit that I got from an auto parts store. Basically, it is conductive paint and works much better than a few crumbs of graphite. Mine worked the first time and have run flawlessly for more than a year. Just make sure that the bridge is joined well and nothing is slopped over to adjacent bridges.

When you reassemble, do a good job with the heatsinks and use a high quality thermal compound. If your CPU temps vary more than a few degrees, I would suspect a problem with airflow in your case. I had to add a 120mm fan blowing in on the side panel of my PC-60 to get my temps down to reasonable levels. With a total of eight 80mm, one 120mm and one 40mm fan, sound levels are another problem entirely ;)

Here's a screeny, the ambient temp is 25C and the windows are wide open (gotta love subtropical weather):

Gkrellm

----------

## bhartin

A)  The /proc/cpuinfo output you mentioned is normal.

B)  Are you using ECC/Registered RAM?  I have an older Tyan board, and despite it saying you can use up to 2 unregistered non-ecc sticks of RAM, it's a load of BS.  As soon as I put in a couple 1GB sticks of ECC/Reg memory, the system has been absolutely 100% stable.

C) Also consider your power supply.  I'm using an Enermax 550W, so I know from experience it's a very stable P/S.  If you just used whatever came with your case, that is a major possibility, as unless it's a high-end case sporting a high-end P/S, it's probably highly unstable.

----------

## SpectreCS

I've also had problems with my dual MP box:

Specs:

Dual Athlon MP 2400+

1 GB Corsair XMS ECC reg

Chaintech 7KDD mainboard

and some crappy PSU either 400 or 450w

Basically, I've had some stability issues with this system from the get-go. A good part of it was the RAM. I used to run super-econoline ram in this thing... yeah... bad idea. So I upgraded that and it helped some. I had my system running pretty darn stable in Windows XP. I tried switching over to Debian over the summer, and when compiling in SMP support, my system would hang at processor detection and reset itself. I tried with a few other linux users that I knew to get it to boot, but it just wouldn't. (This was back when I had the cheap ram). Since then I've upgraded and am once again trying to install linux (this time Gentoo w/ 2.6.1 kernel). SAME PROBLEM! It's driving me nuts, because I can boot off the liveCD with SMP support, and it has no problems, but when I run my own kernel, it craps out.

To summarize what I've tried so far:

Debian w/ kernel 2.4.22

new ram

Gentoo w/ kernel 2.6.1 with manual configuration

Gentoo w/ kernel 2.6.1 with genkernel (builds a kernel for single CPU support, it actually boots, but without SMP)

Gentoo w/ kernel 2.6.1 with genkernel with SMP built in (hangs and resets)

The only thing I haven't tried yet, which was recommended on several threads and by some of the friends who helped out with the Debian install, is a new PSU. I wouldn't be surprised if that were the problem, 'cause I know for a fact I'm pushing this PSU to the limit. I'm going to try to run to CompUSA today and pick up that Antec and see if that helps. I'll post back if I make any progress.

----------

## SpectreCS

Well, I upgraded from my standard 400W PSU to a 430W Antec, which has a sustainable output of 410W. I can now boot a little further, but it still has pretty much the same problems. It took about 7 reboots for the kernel to recognize both processors and get through the rest of the hardware.

Now I have a kernel panic:

VFS: Cannot open root device "hda3" or hda3

Please append a correct "root=" boot option

Kernel panic: VFS: Unable to mount root fs on hda3

My /dev/hda3 is my root partition with ReiserFS, all the latest tools installed, and with Reiser support built right into the kernel....

Any thoughts?

----------

## Corw|n of Amber

Maybe you have an "initrd" line in your grub.conf or lilo.conf? Remove that.

It will kill the bootsplash, but at least it boots.

I had that problem, solved it that way.

Edit : I assume you included ReiserFS support in your kernel, passed the right arguments to it in the bootloader's configuration, ...
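The assumption in the edit above can be checked against the kernel's `.config`: without an initrd, the driver for the root filesystem has to be built in (`=y`), not built as a module (`=m`), or the kernel cannot mount root. A rough Python sketch follows; the function names are hypothetical, and which symbols matter for a given box is an assumption you have to supply yourself (for a ReiserFS root on an IDE disk, `CONFIG_REISERFS_FS` and `CONFIG_BLK_DEV_IDEDISK` are plausible candidates).

```python
def config_states(config_text, symbols):
    """Map each requested symbol to its value in a kernel .config ('y', 'm', ...).

    Symbols that are commented out ("# CONFIG_X is not set") or missing
    are reported as 'unset'.
    """
    states = {name: "unset" for name in symbols}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and name in states:
            states[name] = value
    return states


def not_built_in(config_text, symbols):
    """Return the symbols that are not compiled directly into the kernel (=y)."""
    states = config_states(config_text, symbols)
    return [name for name in symbols if states[name] != "y"]
```

Something like `not_built_in(open("/usr/src/linux/.config").read(), ["CONFIG_REISERFS_FS", "CONFIG_BLK_DEV_IDEDISK"])` would then return an empty list if both are built in; the path is an assumption about where the kernel sources live.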

----------

## bhartin

One more thing to check...

Have you tried removing the second CPU and booting with just one?  Then try the other CPU alone after that.

My dual system is actually a pair of Athlon XP 1700+ procs, NOT MPs.  Both the motherboard (Tyan Tiger MP) and, thus, the Linux kernel report them as MP processors.

BTW, this particular setup, as I said before, is rock-solid stable.

----------

## SpectreCS

Well, I'm not quite sure if I want to remove my procs one by one; I remember that while assembling it, the heatsinks/fans were a real b**ch to get on. Also, these are both MP 2400s straight out of the plastic from AMD, so I'm assuming they are genuine MPs.

I think I'm going to start my own thread, seeing as I kinda took over someone else's.

----------

## bhartin

 *SpectreCS wrote:*   

> Well, I'm not quite sure if I want to remove my procs one by one; I remember that while assembling it, the heatsinks/fans were a real b**ch to get on. Also, these are both MP 2400s straight out of the plastic from AMD, so I'm assuming they are genuine MPs.
> 
> I think I'm going to start my own thread, seeing as I kinda took over someone else's.

 

Ahh okay, you shouldn't worry about it then.  Those AMD bubblepacks are such a MAJOR pain in the ass to open, I can't imagine anyone switching the procs without being obvious.

----------

## SpectreCS

I've started my own thread specific to my system here:

https://forums.gentoo.org/viewtopic.php?t=129258

If there are any helpful responses there, I will relay all relevant info to this thread as well.

----------

## kybber

Thank you all for your very helpful answers. I have so far not even considered the PSU as a potential source of the problem. But the PSU that I have is 360W, so if you are having problems with 450W supplies, then that is definitely a factor.

I have also found that a sure way of killing the system is to run burnK7 (in the cpuburn package) while reading temperature information with mbmon. It dies almost every time I start 'mbmon -c 1'.

I tried running two burnK7 processes at a time so as to fill both CPUs. The temperature then rose to above 70 C within 10-20 seconds and the mainboard alarm went off. I therefore conclude that cooling is also a significant factor here.
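To put numbers on how quickly the box overheats, the temperature readings taken during such a run could be summarised along these lines. This is a sketch only: `summarize_run` is a made-up helper, it expects (seconds, degrees C) pairs that have already been extracted from the monitoring tool by whatever means, and it does not parse mbmon output itself.

```python
def summarize_run(samples, limit_c):
    """Summarise a burn-in run given (seconds, temp_c) samples in time order.

    Returns the peak temperature, the first timestamp at which the limit
    was reached (None if it never was), and the steepest rise between two
    consecutive samples in degrees C per second.
    """
    if not samples:
        raise ValueError("need at least one sample")
    peak = max(temp for _, temp in samples)
    crossed_at = next((t for t, temp in samples if temp >= limit_c), None)
    max_rate = 0.0
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        if t1 > t0:  # skip duplicate or out-of-order timestamps
            max_rate = max(max_rate, (c1 - c0) / (t1 - t0))
    return {"peak_c": peak, "crossed_at_s": crossed_at, "max_rise_c_per_s": max_rate}
```

Comparing the summary for one burnK7 process against two, or before and after adding a fan, gives a more repeatable picture than waiting for the mainboard alarm.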

I don't think I use ECC RAM. At least memtest86 said no when set to autodetect. I will check this with the owner/builder of the comp.

So to sum up:

1. Test a better PSU

2. Improve cooling

3. Verify RAM (ECC)

I will post back when and if I find a way of stabilizing the system.

----------

## Luud

Well, just to add some more experience.

I'm running two Athlon MP 2400+ CPUs on an ASUS A7M266-D motherboard. It has 1.5 GB of Kingston ECC Reg. memory.

1 FDD

1 CD-Writer

1 DVD-Writer

1 DVD-Rom

1 Iomega IDE Zip

1 120GB IDE disk

1 20GB IDE disk

1 ATI Radeon 9700 Pro

1 3Com 3C905 Tx network card

1 Abit HOTROD 100 ULTRA ATA RAID controller (HPT370)

All powered by a single Enermax 550W Power Supply (EG651P-VE FMA) in an Antec SOHO File Server case (PLUSVIEW 1000AMG).

No problems whatsoever.  :Very Happy: 

For the ASUS mobo: DO CONNECT ALL POWER CONNECTORS TO THE MOTHERBOARD; not doing so might make the system unstable.

The 360W power supply is definitely insufficient. Just buying any bigger one is not always good enough; it depends on the output levels of the various rails (5V, 12V, etc.). See the documentation of your motherboard. For the Asus mobo the requirements are quite clear in the documentation, and the specs given by Enermax are very detailed.
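The point about rails can be made concrete with a small budget check: add up what each component draws on each rail and compare that with the amps the PSU label promises for that rail. The Python sketch below is illustrative only. `rail_headroom` is a hypothetical helper, and every figure you feed it has to come from your own PSU label and component documentation. It also ignores the combined-output limits that some supplies specify across rails.

```python
def rail_headroom(psu_amps, loads):
    """Return {rail: spare amps}; a negative value means the rail is overloaded.

    psu_amps: {rail name: rated amps}, taken from the PSU label.
    loads:    list of (component name, {rail name: amps drawn}).
    A rail that a component needs but the PSU does not list counts as 0 A rated.
    """
    drawn = {rail: 0.0 for rail in psu_amps}
    for _component, per_rail in loads:
        for rail, amps in per_rail.items():
            drawn[rail] = drawn.get(rail, 0.0) + amps
    return {rail: psu_amps.get(rail, 0.0) - amps for rail, amps in drawn.items()}
```

A supply with a higher total wattage can still come out negative on one rail here, which is the situation described above.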

I know the Enermax 550W power supply isn't cheap, but I think it's worth the money.

Good luck!

----------

## Luud

Oh, and I just remembered.

I had to update the motherboard BIOS to handle the 2400+ versions of the Athlon MPs. You might want to check if the installed BIOS supports your CPUs.

----------

## kybber

Thanks for the additional information, Luud. The motherboard is an MSI K7D, and it appears that support for Athlons at my frequency does indeed require a BIOS flash. However, after flashing to the newest version, the BIOS got corrupted (even though I got _no_ error messages). I am able to boot a floppy so that the BIOS can be reflashed, but for some reason the awflash program doesn't respond, so something is definitely bust. The guys who originally built the computer (I'm just hosting it as a server) have taken it back and are looking into it now. 

Thanks all, I'll try and remember to get back to this thread when and if we find out what exactly was wrong.    :Smile: 

----------

## Luud

Well, this doesn't sound very good. Bad hardware is always nasty.

I hope you (or the makers) get it fixed quickly.

Good luck! I'll keep my fingers crossed for you   :Wink: 

----------

## crazy-bee

I've been running an A7M266-D with 2x MP 2400+ for almost 2 years now; I got one of the boards with the broken MPX chipset, the ones that come with a special USB card.

From my experience, the things said in this thread (brand PSU with at least 450W, brand RAM) are very important. 

But another thing you should try if you run into stability problems is a different VGA card. My system was perfect with an old TNT2; I upgraded to an ATI 9500 Pro and after a while had nothing but problems (the famous spontaneous shutdown problem). I tried everything, and ended up buying a GeForce 5900 XT. Splendid. It has been running for 3 weeks without a hitch.

----------

## kybber

An update: At least one thing that turned out to be a problem is that there was no PS/2 mouse plugged in  :Shocked: . 

No, I'm not kidding, apparently it's an APIC-problem with the AMD 768 Chipset concerning unmasked interrupts.  :Wink: 

You can read about the problem on page 7 of the AMD PBC Revision Guide. There's also some more info obtainable from our good friend Google, in particular this discussion thread, where Alan Cox among others contributes. 

The server has been running fine under Windows for a prolonged period of time on a 360W power supply. Is there any reason - any reason at all - why Linux might cause more stability problems due to the low output power of the PSU?

----------

## eriq

I have had serious issues with this board for over a year.  I personally think the A7M266-D is a piece of crap and Asus has lost my respect for releasing such a POS!!  For example, I finally got Gentoo installed on this beast, but it would NOT do a stage 1 install.  It would only accept a stage 3 install.  It is hilarious: I can get compile errors on the same emerge 50 times and then come back an hour later and it will compile fine.  My temps are low and my RAM is known good.  During my Gentoo install I could not get this thing anywhere if I specified an SMP kernel at boot.  I have a PC Power & Cooling 510 watt ATX power supply that puts Enermax and Antec PSUs to shame.  To get KDE installed I could not just "emerge kde"; I had to specify all of the dependencies one by one, and when I got to kdebase and libs I had to do it about five times.  If I specify -fomit-frame-pointer then my system will absolutely refuse to compile ANYTHING at all.  I have one of the first revisions, 1.03, with the broken MPX chipset.  It's almost like my machine only compiles during business hours, which I think are from 7:00PM - 8:00PM!!  Seriously, it's a flakey A$$ POS and I'm buying a P4 mobo for my desktop system to replace this setup.

----------

