# cpu dying? [SOLVED]

## EasterParade

I need advice on how to test the health of a CPU.

This core2duo seemingly lost a core:

```
cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU          6300  @ 1.86GHz
stepping        : 6
cpu MHz         : 1869.731
cache size      : 2048 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc up arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm tpr_shadow
bogomips        : 3739.46
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:
```

Also, dmesg sees only one CPU.

These are bad times for me at the moment.   :Crying or Very sad: 

Nothing works out the way it should. And now this CPU seems to be dying.

Am I right? 

I have already checked processor type and family in the kernel; they should be fine. 

What's going on here?

If the second core has died, how can I tell when the first dies as well?

And how can I avoid the trouble that is brewing now!?   :Crying or Very sad:

Last edited by EasterParade on Sun Mar 28, 2010 10:18 am; edited 1 time in total

----------

## NeddySeagoon

transsib,

Do you have SMP enabled in the kernel?

What does uname -a show? For comparison, mine:

```
Linux NeddySeagoon 2.6.33-gentoo #1 SMP PREEMPT Sun Feb 28 17:13:48 GMT 2010 x86_64 AMD Phenom(tm) II X3 720 Processor AuthenticAMD GNU/Linux
```

The SMP part is essential, or only one core will be used.

```
siblings        : 1 
```

suggests that the system thinks there is only supposed to be one core.

Do you have any BIOS settings that can disable SMP mode?
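
A quick way to put those numbers side by side is sketched below. It assumes a Linux `/proc/cpuinfo` in the layout shown in the first post, and the function name `cpu_summary` is made up for this example. As far as I know, the `up` flag is set by an SMP-enabled kernel that found only one CPU at boot, so seeing it in the flags line (as in the first post) would suggest the kernel itself was built with SMP and was simply handed a single core.

```shell
# cpu_summary [FILE]: summarise the CPU topology the kernel reports.
# FILE defaults to /proc/cpuinfo; the function name is hypothetical.
cpu_summary() {
    awk -F': *' '
        /^processor[ \t]*:/ { logical++ }            # one stanza per logical CPU
        /^cpu cores[ \t]*:/ { cores = $2 }           # cores per physical package
        /^siblings[ \t]*:/  { siblings = $2 }        # logical CPUs per package
        /^flags[ \t]*:/     { if ((" " $2 " ") ~ / up /) up = 1 }
        END {
            printf "logical=%d cores=%d siblings=%d up_flag=%s\n",
                   logical, cores, siblings, (up ? "yes" : "no")
        }
    ' "${1:-/proc/cpuinfo}"
}
```

Running `cpu_summary` with no argument reads the live system; passing a saved copy of the file lets you compare before and after a change.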

----------

## EasterParade

```
14:07:47 root@queequegs /home/both # uname -a
Linux queequegs 2.6.32-gentoo-r5 #11 SMP Thu Mar 4 16:30:15 CET 2010 x86_64 Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz GenuineIntel GNU/Linux
```

And Neddy, if SMP is vSMP and is located under "Processor type and features", then my answer has to be "NO".

I am building the new kernel as I write these lines, and I DO HOPE that your suggestion to go have a look at that feature will bring up the second core again, or I'll be up s*** creek without a paddle here.   :Shocked: 

It is our home server - queequegs - we dump all files there that all of us like to share. I've built it, installed Gentoo on queequegs, and recently inaugurated a new kernel version. Which meant I could not copy an old .config and make oldconfig here.

So I do hope that I just forgot to set vSMP and all's well....

EDIT

Nothing is well! Only one core... the CPU IS dying.  What else could be missing other than the second core?

He'll kill me...

Last edited by EasterParade on Sat Mar 27, 2010 2:26 pm; edited 1 time in total

----------

## pilla

No, it is CONFIG_SMP (without the v).  I remember some weird issues with a core 2 duo when I forgot some option (not CONFIG_SMP, I think it was something about APIC).

You could try one of Pappy's seeds for your kernel configuration to have a sane starting point.
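
For checking these options without clicking through menuconfig, here is a rough sketch that greps a kernel `.config`. The function name `check_smp_config` and the choice of options are assumptions for illustration; the APIC option names are the usual x86 ones, but which of them mattered in the case above is not known. As far as I know, the vSMP entry (`CONFIG_X86_VSMP`) is support for ScaleMP vSMP systems and has nothing to do with ordinary multi-core CPUs.

```shell
# check_smp_config FILE: report SMP- and APIC-related options in a kernel .config.
# The function name and the option list are choices made for this sketch.
check_smp_config() {
    for opt in CONFIG_SMP CONFIG_X86_LOCAL_APIC CONFIG_X86_IO_APIC CONFIG_NR_CPUS; do
        if line=$(grep "^${opt}=" "$1"); then
            echo "$line"                 # option is set, print its value
        else
            echo "${opt} is not set"     # missing or commented out
        fi
    done
}
```

Typical use would be `check_smp_config /usr/src/linux/.config`. If the running kernel exposes `/proc/config.gz`, unpack it to a temporary file first and point the function at that.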

----------

## dE_logics

Why not just boot into windows (if you have it installed) or Ubuntu (or any nice distribution) live CD to see if the 2 cores exist?

It's very unlikely for a CPU to die. Especially this way.

----------

## EasterParade

I'll have to boot a LiveCD as there is no Windows on queequegs, and I'd have to do it without rousing suspicion, which means I have to do it when someone looks the other way   :Embarassed: 

I am now making a kernel from Pappy's seeds. 

I am losing hope right now...  :Crying or Very sad: 

I'll come back later as soon as I know something new, even for me  :Confused: 

----------

## RedSquirrel

 *transsib wrote:*   

> It is our home server - queequegs - we dump all files there that all of us like to share. I´ve built it, installed Gentoo on queequegs, and recently inaugurated a new kernel version. Which meant I could not copy an old .config and make oldconfig here.

 

You can reuse your old config. The "safe" way to do that is to copy the old config to the appropriate location, run 'make menuconfig', and go through the configuration carefully, checking for NEW items/sections.

----------

## EasterParade

It is hopeless.

 *Quote:*   

> Do you have any BIOS settings that can disable SMP mode?

 

Good question. I'll go have a look, and then on to my last resort: a LiveCD.

----------

## EasterParade

The BIOS, too, sees only one core.

CPU temperature is 40°C according to the BIOS, and sensors reports 39-40°C.

So if one core is dying, it isn't due to overheating.

I usually log in via ssh but now I have plugged in a monitor and keyboard.

When I log in there it says "bash: no job control in this shell"

What the hell does that mean now?

If the BIOS doesn't see the second core, booting a LiveCD won't make things any better.

There is nothing else I can do at the moment, just cry and wait until my home server completely dies of heartbreak   :Crying or Very sad: 

----------

## NeddySeagoon

transsib,

Not all Core2 CPUs actually have two cores, honest!

There is nothing in the /proc/cpuinfo to be able to tell, so maybe you are only supposed to have one core?

----------

## Nerevar

 *NeddySeagoon wrote:*   

> transsib,
> 
> Not all Core2 CPUs actually have two cores honest!
> 
> There is nothing in the /proc/cpuinfo to be able to tell, so maybe you are only supposed to have one core?

 That's true, but I don't think Intel has a 1.86GHz Solo.

It's more likely the dual core E6300:

http://en.wikipedia.org/wiki/List_of_Intel_Core_2_microprocessors

----------

## EasterParade

I am not imagining things: the E6300 is not a biggie but it's got two cores.

I remember having seen two CPUs being brought up, both in the BIOS and by the kernel.

cat /proc/cpuinfo for an E6300 should show two processor entries, like this output from my E8400 box:

```
17:47:27 liki@aldebaran ~ $ cat /proc/cpuinfo 
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz
stepping        : 10
cpu MHz         : 2999.715
cache size      : 6144 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm tpr_shadow vnmi flexpriority
bogomips        : 5999.43
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz
stepping        : 10
cpu MHz         : 2999.715
cache size      : 6144 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm tpr_shadow vnmi flexpriority
bogomips        : 5999.64
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:
```

I know you just want to comfort me, but I know when the game is over.

Now I'll just have to devise a method of changing the CPU before anyone notices it is missing, or rather that the money is...

----------

## NeddySeagoon

transsib,

What does dmidecode say about your CPU?
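
For reference, a small sketch of pulling the relevant fields out of the dmidecode output. It assumes the firmware fills in the `Core Count` and `Core Enabled` fields, which older SMBIOS versions may not provide at all; the helper name `dmi_cores` is made up for this example.

```shell
# dmi_cores: read `dmidecode -t processor` output on stdin and print the
# version string plus the core counts the firmware reports.
# The function name is hypothetical; missing fields are simply not printed.
dmi_cores() {
    awk -F': *' '
        /^[ \t]*Version:/      { print "version=" $2 }
        /^[ \t]*Core Count:/   { print "core_count=" $2 }
        /^[ \t]*Core Enabled:/ { print "core_enabled=" $2 }
    '
}
```

Run as root: `dmidecode -t processor | dmi_cores`. Since dmidecode only repeats what the BIOS tables say, a core count of 1 here points at the firmware rather than at the kernel.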

----------

## EasterParade

 *Quote:*   

> What does dmidecode say about your CPU?

 

Good morning Neddy. I will see to this as soon as possible; sorry for not following up yesterday, but I had already given up.

The problem is that the older CPUs are no longer on the market, and a replacement has to work with an Asus P5W64 WS Professional. The board is fine but doesn't work with the latest steppings of the 45nm CPUs, at least not with the Core 2.

Could it be BIOS corruption? I know this question is naive, but it is at least a straw to clutch at.

Output of dmidecode

http://pastebin.com/dAaStNW9

Interesting - lots of information...but little that a dummy like me could decipher   :Wink: 

Looks like the CPU is working at half power, as if it were throttled. But why is it called Pentium 4, with version core2duo e6300?

I clearly remember two penguins as bootup logo, one for each core.

At least I can still wake queequegs remotely.

----------

## EasterParade

Now I am really stumped:

The BIOS version was 1201, the newest and latest. The board is old and long since off the market (2007), so there is no newer version. Despite that, I just now reflashed it with the same version 1201.

And here they are again: 2 cores for the CPU

```
12:12:20 both@queequegs ~ $ cat /proc/cpuinfo 
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU          6300  @ 1.86GHz
stepping        : 6
cpu MHz         : 1596.000
cache size      : 2048 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm tpr_shadow
bogomips        : 3739.30
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU          6300  @ 1.86GHz
stepping        : 6
cpu MHz         : 1596.000
cache size      : 2048 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm tpr_shadow
bogomips        : 3739.73
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:
```

Reflashing the BIOS was just a shot in the dark, a straw I pinned my hopes on  :Confused: 

I don't understand. O.k., I don't know the history of BIOS updates on this board. But how can a BIOS get corrupted when it is in flash ROM, and how come a BIOS suddenly doesn't see the system's CPU correctly?

I remember having seen the two ducks in the boot logo only in February when I installed Gentoo.

Has anyone ever seen such behaviour?

Please, don't think that I should have my head examined or my eyes checked now. The system really was missing the second core.

It is back again. Thanks for your support, all of you   :Smile:

Last edited by EasterParade on Sun Mar 28, 2010 3:30 pm; edited 1 time in total

----------

## NeddySeagoon

transsib,

The bits in a FLASH memory are stored as trapped charge.  There are several sorts, most store several bits per memory cell by storing different amounts of charge. Two bits per cell are common.

While the data retention times for FLASH memory are very good, they're never infinite - the charge tends to leak away over time.

I have known well-used FLASH memory to have a data retention time of a week, which was adequate for the application I needed but no use at all for a PC BIOS.

So I have seen this behaviour, but never in a PC BIOS, which is flashed so rarely.

I am surprised you didn't get a BIOS Checksum Error message from the POST, as almost every bit pattern error can be detected.
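
To illustrate the kind of check meant here, a minimal sketch follows. It assumes the simplest scheme, an 8-bit additive checksum where all bytes of the image are expected to sum to zero modulo 256; BIOS vendors may well use something different, and the function name `sum8` is made up. A scheme like this catches any single flipped bit but misses errors that happen to cancel each other out, hence "almost" every error.

```shell
# sum8 FILE: print the sum of all bytes in FILE modulo 256.
# Hypothetical helper; a zero result means "checksum OK" under the
# assumed sum-to-zero convention.
sum8() {
    od -An -v -tu1 "$1" |
        awk '{ for (i = 1; i <= NF; i++) s = (s + $i) % 256 } END { print s + 0 }'
}
```

On a saved ROM image, a result other than 0 would mean the image fails this particular check; it says nothing about images protected by a different algorithm.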

----------

## asturm

 *transsib wrote:*   

> I´ve built it, installed Gentoo on queequegs, and recently inaugurated a new kernel version. Which meant I could not copy an old .config and make oldconfig here.

 

You can always reuse your old .config. I can trace back the origins of my current .config to kernel version 2.6.17.

cp /usr/src/old-kernel/.config /usr/src/new-kernel/ && cd /usr/src/new-kernel/ && make oldconfig && make && make modules_install

make oldconfig will detect any config changes and ask you what to do for each. This makes major upgrades really painless.
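
If you want to see afterwards what actually changed between the two configurations, something like the sketch below works with plain tools. The function name `config_changes` is made up for this example, and it only compares option lines; it does not know about renamed or removed symbols.

```shell
# config_changes OLD NEW: list option lines that differ between two kernel .config files.
# Lines only in OLD start at column 1, lines only in NEW are indented by a tab.
config_changes() {
    old_sorted=$(mktemp)
    new_sorted=$(mktemp)
    # keep only option lines, both set ("CONFIG_X=y") and unset ("# CONFIG_X is not set")
    grep -E '^(CONFIG_|# CONFIG_)' "$1" | LC_ALL=C sort > "$old_sorted"
    grep -E '^(CONFIG_|# CONFIG_)' "$2" | LC_ALL=C sort > "$new_sorted"
    # comm -3 suppresses lines common to both files
    LC_ALL=C comm -3 "$old_sorted" "$new_sorted"
    rm -f "$old_sorted" "$new_sorted"
}
```

For example, `config_changes /usr/src/old-kernel/.config /usr/src/new-kernel/.config | grep SMP` would show straight away whether CONFIG_SMP got lost along the way.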

----------

## EasterParade

Neddy,

 *Quote:*   

> While the data retention times for FLASH memory are very good, its never infinite - the charge tends to leak away over time.
> 
> I have known well used FLASH memory have a data retention time of a week, which was adequate for the application I needed but no use at all for a PC BIOS. 

 

Aren't flash memory and the BIOS chip on a mobo the same thing? I am not certain whether I fully understand.

 *Quote:*   

> I am surprised you didn't get a BIOS Checksum Error message from the POST, as almost every bit pattern error can be detected.

 

Me too, and I don't like it one bit. I don't like that I cannot explain what really happened, why the BIOS failed like this.

And it may mean that this can happen again at any time. At least that's what I fear, mainly because I don't know the reason.

----------

## NeddySeagoon

transsib,

Yes. The BIOS uses FLASH memory.

You can be sure that it will happen again, or something like it.

Next time some of the charge leaks away, it may affect something entirely different. Also, it's not possible to predict when it's likely to happen.

----------

## EasterParade

There must be a way to prevent it. Is it the CMOS battery? But I have never seen a battery fail.

Changing the flash chip would be problematic. It is socketed, but I doubt a change is easy. 

I may not even be able to buy one for this mainboard any more. 

And none of this would make sense as long as I don't know what causes this behaviour.

 *Quote:*   

> The bits in a FLASH memory are stored as trapped charge. There are several sorts, most store several bits per memory cell by storing different amounts of charge. Two bits per cell are common.

 

You told me that information gets lost like this, but there must be something I have done with this system that started this bitwise loss of vital data. 

Oh, btw, was it a short on the board?? 

 :Rolling Eyes: 

----------

## NeddySeagoon

transsib,

The problem with changing the BIOS FLASH is that new ones will be supplied blank, and if you fit a blank one the system won't boot, so you cannot program it.

You need the help of someone who can program the replacement before you fit it.

Hot swapping the BIOS chip will probably destroy something.

It's nothing you have done. It may not even be a hardware failure at all. It can be that when you flashed the BIOS last, the programming was marginal. Good enough to pass the tests but not good enough to last, so it may be fixed now, which is why I said you cannot know how long it will last.

It's certainly not the CMOS battery.  That's used to power the clock when the mains supply is disconnected and to retain the BIOS settings when the battery is the only power in the system. I have never seen a battery fail in an ATX system, as the 5V standby supply is normally present unless the machine is unplugged. CMOS battery life on AT systems was normally a few years of normal use. The first sign of failure is the clock losing time when the system is powered off. Later, BIOS settings are lost.

You can throw the battery away if you set the clock and enter the BIOS settings every boot. Indeed, I've run a few machines like that when they were rarely powered down.

----------

## EasterParade

 *Quote:*   

> It's nothing you have done. It may not even be a hardware failure at all. It can be that when you flashed the BIOS last, the programming was marginal. Good enough to pass the tests but not good enough to last, so it may be fixed now, which is why I said you cannot know how long it will last. 

 

Ah, so it could be a process over many months? So if someone had updated the BIOS via Asus Update from within Windows, for example, something one shouldn't do (I put the ROM file on a USB stick and flash from within EZflash), then the programming could have been slightly incomplete, couldn't it?

Or is that one of the reasons why you should flash a BIOS only if it is really necessary, because any flash procedure could go wrong, at least to some degree?

@ genstorm

 *Quote:*   

> You can always reuse your old .config. I can trace back the origins of my current .config to kernel version 2.6.17.
> 
> cp /usr/src/old-kernel/.config /usr/src/new-kernel/ && cd /usr/src/new-kernel/ && make oldconfig && make && make modules_install
> 
> 

 

Thanks for the reassurance. Many people are opposed to this method if the new kernel has too many changes and differs in more than the revision number. They say that using the oldconfig method with the recycled .config of the old kernel is unsafe if the changes are not minor.

----------

## NeddySeagoon

transsib,

The danger in flashing the BIOS is that step one is erasing the BIOS, then the new BIOS is written to the blank BIOS chip.

If the power fails before this process is complete (it takes several minutes) the system will not boot.

To get round this, some motherboards have two copies of the BIOS, others have a BIOS Flash failure 'rescue' mode.

The latter divides the BIOS ROM into two parts, only one of which is erased for the update. The unerased part can do no more than flash the BIOS from a floppy disk. There are no user prompts, no on screen messages and the floppy must have all the right filenames on it so it just works.

It's quite possible that your problem was set in motion when the BIOS chip was factory programmed.

----------

## EasterParade

 *Quote:*   

> The danger in flashing the BIOS is that step one is erasing the BIOS, then the new BIOS is written to the blank BIOS chip.
> 
> If the power fails before this process is complete (it takes several minutes) the system will not boot. 

 

which is the worst case scenario; then the system is braindead.

 *Quote:*   

> To get round this, some motherboards have two copies of the BIOS

 

=the Ferraris among mainboards

 *Quote:*   

> The latter divides the BIOS ROM into two parts, only one of which is erased for the update.

 

My board (P5N32E-SLI) does the erasing/flashing in parts. You can watch it alternately erase and rewrite the flash chip. Is that what you describe here? But truly, watching is all one can do, which is why I often have sweaty hands while flashing a BIOS.  :Wink: 

 *Quote:*   

> Its quite possible that your problem was set in motion when the BIOS chip was factory programmed.

 

Very unsettling!

Thank you NeddySeagoon

----------

## NeddySeagoon

transsib,

Different FLASH chips support different erase mechanisms. They differ only in the size of the chunk they erase in one go.

They all have one thing in common - erase is a very slow process.

The ones with a 'blind' floppy flash mode have a small region that is never erased during a BIOS flash, regardless of how they do the BIOS erase and program steps.

----------

