# Computer freezes randomly.

## Perfect Gentleman

i7-4770K.

My computer freezes randomly, usually when it is idle or under no load at all. It can re-emerge world and then freeze afterwards. There is no kernel panic, it just freezes, and no input works at all.

I have no clue, except one: kernels 4.3 to 4.4-git may be causing it, as it started happening when I installed 4.3. But with gcc-5.3 I can't boot kernels older than 4.3.

WTF? What should I investigate?

Please, help.

----------

## Keruskerfuerst

Can you post some information about your hardware?

----------

## Perfect Gentleman

lshw - https://bpaste.net/show/da2a79211ac2

hwinfo - https://bpaste.net/show/38cbd10554e5

----------

## Keruskerfuerst

And the power supply?

----------

## Perfect Gentleman

http://www.pc-specs.com/psu/Aerocool/Aerocool_Strike-X_1100W/1695

----------

## Keruskerfuerst

You can calculate the power consumption of your computer here:

http://www.bequiet.com/en/psucalculator

I think you can change the language easily. The power supply is oversized.

Can you post the hardware specifications again as a list? The output of lshw and hwinfo is hard to read.

Last edited by Keruskerfuerst on Wed Dec 23, 2015 7:22 am; edited 1 time in total

----------

## Perfect Gentleman

In short, the power supply is more than enough: there is no discrete GPU or sound card, just the CPU, RAM and 4 drives.

RAM - Corsair CMD16GX3M2A2400C10 x2

CPU - i7-4770K

SSD1 - root - M.2- PCIe - PLEXTOR_PX-G256M6e - F2FS

SSD2 - SAMSUNG_MZ7WD240HAFV - F2FS

HDD1 - Hitachi_HUS724040ALE640 - XFS

HDD2 - Hitachi_HTS541010A9E680 - XFS

MB - ASUSTek Z97I-Plus

PS - Aerocool Strike-X 1100W

----------

## NeddySeagoon

Perfect Gentleman,

Boot into memtest for a few cycles.

If it passes clean all the time we have learned nothing.

If it returns random errors, it's probably not the RAM.

If you get the same errors at the same addresses, that's useful info but it may not be RAM either.

memtest uses a lot more than just your RAM.

It's not safe to assume everything else is OK because the tool is called memtest.
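The rules of thumb above for reading memtest results can be written down as a small decision function. This is only a sketch: `interpret_memtest` and its `(pass_number, address)` input are hypothetical, standing in for errors you would copy by hand from the memtest screen (memtest has no such API), and the "need more passes" branch is an added assumption, since a single pass cannot show whether errors repeat.

```python
from collections import defaultdict


def interpret_memtest(errors, passes_run):
    """Classify memtest results following the rules of thumb in the post.

    errors:     hypothetical list of (pass_number, address) pairs, transcribed
                by hand from the memtest screen.
    passes_run: number of full passes completed.
    """
    if not errors:
        # Passing clean proves nothing either way: we have learned nothing.
        return "inconclusive"
    if passes_run < 2:
        # Assumption: one pass cannot tell random errors from repeatable ones.
        return "need_more_passes"
    passes_per_addr = defaultdict(set)
    for pass_no, addr in errors:
        passes_per_addr[addr].add(pass_no)
    if any(len(passes) > 1 for passes in passes_per_addr.values()):
        # Same address failing in several passes: useful info, but may still not be the RAM.
        return "repeatable"
    # Errors scattered over different addresses: probably not the RAM itself.
    return "random"


print(interpret_memtest([], passes_run=4))                              # prints: inconclusive
print(interpret_memtest([(1, 0x1A2B3C), (3, 0x1A2B3C)], passes_run=4))  # prints: repeatable
print(interpret_memtest([(1, 0x1000), (2, 0x7F3000)], passes_run=4))    # prints: random
```

Either way, as noted above, a result here points at the memory subsystem as a whole, not necessarily at the DIMMs.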

If you are overclocking - don't.

What does dmidecode tell us?

----------

## Perfect Gentleman

dmidecode - https://bpaste.net/show/64acefe36f53

NeddySeagoon, I ran memtest for 1 cycle: no errors. Linpack (HPL) for an hour: no errors.

Googling suggests that it could be X.

----------

## Keruskerfuerst

Is there any relevant information in the log?

dmesg or any other log?

Run Memtest: check output

You can replace the parts in your computer: one part after another.

I would replace the parts in the following order:

1. CPU

2. Mainboard

----------

## Perfect Gentleman

 *Quote:*   

> Is there any relevant information in the log?

 

No.

 *Quote:*   

> dmesg or any other log?

 

No errors or failures.

 *Quote:*   

> Run Memtest: check output

 

I'll run it tomorrow.

 *Quote:*   

> You can replace the parts in your computer: one part after another.

 

That's impossible; I have no spare parts for those.

Last edited by Perfect Gentleman on Sun Dec 20, 2015 4:03 pm; edited 1 time in total

----------

## NeddySeagoon

Perfect Gentleman,

The only vaguely interesting part of dmidecode is:

```
Memory Device
   Array Handle: 0x003C
   Error Information Handle: Not Provided
   Total Width: 64 bits
   Data Width: 64 bits
   Size: 8192 MB
   Form Factor: DIMM
   Set: None
   Locator: DIMM_A1
   Bank Locator: BANK 0
   Type: DDR3
   Type Detail: Synchronous
   Speed: 2400 MHz
   Manufacturer: 0215
   Serial Number: 00000000
   Asset Tag: 9876543210
   Part Number: CMD16GX3M2A2400C10
   Rank: 2
   Configured Clock Speed: 2400 MHz
   Minimum Voltage: 1.5 V
   Maximum Voltage: 1.5 V
   Configured Voltage: 1.65 V
```

Your RAM is rated for 1.5 V operation, but you have it set to 1.65 V.

I suspect it's both overclocked and overvolted.
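The check made above (configured voltage versus the rated maximum) is easy to script. A minimal sketch, assuming you save the output of `dmidecode --type 17` (run as root) as text; the sample below reuses fields from the excerpt above, and `memory_devices`, `parse_voltages`, `overvolt_percent` and `report` are hypothetical helpers, not part of dmidecode.

```python
import re

# Excerpt of dmidecode output, reusing fields from the Memory Device entry quoted above.
SAMPLE = """\
Memory Device
   Locator: DIMM_A1
   Part Number: CMD16GX3M2A2400C10
   Minimum Voltage: 1.5 V
   Maximum Voltage: 1.5 V
   Configured Voltage: 1.65 V
"""

VOLT_RE = re.compile(r"^\s*(Minimum|Maximum|Configured) Voltage:\s*([\d.]+)\s*V\s*$", re.M)
LOCATOR_RE = re.compile(r"^\s*Locator:\s*(.+?)\s*$", re.M)


def memory_devices(text):
    """Split dmidecode output into one chunk per 'Memory Device' entry."""
    return re.split(r"^Memory Device\s*$", text, flags=re.M)[1:]


def parse_voltages(chunk):
    """Map 'minimum'/'maximum'/'configured' to volts; fields shown as 'Unknown' are skipped."""
    return {kind.lower(): float(value) for kind, value in VOLT_RE.findall(chunk)}


def overvolt_percent(volts):
    """Percent by which the configured voltage exceeds the rated maximum, or None if unknown."""
    if "configured" not in volts or "maximum" not in volts:
        return None
    return max(0.0, (volts["configured"] / volts["maximum"] - 1.0) * 100.0)


def report(text):
    """Print one line per DIMM saying whether it runs above its rated voltage."""
    for chunk in memory_devices(text):
        m = LOCATOR_RE.search(chunk)
        locator = m.group(1) if m else "?"
        pct = overvolt_percent(parse_voltages(chunk))
        if pct is None:
            print(f"{locator}: voltage not reported")
        elif pct > 0:
            print(f"{locator}: configured {pct:.1f}% above rated maximum")
        else:
            print(f"{locator}: within rated voltage")


report(SAMPLE)  # prints: DIMM_A1: configured 10.0% above rated maximum
```

Splitting on the `Memory Device` header keeps each DIMM's fields separate, so a machine with two modules gets one line per slot.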

----------

## Perfect Gentleman

NeddySeagoon, I've used the XMP profile for a long time, and there were no problems with it.

----------

## NeddySeagoon

Perfect Gentleman,

That's rather like the man who jumped off a tall building...

As he passed the 13th floor, he was heard to say: so far, so good.

You have had no problems ... yet.

10% overvolt is a lot. That's getting to the point where permanent damage can be expected.

It's not just the RAM either; the CPU's interface to the RAM is overvolted too.

10% more voltage means 21% more power at the same clock speed. The onboard PSU (and the metal box PSU) has to provide the extra power and still keep the ripple within limits.

21% is a lot. Power is also proportional to clock speed, so if you increase the clock speed as well, that 21% excess power goes up further.
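The 21% figure follows from the usual first-order model of dynamic CMOS power, $P \propto C V^2 f$, where $C$ is the switched capacitance (a simplifying assumption: leakage and static power are ignored). Going from the rated 1.5 V to the configured 1.65 V at an unchanged clock:

$$
\frac{V_{\text{cfg}}}{V_{\text{rated}}} = \frac{1.65\ \text{V}}{1.5\ \text{V}} = 1.10,
\qquad
\frac{P_{\text{cfg}}}{P_{\text{rated}}} = \left(\frac{V_{\text{cfg}}}{V_{\text{rated}}}\right)^{2} \cdot \frac{f_{\text{cfg}}}{f_{\text{rated}}} = 1.10^{2} \cdot 1 = 1.21
$$

Any clock increase multiplies that factor again: with a clock ratio $k = f_{\text{cfg}}/f_{\text{rated}} > 1$, the power ratio becomes $1.21\,k$.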

If you want to deliberately operate your equipment outside its specified limits, that's fine.

You get to keep the pieces when you let the magic smoke out.

For debugging, do not overclock, not even accidentally.

----------

## szatox

I had a similar problem and I believe it to be gone now, though with randomness one can never be sure.

The thing I changed was removing some unused hardware drivers from the kernel. They used to be built as modules, but I eventually removed them completely. In theory that should not matter at all, but my system hasn't frozen once since, and it's been a few weeks already. So I am not sure it helped, but it's promising enough that you may want to try it (drop modules for hardware you don't have).

Don't go after all the hints at once, though. You'd lose track of them and never figure out which one it was. Try resetting performance adjustments to factory defaults first; overclocking is known to be dangerous.

----------

## Keruskerfuerst

Can you test another OS?

SystemRescueCd or Ubuntu in live mode?

Windows, Solaris or OpenIndiana?

----------

## Perfect Gentleman

@NeddySeagoon, I would agree that memory could cause it, but then there would be more hangs during merging, which is not the case; it only happens when idle.

This memory was sold as overclocked.

@szatox, I removed unnecessary modules a long time ago; I only build modules for the hardware I actually have.

@Keruskerfuerst, I only have Gentoo installed. A LiveCD is an option, but I'd rather not use one, because the software I need is on this install.

P.S. I rebuilt the modules and x11-modules yesterday, and there has been no freezing so far.

----------

## NeddySeagoon

Perfect Gentleman,

 *Perfect Gentleman wrote:*   

> This memory was sold as overclocked. 

 

I know that. Nobody makes 2400 memory.

It was tested in a test rig and passed in that rig. That's rather like benchmarks, which say very little about real-world performance.

The memory was not tested with your PSU, your motherboard or your CPU.

It can still be the memory subsystem, which includes all the bits above, depending on how RAM is dynamically allocated.

The CPU switching in and out of idle stresses the motherboard PSU more than a constant load. That induces large transients in all the CPU related voltages.

I'm quite confident that, as you say, it's not the RAM; it's the system-level tolerance build-up due to the way the system is being asked to perform.

If rebuilding software appears to fix it, that points to an error in the original build. 

The discussion above still holds good, as you have no idea whether all those emerges produced correct output. It's not even ECC RAM, so errors (at any time) go undetected.

-- edit --

I have a bridge for sale, in the middle of London  :)

----------

## Perfect Gentleman

NeddySeagoon, you have a point that overclocked memory can produce builds with errors, but then my applications would segfault or hang for no reason, and there are no such symptoms.

----------

## steveL

Is it really so hard to clock the voltage back down for a week or two, to see how it goes?

Troubleshooting: are you using systemdbust?

----------

## NeddySeagoon

Perfect Gentleman,

That's a vast oversimplification.

What about all the errors that don't lead to applications or the system stopping?

Applications still operate but produce incorrect results.

----------

## Perfect Gentleman

@steveL, of course not. I've already minimized overclocking; only XMP is left. No systemd.

@NeddySeagoon, that could be.

----------

## NeddySeagoon

Perfect Gentleman,

XMP is still overclocking. Turn it off.

----------

## Logicien

The man who jumped off a tall building was not superstitious. He didn't sense any bad omen, even as he was passing the 13th floor.

:-D

----------

## steveL

No "of course not" about it; that was the third post in a row, where you were arguing the toss with Neddy.

I realise this is meta-discussion, and you just want to get on and make your machine work again. Do so, by all means.

It just comes up a lot, on IRC especially, where people would rather puzzle out an argument in their heads than try what the people they've come to for advice suggest.

This is frustrating because the only satisfaction helpers get is from knowing that you made progress (and usually they'd like follow-up to know how, since they've already given over headspace to the discussion.)

The common example on IRC is people arguing about the fact that they don't need to quote "$1" or "$foo", because in this call they know the value is "safe".

Arguing with them is a travail that leads to support-burnout, which is why bots are so handy.

If there isn't a term for that stubbornness in the face of one's chosen helpers, there needs to be.

Sorry if I'm over-reacting to you specifically; it's not really about you. We all understand the need to know what went wrong.

The trouble is it's not very interesting for others to explore borked thinking, which far too often the ramblers end up going over afterwards in conversation, rather than on reflection.

That's sometimes appropriate for real-world conversation, but very rarely for textual communication, ime.

Again, please don't take this personally; I'm just chatting, more about IRC than the forums.

Good luck with it. :-)

----------

## Perfect Gentleman

7 hours of MemTest86 6.2 Free Edition - no errors.

I'll try mprime later.

----------

## krinn

Everybody went into hardware because the first question Keruskerfuerst asked was about hardware info.

On another note, I'd say that's always Keruskerfuerst's first question everywhere, as if he wants to know everyone's hardware :-)

 *Perfect Gentleman wrote:*   

> I have no clue, but only one that kernels 4.3-4.4-git can cause that as it starts to happen at time installing 4.3. But with gcc-5.3 I can't boot to kernels older than 4.3.
> 
> wtf? what should i to investigate?

 

I would investigate the software first: you seem to say it appeared when you switched to 4.3, but everything was working fine before 4.3.

If gcc-5.3 cannot produce a working <4.3 kernel, just use an older gcc (is that really so hard to do?)

You can also just use a LiveCD with a >=4.3 kernel from any distro and see if your system freezes with it, allowing you to answer "not a hardware problem" and "not a 4.3 kernel problem", and narrowing the problem down to "kernel settings problem".

If your system keeps freezing with the LiveCD, try a <4.3 LiveCD to narrow the problem down to a kernel version (it might be a regression).

So if I were you, I would dig into what I changed in my kernel from 4.2 to 4.3 that led to my computer freezing like crazy.
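The LiveCD bisection suggested above boils down to a short decision procedure. A rough sketch: `narrow_down` is a hypothetical helper whose arguments record what you observed, and the final "both freeze" branch is an inference that the post does not spell out.

```python
def narrow_down(freezes_new_livecd, freezes_old_livecd=None):
    """Sketch of the LiveCD bisection described above.

    freezes_new_livecd: did the machine freeze when booted from a LiveCD with a >=4.3 kernel?
    freezes_old_livecd: did it freeze from a LiveCD with a <4.3 kernel? None = not tried yet.
    """
    if not freezes_new_livecd:
        # A stock >=4.3 kernel is stable here: not hardware, not 4.3 itself.
        return "kernel settings"
    if freezes_old_livecd is None:
        return "try a <4.3 LiveCD next"
    if not freezes_old_livecd:
        # Only the newer kernel freezes: likely a regression.
        return "kernel version (possible regression)"
    # Both freeze (inference, not from the post): the kernel version doesn't explain it.
    return "neither kernel version nor settings: look at hardware"


print(narrow_down(True))         # prints: try a <4.3 LiveCD next
print(narrow_down(True, False))  # prints: kernel version (possible regression)
```

Since the freezes are random, each "does not freeze" answer is only as strong as the time spent waiting for one.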

----------

## Perfect Gentleman

@krinn, I see no point in using a LiveCD, as I wouldn't be using it properly.

Yep, before 4.3 I had no such problems.

I don't wanna go back to gcc-4.9

Using 4.4-rc6, no freeze so far.

Tested with mprime for 1.5 hours, no errors either.

Tomorrow I'll test with mprime for longer.

----------

## NeddySeagoon

krinn,

I think it's a system problem. Operational safety margins have been eroded here, there and everywhere, and the system has been producing errors every now and again. I would expect taking any steps to improve the safety margins to correlate with an improved error rate.

Correlate does not mean that the change is the root cause. That will probably never be known.

----------

## Keruskerfuerst

If the computer freezes from time to time, the problem cannot be found easily.

----------

## Perfect Gentleman

Guys, I think I've found the cause and fixed it. The problem was the F2FS root filesystem: it had been corrupted after several blackouts. I thought it was safe to use F2FS for root, but it's not. Now root is on EXT4.

----------

