# Work around memory holes reading dmesg (w/o downtime)

## eddy89

Hi! I have a memory hole, or something wrong at a specific address (or two).

But this is now a production machine, and I can't afford the downtime to run memtest and find the addresses.

I did that once before and worked around it using memmap=exactmap etc. on the kernel command line.

I tried the memtest=4 option with the new kernel, but it didn't work.

The only clue I have to the address is the segfault error in dmesg, which seems to report it:

```
apache2[5033]: segfault at b ip b7125e9b sp bfcc2120 error 4 in eaccelerator.so[b711f000+21000]
apache2[5061]: segfault at b ip b7125e9b sp bfcc2120 error 4 in eaccelerator.so[b711f000+21000]
apache2[6298]: segfault at b ip b7125e9b sp bfcc2120 error 4 in eaccelerator.so[b711f000+21000]
apache2[7389]: segfault at 9818c880 ip b7e58876 sp a11b0d98 error 4 in libc-2.8.so[b7de6000+134000]
apache2[7407]: segfault at b ip b71cee9b sp bf868d30 error 4 in eaccelerator.so[b71c8000+21000]
apache2[7446]: segfault at b ip b71cee9b sp bf868d30 error 4 in eaccelerator.so[b71c8000+21000]
```

My question is: using these errors (I can post others) and the map printed during kernel initialization,

```
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000003fbf0000 (usable)
 BIOS-e820: 000000003fbf0000 - 000000003fbf3000 (ACPI NVS)
 BIOS-e820: 000000003fbf3000 - 000000003fc00000 (ACPI data)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
```

can I build a memmap line like this:

```
memmap=exactmap memmap=299M@1M memmap=473024K@302M
```

(this used to work with the old RAM configuration)

possibly giving up a few MB of RAM around the holes and in small, overly fragmented zones?
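Before rebooting a remote box, the old `memmap=` line can at least be checked against the current e820 map. The sketch below is a minimal Python illustration, not a kernel tool: it assumes the `size@start` syntax with `K`/`M`/`G` suffixes used in the line above, and it hard-codes the two `usable` ranges read by hand off the e820 map (end addresses exclusive). `parse_memmap` and `outside_usable` are hypothetical helper names.

```python
import re

UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

# Usable ranges copied by hand from the e820 map above: (start, end), end exclusive.
E820_USABLE = [(0x0, 0xA0000), (0x100000, 0x3FBF0000)]

CMDLINE = "memmap=exactmap memmap=299M@1M memmap=473024K@302M"


def parse_size(text):
    """Convert '299M', '473024K', '1G' or a plain byte count to bytes."""
    m = re.fullmatch(r"(\d+)([KMG]?)", text)
    if m is None:
        raise ValueError(f"bad size: {text!r}")
    return int(m.group(1)) * UNITS.get(m.group(2), 1)


def parse_memmap(cmdline):
    """Return (start, end) for each memmap=size@start entry; end is exclusive."""
    ranges = []
    for size, start in re.findall(r"memmap=(\w+)@(\w+)", cmdline):
        s = parse_size(start)
        ranges.append((s, s + parse_size(size)))
    return ranges


def outside_usable(ranges, usable=E820_USABLE):
    """Return the memmap ranges that do not fit entirely inside one usable range."""
    return [
        (s, e) for s, e in ranges
        if not any(us <= s and e <= ue for us, ue in usable)
    ]


if __name__ == "__main__":
    ranges = parse_memmap(CMDLINE)
    mapped = sum(e - s for s, e in ranges)
    usable = sum(e - s for s, e in E820_USABLE)
    print("entries outside usable RAM:", outside_usable(ranges))
    print("usable RAM left unmapped:", (usable - mapped) // 1024, "KiB")
```

Against the old line this reports no entries outside usable RAM and 264832 KiB (roughly 259 MiB) of currently usable RAM left unmapped, which fits the fact that the line was written for the old RAM configuration.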

Can you help me calculate it with enough safety margin that the PC will at least reboot and come straight back online?

(This is a remote computer)
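If a memtest86+ run does turn up a bad physical address, a new exactmap line could be derived from the usable range by cutting a window around it. This is a sketch under loud assumptions: `HYPOTHETICAL_BAD` is a made-up address (the segfault addresses in this thread are virtual and cannot be plugged in here), the 1 MiB margin on each side is an arbitrary safety choice, and `exactmap_excluding` is a hypothetical helper that emits entries in KiB, rounded inward so nothing outside a usable range is ever mapped.

```python
UNIT_K = 1024

# Main usable e820 range (end exclusive); the low 640 KiB is left out, like the old line.
USABLE = [(0x100000, 0x3FBF0000)]

# HYPOTHETICAL bad physical address, for illustration only (e.g. from memtest86+).
HYPOTHETICAL_BAD = [0x20000000]


def exactmap_excluding(usable, bad_addrs, margin=1 << 20):
    """Build memmap=exactmap arguments covering `usable` minus +/- margin
    bytes around each suspected bad physical address."""
    holes = sorted((max(0, a - margin), a + margin) for a in bad_addrs)
    pieces = []
    for start, end in usable:
        cur = start
        for hs, he in holes:
            if he <= cur or hs >= end:
                continue  # hole does not overlap what is left of this range
            if hs > cur:
                pieces.append((cur, hs))
            cur = max(cur, he)
        if cur < end:
            pieces.append((cur, end))
    args = ["memmap=exactmap"]
    for s, e in pieces:
        s_k = -(-s // UNIT_K)  # round start up to a whole KiB
        e_k = e // UNIT_K      # round end down to a whole KiB
        if e_k > s_k:
            args.append(f"memmap={e_k - s_k}K@{s_k}K")
    return " ".join(args)


if __name__ == "__main__":
    print(exactmap_excluding(USABLE, HYPOTHETICAL_BAD))
```

For the hypothetical address 0x20000000 this prints `memmap=exactmap memmap=522240K@1024K memmap=519104K@525312K`, giving up 2 MiB around the bad spot. The kernel also documents a `memmap=nn$ss` form that marks a single range as reserved without needing exactmap; some bootloaders (GRUB2, for example) need the `$` escaped.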

Thank you

----------

## Hu

Why do you think this is a RAM corruption problem?  Segmentation faults can be an indication of software errors.  Since they all happened in the same family of processes, and almost all at the same address, I would suspect eaccelerator before I suspected a hardware problem.
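One way to test this is to subtract each library's load base (the first number in brackets) from `ip`, which gives the faulting instruction's offset inside the library regardless of where it was mapped. A minimal Python sketch, assuming exactly the log format quoted above; `ip_offsets` is a hypothetical helper name:

```python
import re

SEGFAULT_LOG = """\
apache2[5033]: segfault at b ip b7125e9b sp bfcc2120 error 4 in eaccelerator.so[b711f000+21000]
apache2[5061]: segfault at b ip b7125e9b sp bfcc2120 error 4 in eaccelerator.so[b711f000+21000]
apache2[6298]: segfault at b ip b7125e9b sp bfcc2120 error 4 in eaccelerator.so[b711f000+21000]
apache2[7389]: segfault at 9818c880 ip b7e58876 sp a11b0d98 error 4 in libc-2.8.so[b7de6000+134000]
apache2[7407]: segfault at b ip b71cee9b sp bf868d30 error 4 in eaccelerator.so[b71c8000+21000]
apache2[7446]: segfault at b ip b71cee9b sp bf868d30 error 4 in eaccelerator.so[b71c8000+21000]
"""

SEGV_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def ip_offsets(log):
    """Yield (pid, object, fault address, ip offset inside the object)."""
    for m in SEGV_RE.finditer(log):
        ip, base = int(m["ip"], 16), int(m["base"], 16)
        yield int(m["pid"]), m["obj"], int(m["addr"], 16), ip - base


if __name__ == "__main__":
    for pid, obj, addr, off in ip_offsets(SEGFAULT_LOG):
        print(f"{pid:>5} {obj:<16} fault={addr:#x} ip_offset={off:#x}")
```

Both eaccelerator.so load bases (b711f000 and b71c8000) give the same offset, 0x6e9b, and every eaccelerator fault is at address 0xb. The same instruction reading through an almost-NULL pointer in different processes and at different load addresses looks like a deterministic software bug; random bit flips would be unlikely to repeat that precisely.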

That data from dmesg just tells you how portions of the system address space are reserved for administrative use.  It has no direct meaning in the context of defining what areas are and are not prone to hardware induced corruption.
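To see what the e820 map does say, it can be read as a list of physical address ranges, each tagged with a type; only the `usable` ones are RAM the kernel may allocate. A small Python sketch, assuming the older dmesg format quoted above, where the second address is exclusive; `parse_e820` and `usable_bytes` are hypothetical helper names:

```python
import re

E820_DMESG = """\
 BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000003fbf0000 (usable)
 BIOS-e820: 000000003fbf0000 - 000000003fbf3000 (ACPI NVS)
 BIOS-e820: 000000003fbf3000 - 000000003fc00000 (ACPI data)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
"""

E820_RE = re.compile(r"BIOS-e820:\s*([0-9a-fA-F]+)\s*-\s*([0-9a-fA-F]+)\s*\((.+?)\)")


def parse_e820(text):
    """Return a list of (start, end, type) tuples; end is exclusive."""
    return [
        (int(m.group(1), 16), int(m.group(2), 16), m.group(3))
        for m in E820_RE.finditer(text)
    ]


def usable_bytes(regions):
    """Total size of the regions the BIOS marked as usable RAM."""
    return sum(end - start for start, end, kind in regions if kind == "usable")


if __name__ == "__main__":
    regions = parse_e820(E820_DMESG)
    for start, end, kind in regions:
        print(f"{start:#012x}-{end:#012x} {(end - start) // 1024:>9} KiB  {kind}")
    print("usable:", usable_bytes(regions) // 1024, "KiB")
```

On this map it reports 640 KiB of low memory plus 1043392 KiB above 1 MiB, 1044032 KiB usable in total. Nothing in the map says whether any of that RAM is faulty.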

----------

## gentoo_ram

The addresses shown by the kernel at boot are physical addresses.  The addresses shown in the syslog dump are virtual addresses in that process.  They don't have anything to do with one another.  If that's the only program that's crashing, I'd say it's a software problem, not a hardware one.

----------

## MaximeG

Hi,

Please just spend a couple of minutes running memtest or memtester.

At least then you would know what the problem is and could choose the best-fitting solution.

Regards,

Maxime

----------

## eddy89

So, I'm sure of two things. You are all right that there is a software problem in eaccelerator (I found reports of it by googling), but I'm also sure there is RAM corruption. For example, look at this line:

```
apache2[7389]: segfault at 9818c880 ip b7e58876 sp a11b0d98 error 4 in libc-2.8.so[b7de6000+134000]
```

This is not eaccelerator but libc, and the error is different from the others.
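Note that the `error 4` field is identical on every line, including the libc one: it is the x86 page-fault error code, a bit field. A small decoder, using the bit meanings documented in the Linux x86 page-fault handler (`decode_pf_error` is a hypothetical helper name):

```python
# (bit, meaning when set, meaning when clear) for the always-reported bits
FLAG_BITS = [
    (0x1, "protection violation", "page not present"),
    (0x2, "write access", "read access"),
    (0x4, "user mode", "kernel mode"),
]
# bits that only add information when set
OPTIONAL_BITS = [
    (0x8, "reserved bit set in page table"),
    (0x10, "instruction fetch"),
]


def decode_pf_error(code):
    """Describe the N in a 'segfault ... error N' line as a list of flags."""
    parts = [on if code & bit else off for bit, on, off in FLAG_BITS]
    parts += [desc for bit, desc in OPTIONAL_BITS if code & bit]
    return parts


if __name__ == "__main__":
    print(decode_pf_error(4))
```

So all six faults are user-mode reads from an unmapped page. The libc line differs only in the faulting address and library, and a garbage pointer passed into libc could produce the same signature.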

I also tried memtester and it sometimes fails, but I don't know how to map the virtual address to the physical one.

Is there a way to do that, or is running memtest86+ the only solution?
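Linux kernels from 2.6.25 on expose `/proc/<pid>/pagemap`, one 64-bit entry per virtual page, from which the physical frame of a *running* process's page can be read. A minimal Python sketch of that lookup, assuming x86 (native little-endian entries), root privileges (newer kernels report the frame number as 0 to unprivileged readers), and the bit layout from the kernel's pagemap documentation (bit 63 = present, bits 0-54 = frame number); `phys_from_entry` and `virt_to_phys` are hypothetical helper names:

```python
import os
import struct

PAGEMAP_ENTRY = 8           # bytes per pagemap entry (one 64-bit word per page)
PFN_MASK = (1 << 55) - 1    # bits 0-54: page frame number
PRESENT = 1 << 63           # bit 63: page is present in RAM


def phys_from_entry(entry, vaddr, page_size):
    """Physical address for vaddr given its pagemap entry, or None if unknown."""
    if not entry & PRESENT:
        return None  # not in RAM (never touched, or swapped out)
    pfn = entry & PFN_MASK
    if pfn == 0:
        return None  # frame number hidden from unprivileged readers
    return pfn * page_size + vaddr % page_size


def virt_to_phys(pid, vaddr):
    """Look up vaddr in a live process; needs root and a still-running pid."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek((vaddr // page_size) * PAGEMAP_ENTRY)
        (entry,) = struct.unpack("=Q", f.read(PAGEMAP_ENTRY))
    return phys_from_entry(entry, vaddr, page_size)
```

The catch is that the apache child has already exited by the time its segfault line reaches dmesg, and its pages have been reused, so this cannot map the logged addresses back to physical RAM after the fact. A tester that reports physical addresses, such as memtest86+, remains the practical route.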

----------

## Mad Merlin

On multicore machines, eaccelerator tends to segfault under high concurrency. I've been able to reproduce it fairly consistently on a number of different (actually, identical specs, but physically distinct) quad core machines. For me, the solution was to use APC instead of eaccelerator.

----------

## eddy89

This is not a multicore machine, but I'll take a look at APC anyway.

----------

