# 2.6 Kernel can't handle 4GB RAM???

## peaceful

THE STORY

I have three nearly-identical 1-U, P4-3GHz servers.  One has 4GB RAM, the other two have 2GB RAM each.

I set up a Gentoo base system on one of the 2GB machines, and then booted off a live CD and used dd to clone the disk onto two other identical drives that I put in the remaining 2GB and 4GB machines.

I then added "threads" to my USE flags on the 4GB machine and re-emerged dev-lang/tcl-8.4.9 (I need to use threaded tcl on that machine)

Next, I emerge subversion on all 3 machines simultaneously.  They all download the files fine, but then the 4GB machine slows to a crawl.  Unpacking takes waaay longer, patching takes forever, compiling is so slow I can read each line as it comes on the screen.  The 4GB machine acts like a Pentium 100 with 32MB RAM, taking well over an hour to emerge subversion and its 3 dependencies.  Each 2GB machine zips along and does the same job in about 10 minutes.

THE QUESTION

How do I figure out what's going on???  /proc/cpuinfo is identical for all 3 machines.  There's nothing weird in /var/log/messages or dmesg.  /proc/meminfo shows that the RAM amounts are detected correctly.  The hardware is all identical.  What gives??  What can I check??  (Oh, and nothing is smoking)

Could emerging tcl with threads have borked something?  Are there serious issues with the current kernel using > 2GB RAM???  I don't want my nice server to run like it's got Windows on it...HELP!

Last edited by peaceful on Wed Sep 14, 2005 8:39 pm; edited 1 time in total

----------

## peaceful

I should add that I tried emerging smartmontools and running 'smartctl -t long /dev/sda', but I got:

```
smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: ATA      ST3160827AS      Version: 3.42

SATA disks accessed via libata are not currently supported by
smartmontools. When libata is given an ATA pass-thru ioctl() then an
additional '-d libata' device type will be added to smartmontools.
```

So my SATA drive precludes using that, at least.

----------

## peaceful

(SOLVED the performance issue...sorta)

I reduced the amount of RAM in the slow server from 4GB to 2GB.  Suddenly everything goes zippy.

??????

Is this a common issue with 2.6 kernels???  I'm running: 

```
Linux hydrogen 2.6.12-gentoo-r10 #1 SMP Tue Sep 13 11:19:30 MDT 2005 i686 Intel(R) Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
```

----------

## cyrillic

Have you tried changing the highmem setting in the kernel (i.e. try the 64GB RAM setting when you compile the kernel) ?

----------

## peaceful

 *cyrillic wrote:*   

> Have you tried changing the highmem setting in the kernel (i.e. try the 64GB RAM setting when you compile the kernel) ?

 

No I haven't.  I assumed the 4GB setting would work for anything <= 4GB.  I'll give the 64GB setting a try and post my results.

----------

## downey

Did you compile High Memory Support into the kernel and set up the other memory settings in the kernel correctly?  You need to provide more information on the kernel you are using and how it was built.  Do you see all 4GB of memory?  There is also an "Allocate Something in High Memory" option in the kernel that you should play with.  I would almost guarantee that this is a kernel compile issue.  Also, are you sure that the Intel CPU you are using can actually address more than 2GB of memory?  Unless it's a 64-bit CPU you are limited to ~2GB of memory addressing anyway.  If you have an AMD64 system I would try loading the 4GB into it and seeing how it performs.

Hope that helps.

----------

## peaceful

> Did you compile High Memory Support into the kernel...

Yes.  Set to 4GB.

> and set up the other memory settings in the kernel correctly?

Which settings?

> You need to provide more information on the kernel you are using and how it was built.

I posted the kernel version above: 

```
Linux hydrogen 2.6.12-gentoo-r10 #1 SMP Tue Sep 13 11:19:30 MDT 2005 i686 Intel(R) Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
```

Configured by following the Gentoo handbook and turning off most 'multimedia' options, since this is a headless 1U server.  Built by typing 'make', 'make install' (with /boot mounted), 'make modules_install', and making sure grub pointed to it.

> Do you see all 4GB of memory?  

Yes.

----------

## downey

The other memory option was "Allocate 3rd-level pagetables from highmem".  You may want to try playing with that.  Otherwise the only other thing worth looking into would be a BIOS update, as I have seen a number of people report that some motherboards aren't working correctly with anything more than 3GB without a BIOS update.  You may want to try using 3GB and see if that improves things.  You may also want to compile your system ia64 to see if using 64-bit helps.

----------

## peaceful

 *downey wrote:*   

> The other memory option was "Allocate 3rd-level pagetables from highmem".  You may want to try playing with that.  Otherwise the only other thing worth looking into would be a BIOS update, as I have seen a number of people report that some motherboards aren't working correctly with anything more than 3GB without a BIOS update.  You may want to try using 3GB and see if that improves things.  You may also want to compile your system ia64 to see if using 64-bit helps.

 

ia64 is a different architecture.  Compiling for that would leave me with an unusable system, just the same as if I had compiled for sparc or ppc.

I'll give the bios update a try, then try toggling the "Allocate 3rd..." option.

Stay tuned for more exciting news...  ;)

----------

## widan

Also check /proc/mtrr, to be sure your BIOS is setting MTRRs correctly. If you have RAM areas that are not covered by a write-back MTRR, the CPU will consider them uncacheable, resulting in a massive slowdown.

Here is a little demonstration of the "problem" (this is an extreme case with no MTRRs set up at all, but even partial MTRRs can slow things down a lot if uncacheable pages are used):

```
melanie tests # cat test.c 
int main(void)
{
  int i, vals[1];
  for (i = 0; i < 10000000; i++) {
    vals[0] = i;
  }
  return 0;
}
melanie tests # gcc -O0 test.c -o loop
melanie tests # echo "disable=00" > /proc/mtrr
melanie tests # cat /proc/mtrr 
melanie tests # time ./loop

real    0m31.857s
user    0m4.774s
sys     0m0.284s
melanie tests # echo "base=0x00000000 size=0x40000000 type=write-back" > /proc/mtrr 
melanie tests # cat /proc/mtrr 
reg00: base=0x00000000 (   0MB), size=1024MB: write-back, count=1
melanie tests # time ./loop

real    0m0.039s
user    0m0.036s
sys     0m0.002s
```

 *peaceful wrote:*   

>  *downey wrote:*   You may also want to compile your system ia64 to see if using 64-bit helps. 
> 
> ia64 is a different architecture. Compiling for that would leave me with an unusable system, just the same as if I had compiled for sparc or ppc.

 

I think he meant x86_64, as some Pentium 4 CPUs support this, but I'm not sure yours does (I think only very recent ones support it).

----------

## peaceful

Wow, never heard of MTRR.  Here are my settings (I haven't touched them):

```
hydrogen linux # cat /proc/mtrr
reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
reg02: base=0xc0000000 (3072MB), size= 512MB: write-back, count=1
reg03: base=0xe0000000 (3584MB), size= 256MB: write-back, count=1
reg04: base=0xf0000000 (3840MB), size= 128MB: write-back, count=1
reg05: base=0xf8000000 (3968MB), size=  64MB: write-back, count=1
```

So, adding that up, it appears that 4032 of 4096MB are covered.  How can I safely modify that to cover the last 64MB?
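For anyone who wants to repeat that addition on their own box, here is a minimal Python sketch (not part of the original post). It assumes the `/proc/mtrr` line format shown above with sizes printed in MB; `sum_writeback_mb` is a made-up helper name, and the sample text is simply the listing pasted above.

```python
import re

# Sample text copied from the /proc/mtrr listing above. On a live system you
# could read the file instead (assumption: same line format, sizes in MB).
SAMPLE = """\
reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
reg02: base=0xc0000000 (3072MB), size= 512MB: write-back, count=1
reg03: base=0xe0000000 (3584MB), size= 256MB: write-back, count=1
reg04: base=0xf0000000 (3840MB), size= 128MB: write-back, count=1
reg05: base=0xf8000000 (3968MB), size=  64MB: write-back, count=1
"""

# Captures the size in MB and the memory type of each entry.
LINE_RE = re.compile(r"size=\s*(\d+)MB:\s*([\w-]+)")


def sum_writeback_mb(text):
    """Return the total MB covered by write-back entries (hypothetical helper)."""
    total = 0
    for match in LINE_RE.finditer(text):
        size_mb, mtrr_type = int(match.group(1)), match.group(2)
        if mtrr_type == "write-back":
            total += size_mb
    return total


covered = sum_writeback_mb(SAMPLE)
print(f"{covered} of 4096 MB covered, {4096 - covered} MB not write-back")
```

The sketch ignores entries whose size is not given in MB and does not check for overlapping ranges, so treat the total as a rough figure.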

----------

## widan

Maybe the last 64MB are meant to be uncacheable (video RAM for a shared-memory on-board video card ?), but usually the BIOS will set up an explicit "uncacheable" MTRR for that. If you want to try, you can add another MTRR:

```
echo "base=0xfc000000 size=0x4000000 type=write-back" > /proc/mtrr
```

And see if the machine behaves normally (and faster). If this doesn't change the speed, remove the MTRR ("echo 'disable=06' > /proc/mtrr", assuming the MTRR you added is reg06): if the area isn't meant to be cacheable (like PCI MMIO space), this could cause problems.

But I don't really think it could explain a big slowdown (completely incorrect MTRRs could, but yours look reasonable).
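As a side note on where the hex values in that `echo` line come from, here is a small Python sketch (an illustration, not something from the original post). `mtrr_add_line` is a hypothetical helper that only builds the string; it does not write to `/proc/mtrr`.

```python
MB = 1024 * 1024


def mtrr_add_line(base_mb, size_mb, mtrr_type="write-back"):
    """Build the text one would echo into /proc/mtrr (hypothetical helper).

    base_mb and size_mb are given in megabytes and converted to byte
    addresses in hex, matching the format used in the commands above.
    """
    return f"base=0x{base_mb * MB:08x} size=0x{size_mb * MB:x} type={mtrr_type}"


# The last 64MB below 4GB starts at 4096 - 64 = 4032MB.
print(mtrr_add_line(4032, 64))
```

Whether adding such an entry is safe depends entirely on what actually lives in that address range, so review the printed line before using it anywhere.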

----------

## downey

Yes, I meant amd64.  I thought ia64 was Intel's 64-bit architecture, but it's the Itanium architecture, not the 64-bit Pentium.  Man, what a mess.

----------

## peaceful

Ok, so I tried adding an MTRR entry for the last 64MB RAM -- crash, hard reboot.

Tried disabling all the entries and adding one, 4GB entry -- crash, hard reboot.

So the MTRR stuff seems to be a "no go."

Oddly enough, on my 2GB RAM servers, MTRR looks like this:

```
helium ~ # cat /proc/mtrr
reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
```

One entry for the whole thing!  Weird.

I'm going to go try flashing the bios with the latest firmware update now.  What else can I try if that doesn't help?

----------

## peaceful

Grrr...I go to all the trouble of finding a floppy drive and a bootable DOS floppy to put the DOS EXECUTABLE ON (seriously intel, get WITH it.  Cross-platform in-OS flashing would be nice -- burnable CD IMAGES at a minimum), and then the servers don't have a FLOPPY POWER CABLE.  It's an INTEL board in an INTEL chassis, straight from INTEL itself!

Sigh, looks like I've got to hunt a molex->floppy power adapter down tomorrow.  I'm done for the day...

----------

## widan

 *peaceful wrote:*   

> Ok, so I tried adding an MTRR entry for the last 64MB RAM -- crash, hard reboot.
> 
> Tried disabling all the entries and adding one, 4GB entry -- crash, hard reboot.

 

So it's meant to be uncacheable. There are probably MMIO regions for PCI cards in there... video memory wouldn't cause a crash. Caching MMIO registers is not good: it messes up register write ordering, and cards get confused. If that happens to a card that has a DMA engine (NIC, RAID controller... anything that moves a lot of data around), it can get misprogrammed and write to places in memory it shouldn't, which will likely crash the machine if it touches kernel data.

 *peaceful wrote:*   

> Oddly enough, on my 2GB RAM servers, MTRR looks like this:
> 
> ```
> helium ~ # cat /proc/mtrr
> ...
> ```

 

Not that weird in fact. With only 2GB RAM, the PCI things go to the upper part of the address space (where there is no RAM). That area has no MTRR, so is uncacheable by default. So the BIOS can use a 2GB MTRR to set the whole RAM as write-back. With 4GB, the BIOS has to reserve some space for the PCI cards (the address space is only 4GB, so you need to give up a few megabytes of RAM, so that the PCI cards' registers/memory/... can be mapped). As MTRRs can only span ranges that have a size that is a power of 2, that explains the 2GB, 1GB, 512MB, ... MTRRs.
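The power-of-two rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular BIOS works: it uses 1MB granularity, assumes each block must be aligned to its own size, and `mtrr_blocks` is a hypothetical name. A real BIOS may pick a different layout, for example a large write-back range with an uncacheable range on top.

```python
def mtrr_blocks(start_mb, end_mb):
    """Split [start_mb, end_mb) into power-of-two blocks aligned to their size.

    Returns a list of (base_mb, size_mb) tuples. Greedy: at each base, take
    the largest block that is aligned and still fits below end_mb.
    """
    blocks = []
    base = start_mb
    while base < end_mb:
        size = 1
        # Double the block while it stays aligned to its own size and fits.
        while base % (size * 2) == 0 and base + size * 2 <= end_mb:
            size *= 2
        blocks.append((base, size))
        base += size
    return blocks


# 4GB box with the top 64MB left uncached: RAM covered from 0 to 4032MB.
for base, size in mtrr_blocks(0, 4032):
    print(f"base={base:4d}MB size={size:4d}MB")

# 2GB box: a single block is enough.
print(mtrr_blocks(0, 2048))
```

With these inputs the sketch yields six blocks (2048, 1024, 512, 256, 128 and 64MB) for the 4032MB case and a single 2048MB block for the 2GB case, which lines up with the two `/proc/mtrr` listings posted earlier in the thread.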

 *peaceful wrote:*   

> Grrr...I go to all the trouble of finding a floppy drive and a bootable DOS floppy to put the DOS EXECUTABLE ON ... and then the servers don't have a FLOPPY POWER CABLE

 

I keep a floppy drive and a few floppies around for BIOS updates, but I never thought that the cable could be a problem...   :(

----------

## peaceful

 *widan wrote:*   

>  *peaceful wrote:*   Ok, so I tried adding an MTRR entry for the last 64MB RAM -- crash, hard reboot.
> 
> Tried disabling all the entries and adding one, 4GB entry -- crash, hard reboot. 
> 
> So it's meant to be uncacheable. There are probably MMIO regions for PCI cards in there... video memory wouldn't cause a crash. Caching MMIO registers is not good: it messes up register write ordering, and cards get confused. If that happens to a card that has a DMA engine (NIC, RAID controller... anything that moves a lot of data around), it can get misprogrammed and write to places in memory it shouldn't, which will likely crash the machine if it touches kernel data.
> ...

 

(at home now) Ok, that's a great explanation for the difference in MTRR coverage.  What else can I check?

My intel-reseller is delivering a floppy-power converter in the morning, so I'll be able to do the bios update first thing in the morning.

----------

## peaceful

I updated the BIOS from version 06 to version 09 -- no effect.  Still slower than tar.

I'll try recompiling the kernel with the "Allocate 3rd-level pagetables from highmem" kernel option checked on the wild chance that that changes something.

To recap:  The problem is that Gentoo running with 4GB of RAM is about as fast as a dead snail, while lowering the RAM to 2GB increases the performance up to super-snappy.  I've tried the 4GB/64GB settings for highmem in the kernel (no effect), playing with /proc/mtrr settings (no effect), and updating the BIOS (no effect).

Any ideas, anyone?

----------

## peaceful

Setting "Allocate 3rd-level pagetables from highmem" had no effect.  Just for good measure, I put 2GB more RAM in one of the 2GB servers and verified that it also performed dog slow with 4GB.

Unless anyone else has any more info, I've got to conclude that Gentoo just doesn't support more than 2GB RAM on my particular hardware right now.  <Sigh> I wouldn't have thought I'd be capped at 2GB in 2005.  Makes me want to get an Xserve.

----------

## beandog

Okay, this is probably going to be a totally unhelpful post, but have you tried ck-sources at all?  They manage memory differently than the stock kernels (gentoo included), so you may want to give that a whirl and see what it does.

----------

## bonbons

Did you try how it behaves with 3GB? (If possible, you might have only 2G dimms...)

Does telling the kernel at boot time to only use < 4GB improve performance? (e.g. 3.5GB) (mem= boot param)

This last test could help detect if problems are coming from BIOS-settings or from Linux kernel settings

----------

## peaceful

I missed the email notification for the last two replies somehow.   :(

Anyway, here's an update:  I tried 3GB of RAM, and performance was normal, so I ended up with 3GB in two of the servers, and 2GB in the last.  The boxes are all down at our datacenter in production now, and seem to be working great.

So here's my next question: I'm now speccing out a database server, and we would like it to have lots and lots and lots of RAM.  I have no experience with 64-bit architectures (other than my PowerMac G5 at home), but as I understand it I need a Xeon or Opteron for a server if I want to have an x86 architecture with > 4GB RAM.

Is that correct?  Our main supplier is pushing us really hard towards Xeons, but I haven't been able to find a x86-64 installation handbook.  There seem to be AMD64 handbook versions, forums, etc.   Maybe I should just go post about this in the AMD64 forum...

----------

## dweigert

Ok,

Your supplier wants you to fry an egg on the machine and have slow memory access.  Opterons are beating the tar out of the newer Xeons at the moment, and completely killing them in the dual core arena. Besides, you can now stuff 64 GB of RAM in some Opteron servers. And you can go up to 8 way with single core (16 way dual core).  It just depends what you want to do with this, and what budget you have.  Look up the differences between HyperTransport and the usual North Bridge type of memory architectures.  In addition, with the Opterons you get cache coherency and NUMA.

Dan

----------

## peaceful

 *dweigert wrote:*   

> Ok,
> 
> Your supplier wants you to fry an egg on the machine and have slow memory access.  Opterons are beating the tar out of the newer Xeons at the moment, and completely killing them in the dual core arena. Besides, you can now stuff 64 GB of RAM in some Opteron servers. And you can go up to 8 way with single core (16 way dual core).  It just depends what you want to do with this, and what budget you have.  Look up the differences between HyperTransport and the usual North Bridge type of memory architectures.  In addition, with the Opterons you get cache coherency and NUMA.
> 
> Dan

 

I'm hearing the same thing in #gentoo-amd64.  Xeons are apparently not the way to go: hot, slow, 64-bit isn't very well supported, expensive--apparently all-around losers.

Thanks for all the input...I'll be pushing real hard for an Opteron solution!

----------

## finalfantasy

In 2005.1 AMD64 version, it only sees 30xx MB memory. But the speed is ok.

But when I tried FC4/FC2 x86_64 , it gets the whole range. 0-4096MB.

and the speed is fast.

Don't know exactly why.

Might be a patch in the FC series.

(There is no HIGHMEM option in kernel for x86_64. 

  I can only see it in 32 bit version. None/4GB/64GB options. 

  Am I right?)

Regards

----------

## rek2

Yep, I am wondering the same thing... why is there no option for memory? I am actually getting an Error 28 when I boot  :(

----------

