# MAKEOPTS -jN should depend on Core# or Thread#?

## midnite

In previous installations, I thought MAKEOPTS -jN should equal the number of CPUs (or CPU cores) in the system plus one. However, I just noticed this recommendation has changed in the installation manual.

Back on 27 July 2021: https://web.archive.org/web/20210727171544/https://wiki.gentoo.org/wiki/Handbook:AMD64/Full/Installation#MAKEOPTS

 *Quote:*   

> The MAKEOPTS variable defines how many parallel compilations should occur when installing a package. A good choice is the number of CPUs (or CPU cores) in the system plus one, but this guideline isn't always perfect.

 

From 22 November 2021 to now: https://web.archive.org/web/20211122124902/https://wiki.gentoo.org/wiki/Handbook:AMD64/Full/Installation#MAKEOPTS

 *Quote:*   

> The MAKEOPTS variable defines how many parallel compilations should occur when installing a package. A good choice is the smaller of: the number of threads the CPU has, or the total system RAM divided by 2 GiB. 
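
The new guideline can be turned into a small shell sketch. This is only an illustration of the rule quoted above, not an official Gentoo tool; the 2 GiB-per-job divisor is the Handbook's figure, and `nproc` plus `/proc/meminfo` are standard on Linux:

```shell
#!/bin/sh
# Sketch of the current Handbook rule: N = min(threads, RAM / 2 GiB).
threads=$(nproc)                  # logical CPUs (threads), not physical cores
ram_gib=$(awk '/^MemTotal/ {printf "%d", $2 / 1048576}' /proc/meminfo)
jobs=$(( ram_gib / 2 ))           # one job per 2 GiB of RAM (rounds down)
[ "$jobs" -gt "$threads" ] && jobs=$threads
[ "$jobs" -lt 1 ] && jobs=1      # never go below one job
echo "MAKEOPTS=\"-j$jobs\""
```

On a 4c/8t box with plenty of RAM, both limits come out to 8; the integer division rounds down, which errs on the conservative side.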

 

In the Kernel Configuration wiki page, this recommendation is quite ambiguous.

Back on 14 May 2021, it did not mention nproc, so it was not ambiguous.

https://web.archive.org/web/20210514102401/https://wiki.gentoo.org/wiki/Kernel/Configuration#Build

 *Quote:*   

> For processors with multiple cores, make all the cores do the work. Add the option -j(<NUMBER_OF_CORES> + 1).

 

From 28 October 2021 to now, it suggests using Core# + 1. However, nproc actually returns the number of threads.

https://web.archive.org/web/20211028220131/https://wiki.gentoo.org/wiki/Kernel/Configuration#Build

 *Quote:*   

> If the number of CPU cores in the system are known, the -jN (where N is the number of available cores plus 1) option can be used to speed up the compilation process. For example, a dual core processor contains two logical cores plus one (2 + 1). An easy way to determine the number of available cores is to run the nproc command.

 

The Good Old Days (?), when things were simpler and consistent.

----------

## NeddySeagoon

midnite,

In the good old days, when everyone had one core, life was simple.

MAKEOPTS was cores+1, so that make could have its own thread.

When CPUs became multithreading, that recommendation became threads+1.

Even threads+1 was debated as multithreading was a mixed blessing. It made some workloads faster and others slower.

All this time, the memory constraint that has now been added to the wiki was there, but it was unstated.

It was also a lot smaller, so unlikely to be a practical problem.

With the growth in memory requirements per make thread, particularly with large C++ projects, and the growth in thread counts, it's become very easy to put together a memory-constrained system that will swap, even if it's not apparently pushed very hard.

Swapping does not require a swap partition. The kernel has other ways of swapping too.

Given unlimited RAM, set MAKEOPTS to threads+1.

However, large C++ projects need 2G RAM per thread, so with 16G RAM, MAKEOPTS="-j8" is pushing your luck.

There is a third way.

In make.conf, set MAKEOPTS to threads+1, as it will be OK most of the time.

Use package.env to set MAKEOPTS to RAM/2G on a per-package basis for the bigger packages.
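
A minimal sketch of that third way follows. The file name heavy.conf and the package list are made up for illustration; on a real system the directory is /etc/portage, while the sketch writes into a throwaway prefix so it can run anywhere without root:

```shell
#!/bin/sh
# Per-package MAKEOPTS override via package.env, as suggested above.
# A temporary directory stands in for /etc/portage.
prefix=$(mktemp -d)
mkdir -p "$prefix/env"

# heavy.conf: fewer jobs for memory-hungry builds (e.g. 8 GiB / 2 GiB per job)
echo 'MAKEOPTS="-j4"' > "$prefix/env/heavy.conf"

# Map the big packages to that environment file.
cat > "$prefix/package.env" <<'EOF'
dev-qt/qtwebengine heavy.conf
dev-lang/rust heavy.conf
EOF

cat "$prefix/package.env"
```

With the real paths, portage then builds those packages with -j4 while everything else keeps the make.conf default.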

----------

## toralf

IMO it should be threads-1 to give 1 (logical CPU) to the OS itself.

----------

## logrusx

 *toralf wrote:*   

> IMO it should be threads-1 to give 1 (logical CPU) to the OS itself.

 

That, like threads+1 (the +1 supposedly being one thread for the I/O), is another widespread misconception.

The number of jobs should be bounded above by the smaller of: the number of threads (the number of logical cores, or the number of physical cores if there is no hyperthreading or it is turned off) and the amount of RAM in gigabytes divided by 2. If there is less than 2GB of RAM per thread, you risk running into OOM. If there are more jobs than available cores, logical or physical, the jobs will still be scheduled, creating context-switching overhead; if there are fewer jobs than available cores, without being constrained by memory limitations, some cores will just idle.

If you decide to use your computer for something else at the same time, there will be more threads and some overhead, but it will be acceptable. Of course, if you plan to run a heavy load, don't compile at the same time.

Having one thread reserved for the OS makes no sense, as the kernel does not run in a thread.

Regards,

Georgi

----------

## pietinger

midnite,

There are old recommendations. Forget them. As @logrusx already said, you will get the best (fastest) results with -jX, where X = the number of your logical CPU cores (including SMT/hyperthreading) ... IF ...

... IF you have enough RAM ... AND ... don't use your system heavily with other running applications (because every compile job AND every running application needs some RAM).

So, how much? This depends on the individual package. The official Gentoo recommendation is 2GB RAM per job, because MOST packages never need more than 2GB per job, ... BUT ...

... this is not true for every package. There is one package which needs up to 4GB RAM per job: "rust".

Now you have two choices:

1. Set your -j small enough to catch all possible situations, OR

2. Configure exceptions.

I go with option 2, and here is my configuration for an Intel i7 (4 physical cores / 8 logical cores) with 16 GB RAM:

In my make.conf I have:

```
MAKEOPTS="-j8"
```

and additionally I did:

```
# mkdir /etc/portage/env
# nano -w /etc/portage/env/monster.conf
```

with /etc/portage/env/monster.conf containing:

```
MAKEOPTS="-j4"
```

and /etc/portage/package.env containing:

```
app-office/libreoffice monster.conf
dev-lang/rust monster.conf
dev-qt/qtwebengine monster.conf
```

----------

## logrusx

One more thing I'd like to add: the probability of all threads having high memory usage decreases as the number of threads increases. I noticed that when I upgraded from a 4c/4t machine with 8 GB of RAM to an 8c/16t one with 32 GB of RAM. The ratio is still 1:2, but I haven't had problems with high memory usage so far. On rare occasions it swaps maybe 100 to 300 MB.

Back when I used the 4c/4t CPU, I had to use zram to prevent OOMs (that computer used an HDD, and it was a terrible idea to have physical swap on it).

Regards,

Georgi

----------

## eccerr0r

Keep in mind if distcc is in your tool chest that -j becomes even more tricky...

----------

## NeddySeagoon

logrusx,

Very true. Also, I have never seen make get more than about 30 threads running for one package, so you try emerge --jobs=3 ... if you have 96 cores. That's a Cavium ThunderX2.

Then you run out of RAM when the 3 jobs are firefox, thunderbird and chromium. That's with 128G RAM.

----------

## logrusx

 *NeddySeagoon wrote:*   

> logrusx,
> 
> Very true. Also, I have never seen make get more than about 30 threads running for one package, so you try emerge --jobs=3 ... if you have 96 cores. That's a Cavium ThunderX2.
> 
> Then you run out of RAM when the 3 jobs are firefox, thunderbird and chromium. That's with 128G RAM.

 

You might be running into a situation where everything else depends on some or all of those 30 threads finishing what they're currently building.

----------

## midnite

Sorry to interrupt. May I know how the 30 threads relate to -j3, 96 cores and 128GB RAM?

----------

## logrusx

 *midnite wrote:*   

> Sorry to interrupt. May I know how the 30 threads relate to -j3, 96 cores and 128GB RAM?

 

It's not make jobs but emerge jobs i.e. three emerges in parallel.

What NeddySeagoon says is that no more than 30 make jobs were ever observed in a single emerge. In my opinion there's no such hard limitation; it is caused by the dependencies.

Regards,

Georgi

----------

## NeddySeagoon

midnite,

```
emerge --jobs=X ...
```

can start a build with MAKEOPTS="-jY" for each running job.

So you can have X * Y build threads running.

In my example, it was 3 * 30 for 90 threads, which is mostly OK on a 96 core system.

It's not OK when it runs out of RAM and a build thread is killed by the Out Of Memory (OOM) manager.

----------

## tiffany

I have a Ryzen 7 5800X, 8C/16T, and 32GB RAM. When I use my main Gentoo GNOME installation with tmpfs for compilation, I have "-j16" and have never had a problem, even with Rust.

But when I test things in VMware Player, things change a little. I always give the VM all of the CPU and half of my system RAM. When I do a "normal" compilation, "-j16" again works most of the time, BUT that damn qtwebengine hangs. So I lower the jobs to a normal 8 and everything goes fine.

----------

## midnite

Thank you Georgi and NeddySeagoon for explanations.

I thought emerge --jobs=X was overriding the -jY in make.conf. Thank you, I am now clear that it does not.

May I ask: by default, emerge does not build packages simultaneously, i.e. emerge --jobs=1? And if emerge does build simultaneously, is it clever enough to build dependencies before the packages that depend on them?

I understand the differences between MAKEOPTS="-jY" and emerge --jobs=X. What is the benefit of ( X=3 , Y=30 ) over ( X=1 , Y=90 )?

I am currently pushing the limits of my PC with MAKEOPTS="-j8", on a 4790K (4c/8t) and 16GB physical RAM. All cores are at 100% but RAM usage is just around 8GB. (Meanwhile I am browsing in Firefox and playing music in mpv --no-video.) I think 2GB per thread is just a (seldom-reached) upper limit.

----------

## NeddySeagoon

midnite,

--jobs=1 is the default for emerge. 

When emerging one job at a time, very few, if any, packages run more than 30 concurrent make threads.

Some small things won't have 90 files to build in total :)

Portage gets the dependencies right. --jobs=3 is not a command, it's advice to portage.

If the dependency tree does not allow three concurrent jobs, portage will not run them.

----------

## logrusx

 *midnite wrote:*   

> Thank you Georgi and NeddySeagoon for explanations.
> 
> I thought emerge --jobs=X was overriding the -jY in make.conf. Thank you and I am clear now it does not.
> 
> May I ask, by default, emerge does not build simultaneously, i.e. emerge --jobs=1? If emerge builds simultaneously, it is clever enough to build the dependent packages before the depending packages?
> ...

 

If you run emerge with --jobs > 1, you need to adjust make jobs accordingly. However, if you do, say, 8 emerge jobs with 2 make jobs each to accommodate the 16 CPU threads, occasionally you'll run into situations where the dependency graph does not allow all 8 emerge jobs to run in parallel, because all the rest depend on one or two packages which are currently being built.

I recently ran into such a situation. Noticing there was a whole bunch of small packages that never utilized more than one core, I decided to be clever, reduce make jobs and increase emerge jobs. To my surprise, at one point only one job was running. The disadvantage shows when you get to the bigger packages which can utilize all threads: they have only however many make jobs you configured in make.conf, and even the last remaining one will use only 2 in the example above.

Regards,

Georgi

----------

## midnite

Thank you NeddySeagoon for the answer.

Thank you Georgi for the example. So I think MAKEOPTS="-j4" EMERGE_DEFAULT_OPTS="--jobs 2" may build faster than MAKEOPTS="-j8". In fact, how do you monitor the number of threads that a package is using for the build, so as to know whether it is utilising them?

----------

## pietinger

 *midnite wrote:*   

> So I think MAKEOPTS="-j4" EMERGE_DEFAULT_OPTS="--jobs 2" may build faster than MAKEOPTS="-j8". 

 

Yes. There is another reason: every compile of a package starts with exploring the configuration. This part is single-threaded (even with MAKEOPTS="-j30" ;) ). During this phase of working on a package, you have 5 working threads with this configuration (1 on this package + 4 on the other package), instead of only 1 working thread when using -j8 without --jobs.

P.S.: If you want to run as fast as possible you have to decide which kind of packages you have to compile:

1. Many small packages -> low MAKEOPTS + high --jobs

2. Some BIG packages -> highest possible MAKEOPTS no --jobs setting (defaults to 1)

----------

## eccerr0r

It seems it would be nice if every package in portage had a weight number... then emerge could parallelize small packages when there is no conflict.

It does seem that with emerge --jobs there are multiple occasions where jobs get serialized. I was hoping that things like rust would be parallelized with... qtwebengine... hear me out... just so that qtwebengine gets distcc'ed to other machines while rust is stuck on the merging machine.

Things like this make it really tough to make the right choice of --jobs, -j, and distcc...

----------

## Hu

One approach for that would be for emerge to implement the GNU make jobserver protocol, either directly or by spinning up a dummy make whose goals are the individual packages. Jobserver tokens could then flow freely among packages, so packages with little or no innate parallelism would allow other packages to run at once. Highly parallel packages would take most or all of the job slots for themselves and concentrate the system on one package at a time. The downside of all this is that it requires all the parallel-capable packages to understand a common jobserver protocol. Any build system that implements its own custom logic for deciding how many threads to run would misbehave.

----------

## logrusx

 *midnite wrote:*   

> So I think MAKEOPTS="-j4" EMERGE_DEFAULT_OPTS="--jobs 2" may build faster than MAKEOPTS="-j8".

 

Usually that won't be the case. Make does a better job of distributing work than emerge, partly because emerge doesn't care how many jobs make has spawned. If you don't want to be bothered with what's going on where, just leave emerge jobs alone and only adjust make jobs.

 *midnite wrote:*   

> In fact how do you monitor the number of threads that a package is using for the build, so to know if it is utilising?

 

With top and a lot of time to waste staring at the screen.
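
For a rough count without staring at the screen, one can also just count the compiler backend processes. This is a hypothetical one-liner, not something from the thread; the process names assume a GCC-based build (cc1 for C, cc1plus for C++), so clang builds would need different names:

```shell
#!/bin/sh
# Count running GCC compiler backends. pgrep -c prints the match count;
# it exits non-zero when nothing matches, hence the fallback to 0.
n=$(pgrep -c -x 'cc1|cc1plus') || n=0
echo "compiler jobs running: $n"
```

Wrapping it in `watch` (or a `while sleep 2` loop) gives a live view of how many make jobs a build is actually keeping busy.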

Regards,

Georgi

----------

## NeddySeagoon

Team,

MAKEOPTS and emerge --jobs are optimisers to try to make best use of available resources.

The idea is to optimise for the most likely case. However the further reality deviates from 'most likely', the more suboptimum the settings become.

----------

## pingtoo

Hi,

Just wondering: does anybody use the "-l" option for make and emerge? I use it for both, and it seems to work as designed. From make(1):

```
       -l [load], --load-average[=load]
            Specifies that no new jobs (commands) should be started if there
            are other jobs running and the load average is at least load (a
            floating-point number).  With no argument, removes a previous load
            limit.
```

From emerge(1):

```
       -l [LOAD], --load-average[=LOAD]
              Specifies that no new builds should be started if there are
              other builds running and the load average is at least LOAD (a
              floating-point number).  With no argument, removes a previous
              load limit.  This option is recommended for use in combination
              with --jobs in order to avoid excess load. See make(1) for
              information about analogous options that should be configured
              via MAKEOPTS in make.conf(5).
```

----------

## eccerr0r

Using -l does help, but I found that in multiple situations it also leads to some dead time if not tuned right.

This is because load average is an average, and I've frequently seen slow machines simply sit there waiting for the load average to drop when they are in fact idle...

This is tough to tune, especially with the smoothing that load average applies.

----------

## logrusx

 *pingtoo wrote:*   

> Hi,
> 
> Just wonder, anybody use the "-l" option for make and emerge? I use them for both make and emerge, it seems work as design.
> 
> ```
> ...

 

This will not protect you against running out of memory, and with recent kernels and hardware it's useless for the desktop; and I believe nobody runs updates on their servers. It makes much more sense to use 5.12* with the -ck patches for desktops running on older hardware. As eccerr0r already stated, it can also lead to dead time, and according to my observations it usually will.

*Con Kolivas seems to have stopped maintaining the patches for newer kernels. It's not the first time he has left; I hope it won't be the last time he returns either.

Regards,

Georgi

----------

## pingtoo

 *pingtoo wrote:*   

> Hi,
> 
> Just wonder, anybody use the "-l" option for make and emerge? I use them for both make and emerge, it seems work as design.
> 
> ```
> ...

 

Just want to add one more fact: my world updates usually cover 300 to 500 packages, sometimes more, so that may make a difference compared to updating only a few packages. And I use distcc with "--getbinpkg".

----------

## figueroa

I use the following:

```
MAKEOPTS="-j8 -l5"

EMERGE_DEFAULT_OPTS="--jobs=1 --load-average=5 --autounmask=n"
```

I have an Intel Core i7-2600 CPU @ 3.40GHz (four cores / 8 threads) with 16 GB of RAM (as in my sig block).

I seldom have to make any adjustments for large packages or other portage issues, and my system is very usable during updates. My main interest is that things work without error, without too much heat, and that I can enjoy using the system during updates. During updates I'm mainly doing email in Thunderbird, reading news on the web, often with YouTube going in a background Firefox tab.

I have conky running where I can monitor the MHz of each core/thread, memory use, load, and disk i/o, which is usually pretty boring, but I can see that decent thresholds are seldom exceeded.

If I had issues, I would happily reduce those settings. I'm all for a pleasant experience.

----------

## eccerr0r

I suspect your CPU gets about as hot as mine...

Last summer my CPU was overheating while building a few packages with 4 threads, and I had to do a disassembly and thorough cleaning. Now it gets to ~80°C under full load (8 threads), where it had been going well past 90°C if I didn't throttle (on the stock HS/F), even during winter. Just wondering what kind of temperatures you're seeing at full load, and with what HS/F. I think the 2600 can still run at 3.5GHz with all cores up, if I'm not mistaken, so it's about 100MHz less; albeit with turbo boost it's hard to tell how much power the machine dissipates without actually measuring it...

I'm hoping I won't see the overheating again this summer, nor on other machines... I wonder about the new-to-me ancient Xeon X3360. Fortunately all my P4's are dead (crashy) or gone...

----------

## figueroa

I set this computer up after somebody put it out with the trash about 40 months ago; a bit of luck on my part. I blew it out well with a shop vac, but I didn't reseat the heat sink, and I'm sure that's never been done. It had no RAM or hard drives, but I added both and it's been fine.

It's very likely that my -l5 and --load-average=5 settings throttle the processor some, but I do see the CPU frequency kick up near max on all threads, though not continuously, during compiling. Right now the temperature on the four cores is 45, 38, 37, and 47°C, with just Thunderbird and Firefox running with two tabs: this one and a YouTube video playing something from DW News. I don't often watch the temperature because I know it's not getting excessively hot.

If nothing big is queued up tomorrow, I'll do a rebuild tomorrow on gcc and make a point of watching the temperature and report back on this thread.

ADDED: It takes about an hour to rebuild gcc on this box.

----------

## eccerr0r

Currently my CPU temperatures under light load (playing mythfrontend with a 720p ATSC stream, and of course I'm using Firefox right now) are 37, 37, 39, and 41°C.  I do know the whole history of this CPU from when it came out of the box, however, unlike the X3360...

----------

## figueroa

Those are very nice temps. My ambient temperature is about 74F/23.3C.

----------

## figueroa

And so, the answer: after rebuilding sys-devel/gcc, which took 1:18:14, the CPU frequency on all eight threads generally stayed above the 3,400 MHz base but did not go above 3,600, the load average only occasionally and momentarily peaked above 5.0, and the CPU temperature never went above 79°C, usually hovering around the mid 70s. I had two LXTerminal windows open, Thunderbird, and Firefox with 3 or 4 tabs doing nothing special.

But even before I set the load average thresholds in make.conf a couple of months ago, I did not notice excess CPU temperatures or load. I set the load average to 5 just as a precaution, not to solve any problem.

----------

## eccerr0r

BTW, do you use fortran, go, pgo, lto? For me the build times are kind of messed up due to distcc, albeit gcc does not distcc much (stage 1).

It might just be my specific workload, but it seems to me that gcc isn't usually the worst in terms of heat generation; intrinsically it seems to have stalls that make inefficient use of threads (efficient use of threads increases temperatures due to less dead time!). But this is just a wet-finger guesstimate.

This, IMHO, is how I roughly gauge the speed changes:

Using a thread of an SMT processor = +10 to 15% throughput of a single core (depends on parallelization and resource use; note that Linux will prefer physical cores over SMT cores)

Using 1 more physical core of a multicore processor = +90% or more throughput of a single core (depends on parallelization)

Using 1 more thread than available hardware = -5% speed (OS thread-switch and cache-invalidation overhead if there is no dead time)

Using 10% more RAM than you have = -10% throughput (depends on memory residence and swap speed)

Using 30% more RAM than you have = -1000% throughput (depends on memory residence and swap speed)

----------

## figueroa

I don't use fortran or go. I gave up on lto and pgo after trying them sometime last year. I gave up distcc over 10 years ago; it wasn't helping. I have a very vanilla and stable setup with an openrc desktop profile. I do have swap on spinning rust, but I almost never use any. There are currently 30.3 MiB in swap, but that number has been static for weeks.

----------

## eccerr0r

Distcc helps quite a bit in my experience if tuned correctly. If it's not tuned right and starts puking on the compiles, it will not help and can even be detrimental.

Recently the tuning has been good on my farm, and a lot of applications distcc so well that things like qtwebengine have been killing my helper machines.

Recently I've been running a really big Java application that chomps on all threads and cores (processing OpenStreetMap data). This has been my benchmark for CPU temperatures, as it consumes lots of RAM and CPU cycles. My next-fastest machine takes twice as long to complete the same task...

----------

## figueroa

My mistake. I've not run distcc. I was thinking about ccache.

----------

