# Why multiple "emerge -e world" are actually useless

## Guenther Brunthaler

In my article https://forums.gentoo.org/viewtopic-t-494331.html I presented a guide and a helper script for recompiling an entire Gentoo system with minimal processing effort.

Basically, the guide leads to the compilation of every package in the system in the correct order.

And every package, including gcc and glibc, is compiled exactly once. This is a huge difference from the many similar guides floating around in the forums, which suggest rebuilding the system up to 6 times "just to be sure".

I argued in that article that my method of only compiling each package once is no worse than compiling it multiple times.

What was missing was an explanation of why I think that way.

And the purpose of this article is to provide interested readers with that missing explanation.

Myth 1: GCC gets better the more often it gets recompiled

This myth is one of the reasons most alternative "compile the entire system" kind of guides emerge the new GCC at least three times.

The rationale behind this:

When you emerge the new GCC for the first time, it will be compiled using your old compiler. That means it will be compiled by a potentially worse compiler than the new one. (Assuming each new version of a compiler generates better code than the older ones.)

So the new GCC is recompiled by itself, in order to get a new GCC that has also been generated by the new GCC's code generator.

Now we have a brand new GCC, generated by itself. Well - not actually by itself: The first new version of the GCC, which compiled the second version, was actually compiled by the old compiler. And maybe that old compiler somehow interfered with the source code of the new GCC, so that the first new GCC does not generate exactly the same code as a new GCC (compiled by itself) would have... So, in order to be sure, let the second GCC compile itself again to create a third GCC. This third version should be used then as the new compiler.

So much for the lore.

From a theoretical point of view, the supporters of this myth are mostly right. It is indeed necessary to compile a compiler three times in order to get a new compiler which has been compiled by itself and proven to generate reliable code.

But what those people obviously don't know is that a single "emerge gcc" does all that automatically!

Here is the explanation: The GCC makefiles, which are triggered by a single "emerge" operation, perform a "three-stage bootstrap".

This bootstrap works as follows:

Firstly, the old compiler is used to compile a first version of the new compiler.

This first version of the new compiler then compiles itself, creating a second compiler.

The second compiler is then used to compile itself again, creating the third and final compiler.

The second and third version of the compiler should be exactly the same: They have both been compiled by a version of the new compiler, using the same source code as input.

The only reason for creating the third compiler is to verify that the second and third compilers are indeed the same.
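The idea behind this verification step can be sketched with a simple byte-wise comparison. (A sketch only: the real GCC makefiles do this in their "compare" step; the directory layout and file contents below are purely illustrative.)

```shell
# Demonstration of the bootstrap sanity check: the object files produced
# by the stage-2 and stage-3 compilers must be byte-identical.
# We fake two stage directories with identical "object files" here;
# in a real GCC build tree they would be stage2/ and stage3/.
mkdir -p stage2 stage3
printf 'same machine code' > stage2/demo.o
printf 'same machine code' > stage3/demo.o

status=0
for obj in stage2/*.o; do
    cmp -s "$obj" "stage3/${obj#stage2/}" || { echo "miscompare: $obj"; status=1; }
done
[ "$status" -eq 0 ] && echo "bootstrap comparison passed"
```

If even a single object file differed, the bootstrap would be considered broken: either the first compiler miscompiled the second one, or the compiler does not generate deterministic output.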

This means that the first and the second compiler both generated the same code.

It further means that the new compiler can compile a C program that behaves exactly as a C program compiled by the old compiler.

That is, the new compiler knows "C" as well as the old one did. (At least as far as the "test" source code being compiled requires such knowledge. And the compiler's own source code, which is very large and complex, serves as the test subject here.)

If one knows the above facts about how the GCC makefiles work, it should be clear why the GCC generated by those makefiles is already optimal, and cannot get better by recompiling it over and over again: The old compiler generated the same code for the new GCC as the new compiler did when compiling itself. And even a third compiler, compiled by the new compiler, generated the same code. This will not change, no matter how many times gcc is re-emerged.

This is why it is useless to emerge GCC more than once. It will not get any better than it already is.

Myth 2: We need to recompile everything after a glibc update

Many people consider updating the glibc to have a similar effect as updating the GCC: As the new glibc may have updated header files and may operate on different internal data structures, all applications using those header files (= virtually all applications) must be rebuilt.

Although this concern may be justified for applications which used some "experimental features" of the old glibc, such changes will affect only very few applications.

This is because the glibc ABI (Application Binary Interface) is very stable and existing functions change rarely if ever.

Typically, a new glibc version will add additional functions / features, but do this in a way that older programs not using these new features will not be affected in any way.

The only exception to this may be bugs that get fixed. But applications typically do not depend on the existence of bugs in the C library, but rather assume there are none. (Admittedly, virus writers may depend on such bugs. Well, then that virus will no longer work with the new glibc. What a pity. But hey! If you are a virus writer, then you are using the wrong OS anyway!)

Another point to consider is that the glibc is usually dynamically linked to an application.

This basically means that the application contains some record which communicates "put in a pointer to the printf() function in the glibc library here so I can call printf via this pointer" to the dynamic linker.

As long as the established semantics of the printf() function do not change, this will work with any future glibc version as well.

Another reason why the glibc ABI is very stable is that it is (mostly) based on well-established standards, such as the ANSI C or POSIX standards.

Those standards define a set of functionality which will not just "change overnight".

Thus, any new version of the glibc can be assumed to be reasonably backwards-compatible with existing applications.

Which means that, for most existing dynamically linked executables, a new glibc can be used as a "drop-in replacement" for the older version against which they were linked. Without being re-built or re-linked, that is.

And what about statically linked executables?

Well, as those executables already contain all the code they need (extracted from the old glibc at the time when they were linked), they will not be affected at all by a glibc change. They will happily keep running, using their embedded copy of the old glibc just as before. You only have to re-link those executables if you want them to use the new glibc routines. (This may be the case if the new glibc contains important bug fixes.)

However, there is one thing that can break ABI compatibility: Version scripts.

Using version scripts, a library author can explicitly define a new version of a library not to be compatible with previous versions.
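Such a version script looks roughly like this. (A hypothetical example in the GNU linker's version-script syntax; the version node names and symbols are made up to mimic glibc's style, only written out to a file here the way the linker would consume it.)

```shell
# A (hypothetical) GNU ld version script with two version nodes,
# mimicking glibc's style. Symbols bound to DEMO_1.0 stay available
# to old binaries; DEMO_2.0 merely adds a new symbol on top.
cat > libdemo.map <<'EOF'
DEMO_1.0 {
  global:
    demo_call;
  local:
    *;
};

DEMO_2.0 {
  global:
    demo_new_call;
} DEMO_1.0;
EOF
cat libdemo.map
```

Old binaries keep resolving against the version node they were linked with; breakage only occurs when a node they depend on is removed or changed incompatibly.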

But even in this case, it is not necessary to immediately re-link all executables against the new glibc: Just keep the old glibc version installed. Then the old applications can continue to use the old glibc, while newly compiled applications will use the new glibc. (I.e. a parallel installation, also called a "slot" installation in Gentooese. Hmm. I wonder if Gentoo supports slot installation of glibc? If not, then we are in serious trouble in such a scenario. But then all the other guides will fail as well! Such a glibc upgrade can only be performed easily by using some sort of cross-compiler/cross-linker, which is itself linked against the old glibc but creates executables linked against the new glibc.)

As long as applications using those different library versions do not interact with each other on the ABI level, all will still be fine. (But plug-ins will typically no longer work when they are linked against a different version of the glibc than the application they belong to.)

Summing up: It is not true that every application has to be recompiled/relinked as soon as a new glibc version is available. It depends on whether the new glibc contains version scripts that are incompatible with the previous glibc. And even then, not all applications are affected, as long as both versions of the glibc remain installed.

And the question is: How often will this occur? Why should the glibc authors release a new version of the glibc that is totally incompatible (at version script level) with the old version?

One can expect such incompatibilities to happen if the major version number of glibc is incremented (i. e. as soon as we reach glibc 3.x).

But certainly not when updating from glibc 2.3.x to 2.4!

Such upgrades can safely be assumed to be drop-in-replacements.

The implication of this is: Most applications that depend on the glibc will keep running after a glibc update without requiring recompilation or even re-linking.

So much for the myths.

And now for the real problems!

There is indeed a problem when upgrading GCC from 3.3 to 3.4, or from 3.3/3.4 to GCC 4.

But those are not glibc problems; those are C++ problems.

In fact, (as far as I know) glibc should be totally unaffected by C++ problems, because (the shared library) glibc is a pure "C" library.

In a "C" library, symbols (as visible to the linker) have very simple names. They are typically the same names as used in the "C" source code (on some platforms prefixed by an underscore; on Linux/ELF systems the linker symbol is identical to the source name).

Thus, if you use the function printf in your "C" program, GCC generates a reference to the symbol "printf" (or "_printf" on underscore-prefixing platforms). The linker then finds this symbol in the glibc, and uses it.

This simple scheme is possible in C because C supports neither overloading nor C++-style namespaces. Each (externally visible) function in "C" has a single, unique name.

In C++ things are not so easy. There may exist multiple overloaded functions which all have the same name, but have different argument types. Or two otherwise identical names exist in different namespaces.

For that reason, C++ combines the argument types and namespace names with the name of the function to generate a unique symbol name which is seen by the linker. However, the actual C++ types cannot always be used directly in literal form when combining them into linker symbol names, due to restrictions on what a valid linker name may look like.

This leads to certain characters in the type names being replaced by others, and even weirder transformations being performed on the type names, until the combined result typically looks like rubbish to an unaware user.

This is also known as name mangling.

And because C++ types can get pretty complex, the mangled names get just as complex.

But the worst thing is: There are countless ways in which name mangling could be performed.

From time to time, compiler authors find "a better way" to do the name mangling, and the resulting compiler will then generate object files which cannot be linked against object files exporting mangled names from an older version of the compiler.

If this happens, the old object files have to be recompiled with the new GCC in order to use the same name mangling scheme.

But mangled names are not the only reason why C++ object files may be incompatible with older versions of g++ (and thus require recompilation).

C++ compilers also generate type information structures and exception information structures which will be referenced from the generated code at runtime (dynamic_cast, typeid, try/catch etc.).

And similar to symbol name mangling, those information structures can also be implemented in various ways, and sometimes it may even be necessary (and not just for fun, because it "looks better" in the opinion of the compiler authors) to change them in order to implement some new feature of the compiler.

No matter what is changed - name mangling, exception information representation or type information representation - in all those cases C++ programs which are using those C++ features must be recompiled after a GCC compiler update.

And it seems such changes were made in GCC 3.4 as well as in GCC 4.

From what we have learned above, we can conclude: When upgrading from GCC 3 to GCC 4, we need to recompile most C++ programs, but it would be unnecessary to recompile most C programs.

However, how can we know which Gentoo packages contain C++ programs and which ones do not?

Of course, one could look into the source code of each package and search for C++ files... but this cannot be implemented automatically in a safe way.

The problem here is that it is impossible to distinguish between C and C++ source files in a safe way.

Of course there are naming conventions, such as "a C file has the file extension .c" or "a C++ file has the file extension .cpp, .C, .cxx or .c++", but that's all they are: conventions.

Nothing can stop a weirdo C++ developer from using a .c file extension for his C++ source files.

And you cannot even tell from looking into the source files: Is a file containing the single line "extern int c;" a C source file or a C++ source file? It could be both.

Only the instructions in the Makefile determine the language a source file will actually be compiled as.

But what if our weirdo C++ programmer does not use a Makefile at all, but rather uses his super-sophisticated and absolutely non-standard Perl script for taking over the Makefile's job?

The conclusion is: You can't safely determine whether a package uses C++ or not by automated scanning of any kind. (Well, perhaps SKYNET could. It may have enough A.I. for that. But then we would have other problems than recompiling our systems, I guess.)

It follows that the only way to be sure that all packages containing C++ source code will be recompiled after an incompatible g++ upgrade is to recompile all packages.

This also allows all packages to benefit from the better code generator of the new GCC, and is thus not per se an evil thing.

So, now we are back where we started: We know that we have to rebuild each and every package of the system.

But with the information from the paragraphs above, I can now argue how to do it and why it will work that way.

How my upgrade scheme works

The first step of my guide is to do an "emerge --sync". This ensures that the latest package versions are known to portage.

Then an "emerge --update --deep --newuse world" is run. This ensures that we also have the latest versions of all packages installed, including those in "system".

It also means that tools like "flex", "bison" etc which might be used by the compiler Makefiles are available in the most up-to-date versions. The fact that those tools are still linked against the old glibc should not have any effect, as I have shown when deconstructing myth # 2.

Then the new compiler should be emerged. Note that the system still uses the old compiler as the default compiler!

But as I have shown above (myth # 1), this new compiler will be as good as it can be, irrespective of the fact that the old compiler is still the system default compiler. And that there is no point in recompiling the new gcc ever again.

The fact that the GCC contains some C++ libraries is special here: As g++ will recompile itself using itself in the second and third stages of its three-stage bootstrap, it actually becomes the very first application on your system ever to be compiled with the new C++ mangling schemes etc.!

This means it need not be recompiled again, because it has actually been the first package to be recompiled. The fact that it is not the system default compiler yet does not affect this.

However, the new compiler is linked against the old glibc. But this should also not affect it - see myth # 2 again: If a new version of the glibc becomes available later, it will be used as a drop-in-replacement, not requiring re-linking or even recompilation of the new GCC.

At this point, all tools are up-to-date, and the new compiler and the old compiler are both installed and operational.

Now my guide says to change the profile using "eselect profile" if desired, as well as setting the new GCC as the default compiler using "gcc-config".

After this, I suggest a reboot just to be sure the changes made by env-update have been propagated through all the shells and processes in the system.

Then, it is time to run my generator script.

What it does is simple: It does both an "emerge --pretend --emptytree world" and "emerge --pretend --emptytree system".

In both cases Portage will output the packages already in the correct order to be rebuilt.

However, the packages from system should be rebuilt first.

Another problem is that Portage analyzes both emerges separately, which means that the output for "world" contains several packages which are also part of the output for "system".

My script therefore filters out those duplicates, and combines the two lists into one ("system" list items coming first).
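The core of that merging step can be sketched in a few lines. (File names and package atoms here are illustrative, not the generator script's actual ones.)

```shell
# Combine the "system" and "world" lists, keeping only the first
# occurrence of each package, with "system" entries coming first.
printf '%s\n' sys-libs/glibc sys-devel/binutils > system.list
printf '%s\n' sys-libs/glibc app-editors/vim    > world.list
awk '!seen[$0]++' system.list world.list
# prints: sys-libs/glibc, sys-devel/binutils, app-editors/vim
```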

While doing this, it removes GCC from the combined list: GCC is already installed, and as explained earlier in this article, there would be no advantage in compiling it again.

As a special provision, my script ensures that the most important packages are emerged first: linux-headers, glibc and binutils.

linux-headers are emerged first, because they are required for glibc.

glibc is next, because it is without question the most important library in the system. (If you have upgraded the system profile before, this will emerge the new version of the glibc already.)

And by emerging glibc as soon as possible, any of the packages emerged after this will already be linked against this new glibc.

And although the new glibc can mostly be used as a drop-in-replacement for the old one (see myth # 2), emerging it as soon as possible will eliminate the few exceptions where a drop-in-replacement would not work. (Only our virus writer from myth # 2 will still not be happy. Where are the old bugs to be exploited?! Perhaps he will relocate to Redmond now, and the Gentoo community will lose a member. What a shame.)

Now you might throw in: "But linux-headers has not been linked against the glibc!"

But that is actually not a problem: The linux-headers package consists only of header files and does not generate any executable which would require linking.

Also note: Any packages emerged from this point on will be compiled by the new compiler and be linked against the new glibc.

Can it get any better? I say "no". There is no reason why the following packages should be recompiled more than once. They would again be compiled by the same new GCC and be linked against the same new glibc, yielding the very same executables as before.

Finally, binutils are emerged, because they contain the remaining essential system components required by the build system, such as "make", "as", "ld" and the like.

However, it should actually not be necessary to recompile them, because the existing versions of those tools were already built from the most current source code, and I bet they are all pure C applications not using C++.

But I may be wrong, and perhaps the profile update unlocked different binutils versions; so perhaps it actually makes some sense to recompile them. And not to forget that better code generator of the new GCC! So let's just re-emerge them as well.

After those packages the filtered packages from the "system" list will be emerged, followed by the filtered packages from the "world" list.

My generator script will then write out a shell script containing all those emerges in the right order and with the right emerge options, and add a bit of magic to allow incrementally building the packages (see my guide for more details).

That's it.

That's all my script does.

If you are truly paranoid, you can re-emerge GCC and binutils again after all of this:

Perhaps the changing of the system profile unlocked new versions of some of the binutils tools.

It then would no longer be true that the "emerge --update --newuse --deep world" performed at the beginning of my guide updated binutils to the newest version.

This means, GCC has been built with older versions of the binutils, and now there are newer versions of the binutils installed.

But the same is true for binutils itself: At the time it was recompiled (after glibc), the newest binutils version from the old Gentoo profile was still installed. After that recompilation, the newest binutils version of the new Gentoo profile is installed. And those two versions might differ.

By re-emerging GCC and binutils again, one can be sure that those packages are now being built using the newest binutils version in any case.

Why I consider this to be paranoid: Even if new versions of tools like "cp", "make", "sed" etc. are used during the compilation of a package, apart from catastrophic bugs I cannot see how this should make a difference in the generated executables. (At least no functional difference. Perhaps a new linker will write a different version number into some identification fields of the various ELF sections. But who cares.)

So much for my explanation.

Comments welcome!

Greetings, Guenther

*Last edited by Guenther Brunthaler on Sun Sep 03, 2006 11:33 pm; edited 1 time in total*

----------

## Hell-Razor

Have enough time on your hands?... well im guessing too much time.

----------

## Guenther Brunthaler

 *Hell-Razor wrote:*   

> Have enough time on your hands?... well im guessing too much time.

 

I felt it was necessary to also provide an explanation for my guide.

If I had just provided a guide without that, wildly alleging things without explaining why I think that way, I would have left the readers in a position where they can either believe my allegations, or not.

But my intention has always been to create knowledge for the interested ones, not just belief. (I'm neither that company in Cupertino nor that from Redmond. I'm not interested in creating believers. I prefer to communicate with people who know what they are doing, and why.)

----------

## stahlsau

 *Quote:*   

> But my intention has always been to create knowledge for the interested ones, not just belief. (I'm neither that company in Cupertino nor that from Redmond. I'm not interested in creating believers. I prefer to communicate with people who know what they are doing, and why.)

 

Ty, great article. This should go into the FAQs.

----------

## Dieter@be

Great read   :Cool: 

----------

## zontar

Interesting, clear and readable. Thank you.

----------

## Gergan Penkov

There already exist scripts which do this better - while this particular flow of logic could make your system run, it will not optimize it or make it stable.

For example, when you install a new version of binutils, you need to recompile glibc and not gcc - as the dynamic loader is in glibc and not in gcc; failing to do so, you are probably missing new features.

I'll not continue to dive into this, but the general upgrade guide has shown that it introduces fewer bugs than some new methods, and there is emwrap, which probably does this much better.

----------

## Zubzub

 *Dieter@be wrote:*   

> Great read  

 

+1

----------

## Guenther Brunthaler

 *Gergan Penkov wrote:*   

> there already exist scripts, which do this better

 

Can you please provide me with a link? (If there is more than emwrap to look at. I will check out emwrap as soon as I have some time left.)

I wouldn't have written my guide if I had found such a better guide. (I really don't like to reinvent the wheel!)

 *Gergan Penkov wrote:*   

> when you install a new version of binutils, you need to recompile glibc and not gcc

 

My script doesn't recompile GCC. This is done by the user (as instructed by my guide), before even starting my script. And binutils is recompiled, as well as glibc is.

However, admittedly, glibc is recompiled before binutils.

But note that my guide instructs the user to do an "emerge --update --deep --newuse world" before doing anything else!

That means the newest version of binutils will already be installed when the new GCC is emerged, and thus it is certainly up to date before glibc and binutils are re-emerged afterwards!

Nevertheless, thank you for your input.

I appreciate any comments which may help eliminate potential problems in my guide.

----------

## Polynomial-C

Hi,

 *Guenther Brunthaler wrote:*   

> Finally, binutils are emerged, because they contain the remaining essential system components required by the build system, such as "make", "as", "ld" and the like.

 

```
# equery -C b `which make`

[ Searching for file(s) /usr/bin/make in *... ]

sys-devel/make-3.81 (/usr/bin/make -> gmake)
```

(GNU)make is not part of binutils. It's installed as a separate package.

Cheers

Poly-C

----------

## Guenther Brunthaler

 *Polynomial-C wrote:*   

> (GNU)make is not part of binutils. It's installed as a separate package.

 

You are right; thank you for the correction. I didn't know that... well, at least I didn't think of that. (Because now, after you told me, I remember very well having watched when "make" was emerged some time ago. Must have been advanced stupidity making me forget about it...  :Wink: )

Fortunately, "make" is a very stable tool regarding backwards-compatibility and does not generate any code on its own (in contrast to "binutils", which contains the assembler and linker), so it will not invalidate my guide if an older version of make might be used for some of the first packages.

But nevertheless, you are still absolutely right with your comment.

----------

## ThePsychoHobbit

Guenther: Just as a general rule of thumb, it's a good idea to write in full paragraphs.  One sentence per line is extremely annoying to read.

----------

## tSp

 *ThePsychoHobbit wrote:*   

> Guenther: Just as a general rule of thumb, it's a good idea to write in full paragraphs.  One sentence per line is extremely annoying to read.

 

That's a step-by-step or logical-order writing style.

----------

## Guenther Brunthaler

 *ThePsychoHobbit wrote:*   

> Guenther: Just as a general rule of thumb, it's a good idea to write in full paragraphs.  One sentence per line is extremely annoying to read.

 

I did write full paragraphs! However, some paragraphs are so short that they only fill a single line.

My general idea about using paragraphs is: A new idea or concept - a new paragraph. Unfortunately, there are a lot of concepts to be communicated in the guide, leading to multiple rather short paragraphs.

I'm sorry about that, but I have no good idea how to change this with tolerable effort. Probably the best thing I could do is to re-write the guide from scratch, with focus on a better writing style from the very beginning.

But as the guide is still in a changing state, I will certainly wait for it reaching a stable state before I actually consider doing this.

----------

## Cinquero

Not that I don't believe you regarding the 3-stage gcc build (I knew that long ago), but do you know how to prove that this actually is also true in (Gentoo) practice?

Or in other words: is there a way to compare the generated gcc libs and binaries after an "emerge gcc" and after an additional "emerge -e system"? I tried but failed, probably because some sort of irrelevant (time?) information is stored along with the machine code (or because the 3-stage compilation produces different results depending on the installed glibc...).

----------

## Guenther Brunthaler

 *Cinquero wrote:*   

> but do you know how to prove that this actually is also true in (Gentoo) practice?

 

I admit I did not try to verify this "the hard way" as you did.

But as far as I could see, the Gentoo ebuild script just runs the autoconf-created installer of GCC, which *does* perform that three-stage build. Normally.

However, I have not checked whether the Gentoo devs have perhaps patched the GCC build script to omit some of the GCC build stages.

But why should they? Especially on Gentoo, the compiler is more important than most other system components. I doubt the Gentoo devs would have crippled its build process just for a small speed gain, risking unstable GCC operation afterwards.

Considering this, I guess we don't need to worry seriously about that issue.

 *Cinquero wrote:*   

> Or in other words: is there a way to compare the generated gcc libs and binaries after an "emerge gcc" and after an additional "emerge -e system"?

 

Not easily.

 *Cinquero wrote:*   

> I tried but failed, probably because some sort of irrelevant (time?) information is stored along with the machine code

 

Yes.

In fact, I tried to do the same a couple of years ago when I was working as a contractor for IBM.

In that past project, there was a rather complicated build chain involved, and a QA guy wanted me to check whether the binaries were identical after some subtle changes to the build chain.

To make it short: It was impossible.

I encountered the same issues you wrote about, and also additional ones.

It is true that compiler and linker add a lot of tags, version numbers, date/time information and the like to executables they produce.

But even worse, I had to learn that the compilers occasionally generated different code for equivalent source text, only differing in comments!

I even disassembled the generated code to learn more about those differences.

As it turned out, the code was functionally identical, but for no apparent reason some of the functions had simply exchanged their order in the generated object file. (BTW, this was code generated by MS Visual C++ 6. Perhaps they let a pseudo-random generator, seeded by a hash of the source file, determine the order in which functions are emitted into the object file? Or perhaps it's just a bug. Or perhaps, as M$ usually likes to phrase it, "This behaviour is by design".)

 *Cinquero wrote:*   

> (or because the 3-stage compilation produces different results depending on the installed glibc...).

 

This is theoretically possible, if the new C library defines different macros in its header files, which will then expand to different (but typically functionally equivalent) code.

However, I doubt this: Aside from fixed bugs, the libc ABI is very stable.

New versions of glibc typically add functionality, but will not affect the old functionality as used by existing applications.

----------

## Cinquero

 *Guenther Brunthaler wrote:*   

> ...

 

Yeah, I already thought that there are such problems... but I could imagine that for testing purposes some of the gcc devs implemented build options to make the output comparable/stable....

----------

## Guenther Brunthaler

 *Cinquero wrote:*   

> that for testing purposes some of the gcc devs implemented build options to make the output comparable/stable....

 

Perhaps this might even be the case! GCC has such a plethora of documented options ... perhaps there are also a couple of internal, otherwise undocumented options exactly for that purpose! Who knows...

Any volunteers for looking into the source code?  :Wink: 

----------

## irondog

Günther, mythbuster Günther!  :Smile: 

 *Quote:*   

> Myth 2: We need to recompile everything after a glibc update 

  If this myth were really true, you wouldn't be able to recompile everything, because everything would be broken.

Good job Günther! I've always been agitated by these while `true`; do emerge -e world; done parrot-like people.

----------

## StifflerStealth

In past compiles, I have watched GCC build, and it does do the three-stage compile. That's why it takes so long to build. To prove it, you can download gcc from the GCC homepage, build it manually, and compare the compile times with genlop or something.

Cheers.

----------

## Gergan Penkov

 *StifflerStealth wrote:*   

> In past compiles, I have watched GCC build, and it does do the three-stage compile. That's why it takes so long to build. To prove it, you can download gcc from the GCC homepage, build it manually, and compare the compile times with genlop or something.
> 
> Cheers.

 

Does it compile itself 3 times, or simply bootstrap itself in three stages? I don't believe that gcc builds itself more than once. Although I've never looked at the compilation output, compilers bootstrap themselves - and I don't see any sense in a compiler building itself more than once...

----------

## irondog

 *Quote:*   

> Does it compile itself 3 times, or simply bootstrap itself in three stages? I don't believe that gcc builds itself more than once. Although I've never looked at the compilation output, compilers bootstrap themselves -

  Then you have lazy eyes.  :Wink: 

Try building a cross compiler, compile a .c file with the -v flag. 

Just like this:

```
$ echo 'int main() {}' > bla.c

$ /usr/local/crosscompile/i386-gnulibc/bin/i386-unknown-linux-gnu-gcc -v bla.c

[snip]

GNU C version 3.4.5 (i386-unknown-linux-gnu)

        compiled by GNU C version 4.1.0 (Gentoo 4.1.0).

```

What you see: my gcc 3.4.5 cross compiler is built with my (back-in-those-days) gentoo system compiler.

Now the same with my system compiler

```
[snip]

GNU C version 4.1.1 (Gentoo 4.1.1-r1) (i686-pc-linux-gnu)

        compiled by GNU C version 4.1.1 (Gentoo 4.1.1-r1)

```

Okay. You can see that my system compiler is built by itself. Now you have to believe me that I have never re-emerged gcc over and over again.

Your homework is now to change your gentoo compiler from any version to any version. What you will see is always this:

```

GNU C version X.Y.Z (Gentoo X.Y.Z-rP) (i686-pc-linux-gnu)

        compiled by GNU C version X.Y.Z (Gentoo X.Y.Z-rP)

```

Do you believe it now that Guenther is right????

Myths busted!

 *Quote:*   

>  I also don't see any sense for a compiler to build itself more than one time...

  Rebuilding a compiler with itself is a really good sanity check. If something as complex as a compiler can rebuild itself, then it must be a working one. If it can't, there must be a bug in the current or previous compiler.

----------

## Gergan Penkov

this does not show that the compiler was compiled three times - and even less that it did anything other than bootstrap itself in three stages instead of being built three times...

so I searched Google for "gcc bootstrap build":

one of the results:

 *Quote:*   

> 
> 
> Building a native compiler
> 
> For a native build issue the command `make bootstrap'. This will build the entire GCC system, which includes the following steps:
> ...

 

----------

## irondog

 *Gergan Penkov wrote:*   

> this does not show that the compiler was compiled three times,[...]

  This shows that the compiler at least compiled itself.

----------

## rhill

gcc (the C compiler) is built three times.  gentoo uses profiledbootstrap so you can identify the three stages as xgcc, gcc -fprofile-generate, and gcc -fprofile-use.  fortran, g++, objc, etc are only built once under the current bootstrap system.  this changes in 4.2 where everything will be built three times.

Guenther:  thanks!

----------

## ennservogt

Guenther you have written a great post! I have learned a lot by reading it. From symbol names to binutils.

"So, a heartfelt thank-you from Linz to Vienna  :Wink: "

But I have to ask you if your assumptions are still correct after what dirtyepic has written:

 *Quote:*   

> 
> 
> fortran, g++, objc, etc are only built once under the current bootstrap system. this changes in 4.2 where everything will be built three times. 
> 
> 

 

The changelog for GCC 4.2 from http://gcc.gnu.org/gcc-4.2/changes.html corresponds with this:

 *Quote:*   

> 
> 
> All the components of the compiler are now bootstrapped by default. This improves the resilience to bugs in the system compiler or binary compatibility problems, as well as providing better testing of GCC 4.2 itself. In addition, if you build the compiler from a combined tree, the assembler, linker, etc. will also be bootstrapped (i.e. built with themselves).
> 
> 

 

----------

## Guenther Brunthaler

 *ennservogt wrote:*   

> "So, a heartfelt thank-you from Linz to Vienna "
> 
> 

 My pleasure, fellow countryman!  :Smile: 

 *ennservogt wrote:*   

> But I have to ask you if your assumptions are still correct after what dirtyepic has written

 

Hmmm... good question!

As the whole GCC build procedure is so complicated, it's not easy to determine this just from looking at the makefiles.

I guess the easiest way would be to re-emerge GCC < 4.2 and just look at the build output.

But even in the worst case, if GCC is actually only built once, the result will still be a compiler containing the new code generator.

However, the compiler itself will then run slightly slower than necessary, because it has been compiled by an older version of the compiler (assuming each new compiler version generates faster code).

But the executables generated by that compiler will benefit from the new code generator in either case.

Also note that stage 3 of the GCC bootstrapping process is just a verification step that won't increase the compilation speed and should yield nearly exactly the same executable as the compiler generated in stage 2.

But without the 3-stage-bootstrapping, a stage-1 compiler will actually be used, which should generate the same executables as the stage-2 or stage-3 compilers, but will run itself slightly slower.

The missing verification step also introduces a very small chance for regressions. But then, if anyone has ever successfully bootstrapped the new compiler with a particular version of an older compiler, then there are obviously no regressions, and hence there is no need to run the last stage ever again for the same compiler versions involved.

This is because GCC is (as far as I know) a deterministic program that always generates the same output (except for embedded internal timestamps in the object files) when provided with the same input.

That means the chances that skipping the bootstrap can introduce regressions or lead to differently generated code are really very small.

The only problem might be that the new compiler will not benefit from its own code generator, because it has still been compiled by the previous compiler.

Which means, it might indeed slightly speed up new compilations if the new compiler is rebuilt again.

But.

How long will it take to re-emerge GCC? An hour?

And how much faster will it actually get because of the new code generator? By 0.1 percent?

I seriously doubt that even recompiling the entire system with a re-compiled new GCC can save enough cycles to outweigh the time required for recompiling the new GCC a second time.

Also remember that a stage-1 and a stage-3 compiler will have the same functionality, which means a stage-3 compiler will not generate different executables than a stage-1 compiler. It will only generate them in slightly less time.

But that's only the pragmatic view of the issue: Nonsense or not, I admit I would recompile GCC if I definitely knew (which I still don't) it actually does not do a 3-stage bootstrap.

Because I always want the fastest compiling compiler, even if it generates the same code.

At the very least, it's just for fun!  :Wink: 

----------

## DrunkenWarrior

Interesting.

Once I posted on the bsd forums about the 'emerge -e system && emerge -e system && emerge -e world && emerge -e world' camp of gentusers, and didn't get much of an answer. If I recall correctly, the handbook says to build the kernel, reboot, build the system (OS), then the userland. But I wonder if the order is optimised like this, or if it matters. Presumably recompiling on fbsd is analogous to gentoo.

Another question, does ccache interfere with any of this?

----------

## Guenther Brunthaler

 *DrunkenWarrior wrote:*   

> Interesting.
> 
> If I recall correctly the handbook says to build the kernel, reboot, build the system (OS), then the userland.

 

Kernel and userland are mostly independent, except for the linux-headers which were used to build glibc. (Special userland kernel helpers like truecrypt are another exception.)

But even glibc is normally not too dependent on a specific kernel version, unless you are using a very old kernel together with a very new glibc or vice versa.

That's because existing kernel syscalls rarely change if ever, and so the potential for regressions is very low.

Which usually means it doesn't matter in which order you build the kernel and glibc.

Also note that glibc does not directly use the kernel's header files. It uses the header files distributed in the current version of the linux-headers ebuild instead.

That's in order to avoid rebuilding glibc too often, as kernel versions tend to be updated far more often than glibc is.

glibc therefore uses a snapshot of a reasonably current (but not the most current) set of kernel header files for its purposes, de-coupling the package dependencies of the kernel ebuild from the glibc ebuild.

 *DrunkenWarrior wrote:*   

> Another question, does ccache interfere with any of this?

 

My observation is that it does not matter.

In any case, ccache is only dependent on the C compiler being used, not on what is compiled with it.

Which means that compiling the kernel or glibc is not different from compiling any other source text from the view of ccache.

However, it is important that the ccache cache is cleared whenever a new C compiler is installed.

That's because ccache's job is to return a compiled object from its cache whenever it receives a request to compile a set of source files it has already compiled before.

But if a new C compiler has been installed, the same set of source files might result in differently compiled object files, which might no longer be compatible with those in ccache's cache.

So the cache must be purged as soon as a new C compiler is installed.

However, I think Gentoo's gcc-config does this automatically when it is invoked to change the current default compiler, so there is usually no need to worry.

On other platforms such as bsd, it might be a good idea to run

```
# ccache -C

# ccache -z
```

after installing a new version of gcc, to manually flush ccache's cache and reset its statistics.

----------

## fizik

I wonder why your recipe doesn't mention whether you need to recompile and reinstall the kernel when updating GCC?

----------

## fizik

Actually, I successfully updated a new Gentoo box in the last few days with your instructions (manually), but on another old Gentoo installation it shows more than 450 packages in system, in contrast to the new one - 100 packages. So I understand that no more than 100 packages are needed to build the new toolchain. How can we optimize your way to account for this fact?

Also, I think that after recompiling the "system" packages it's better to finish by building the toolchain (emerge linux-headers glibc binutils and, if you are paranoid  :Smile:  - emerge gcc binutils) and then emerge the other "world" packages that are not in "system" (35 for the new Gentoo box in my case).

----------

## Guenther Brunthaler

Hi fizik,

 *fizik wrote:*   

> I wonder why you didn't mention exactly in your recipe if you need to recompile and update/reinstall kernel when updating GCC?

 

Because it's not always (never?) necessary.

The kernel normally interacts with userspace via int 0x80 syscalls (at least on the x86 platform), and this stays the same no matter which compiler is used.

Also, the kernel is strict C - there is no C++ code involved (at least I have never encountered any). The primary reason why recompiling is required is a C++ ABI change in the compiler. Which means pure, freestanding C programs like the kernel seldom if ever actually require recompilation.

Of course, it will never hurt to recompile the kernel also. For instance, it might benefit from better code generation capabilities of a newer compiler.

But strictly speaking, it is not necessary.

Regards,

Günther

----------

## Guenther Brunthaler

Hi fizik,

 *fizik wrote:*   

> but on another old Gentoo installation it shows more than 450 packages in system, in contrast to the new one - 100 packages. So I understand that no more than 100 packages are needed to build the new toolchain.

 

Well, the layout of packages and necessary build tools change over time.

For instance, the good old EGCS-2.95 required a hell of a lot less code than a new GCC-4.3 does, yet both are C compilers.

Development tools gain additional features over time which makes them more complex, and at some point it is common to split such tools into separate components to ease maintenance.

Just think of xorg-server - it has been a single monolithic package once, and now about the same content is shipped as multiple scores of interrelated packages.

The point is that the number of packages alone is not a useful indicator of how much work is actually done, because in many cases packages are more fine-grained today than they used to be a couple of years ago.

 *fizik wrote:*   

> How can we optimize your way to account for this fact?

 

I don't know a better way than running "emerge -ep system" yet, but if you should find one some day please let me know.

However, the basic purpose for my script is to recompile the entire system.

So there is little to gain by trimming down the "world" share of packages among the total number of packages, because all the packages will eventually be recompiled anyway.

Regards,

Günther

----------

## fizik

Yes, I think the kernel is all about speed! So it's better to compile it with a probably-better C compiler  :Smile: 

Also, world packages will be compiled with the fully updated compiler in my recipe.

As for fine-graining, I wonder why some x11 and gnome packages ended up in "system"? It took me much effort to work out the right order to update them from old versions.

I believe that only the prerequisite tools are needed to build GCC (as stated in the GCC build guide). I hope the Gentoo developers didn't include wild new packages among GCC's dependencies  :Smile:  Did you ever look at the LFS project? I still can't reproduce their way of building Linux, though the ideas are very exciting.

----------

## Guenther Brunthaler

Hi fizik,

 *fizik wrote:*   

> Did you ever look at the LFS project

 

Yes! Actually, I did look at it at a time when I had no idea Gentoo even existed.

I only knew SuSE and RedHat then, and I hated the fact that everything seemed to be so complicated there: In RedHat (version 5.1 at the time) there were a lot of cool wizards and GUIs for about everything, but only very few of those tools seemed to actually do what they were supposed to do. "Fail silently" seemed to be the general design principle.

And SuSE's YAST also drove me crazy with its templates and script skeletons and whatever; I never had the feeling of knowing what went on behind the scenes. I just felt locked out from most administrative decisions.

When I first learned about LFS, I was very excited - it looked like the first usable Linux "Distribution" to me.

And it was so easy to use - no f***ing wizards or unnecessary GUI helper tools (sometimes I feel tempted to rather call them "information obfuscation" tools); just nice, well-documented plaintext ASCII configuration files. Can there be anything simpler?

But as nice as LFS was, it also had its downsides.

The primary problem was: it was not really a Distribution.

It was just a bunch of scripts and a manual to compile everything myself.

While that's a cool thing to do in general, it's only cool the first time you install your packages.

But the problem with this approach is updates. Security updates specifically.

When using LFS, it's your own responsibility to regularly check for updates and security fixes for each and every package installed. Not to mention performing tests on all new package releases in order to detect regressions.

There is no one there (or at least there was no one there the last time I checked out LFS) to do that for you.

No one will try to figure out which new packages are "stable", and which ones will rather break things.

No automatic cross-platform-compatibility-checking either.

You are all on your own.

When I eventually found out about Gentoo, I was more than excited: It looked very close to LFS, but was using automated, already-tested build scripts. USE-Flags provided an easy means for turning on or off the most important optional package features. And most importantly, you still could easily intercept each build or installation step and manually modify it (or add patches) if required. For permanent modifications, portage overlays could be used.

But the most important advantage was: The Gentoo maintainers took care of all the updates and most of the testing.

I was in heaven.

OK, even Gentoo is not perfect. For instance, its runlevel system is nonstandard and also does not work correctly in certain situations. The strangely interrelated ménage à trois of "emerge -avuDN world", "emerge -a --depclean" and "revdep-rebuild -a" had also better be integrated into a single tool.

But then, it works most of the time, and Gentoo is still by far the best distro I have been able to find so far.

 *fizik wrote:*   

> I still can't reproduce their way of building Linux, though ideas are very exciting.

 

To be honest, I have not been able to even compile the basic toolchain myself yet. (Admittedly, I tried to do it myself without using any LFS scripts. Perhaps this was a bad idea.) And worse, I was not even capable of finding out which of the plethora of gcc patches to include or not include before even running the first line of configure! It was a total disaster.

And when I look into the Firefox or OpenOffice ebuild scripts, I am honest enough to admit to the authors how very thankful I am to not have been required to research all that stuff on my own!

Regards,

Günther

----------

## fizik

My last case in Gentoo was when one program required one version of a dependency and some other programs the other version. They were incompatible  :Smile:  And the devs' advice was: unmerge any programs that give a conflict  :Smile:  So there's still much more to do in Gentoo ... Though I like it more than Slackware, which I worked on before. And I hate RH, being an RHCT  :Smile: 

----------

## neonl

So, a noob question. Imagine I download a plain stage-3 install. I want to use the ~arch "compilation toolkit" and then rebuild the whole system to get a bleeding-edge toolkit and, after all that, have a stage1 effect (every bit of binary code being compiled locally with my CFLAGS).

The best way to do this would be

```
# for the toolkit:

sys-apps/coreutils ~x86

sys-apps/groff ~x86

sys-devel/binutils ~x86

sys-devel/gcc-config ~x86

sys-devel/gcc ~x86

sys-libs/glibc ~x86

sys-apps/busybox ~x86

# for portage:

sys-apps/portage ~x86

dev-lang/python ~x86
```

to package.keywords, then

```
emerge --sync
```

After this, re-emerge the toolkit and system by doing

```
emerge -e system && emerge -e system
```

Is this right?

Regards  :Smile: 

----------

## node_one

 *neonl wrote:*   

> So, a noob question. Imagine I download a plain stage-3 install. I want to use the ~arch "compilation toolkit" and then rebuild the whole system to get a bleeding-edge toolkit and, after all that, have a stage1 effect (every bit of binary code being compiled locally with my CFLAGS).

 

I have tried this.  I am not sure it does exactly what you want.

----------

