Native GCC 4.3.3 for Haiku - Tales of updating the GCC4 port

Blog post by mmlr on Sun, 2009-02-01 02:38

        <p>
            Out of no real particular motivation I wanted to build a native GCC4 for Haiku. We've had a GCC 4.1.2 cross-compiler for a pretty long time now, but since there were some issues with GCC4 built Haiku installations and especially since there never was a native toolchain for GCC4 based Haiku, it has always been a second class citizen. You could experiment around with it and we've had hybrid builds able to use software for both GCC2 and GCC4 Haiku on the same install, but since you had to use the cross-compiler to build GCC4 Haiku apps it's always been a bit less convenient that just building for GCC2 Haiku. But there's a lot of software around that simply can't be built using GCC2 anymore. The reason for that being the use of coding conventions or simply features that weren't available in GCC2. So a native GCC4, meaning a GCC4 running inside Haiku and building GCC4 Haiku apps, is a really important thing to have for porting and building many future applications.
        </p>
        <p>
            Since I briefly looked into updating the GCC4 cross-compiler in summer 2008 already (without getting very far because of lack of time), I already knew pretty much what to get and where to get it. I also already built a diff between the vendor and trunk version of the 4.1.2 cross-compiler in the buildtools repository. So I had a rough idea of what I needed to touch to get things going.
        </p>
        <p>
            I was about to build a native GCC4. This means that I was going to build it on Haiku itself. Of course you could actually cross-compile a native compiler in a two step process on another platform as well. But since I use Haiku as my main and only operating system here, this was no real option. So this meant that a GCC2 Haiku with a GCC2 compiler would be the host for all the fun. Considering that GCC is a <b>huge</b> project, consisting out of many subprojects and thousands of source files, this would be a pretty tough stress test for Haiku. I wasn't even sure if the GCC 2.95.3 we are using was up to the task, but it turned out that this didn't pose any real problem.
        </p>
        <p>
            One of the less convenient things about the GCC 4.3 series is that they require two external libraries: GMP and MPFR. Both are multiple precision math libraries. I won't pretend to have any idea on what they do or what they're used for, so I just accepted the fact that those were two dependencies to get first. Luckily both libraries are pretty much self contained and easy to port. The only thing I needed to do was to update the config.guess and config.sub scripts to a version that knew about Haiku as a target platform. The rest was the usual "configure; make; make check; make install". I configured them with just "--prefix=/boot/common" and they built and installed fine.
        </p>
        <p>
            Encouraged by that smooth process I went on to GCC. The first questions come up right away when configuring. GCC has lots of configurable parameters and I wasn't really sure what to put there. Luckily I could just peek at how the cross-compiler scripts in the repository configured GCC 4.1.2. So I came up with "CFLAGS="-O2" CXXFLAGS="-O2" configure --prefix=/boot/develop/tools/gcc-4.3.2-haiku-090121 --target=i586-pc-haiku --disable-nls --disable-shared --enable-languages=c,c++" as my configure line. Yes, that's a configure invocation directly in the GCC source dir. Not really having any experience with this type of build system and of course only reading the (very good) configuration instructions after the fact, lead me to do the configure directly in the root source dir. The funny thing is that this little error later on would uncover a bug in libiberty that is now reported in the GCC bug tracker with a patch attached.
        </p>
        <p>
            Since we are on platform "i586-pc-haiku" and are compiling a GCC for target "i586-pc-haiku" this means we're doing a native build. This also means that bootstrapping will be enabled resulting in a three step process: First a pseudo GCC is built that is a subset of a complete GCC just capable enough to actually build the full GCC. Through that process you don't actually compile GCC4 directly with your host compiler. This means that using our GCC 2.95.3 as host and more exotic compilers on other platforms just have to be able to build a working subset of GCC. This lessens the dependency on the host compiler and exposure to any potential issues in them.
        </p>
        <p>
            The second so called "stage" is then to build a full GCC4 using the compiler built in stage 1. When creating a cross-compiler the process would end here. But since we're bootstrapping, there is a stage 3. Stage 3 essentially rebuilds the whole GCC4 again, this time using the compiler built in stage 2. This is to verify that the compiler that was built is actually able to build working programs. And this also explains why the process can't have a stage 3 for cross-compilers - if the compiler created in stage 2 is a cross-compiler creating binaries for other platforms you obviously can't use it to build a stage 3 compiler running on the host again. When the stage 3 compiler is built, the build system will verify that both compilers are valid. It does that by simply comparing the object files created by both stages. As both stage 2 and stage 3 compiler are built essentially by the same code, the resulting object files should be identical. If this verification passes, optional parts of the GCC4 package like the libstdc++ are built and the process is done - that's our goal.
        </p>
        <p>
            I started out with a plain unzipped tree of GCC 4.3.2 (yep .2) sources. So before going anywhere I would need to actually port that GCC of course. Using the diff I had produced of GCC 4.1.2 this was pretty straight forward though. Some things have moved, but you could easily track them down.
        </p>
        <p>
            Now I could start the whole process and it would actually get pretty far already. Some minor tweaks here and there and it would build the stage 1 compiler. This was the easy part, because now things started to go strange. Since the GCC built in stage 1 uses the config that you set up, you will notice misconfiguration when building the stage 2 compiler with it. And it seemed pretty misconfigured. First, things were found that weren't actually there and I started patching and changing ifdefs in some of the files. This was about when I really needed to go to bed and leaving it for that day. Thinking about it more and more while in bed and in the morning I came to the conclusion that this can't be it, that this would never create a proper patch and that there must be a root cause why things were failing this way. So instead of any further patching I started looking at the config.log to find out why configure would detect stuff that wasn't there. Now there are basically two things that are commonly done: 1. checking if a certain header/function/variable/define is present and 2. whether or not a certain symbol (function entry point in a library) is available. The checks that were looking for headers and prototypes did work. It does the checks by spitting out small source files and trying to compile them. So if you are looking for a header that isn't there compilation of the test program will just fail. The checks for available symbols however did not fail, the test programs simply compiled and linked.
        </p>
        <p>
            That was when it became pretty obvious. Knowing from prior experinece that there was the option to allow or disallow missing symbols I figured that the configuration I set up would simply allow missing symbols. This meant that the test programs would compile and link, but wouldn't be able to run, because the stuff they refer to simply isn't in any library the system provides. This does however go unnoticed, because configure doesn't try to run the binaries, it simply checks if they compile and link. I then took a look at the specs and the theory turned out to be true, there was no -no-undefined present. BeOS did partially allow undefined symbols, that is they allowed undefined symbols in libraries but not in executables. This was done because they were linking their libraries with symbols not available at compile time. They were later runtime patched to be available. Some might remember the libroot.so.patch which patched the memset/memcpy to an optimized version depending on the CPU found. So -no-undefined was conditionally added for executables in the BeOS specs. Since we don't really want undefined symbols for Haiku, I've just added -no-undefined to the specs unconditionally. Should this later pose a problem, this can easily be changed in the sources or overridden in a custom specs file.
        </p>
        <p>
            So with the updated builtin-specs I restarted the process, rebuilding much of the pseudo-gcc as well as starting the stage 2 compilation again. Now the configure looked much better, it only found what was there and I could remove all the hacks I've put in the different files. But of course this wasn't the only problem. Now it started with undefined stuff like "NAME_MAX" and "PATH_MAX". Anyone more or less familiar with the matter should now be able to tell: your limits.h is missing or messed up. And it was missing of course. I checked the diff and saw that there was an overridden variable for the test to find limits.h. For some reason I don't clearly remember I removed that line and replaced it with NATIVE_SYSTEM_HEADER_DIR instead. I think it was because the GCC build system tried to find the system headers in the default "/usr/include", which is of course not correct for Haiku. Anyway I got it working by moving some stuff around. Since I did all of that mostly during the timeframe I should have slept instead you have to bear with the fact that I wasn't always fully awake and don't remember all the details. After all they are pretty boring anyway.
        </p>
        <p>
            So after the basic config was corrected I actually got the stage 2 compiler built. And since the stage 3 is essentially the same as stage 2, this one ran through equally well. This meant I finally had a native GCC 4.3.2 for Haiku. We're done, start the party! Ehm not quite of course. Having GCC 4.3.2 was of course a big step, but the real goal was to have a GCC4 Haiku running the GCC4 compiler natively. And I was still on GCC2 Haiku and since this wasn't even a hybrid build, the new GCC4 couldn't even compile anything runable that used C++.
        </p>
        <p>
            Instead of party the goal now changed to get Haiku compiled with that new GCC. Since, as mentioned, we're on a non-hybrid GCC2 Haiku, the compiler can't be just dropped in as a replacement for GCC2. You still have to use GCC2 for compiling the host tools required by Haiku. This means setting up the Haiku build using GCC2 as the host compiler and just using the native GCC4 as if it was a cross-compiler. Since the Haiku build system supports such a setup, this was not very hard to do. But of course this is the latest point you could possibly notice that you forgot something. I didn't forget it, but mostly didn't care so far. Having GCC is by far not everything you need to build binaries. The only thing GCC does is it produces assembler output for the target architecture. It does that through the GCC frontend and it's language backends and a lot of optimization tricks and maybe a bit of black magic, but in the end it only produces assembler output. The rest of the process, meaning creating object files from that, archiving them into a static library or linking them into a binary is done by the binutils. The binutils are run as a separate project and provide the ld, as, ar, nm, objdump, ranlib and strip. It's the binutils that actually understand the ELF format for example. The good thing about the binutils being separate is that you can update them individually. Since I knew that we had more or less up to date binutils already (2.17) I was just lazy and instead of building the current version (2.19) I just copied the old ones into place so they could be used with the GCC4 I had set up.
        </p>
        <p>
            So off you go with the Haiku build. But I didn't get very far. Of course Haiku was buildable with GCC 4.1.2, but we are updating two minor versions here. Many things have changed and especially many things have become more strict than in previous versions of GCC. So this meant that I wouldn't only have to update GCC and eventually the binutils, but that I would have to fix all the issues to make the Haiku tree compilable with the GCC 4.3 series as well. Luckily I've found <a href="http://gcc.gnu.org/gcc-4.3/porting_to.html">porting guide</a> explaining most of the issues I ran into and offering solutions. You can see the process in Haiku <a href="http://lists.berlios.de/pipermail/haiku-commits/2009-January/014024.html">r28981</a> through <a href="http://lists.berlios.de/pipermail/haiku-commits/2009-January/014038.html">r28995</a>.
        </p>
        <p>
            Success, I now got a bootable GCC 4.3.2 built Haiku. I've set it up to produce a hybrid build and then I was shutting down to go to the "other side" booting into the GCC4 Haiku. Just to add some emphasis to this, I did that whole process up to now in two sessions. I used Haiku to build a monster project, did update Haiku sources and checked them in while always reading webmail in Firefox, doing research to find solutions to the upcoming problems, chatting on #haiku and listening to music. All under Haiku, a still pre-alpha operating system. We're talking heavy workloads and uptimes of up to 20 hours here after all.
        </p>
        <p>
            Surprisingly booting into that GCC4 installation worked out of the box. There have always been some stability issues with GCC4 built Haiku installations, so I didn't expect too much stability. But it turned out that besides three unexpected sudden reboots, things would go smoothly. Again we're talking uptimes of up to 20 hours. In principle I wouldn't have had to rebuild GCC4 again, because this thing is pure C code. So even though it was compiled on and for a GCC2 host, it wouldn't matter at all. Call me paranoid, but I still wanted to rebuild GCC4 on GCC4 just to make extra sure that we really have a fully native toolchain in the end. So I created a fresh unzipped tree and applied the patch I made for my final GCC 4.3.2 sources. Expecting to just start the process again and ending up with a proper fully native GCC4. After all this should pretty much exactly the situation of the stage 2 in the previous build, so why should it fail?
        </p>
        <p>
            But of course it failed miserably. It failed even before compiling anything, because it claimed that the C compiler wasn't able to build binaries. Huh? OK, going through config.log it seemed as if the compiler toolchain wouldn't work. The C compiler backend "cc1" wasn't found. Tried to reproduce that testcase, I wasn't able to. It worked perfectly fine, with exactly that file. So, running a pre-alpha OS I started suspecting a problem with Haiku. Of course it was strange that this wouldn't turn up when compiling Haiku with exactly that compiler. So I was digging into the sources and added debug output here and there. Chatting with Rene Gollent we took appart some of the stuff that was going on. GCC claimed that it tried to execlv (I think) "cc1" but it didn't work. So we were looking through the whole path such a call takes. We didn't spot anything obvious. After a bit of more testing it turned out that exactly this test case would only fail when run inside the root source dir of GCC. Going through it systematically I copied over the files from that dir into a testbed and checked if it would still compile. It did. So I went on creating subdirectories analogous to the ones in the source tree. And there it was: as soon as there was a "gcc" directory, things would fail. Easy, the binary is called "gcc" and there is a subdir called "gcc", it must be that Haiku tries something executing "gcc" and thinks that directory is executable, which it is of course because the executable bit on directories means "traverse ok" and not "execute ok". But looking at the sources it was exactly this case that was handled! So the bug simply didn't seem to be in Haiku.
        </p>
        <p>
            Then you have to work from the other side. I looked up what was actually called on the GCC side. And it really was "cc1" and exec. And it really couldn't find it, because it obviously wasn't in the source dir. After a bit of searching through the sources and trying to make sense of the things I saw I came to the conclusion that the prefix that was computed must simply be wrong. And tracking the prefix creation down I arrived at a pretty familiar block of code. The code looked almost exactly like the code in Haikus exec implementation. That's because it resolved the binary path using the PATH environment variable. But the difference was that exactly the directory case where X_OK is set but it's not a regular file at all wasn't handled. So this was a bug in libiberty which caused it to fail to resolve the path to the own gcc binary, making it unable to resolve the relative path to the "cc1" binary as well. Funnily I was only able to find that by doing the configure from the directory where you shouldn't do it at all. So I patched that, registered at the GCC bug tracker and submitted a bug with a diff. You can see it <a href="http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38966">here</a>, turns out another reported bug was reported that exhibited the same issue, but they didn't track it down to that problem yet.
        </p>
        <p>
            Having this issue resolved, I was able to rebuild the whole GCC4 and build a fully native one. By that time the GCC folks released GCC 4.3.3, so I made a new diff, dumped my GCC 4.3.2 tree and started over again. Since Haiku hasn't really been optimized yet, we're talking build times of 3-4 hours here. Anyway I got it working straight from that patch again. This time I also wanted to get the current binutils, so I made a diff of the 2.17 binutils we had in the buildtools trunk and applied that (by hand, I like doing that because I can then more clearly see what's going on and if something nearby may need attention as well or if something can actually be left out). Binutils were friendly and built pretty much right away. So that really was a success. A native GCC 4.3.3 plus current binutils 2.19. I packaged it up, uploaded it and added it as an optional package ot Haiku.
        </p>
        <p>
            The rest of the story was updating binutils and GCC in the Haiku buildtools repository through my totally crappy internet connection, a hundred attempts to fix configuration issues for the cross-compiler setup and finding issues that other platforms exhibited when building those tools. You can find that in the commits and in the bug tracker, so I'll refrain from prolonging this story.
        </p>
        <p>
            Now we have a native, up to date GCC and the current binutils for Haiku. This should open up the door for a lot of exciting stuff. Ports of software that weren't feasible before because of using GCC2 and optimizations in Haiku that weren't available back then. Have fun!
        </p>