Package Management: Building Things (Part 2)

Blog post by bonefish on Sat, 2013-05-25 19:29

It's been quite a while since the previous blog post. I've been waiting for a nice occasion, but the more interesting milestones are still a bit away. While nothing flashy can be presented, a lot of work has been done anyway.

At the time of the previous post we had just managed to get haikuporter, our high-level package building tool, ready to hierarchically build packages. Since then it has seen a lot of updates:

  • Some were of a merely aesthetic nature, like refactoring the code base to be more manageable.

  • Some changes improved the usability, like improved output and new options to better understand what is going on and why. haikuporter now also makes working on a port easier, since it imports the original sources of the port into a new git repository, commits our Haiku-specific patches, and later allows easy extraction of changes. The manual process was rather tedious before.

  • Some changes improved the correctness -- more precisely: strictness -- of how build dependencies are resolved.

  • Other changes added missing features. Most notably the possibility to build multiple packages per port, e.g. a development, a documentation, a debug info package etc.

A good chunk of our time went into creating build recipes for the various ported software needed for Haiku. While that may sound straightforward, particularly given that for most ports there were already .bep files (the old recipe format) to start with, in many cases it wasn't. Since our packages need to be flexible regarding their installation location, absolute paths must not be built in or, where they are unavoidable, must use the package links indirection.

We also reorganized the directory layout of the installation locations. All kinds of documentation (often man and info pages) are required to go into the "documentation" directory. The contents of the "etc" and "share" directories go to "data", "documentation", or "settings" as appropriate. "include" becomes "develop/headers" and development libraries go to "develop/lib". The latter two additionally required changes to the built-in paths in gcc, cmake, python, and various programs and scripts that search for headers or libraries.
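The relocation rules can be pictured as a small prefix table. The following is only a sketch: the function name, the table, and the "man" and "info" subdirectory names below "documentation" are assumptions made for illustration and are not taken from haikuporter. "etc" and "share" in general are left out on purpose, because their contents have to be sorted into "data", "documentation", or "settings" case by case.

```python
from pathlib import PurePosixPath

# Hypothetical prefix table; only "include" -> "develop/headers" is stated
# literally in the post. The man/info subdirectory names are assumptions.
RELOCATIONS = [
    ("include", "develop/headers"),
    ("share/man", "documentation/man"),
    ("share/info", "documentation/info"),
]


def relocate(path):
    """Map a conventional install path to the reorganized layout."""
    p = PurePosixPath(path)
    for old_prefix, new_prefix in RELOCATIONS:
        try:
            rest = p.relative_to(old_prefix)
        except ValueError:
            continue  # path is not below this prefix, try the next rule
        return str(PurePosixPath(new_prefix) / rest)
    # No mechanical rule: e.g. "etc" contents need a manual decision.
    raise ValueError(f"no relocation rule for {path!r}")


print(relocate("include/zlib.h"))
print(relocate("share/man/man1/grep.1"))
```

Paths without a rule raise an error rather than being guessed, which mirrors the "as appropriate" part of the description.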

A few times we noticed a bit late that there was a problem or that something had to be solved differently, requiring us to go back a few packages and build them again. So we had a lot of "fun", but in the end we managed to build (almost) all packages in the respective versions used in the Haiku master. As it turned out later, we missed a few that the Haiku build system required. Nonetheless, that finally allowed us to merge the master into the package management trunk, so we're no longer two years behind the current development.

A few more things have happened since. I just finished updating the format of our package files. The update had been necessary to add a few required features. We wanted the boot loader to be packaged as well, which is now the case. It lives in its own package (haiku_loader.hpkg). The content is uncompressed, so that our stage one boot loader can still load it, but otherwise it works like any other package, so we don't need any special handling for it. Some additional meta information can now be stored in a package as well, like what global settings files are included in the package, what user settings files the software creates, what Unix users and groups the package needs to work (e.g. sshd needs a dedicated user), and what scripts have to be executed after the package has been activated (e.g. for ssh the host keys need to be created).

I used the format breakage to also optimize the format a bit. Formerly file and attribute data were compressed individually. The reason for that was for packagefs to be able to quickly access the data of a certain file. E.g. a (gz, bzip2, xz, ...) compressed TAR archive wouldn't work at all, since the whole archive would have to be read and decompressed up to the point where the file is stored. The new HPKG format concatenates all data and compresses the result. Since it does that in fixed chunks of 64 KiB and there is a table of contents which specifies exactly where the data for each file are stored, it is still possible to quickly access specific file data. Due to a new cache in packagefs which caches uncompressed chunks, this should even improve the performance.
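The idea of chunked compression with a table of contents can be sketched in a few lines of Python. This is an illustration of the principle only, not the HPKG on-disk format: the function names, the in-memory table of contents, and the use of zlib are assumptions made for the example. Only the 64 KiB chunk size comes from the text above.

```python
import zlib

CHUNK_SIZE = 64 * 1024  # chunk size named in the post


def pack(files):
    """Concatenate all file data, then compress it in fixed-size chunks.

    files: dict mapping name -> bytes.
    Returns (toc, chunks), where toc maps name -> (offset, size) within the
    uncompressed concatenation.
    """
    toc = {}
    heap = bytearray()
    for name, data in files.items():
        toc[name] = (len(heap), len(data))
        heap += data
    chunks = [zlib.compress(bytes(heap[i:i + CHUNK_SIZE]))
              for i in range(0, len(heap), CHUNK_SIZE)]
    return toc, chunks


def read_file(toc, chunks, name, cache=None):
    """Read one file by decompressing only the chunks it overlaps."""
    offset, size = toc[name]
    if size == 0:
        return b""
    if cache is None:
        cache = {}
    first = offset // CHUNK_SIZE
    last = (offset + size - 1) // CHUNK_SIZE
    data = bytearray()
    for index in range(first, last + 1):
        if index not in cache:  # reuse chunks that were already decompressed
            cache[index] = zlib.decompress(chunks[index])
        data += cache[index]
    start = offset - first * CHUNK_SIZE
    return bytes(data[start:start + size])


files = {"a": b"x" * 100000, "b": b"hello", "c": b"y" * 70000}
toc, chunks = pack(files)
cache = {}
print(read_file(toc, chunks, "b", cache), sorted(cache))
```

Reading "b" touches a single chunk, and the cache dictionary plays the role of the uncompressed-chunk cache: a later read of a neighboring file in the same chunk needs no further decompression.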

The new format definitely helps with compression ratios. In my tests those were significantly better than zip (i.e. closer to tar.gz). For downloads this is certainly desirable, even if we're nowhere close to what good compression tools (like xz) can achieve. A conceivable option, should we consider optimizing download sizes further, would be to use no compression when creating the package and compress it with xz for transport.
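Why concatenating before compressing helps can be seen with a toy comparison. The sample data below is made up (many small, similar text files), the helper names are hypothetical, and zlib stands in for whatever the package format uses, so the numbers say nothing about real packages. The point is only that per-file compression cannot exploit redundancy across files.

```python
import zlib

CHUNK_SIZE = 64 * 1024


def size_individual(blobs):
    """Total size when every blob is compressed on its own (old approach)."""
    return sum(len(zlib.compress(blob)) for blob in blobs)


def size_chunked(blobs):
    """Total size when blobs are concatenated and compressed in chunks."""
    heap = b"".join(blobs)
    return sum(len(zlib.compress(heap[i:i + CHUNK_SIZE]))
               for i in range(0, len(heap), CHUNK_SIZE))


# Made-up sample: 200 small files that resemble each other.
blobs = [
    f"# settings for module {i}\nenabled = true\nname = module{i}\n".encode()
    for i in range(200)
]
print("individual:", size_individual(blobs))
print("chunked:   ", size_chunked(blobs))
```

The gain shrinks for large or already compressed files, where each file provides enough context on its own.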

While I was playing with the package format, Oliver started working on something we hadn't quite anticipated we would need to deal with: cross-compiling packages. Years ago Haiku's source tree contained the sources for all the software required for a basic Haiku installation. That was quite nice, since the build was self-contained, and targeting a new architecture or changing the (low level) ABI could be done rather comfortably. Those are very rare tasks, though, while a common task like building Haiku would take a lot longer. Moreover, keeping the included third-party software up-to-date was complicated, since we had to maintain a Jam-based build system for it, which had to be kept in sync with the software's native build system (usually based on the GNU autotools and make). So it was decided to externalize the third-party software, i.e. remove the sources from the Haiku source tree and instead provide pre-built packages that Haiku's build system would download and include in the created Haiku image.

Since some of those packages are required on Haiku to build software (e.g. the compiler) -- including themselves -- or to run Haiku at all, this creates a chicken-and-egg problem: How do we build the packages in the first place, if we need them for building? The solution is cross-compilation, i.e. building the packages for the target Haiku system on some other system (possibly not even Haiku).

A complete bootstrap build for Haiku will work like this:

  • Configure the Haiku build as usual, including building a cross compiler.

  • Build "haiku_cross_devel.hpkg", a package that contains the Haiku headers, libroot, and the glue code, i.e. everything needed to build software for that target Haiku.

  • Check out the haikuports.cross repository. It contains build recipes for all the external software that we need to bootstrap our Haiku. Cross-build that software using haikuporter together with the haiku_cross_devel package.

  • Build a minimal Haiku, including the cross-built packages.

  • Boot this minimal Haiku and build all packages.

  • Now that all packages are available, build a complete Haiku as usual.

Note that the packages built using the haikuports.cross repository are not the same as the ones for the final Haiku. They are specially patched and built to have only a minimal set of dependencies. E.g. the final grep package will have internationalization support, but we don't need that for the bootstrap grep package.
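The chicken-and-egg problem and the way cross-built packages break it can be modeled as a dependency graph. The sketch below is a strong simplification: the dependency table is invented for illustration, the function name is hypothetical, and it does not reflect how haikuporter resolves dependencies. It uses graphlib from the Python standard library (Python 3.9 or newer).

```python
from graphlib import CycleError, TopologicalSorter

# Invented build dependencies: package -> packages needed to build it.
DEPS = {
    "gcc": {"binutils"},
    "binutils": {"gcc"},        # the toolchain needs itself to be built
    "gettext": {"gcc"},
    "grep": {"gcc", "gettext"},
}


def build_order(deps, prebuilt=()):
    """Return a native build order, treating `prebuilt` as already available."""
    prebuilt = set(prebuilt)
    remaining = {pkg: set(needs) - prebuilt
                 for pkg, needs in deps.items() if pkg not in prebuilt}
    return list(TopologicalSorter(remaining).static_order())


try:
    build_order(DEPS)
except CycleError:
    print("no native build order: the toolchain depends on itself")

# Cross-building the toolchain on another system breaks the cycle.
print(build_order(DEPS, prebuilt={"gcc", "binutils"}))
```

In this model the cross-built packages simply count as available. In the procedure above they are additionally rebuilt natively once the minimal Haiku boots, which the sketch does not show.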

Oliver has already prepared cross-building patches and recipes for binutils, gcc 2, sed, grep, and gawk. Several more packages are still to be done. Currently Haiku needs to be used as a host platform for cross-building packages (due to packagefs, as well as package and dependency resolution functionality used by haikuporter). It would be nice to eventually support other host platforms as well.

The reason the whole cross-compilation topic is becoming relevant for us now is that we only have x86 gcc 2 packages at the moment. Since we want to build x86 gcc 4 and x86-64 packages some time soon, we are facing the chicken-and-egg problem -- no packages, hence no system to build the packages on. We could work around it by repackaging the existing optional package zip files as HPKGs, but that wouldn't be particularly well-invested time. Moreover we intend to build the hybrid part of Haiku gcc 2/4 and 4/2 hybrid builds in a similar manner (cross-compilation on the same platform).

So, the cross-building topic is going to keep us busy for a bit. The other upcoming task is updating and rebuilding all existing packages for gcc 2, so they use the new package file format and, where necessary, the new features that come with it. This isn't that urgent, since the old format can still be read by packagefs, but it needs to be done eventually.

On a final note, my contract has ended, actually already about two weeks ago. A new contract has been agreed upon, though, so I will continue development full-steam in June. Oliver still has a few hours left on his contract and he has also agreed to renew it afterward. Matt will post an update with the details soon.

Comments

Re: Package Management: Building Things (Part 2)

Brilliant! Most of that technical mumbo-jumbo flies a few feet over my head of course... :)
The massive commits to the PM tree convince me that you two have been working hard, though. So the most important paragraph is the last one.
Just as a rough estimate: how far are we percentage-wise to a beta worthy state of the PM?

Thanks for your awesome work!
Humdinger

Re: Package Management: Building Things (Part 2)

humdinger wrote: "Just as a rough estimate: how far are we percentage-wise to a beta worthy state of the PM?"

Here come the easy questions. :-) This depends on what you consider a "beta worthy state". E.g. does that include a GUI package manager with all bells and whistles (screen shots, user ratings,...)? We haven't even thought about such a beast. Given that several people have expressed their desire (or even started) to work on such a thing, we don't intend to work on it any time soon. Our big goal is to get the package management branch merged back to the master. For that everything should at least work as well as the current master, ideally much better. What has to be done to reach that goal is at least the following:

  • Complete the cross-building support.
  • Get the hybrid builds working again.
  • haikuporter: Implement packaging policy checking.
  • Rebuild all packages for all architectures.
  • (Mostly) complete the package management daemon (dependency resolution, post-install work) and the (CLI) package manager (distribution upgrades).
  • Implement tools/infrastructure for repository creation/management.
  • Change the build system to work with repositories.
  • boot loader: Add safe mode/boot old version support.
  • Adjust various applications (e.g. Expander) to work smoothly with a read-only system.

Now you could attach a random x-days/weeks sticker to each item to get some kind of estimate. Given that several items (particularly the cross-building support) are highly dependent on what problems we'll run into, your estimate will be as good as mine.

To at least somewhat answer your question: I think we're significantly past 50 % in total towards the goal mentioned above. More importantly, however, we can now say with great confidence that the overall concept is pretty solid and it is extremely unlikely that we'll run into an insurmountable problem that will send us back to the drawing board.

Re: Package Management: Building Things (Part 2)

Very nice progress report! Just a quick question: Will the haikuports.cross repository be automatically usable for any Haiku architecture to bootstrap? Thanks for all your hard work. It's just so motivating to see all the progress for Haiku!

Re: Package Management: Building Things (Part 2)

The haikuports.cross repository is indeed intended to work for all architectures. It cannot be ruled out that some additional tinkering is necessary for certain architectures and ports, but I expect many (most?) tools to build and work for all architectures once they do for one architecture. At least I cannot imagine a reason why e.g. grep wouldn't work for ARM once it does for x86.

Re: Package Management: Building Things (Part 2)

It would be interesting to know whether the package manager and haikuporter allow an easy switch to LLVM clang.
Looking at the latest performance tests, it's clear that clang is now on par with gcc with respect to performance, and it also has complete C++11 support, so it seems obvious that Haiku will switch to clang at some point.

Re: Package Management: Building Things (Part 2)

Package management doesn't really make it easier or more complicated to switch to clang. Most importantly the Haiku build system must be adjusted to work with clang. Other than that we do have a few ties (a build dependency) to gcc in the build recipes for ported software. Implementing an abstraction (e.g. using a variable that resolves to gcc or clang depending on a parameter/option) shouldn't be a big deal. Getting all ports' build systems to automatically detect or force them to use clang might be a bit more work. OTOH it may be as simple as adding a CC=clang.
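The "variable that resolves to gcc or clang" idea could look roughly like the following. This is a hypothetical helper, not part of haikuporter or the recipe format: it only prepares the conventional CC and CXX environment variables that autotools-style build systems commonly honor.

```python
import os

# Hypothetical toolchain table; the names are the usual compiler commands.
COMPILERS = {
    "gcc": ("gcc", "g++"),
    "clang": ("clang", "clang++"),
}


def compiler_env(toolchain="gcc", base_env=None):
    """Return a copy of the environment with CC/CXX set for `toolchain`."""
    if toolchain not in COMPILERS:
        raise ValueError(f"unknown toolchain {toolchain!r}")
    env = dict(os.environ if base_env is None else base_env)
    env["CC"], env["CXX"] = COMPILERS[toolchain]
    return env


env = compiler_env("clang", base_env={})
print(env["CC"], env["CXX"])
```

A build step could then pass the result along, e.g. subprocess.run(["./configure"], env=compiler_env("clang")). Ports whose build systems ignore CC would still need individual attention.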

Re: Package Management: Building Things (Part 2)

Wow, great news about renewed contracts. You guys are doing amazing work!
[in MK voice] *FINISH IT!*

Re: Package Management: Building Things (Part 2)

Wow, you guys have been busy this past month! I'm glad to hear that both of your guys' contracts have been renewed, I look forward to seeing the result of your guys' hard work!

Re: Package Management: Building Things (Part 2)

It is absolutely amazing how much work you have put into this effort! I am glad that you have had your contracts extended through June! Perhaps we could be 75% done by that time? Software development is like creating order. Creating order out of chaos is kind of like making a needle and then throwing it in a lake and then trying to find it again. Or at least that is how I imagine it. lol

A bit of advice to you though. Don't get so deeply involved in this project that you begin to burn out. Take time for yourself to relax and think of other things than programming and resolving dependencies. Sometimes we (including myself) get so excited about a contract or piece of work that sometimes we may take the developers for granted. Work that is not fun is rarely good work.

I am already looking forward to future updates regarding this effort. Work hard, but play harder!

ddavid123

Re: Package Management: Building Things (Part 2)

So does this mean you have to have the minimal Haiku system running before building the final system, or could the final system be cross-compiled from a different system with the minimal development package?

It doesn't sound like it would be good to make the build depend on a running Haiku system. After all, builds are currently done from FreeBSD, aren't they? It might also complicate cross-arch ports.

Anyway, I have probably misunderstood, but maybe you can set me straight!

Re: Package Management: Building Things (Part 2)

The minimal Haiku build for cross-compiling is only needed the first time one of our contributors creates the necessary packages for an architecture.

As things progress, someone will need to apply those steps in order to create the first round of necessary packages for x86_32-gcc4, x86_64, ppc, the various ARM platforms, etc. (I'm guessing this could even be done by anyone willing and patient enough to do it.)

For example, Ingo and Oliver needed to do those steps (or something very similar) to create the first round of x86_32-gcc2 packages. Once enough packages were created, they (and everyone else) could simply build x86_32-gcc2 with the normal build steps (configure ; jam -q ; boot it).