Enhancing the scheduler

Blog post by paweł_dziepak on Sat, 2013-09-07 20:17

Soon I am going to work as a full-time Haiku developer on enhancing the scheduler. The goal is to improve performance of the whole system and finally deal with some long standing problems. To achieve this CPU affinity will be introduced what would make cache utilization better and I will implement scheduler strategies based on dynamic priorities what, hopefully, would once and for all deal with priority inversion. In addition to that, I want to make scheduler more power-aware. Haiku currently lacks low-level support for some of the more advanced power related features of CPUs but having scheduler ready for would save us from redesigning it later. Also, there are still ways to conserve energy without using the most recent technologies.

My first goal is to complete what have been stared by implementing scheduler_affine. Having scheduler that is able to improve cache utilization by trying not to change the CPU thread is executing on may improve system performance. Since the threads are going to be assigned to a certain logical processors appropriate load balancing will have to be introduced. Also, scheduler will need to be aware of the drawbacks of simultaneous multithreading what would allow it to assign threads to logical processors more wisely. That would be even more important because the threads would spend more time on one logical processor.

In order to achieve all that the scheduler has to know the CPU topology and that’s what I am going to begin with. The kernel needs to be able to obtain information where each of the logical processors belongs in the three level hierarchy: socket, core, SMT. Additionally, there is also an important matter of cache topology. In multi-core systems some cache levels may be shared between the cores, knowing which logical processors share at least one cache level may decrease the performance penalty from migrating a thread to another logical processor.

Currently Haiku suffers from priority inversion, an issue that makes it possible for an application to freeze almost the whole system for an extended period of time. While I am not a big fan of priority inheritance, a solution that has been tried but was not committed, I do believe that much better way to solve this problem is to introduce dynamic priorities and to punish threads that despite having high priority are using CPU for a long time. That would mean the basics of our priority based scheduling remains virtually unchanged and the dynamic priorities affect only misbehaving threads in order to reduce CPU starvation.

Finally, there are power saving concerns. While it may seem that performance and power are conflicting goals the experience of people dealing with such things show that in case of scheduler it is not necessarily true and, definitely, there is much more to it than that. When trying to minimize energy usage the main goal is to put as many CPU cores in idle state as possible. Moreover, since there are several levels how deep CPU can enter the power saving state (so called C-states) it is usually better to make the core idle rarely but for a longer period of time then more often but for much shorter time. That’s why “complete all tasks as quickly as possible and go to sleep” seems to be a promising strategy.

In addition to that, even when CPU is not in the idle state, there is a way to control its energy consumption by frequency scaling (so called P-states). Depending on a current load the kernel may decide how much performance it needs on a particular core. While it may seem to be simple there are many other things to consider. The requested performance may be not what the system would get. In multi-core CPUs all active cores would run at the highest requested frequency. On the other hand, there is a range of frequencies that, due to limits of CPU energy consumption, is available only when not all cores are active. Being aware of these facts certainly would help the scheduler both in terms of performance and power saving.

Unfortunately, as I mentioned before, support for controlling processor C-states and P-states is incomplete and the system will not be able to fully take advantage of the scheduler power-awareness. However, even without proper power management drivers there should be some improvement and at some point in the future Haiku eventually would get such drivers.

Dealing with performance related issues may be sometimes tricky and intuition not always is right. That’s why I intend to verify my enhancements by running appropriate performance tests. My initial plans are based on the solutions used by the other operating systems and recommendations of the hardware vendors. Only changes that will turn out to be a real improvement will eventually made it to the main repository.

Despite all these enhancements I will try to keep the scheduler relatively simple. We do not need many features present in other operating systems. Things like process groups or user level fairness would only make the code more complicated and introduce overhead without giving any real benefits. Similarly, having many tunables will not really be necessary, the goal is to make scheduler ready to work without requiring each user to set any tunable what certainly most will not do anyway.