Enhancing the scheduler

Blog post by Paweł Dziepak on Sat, 2013-09-07 20:17

Soon I am going to work as a full-time Haiku developer on enhancing the scheduler. The goal is to improve performance of the whole system and finally deal with some long standing problems. To achieve this CPU affinity will be introduced what would make cache utilization better and I will implement scheduler strategies based on dynamic priorities what, hopefully, would once and for all deal with priority inversion. In addition to that, I want to make scheduler more power-aware. Haiku currently lacks low-level support for some of the more advanced power related features of CPUs but having scheduler ready for would save us from redesigning it later. Also, there are still ways to conserve energy without using the most recent technologies.

My first goal is to complete what have been stared by implementing scheduler_affine. Having scheduler that is able to improve cache utilization by trying not to change the CPU thread is executing on may improve system performance. Since the threads are going to be assigned to a certain logical processors appropriate load balancing will have to be introduced. Also, scheduler will need to be aware of the drawbacks of simultaneous multithreading what would allow it to assign threads to logical processors more wisely. That would be even more important because the threads would spend more time on one logical processor.

In order to achieve all that the scheduler has to know the CPU topology and that's what I am going to begin with. The kernel needs to be able to obtain information where each of the logical processors belongs in the three level hierarchy: socket, core, SMT. Additionally, there is also an important matter of cache topology. In multi-core systems some cache levels may be shared between the cores, knowing which logical processors share at least one cache level may decrease the performance penalty from migrating a thread to another logical processor.

Currently Haiku suffers from priority inversion, an issue that makes it possible for an application to freeze almost the whole system for an extended period of time. While I am not a big fan of priority inheritance, a solution that has been tried but was not committed, I do believe that much better way to solve this problem is to introduce dynamic priorities and to punish threads that despite having high priority are using CPU for a long time. That would mean the basics of our priority based scheduling remains virtually unchanged and the dynamic priorities affect only misbehaving threads in order to reduce CPU starvation.

Finally, there are power saving concerns. While it may seem that performance and power are conflicting goals the experience of people dealing with such things show that in case of scheduler it is not necessarily true and, definitely, there is much more to it than that. When trying to minimize energy usage the main goal is to put as many CPU cores in idle state as possible. Moreover, since there are several levels how deep CPU can enter the power saving state (so called C-states) it is usually better to make the core idle rarely but for a longer period of time then more often but for much shorter time. That's why "complete all tasks as quickly as possible and go to sleep" seems to be a promising strategy.

In addition to that, even when CPU is not in the idle state, there is a way to control its energy consumption by frequency scaling (so called P-states). Depending on a current load the kernel may decide how much performance it needs on a particular core. While it may seem to be simple there are many other things to consider. The requested performance may be not what the system would get. In multi-core CPUs all active cores would run at the highest requested frequency. On the other hand, there is a range of frequencies that, due to limits of CPU energy consumption, is available only when not all cores are active. Being aware of these facts certainly would help the scheduler both in terms of performance and power saving.

Unfortunately, as I mentioned before, support for controlling processor C-states and P-states is incomplete and the system will not be able to fully take advantage of the scheduler power-awareness. However, even without proper power management drivers there should be some improvement and at some point in the future Haiku eventually would get such drivers.

Dealing with performance related issues may be sometimes tricky and intuition not always is right. That's why I intend to verify my enhancements by running appropriate performance tests. My initial plans are based on the solutions used by the other operating systems and recommendations of the hardware vendors. Only changes that will turn out to be a real improvement will eventually made it to the main repository.

Despite all these enhancements I will try to keep the scheduler relatively simple. We do not need many features present in other operating systems. Things like process groups or user level fairness would only make the code more complicated and introduce overhead without giving any real benefits. Similarly, having many tunables will not really be necessary, the goal is to make scheduler ready to work without requiring each user to set any tunable what certainly most will not do anyway.

Comments

Re: Enhancing the scheduler

Sounds great, Paweł! While all Haiku users will benefit from your scheduler improvements, are the power saving features you're able to implement limited to certain processors? Do you already have an idea which will profit (most)?

Thanks for your commitment! I hope the donations will be sufficient for at least a second contracting month.

Regards,
Humdinger

Re: Enhancing the scheduler

Techniques like small task packing does not require any hardware support and everyone would be able to benefit from it (assuming that it actually will reduce power usage what is yet to be seen).

However, the features that we can be sure are able to improve energy consumption: C states and P states require CPU support. It appears that the ability to set CPU C state was introduced together with SSE3 extensions (i.e. 2004) while P states seems to be a bit newer technology. That's about the programming interface, the actual possibilities of CPU are still being improved by the CPU vendors with each new microarchitecture.

I would also like to note that the management of the processor states is not exactly the scheduler responsibility, basically it just needs to be aware about how power saving technologies in current CPUs work and use that knowledge to attempt to reduce power usage. Choosing appropriate C state or P state for, respectively, idle and active logical processors is the task for the separate driver.

Re: Enhancing the scheduler

Very interesting.

Here is some additional info about power management:

In the CPU idle GSoC it was discovered that there seems to be two major components that wakes the CPU regularly, Deskbar and networking.

Networking has a timer task running at 1000Hz that is used for checking if cables are connected, bandwidth throttling, scanning for WIFI and such. (FreeBSD uses 1000Hz as well but 100Hz on ARM). I think we could probably lower that or even make it tickless (probably a lot of work). Perhaps even a timer that is not very accurate could be introduced that tries to avoid waking the CPU.

Deskbar IIRC used some polling of the filesystem or such that kept on waking the processor up. That would probably not be to hard to find and fix.

Re: Enhancing the scheduler

Introducing timer coalescing and attempting to make Haiku tickless, while certainly out of scope of "scheduler and related" work, undoubtedly is something that should be right after scheduler power-awareness on a TODO list.

Re: Enhancing the scheduler

I agree, just wanted you to be aware of it, especially when you try to measure power savings.

Re: Enhancing the scheduler

Great to hear that the scheduler will be worked on! :)

Currently Haiku suffers from priority inversion, an issue that makes it possible for an application to freeze almost the whole system for an extended period of time.

Currently I'm working again towards a solution of these 'freezes'. While priority inversion is a problem and dynamic priorities could bring relief, it is also a symptom of an underlying issue: lock-contention on some kernel resources which are currently protected by simple coarse-grained locking. I'm currently prototyping and comparing several strategies to relieve/remove the contention as such.

Of course, a general solution to counter priority inversion from misbehaving threads is a good idea. For the "freezing" problem I hope though to cure it by removing the "misbehaviour" itself in some critical places.

As for priority inheritance: the full and complete priority inhertinance protocol of the patch I wrote is a bit too much weight on performance indeed... the code in there always figures out exactly the optimal next priority. Still wondering whether a simplified, less exact algorithm could be a nice solution as well. I'm curious to hear your reasons of prefering the dynamic priorities over inheritance, care to elaborate on that? :)

Re: Enhancing the scheduler

The real cause of bugs like #8007 is CPU starvation which, together with priority inversion, results in occasional freezes. While techniques like priority inheritance or random boosting deal quite effectively with the priority inversion the lower priority threads that don't hold any locks may still be starved by "misbehaving" high priority thread. That's why I think dynamic priorities are the best solution here since, hopefully, it will solve both problems: CPU starvation and priority inversion.

Obviously, reducing lock-contention, fine-grained locking and perhaps even introducing some lock-free or wait-free strategies (that may be a bit tricky though) are also very important and none of the scheduler enhancement would be a able replace them.

Re: Enhancing the scheduler

Two developer working on #8007, i must be dreaming!

Re: Enhancing the scheduler

Which algorithm is based the haiku scheduler?

Re: Enhancing the scheduler

Currently it is basically a simple round-robin with a single runqueue shared among all logical processors with O(n) lookup and insertion. Randomly higher priority threads are skipped in favor of lower priority threads in order to prevent starvation, but #8007 shows that it doesn't work as expected.

Re: Enhancing the scheduler

Since this was one of the parts of BeOS that made it super-responsive to the user, i hope the goal is the same for Haiku. The user should ALWAYS be highest priority. i hope i don't have to explain what that means.

Re: Enhancing the scheduler

Siarzhuk has already showed me these notes. In essence, the idea described there is an extension of what Haiku currently does, i.e. randomly skipping higher priority threads. Since it does it more aggressively lower priority threads get more CPU time. However, I highly doubt that skipping real time threads is a good idea. Instead of designing algorithm inspired by lottery scheduling, I do believe that it would be much better to generally honor statically given priorities and adjust them only when necessary.

Re: Enhancing the scheduler

I thought someone would post a link to my article here :)

I had no enough brain power back then to actually introduce something worthy, glad that at least as a result Axel solved some minor problems with thread priorities which improved system responsiveness and stability a bit.

Well, I actually had one improvement over haiku's implementation (same result but less processing power wasted inside of scheduler) but I doubt if it could be applied to it anymore since developers decided to make significant changes.

Re: Enhancing the scheduler

I'd love to read that - do you have an English translation of it? Unfortunately, Google Translate mangles the formatting very badly.

Re: Enhancing the scheduler

Maybe I'll do a translation later. Also I have some code that I haven't published but it's on my old PATA hard drive and my PC has only SATA controllers now :P

Re: Enhancing the scheduler

I know that feeling ;)

There are plenty of SATA<->PATA adapters out there for cheap (less than $15 USD) - hopefully you can find and use something like that :D

Re: Enhancing the scheduler

Or.. USB <-> PATA ... those are pretty handy and it proabbly wouldn't be a good idea to trust an old PATA drive with data on an ongoing basis either.