yongcong's blog

cpuidle: final report

Blog post by yongcong on Mon, 2012-08-27 12:18

The last quarter term was mostly spent on the ACPI cpuidle driver implementation. I also spent about 2 days adjusting the cpuidle framework so that the generic cpuidle module is loaded by the low-level cpuidle driver, while the latter is loaded during boot either by bus enumeration or by calling get_module() manually. I also measured the power consumption after the ACPI cpuidle driver was finished. The result is as good as with the Intel native driver. Since all the goals defined in my proposal have been achieved, the project is successfully finished.

Here I want to summarize the status:
generic cpuidle
The generic cpuidle module is implemented and can be used on all CPU architectures.
x86 cpuidle driver
On the x86 platform, we support the Intel native cpuidle driver and the ACPI cpuidle driver. The former makes full use of the mwait extension on newer Intel CPUs such as SNB, IVB or later. Since it doesn't touch the complex ACPI part, it's preferred if the hardware supports it. The ACPI driver is our last choice.
power saving number
On my T420 laptop, it saves about 2.5 W.
main limitation
We don't support old platforms which don't have ARAT (always running APIC timer). To support such platforms, we need to enhance Haiku's timer subsystem.
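
The ARAT limitation can be checked from CPUID. On x86, ARAT is reported in CPUID leaf 0x06, EAX bit 2. The sketch below only decodes that bit from an EAX value that the caller has already read. The function and macro names are made up for illustration and are not taken from the Haiku sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 0x06 (Thermal and Power Management), EAX bit 2:
 * "APIC timer always running" (ARAT). */
#define CPUID_LEAF6_EAX_ARAT (1u << 2)

/* Hypothetical helper: takes the EAX value returned by CPUID leaf 0x06
 * and reports whether the local APIC timer keeps running in deep C-states. */
bool cpu_has_arat(uint32_t leaf6_eax)
{
    return (leaf6_eax & CPUID_LEAF6_EAX_ARAT) != 0;
}
```

A driver could refuse to offer deep C-states when this returns false, because without ARAT the local APIC timer may stop in those states.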

I'd like to continue all the related power saving work in Haiku, such as finding and removing all the unnecessary wakeup sources and enhancing Haiku's timer subsystem as mentioned above. I may also work on some ACPI related drivers. ACPI is complex, but I learned a lot about it during this summer ;)

Finally I'd like to thank tqh for his guidance, careful code review and kind help with ACPI related topics, and thank Haiku for giving me the chance to code for such a clean and beautiful operating system. I also owe a great deal to many Haiku experts for their immediate answers on IRC and great suggestions on the mailing list. The last thanks belong to Google :D

cpuidle: three quarter term report

Blog post by yongcong on Sun, 2012-08-05 11:34

I began to implement the ACPI cpuidle driver so that the power saving feature can benefit all x86 platforms (in theory, at least). ACPI is more complicated than I thought. Most of the time was spent on implementing "_CST" evaluation and decoding.

First of all, evaluating any ACPI object/method needs an ACPI handle. Since Haiku doesn't export the AcpiWalkNamespace method of ACPICA, I can't get the ACPI processor handle after the system has booted. The only solution is to use the device manager so that the ACPI cpuidle driver is loaded during boot. This requires a cpuidle modification so that the generic cpuidle module is loaded by the low-level idle driver. The modification is not done yet; it's simple, and I wanted to get "_CST" evaluation done first.

To evaluate "_CST", we need to evaluate "_PDC" or "_OSC" first. "_OSC" is preferred over "_PDC" since ACPI 3.0, so my implementation evaluates "_OSC" and falls back to "_PDC" if that fails.
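
A minimal sketch of that fallback order, not the driver's actual code. The two evaluators are passed in as callbacks so the logic stands alone. All type and function names here are hypothetical stand-ins for the real ACPICA calls and status codes.

```c
#include <stddef.h>

/* Hypothetical status codes standing in for ACPICA's ACPI_STATUS. */
typedef enum { EVAL_OK = 0, EVAL_FAILED = 1 } eval_status;

/* Hypothetical callback: evaluates one ACPI method on a processor handle. */
typedef eval_status (*eval_fn)(void *handle);

typedef enum { CAPS_VIA_OSC, CAPS_VIA_PDC, CAPS_NONE } caps_result;

/* Try "_OSC" first; only if it fails (or is unavailable) try "_PDC". */
caps_result declare_processor_caps(void *handle, eval_fn eval_osc, eval_fn eval_pdc)
{
    if (eval_osc != NULL && eval_osc(handle) == EVAL_OK)
        return CAPS_VIA_OSC;
    if (eval_pdc != NULL && eval_pdc(handle) == EVAL_OK)
        return CAPS_VIA_PDC;
    return CAPS_NONE;
}

/* Stub evaluators, only for trying the function out. */
eval_status stub_eval_ok(void *handle)   { (void)handle; return EVAL_OK; }
eval_status stub_eval_fail(void *handle) { (void)handle; return EVAL_FAILED; }
```

With the stubs, declare_processor_caps(NULL, stub_eval_fail, stub_eval_ok) takes the "_PDC" path, which is the case the fallback exists for.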

The "_CST" decoding is simpler than the "_CST" evaluation. Some of the code was ready during the GSoC bonding period; I just finished it.

Then I came across one big problem which blocked me for one week: under Haiku, the ACPI processors are enumerated in the order "cpu8=>cpu7=>...=>cpu1" rather than "cpu1=>cpu2=>...=>cpu8". The latter order is used by Linux, FreeBSD, NetBSD, etc. Evaluating "_CST" of cpu8 needs "_CST" of cpu1, so Haiku's enumeration order makes ACPICA report AE_NOT_FOUND. I didn't know about this requirement before; I even dug into the ACPICA code but found nothing. After being frustrated for about one week, I realized that the problem might lie in the evaluation order, so I dumped the processor id of every ACPI processor and found the reason. In no more than ten minutes, I implemented a workaround and it works!! Here is the C-states information dumped from my laptop:

KERN: evaluate _CST @0x821e7698
KERN: cpuidle found 2 cstates
KERN: c1
KERN: Latency: 1
KERN: power: 1000
KERN: bufer length 17
KERN: cpu_reg size: 15
KERN: FFH method
KERN: c3
KERN: Latency: 104
KERN: power: 350
KERN: bufer length 17
KERN: cpu_reg size: 15
KERN: IO method
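
The post does not show the enumeration-order workaround itself. One possible approach, sketched here under that caveat, is to collect the processor handles first and sort them by ascending processor id before evaluating "_CST". The struct and function names are invented for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record: one enumerated ACPI processor. */
typedef struct {
    uint32_t processor_id;
    void *handle;   /* opaque ACPI handle, unused in this sketch */
} acpi_cpu_entry;

/* qsort comparator: ascending processor id, written to avoid overflow. */
static int compare_by_processor_id(const void *a, const void *b)
{
    const acpi_cpu_entry *x = a;
    const acpi_cpu_entry *y = b;
    return (x->processor_id > y->processor_id) - (x->processor_id < y->processor_id);
}

/* Reorder the entries so that cpu1 comes first and "_CST" can be
 * evaluated in the order cpu1 => cpu2 => ... */
void order_cpus_for_cst(acpi_cpu_entry *cpus, size_t count)
{
    qsort(cpus, count, sizeof(cpus[0]), compare_by_processor_id);
}
```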

NOTE: In the dump above, Cx is the ACPI-reported Cx rather than the OS's Cx. For example, ACPI's C2 is missing if AC is plugged in, but Linux takes ACPI C3 as the OS C2. I would like to take a similar approach to keep the generic cpuidle module happy.
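
A small sketch of that renumbering, assuming the driver keeps the ACPI C-state types in the order "_CST" reported them. The OS C-state number is then just the 1-based position in that list. The function name is hypothetical.

```c
#include <stddef.h>

/* Returns the 1-based OS C-state number for an ACPI C-state type,
 * or 0 if "_CST" did not report that type.
 * acpi_types holds the reported types in order, e.g. {1, 3}. */
size_t os_cstate_number(const int *acpi_types, size_t count, int acpi_type)
{
    for (size_t i = 0; i < count; i++) {
        if (acpi_types[i] == acpi_type)
            return i + 1;
    }
    return 0;
}
```

For the two states in the dump (ACPI C1 and C3), ACPI C3 maps to OS C2, and a lookup for the missing ACPI C2 returns 0.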

cpuidle: midterm report

Blog post by yongcong on Wed, 2012-07-11 13:04

Thanks to the good preparation in the quarter term/bonding period, I have completed the generic cpuidle kernel module, the native Intel cpuidle module and the cpuidle driver (for states/info reporting). By the original plan, these tasks were to be completed by the end of the 3/4 term...

cpuidle: quarter term report

Blog post by yongcong on Fri, 2012-06-22 15:42

I completed my 1/4 goal ahead of the proposal schedule. By the original plan, I should investigate whether we need changes to the x86 idle routine and to the x86 architecture specific instruction usage. The results were reported to the GSoC mailing list on 3rd Jun. Here are the copied results:
1. No need to change the x86 idle routine.
2. monitor/mwait works perfectly. I measured the power consumption when using idle implemented with "monitor/mwait"; it's the same as with the version implemented with "hlt".

cpuidle: GSoC community bonding report

Blog post by yongcong on Sun, 2012-05-27 13:38

As we all know, cpuidle can't save any power if the CPU is woken up frequently during idle: the CPU doesn't get a chance to go into deep sleep. So to get power savings, besides cpuidle support, we must remove those unnecessary wakeups.

During the bonding period, I added some code to dump system timer wakeup events and found that the CPU wakeup rate during idle was too high, ~550 wakeups/s. Then with the help of KDL, I found one obvious wakeup source: the scheduler's quantumTimer. But I couldn't understand its duty. With the help of Axel, I understood its functionality and meaning. Axel also gave one suggestion: disable the timer for the idle thread. So the only patch of my bonding period was submitted and later merged by my mentor tqh. By my test, this patch removes ~41% of the unnecessary wakeups during idle.

My wakeup investigation went further after that. I found another unnecessary wakeup source: the Intel e1000e ipro1000 driver. The FreeBSD network driver glue layer timer function, hardClock(), is triggered too often (it sits in src/libs/compat/freebsd_network/clock.c). I could fix it, but I think removing all unnecessary wakeups can't be achieved this summer and I should focus on my project. So I decided to just remove ipro1000 when testing.

Then with the help of mmu_man, I learned to quit processes using roster. mmu_man also suggested killing Tracker and Deskbar because they are still using poll mode. Later I modified my bootscript so that it doesn't launch those services. Finally I succeeded in decreasing the wakeups during idle from ~550 wakeups/s to ~30 wakeups/s. That's enough for cpuidle testing.
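
To put those numbers in perspective, here is a back-of-the-envelope sketch. It assumes the wakeups are evenly spaced, which real wakeups are not, so treat the result as a rough average. The function names are made up for illustration.

```c
/* Percentage of wakeups removed when going from one rate to another. */
double wakeup_reduction_percent(double before_per_s, double after_per_s)
{
    return 100.0 * (before_per_s - after_per_s) / before_per_s;
}

/* Mean time between wakeups in microseconds, assuming even spacing. */
double mean_idle_interval_us(double wakeups_per_s)
{
    return 1e6 / wakeups_per_s;
}
```

Going from 550 to 30 wakeups/s is a reduction of about 94.5%, and the mean interval between wakeups grows from roughly 1.8 ms to roughly 33 ms. That leaves far more room for the CPU to stay in a deep C-state.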

I also read one excellent document from Intel about how to write power efficient software: http://download.intel.com/technology/pdf/Green_Hill_Software.pdf.

The first goal (by the end of Jun 11) should be the x86 architecture specific code modifications. The first is to check whether we need to make arch_cpu_idle a function pointer; the second is to add monitor/mwait instruction support, because we need the monitor/mwait extension to enter Intel processors' native deep states and it's simpler than the ACPI solution.
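
For readers unfamiliar with the idea, this is roughly what "making the idle routine a function pointer" means. It is a userspace sketch only: the two idle routines just bump counters instead of executing hlt or monitor/mwait, and none of the names are taken from Haiku.

```c
#include <stddef.h>

typedef void (*cpu_idle_fn)(void);

/* Counters standing in for the real (privileged) instructions. */
int hlt_calls = 0;
int mwait_calls = 0;

void idle_with_hlt(void)   { hlt_calls++; }    /* would execute "hlt" */
void idle_with_mwait(void) { mwait_calls++; }  /* would execute "monitor"/"mwait" */

/* The indirection: the idle loop calls through this pointer. */
static cpu_idle_fn current_idle = idle_with_hlt;

/* A cpuidle driver could install its own routine; NULL restores the default. */
void set_cpu_idle(cpu_idle_fn fn)
{
    current_idle = (fn != NULL) ? fn : idle_with_hlt;
}

void cpu_idle(void)
{
    current_idle();
}
```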

gsoc2012 cpuidle project introduction

Blog post by yongcong on Fri, 2012-04-27 16:04

My GSoC 2012 project is adding cpuidle support to Haiku. As we all know, transistor power consumption is composed of dynamic and static parts. The former is due to charging/discharging of capacitance and other switching activity; the latter is due to leakage and bias current. In the following sections, I'd like to briefly summarize the power saving technology in today's CPUs, the power saving technology in today's OSes, and what's missing in Haiku, in other words the reason why I want to work on it.

Power saving technology in CPU
Dynamic Frequency and Voltage Scaling
Change the core frequency and the corresponding supply voltage on the fly. This technology can reduce both the dynamic and the static power consumption.

clock gating
Stop clock distribution. This technology can reduce dynamic power consumption.

power gating
With the improvement of manufacturing processes, transistor feature size has shrunk significantly. Smaller transistors consume less dynamic power due to lower voltage and gate capacitance. On the other hand, they consume more static power because leakage current rises exponentially. One technology, the so-called power gating, can almost completely cut off the transistors, so the leakage current and voltage are nearly zero. This technology can reduce static power consumption.

DFVS is used in the so-called P-states. Clock gating and power gating are used in the so-called idle C-states.
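
The three techniques above can be related to the usual first-order CMOS power model. This is the standard textbook approximation, not something taken from the original post. The symbols are assumptions of the sketch: \alpha is the switching activity factor, C the switched capacitance, V the supply voltage, f the clock frequency and I_leak the leakage current.

```latex
P_{\text{dynamic}} \approx \alpha \, C \, V^{2} f
\qquad
P_{\text{static}} \approx V \cdot I_{\text{leak}}
```

In these terms, DFVS lowers V and f, which shrinks the dynamic term and, through V, the static term. Clock gating drives \alpha toward zero for the gated block. Power gating drives I_leak toward zero.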

OS technology
Basically, two components are used: macro (suspend to RAM or disk) and micro. The micro component in turn consists of two components:
cpufreq makes use of the DFVS technology in cpu
cpuidle makes use of the clock gating and power gating technology in cpu.

What Haiku is missing
Haiku can support P-states to some extent. However, since DFVS is used much less than clock gating and power gating, and the leakage current is more significant, cpuidle is more and more important in today's OSes.

What I will do
In my GSoC project, I'd like to implement general cpuidle support and specific drivers: an Intel idle driver for newer Intel CPUs such as i3, i5, i7 etc., and an ACPI driver. Also, one userspace tool will be implemented to report the idle states' statistics. I believe my project will benefit Haiku in power efficiency and laptops' battery life.

During the bonding period, I'd like to dig into the ACPI spec, try to get the C-states information from the ACPI _CST object, and read documents about power saving from Intel and AMD.

Last but not least, I'm glad to work on this project under the guidance of Haiku's mentors. I can't wait to benchmark the power saving results after cpuidle is enabled on Haiku ;)
