My Evoke white paper has triggered an unusual number of comments, all about the brief discussion of fast context-switching and the graph reproduced here. This surprised me a bit, as all this material was published long ago (and references are provided in the white paper). The context-switching issue was mentioned in the white paper as a (partial) explanation for the excellent performance of the Evoke, but it is a side issue as far as the white paper is concerned, which is why I didn't dwell on it.
However, the unusually large number of comments and queries tells me that I should provide a bit more explanation. Here we go (although it can really all be found in the referenced publications too...)
The graph shows that context-switching costs in OK:Linux are significantly lower than in native Linux on an ARM9 platform, but the gap narrows as the number of processes exceeds 8, and the graph ends at 16. And I said that "the improvement is particularly large if the number of presently active processes is small—a typical situation for mobile phones."
First, I have to admit I was a bit sloppy when talking about "active processes", which many people (not unreasonably) interpreted as referring to the total number of non-zombie processes in the system. I should have used the technically accurate formulation "process working set", but didn't want to get too technical. Clearly, you should never be sloppy...
What does the "process working set" mean? The working set is a technical term in operating systems referring to the subset of a particular resource that is in active use over a certain (small) window of time. It can be applied to main memory, cache memory, processes or others. In this case the "process working set" is the set of processes which execute on the CPU in a particular, small, time window.
How small is "small"? That depends (and I remember that as a student I was frustrated that the prof didn't provide a good answer to that question). Given the numbers we are talking about here (on the order of 10 processes, and on the order of 0.25ms overhead in native Linux), it makes sense to look at a few milliseconds. Given that the default time slice of Linux is 10ms, let's say we're looking at a window of a single time slice.
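To make the definition concrete, here is a minimal sketch of how one could compute the process working set from a scheduling trace. The Run record, the trace values and the function name are made up for illustration; this is not taken from any real tracing tool, and a real trace would of course come from the kernel.

```python
from typing import Iterable, NamedTuple, Set


class Run(NamedTuple):
    """One uninterrupted stretch of a process executing on the CPU (hypothetical trace record)."""
    pid: int
    start_ms: float
    end_ms: float


def process_working_set(runs: Iterable[Run],
                        window_start_ms: float,
                        window_ms: float = 10.0) -> Set[int]:
    """Return the PIDs that executed at some point in [window_start_ms, window_start_ms + window_ms).

    The 10 ms default mirrors the single-time-slice window chosen in the text.
    """
    window_end_ms = window_start_ms + window_ms
    # A run counts if its interval overlaps the window at all.
    return {r.pid for r in runs
            if r.start_ms < window_end_ms and r.end_ms > window_start_ms}


# Invented trace: two processes alternate in the first slice,
# then one process runs undisturbed for a full slice.
runs = [
    Run(101, 0.0, 4.0),
    Run(102, 4.0, 6.5),
    Run(101, 6.5, 10.0),
    Run(103, 10.0, 20.0),
    Run(104, 20.0, 21.0),
]

print(sorted(process_working_set(runs, 0.0)))   # [101, 102]
print(sorted(process_working_set(runs, 10.0)))  # [103]
```

Note that the total number of processes in the system never enters the calculation. Only processes that actually get the CPU inside the window count.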
So, how big is the process working set in a typical (embedded) Linux system? Small. Remember, as long as a running process doesn't block (waiting on a resource, such as I/O or a lock) and isn't preempted (as a result of an interrupt), it will execute for its full time slice. The working set size in this case is one. On my Linux laptop there are at least 200 processes at any time, but almost all of those are blocked waiting for some event (such as standard input or a mouse click). The number of processes running during a time slice will be small; I'd say it's less than ten almost all the time. On a phone it's likely to be less than on my laptop. Phones are becoming more and more like laptops, but they really aren't doing as much as a typical laptop with dozens of windows and all those background processes our highly-bloated desktop environments are running.
So, I clearly stand by my claim, disputed by some, that the process working set on a phone is typically small, and normally much less than ten. Which implies that the context-switching overhead of OK:Linux is about an order of magnitude less than that of native Linux.
What if it does get bigger occasionally? The graph ended at 16, does that mean OK:Linux cannot have more than 16 processes in the working set?
Nope, there is no such limitation. OK:Linux supports as many processes as native Linux does, no matter how many of them form the working set.
Those of you who have read the FASS papers (referenced by the white paper) will know that our fast context switching is based on the use of a feature of the ARM9 MMU called "domains", and there are only 16 of them (and one is reserved for kernel use, so there are 15 available). So, what if we have more than 15 processes? Well, we do what any decent OS does if it runs out of a resource: we recycle. We use "domain preemption" to share a limited number of domains among a greater number of processes. That has a cost, but it's still better than not using domains at all, as the graph also shows: with 16 processes the latency is still only about 2/3 of that of native Linux. Once the process working set size gets really large, OK:Linux overheads end up a bit higher than those of native Linux. But I've never seen a mobile system with such a large process working set (remember, my busy laptop doesn't even get there, so how would a phone?).
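The recycling idea can be sketched in a few lines. This is a toy model only, not the FASS or OK:Linux implementation: the class and function names are invented, the least-recently-used victim choice is my assumption for the sake of the example, and the real cost of a preemption (cleaning up the victim's state in the MMU and caches) is reduced to a counter.

```python
from collections import OrderedDict

NUM_DOMAINS = 16                             # ARM9 MMU domains, as stated in the text
KERNEL_DOMAINS = 1                           # one domain reserved for kernel use
USER_DOMAINS = NUM_DOMAINS - KERNEL_DOMAINS  # 15 left for processes


class DomainAllocator:
    """Toy model of sharing a fixed pool of domains among processes."""

    def __init__(self, num_domains: int = USER_DOMAINS) -> None:
        # Assumption: domain 0 is the kernel's, user domains are numbered 1..num_domains.
        self.free = list(range(1, num_domains + 1))
        self.owner = OrderedDict()  # pid -> domain, least recently used first
        self.preemptions = 0

    def switch_to(self, pid: int) -> int:
        """Return the domain to use for pid, preempting another process's domain if needed."""
        if pid in self.owner:
            # Fast path: the process still holds a domain.
            self.owner.move_to_end(pid)
            return self.owner[pid]
        if self.free:
            domain = self.free.pop(0)
        else:
            # Domain preemption: take the domain of the least recently run
            # process (assumed policy). This is where the real cost would be paid.
            _victim, domain = self.owner.popitem(last=False)
            self.preemptions += 1
        self.owner[pid] = domain
        return domain


def count_preemptions(num_processes: int, rounds: int) -> int:
    """Run num_processes round-robin for the given number of rounds and count preemptions."""
    alloc = DomainAllocator()
    for _ in range(rounds):
        for pid in range(num_processes):
            alloc.switch_to(pid)
    return alloc.preemptions


print(count_preemptions(15, 3))  # 0
print(count_preemptions(16, 1))  # 1
```

As long as the working set fits into the 15 available domains, the allocator never preempts, no matter how many rounds are run. Only when a 16th process joins the working set does recycling start, which is the regime where the text says the cost begins to show.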
But, of course, you don't have to believe me, you can see for yourself. Check out the Evoke and its snappy UI, and tell me whether you've seen a phone with similar functionality, running on an ARM9 processor (even a dedicated one) that does better!
Posted by Gernot Heiser on July 27 at 11:51 AM
About Gernot Heiser:
Gernot Heiser, Co-founder and Consulting Scientist, never thought he would be in the business world. Prior to NICTA's creation in 2003, Dr Heiser was a full-time faculty member at the University of New South Wales. However, this die-hard academic couldn’t pass up the opportunity to see the commercialization of this research. Gernot still loves teaching, almost as much as he loves good wine and good food. And anyone will tell you that Gernot knows his wine.