Benchmarks fall into two categories: micro-benchmarks and macro-benchmarks. Micro-benchmarks measure very small and specific units of work, such as how long it takes to execute a specific Linux system call or how quickly a system can switch between two processes. Macro-benchmarks measure more substantial, higher-level units of work that are more closely aligned with real-world workloads. In the mobile space, for example, one might look for macro-benchmarks that measure workloads indicative of the consumer experience.
LMbench is a well-known micro-benchmark for the Linux OS. GTKPerf and multimedia encoding/decoding are good examples of macro-benchmarks.
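To make the idea of a micro-benchmark concrete, here is a minimal sketch in Python. It is not how LMbench itself is implemented; the helper name time_per_call_us and the choice of os.getppid as a cheap operation to time are assumptions made for illustration, and the Python interpreter adds overhead of its own, so the absolute number means little.

```python
import os
import time


def time_per_call_us(func, iterations=100_000):
    """Average wall-clock time of one call to func, in microseconds.

    Hypothetical helper for illustration: it times a tight loop and divides
    by the iteration count, which is the basic shape of a micro-benchmark.
    """
    start = time.perf_counter_ns()
    for _ in range(iterations):
        func()
    elapsed_ns = time.perf_counter_ns() - start
    return elapsed_ns / iterations / 1_000


if __name__ == "__main__":
    # os.getppid is used here only as an example of a very small unit of work.
    print(f"getppid: {time_per_call_us(os.getppid):.3f} us per call")
```

A macro-benchmark, by contrast, would time an entire task such as decoding a video clip, where thousands of these small operations are mixed with ordinary computation.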
So why is LMbench (and micro-benchmarks in general) evil? Well, in truth, the problem is not so much with LMbench itself as with the fact that most people do not know how to interpret its numbers correctly. People like to take a complex problem, like quantifying the performance impact of virtualization on a system, and boil it down to a simple set of numbers. And there's nothing wrong with that! LMbench happens to be the easiest way of obtaining a set of performance numbers for a Linux system. But just because it packages a set of performance numbers nicely for you doesn't necessarily mean the numbers are all that useful or important!
To see why, let's take a closer look at LMbench. Here's some sample output, simplified for presentation:
                     Native       Virtualized
Null syscall:        2.37us       5.24us
File Open:           25.4us       31.3us
Context Switch:      11.2us       6.43us
Pipe bandwidth:      30.7MB/s     24.3MB/s
Protection Fault:    12.4us       39.6us
Signal Handling:     11.2us       46.2us
Each 'sub-benchmark' in LMbench (e.g. Null syscall, File Open, ...) measures a very small and precise unit of work. Virtualization can have a significant impact on any one of these numbers, for better or worse. The impact of virtualization with OKL4 on LMbench results can vary from being 50x faster[1] to being, say, 5x slower. But it would be incorrect to interpret this as meaning that virtualization can change real-world workloads, i.e. the consumer experience, by anything from a 50-fold speedup to a 5x slowdown.
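As a small worked example, the sketch below computes the per-row ratios for the sample output shown earlier. The function names are made up for illustration; the figures are the ones from the sample table. For these rows the ratios run from roughly 0.57x (context switch, faster when virtualized) to roughly 4.1x (signal handling, slower), which shows how widely individual micro-results can spread.

```python
# Sample figures from the LMbench output above: (native, virtualized).
# Latencies are in microseconds; pipe bandwidth is in MB/s.
LATENCY_US = {
    "Null syscall": (2.37, 5.24),
    "File Open": (25.4, 31.3),
    "Context Switch": (11.2, 6.43),
    "Protection Fault": (12.4, 39.6),
    "Signal Handling": (11.2, 46.2),
}
PIPE_BANDWIDTH_MBS = (30.7, 24.3)


def latency_ratio(native, virtualized):
    """Virtualized time relative to native; above 1.0 means slower."""
    return virtualized / native


def bandwidth_ratio(native, virtualized):
    """Native throughput relative to virtualized; above 1.0 means slower."""
    return native / virtualized


def slowdown_table():
    """Return a dict mapping each sub-benchmark to its slowdown ratio."""
    ratios = {name: latency_ratio(n, v) for name, (n, v) in LATENCY_US.items()}
    ratios["Pipe bandwidth"] = bandwidth_ratio(*PIPE_BANDWIDTH_MBS)
    return ratios


if __name__ == "__main__":
    for name, ratio in sorted(slowdown_table().items(), key=lambda kv: kv[1]):
        print(f"{name:18s} {ratio:5.2f}x")
```

None of these ratios says anything yet about a real workload, because they carry no information about how often each operation actually occurs.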
And this is exactly why LMbench results, by themselves, are useless. To confer real meaning on LMbench measurements, the results must be accompanied by an understanding of exactly what role each micro-workload (e.g. Null syscall, File Open, ...) plays in the workloads you really care about. In other words, if you care about, say, the multimedia experience, then you need to understand which LMbench micro-workloads are invoked frequently enough during a typical multimedia session that any overhead they incur in a virtualized system has a real, non-trivial impact on the performance of that workload. Gaining this understanding is called profiling.
Profiling is a process whereby one analyzes a real-world, macro-level workload, such as video streaming, and determines which system-level, micro-level units of work (e.g. context switching) make up a significant portion of that higher-level workload. If you were to profile workloads relevant to the real-world consumer experience on a mobile handset, as Open Kernel Labs engineers regularly do, you'd find that only a small subset of the micro-workloads measured by LMbench are real influencers. Micro-workloads like null system call, protection fault and signal handling just don't carry any real weight in the real world.
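The weighting argument can be sketched as a back-of-the-envelope calculation. Everything below is an assumption made for illustration: the function estimated_overhead, the simple additive model (time outside the listed operations is treated as unaffected by virtualization), and above all the invocation counts in HYPOTHETICAL_PROFILE, which are invented and are not measurements from any real handset. Only the per-operation costs come from the sample LMbench output.

```python
# Per-operation cost in microseconds: (native, virtualized).
# Taken from the sample LMbench output in this post.
COSTS_US = {
    "Null syscall": (2.37, 5.24),
    "File Open": (25.4, 31.3),
    "Context Switch": (11.2, 6.43),
    "Protection Fault": (12.4, 39.6),
    "Signal Handling": (11.2, 46.2),
}

# HYPOTHETICAL profile: how many times each operation occurs during one
# second of some macro workload. These counts are invented for illustration.
HYPOTHETICAL_PROFILE = {
    "Signal Handling": 20,
    "Protection Fault": 50,
    "File Open": 5,
    "Context Switch": 100,
}


def estimated_overhead(profile, costs_us, native_total_us):
    """Estimate relative overhead of virtualization on a macro workload.

    Sums (virtualized - native) cost over every profiled invocation and
    divides by the native run time. Assumes costs simply add up and that
    all other time is unchanged, which is a deliberate simplification.
    """
    extra_us = sum(
        count * (costs_us[op][1] - costs_us[op][0])
        for op, count in profile.items()
    )
    return extra_us / native_total_us


if __name__ == "__main__":
    one_second_us = 1_000_000
    overhead = estimated_overhead(HYPOTHETICAL_PROFILE, COSTS_US, one_second_us)
    print(f"Estimated overhead: {overhead:.2%}")
```

With counts this low, even a roughly 4x slowdown in signal handling contributes well under one percent to the overall run time. A different profile would of course give a different answer, which is exactly why the profile matters more than the raw micro-benchmark numbers.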
So now that we've reached this point, I'd like to confess that LMbench is actually very useful, but as a tool for analyzing and understanding the performance of real-world, macro-level benchmarks. By profiling the high-level benchmarks that are important to our customers, OK engineers determine which micro-level workloads are actually key influencers of the performance of those benchmarks, and then use micro-benchmarks to optimize the small and precise system workloads that actually matter.
At OK we are driven to provide a high-performance embedded hypervisor that performs well where it matters. This is why you'll consistently find OKL4 delivering low single-digit percentage overheads for real-world benchmarks, and super-fast LMbench results for those micro-workloads that are most important and actually relevant to the mobile experience (e.g. context switching).
[1] http://ertos.nicta.com.au/research/l4/performance.pml
Posted by Abi Nourai on January 30 at 09:22 AM
About Abi Nourai:
Abi Nourai, Sales Director, recently relocated from Sydney to OK's Paris office and, even more recently, made another move to London. The journey was a little easier this time round. Abi is excited about using the OKL4 technology he helped develop as an undergraduate to solve business problems for the mobile and embedded spaces. Abi gets away from it all by indulging in fine cuisine, roaming through museums, and hopping around Europe's many great cities.