The beginning of 1992 must have been a busy time for pioneers of modern operating systems and other OS theorists around the world. The thread titled “Linux is obsolete” (1), started by the famous Prof. Andrew Tanenbaum, attracted the attention of not only its main defendant, Linus Torvalds, but also other renowned experts in the field at the time.
To give a little background on the period: the now-ubiquitous CISC-based x86 CPU standard already had a decent presence, RISC-based CPUs were taking off (at present they are largely confined to mobile CPUs), and many people believed RISC was going to win over CISC-based systems because of its obvious advantages (some theoretical, some practical). This bet on which CPU standard would become ubiquitous seems to have made pioneering OS designers pick sides: some on the x86 standard, some on the predicted transition from CISC to RISC-based CPU architectures.
This created a divided landscape in the mindsets of OS designers at the time. Some wanted to cater to existing software, running it on different (and possibly free) alternative platforms and on cheaper x86 machines. Others wanted to leave legacy systems behind and aim for a clean, sophisticated future in OS design, taking into account many theoretical aspects of OS design and the varying nature of CPU architectures.
Prof. Andrew S. Tanenbaum was a strong supporter of the latter group: he created Minix, a microkernel-based OS, mainly to demonstrate how an OS can be modularized so that each module is isolated, simple, and hence easy to maintain and secure. In the meantime, Linus Torvalds came up with Linux, a free and open-source alternative to UNIX (both are implemented as monolithic kernels) that could run the still wildly popular GNU software packages. Of the two, Linux started gaining popularity among the technical crowd. Tanenbaum, preferring theoretical cleanliness for the future over practical usefulness in the present, pointed out that the monolithic design of Linux was the wrong way to go into the future and was, in fact, a step back into the dark ages of the past. This is the point where the famous debate between microkernels and monolithic kernels started.
Who is right?
The answer to this age-old question might not be that simple, or there might not be an answer at all. Because operating systems are used for a wide range of purposes, designing one best OS to suit every use case might not be possible. Therefore, rather than trying to give a straight answer, it is more useful to understand the pros and cons of both design approaches so that the best decisions can be made depending on the situation.
Monolithic Kernels
In this approach, all the basic functionality the OS designer thinks should go into the core OS to support application software is packed into a single compiled module. To run the basic OS, only this module has to run as a process. This means all the code in the OS process lives in the same address space, and any code within that process can touch any memory address belonging to it. That would be quite manageable for a simple application process. But since we are talking about a whole operating system, there is a ton of complex logic (process scheduling, memory management, file systems, device drivers, etc.) residing in the same process, grouped only into functions. These functions have no true isolation between them, since they all execute in the same address space. Hence a bug in one kernel routine can corrupt memory maintained by another routine and possibly crash the whole kernel.
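A minimal C sketch of that hazard might look like the following; every name here (run_queue, net_rx_handler) is hypothetical and not taken from any real kernel:

struct task { int pid; int state; struct task *next; };

static struct task *run_queue;     /* the scheduler's private data             */

/* A hypothetical network driver routine compiled into the same kernel image. */
void net_rx_handler(const char *frame, int len)
{
    char buf[64];
    /* Bug: no bounds check. An oversized frame overflows buf and scribbles
       over adjacent kernel memory. Because the driver, the scheduler (and its
       run_queue), and every other subsystem share this one address space, the
       damage is not contained to the driver: the whole kernel can go down. */
    for (int i = 0; i < len; i++)
        buf[i] = frame[i];
}

In a microkernel, the same bug would at worst crash the driver process, not the kernel.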
In addition, one can argue (as Tanenbaum has done) that, because of the large chunk of code it consists of, a monolithic kernel can be very hard to port to another CPU platform. But this can be mitigated (as Linus mentions) by confining CPU-specific code to a small core within the kernel, so the whole kernel does not need to be modified when porting.
Another curse of a large, interrelated code base is that there is no guarantee that a modification to one kernel routine will not introduce a bug somewhere else. Such bugs are very hard to track down because there is no isolation between kernel functions sharing the same address space.
On the plus side, writing logic inside a monolithic kernel is fairly easy, because kernel developers can work with shared resources and there are not many abstraction layers to go through to achieve what they want. The fewer the abstraction layers, the easier it is to reason about the direct implementation details of the module being worked on. Performance is also unharmed, because all module switches and task delegations within the OS happen inside the kernel itself, without the need for extra context switches at the CPU level.
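For example, in a monolithic kernel the file-system layer can flush a block with an ordinary function call into the disk driver. The sketch below is purely illustrative; disk_write() is a hypothetical driver entry point, not any real kernel's API:

int disk_write(long block_no, const char *data, int len);  /* hypothetical driver entry point */

/* The file-system layer flushes a block with an ordinary function call: no
   message is built, no address space is crossed, no extra context switch. */
int fs_flush_block(long block_no, const char *data, int len)
{
    return disk_write(block_no, data, len);
}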
Microkernels
Microkernels take software design principles very seriously and apply them in every part of the kernel implementation. Each individually identifiable logical unit is implemented as a separate binary/process. In this approach the actual kernel becomes very small (hence the name microkernel) and very simple, because all the additional logic (the OS designer decides what counts as ‘additional’ and what counts as ‘core’) is moved into processes of its own. Only the core features are kept in the microkernel. When the OS is running, only the microkernel executes in kernel mode. All the other supporting modules run in user mode, providing increased isolation and security. The supporting modules may include a process manager, device drivers, a file system, and a TCP/IP stack.
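A rough C sketch of what such a user-mode server could look like, assuming hypothetical IPC primitives (ipc_receive(), ipc_reply()) exposed by the microkernel; this is not the Minix API, only an illustration of the structure:

/* A message format and a pair of IPC primitives assumed to be provided by the
   microkernel; these are illustrative, not actual system calls. */
struct message { int sender; int type; long arg1; long arg2; char data[512]; };
enum { FS_READ = 1, FS_WRITE = 2 };

void ipc_receive(struct message *m);                /* block for the next request */
void ipc_reply(int to, const struct message *m);    /* send the result back       */
void handle_read(struct message *m);
void handle_write(struct message *m);

/* The file server is an ordinary user-mode process that spends its life in a
   receive/handle/reply loop; the microkernel itself only delivers messages. */
void file_server_main(void)
{
    struct message m;
    for (;;) {
        ipc_receive(&m);
        switch (m.type) {
        case FS_READ:  handle_read(&m);  break;
        case FS_WRITE: handle_write(&m); break;
        }
        ipc_reply(m.sender, &m);
    }
}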
This design not only makes the kernel very secure and reliable; it is supposed to make implementation and maintenance easier too. The problem is that separating all the OS modules calls for communication mechanisms so that they can interact with each other and provide the full functionality of the OS to the application layer. Without such mechanisms they simply cannot talk to each other or to the microkernel, because they all lie in separate address spaces and share no memory.
Every microkernel implementation (Minix included) has such IPC (inter-process communication) mechanisms in place. Any modern OS has IPC, of course, but it is mostly intended for passing messages between application-layer processes. This is where one of the major gripes about microkernels arises: the microkernel’s IPC machinery has to manage all the interactions between the OS modules themselves, and must do so in a way that does not hurt the performance, security, or simplicity of the microkernel.
One can argue that if IPC is good enough for application-layer processes, it should be good enough for core OS components as well. The issue is that because all the components of the OS work together as a single entity to provide higher-level services to the application layer, there are a lot of interactions between the modules within the OS. The more this communication goes through IPC rather than shared memory, the more visible the performance impact becomes.
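To see where the cost comes from, compare the monolithic fs_flush_block() sketched earlier with what the same operation might look like under a microkernel. This reuses the hypothetical message layout and FS_WRITE tag from the server sketch above, plus an assumed synchronous ipc_sendrec() call and endpoint id:

#include <string.h>

int ipc_sendrec(int endpoint, struct message *m);   /* hypothetical: send request, wait for reply */
enum { DISK_DRIVER_ENDPOINT = 3 };                  /* hypothetical endpoint id                    */

int fs_flush_block(long block_no, const char *data, int len)
{
    struct message m = { .type = FS_WRITE, .arg1 = block_no, .arg2 = len };

    if (len > (int)sizeof m.data)
        return -1;                       /* payload must fit in the message */
    memcpy(m.data, data, len);           /* copy the block into the message */

    /* One request plus one reply: at least two context switches and two trips
       through the kernel's IPC path for every single block written. */
    return ipc_sendrec(DISK_DRIVER_ENDPOINT, &m);
}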
Furthermore, since the kernel modules have no direct access to the machine’s resources (or to other kernel functions), implementing a module becomes more complex because of the need to think about IPC. Just as multi-threaded applications are harder to design and implement than single-threaded ones, designing modules around IPC is harder than communicating through shared memory. The problem is especially apparent when an OS module needs to access a large chunk of data that lies in kernel memory (e.g. an I/O read/write operation, which is very frequent). Since the external OS module has no access to kernel memory, it has to ask the kernel (via IPC) to copy the data into its own address space so it can modify it. Once the modification is complete, it has to instruct the kernel again to copy the modified data back into kernel space and send it down the required I/O channel. Of course, such copy operations can be optimized, but then data-coherence issues arise. Radical OS designs such as Microsoft Singularity (2) overcome this problem by transferring ownership of data between processes, but that is beyond the scope of this document. As Linus argues in his reply to (2), this defeats the goal of simplicity in microkernel designs: “All your algorithms basically end up being distributed algorithms”. In short, the restrictions the microkernel imposes on OS modules can be too much for a module to do its job easily, so it has to rely on complex mechanisms instead.
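A sketch of that copy-in / modify / copy-out dance, with hypothetical ipc_copy_from_kernel() and ipc_copy_to_kernel() calls standing in for whatever grant or copy mechanism the microkernel provides (this is not the Minix safe-copy API):

#include <stdlib.h>

int  ipc_copy_from_kernel(int grant, void *dst, size_t len);      /* hypothetical */
int  ipc_copy_to_kernel(int grant, const void *src, size_t len);  /* hypothetical */
void transform(char *buf, size_t len);                            /* the actual work */

int filter_io_buffer(int grant, size_t len)
{
    char *local = malloc(len);
    if (local == NULL)
        return -1;

    ipc_copy_from_kernel(grant, local, len);   /* 1st copy: kernel -> server */
    transform(local, len);                     /* modify the data locally    */
    ipc_copy_to_kernel(grant, local, len);     /* 2nd copy: server -> kernel */

    free(local);
    return 0;
}

Those two copies per operation are exactly the overhead the shared-address-space design avoids.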
Another area where microkernels try to excel is system repairability. Because all the OS modules are isolated, if one module crashes it can be restored by a separate monitoring agent (in the case of Minix 3, the reincarnation server (3)). This sounds good in theory but may not work so well in practice. If an OS module was in the middle of something (e.g. writing to an I/O channel) when it crashed, restoring the module might not make the whole system stable again. Theoretically the crashed module is running fresh again and everything is fine, but the I/O channel, or some other module that was interacting with the crashed one, may have been left in an unstable state. Hence it may be practically impossible to maintain a fully stable OS environment, no matter how much you separate components.
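The idea can be pictured as a simple supervision loop; spawn_driver() and wait_for_exit() below are hypothetical placeholders, and as noted above, restarting the process does nothing for whatever half-finished work the old instance left behind:

#include <stdio.h>

int  spawn_driver(const char *path);   /* hypothetical: start the driver, return its pid */
void wait_for_exit(int pid);           /* hypothetical: block until that process dies    */

void monitor_driver(const char *path)
{
    for (;;) {
        int pid = spawn_driver(path);                        /* (re)start the driver    */
        wait_for_exit(pid);                                  /* returns when it crashes */
        fprintf(stderr, "driver %s died, restarting\n", path);
        /* The process is back, but any I/O it had in flight, and any peer that
           was mid-conversation with it, may still be in an inconsistent state. */
    }
}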
Which is better?
As mentioned above, both design approaches have their own valid ideologies, with their own pros and cons. Monolithic kernel designs take a practical approach to delivering OS functionality without introducing too much engineering complexity, while sacrificing reliability and future maintainability. Microkernels approach the problem from a theoretical point of view, where everything must be separated and isolated for the sake of reliability, while sacrificing engineering simplicity (although microkernels claim to be simple by design).
One can argue that either approach is too extreme, each stuck inside its own ideology. Surely there should be a middle ground between these two worlds. Some commercial systems have taken that middle ground, following a “hybrid” design in an effort to leverage the best of both worlds.
Hybrid Kernels
Hybrid kernels try to balance out the pros and cons of monolithic and microkernel designs by organizing the OS components flexibly rather than sticking to one extreme design principle. In this approach, the crucial OS modules whose performance would suffer if they resided outside the kernel stay inside the kernel (making that part essentially monolithic). Modules that provide application services, on the other hand, are separated from the kernel and moved to user space to increase reliability, giving the design properties similar to the microkernel approach. This way the OS can give priority to performance and to reliability separately, where each is needed most.
Takeaways
Observing the current operating-system landscape, one can see that there is no clear winner; it comes down to the personal preference of the OS designer and the target market of the OS. Today, among the most popular operating systems in use, Unix and Linux follow a monolithic design (4), while Windows NT, Mac OS X, and iOS follow a hybrid kernel approach (5). Each OS might have its own set of issues precisely because of its design, but these examples stand as successful and widely used demonstrations of both approaches. However, the lack of widespread success of a purely microkernel-based OS might suggest that the world is not willing to embrace the difficulties that come with extreme reliability.
If you are an OS designer, which design to choose depends on your priorities and the interests of your target audience. If you prefer a more stable, reliable OS with a beautiful design, a microkernel is the way to go, but you must be prepared to cope with all the hassles that come with module isolation and IPC. If you prefer a more balanced approach that gets the job done, don’t want to over-engineer things, and want to deliver your product within a reasonable time, a monolithic or hybrid kernel might be a good choice, but you must be prepared to face the future maintenance problems and reliability issues that come with the lack of module separation.
References
1. Open Sources: Voices from the Open Source Revolution. [Online] http://oreilly.com/catalog/opensources/book/appa.html.
2. Tanenbaum, Andrew S., Herder, Jorrit N. and Bos, Herbert. Can We Make Operating Systems Reliable and Secure? Computer, Vol. 39, No. 5. IEEE Computer Society Press, Los Alamitos, CA, USA, 2006.
3. Herder, Jorrit N., et al. MINIX 3: A Highly Reliable, Self-Repairing Operating System. ACM SIGOPS Operating Systems Review, New York.
4. Monolithic kernel. Wikipedia. [Online] http://en.wikipedia.org/wiki/Monolithic_kernel.
5. Hybrid kernel. Wikipedia. [Online] http://en.wikipedia.org/wiki/Hybrid_kernel.
Special thanks go to +Kasun Hewage for helping me revise some aspects of the article.