In this series of posts I want to give a small introduction to Kubernetes. I am fairly new to Kubernetes myself: the first platform I designed just went into production, and I found most of the introductory literature not very helpful in the process. My memory of the things I struggled with is still fresh, and I am still breaking in colleagues, so I hope this is actually a helpful article and not the same buzzwordy "Kubernetes is a self-healing something something" that does not explain a thing. Especially if you are not buying your cloud but running it yourself, this article series can be read as a high-level walk-through of building a platform that can actually run in production.
Many people still treat Docker as the fundamental concept behind Kubernetes. Docker is a containerization technology, but it does many things in its own weird ways, does some things in ways that are simply bad, and is more of a dead end from a time when we were still figuring it all out. In some sense, Docker is to Kubernetes what BASIC dialects are to modern programming languages: they use the same fundamental technologies, and even though modern BASIC looks almost nothing like old BASIC, it still carries the burdens of the circumstances of its inception, and no one would base a new project on it.
As one can expect from the analogy, this point of view makes many concepts nebulous and hard to understand, and is misleading in many instances. In particular, it is neither a bottom-up nor a top-down approach, but more of an inaccurate, cargo-cult history of how people perceived things to have come to fruition. Nevertheless, almost all Kubernetes introductions I encountered start with Docker, and often never get to the more important parts.
A purely mechanistic point of view is probably not a good entry point either. When looking at an existing cluster without prior knowledge, the number of services and components is overwhelming: API servers, etcd, kubelet, ingress controllers, schedulers; everything seems needlessly complex. However, once you can tell yourself the story of how everything works together, it makes sense and is easier to grasp. In this post I try to tell that story, bridging the gap between the mechanistic understanding and the pragmatic standpoint of an administrator.
A brief look at Wikipedia does and does not answer this question:
Kubernetes (commonly stylized as k8s) is an open-source container-orchestration system for automating computer application deployment, scaling, and management.
If you know what a container is, this does answer the question. If you think you know what a container is, this probably also answers the question. If you do not know what a container is, let me help you by rewriting the paragraph and removing some fluff:
Kubernetes is an open-source process-orchestration system for automating computer application deployment, scaling and management.
This is already a bit better, but there are also little lies hidden in it. I am not a marketeer, so let us be more honest:
Kubernetes is an open-source process-orchestration system for facilitating computer application deployment and management.
If you look at it, this boils down to the following.
Kubernetes is an init system managing linux processes.
This is kinda underwhelming, isn't it? Well, it is, and that is why I think one should highlight two tiny features:
Kubernetes is an automatable init system that can manage Linux processes across multiple computers.
This is really where the beauty lies. Kubernetes can (but does not have to) manage multiple computers, and moreover it is designed in a way that allows for declarative automation, by exposing everything via a nice and overwhelming API.
You would expect an init system to take care of the following things:
- make sure necessary daemons for logging, networking and distributed storage are running
- start and kill processes
- handle dependencies of processes
- have configuration and startup scripts persisted in easily manageable configuration files, e.g. in
- reduce workload of system operators
With these expectations in mind, one can now go forth and think of a minimal way of accomplishing this task. Some things are evidently necessary:
- a way to interact with the cluster as a user and administrator
- distribution of config files
- distribution of workload
These things are really the bare minimum one expects. However, there are additional things you probably need in a cluster of computers to make them useful:
- networking between processes
- file systems
- networking with the outside world
- logging and monitoring
In this article I will concentrate on the first four points. First I want to discuss the obvious necessities for distributing jobs on a cluster of computers and their implementations in Kubernetes. Then we will deal with the way we talk to the cluster and justify some of the choices that have been made, mostly why YAML is used via an HTTP API. Networking in the cluster will be handwaved, as this is already a point where you have to make choices, namely the selection of a container network plugin. Most tutorials recommend flannel here, but I think calico is a better first choice. Then we will put containerization into context, and that pretty much finishes up this post.
None of the components is all that interesting on its own, but it is very helpful to remind yourself why each of them is there.
First we need some service we as system operators can talk to, so we can tell the cluster which workloads to run. This is accomplished by the API server.
As we expect things to change over time, we need something like a control loop; this is what the kube-controller-manager provides, hosting the various controllers that drive the cluster towards its desired state.
There needs to be a process that decides which node each workload ends up on, and, as you can guess, this is the scheduler's job.
Then we need to store configurations and cluster state somewhere, which is usually done in the aptly named etcd, a distributed key-value store.
Finally, on each node there is a kubelet service running, which starts and supervises the local containers and is the endpoint the cluster talks to.
Now this already finishes up the core of the cluster, the so-called control plane. On a cluster installed with kubeadm you can see these components via kubectl get pods -n kube-system -l tier=control-plane.
Networking inside the cluster, i.e. between pods and the control plane, is simple and is not simple. The simple part is the general assumption that you are allowed to do the networking any way you like, as long as you do not break the two core rules:
- every pod has its own IP address
- every pod can reach any other pod via these IP addresses
Of course the second rule is already problematic for many, so there has to be some qualifier like "unless prevented by some in-cluster service" or something like that. But for now we can keep it at that.
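Kubernetes does in fact have an object kind for exactly this exception, the NetworkPolicy. A sketch of one that deliberately breaks the second rule for a whole namespace (all names are illustrative):

```yaml
# Illustrative NetworkPolicy: no pod in this namespace
# accepts ingress traffic from anywhere anymore.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-ingress
  namespace: default
spec:
  # an empty selector matches every pod in the namespace
  podSelector: {}
  policyTypes:
    - Ingress
  # listing no ingress rules means: allow nothing in
```

Note that such policies are only enforced if the pod network plugin supports them; calico does, plain flannel does not.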
Also, you are free to accomplish this any way you want. There is a myriad of pod networks to choose from, and as networking is complicated, these can be complicated too.
I had some bad latency issues with flannel, but I did not dig deeper into them, so this might have been a PEBKAC issue and not an issue with flannel. I like and can recommend calico, as it seems to be the easiest to understand while still providing acceptable performance. In this post series, all my networking is written from the perspective of someone running calico, but I try to be explicit about it when it seems to matter. Also, I do not know much about networking, so it will not get more detailed than the following section.
With calico, on each node there is a felix pod running, which then handles the necessary things on the host. Each pod is assigned an IP from a private subnet, e.g. 192.168.0.0/16. Which IP this will be depends on the host the pod is going to run on, as each host is assigned a smaller subnet of this subnet. On the host itself, this address is bound to a virtual interface with a funny name like caliXXXXXXXXXXX, and some route things happen so this IP can talk to the outside world via the default route, while iptables things happen so that this IP can be reached on this host from the outside. On the other hosts, calico creates routes to these smaller subnets via the host each of them is assigned to.
Again you can inspect these rules directly on the hosts. In particular, if you are logged in on a host that runs calico, you should be able to reach all pods running in the cluster via their pod IP, which is a nice thing to know and sometimes helps debugging things.
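For the curious, the standard Linux tooling is enough to see what calico set up on a node. A sketch (interface names and rules will differ on your hosts, and on a machine without calico these simply find nothing):

```shell
#!/bin/sh
# Illustrative only: inspecting what calico set up on a cluster node.
# Guarded with fallbacks so it also runs cleanly elsewhere.

# virtual interfaces calico created for the local pods
ip addr 2>/dev/null | grep cali || echo "no cali interfaces found"

# routes towards the pod subnets of the other hosts
ip route 2>/dev/null || echo "ip route not available"

# NAT/filter rules calico installed (needs root)
iptables-save 2>/dev/null | grep -i cali || echo "no calico iptables rules found"
```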
Talking to the Kubernetes cluster means talking to the API server(s).
For end users this usually means using kubectl.
The way the API is structured is probably a topic worth discussing in itself, but I want to give a short rationale why we have to deal with YAML files and clients and cannot just ssh into things and run commands.
The main reason for this is that we want to organize many different kinds of things: services for networking, programs to be run, config files to be distributed; all of these want to be managed.
But what is a common abstraction of a call like ls -al, a config file residing in some folder, and the task of running five Apaches distributed over nodes with none of them being co-located?
Well, one can put everything in the wording of objects. A command object, a file object; and the Apaches distributed over several nodes are then a clever object consisting of command objects, file objects and metadata making sure nothing runs on the same server.
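This "clever object" actually exists: in Kubernetes vocabulary it would roughly be a Deployment. A sketch, with all names and the image being illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: apache
spec:
  replicas: 5                 # the five Apaches
  selector:
    matchLabels:
      app: apache
  template:
    metadata:
      labels:
        app: apache
    spec:
      containers:
        - name: apache
          image: httpd:2.4    # the "command object"
      affinity:
        podAntiAffinity:      # the metadata making sure
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:  # no two pods with this label
                matchLabels:  # land on the same node
                  app: apache
              topologyKey: kubernetes.io/hostname
```

The "file objects" have their own kind, the ConfigMap, which can be mounted into such a Deployment.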
Now we have the problem of communicating objects between services; this is usually done via object serialization. One of the myriad of object serialization formats is YAML, which has also become a superset of JSON and has two very important benefits for our use case. The first is that it is human-readable due to its significant-whitespace structure; of course this does not necessarily make it easily debuggable, but that is another story. The second is that it plays well with line-based diffs and hence fits very well into version control. Combined with its popularity, these two factors made YAML the format of choice.
The API server itself is just an HTTP server you (and the cluster) can talk to via the exposed REST API. This has two benefits: first, REST APIs are well known, well supported and easy to understand; second, HTTPS allows for authentication via client certificates. The only downside at the moment seems to be that there is no support for passphrase-protected certificates as in the case of ssh keys or personal SSL certificates.
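Because it is plain HTTPS, you do not even strictly need kubectl. A hedged sketch of talking to the API server with curl, where the host name, port and certificate file names are placeholders for whatever your cluster uses:

```shell
# Hypothetical direct API call; server address and certificate
# file names are placeholders, not real defaults.
curl --cacert ca.crt \
     --cert client.crt --key client.key \
     https://my-control-plane:6443/api/v1/namespaces/default/pods \
  || echo "no cluster reachable from this machine"
```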
Now, on the lowest level of interaction, we usually do something like the well-known kubectl apply -f some.yml. This sends the content of the YAML file via HTTPS, authenticated by your certificate. The API server then validates your input and, if successful and depending on the task, talks to the scheduler and etcd to do their things.
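To make this concrete, a minimal some.yml might contain a single pod (name and image are, again, just examples):

```yaml
# Minimal pod manifest; kubectl apply -f some.yml sends
# essentially this content to the API server.
apiVersion: v1
kind: Pod
metadata:
  name: hello
spec:
  containers:
    - name: hello
      image: nginx:1.25
```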
To this point, the cluster is rather dysfunctional for most workloads: your pods can mount local file paths and communicate with each other, but not with the outside world; at least not without a lot of manual intervention, which we do not want. If you only run computational workloads, this might actually be sufficient. You submit jobs, they run, work on the local drives, and when they are done you collect logs with kubectl and data via some means on the bare metal.
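Such purely computational workloads have their own object kind, the Job. A sketch, close to the canonical pi-computing example:

```yaml
# A job that runs one container to completion; afterwards
# its output can be collected with kubectl logs.
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
        - name: pi
          image: perl:5.34
          command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(200)"]
      restartPolicy: Never   # jobs should finish, not respawn
  backoffLimit: 0
```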
This is not yet an actual improvement for operators, so we have to introduce more things.
In upcoming posts we will then dive deeper into ingress networking (how do I make services available to the outside world?), storage (how do I manage volume mounts for pods that might spawn on any computer in the cluster?), and of course monitoring (what is happening in the cluster?).