Upgrading a kubeadm Kubernetes cluster is not nearly as bad as it might seem at first, as it is all very well documented, but in the sea of documentation you always feel like you are missing something. In this post I summarize the most important links and the things I do, so that I have it all in one place.
First of all, the official docs are pretty straightforward and you should read them; they differ a bit between versions, and for older versions you might need to look into the website's git repo.
Prerequisites
Deprecated API versions
To upgrade a cluster you first need to make sure that no APIs are used which have been deprecated and will be removed with the new version, unless you want workloads to fail (which is totally fine if you have customers who had plenty of notice to update their manifests).
In general you simply check the API deprecation guide, which tells you most of the relevant things.
I have not yet figured out a reliable way to just automatically run a single script and find everything. Actually kubent should do exactly this, but I have seen deprecated resources which kubent did not show me (probably PEBKAC), so I still check some things manually with kubectl.
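For reference, the basic check is just running the binary against your current kubeconfig context; it prints the deprecated APIs it finds (see the kube-no-trouble README for additional flags and output formats):
kubent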
Deprecations up to version 1.19 are pretty rare and do not require much automation though.
Kubernetes 1.19 also introduced an API server metric, apiserver_requested_deprecated_apis, which can be used to see which deprecated API objects are still being requested and also tells you the release in which the object will be removed, e.g.:
apiserver_requested_deprecated_apis{endpoint="https", group="extensions", instance="10.118.0.22:6443", job="apiserver", namespace="default", removed_release="1.22", resource="ingresses", service="kubernetes", version="v1beta1"}
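If the API server metrics end up in Prometheus, a query along these lines (just a sketch, using the labels shown above) gives a quick per-resource overview:
sum by (group, version, resource, removed_release) (apiserver_requested_deprecated_apis)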
Given these metrics I can then just search for the corresponding resources, which are given by the resource label; in the above example, to see the used API versions for any ingress I use the following command:
kubectl get ing -A -o custom-columns="Namespace:.metadata.namespace,Name:.metadata.name,apiVersion:.apiVersion"
The next kubeadm and kubelet packages
For k8s it is totally fine to skip patch versions, but you should not skip minor versions, i.e. if you want to upgrade from 1.16.6 to 1.19.15, you upgrade to 1.17.x and 1.18.y before upgrading to 1.19.15. 1
This means you will need the kubeadm (actually for kubeadm I think you can get away with the latest) and kubelet packages of the respective versions.
This might be a PITA depending on the distribution2 of your nodes, and it is better to have them in place before starting to upgrade the nodes (esp. in heterogeneous clusters) than to have a panic attack after your first nodes have been upgraded and you cannot upgrade the remaining ones.
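A quick availability check before touching any node could look like this; a sketch assuming apt-based nodes with the Kubernetes package repo, with the yum variant for completeness:
apt-get update
apt-cache madison kubeadm kubelet        # lists the versions the repo actually offers
# or, on yum-based nodes:
yum --showduplicates list kubeadm kubelet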
Performing the upgrade
Setting up some watchers
Several things you probably want to know about can happen during the upgrade. Apart from monitoring (which, if running in the same cluster, could be unavailable if things go wrong), I have a watch on a few things I'd like to see healthy during all operations, e.g. my ceph status.
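What exactly to watch depends on your setup; assuming Ceph runs via Rook and the toolbox pod carries its default app=rook-ceph-tools label (an assumption, adjust to your cluster), a sketch could be:
# grab the toolbox pod once, then poll ceph status every 10 seconds
TOOLS_POD=$(kubectl -n rook-ceph get pod -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}')
watch -n 10 "kubectl -n rook-ceph exec $TOOLS_POD -- ceph status"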
I think during the upgrades kube-proxy will restart (not sure if this is a must, but it can happen), which might take a while under heavy load and lead to DB clusters resynchronizing, which is something I would wait for before progressing.
Finally, in some small corner I have a
watch "kubectl get nodes | grep -v <new_version> && kubectl get nodes | grep NotReady"
to have a good overview of what is still left to do and to see if things break horribly.
The control plane
The first control plane member
On the first control plane member, we first only upgrade the kubeadm package. Next, run kubeadm upgrade plan or kubeadm upgrade plan <targetversion> (usually1 not needed).
This will also once again remind you to check things like the in-cluster kubeadm config map, which you really should do.
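By default that config lives in the kubeadm-config ConfigMap in kube-system, so having a look is just:
kubectl -n kube-system get configmap kubeadm-config -o yaml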
If everything noted is as it should be, next double check that your etcd members are healthy, e.g. via
kubectl exec -it -n kube-system etcd-your-control-plane-node -- etcdctl --cert=/etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key --cacert /etc/kubernetes/pki/etcd/ca.crt --endpoints https://localhost:2379 member list
(the syntax might differ a bit between etcdctl versions).
Drain this node, then run kubeadm upgrade apply <version>.
After this, you can run kubeadm upgrade node and then upgrade kubelet via your package manager or whatever you use.
Uncordon the node and wait for everything to come back up healthy.
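Put together, the whole dance on this first member looks roughly like this (cp-1, the target version, and the apt-based kubelet install are placeholders for your setup):
kubectl drain cp-1 --ignore-daemonsets
# on cp-1 itself:
kubeadm upgrade apply v1.19.15
apt-get install -y kubelet=1.19.15-00        # or your package manager's equivalent
systemctl daemon-reload && systemctl restart kubelet
# back on your workstation:
kubectl uncordon cp-1
Then proceed with the remaining control plane members.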
The remaining control plane members
The main thing with control plane members is that before and after each node you want to be really sure things are healthy.
So, for each control plane member:
- check that etcd and everything else is healthy
- drain the node
- run kubeadm upgrade node on it
- update the kubelet package to the correct version
- uncordon the node, wait for everything to be healthy again
The worker nodes
If you use image-based deployments, just drain your old workers and replace them with new images.
Otherwise, you can just repeat the same steps as above for the control plane
members, sans checking the control plane services, but possibly you will need
to take care of other services like ingress controllers.
It is recommended that you drain every node before the upgrade, but so far it has worked for me to just run kubeadm upgrade node followed by the kubelet upgrade.
YMMV, so the recommended way would be, for each node:
- check application health
- drain the node
- run kubeadm upgrade node
- update kubelet
- uncordon the node
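Scripted for a handful of workers, that loop could look something like this sketch (node names, version, and the ssh and apt parts are placeholders; kubeadm and kubelet are upgraded on the node itself):
for node in worker-1 worker-2 worker-3; do
  kubectl drain "$node" --ignore-daemonsets
  # upgrade the node's kubeadm-managed config and kubelet on the node itself
  ssh "$node" "kubeadm upgrade node && apt-get install -y kubelet=1.19.15-00 && systemctl restart kubelet"
  kubectl uncordon "$node"
done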
Finishing up
In general there is not much to do afterwards and everything should work as expected. It is a good idea to check all dashboards for missing data, as metric names and labels can change between versions.