Kubernetes: or how not to build software
— Travis Athougies

Kubernetes is widely considered to be the industry-standard way to deploy software in the cloud. This is incredibly unfortunate, since the software is bloated and unnecessarily complicated.
The problem can be summarized as follows: Kubernetes does too much. It handles job scheduling, networking, storage, and more, and it does none of these particularly well. In fact, it often does nothing other than delegate work to so-called “plugins”, which are frequently pieces of software that could have been distributed completely independently of Kubernetes.
In many cases these plugins are simply gRPC interfaces over pre-existing Linux functionality. It often seems the ‘point’ of these plugins (and of the entire plugin infrastructure) is to hide functionality formerly provided by a free software tool behind a nominally open-source, proprietary black box. Somehow, we are to believe that slapping a gRPC interface over well-documented underlying tools makes us sophisticated.
The myriad combinations of storage and networking plugins, controllers, and custom resource definitions make it nearly impossible for any computer program or human to know exactly what state the cluster is in at any given time.
Supposedly, leaving it all up to the tiny programs called ‘controllers’ makes everything work. Kubernetes explains the principle behind this: if every controller simply attempts to drive the cluster closer to its desired state, the cluster will get there eventually. Mathematically, of course, this is not guaranteed. It is well known from gradient descent that if you simply move in the direction of your desired state, you may end up there, or you may end up at a local minimum that is not optimal at all.
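To make the local-minimum point concrete, here is a small sketch of plain gradient descent. The cost function f is an arbitrary one-dimensional example chosen only for illustration (it is not a model of a cluster, and the names f, grad_f, and descend are hypothetical). It has a shallow minimum near x = 0.96 and a deeper one near x = -1.04. The descent rule only ever steps "downhill" from where it currently is, so the minimum it settles in depends entirely on where it starts.

```python
def f(x: float) -> float:
    """Toy cost with a shallow local minimum (x ~ 0.96) and a deeper global one (x ~ -1.04)."""
    return x**4 - 2 * x**2 + 0.3 * x


def grad_f(x: float) -> float:
    """Derivative of f: the local direction of steepest increase."""
    return 4 * x**3 - 4 * x + 0.3


def descend(x0: float, lr: float = 0.01, steps: int = 1000) -> float:
    """Plain gradient descent: repeatedly take a small step against the gradient."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x


if __name__ == "__main__":
    # Same rule, same "desired direction", different starting points.
    for start in (2.0, -2.0):
        end = descend(start)
        print(f"start={start:+.1f}  end={end:+.3f}  cost={f(end):+.3f}")
```

Starting at x = 2.0, the loop converges to the shallow minimum and stays there, even though a noticeably lower cost exists elsewhere; every individual step moved "closer", yet the better state is never reached.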
I actually reached such a state the other day when I installed an nginx-ingress-controller into my cluster. I made a mistake in the configuration and couldn’t get it to work, so I destroyed the controller. Unfortunately, the controller’s namespace cannot be deleted. There is some controller running somewhere that wants it. I have restarted Kubernetes (introducing downtime), tried all kinds of commands, and so on, but it simply will not go away.
I am not alone, either. Searching Google for “kubernetes stuck”, one is met with innumerable stories of people whose deployments are simply stuck. No error message is given, just a state that has been declared but is never going to be reached. It really raises the question: what is the point of declarative deployments if declaring your infrastructure doesn’t actually guarantee that the deployment ever takes place?
Think about it. Would we accept this behavior from any other program? Imagine if you fed a script to your Python interpreter and didn’t know whether it was going to execute it fully or get stuck doing something else. Or imagine a web browser that, when you typed in a URL, sometimes went there and sometimes didn’t. That’s the equivalent of deploying an application to Kubernetes.
Ultimately, Kubernetes’ problems come from attempting to do too much. Instead of a pluggable infrastructure of controllers that are not guaranteed to ever achieve any declared state, it would be better to have a small core of built-in controllers that could be composed in deterministic ways to satisfy the most common use cases. For more esoteric use cases, software vendors should write their own orchestration systems suited to the needs of whatever infrastructure they’re providing.
By trying to do it all, Kubernetes ends up providing software that does it all, but only sometimes. The rest of the time, you’re stuck debugging things the old-fashioned way. Most other software that only did what we asked some of the time would never be relied upon for our precious infrastructure. However, thanks to massive investments by large corporations, Kubernetes is being kept on life support. But no matter how much money they throw at it, they’ll never be able to fix the fact that Kubernetes is fundamentally broken.