Sebastian Wöhrl



Developing operators for Kubernetes

Kubernetes operators have become not only a hot topic in the cloud-native community but have proven to be immensely useful in hybrid-cloud multi-tenant/team scenarios that we often encounter in IoT projects for our customers. This post looks at how operators support a DevOps approach through automation, how they enable hybrid cloud scenarios through unified APIs, and whether Go, Rust, or Python are better suited for Kubernetes operator development.


Using operators for automation

The Kubernetes operator pattern is a way to extend the functionality and API of Kubernetes with custom resources, i.e. user-defined custom types. Thanks to operators, Kubernetes users and admins can deploy and manage complex – often distributed – systems like message brokers or database clusters. From an abstract perspective, an operator captures the knowledge and procedures of experienced human operators/administrators in code. This code automates operations tasks and reduces the skill level, experience and time needed to run and manage such a system.
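At the heart of every operator is a reconcile loop: observe the desired state from the custom object, compare it with the actual state, and take one step towards closing the gap. The following is a minimal, framework-free sketch of that idea – all types and the fake cluster state are illustrative stand-ins, not a real Kubernetes client:

```python
# Minimal sketch of the reconcile loop at the heart of every operator.
# ClusterSpec and the node list are illustrative, not a real API.
from dataclasses import dataclass

@dataclass
class ClusterSpec:
    replicas: int  # desired number of nodes, taken from the custom object

def reconcile(spec: ClusterSpec, actual_nodes: list[str]) -> list[str]:
    """Move the actual state one step towards the desired state."""
    nodes = list(actual_nodes)
    if len(nodes) < spec.replicas:
        nodes.append(f"node-{len(nodes)}")  # scale up: add one node
    elif len(nodes) > spec.replicas:
        nodes.pop()  # scale down: a real operator would drain the node first
    return nodes

# A framework calls reconcile() whenever the custom object or the
# observed state changes, until desired and actual state match.
state: list[str] = []
while len(state) != 3:
    state = reconcile(ClusterSpec(replicas=3), state)
print(state)  # ['node-0', 'node-1', 'node-2']
```

Real frameworks add watches, caching, retries and status reporting on top, but the level-triggered "observe, compare, act" core stays the same.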

Example: OpenSearch Operator

As a concrete example, let's look at the Kubernetes operator for OpenSearch, for which MaibornWolff is a maintainer and one of the major contributors. OpenSearch is an Apache-licensed fork of Elasticsearch backed by companies like AWS. In simple terms, the operator takes care of deploying an OpenSearch cluster with its nodes along with an OpenSearch Dashboards instance (the OpenSearch fork of Kibana).

To do so, it follows a standard pattern: The operator installs a custom resource definition (CRD) that provides a custom resource called OpenSearchCluster. Users communicate with the operator by adding objects of that kind to Kubernetes using their normal deployment workflows. This could be a simple kubectl apply or higher-level tools like Helm or FluxCD. Based on the options provided in these objects, the operator will then deploy and manage OpenSearch clusters.
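A minimal OpenSearchCluster object might look roughly like this (the schema shown here is abridged and illustrative; check the operator's documentation for the exact fields):

```yaml
apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: my-cluster
  namespace: opensearch
spec:
  general:
    version: "2.3.0"
    serviceName: my-cluster
  nodePools:
    - component: nodes
      replicas: 3
      roles: ["cluster_manager", "data"]
  dashboards:
    enable: true
```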

A simple deployment could also have been done using a Helm chart (and in fact there is an official chart for it), but the operator takes care of a lot more:

  • You want to secure cluster communication with TLS but don’t want to generate your own certificates? Great. The operator can do that for you if you just add an option to the custom object.
  • You want to configure your own users and roles for OpenSearch, which in OpenSearch is done via the securityconfig of the opensearch-security plugin? Just put it into a standard configmap, add its name to the custom object and the operator will handle the rest and also react whenever you change the securityconfig in the configmap.
  • Do you want to scale down the cluster and remove nodes from it without losing data or sleep? Just change the configured number of nodes in the custom object. The operator will scale down your cluster node-by-node, relocating data beforehand by draining each affected node.
  • Want to upgrade the cluster to a new OpenSearch version or change the CPU or memory of the nodes? The operator will make the changes restarting one node at a time and will wait after each node for the cluster to stabilize, fully replicate again and show green health.
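In manifest terms, each of these capabilities is just another field on the custom object. A sketch (again with abridged, illustrative field names):

```yaml
spec:
  security:
    tls:
      transport:
        generate: true   # let the operator generate the certificates itself
  nodePools:
    - component: nodes
      replicas: 2        # lowered from 3: the operator drains and removes one node
```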

Of course, all these tasks can be done either manually or be automated in some other way, e.g. using scripts written by the user/admin. But that requires a lot of time, be it for actually carrying out the tasks or for writing the automation scripts. Additionally, the admin would need a lot of domain knowledge. Put simply, with the OpenSearch operator others (that hopefully have more knowledge and experience with OpenSearch than the average user) have already taken care of writing the automation and made it usable in an easy and (more-or-less) standardized way.

The OpenSearch operator is just one example. There are countless operators out there that deal with a wide variety of tools and (typically stateful, distributed) systems. But in the end they all work the same way: Operations knowledge is embedded in the code and you interact with that code using custom resources. It should be noted that operators have different levels of maturity. Some are very simple and essentially just replace a helm install. Others manage the complete lifecycle of their target system.

Operators manage external systems

Operators can also be used to provide a unified, Kubernetes-native and declarative API for your platform. In many situations you do not just have a Kubernetes cluster with services running on it but Kubernetes is the basis of a complete platform for your business, often spanning many clusters and users with a whole range of infrastructure and platform services like databases, Kafka, object storage and many more.

Most of these services have their own API, be it the API of a cloud provider like AWS, Azure or GCP or the API provided by the product like PostgreSQL or Kafka. A user or team using such a platform needs to access and automate a number of APIs which leads to complexity and effort. One solution that is becoming more and more popular is to unify all the APIs under the Kubernetes umbrella.

And that is where operators come in. They can provide custom resources for each and every platform service so that users can manage these services just like they manage their normal Kubernetes workloads such as deployments, pods or secrets. For example, there could be a custom resource PostgreSQLServer that allows users to provision a new PostgreSQL server by applying a custom object to Kubernetes.

Want to change the size of the server? Just change the definition in the custom object, the operator will take care of the change.

Want to provide separate databases inside that server for your services, so you have one server per team with separate databases per service? Just create some custom objects of type PostgreSQLDatabase. You can take this pattern as far as needed for other platform services like Kafka topics, object storage buckets, access to MQTT brokers or anything else you can think of. Instead of writing some Terraform code for the database and custom scripts to orchestrate Kafka, you can do it all using Kubernetes YAMLs. All the provisioning and management logic is implemented only once – in the operator – and the users of your platform just write some simple YAML files instead of having to learn and use a bunch of different infrastructure languages and APIs.
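Spelled out as manifests, the two resources from this example could look roughly like this. The API group, kinds and field names are entirely hypothetical here; such a schema would be defined by your own platform operator:

```yaml
apiVersion: platform.example.com/v1
kind: PostgreSQLServer
metadata:
  name: team-a
spec:
  size: medium          # abstract size the operator maps to provider offerings
  version: "15"
---
apiVersion: platform.example.com/v1
kind: PostgreSQLDatabase
metadata:
  name: orders
spec:
  server: team-a        # reference to the PostgreSQLServer above
  owner: orders-service
```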

Kubernetes YAML: Advantages of having a unified API

Having a unified API is just one of the benefits of this approach. Doing everything with Kubernetes YAMLs also means you can treat it the same as your “normal” Kubernetes deployments, so you can deploy it using the same mechanisms and workflows, e.g. using Helm or FluxCD. If you already use GitOps-style automation for your deployments, managing platform services via custom resources and operators will fit right in.

One other advantage of the unified API is that you get abstractions. Let’s say your platform runs on different cloud providers and maybe even on-premise, but you want to use managed offerings of the cloud providers (e.g. managed Postgres) as much as possible. That means for each cloud the way to provision a managed service is different and depends on the provider’s custom APIs. Using Kubernetes custom resources as a unified API means you can abstract away most of the differences between the different environments and provisioning approaches and present the user with the same interface regardless of where they want to deploy.
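Inside the operator, this abstraction usually boils down to a common provisioning interface with one backend per environment. A sketch of the idea in Python – the provider classes and their return values are stand-ins, not real cloud SDK calls:

```python
# Sketch: one custom resource, different provisioning backends per environment.
# The providers and their "APIs" are illustrative stand-ins.
from abc import ABC, abstractmethod

class PostgresProvisioner(ABC):
    @abstractmethod
    def provision(self, name: str, size: str) -> str: ...

class AzureProvisioner(PostgresProvisioner):
    def provision(self, name: str, size: str) -> str:
        # a real operator would call the cloud provider's SDK here
        return f"azure-managed-postgres/{name}-{size}"

class OnPremProvisioner(PostgresProvisioner):
    def provision(self, name: str, size: str) -> str:
        # a real operator would deploy a PostgreSQL statefulset here
        return f"statefulset/{name}"

PROVISIONERS: dict[str, PostgresProvisioner] = {
    "azure": AzureProvisioner(),
    "onprem": OnPremProvisioner(),
}

def reconcile_server(environment: str, name: str, size: str) -> str:
    """Same PostgreSQLServer object, different backend per environment."""
    return PROVISIONERS[environment].provision(name, size)

print(reconcile_server("azure", "team-a", "medium"))
print(reconcile_server("onprem", "team-a", "medium"))
```

The user-facing custom resource stays identical; only the dispatch table changes per environment.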

Because in the end, to use the PostgreSQL example again, the user only wants a PostgreSQL database and doesn’t care about the work needed to provide it or what is behind it, as long as it behaves like a PostgreSQL server. In reality some distinctions will remain, e.g. when a feature supported by one cloud platform is not available on-premise or on another cloud platform. In such a case the operator can, depending on the situation, either ignore the configuration or replicate the feature using (incomplete) alternatives.

Languages and frameworks for developing operators

Kubernetes operators can in theory be implemented in any language that has a Kubernetes client library – or you could write one yourself if there isn’t. But in practice you want to use a framework that abstracts away all the tedious plumbing and allows you to focus on your operator logic. At MaibornWolff we have been developing operators for customers in different languages and with different frameworks for some time. From our subjective experience, three languages and their frameworks lend themselves to developing operators:

Go: Prominent with large community

When starting a new operator, Go is at the top of the list of candidates. As Kubernetes itself is written in Go, there is a lot of official tooling for integrating with Kubernetes. Go has also become the most prominent language for cloud-native tools like Kubernetes, Terraform and Prometheus. This also means many people use it, so it is easy to find documentation, get help from the community or find a third-party library that provides convenient extra functionality. The basis of operator development with Go is the controller-runtime from the Kubernetes API Machinery SIG. It provides all the basic building blocks for developing operators and can be used with several frameworks: the best known are kubebuilder and operator-sdk. While kubebuilder is developed by the same people as controller-runtime, operator-sdk is developed mostly by Red Hat.

Rust: For safe and secure operators

Rust is a language that focuses on performance and safety and is often used for systems or embedded programming. Nonetheless, there is a very actively developed Kubernetes library called kube-rs that also has controller support, partly modeled after the controller-runtime. The combination of Rust and kube-rs is not as widely used as Go and controller-runtime and therefore has less supporting tooling and a smaller community to turn to for help. But if Rust is already used in your organization or planned to be, it is a very capable alternative that – thanks to Rust’s safety guarantees, such as the borrow checker – makes it easy to develop safe and secure operators. Depending on the environment and compliance context the operator will run in, this can be an important factor.

Getting started quickly with Python

Python and the kopf framework provide more abstractions and helpers than the other frameworks, making it easier to get started and to develop simple operators. This makes it particularly useful for use cases where the Kubernetes interaction is fairly simple, e.g. reacting to changes in objects and updating some status fields, while the bulk of the code and logic deals with systems outside of Kubernetes that are managed through their own API. As Python is often already used for infrastructure scripting and automation, there is a plethora of libraries for any system you can think of, so if you are coming from that direction it is a good choice. For example, AWS, Azure and GCP all have official Python SDKs.

Go, Rust or Python: Which one to use?

From our personal experience of writing operators, Go with kubebuilder and controller-runtime is the most widely used combination. But writing Go can be tedious, as the language is not known for concise code, and its limited feature set can produce interesting runtime bugs. One example: at the time of writing, generics are just being introduced with Go 1.18, meaning many libraries still use interface{} and do type checking at runtime. For fans of Rust, writing operators with kube-rs is a joy. That joy can be clouded by the fact that libraries for interfacing with external systems are not as numerous as in other languages. Python and kopf make it really easy to get started and to get results, so they are perfect for small operators, but when writing safe code the dynamic typing of Python can be a pain.

To summarize:

  1. Go and kubebuilder or operator-sdk are a good default for writing operators if you already know and like Go or are ok with learning it.
  2. Rust and kube-rs have their niche, for example where Rust is already established or safety and compliance requirements are strict.
  3. Python and kopf should be the first choice in situations where Python is already established in your organization, be it for automation tasks or as a general programming language.

Operators written by and with MaibornWolff

For inspiration you can check out some of the operators written by and with MaibornWolff.

About the author

Sebastian Wöhrl

Senior Lead IT Architect

Sebastian has been working at MaibornWolff since 2015 and, as an IT architect, designs and develops platforms for (Industrial) IoT Usecases, mostly based on Kubernetes, with a focus on DevOps and data processing pipelines. As a technical expert, he not only implements customized self-built solutions with his favorite languages Python and Rust, but also plays a leading role in various open source projects, mostly initiated by MaibornWolff, which he uses again in everyday projects for the benefit of customers.