What Do Developers Need to Know About Kubernetes, Anyway?

Overwhelmed by Kubernetes challenges? OpenTelemetry provides crucial insights into your applications and cluster health.

Austin Parker Director of Open Source, Honeycomb

November 2, 2023

Austin Parker, director of open source at Honeycomb, shares insights on how developers can tackle Kubernetes challenges using OpenTelemetry. Discover the key to unlocking actionable telemetry data.

Stop me if you’ve heard this: you just pushed and deployed your latest change to production, and it’s rolling out to your Kubernetes cluster. You sip your coffee as you wrap up some documentation when a ping in the ops channel catches your eye: a sales engineer is complaining that the demo environment is slow. It’s probably nothing to worry about; it’s not like your changes had anything to do with that… but minutes later, more alerts start to fire off.

You pull up a dashboard, and it’s a Christmas tree of red and green indicators flashing at you. What’s the problem? Unavailable replicas, unknown pods, unsuccessful jobs—it’s a lot to take in, and the clamors of the sales engineers are picking up because they’re going to be demoing in half an hour.

Kubernetes offers developers an appealing story—it can do most of the heavy lifting of running a distributed application. The nice thing about that story is that it’s true! Vertical or horizontal autoscaling, setting and enforcing resource limits, or even basic workload scheduling used to be extremely challenging to accomplish on your own, requiring specialized tooling or cost-intensive IT infrastructure.

Too often, though, developers aren’t given the right tools to understand what’s happening in a cluster or how it impacts their applications and code. Kubernetes monitoring focuses on low-level node or pod metrics with a healthy dash of time-consuming log searches. It’s challenging to discover correlations between changes in cluster health and application behavior. It’s even harder to go the other way and understand how application behavior influences cluster health.

Running With Kubernetes vs. Running on Kubernetes

Much of this tooling gap can be distilled down to how Kubernetes is managed and offered by platform teams. Kubernetes itself is an increasingly popular deployment target (96% of organizations surveyed were using or evaluating Kubernetes as of 2022), but we don’t see a lot of developers building applications that specifically leverage Kubernetes APIs.

This isn’t a bad thing at all, though! Kubernetes is an abstraction layer over compute, storage, memory, and networking. You don’t necessarily need to build applications that hook into the API to get value. However, this is where the pain begins for developers; even if you aren’t building operators or using Kubernetes-native frameworks (like Quarkus), you’re going to rely on the underlying machinery of Kubernetes to handle things like service discovery, routing, storage, resource limits, scaling, and more.

In either case, you have a problem. Kubernetes itself can influence your application health, and your application can influence the cluster state, but the telemetry you need to correlate and diagnose these problems is often disjoint.

Consider a relatively uncomplicated service running on Kubernetes. Changes to your load profile can harm other pods scheduled on your node. New deployments can lead to load spikes on stateful services, like databases, especially as you change queries and add features. Trying to track down intermittent bugs across thousands of pods is frustrating. These challenges only multiply when you start to make deeper integrations into the Kubernetes API, for instance, if your service starts new jobs or otherwise modifies cluster resources.

Impedance Mismatches

There are two main challenges that you need to tackle when deciding how to understand your Kubernetes applications as a developer. The first is getting the right telemetry data, at the right resolution, in the right place, so that you can ask questions about it. The second is filtering out the less critical data so you can focus on the things that provide the most value.

These are challenges that existing tools struggle to address. For example, it’s common to use tools like Prometheus and the kube-state-metrics service to turn object-level information into metrics data. Unfortunately, this data tends to be very high cardinality over time: attributes on a single measurement, like k8s.pod.ready, will change frequently as pods move through their lifecycle. You can end up in a state where you might know how many pods are failing but not which ones. Worse, the entire series might not be exported at all. Events like secret or service creation, which can help explain why pods have incorrect or missing configuration values or aren’t accessible, are often dropped.
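To make the cardinality problem concrete, here is a small Python sketch. It is an illustration under stated assumptions, not a description of how Prometheus or kube-state-metrics store data: the `simulate_rollouts` helper, the `checkout` deployment, and the pod naming scheme are all hypothetical. It counts how many distinct series a per-pod k8s.pod.ready measurement produces across a few rollouts, then shows what is lost when the pod attribute is aggregated away.

```python
from collections import Counter


def series_key(metric, attrs):
    """A series is identified by its metric name plus its full attribute set."""
    return (metric, tuple(sorted(attrs.items())))


def simulate_rollouts(deployment, replicas, rollouts):
    """Hypothetical data: each rollout replaces every pod with a newly named one."""
    samples = []
    for generation in range(rollouts):
        for i in range(replicas):
            pod = f"{deployment}-{generation}-{i}"
            # Assumption for the demo: the first pod of each rollout is not ready.
            ready = "false" if i == 0 else "true"
            attrs = {"deployment": deployment, "pod": pod, "ready": ready}
            samples.append(("k8s.pod.ready", attrs))
    return samples


samples = simulate_rollouts("checkout", replicas=3, rollouts=4)

# Per-pod view: every pod that ever existed becomes its own series.
per_pod_series = {series_key(metric, attrs) for metric, attrs in samples}

# Aggregated view: drop the pod attribute and just count by readiness.
aggregated = Counter((attrs["deployment"], attrs["ready"]) for _, attrs in samples)

print(len(per_pod_series), "per-pod series")
print(len(aggregated), "aggregated series:", dict(aggregated))
```

With three replicas and four rollouts, the per-pod view keeps growing with every pod that has ever existed, while the aggregated view stays at two series but can no longer say which pods were not ready.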

The fundamental problem isn’t just that “it’s hard to get the right data out,” though. The people who are most often responsible for collecting and managing telemetry aren’t the people who need to use it to understand their systems. This sets you up to fail not just on a technical level but on a very human one.

What Do Developers Need?

I think there are three main things that developers need to understand Kubernetes-based applications:

Opinionated and optimized telemetry about Kubernetes events and objects.
A stream of highly annotated and contextually relevant application telemetry.
Analysis tools that can quickly identify hotspots and places to start looking for problems, and that can also assist in deep dives into the system.

OpenTelemetry is the answer for all three of these points. Out of the box, the OpenTelemetry Collector can capture a wealth of data about the health of a Kubernetes cluster and its workloads. You can then transform that data using the Collector to re-aggregate the events, reduce the number of metrics emitted, transform their attributes, and more.
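The re-aggregation idea can be sketched in plain Python. This is not Collector configuration and does not use any Collector API; it only illustrates the kind of transformation described above. The event dictionaries, their field names, and the `reaggregate` helper are made up for the example. Many raw events are collapsed into one record per namespace, kind, and reason, while the names of the affected objects are kept so you can still tell which ones are in trouble.

```python
from collections import defaultdict


def reaggregate(events):
    """Collapse raw events into one summary per (namespace, kind, reason)."""
    groups = defaultdict(lambda: {"count": 0, "objects": set()})
    for event in events:
        key = (event["namespace"], event["kind"], event["reason"])
        groups[key]["count"] += 1
        groups[key]["objects"].add(event["name"])
    # Sort the object names so the output is stable and easy to read.
    return {
        key: {"count": g["count"], "objects": sorted(g["objects"])}
        for key, g in groups.items()
    }


# Hypothetical sample events, shaped loosely like Kubernetes event records.
events = [
    {"namespace": "demo", "kind": "Pod", "name": "web-1", "reason": "BackOff"},
    {"namespace": "demo", "kind": "Pod", "name": "web-2", "reason": "BackOff"},
    {"namespace": "demo", "kind": "Pod", "name": "web-1", "reason": "BackOff"},
    {"namespace": "demo", "kind": "Job", "name": "migrate", "reason": "BackoffLimitExceeded"},
]

summary = reaggregate(events)
for key, value in summary.items():
    print(key, value)
```

Four raw events become two summary records, which is the trade you usually want: fewer data points emitted, without losing the identity of the pods or jobs behind them.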

OpenTelemetry also allows you to easily correlate and enhance application and service telemetry with essential Kubernetes metadata using the processors available to the Collector. You can ensure that your traces, metrics, and logs are all annotated with accurate and consistent attributes for later correlation.
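As a rough sketch of what consistent annotation means, the Python below attaches the same Kubernetes metadata to a trace span, a metric, and a log record. It is an assumption-laden illustration, not the Collector's implementation: the environment variable names (`POD_NAME`, `POD_NAMESPACE`, `NODE_NAME`) are hypothetical names you might expose to a container, the record dictionaries are invented, and only the attribute keys (`k8s.pod.name`, `k8s.namespace.name`, `k8s.node.name`) follow OpenTelemetry's naming conventions.

```python
def k8s_resource_attributes(env):
    """Map hypothetical env vars to OpenTelemetry-style Kubernetes attribute names."""
    mapping = {
        "POD_NAME": "k8s.pod.name",
        "POD_NAMESPACE": "k8s.namespace.name",
        "NODE_NAME": "k8s.node.name",
    }
    return {attr: env[var] for var, attr in mapping.items() if var in env}


def annotate(record, resource_attrs):
    """Return a copy of the record with the Kubernetes attributes merged in."""
    enriched = dict(record)
    enriched["attributes"] = {**record.get("attributes", {}), **resource_attrs}
    return enriched


# In a real container this would come from os.environ; a dict keeps the sketch testable.
env = {"POD_NAME": "checkout-7d9f-abcde", "POD_NAMESPACE": "demo", "NODE_NAME": "node-3"}
attrs = k8s_resource_attributes(env)

span = annotate(
    {"signal": "trace", "name": "GET /cart", "attributes": {"http.response.status_code": 500}},
    attrs,
)
metric = annotate({"signal": "metric", "name": "request.duration"}, attrs)
log = annotate({"signal": "log", "body": "upstream timeout"}, attrs)

for record in (span, metric, log):
    print(record["signal"], record["attributes"]["k8s.pod.name"])
```

Because all three signals carry identical pod, namespace, and node attributes, a slow trace can later be lined up with the metrics and logs from the same pod without guessing.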

Once you’ve got this data, what do you do with it? OpenTelemetry to the rescue again! Almost every commercial and open-source monitoring and observability tool supports OpenTelemetry data. Rather than leaving you to face vendor lock-in and mounting monitoring bills, OpenTelemetry allows you to customize your observability pipelines at a deep level. Use the Collector to split out high-priority customer-facing telemetry into near-real-time analysis and alerting tools while sending everything to cheap and efficient blob storage.
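The routing rule in that last sentence can be expressed as a few lines of Python. This is only a sketch of the decision logic, not Collector configuration: the `customer_facing` and `priority` attributes, the destination names, and the `destinations` helper are all hypothetical.

```python
def destinations(record):
    """Everything goes to cheap storage; high-priority customer-facing data also goes to real-time tools."""
    targets = ["blob_storage"]
    attrs = record.get("attributes", {})
    if attrs.get("customer_facing") and attrs.get("priority") == "high":
        targets.append("realtime_analysis")
    return targets


# Hypothetical records with made-up attribute names.
checkout_span = {"name": "POST /checkout", "attributes": {"customer_facing": True, "priority": "high"}}
batch_log = {"body": "nightly report finished", "attributes": {"customer_facing": False, "priority": "low"}}

print(destinations(checkout_span))
print(destinations(batch_log))
```

The point of the design is that nothing is thrown away: the expensive near-real-time path sees only the subset that justifies the cost, and the full stream is still available in storage when you need a deep dive.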

What’s Next?

If you’re feeling overwhelmed by Kubernetes, you’re not alone. I’ve spoken to hundreds of developers who feel frustrated and stymied by mismatches between what they’re responsible for and what they can control. While OpenTelemetry is the first step in creating and collecting actionable telemetry data, it’s just that—a step. You still need to analyze that data, and you need to develop a practice for using it.

Austin Parker is the Director of Open Source at Honeycomb, a long-time contributor and Governance Committee member for the OpenTelemetry project, and author of several books including Distributed Tracing in Practice and Learning OpenTelemetry.