
You built observability… But are you actually using it?

Everyone says they have observability—but are they actually using it?

In this episode, Pinja and Stefan unpack the gap between collecting data and creating real value. From DevEx and faster debugging to experimentation and business insights, observability only works when it’s treated as a product—not just a dashboard.

If your observability stack feels more like noise than insight, this one’s for you.

[Pinja] (0:02 - 0:21)

We recommend that organizations look into what they are actually doing with their observability data. Welcome to the DevOps Sauna, the podcast where we deep dive into the world of DevOps, platform engineering, security, and more as we explore the future of development.

 

[Stefan] (0:22 - 0:31)

Join us as we dive into the heart of DevOps, one story at a time. Whether you're a seasoned practitioner or only starting your DevOps journey, we're happy to welcome you into the DevOps Sauna.

 

[Pinja] (0:37 - 0:44)

Hello and welcome back to the DevOps Sauna. I am joined by my co-host Stefan. Hey, how are you doing, Stefan?

 

[Stefan] (0:44 - 0:48)

All good. It's a nice day. Spring is almost here.

 

[Pinja] (0:49 - 0:56)

Somebody said to me this morning, hey, please acknowledge that there are only two official days of winter left.

 

[Stefan] (0:56 - 1:07)

Nice. We actually have positive degrees here in Denmark. All of our snow is completely gone.

 

It was so pretty, a thick layer of snow, frost, everything, and now it's just whatever's left is wet and slushy.

 

[Pinja] (1:08 - 1:14)

We're having almost a blizzard today here in Helsinki, so I think we're getting like 15 more centimeters of snow just today.

 

[Stefan] (1:16 - 1:21)

It's almost like the Winter Olympics the last few days. They were just overrun by snow.

 

[Pinja] (1:21 - 1:27)

Yes. To be fair, I'm not going to be digging out my shorts anytime soon. Let's see.

 

[Stefan] (1:28 - 1:32)

I bet you have some colleagues who are running around in shorts. Finnish people prefer their shorts.

 

[Pinja] (1:33 - 1:57)

We do. We do. We do.

 

Especially people who actually look at the calendar, not the weather when they wear their shorts and summer gear. But hey, from weather observations to observability, because that's our topic today. Somebody might have heard the nice phrase “Well, it looks good to me”.

 

If you want to see a funny picture, by the way, just Google "looks good to me" memes. You'll get a good photo of Steve Buscemi saying it looks good to me.

 

[Stefan] (1:57 - 1:59)

If it's not Steve Buscemi, it's not the right photo.

 

[Pinja] (2:00 - 2:20)

No, exactly. Look for that. But that's pretty much the punch line of observability, unfortunately, even though the meme is still really funny.

 

But many organizations still say that they have observability, and the question is, do they really? Or is it just a fancy dashboard? The data is collected, but is it being used?

 

[Stefan] (2:20 - 3:07)

It's the inevitable discussion whenever we talk about a term or anything. It's like you say you have it, but do you really have it? It's like me hearing people saying like, oh, we've already got a great developer platform.

 

You can request an AWS account and you can do whatever you like. Well, that's just giving you an account that's not made for your purpose. It's sort of the same thing when you go into observability and people are like, oh, we have observability, we have insights into everything.

 

Show me your logs. Are they over here? Show me your metrics.

 

Is it somewhere in IT? Do you have traces? What's that?

 

So we need to be honest and open, and we need a lot of honesty and openness when we talk about observability, because it is looking into the engine, into the health of the engine of your tech stack. Sometimes the softer side as well.

 

[Pinja] (3:07 - 3:29)

And it's not just the collected metrics, as you say, the logs, the metrics, the traces. Is there proper tagging and labeling in place, the ability to correlate? So we could almost talk about features here, to be honest. There's also a more cultural aspect in an organization, because we also break down the silos so that there is no more secrecy around it.

 

[Stefan] (3:30 - 4:13)

And sometimes you see individual teams spinning up some sort of infrastructure for hosting their logs. And well, Elasticsearch has been around for ages and people throw their logs in Elasticsearch, but when they're there, they're super hard to correlate with your metrics. And if you sort of take the next step and have traces where you can see the full flow of your applications, you might have somewhere where something is missing in your trace because you don't transport the correlation ID around to all of your services.

 

So it needs some really, really good focus and a proper structure to do this well. There are so many pitfalls in observability. Take sampling, where people only sample 1% of their data.

 

It's like, well, if you sample 1%, what about the other 99%? Are they failing?
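As a rough illustration of that sampling trade-off, here is a hypothetical sketch of deterministic head-based sampling: hashing the trace ID into a bucket means every service in a call chain makes the same keep/drop decision for a given trace, but roughly 99% of traces, failures included, are simply never stored. The function name and the 1% rate are illustrative, not from any specific vendor.

```python
import hashlib

SAMPLE_RATE = 0.01  # keep roughly 1% of traces

def should_sample(trace_id: str, rate: float = SAMPLE_RATE) -> bool:
    """Deterministic head-based sampling: hash the trace ID into
    [0, 1) so every service in the call chain makes the same
    keep/drop decision for a given trace."""
    digest = hashlib.sha256(trace_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate

# Roughly 1% of traces survive; the other ~99% are never stored,
# which is exactly the blind spot described above.
kept = sum(should_sample(f"trace-{i}") for i in range(100_000))
print(kept)
```

The upside of hashing instead of random sampling is consistency: a trace is either fully kept or fully dropped across all services, so you never end up with half a trace.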

 

[Pinja] (4:14 - 4:42)

Yeah. And we could take a thousand different perspectives on observability. But today, I mentioned the features, but we've been talking about a platform as a product.

 

So today, we're going to talk about treating observability as a product: something to improve, something to build, built in as a quality feature of the final product, maybe roadmapped, and maybe even actually contributing to your DevEx.

 

[Stefan] (4:42 - 5:15)

It is a big, big part of good DevEx when you're telling developers to deploy stuff. Some companies give full access to production. We don't really like that.

 

You should have least privilege. You might have read access, or you can request it, but you shouldn't need access to production just to see how things are operating. If you're writing log files to the disk on the server, it's not really a good experience.

 

Make sure they get centralized and you can look them up without having to elevate your privileges or gain access to productions or anything. It should just be collected and easy to look at.

 

[Pinja] (5:16 - 5:45)

I agree. And observability is way too often considered just a tool or a metric in itself. So you have your Snyk, for example, as a tool to help you with this.

 

You have the dashboard, but as a built-in quality feature, it's more like a modern product capability now, I would say. So it's essentially built into your platform, built into the product itself. And there are many ways we can actually think about what good observability is when we look at it as if it were a product.

 

[Stefan] (5:45 - 6:21)

Yeah. I usually refer to it as batteries included when I talk about platform engineering. And observability is one of these features that should be batteries included.

 

So if you use the platform, you should get some defaults out of the box. You should know: how do I actually just push my metrics into this? Is there a collection of metrics already available?

 

Does it collect my logs? What about the rest? What do I need to hook into?

 

Do I even need to hook into anything or is it just done by magic? Magic is always a bit scary because sometimes it's a bit too magic. But if you can explain what the magic does, it's good.

 

[Pinja] (6:21 - 6:27)

I don't know what is better or more dangerous. Is it magic or is it hope when it comes to stuff like this?

 

[Stefan] (6:27 - 6:29)

We never say hope, Pinja. You know that.

 

[Pinja] (6:31 - 7:17)

No, we have abandoned hope in these cases for a reason. We want reliability, and sometimes hope does not take us there. But let's really look into what it means.

 

When we talked previously about what it means that a platform is a product, we were talking about the customers. We were talking about the users. There are a couple elements here, at least how I see this.

 

You have your customers internally. So looking at it from the DevEx perspective. But then you have your final products, final users.

 

Actually looking at the final product, it might be a SaaS, it might be something else, but it's a built-in feature. You want your product to be reliable. So look into what your customer profiles are.

 

What is needed from their perspective? Historically, this has been seen as a thing for operations.

 

[Stefan] (7:18 - 8:16)

I guess observability landed on the infrastructure or operations table because they needed to sort of see if things were up and running. But observability is way more than uptime these days. You need to figure out, is everything running healthy?

 

If it's not healthy, where should I go and look for trouble? When I find the place to go looking for trouble, I can grab my logs and figure out what went wrong. But at the same time, we can put way more things into observability these days.

 

If you have a good setup in your company and areas or teams are responsible for profit and loss, you can maybe grab your billing data from your cloud vendor and put that into your metrics as well. So you can actually see: we deployed this thing and it cost us X, Y, and Z. Is it actually profitable to run this thing?

 

So it's way more than just giving access to the developers. It's a super nice, powerful tool for developers, but it's a superpower when you go further out into the business.

 

[Pinja] (8:17 - 9:04)

And that's one of the reasons why businesses should be actually thinking about this and minding this as a business of their own. And they should have the access, definitely. And we can think about this as, you have the product managers for the final product.

 

You have somebody who should be considering this as a feature. Are we building this in when we improve our final product for the final users? The user might be a heavy user or a super user from another company, or it might be the actual end customer.

 

But considering also this, there should be an owner from the internal side as well, something like a product manager for this as well, thinking about who's owning it. How do we improve the observability from inside the organization?

 

[Stefan] (9:05 - 10:10)

And what features do we offer from the perspective of observability? Because we might not offer insights into the monetary side of things in the beginning. We might not offer a collection of end-user interactivity, like monitoring end users to see our experiments.

 

Are they doing well? We might not offer that in the beginning. It's usually a step where we have some metrics, we have some logs, and we want to expand on that.

 

And we need to talk about architecture as well, because if you don't have anyone owning it, you can be pretty damn sure people are going to do whatever they like with it. And that will lead to a ton of problems when we talk about observability. If you ever meet somebody who works with observability, just say the word cardinality around them, and you'll see them squint their eyes, because that's where things get expensive and stop performing well.

 

So you need that structure, as you said earlier, like good tagging, good labeling, good practices. How do we do X, Y, and Z? If we do that well, we can sort of build on top.

 

And that's where we come back to the roadmap, like what do we actually need in this corporation? What can we actually support here? And we could talk about a journey of maturity from that perspective.
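To make the cardinality point concrete, here is a hypothetical back-of-the-envelope sketch, not from the episode: every distinct combination of label values becomes its own time series, so a single high-cardinality label (say, a user ID) multiplies the cost of a metric.

```python
def series_count(label_values: dict[str, int]) -> int:
    """Number of time series one metric produces, given how many
    distinct values each label can take: the product of cardinalities."""
    total = 1
    for cardinality in label_values.values():
        total *= cardinality
    return total

# Low-cardinality labels keep the metric cheap and queryable.
print(series_count({"service": 20, "status_code": 8}))  # 160 series

# Add a per-user label and the very same metric explodes into
# 160 million series -- the squint-inducing kind of cardinality.
print(series_count({"service": 20, "status_code": 8, "user_id": 1_000_000}))
```

This is why good tagging and labeling conventions usually forbid unbounded values (user IDs, request IDs) as metric labels and push them into logs and traces instead.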

 

[Pinja] (10:10 - 10:41)

Exactly, because we need to consider the cost of running the service, right? So building that into the business model, how much does our product cost to make, to run, to maybe sunset at some point? Because when done well, the teams and areas are responsible for the profit and loss related to this, so they need those insights.

 

And maybe actually these are again coming back to the internal users and internal customers of the observability data after all.

 

[Stefan] (10:41 - 11:09)

Yeah, definitely. And we need to do this good map of personas that should be using it. And when we start talking to the product people, they quite quickly come around, like, it sounds like something we could do with our end users as well.

 

Can we collect everything here so we can build the full picture view of everything here? I think that's where it gets super important. Talking to marketing, they would probably love to get insights as well.

 

So just so many perspectives and so many directions we can go here.

 

[Pinja] (11:09 - 11:24)

And you mentioned the maturity journey as well. One of my favorite things to talk about with organizations is roadmapping. Somebody from a customer's side once said to me, well, we don't feel like doing roadmaps because they're going to change anyway.

 

And I was like, aha, that's the thing.

 

[Stefan] (11:24 - 11:25)

Famous last words.

 

[Pinja] (11:26 - 11:56)

They need to change. And the same here. When you look at the observability's maturity journey within your organization, of course, there are new things you need to take into consideration.

 

You might try things out that are not working. And with the roadmap, you need to look at what is going to be prioritized next. Well, do you need a separate product manager?

 

You need somebody to own it. Every capability needs to be handled, and in that sense it needs to be treated as a product.

 

Otherwise, it's just going to run loose.

 

[Stefan] (11:57 - 12:58)

Yeah, it's the good old product management story, the same for almost everything in this world. If you set a strict, fixed roadmap, you are going to fail sooner or later. Let's say a new piece of legislation comes out of the EU.

 

Now we need to store our logs for, let's say, six months instead of 30 days, or whatever. Well, you can't deny legal stuff that comes your way. Well, you can, but you're probably going to get a fine, if that's how the regulations are built up.

 

But you need to be open, adapt to whatever changes, and reprioritize. Every employee has been sitting with a task when, all of a sudden, somebody comes in and says, I need this. All right, tell me what is the most important bit right now, and what shouldn't I do while I do this?

 

All right. If we can have that open discussion in a psychologically safe environment, it's good. Then we can have the discussion: should we do this?

 

Yes, no. And we might pull in some person who actually knows profit and loss and ask: if we want to do this, will it actually be beneficial to us, or will it cost us? Well, go back to your observability stack and see what you are earning on this.

 

[Pinja] (12:58 - 13:31)

Yeah, so the good old trade-offs and prioritization. Isn't it nice? So the same principles in a little different setup, because we're talking about the technical capability of a product that is not fully visible to the final users here.

 

So we're not talking about something that is, let's say, a website takes X amount of seconds to load, or it gives me this many results, for example. But this is because this is working in the background. But in a similar way, prioritization and trade-offs need to be taken into consideration.

 

And you're going to have some stakeholders, like with your final product.

 

[Stefan] (13:31 - 13:39)

Stakeholders? What's that? We don't use stakeholders.

 

It's so last year. Yeah, we just imagine some things and build it, and then everybody is happy.

 

[Pinja] (13:40 - 13:41)

We build it in a silo.

 

[Stefan] (13:41 - 13:42)

Yeah, of course, yeah.

 

[Pinja] (13:42 - 14:03)

Silos are the new thing, right? Yes, of course, we go back to those. Good.

 

And to be serious, just to be clear, we're kind of joking there. But those measurable outcomes need to be set out. What do we actually want from our observability?

 

And that actually means that you might need a vision and a strategy, like your final product.

 

[Stefan] (14:03 - 14:34)

And remember, there are more stakeholders than you actually think when you talk about observability. It's not only operations, developers, and product. Then GRC comes around.

 

We need to run this control. We need to see how many people were logged into our system on this day, or whatever you can imagine in a GRC setup. Like, oh, we didn't think about that.

 

We don't collect that data. Well, now you have to, and we have an issue because we can't prove that we were in control of this. So if you're highly regulated, you will for sure have a lot of observability data.

 

[Pinja] (14:35 - 14:39)

And I think it's always the case when you actually try to map your stakeholders: there are more than you think.

 

[Stefan] (14:39 - 14:40)

There usually is.

 

[Pinja] (14:41 - 15:16)

Not everybody needs to have a say in all the things. So a good old RACI map, for example: who is responsible, who is accountable, who needs to be consulted, and who needs to be informed. But if we think about the DevEx perspective for a moment: good observability from a DevEx perspective should at least reduce friction and ease up the work.

 

We don't want anything extra landing on our developers. So that's one of the things. But reducing the time it takes to resolve an issue is also one of the key things, at least from my perspective.

 

[Stefan] (15:17 - 16:31)

And it actually comes down to one of the basic needs of modern observability: being able to correlate. If you need to correlate your logs, metrics, and traces by hand, it's not good DevEx. You look up some things in the system, you've manually pushed something to the side to figure out: that was user ID 5 over here, but in this system it's 11.

 

So now we need a mapping table in between, and that's not a good experience. And we need to have a good experience here, because if we have a good developer experience side of observability, it is a superpower for them.

 

Because you reduce the time to resolve an issue, it's easier to find the root cause. When you know that you can actually do that, you feel more comfortable in deploying. And having confidence in your setup, that is so important.

 

It comes back to reliability again. If you're confident that you can do a fast turnaround and debug things fast, well, you might not exceed your error budget if you're running SLIs, SLOs, SLAs, and error budgets. Hardly anyone does that in reality, but they like to talk about it. But when you talk about an error budget and you're close to it, you need all of the insights to figure out: do I actually dare to do this?
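The correlation problem described here can be sketched minimally: if every service stamps its logs with the same correlation ID and forwards that ID downstream, no hand-built mapping table is needed. The header name, logger, and function below are illustrative assumptions, not any specific vendor's API.

```python
import logging
import uuid

class CorrelationFilter(logging.Filter):
    """Stamp every log record with the current correlation ID so logs
    from different services can be joined without a mapping table."""
    def __init__(self, correlation_id: str):
        super().__init__()
        self.correlation_id = correlation_id

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = self.correlation_id
        return True  # never drop the record, only annotate it

def handle_request(incoming_headers: dict) -> dict:
    # Reuse the caller's ID if present; otherwise start a new one.
    cid = incoming_headers.get("X-Correlation-ID", str(uuid.uuid4()))
    logger = logging.getLogger("checkout")
    logger.addFilter(CorrelationFilter(cid))
    logger.info("processing request")
    # Forward the same ID on every downstream service call.
    return {"X-Correlation-ID": cid}

logging.basicConfig(
    level=logging.INFO,
    format="%(correlation_id)s %(levelname)s %(message)s",
)
outgoing = handle_request({"X-Correlation-ID": "abc-123"})
print(outgoing["X-Correlation-ID"])  # abc-123
```

In practice the W3C Trace Context `traceparent` header and OpenTelemetry's context propagation standardize exactly this hand-off, so you don't have to invent your own header.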

 

[Pinja] (16:32 - 16:58)

And if I were a product manager in an organization, that would be something I would be very, very interested in: that the developers have that sense of security, that they feel the confidence to deploy, because otherwise we're not getting anything done. The other thing I would be very interested in seeing is that doing observability well would actually enable faster and also safer experiments.

 

[Stefan] (16:58 - 17:44)

Oh, yes. We're not guaranteed that things are working when we're experimenting, but at least, like we said, you can find the root cause, you can spot if there's any mischief going on, if people have found some odd way to do weird things with your experiment, or even exploit it. You don't want that.

 

You want to be able to gain the insight and be faster to either shut down the experiment or ramp it up or whatever you do. It's tightly connected to feature flags as well, because you might be able to say, we want this experiment to run with 5% of our users. And if observability looks good, then just slowly ramp it up.

 

If something goes bad, ramp it down to 0.1% or even turn it off. You can actually automate your rollout strategy for your experiments if it's in that manner.
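A minimal sketch of the automated rollout strategy described above, assuming a hypothetical error-rate signal coming from your observability stack; the step sizes and the threshold are made up for illustration.

```python
# Hypothetical rollout ladder and error budget -- tune to your product.
ROLLOUT_STEPS = [0.1, 1.0, 5.0, 25.0, 100.0]  # percent of users
ERROR_BUDGET = 0.02  # back off if more than 2% of requests fail

def next_rollout(current_pct: float, error_rate: float) -> float:
    """Ramp the experiment up while observability looks healthy,
    and drop straight back to the smallest step if it doesn't."""
    if error_rate > ERROR_BUDGET:
        return ROLLOUT_STEPS[0]  # ramp down hard (or turn it off)
    for step in ROLLOUT_STEPS:
        if step > current_pct:
            return step  # healthy: take the next step up
    return current_pct  # already fully rolled out

print(next_rollout(5.0, error_rate=0.001))  # 25.0 -> ramp up
print(next_rollout(5.0, error_rate=0.10))   # 0.1  -> ramp down
```

Wired to a feature flag service and a metrics query, a loop like this is the "automate your rollout strategy" idea: observability data drives the flag, not a human watching a dashboard.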

 

[Pinja] (17:44 - 17:57)

It's like if you really want to do a feature flag kind of thing, or do some A/B testing with a beta tester group, for example, and actually test out some of the features, it's safer to do when your observability is in good shape.

 

[Stefan] (17:57 - 18:53)

Oh, yes. And some of the vendors have a concept called RUM, like Real User Monitoring, not the rum that you drink, even though that's funny every now and then. But if you have Real User Monitoring, you can actually see what the user is doing.

 

It sounds super creepy to people, but all of them follow GDPR, cookie consent, everything. But if you gain that insight, you can actually figure out: are there hidden user patterns? Is there something we didn't really think of in what they see?

 

Can we see they're super frustrated? Can we troubleshoot an issue they called us with? There's so many things we can do all of a sudden when everything ties in.

 

And we might be able to see they get an error message on the screen, and we can actually follow it along on the internal side and see what actually went wrong and how it is interconnected here. It's just insanely efficient if you do it well, because you can connect the external user and the internal user and whatever goes in between that.

 

[Pinja] (18:53 - 19:16)

Agree. And when we look at the final product, something that more traditionally has a product manager and some product management principles behind it, there are options for how to do observability. And there has to be some thought behind it. Do we go way A or way B, and is this good enough?

 

[Stefan] (19:16 - 21:06)

And how much are we actually willing to pay for it? Because as soon as you talk about observability, you can see a lot of people sitting back like, oh, we don't want to go there. It's expensive.

 

If you store everything, it's super expensive. Again, you need to think about how you're doing things. You might have a profit and loss strategy saying like, all right, well, you can burn, let's say, a terabyte of logging storage per month.

 

If you're making the money for it, then it's not a problem. You will have a ton of data, but it might be okay. It might connect back to some auto remediation of people trying to brute force your website to the floor like Adidas every day.

 

Well, then it's okay to pay the money for the storage because it saves your website. It's just, there are so many ways to do it. And when it comes to the cost, we also need to look at like, do we go with a vendor?

 

Do we want to build our own? The good old buy versus adopt versus build. I hope you don't build your own.

 

The issue, though, is that the commercial solutions right now are actually quite superior to the open source solutions. The thing that is moving the most in open source is OpenTelemetry. It's the standard we should have had 20 years ago, but now it's finally here.

 

All of the big vendors are sort of okay with it. They're leaning into it. They offer interconnection via OpenTelemetry, which is super good.

 

We've finally reached that. The problem is, when you look at the open source products, well, you might use Elasticsearch, you might use Grafana, you might use whatever else is out there, but they're not good at correlating with each other. So you sort of reduce your developer experience by going the open source way.

 

If you go to the big vendors, everything is magical. I'm not going to name the vendor, but everybody knows them. Their system is super good.

 

It's super expensive, but talk to the developer and ask them if they like what they are using. They definitely do. So you need to be aware of that.

 

[Pinja] (21:07 - 21:29)

Yeah. The options need to be investigated very thoroughly, just as you would with the other features that go into your final product. How you do observability is, I don't want to say, just another feature.

 

It's got its own twists and turns and some potholes to run into. But at the same time, some really basic principles apply here as well.

 

[Stefan] (21:29 - 22:14)

Make sure you choose a provider that is following open standards because then you can actually get away from them. Sometimes people invest heavily into these vendors. They use their own collectors and everything, and you can't get out again, which is sad.

 

Everybody is using, well, not everybody, but a lot of people are using different graphing systems for dashboards. The problem is you're tied into that product. In the open source world, they are trying to come up with a standard called Perses that should alleviate all of this by letting you move your dashboards around.

 

Not everybody has leaned into that, because being able to do even more fantastic dashboards is part of their product. So let's see where it goes, but it's just good to see standards finally coming out, because it's been needed for a long, long time.

 

[Pinja] (22:15 - 22:38)

But to sum up this discussion a bit: we recommend that organizations look into what they are actually doing with their observability data. There are multiple reasons. Why are you collecting it in the first place if you're not using it is, I guess, one.

 

But from today's perspective: treat it as if it were actually a feature in your product, and apply the product management principles.

 

[Stefan] (22:38 - 23:33)

There is an insane amount of observability data running in different businesses that is never queried, not even during incidents. Weed it out, throw it away, follow some good conventions. There are standards and conventions you can get from OpenTelemetry, for example.

 

What should you collect? When should you collect it? If you're using AI, this is how you actually use logs, traces, and metrics for MCPs, for agents.

 

There are some really good conventions out there you can follow. You will actually be wiser just by looking at the specifications. Always keep an eye on what everybody else is doing and what direction the industry is moving in.

 

Why would you go the complete opposite way? Because then you can't leverage the features the vendor is offering you, if you go completely bonkers here. Even in the AI world, you need good observability.

 

You definitely need observability for your AI. It's probably more like that.

 

[Pinja] (23:34 - 24:00)

Don't forget your users. A former colleague of mine who was a product management coach, he was always asking, where's the user here? Where is the user?

 

Once again, look at your users. It might be your internal user. It might be the customer and the end product.

 

Where's the user? Where's the customer? Utilize all this data.

 

With better experimentation or a safer experimentation, you might get a better product in the end.

 

[Stefan] (24:00 - 24:42)

For sure you will. If done well, you sort of remediate the issues you see. You actually get a higher level of reliability.

 

You can't get more, but you can get a higher level. That is a superpower as well. All in all, taking all of this: creating good product management, getting insights into everything, figuring out what the users are doing, feeling safer experimenting, and increasing your reliability.

 

Of course, you need all of the cultural stuff. You need the ways of working on the side, but in the great package, you should probably have some people who care about observability. You should maybe have a team if you're at that scale where you have a team that is focused on observability.

 

It's not easy to do well.

 

[Pinja] (24:42 - 25:06)

It's not easy to do well, but at the same time, keep in mind that with a higher level of reliability, with better experimentation, and just by treating it as a feature, you might actually improve how your end customers experience the quality of your product and your service. Maybe that could be one hook for the business people.

 

[Stefan] (25:06 - 25:10)

In the end, we want to have a higher profit, and we can actually do that.

 

[Pinja] (25:10 - 25:11)

Exactly.

 

[Stefan] (25:11 - 25:12)

It's just that easy.

 

[Pinja] (25:13 - 25:15)

It's as easy as A, B, C.

 

[Stefan] (25:15 - 25:15)

Good.

 

[Pinja] (25:16 - 25:20)

Hey, I think that's all the time we have for this topic today. Stefan, thank you so much for joining me.

 

[Stefan] (25:20 - 25:21)

Thank you, Pinja.

 

[Pinja] (25:21 - 25:31)

Okay, and thanks everybody for joining us in the sauna, and we'll see you next time. We'll now tell you a little bit about who we are.

 

[Stefan] (25:31 - 25:36)

I'm Stefan Poulsen. I work as a solution architect with a focus on DevOps, platform engineering, and AI.

 

[Pinja] (25:36 - 25:41)

I'm Pinja Kujala. I specialize in agile and portfolio management topics at Eficode.

 

[Stefan] (25:41 - 25:43)

Thanks for tuning in. We'll catch you next time.

 

[Pinja] (25:44 - 25:52)

And remember, if you like what you hear, please like, rate, and subscribe on your favorite podcast platform. It means the world to us.

 

 

DevOpsSauna Sessions · Product management · Platform engineering