When we talk about dashboards in software development, we may think that the more accurate, the better. But what if the only data you have is an approximation that gets more uncertain the longer you look at it? Andy Allred has worked on submarines for years, and he discusses what people in software development can learn about navigating and visibility from the deep blue sea.

Andy (00:07):

Instead of checking every single detail and getting our position fix, our area of uncertainty, down to zero on a tool selection, it's much better to choose something which checks most of the boxes, start using it, and find out as you go if it works or not. It's always possible to change the tools, and the more we change, the better we get at it.

Marc (00:34):

Hello and welcome to DevOps Sauna. This is Marc Dillon, I'm your host today. When we talk about dashboards in software development, we may think that the more accurate, the better. But what if there's simply no way of being accurate? What if the only data you have is an approximation that gets more uncertain the longer you look at it? Andy Allred, Lead DevOps Consultant at Eficode, has worked on submarines underwater for years. He joined our podcast to discuss what people in software development can learn about navigating with those shady circles on the charts. Let's tune in.

Marc (01:18):

This is SeaDaddy, number two, navigating and making forward progress. I'm here with my colleague, Andy Allred. 

Andy (01:26):

Good morning, Marc. 

Marc (01:29):

Good morning, Andy. It's great to be here today. One of the challenges that every organization needs to overcome is how to navigate across all the things that are going on, changing priorities, business and markets, and following the tides.  We've had a lot of these really interesting times lately where things change quite suddenly across the world and some organizations react better than others. Remember this agile principle of responding to change over following a plan. How many of you <laughs> are out there doing that today? 

Andy (02:06):

I think there are not a lot of plans being followed recently, because something's changing all the time and just throwing them out the window.

Marc (02:14):

I know, really amazing times that we're living in. But Andy, I think you may have done some things in the past that might provide some insight into navigating and changing plans.

Andy (02:28):

Yes, indeed. So when I was on submarines for 10 years as a submarine sailor, one of the duties was navigation. Navigating at sea is a little bit different than navigating down a highway or with GPS, and navigating underwater is even more complicated than that.

So when you're underwater, you don't have a dashboard to look out of. You don't have any view of where you are or what's going on. You have no landmarks, you have no GPS signal. You really just have the idea of: this is where you were, this is the direction you think you're going, and this is how fast you think you're going. Also, underwater there are currents in the ocean, which push you around.

So there's a principle called set and drift. The idea is that even if you're sitting still, the ocean is going to push you somewhere, and you measure that in what's called set, which is a direction, and drift, which is a speed. You never know what your set and drift is. You can only guess at it until you reevaluate your position and define: this is where you are.

So when you're going through the ocean, you have a compass in front of you, about the size of a dinner plate, with intervals around it for all the different degrees in a circle. And you're told you need to drive this bearing. So let's go two-seven-zero, which is due west: you steer, and you try to stay basically at two-seven-zero, but it's going to be plus or minus, because nobody's perfect.

The equipment isn't always extremely accurate. Of course, we do the best we can. And then there are these currents applying set and drift to you that you don't know. So as we're traveling underwater, we don't say, this is where we are; we instead say, this is our area of uncertainty.

We're somewhere in this circle. And the farther we go, the more we need to expand that area of uncertainty and say, okay, now we're in this small circle. Now we're in a bigger circle, and we continue like that.
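To make the dead-reckoning idea concrete, here's a minimal sketch in Python, with made-up numbers and an invented error rate for how the circle grows; real navigation procedures are far more involved:

```python
import math

def dead_reckon(x, y, course_deg, speed_kn, set_deg, drift_kn, hours):
    """Dead-reckoned position after `hours`: ordered course/speed plus
    the ocean current (set = direction of the push, drift = its speed)."""
    # Bearing convention: 0 = north, 90 = east; x is east, y is north.
    x += speed_kn * hours * math.sin(math.radians(course_deg))
    y += speed_kn * hours * math.cos(math.radians(course_deg))
    # The current moves the boat regardless of the ordered course.
    x += drift_kn * hours * math.sin(math.radians(set_deg))
    y += drift_kn * hours * math.cos(math.radians(set_deg))
    return x, y

def uncertainty_radius_nm(hours_since_fix, error_nm_per_hour=0.5):
    """The circle of uncertainty grows with time since the last fix."""
    return error_nm_per_hour * hours_since_fix

# Steer two-seven-zero (due west) at 10 knots for 6 hours, with an
# unknown-to-us current setting 180 (due south) at 1 knot.
print(dead_reckon(0.0, 0.0, 270, 10, 180, 1, 6))  # ~(-60.0, -6.0)
print(uncertainty_radius_nm(6))                   # 3.0 nautical miles
```

The point is simply that your estimated position has two error sources you can't observe directly, and only taking a fix resets the circle to a point again.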

Marc (04:37):

So this is like when your GPS gets lost and then all of a sudden, the little dot of where you are starts turning into this big circle, and then you go, oh my goodness, I've lost my signal. But this is kind of standard operating procedure underwater, when you don't have GPS or landmarks or visibility.

Andy (04:56):

Exactly. Exactly. So if you're looking at your GPS and it says you're somewhere in Helsinki, well, that's less than useful, but at least I'm not in Lahti or St. Petersburg or something. So at least I know the city. I mean, that's useful, but it doesn't really help you figure out more, and you need to get a more precise fix. So in a submarine then, depending on the operation we're doing, sometimes we're just transiting. We're just trying to get to the other side of the ocean as fast as possible.

So our area of uncertainty can get really big and it's not a big deal ‘cause we're just in the middle of nothing. So we might come up once every day or a couple of days in some extreme cases and figure out this is where we are now and re-fix our position. Other times we're somewhere closer to land and we need to know more specifically.

And we'll come up more often to get a fix. When we come up, we will find our position. We have different tools, GPS being one of them, to figure out where we are. We will then reset: this is specifically where we are at this moment. We reduce the area of uncertainty, and then decide, based on knowing exactly where we are, this is the direction we need to go next.

Marc (06:14):

This is really neat. It makes me think of some different analogies, like when we're building a proof of concept. And essentially what we want to do is take two remote bits of technology and connect those together to see where the value is. So this is kind of like this transiting. It's like, let's go as fast as we can in order to take the things that we think are important, connect them together, and prove that it works.

But then as we start delivering to customers, then we want to monitor a bit more closely what we're doing. So there's a lot of different kinds of DevOps analogies here I think. 

Andy (06:51):

Yeah, exactly. So in the early development stage, you just want to know: is this a good tool to use, or a good language to use? You're not really worried about all the details and specifics. You just want to ask, is this the right direction to go in? But as you get closer to a release, you want to know more specifically: is this part correct? Is this thing tuned properly? Is this the right setting in this sub-menu, and whatnot?

Marc (07:19):

Yeah, there's another interesting observation here, which is that oftentimes I've heard teams that are having difficulties in their sprints say that they want longer sprints. And the reality of the situation is often that the closer you're getting to release time, the shorter those sprints should be.

And oftentimes we get into even a daily mode, where it's not exactly the daily that you're used to in Scrum, but every day let's make a plan of what we can prove that day, what we can monitor that day. It's funny, I keep thinking 'circle of confusion,' but we have to close this area of uncertainty down so that we get more and more clear about where we're spending our time and effort, so that we don't end up in the wrong harbour or the wrong waters.

Andy (08:08):

Exactly. And the other thing that I like to apply to DevOps thinking from the submarine is we know there's currents. We know for sure. We're never going to be exactly where we think we are. There's always going to be an area of uncertainty and we could be on the left side of that or the right side of that, or a little bit further than we think, or not quite as far as we think, but that's just standard operating procedure.

We just know that set and drift exist. And in DevOps, in development, in the IT world, we sometimes forget that. We think, well, this is my plan, so this is my plan. Somebody gets sick and it just throws the whole sprint off. And then we start thinking, oh, the sprint failed, instead of thinking, well, we had a little bit of set and drift, no big deal. Let's just readjust, realign, and move forward again.

Marc (08:59):

Indeed. Estimates have been a very difficult thing in software since the 1950s. We're actually starting to get a little bit better at this recently, but in many organizations there's still this idea that we need to make promises at the beginning of the year as to what we're going to have. And it takes a lot of time to build the trust in an organization that it's more important to be able to react to new things as they're happening.

I'll say the word COVID, for example. As things are changing in the news all of the time, look at what's out there and look at how certain entrepreneurs and companies are able to react to these things faster than others. Those are good DevOps principles. Very interesting. So what are the kinds of principles that we need to look at in order to be able to manage this area of uncertainty that we're all living in?

Andy (10:01):

We need to remember that, first of all, it grows over time. The longer it's been since we measured last time, the bigger the uncertainty is: the more things could have gone slightly wrong or slightly better than we expected, or to the left or to the right. And we should keep that in mind as we're thinking, how long should our sprints be? If we have four-week sprints, that's four weeks in which things have been going slightly awry, and we could be rather far off from where we thought. If we're doing daily sprints, then every day we're really checking in and seeing where we are. Of course, both of those are extremes, and neither one is probably ideal for most situations.

But you need to think about that and find the right balance: which period of development am I in, how close are we to release?

Marc (10:51):

I just want to repeat this point that you made: the longer you go, the further you're always going to be from where you think you are. And the more often you're checking, the more often you're able to adjust toward where you actually want to be.

Andy (11:10):

Exactly.

Marc (11:11):

That's a really amazing principle. 

Andy (11:14):

Yep. And then we need to think also that on a submarine, when we're navigating, we know our estimated speed and our estimated direction. But how do we measure this in IT and DevOps?

So is it lines of code? Is it features that you can see? Is it some metrics on a dashboard? What are we measuring to understand how far we've gone and where we are? Those measures can be really important, because if you're measuring the wrong thing, your area of uncertainty is totally different than what you thought.

Marc (11:47):

This lines-of-code point reminds me of one of my favorite series of job interviews. The first guy came in and told me, oh my gosh, the last place I was working, they were measuring lines of code. And I'm like, well, why don't you ask me about lines of code: how many lines of code have I written?

And the guy kind of looks at me funny and he says, 'okay, Marc, how many lines of code have you written?' And I said, 'as few as possible.' So the next guy that I interviewed, I told him this exact story and he said, ask me. And I said, well, 'okay, how many lines of code have you written?' And he said, 'I've deleted more than I've written.' <laughs>

Andy (12:30):

Yeah, exactly. 

Marc (12:33):

That was the most senior guy that we hired that season. But yeah, it's really interesting. How do we measure? You can go back to the base DevOps metrics: lead time, mean time between failures, and all of these things. And I think that for a lot of what we're looking at here, we can think in terms of those lead times, and at least one of the failure metrics, the field failure rate for example.

Then you're at least starting to look at the kinds of things that tell you whether you're going into the right waters or not, how big your drift is, and how far you are from the places where you need to go at any given point.
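As a rough illustration of those metrics, here's a minimal sketch that computes a median lead time and a field failure rate from a few hypothetical change records (the record format and numbers are invented for the example):

```python
from datetime import datetime
from statistics import median

# Hypothetical change records: when work started, when it was deployed,
# and whether the change later failed in the field.
changes = [
    {"started": datetime(2022, 3, 1), "deployed": datetime(2022, 3, 4), "failed": False},
    {"started": datetime(2022, 3, 2), "deployed": datetime(2022, 3, 9), "failed": True},
    {"started": datetime(2022, 3, 7), "deployed": datetime(2022, 3, 8), "failed": False},
]

# Lead time: how long a change takes to reach the field.
lead_times = [(c["deployed"] - c["started"]).days for c in changes]
print("median lead time (days):", median(lead_times))  # 3

# Field failure rate: the share of changes that failed after shipping.
failure_rate = sum(c["failed"] for c in changes) / len(changes)
print(f"field failure rate: {failure_rate:.0%}")  # 33%
```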

Andy (13:25):

Yeah, exactly. 

Marc (13:26):

And then I guess, do those change depending on where you are in the organization?

Andy (13:32):

Possibly they do. For example, if you talk to a developer, they're really, really focused on a specific function, a specific feature, and they're thinking, I need to be as accurate as possible and make sure I get this right. If you talk to a business owner, they're thinking more about when the release will be ready.

And they're more worried about 'give me a date, and a date that I can trust.' They're much more worried about the predictability of the date than about how fast it is. Of course, they always want both, but if you can only choose one, it's better to be predictable than fast.

Marc (14:14):

And yeah, there's a different kind of analogy here: if the ship shows up at the right time in the right harbor and you only get half the cargo, is that better than the ship showing up in the wrong harbor at the wrong time? Can we trust that the ship is going to show up on time, even if it might not necessarily have all of the features that were expected?

Andy (14:39):

Exactly.

Marc (14:40):

And at least then the whole release mechanism still has something to do. All of the deployments still result in something that can be tested, and the area of uncertainty can be smaller.

Andy (14:55):

Yeah. And that just comes back to how we are measuring and what we are measuring. Of course, we want both. We want the ship on time, in the right harbor, with the right cargo on board, always, for sure. But as we're measuring and as we're working, which one are we focusing on? That also affects which measures or metrics might be the good ones to follow.

Marc (15:20):

I think one of the things here about practice is that practice is an enterprise-level thing: it's bottom-up and top-down all at the same time. And there are these different planning methods. We used to say: we don't have a crystal ball, so we can't tell the future, so let's have a crystal wall.

So let's look at this: there are lots of different numbers for this, I think something like 3-3-6, where you look at the next month or two at a level of detail, then the three months after that with some vision, and then the rest of the year with business targets and how you're going to work towards those.

And this is the kind of thing where, when you start to practice over time, the business starts to understand from the top down that, okay, we don't always get all the cargo, but at least we show up at the right times with the right features. And then it can start to change the way that incentives and trust work in an organization. Because otherwise, I've seen so many times that software just results in everybody feeling like there are broken promises all the time.

Andy (16:36):

Yeah. 

Marc (16:37):

Nobody wants that. 

Andy (16:40):

In my military career, there was a saying that you don't rise to the occasion, you sink to the level of your training. And the idea is that the more you train and the more you do a certain routine as part of your training, when you need to do it for real, so to speak, you're already able to do it.

You do it more instinctively than, hmm, let's figure this out. So in IT, the more often we release, the more often we deploy things, the more often we're actually changing our setup, the better we will be at doing it. If we do something once a year, we're not going to be that good at it.

If we do it every couple of weeks, we're going to work out the bugs. We're going to know how to predict. We're going to understand how to get the right cargo, et cetera, et cetera. 

Marc (17:32):

Very good. One of the topics that we could touch on here is that many organizations are struggling with their tool selection. We used to think in agile that empowering teams means letting them pick whatever tools they want and run whatever processes they want inside their teams. And then we end up with really interesting problems as we scale across the enterprise: everybody has a different tool and a different way of using it.

And they have their pipelines in different shapes. And let's not even go into ClickOps today; we'll have another podcast for that. But what are some things about uncertainty and tools that we could learn from this training, Andy?

Andy (18:20):

In general, I think tool selection is an interesting use case. Quite often we look at a tool and we want to analyze it and double-check all the nitty-gritty details and that it does what we want.

And that becomes quite a subjective instead of objective exercise: well, this doesn't feel right, I don't like that over there. It's much more important to say, this is the use case we need to fill, and this is what we need the tool to do. And instead of checking every single detail and getting our position fix, our area of uncertainty, down to zero on a tool selection, it's much better to choose something which checks most of the boxes, start using it, and find out as you go if it works or not.

It's always possible to change the tools, and, regarding the previous comments, the more we change, the better we get at it.

Marc (19:20):

Yeah. And I think there are some interesting things here to bring up, like the fact that the tool doesn't really have to check every single box, because there are three big ubiquitous toolchains that we talk about a lot: GitLab, GitHub, and Azure DevOps.

And many enterprises are able to use only one of those for everything from their version control to their ticketing systems. And of course, we have our good friends at Atlassian, where that toolchain for the so-called fuzzy front end is often all that's needed for handling all of the requirements and issues and these kinds of things.

But anyway, it's not necessary to have a whole bunch of different tools in order to do software development and deploy into the field. You don't have to check every single box for every single developer's corner case.

And oftentimes the tool that you have is better than the one that you don't...

Because the one that you don't requires a bunch of energy in order to get over there. And I've seen companies do cultural change by migrating to different tools. And that's a very interesting set of pros and cons.

And then the tool that most people in the organization are using is often better than the one that fewer people are using, just because if you standardize on one, you get so many benefits anyway.

Andy (20:47):

Yes, exactly. Yeah. 

Marc (20:49):

And then my personal favorite, and I always put this as third: oftentimes, the lead team or the most important new project has a new tool. And sometimes that can be the one that becomes the most important for the organization.

And then everybody kind of migrates in that direction. But still, the area of uncertainty gets smaller as you have fewer options.

Andy (21:17):

Yes, exactly.

Marc (21:23):

Hi, it's Marc again. Just like in underwater navigation, good metrics are invaluable, and bad metrics can make the life of the team miserable. Where submarine crews keep their eyes on their charts, in software development, Agile metrics provide insight into productivity through the different stages of the software development lifecycle.

Arto Kiiskinen from Eficode has written a blog about avoiding bad metrics in software development. In the blog, he looks at some of the most common metrics mistakes and advises how to avoid the pitfalls. You can find the link in the show notes. Now let's get back to our show.

Andy (22:06):

So when we're thinking about whether a tool is good or bad, there are always going to be trade-offs, and there's no perfect tool for every single situation. There are always going to be pluses and minuses. There's always going to be a 'yeah, but in this case you need to also run that shell script.' We've automated it, but we have to remember that it's there, and you need to have people who can maintain pipelines and whatnot, and understand how to use the tool. But every single tool has those. So instead of trying to find the perfect tool, find the one which checks most of the boxes, understand what the workarounds are to get the other boxes checked, and adapt to that.

And you'll be much better off, because, as you said, the fewer options you have, the smaller your area of uncertainty. So it's easier to plan what you really need to work on, instead of trying to figure out where you are.

Marc (23:03):

Very, very cool. I guess there are other angles that we could look at as well. One thing that I hear a lot in my consulting work is this analysis paralysis. What this means in lay terms is people just analyze and analyze and analyze without actually doing something or getting something done.

I like to think of the scientific method, where we research and then we experiment, and then we look at what the results are, and then we plan how to go forward, or report. So analysis paralysis is a large circle of confusion that doesn't seem to really be getting anywhere. But what you've taught us, Andy, is that it's still drifting.

Andy (23:52):

It's still drifting. Yep. 

Marc (23:54):

It's still drifting. <laughs> So what are some things, some further bits about reducing this area of uncertainty around analysis paralysis? 

Andy (24:06):

What I have found is that often the best approach is to pick the tool that seems like the best idea and start using it. You're going to make progress. You're going to be able to start getting things done. You might have to rework some stuff later, but you're going to learn a lot more by using a tool than you could ever do by analyzing a tool. So making forward progress is always more important than making a deep, deep analysis.

There's also the saying, fail moving forward. So instead of standing still, you start moving in the right direction. It may not be the perfect direction, but at least you're going roughly the way you want to go. You're going to learn a lot by doing that. You're going to figure out, okay, these are actually the important parts of the tool; that thing we thought would be important turns out not to be, because it's covered by this other use case, or whatever.

Maybe you do need to change the tool, but you already have workflows that you can adapt to the new tool instead of starting from scratch a few months later, having done a detailed analysis which wasn't actually correct anyway.

Marc (25:17):

Yeah. And if you haven't listened to our previous podcast, I recommend that you do. One of the things that we looked at is onboarding. When you get onto a new nuclear submarine, you have to learn essentially all of the systems. How long does it usually take, Andy?

Andy (25:38):

Six to nine months. 

Marc (25:40):

Six to nine months. So think about that: a young Navy person can get onto a submarine, a nuclear war machine, and learn all of its systems in six to nine months. Then think about how many people in your organization even know what the value streams are, much less all of the areas that are crossed and all of the different technologies there.

So I think one thing that can really powerfully reduce the area of uncertainty is, during onboarding, for essentially everyone at one point or another, early in their career, to look at every single system, every bit of the value chain, both the operational value chain and the development value chain that feeds it, and see how these value streams contribute to the company. So that when people are making changes, they understand how the change relates to the value that the business generates.

Andy (26:42): 

This also ties into our previous comments about empowering developers. If you empower developers and show them what the value stream of the company is and how their work impacts things downstream, maybe this particular feature they were looking for in a tool actually isn't important. When they look at just the very narrow box of their own work, it's critical, but big picture, it's meaningless.

So having this kind of proper onboarding, this view of the overall system, also helps: when you're empowering employees to do whatever their job happens to be, they see how it fits in. And they're able to make an overall-systems decision instead of a this-is-how-my-workflow-goes decision.

Marc (27:30):

And speaking of empowering employees, I think there's something really, really neat here. So, you don't have maps in the Navy, right? What do you have? 

Andy (27:42):

We have charts. Yes, it's the same thing, but we call it something different. If you really, really want to annoy a quartermaster on a naval vessel, ask: hey, where are the maps? Show me on the map where we are. <laughs>

Marc (27:58):

So, all Kubernetes puns are included here. There are charts that we have within an organization, and I'm not talking about organization charts. We have all these value streams that we just talked about, which are really important. And then your backlog is often the most important chart for you as a developer, isn't it?

Andy (28:23):

Yeah. Yes. 

Marc (28:24):

So what should we do then to lower the area of uncertainty in our development backlogs?

Andy (28:23):

Well, if we take a look back at the submarine analogy: the quartermaster is basically the navigator, the guy who's following the navigation.

He's always looking at the chart and thinking, okay, if we drift a bit too far in this direction, we're going to go over there; if we drift too far in that direction, we're going to go over there. The officer of the deck and the captain come over regularly, every few minutes, or every couple of hours in the worst case, and take a look: okay, where do we think we are?

What's ahead of us? What's behind us? What's to the left? What's to the right? What's our situation here? So understanding where you are and understanding what's around you is absolutely critical. As your area of uncertainty grows, as it naturally will, you get a feeling for 'now I need to do something about this.'

Now I need to come up, connect to a satellite, and get a fix, and reduce this, because I'm worried about that over there. So when we think about DevOps, with the backlog being the most critical thing, the most useful chart for developers, it's important to see the backlog and to understand where you are and where things need to go.

So basically, constant regular grooming, estimating, following up, updating comments, updating tickets, really making sure your backlog is in good order and always up to date, helps you as you're estimating: what is my area of uncertainty now, and what do I need to do about it? Do I need to worry about it today, or is this a problem that can wait for the next sprint?

Marc (30:04):

I think there's one really interesting thing here that you reminded me of. Okay, I know a lot of teams that don't have a definition of ready, and they basically learn what's in the backlog during the sprint, these kinds of things.

A few teams that I see have dedicated estimation sessions to just blast through a backlog playing planning poker, or what have you, to get an up-to-date understanding of how well they know what those items are. No discussion, just planning poker, high man wins, get those estimates out there.

Okay. So let's assume that we're constantly grooming, we're breaking our things down, we have dedicated estimation sessions, and the team keeps really good knowledge of what is in the backlog at all times. That, I used to think, was state of the art. But the interesting thing is: how often is the more-left part, the fuzzy front end, revisiting what's in that backlog and what those priorities really could be from a customer's perspective?

It's like, how often are we grooming the front end of the backlog to understand the customer need and what problems we need to solve today?
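As a rough sketch of that 'no discussion, high man wins' estimation pass, with hypothetical backlog items and story-point votes:

```python
# Hypothetical backlog items and story-point votes from three team members.
votes = {
    "OAUTH-142": [3, 5, 8],
    "UI-77":     [1, 2, 2],
    "PERF-13":   [8, 13, 13],
}

# "High man wins": take the highest card per item, no debate.
estimates = {item: max(cards) for item, cards in votes.items()}
print(estimates)  # {'OAUTH-142': 8, 'UI-77': 2, 'PERF-13': 13}
```

Taking the highest vote deliberately biases the estimate toward the most pessimistic team member, trading precision for speed.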

Andy (31:24):

Yeah, exactly. Exactly. So if we're not looking at the backlog from all different angles, then we're just looking at it as a developer would, we're not really getting this overall system view.

So as I said, on the submarine the quartermaster, whose job it is to do the navigating, is always looking at the chart. But the officer of the deck comes over, the intel officer comes over, the communications officer comes over, and the captain comes over. Everybody's looking at it every once in a while. And so with the backlog, we think, well, that's something the developers need to worry about, but it's actually something everybody needs to know about.

And everybody needs to follow because there's so many different perspectives that lead into what's important at a certain time, what needs to be kind of focused on. 

Marc (32:12):

Yes, about empowering developers. One of the really interesting things here as well is that oftentimes we see very well-formed solutions coming as input into developer backlogs.

And they're often so specific that they don't really leave a lot of room for interpretation. And that sounds good, doesn't it? We give really specific things to our developers, and that sounds good. But where it's not good is that when there's no room for interpretation, where does the novelty come in?

Where does this 'innovation thing' come in that companies are all so eager for? What we end up with is that the engineers who have the most technical knowledge and the problem-solving skills to actually achieve the value that is being asked for oftentimes get really, really narrow requirements.

'Put this button here,' and it doesn't really explain the third part of the user story, the 'so that,' where the customer value comes from: I want to do this thing so that I get this value. So I think this complex problem solving is kind of one of those shift-right things.

Andy (33:35):

I was on a project a few years ago, and the developers were told: you need to develop this kind of tool, which does X and Y and Z. And they went and did it. And it was not entirely useful. We put it into the system and were showing it to a customer, and the customer said, that's not what I was looking for at all.

I need something that does this. And the developer said, oh, that's what you need? Okay. And he came back, changed the whole design, said, we're going to do it this way instead, and brought it back.

'Perfect. That's exactly what we want. None of our competitors, none of your competitors were able to do that. We haven't seen this anywhere. This is exactly what we need. Absolutely perfect.'

And it was because the developer, who had the best view of what he was able to do with the system or with his component, understood what the customer's need was. And often we try to filter that out and say, this is what this particular section of code needs to fulfil.

Instead of 'this is the need that it's filling for the customer'. Both are useful, but you have to have some kind of balance there and make sure they're both addressed. 

Marc (34:48):

And one of the things that you remind me of is this fear of showing things that are unfinished, especially to a customer who's paying a lot for a thing.

And this demo practice is something that can also get much better and build trust over time, where we close this loop between what the customer is asking for and what the developers are actually building. We can build trust across the organization that we're able to move forward predictably.

And I still see only a few companies that actually do this in practice: showing demos at the end of sprints to customers in order to elicit new feedback, which then becomes new backlog items that get groomed and estimated and prioritized.

Andy (35:39):

Yeah. So inside our organization, we have this area of uncertainty: we're not sure exactly where we are within it. The customer's area of uncertainty is ridiculously huge. Until they get anything, they don't know: is it going to be on the east coast or the west coast, the Atlantic or the Pacific? It could be coming from anywhere. They have no clue. So the more often we can show the customer, the more we can reduce their area of uncertainty, which of course builds more confidence, builds more trust, and gets us the feedback. So we're able to adjust and deliver exactly what they want.

Marc (36:19):

Here's another interesting analogy I was thinking of. I cook every day, and oftentimes there are those favorite dishes that we like to cook. I do a mean chili, for example. And there's this kind of set-and-forget mode that one gets into when cooking something familiar.

You know that it's a five on the burner, and you know how long to wait before you put the vegetables in so that they start cooking immediately, and all these kinds of things. But then every now and again, something goes a little bit differently, right? You're waiting for the pan to get hot and the phone rings. You walk away, and now the pan's on fire. So there's an interesting way of thinking here: we need to be looking very carefully at what we're doing all of the time.

Because you never know when a circle of uncertainty is going to kind of jump up and you're all of a sudden not going to know where you are.

Andy (37:18):

Yeah. 

Marc (37:19):

So, field issues, for example, coming in, where 'this is fine' but we have everything on fire all of a sudden. Are there any thoughts there? I think there's something to do with connecting the monitoring back into the system as a feedback loop.

Andy (37:39):

Yeah. So one way to establish where you are and what kind of variance of uncertainty you have, of course, is to monitor and see where you are. And the more measures you have, the easier it is to say, this is where we are, and to reduce your area of uncertainty. The tricky part is: is this the right measure? Is this metric that you're monitoring actually providing any value or not? So often, I have seen a dashboard with 10 or 12 different metrics on it, but when you look at them you think, okay, what value do these add?

What does this one tell me? Most of them are useless. So figuring out what metric to look at is the tricky bit, and it's more of an art than a science. There's no one-size-fits-all answer.

Marc (38:30):

So that's really interesting, because what I'm hearing is that the usefulness of the dashboard is also adrift.

Andy (38:39):

Yes.

Marc (38:40):

And if we aren't looking back regularly at what we're monitoring and how we monitor it and what impact this has, then we never really have the ability to improve things. And maybe we think that we're monitoring things, but we're not actually delivering any value. 

Andy (38:57):

Right. 

Marc (38:58):

I think that's a really neat thing too, to consider here.

Andy (39:01):

And take monitoring, for example, just a really simple example: we're monitoring CPU and memory use. Well, we don't have to have it on a dashboard with a chart over time. We know what the maximum is, and if we go over the maximum allowed value, we get an alert, and we react to the alert.

So then having that wavy dashed line on a dashboard doesn't bring you any value. So when you're looking at a dashboard, ask: which ones are still bringing value by showing the trend over time, the ups and downs, and which ones should just be an alert: if it goes over this, you need to do something.
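A minimal sketch of that idea, alerting on a threshold instead of charting the value, using the third-party psutil package and made-up limits (the notification here is just a print standing in for a real pager or chat message):

```python
import psutil  # third-party package: pip install psutil

# Hypothetical thresholds; tune these to your own environment.
CPU_MAX_PERCENT = 90.0
MEM_MAX_PERCENT = 85.0

def check_and_alert():
    """Alert only when a maximum is crossed, instead of charting the
    value over time on a dashboard that nobody is watching."""
    cpu = psutil.cpu_percent(interval=1)
    mem = psutil.virtual_memory().percent
    for name, value, limit in [("CPU", cpu, CPU_MAX_PERCENT),
                               ("memory", mem, MEM_MAX_PERCENT)]:
        if value > limit:
            # Stand-in for a real notification channel.
            print(f"ALERT: {name} at {value:.1f}% exceeds {limit:.0f}%")

if __name__ == "__main__":
    check_and_alert()
```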

Marc (39:41):

And thinking back, Andy, I would say maybe half of the organizations that I have entered in the last 10 years had red alerts that were persistent. 

Andy (39:53):

Yeah. 

Marc (39:54):

In their monitoring systems. And sometimes it could be just a simple thing: oh well, yes, that one's been on, but it's okay, and everything else is green. But it's kind of this broken-windows thing: once you start to allow things to deteriorate a bit, all of a sudden your area of uncertainty is growing and growing.

Andy (40:16):

Exactly. If you don't adjust your signal-to-noise ratio, then it's just too much noise. And you're not getting the clear signal you need. 

Marc (40:26):

Wow, Andy, this has been really cool. We talked about always making forward progress and understanding we're always going to be adrift. The more time between our checkpoints, the more adrift we can be.

I think we don't really understand that we're also navigating blind in between checkpoints. 

Andy (40:55):

Yes. 

Marc (40:55):

In between looking at our backlogs, in between updating our monitoring tools. We have to practice in order to build trust and to constantly understand how big our area of uncertainty is and how much is acceptable at a given point.

As we look at the results of our development efforts, we understand that the area of uncertainty is growing, and the drift is growing between checkpoints. So as things get more serious, we need shorter intervals between checkpoints, maybe even daily sprints. Do you have any further thoughts?

Andy (41:34):

I would just summarize what we've said so far: set and drift will happen. Expect it. Know what your area of uncertainty will be and deal with it. Measure as often as you need to. Don't let the perfect be the enemy of the good, and fail moving forward instead of standing still.

Marc (41:55):

Beautifully said, Andy. Thanks, Andy. 

Andy (41:57):

Thanks, Marc. Talk to you next time. 

Marc (41:59):

Thank you for listening. As usual, you can find a link to Andy's profile in the show notes. And if you haven't already, please subscribe to our podcast and give us a rating on your platform; it means the world to us. Also, check out our other episodes for interesting and exciting discussions. Finally, before we finish off, let's give the floor to Andy to introduce himself. Now take care of yourselves, and remember: choose a tool that checks most of the boxes and improve as you go.

Andy (42:29):

Hi, I'm Andy Allred. I started my career as an electronic warfare specialist in the US Navy, working on nuclear-powered fast-attack submarines, which is always kind of unique and gets people's attention. After that, I moved into telecoms: I started on the radio side, but then shifted more to the IT side of telecom companies, and worked there for quite a number of years. And recently I've moved into the consultancy space, working as a consultant for DevOps and cloud projects.