Darren and Marc highlight the importance of going all in on a platform transformation, emphasizing the need for commitment and investment. Join in the conversation at The DEVOPS Conference in London and experience a fantastic group of speakers alongside a worldwide online and live audience made up of practitioners and decision-makers.

Darren (00:07): And I feel like one of the biggest dangers of platform engineering is to try and push this tooling without supporting the culture correctly.

Marc (00:21): Welcome to DevOps Sauna, season four, the podcast where technology meets culture and security is the bridge that connects them. Hello, we are back in the sauna. Hi Darren.

Darren (00:42): Hey Marc.

Marc (00:43): We had a recent conversation, Darren and I, about platform engineering. This is quite a big topic right now, isn't it?

Darren (00:51): Yeah, and it's quite a new topic. It's sort of this separation of DevOps practices from technologies. So it's kind of interesting in our field.

Marc (01:01): And in a recent podcast, I think one of the things about platform engineering that we compare to security as well is DevOps has gotten to a certain point of ubiquity and a certain point of progress to where things like security and actually building a platform that serves the internal and all the way to the external, sometimes customers, is something that more and more companies are really getting capable of doing today.

Darren (01:25): Yeah, I think that's a shift that we've seen relatively recently. It's all about maturity in DevOps, I think. And now the ability to create these kinds of internal developer platforms and have these teams dedicated to building those is kind of fascinating from a technical challenge standpoint.

Marc (01:43): And there's an interesting word that you added there, or an acronym almost, which is internal developer platform. And the difference between a developer platform or what a platform engineering team does and an internal developer platform is not necessarily something that's quite clear I think for not only some of our listeners, but also some of our colleagues, customers and others.

Darren (02:09): Like we mentioned, it's kind of this evolving field that's sort of sprung up relatively recently. So we have the kind of ill-defined areas of understanding with regards to a lot of it, and that actually leads to a lot of common mistakes that we might run into over time that I think we've definitely seen on this side.

Marc (02:32): That's true, that's true. One of the great things about being a consultant for young and old alike is the ability to see how many different companies have similar problems, sometimes unique problems, usually most of them are common, but how they have had very different approaches to solving those in the past, and how many of them have come through the same problem set and emerged with different results and different approaches.

Darren (03:03): Yeah, that's quite interesting. You used the word unique, and earlier I was reading a blog, which had the line that no one's problems are as unique as they believe them to be, which I think sums up the disparate branches of platform engineering and DevOps in general quite well, that we just have these varied solutions all solving the same set of problems.

Marc (03:26): Yeah, one of the things that we talked about a lot in our presentations and speaking and conference work for a while was how when the Mars Rover set down, the first thing it did was a software update, and then the drone is getting software updates, and one of the things that I was saying then is your special case isn't so special compared to that. And to be honest, the tools, technologies, and approaches are something that I think are accessible for everything from enterprise down to smaller companies. So let's get into some common pitfalls of platform engineering. That is the topic of this podcast. And I've seen success stories and failures, difficulties alike, but the first one that I want to really highlight is when a company doesn't go all in on a platform transformation or on platform engineering. There's a couple of different things here. One, in the early days of Agile, to me, like the Agile manifesto, people and interactions over processes and tools, what ended up happening was that process and tools got freedom and people and interactions didn't really change that much. So the result of this is that every team has its own way of doing things, its own Jenkins, its own tags or keywords, labels in something like Jira. And they created more fragmentation because of this freedom than what the Agile manifesto was really setting out to do.

Darren (04:56): Yeah, I agree. And there's this kind of thing about the not going all in, which to me is a bit, it's kind of the evolution of mindset as you're saying, hasn't like really changed. So you're saying that everyone had their own tools, and this led to this kind of modularized approach to software development over time. And now, people are expected to adopt this mindset of buying something so much larger. They're not just buying a testing tool, they're not just installing a Jenkins, but they're being asked to foot the bill and foot the man hours for an entire development platform to be put into place. So they're seeing this exponential increase in the required investment to get somewhere that they've seen that they already are.

Marc (05:46): Absolutely. Absolutely. And then each silo, and I often think of silos at the most acute level as kind of at the director level. So every team has a reason to do things differently in a special case, every director has a reason that they have a certain business metric that they need to achieve, they have a release for a customer or something like that. And when we go to do a migration into a new platform so that platform engineering has a fresh start, we oftentimes see that instead of a company going all in, every time that it's time for somebody to move on to the new platform, they have a similar set of reasons or at least a similar set of derivative reasons that they're not able to move into the new platform. However, on the other side, when we go all in, like with a kind of big bang approach, it gives everybody an opportunity to sort of shake off the old ways of thinking and move into something that commonly can support everyone. There's this term golden highway or golden path that I like to see a lot, where if we give developers something that's so good they can't ignore it, and allow them to all kind of go over at once without the business impetus to necessarily keep the features flowing while they're doing this kind of change, it can reap big benefits.

Darren (07:09): Yeah, and I feel you've kind of struck on something there. This is kind of one of the banes of my existence really, everyone having their own individual special case. As you know, I'm mostly in security, and every special case is its own unique nightmare for me. So having these kinds of special cases within every different section of the organization, it's frustrating to see. But yeah, the idea of adoption being pulling all these together and eliminating them, that's really, in my opinion, the core of this saying, go all in: that no one can be a special case because everyone's a special case, and if everyone has a special case, no one is.

Marc (07:56): Absolutely. You mentioned security, so security vulnerabilities in platform engineering, we could do many podcasts in depth on this, but what's your take here, Darren?

Darren (08:07): Yeah, I think my take here comes from something actually I heard you say in a meeting several weeks ago, which is that you had frequently heard this idea that people are saying we're so far along with this DevOps thing that now we can start to look at security. And honestly, when I heard that, I had to turn my camera off so I could slam my face against the desk, that was the most frustrating thing I think I heard the whole of last year actually. So one of the most common pitfalls in any engineering project, and that includes platform engineering, it includes software development, it includes basically every aspect, is to think you shouldn't bake security in from the start. And everyone talks about this, but still, I get to hear you say that people think they should get to a particular landmark before they should look at security. And I like to look at it from a perspective of buildings. You don't design a building and then add the front door and then design how you want to keep people out. Because if the front door is secure, but the wall next to it is paper thin, people will just cut through the wall. And that's what happens when you avoid building security in from the start. You think it's like this layer of paint you add over your product, which I can tell you is absolutely not the case.

Marc (09:35): Absolutely. Starting with security in mind, and I have a few things here, though I'm not the security expert that you are, but maybe we could talk about just a couple, which is one of the things I like about the big platforms that many people are turning to in order to get away from the fragmentation that they've had in the past, and I'm talking about GitLab and GitHub as two of the biggest ones, is the ability to have really good pipelines with security scanning built in, looking at dependencies, creating SBOMs, and basically, looking at these things from the very beginning. So are there some other things in terms of platform engineering and starting from the beginning that you'd like to get in there?

Darren (10:18): Well, one of the things you mentioned is these all-in-one platforms, which are one of the most drastic ways to reduce the attack surface. Because if you think about having this cloud of disparate tools, then you have attack surface in your source code repository, you have it in your pipeline system, you have it all over. But by pulling them all together into this one platform, you reduce that massively. So you reduce the attack surface, you reduce your risk. But to talk about building it in from the start, the most important thing is to have the platform engineering team have that culture as part of them, like have the ideas of security in their mind, because nine times out of 10, security comes down to a matter of common sense. And we do these processes, things like threat modeling, which are designed as kind of thought exercises where you just sit down with a network diagram of your platform and look through it. And this threat modeling process has been the same since it was proposed by-- actually, Microsoft's STRIDE model was, I think, released in 1998, and it actually hasn't changed all that much. But just applying this kind of threat modeling mentality, this kind of secure development mentality is enough to get started in building a secure platform. And the great thing about a lot of these tools is quite a lot of them take security seriously, so they are there to assist in that way.
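
As a rough illustration of the thought exercise Darren describes, the sketch below walks a list of platform components through the six STRIDE categories and produces one question per pair. The component names and the `threat_checklist` helper are hypothetical, chosen only for this example; a real session would start from the team's own network diagram.

```python
from itertools import product

# The six STRIDE threat categories.
STRIDE = [
    "Spoofing",
    "Tampering",
    "Repudiation",
    "Information disclosure",
    "Denial of service",
    "Elevation of privilege",
]

# Hypothetical platform components, standing in for a real network diagram.
COMPONENTS = ["source repository", "pipeline runner", "artifact registry"]


def threat_checklist(components, categories=STRIDE):
    """Return one question per (component, category) pair to walk through."""
    return [
        f"{component}: how could {category.lower()} happen here?"
        for component, category in product(components, categories)
    ]


if __name__ == "__main__":
    for question in threat_checklist(COMPONENTS):
        print(question)
```

The output is only a prompt list for discussion. The value of the exercise comes from the team answering each question against their own platform.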

Marc (11:54): Absolutely. Okay, let's start with that. And there's an end point that was recently described to me by a successful change leader for a large accounting firm. And it was: DevOps culture is the output of a successful transformation. And I just thought, wow, what a tremendous statement. Oftentimes when we talk about platform engineering, when we talk about transformation within companies, the cultural aspect is something that we as Eficodians are always trying to make our customers understand: that the culture part is equal to or greater than the tools and technologies that are going to change. But think about it: having a successful transformation means you've actually transformed your culture. And some people get really annoyed if you say that you're trying to effect a cultural transformation by changing a tool set, for example, or something, which is still a viable path. When many different things change, it gives you the opportunity to change the working culture. What do you think, Darren?

Darren (12:56): Yeah, I think that there is a lot of focus on DevOps tooling rather than culture, but it comes down to a case of tooling is mechanical, it's machinery, and it does what you tell it. If you know how to work the machine, you can get the tooling into place, but culture is people and people are tricky and that's kind of the thing that you're pushing against. And it's good because as I mentioned at the start, platform engineering, at least to me, is kind of this separation of tooling from culture, because platform engineering is about actually putting the tooling into use, but the culture needs to be in place, and I feel like it's going to be one of the biggest dangers of platform engineering to try and push this tooling without supporting the culture correctly.

Marc (13:49): Absolutely. I watched a conference talk on a large Scandinavian flat pack furniture manufacturer, and the ladies that had led the transformation said that culture is what you reward and what you punish. And I thought that was rather severe until I took it at face value. And then I thought about it in terms of, like, Gene Kim's work; recently, Wiring the Winning Organization talks about amplification of feedback. So if we talk about amplification of feedback in a psychologically safe way, then we talk about making sure that when things do happen, that we are able to learn everything that we can from it, not just try to immediately say, how do we prevent this from happening again? But try to really look at not just the root cause analysis, but the contributing factors, blameless post-mortems, using these types of tools. And then having fast feedback from the platform for good stuff I think is equally important, as well as probably many others. Like, we use Officevibe and we have a good vibes channel on Slack for people to reward each other peer to peer. But I think this amplification of feedback is a really good way to reward the type of culture that you are after.

Darren (15:06): Yeah, not only reward the type of culture, but it's the only way to sustain it, especially given what's been quite a recent shift to people working more remotely. It's like a useful tool for supporting the development of a culture to have these specific channels of giving feedback and making sure that that feedback is visible. Because I think especially like we're both here in Scandinavia where there is often a tendency towards being kind of quiet, at least in the Finnish part of it, being kind of quiet, being kind of reserved, and having a place where it's safe to give such information, to give such praise, and, when things go wrong, to safely deconstruct them. This is one of the cornerstones of any kind of culture for technology I would think. So it's equally important in platform engineering.

Marc (16:01): It's funny you mention it. We're both immigrants in the Nordic culture, and I think that you can embarrass someone pretty equally with reward or punishment over many things. Some might even prefer the punishment here.

Darren (16:16): Yeah, I think in Finland there's definitely a default for eyes facing forward. No one acknowledge when I've done a good job, but hopefully we can change things like that.

Marc (16:26): I think so. Let's talk a little bit about tooling bias. And I think we've kind of led into this a little bit. So when we're doing platform engineering, there's a lot to be said about the tool that you know compared to the tool that you don't know, but on one side you've got tooling bias: the tool I know is the only one I want to use. And on the other side you've got this thing called analysis paralysis, which means that we look too much at all of the different kinds of available things. So also in tooling bias, we have this pets versus cattle type of idea, and I have my pet. What is the greatest pet that we see in so many of these types of transformations and platform engineering efforts?

Darren (17:12): Yeah, I have to say at that point it's Jenkins, and I was actually reading a blog post before this about Jenkins that summarized it so well, and it said the worst part or-- yeah, let's say the worst part about Jenkins is that it works. That if you try hard enough and lower your standards enough and kind of blur your eyes enough, you can almost get Jenkins to do exactly what you want it to do. And because of that, and because of years of this kind of reliance upon Jenkins when it was the kind of peak tool, then there's like this heavy bias towards Jenkins existing in these newly developed platforms. And honestly, it makes me a bit, let's say frustrated because Jenkins is a great tool that had its place, but its place was 2016. And right now we're seeing more of an ecosystem where the pipelines need to be right alongside the source code management. And that's, in my opinion, exactly how it should be. These things don't work without one another, so separating them from each other is just adding a layer of complexity.

Marc (18:26): I can put this perhaps in a simple way that when we are looking at one of the big platforms and we're looking at the pipelines there, within those runners set up by a proper platform engineering team today, everything is gonna be ephemeral. That means it is generated from code, spun up, used long enough in order to execute the job and then spun back down again. That is reducing potential attack surface because you don't have something that's just laying around and has been running for God knows how long, and what plugins are installed there, are they still supported? Do we even know what our current configuration is? Because it was configured by guys that have either since moved on or retired or bus factored or who knows what. And when we do platform engineering, I think one of the greatest things that we understand out of the box is that everything is going to change, everything should change. We should not be afraid of change, and the better we are at changing things, then the better we are at maintaining them, pushing them forward, and essentially having a competitive edge for attracting and retaining developers as well as the product that we're supposed to be building, the value that we're supposed to be creating.
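
The ephemeral lifecycle Marc describes (spun up, used for one job, spun back down) can be sketched in a few lines. This is a minimal local analogy, assuming a temporary directory stands in for a runner; real platforms typically use containers or virtual machines, and the names `ephemeral_workspace`, `run_job`, and `example_job` are hypothetical.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def ephemeral_workspace():
    """Create a fresh workspace, hand it to the job, then destroy it."""
    path = Path(tempfile.mkdtemp(prefix="runner-"))
    try:
        yield path
    finally:
        # Teardown always runs, even if the job raised an exception.
        shutil.rmtree(path, ignore_errors=True)


def run_job(job):
    """Run a job (any callable taking a Path) in a throwaway workspace."""
    with ephemeral_workspace() as workspace:
        result = job(workspace)
    return result, workspace


def example_job(workspace):
    # Hypothetical job: write a build artifact and report its size in bytes.
    artifact = workspace / "build.txt"
    artifact.write_text("built")
    return artifact.stat().st_size


if __name__ == "__main__":
    size, workspace = run_job(example_job)
    print(size, workspace.exists())
```

Because nothing survives the job, there is no long-lived machine whose plugins and configuration can drift out of anyone's knowledge.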

Darren (19:42): Again, as we've discussed before, it's this evolutionary shift, and we understand from like evolution that you either adapt or you don't. So it comes back to this pets versus cattle thing. There is this tendency among nerds like myself to develop sentimentality for tools which are not as useful as they could be because we spent a long time learning how to use them. And now it's like playing a particularly difficult musical instrument. We don't want to put it down and move on to something more simple because of the attachment to that, and that's actually kind of a problem, the attachment to these tools, when they become pets and they're not cattle and they weigh us down and we just don't have the heart to put Jenkins in its rightful place.

Marc (20:33): Absolutely. And this leads to an interesting transition. So our fifth pitfall is trying to build a perfect system instead of building a system that you can operate.

Darren (20:45): This actually started from one of the earlier podcast episodes when I heard Andy Allred explaining that it's so important to iterate. The investment into these systems is huge. So if you are unable to hit the ground, keep rolling, and keep moving, you end up in so many forms of deadlock that trying to build a perfect system is unfeasible.

Marc (21:12): However, these days, in a matter of weeks, you can, for example, take one of the big platforms into use, get some runners up and going that allow you to build your software, be able to have perhaps a higher level of visibility and maintainability and what that means, be able to break down monoliths or create some modularity where you may not have had that ability so easily before, increase the transparency that you get by literally using Git, or having a fresh and state-of-the-art branching strategy. And once you are able to get into a modern system that allows you a much higher level of maintainability, then you can start to figure out: were those requirements that you had really as important as you thought? Or are you able to now create a template in order to spin up an environment and start adding value or start using your domain expertise in value creation rather than, what is it, like 40% of developers' time has been spent traditionally on maintaining tool chains? Well, why not have a small benevolent team that is maintaining those actively with your development community and let those valuable developers of yours spend most of their time on domain-specific value creation?

Darren (22:33): Yeah, and the key there is maintaining it actively and benevolently. And it draws it into this idea of user feedback because I think there's this potential for tunnel vision in development and developing specifically of these kind of platforms where the person who knows the platforms has a specific vision and may want to push that specific vision onto developers who do not see it in the same way. And it's like you were mentioning, the golden highway, the idealized path for development work, cannot be formed without developers. And to be fair, it can't be formed without developers trying to use it first because they will be the ones who hit the roadblocks, they will hit the speed bumps, and they will be the ones who go off the road. So drawing in this user feedback is kind of critical in my opinion.

Marc (23:28): I completely agree. Okay, we've identified five platform engineering pitfalls, and instead of just listing the pitfalls, we're going to give you the solutions. So Darren, what do we do about adoption in platform engineering?

Darren (23:42): Basically go all in. You can't handle platform engineering if you have a team of people who are not willing to commit, and if you have a team of managers who are not willing to budget for it, both in money and in human resources.

Marc (23:59): All right. So Darren, what about security vulnerabilities?

Darren (24:03): Start thinking about it yesterday, and if you didn't do that, start thinking about it right now.

Marc (24:08): The same as planting a tree 20 years ago or right now.

Darren (24:11): Exactly.

Marc (24:12): All right, how about DevOps culture?

Darren (24:14): It all has to come back to making sure people feel listened to, people feel heard, and people are praised for the good work they do. And also don't feel blamed when they mess things up, because they will, because they're people, and that's what we do, that's our specialty.

Marc (24:32): All right. How about building a perfect system?

Darren (24:34): I'd say throw out the mindset of building a perfect system completely. You have to iterate. Your perfect system won't be version one, it may not even be version ten. And if you're trying to airdrop a perfect system onto developers, all you're going to do is overwhelm them.

Marc (24:50): Right. And finally, tooling bias.

Darren (24:54): Oh, on that, I would say get rid of Jenkins, <laughs> but in seriousness, the all-in-one platforms are the way to go. So using something like GitLab with the built-in pipelines and security tooling, or GitHub with its Advanced Security alongside, these are the future of platform engineering. There is no going around it. Having a separate pipeline system is going the way of the dinosaur.

Marc (25:19): Beautifully said. Thank you, Darren.

Darren (25:22): Thank you, Marc.

Marc (25:22): This has been the platform engineering pitfalls and solutions to those episode of the DevOps Sauna podcast. Thank you for tuning in, and see you next time. Goodbye.

Darren (25:35): Bye.

Marc (25:39): We'll now tell you a little bit about who we are. Hi, I'm Marc Dillon, lead consultant at Eficode in the advisory and coaching team, and I specialize in enterprise transformations.

Darren (25:50): Hey, I'm Darren Richardson, security architect at Eficode, and I work to ensure the security of our managed services offerings.

Marc (25:57): If you like what you hear, please like, rate and subscribe on your favorite podcast platform. It means the world to us.