

DevOps is for product engineers too | Lesley Cordero

At The Future of Software conference in London, Lesley Cordero dove into the intersection between product engineering, DevOps, and Site Reliability Engineering (SRE) and explored how they combine to create a culture of technical excellence and psychological safety, both within a team and across an entire organization. In her talk, Lesley also highlighted the practice of Culture, Automation, Lean, and Measurement (CALM) and the key role that product engineering, DevOps, and SRE play in it. About the speaker: Lesley Cordero is currently a Staff Software Engineer and Tech Lead at The New York Times. She has spent most of her career as an engineer on edtech teams, including Google for Education and several edtech startups.

Transcript

Hello, everyone. My name is Lesley Cordero, and welcome to my talk on how the principles of DevOps and its adjacent disciplines, like site reliability engineering and platform engineering, fit into the current landscape of product engineering. For some context behind why I wrote this talk, it came from a place of needing to find a way to communicate the value of DevOps and how it connects to our current context in the software industry. So, I figured this is the perfect place to come and speak about this. Because of that, my talk is a little bit more visionary than nitty-gritty technical, but I think it's super important for us to grow an appreciation for each other's domains, and making the effort to understand them is something we're all responsible for. With that said, I'm going to introduce myself a little more thoroughly. Currently, I'm a staff engineer at The New York Times. I'm the tech lead for the reliability platforms engineering team, which sits under our wider platform engineering organisation. Prior to my current role at The Times, I was also a tech lead on several teams, both within product engineering and platform engineering. One of my main observations over my career, regardless of the domain I was specialising in at the time, is that our industry struggles with finding value in each other's work, particularly across different roles and specialisations. Speaking personally, a large part of why I switched to platform engineering is that I had spent so much time in the DevOps communities when I was working to improve my product's observability footprint. A lot of that had to do with the fact that DevOps is fundamentally about culture, right? It's about bridging gaps between roles that might not necessarily understand each other, but want to.
So, truthfully, it's a bit disappointing when we see things like "DevOps is dead" being thrown around. Oh, no, I heard the chuckles. [chuckles] Because then where does that leave me and all the other folks who gravitated towards this space? It's one of the only spaces where I genuinely felt that the human components of building and operating software are just as important as the technical ones. This comes from a place of sociotechnical thinking, which I won't go into too much here, but which I think is fundamental to the type of culture both DevOps and platform engineering should be trying to cultivate. To further emphasise why I think this conversation is important for us to have right now: operational expertise is more critical than ever, but the landscape is shifting for operations roles. And I would extend that to say that the landscape is shifting for product engineering roles too, right? Particularly with the current excitement around platform engineering. This quote comes from Charity Majors in her keynote on platform engineering at DevOpsDays New York City 2023. My interpretation of what she means is that operational expertise is not only very much alive, but actually more critical than ever. That's true for a variety of reasons, including the role of generative AI and the chronic worry that our jobs are going to be automated away, or the fact that operations organisations are starting to be treated as cost centres, which might introduce an extra layer of job insecurity. But holding this in tension with platform engineering and its re-emergence, she also says that what it looks like is changing. Regardless, this tension is our current reality. So, it's important to lean into the messy details of what it means to apply DevOps principles regardless of your technical domain expertise.
Now, since this talk is about how DevOps fits into the current landscape of product engineering, it's only natural that we talk about the differences between the disciplines. I'll do that by talking about how product engineers focus on business-domain end users, whereas platform engineers focus on the development experiences that enable product engineers to build robust, production-ready services that serve those external users. So, to begin, let's talk about user experience, specifically for external business-domain users. I don't want to reinvent the wheel, so I'm borrowing Don Norman's definition of user experience, since he's the one who coined the term. These are the experiences that product and engineering typically own. I'll also use this definition as a basis for defining developer experience, which is the crux of what both DevOps and platform engineering are about. Now, there are two important parts here that I want to emphasise. One, the user experience should be a cohesive, integrated set of product experiences. The integrations it refers to reflect what should be a seamless product experience for external users. And this integrated set doesn't just refer to one part of the product experience; it's how the different parts of the product overlap and interact with each other. So, it considers the full user experience, from the very first time a user engages with your product to the continuous work of providing an experience that maintains user loyalty and satisfaction. Similar to user experience, we have developer experience, which is also about providing a cohesive set of experiences. But this is where the context changes. Instead of building for external end users, we're typically building for internal developers who are building the production services that make up those external-facing product offerings.
This internal versus external distinction is really important, because the users that platform engineering serves are ultimately their co-workers or other engineers in the department. That introduces a new set of product considerations, right? But the technical architecture practices translate well when we're building developer platforms. One facet I want to highlight is the role of supporting culture, because that's where DevOps becomes particularly relevant. Throughout this talk, we'll walk through DevOps principles, making sure to elaborate on how they apply across multiple domains. Since DevOps is about cultural transformation, it can be a great basis for developing platforms. DevOps is traditionally thought to focus on infrastructure, but as the discipline has matured, we've come to realise that its application goes beyond infrastructure and bleeds into the full developer experience, from development to production readiness to delivery. Next, we have SRE, or site reliability engineering, which emerged from DevOps but is distinct in that SREs focus on the production readiness needs of end-user product services. So, while SREs can be platform contributors, they're often not the owners of those platforms. Instead, they can serve as robust feedback loops that translate production needs into platform requirements. SREs can be found in different spots of an organisation's structure: sometimes under platform engineering, sometimes separate, and sometimes embedded. The New York Times uses an embedded model under our platform engineering organisation. I won't elaborate too much on the trade-offs right now, other than to say that I still believe SREs should be developer-neutral when working with their customers.
So, regardless of which platforms might or might not be centralised at your company, SREs should always prioritise the actual production needs of users and provide that fast feedback loop for their platform partners. And lastly, there's platform engineering, which will take up much of the rest of this talk, though still in the context of DevOps. Platform engineers are focused on building and maintaining centralised platforms that enable robust developer experiences for product engineers building external end-user products. This is where I'll take a moment to talk about a common pitfall we often see within platform engineering, which is assuming that platform engineering equals infrastructure platforms. I think this is also part of why we see some of the conversations about DevOps being dead in favour of platform engineering: too many of us are operating under the assumption that the only shared platforms a company needs are infrastructure platforms. But, for example, we should also be thinking about how platforms can aid the service or feature development cycle during the actual development phase of software. That might mean having language runtime platforms that support the development of standardised Node.js services, for example, which is a pattern I've seen at previous employers. Each of these can be decomposed even further. Though again, tying back to principle number two, this decomposition should only happen when there's a genuine need for it. For example, if your organisation decides to introduce a new standard language because it needs a more performant one, like Go, that might be a good time to decompose your runtime platform. Maybe now you want to add some level of specialisation per language.
And in the infrastructure platform context, you might see that further broken up into domains like cloud infrastructure, CI/CD, or observability. The same techniques we see in domain-driven design are ones we can reuse in the platform context as well. And to connect this back to how platform engineering and DevOps are practices that product engineers can use to build their business domains, there are even product platforms, which might refer to a specific end-user product domain or, of course, a services platform. The process of defining and redefining the technical boundaries of platforms is what I refer to as Domain Driven DevOps. Domain Driven DevOps takes the principles and practices of DevOps and applies them in any software engineering context. So, again, we're not just talking about infrastructure here. Part of the reality of software engineering is defining and refining those technical boundaries of our software, right? Particularly if we're trying to scale fast-changing software. So, being responsive to the changing needs of our software and users is going to be crucial for our success. The last part of the definition is more of a nod towards one of the most important aspects of any organisation, one we often don't think about intentionally enough: sustainability. More specifically, I define sustainability as the continuous practice of operating in a way that enables short-term growth opportunities while ensuring long-term success. There's a lot to unpack here, so I'm going to break it down. First, sustainability is a continuous practice. Even if we spend a lot of upfront time thinking about how to ensure long-term sustainability, circumstances change, often quickly, so we need continuous avenues to ensure long-term success. Second, enabling short-term growth opportunities.
Sometimes those risky short-term growth opportunities are what lead to long-term success. The emergence of bundles has worked really well for some companies, including The Times. I'm sure at least some of us have heard of Wordle, or New York Times Cooking, Wirecutter, et cetera. Those were really revolutionary decisions for us that have enabled us to sustain our products despite a rapidly changing political climate. [chuckles] So, we don't want to give that up, but putting on my reliability management hat, we also need to prepare for the risks of those opportunities. Which leads us to the third component: ensuring long-term success. We frequently see companies take their core business for granted in the name of growth, right? But for every successful growth opportunity, most opportunities fail. So, preparing for this type of risk is essential. Now that we've defined this goal, which is organisational sustainability, and how platform engineering can lead to it, let's head back to the principles I mentioned. Since DevOps is a very principle-driven discipline, we'll walk through one of the primary frameworks popularised by Jez Humble, who has authored multiple books in the DevOps space, including The DevOps Handbook (which I believe Patrick is also a co-author of), Accelerate, and Lean Enterprise. Patrick is speaking right after me, in case it's lost on you. [chuckles] Most of us have probably heard of the CALMS framework, which is basically a set of principles that should be at the core of DevOps organisations. Now, while all five are easily applicable here, I'm going to focus on just four: culture, automation, lean, and measurement. Starting off with culture, the CALMS framework tells us that DevOps drives a culture of continuous improvement and reduces silos by intentionally sharing knowledge and feedback.
The same is true for platform engineering, but I'll talk about it more directly in the context of community. In DevOps, we often talk about breaking down silos, and the way we do that is by sharing knowledge. And to share knowledge ultimately means to connect. This is also why I left out the sharing principle of the CALMS framework: I think it's too intertwined with this kind of culture, one that emphasises collectivism without sacrificing individual needs. This mindset of holding both the individual and the wider community in tension is absolutely lacking in our industry and the wider world. It's easy to forget about the importance of collectivism when individualism is constantly being reinforced by outside forces. That's why it's particularly important for us to be intentional in the ways we work and connect with each other. Connection and communication are key to preventing the silos that would hinder our ability to make continuous progress. When we're talking about organisations, especially as they grow, the most effective way to manifest this culture of sharing is to think about how we can cultivate a strong community that fosters it at scale. Because ultimately, the opposite of isolation is to be in community with other people. And the reason this is so important is that, more than anything, learning through sharing is the most sustainable advantage we can give ourselves. This quote is by Andrew Clay Shafer, who is often known as a co-founder of Puppet. He said this in his talk about sociotechnical systems at DevOpsDays Amsterdam. The way I interpret it is that, because our industry is always changing, being able to keep up with this change is the biggest advantage we can give ourselves. But to do that, learning needs to be part of the organisation's DNA.
And while I agree with him, I like to modify this slightly to emphasise that communal learning is the most sustainable advantage. Because again, while our individual growth is important, if this knowledge isn't being shared intentionally, we introduce single points of knowledge. Like our technical systems, humans can't be 100% reliable, so we shouldn't be putting anyone in the position of being a single point of knowledge. This is how silos are created, and this is ultimately how silos become an organisational pattern that hinders our long-term sustainability and the way we share information. In other words, communal learning is what provides the knowledge redundancy needed to sustain both ourselves and the business. Next, we have automation, which improves our software delivery process by reducing human error, improving efficiency, and enabling faster delivery. This means thinking critically about the type of work that doesn't require business-specific knowledge and figuring out whether that work can be consolidated into software that's managed primarily by platform teams. That way, we can reduce the cognitive load that product engineers often take on by managing all aspects of their software. The type of work that can be consolidated away in an automated or centralised way is work that's repetitive or manual, which SRE often refers to as toil, or what platform or product engineers might call boilerplate. Another aspect of platform engineering is being explicit about improving efficiency by leaning into solutions built by third parties, whether vendor solutions or open-source ones. The reason is that we need to reduce our own cognitive load and maintenance burden just as much as product engineers do.
To provide some history on the lean movement, its principles focus on how to create value for customers through systems thinking, by creating constancy of purpose and embracing scientific thinking. This principle is very much intertwined with the cultural principle as well, but I kept it distinct because of the idea of assuring quality at the source. I don't have a dedicated slide for this, but in the same talk I mentioned earlier, Charity Majors discusses the role of seniority in platform engineering, more specifically whether non-senior or even non-staff-plus engineers have a place in platform engineering. I still haven't made up my mind on how much I agree with this, but I do think there's some kernel of truth in Charity's point that there isn't enough room for non-staff-plus engineers. And it's uncomfortable to say this, because what reasonably good person doesn't want to be welcoming to junior folks, right? Especially given that I began this talk by emphasising the positive impact this space had on me long before my staff engineering days. What I was trying to get at is that it's easy to say all this when I'm already a staff engineer. But the way I've come to reconcile this is that platform engineers are supposed to be stewards of quality for our platforms, and that this is supposed to encompass basically what it means to build software at our company. I think that is the unique place staff engineers can have. But at the same time, we have to be intentional about how we mitigate the unintended consequences of emphasising seniority. Ultimately, I think that has a lot to do with being able to invest in less senior folks, since they're the future of this industry and because that's what will keep our industry as a whole sustainable.
And so, as exemplified by the state of layoffs these past few years, there's a strange underappreciation of, and aversion to, investing in less experienced talent. At some point, that's going to catch up with us. So, what do we do in the interim? I think both staff product and platform engineers hold the responsibility for investing in less experienced engineers through mentorship, leading by example, and thinking sociotechnically. But one distinction between product and platform engineers is that platform engineers are often best set up to centralise operational and platform knowledge, because the very nature of centralisation means that platform e