Efficiency unleashed: The Ericsson playbook for embedded system development | Ericsson
Discover how Ericsson transformed its embedded system development with a decade-long journey from annual releases to agile weekly deployments. Principal Developer Kristofer Hallén and CI/CD Architect Conny Wickström reveal practical strategies for boosting efficiency, decoupling tools, and enhancing CI/CD pipelines in highly regulated environments. Gain actionable insights for improving software delivery and orchestration in complex embedded systems.
Transcript
[intro jingle]
[Conny Wickström:] So, hello, nice to see so many of you here today. Since I guess you have rested your arms from the morning exercise, we're going to do that again. So, we ask all of you to raise all your hands, or one hand is enough and keep it up. And if you work in a software company or a project with more than 100 developers or testers, keep your hands up. Okay. If you work in a project or company with more than 500. Ah, still quite a lot. That's great. And if you feel that you struggle with speed and efficiency in your software development flow. And it's okay to raise hands if you've put them down. [laughs] Okay, so that's good. We even got a few more hands up again. Great, you can take down your hands. It proves that, at least, you're in the right room, because we're going to talk about the challenges and what we have done. Not necessarily solving everything, but at least working with the challenges. And as you know, we come from an old and large software and hardware company actually. So, let's get started, and Kristofer will tell us.
[Kristofer Hallén:] Some context, I'm Kristofer, this is Conny. We are from Ericsson. We are from the mobile part of Ericsson. We're making these things that connect your phone. Without us, you probably wouldn't have a connection today. So, mobile networks is what we're doing. Ericsson is quite an old company. In 1876, started out repairing phones, building some of the first phones ever. We're still here. We're a world leader in mobile networks. So, during that time, we've had to reinvent many times from cables and stuff, and now into a lot of wireless things. To give you a bit of a context, and that's also why we're asking about the size, we're working with a lot of things at large scale. So, a lot of people, a large global market. But we also work a lot on custom hardware and many different platforms to run things on. And our products are highly regulated.
A lot of almost extreme requirements. You are always supposed to be able to call an ambulance if you need it. It's not supposed to go down. And we have a lot of software on these systems and many platforms, but for us, we always come back to this, it's not only software. In the end, there's always an antenna. We need to relate to the physical world. So, that's a bit of the context where we will tell the story today. I'm Kristofer, I've been a developer in both telecom and automotive, architect, but more and more I've come into development environment, CI/CD, release, supporting developers, making sure they can actually bring a product to customer.
[Conny:] Yes, I'm Conny, I worked as well as a developer. I started in the 3G era, and I worked through 3G, 4G, 5G and 6G. And now, I'm mostly focusing on CI and CD and tools and building systems to support large development organisations. So, that's us.
[Kristofer:] Starting with our story here today, the learnings we will talk about. We had a reason to change. Our title here is Efficiency and Speed. Somewhere 15 years ago, we were developing 4G. We ended up being not fast enough. So, we came from a situation where every new generation of mobile networks, 2G, 3G, 4G, was developed as a separate product in its own context. Delivery of software maybe once a year. That started to change. So, product changed, we now wanted to have many Gs, generations, on the same platform. We had expectations to deliver much more frequently. So, I still remember that our release organisation practically said, "We give up, we can't handle this anymore." They used to collect the information and mails and documents and tried to put it together. Didn't work. So, we needed to reinvent. We needed to do that in this scale I've talked about, hundreds, thousands of people, many mobile generations on the same platform, and deliver much more frequently. Starting out then with the challenges we had or we ran into.
[Conny:] Yeah, we will highlight a few of them. Of course, there are a lot more, but a few of them. And as Kristofer said, there is always an antenna, and there is always a lot of hardware. Even though we are moving to cloud-based products, and we're kind of moving with the IT industry a lot more today than we did in the past. But there are still base stations which talk to your cell phone. And they're usually on the roof of buildings, in basements, in the subway, out in the forest somewhere, out on a mountain somewhere. So, it's usually very expensive to go there and replace this hardware. So, the hardware needs to live for a long time, maybe like seven, eight years, which is a long time for an IT-like system. And of course, we need to support this hardware with software during that lifetime, and as you know, it might mean that we even introduce a new G during the lifetime of this hardware, and that usually needs to fit on old hardware. So, we need to test a lot in-house on old hardware, new hardware that we'll continuously develop. But we also need a lot of test equipment since this is a real-time, or at least near real-time system for signal processing, and a lot of things that we need to verify. We even build radio chambers, like big rooms, several square metres, that are isolated for radio waves, and we build numbers of them just to test our radio spectrums and test our signals and how the phone connects to the network. So, all of this is very costly, it takes a lot of floor space, and of course, it takes a lot of time and effort to manage and keep it running. So, that was one of the biggest challenges. Another challenge is very much connected to the first talk of the day, that over time, software systems tend to get...
We introduce a lot of dependencies, we have software that was written 15 years ago that is still running, we still have, as you know, with your phone today, you can still do GSM or 2G, which was introduced 35 years ago, and it still runs. And so, we have a software or code base that can do 2G, 3G, 4G, 5G, all in the same code base, and it needs to evolve. And over time, of course, we have collected a lot of technical debt, a lot of dependencies that are hard to get out of. And with, let's say, thousands of developers working on this system continuously across the globe, it is a pain to get time to merge your commits or to make sure that your commits don't break anything else that you didn't think of. So, that's a different challenge that we have. And then, if we take the third area that we will talk about, it is what we just heard from Atlassian, with information. If you have a large organisation with many developers and a large system that you are developing and need to deliver, of course, information is very, very important to make the organisation move at a decent pace. And this has proven to be a lot harder than we think. So, we'll talk a bit about that as well.
[Kristofer:] It feels like we are connecting to the other speakers here. So, we were too slow. We realised that there were some main areas where we really needed to do something. We now call this the playbook. Of course, we didn't just start by writing all the solutions down from the start. We didn't do everything right from the start. This has been more of a journey, and here, we've summarised it. Also, these things that we presented here, it's not like you can solve them in isolation many times, they are connected.
But where this ends up, and what we're trying to describe here, is the choices we made that made sense, and then continued to make sense, and that we then still believe in. Of course, we're still learning, still getting new challenges, but here we'll try to show what we have done and what we think works for us.
[Conny:] So, now we have the happy people. So, what we have worked with, and when it comes to hardware and equipment, of course, we have physical equipment and virtualised test tools, or even virtualised products. But to manage a large pool of these resources, we addressed it in two different ways. One, when we work with legacy verification, keeping things working as we develop new things, or we change the system, or we fix things, it's important to test legacy continuously, every day, every day, the whole year. And for legacy and compliance type of testing, we like to focus on a limited set of configurations, configurations that are representative of a large part of all our customers' installed base. Not everything, we don't test all corner cases, but it represents a large part of what we have out in the world. And then, by standardising and keeping it to a smaller set, we can pool things, and we can run it 24/7 with a very high utilisation to get as much test as possible per invested Swedish krona. But then, we have different use cases where we develop features, or we troubleshoot a problem somewhere, or we do exploratory testing, we want to try corner cases. And for that, we usually need configurations that are tailored to that feature that we develop or tailored to that type of problem area we're trying to troubleshoot. So, then we don't want to work with these more, let's say, static and high-performing types of configurations.
We want to build them on demand with a lab-as-a-service and software-configured telecom networks basically, that we can spin up quickly and easily and have them short lived for a certain purpose and then tear them down. And the challenge is combining that with these large capacity pools. But that's two different scenarios, and one will not replace the other. And that's important to keep, that you have two types of environments. As Camille talked about, we have a lot of legacy and dependencies, and we worked very, very hard, and we work continuously with decoupling the code base, because we want to make sure that we don't have too many developers on each repo or on each code branch, because that's a learning that you very quickly end up with a lot of developers that want to commit changes, features, fixes, improvements, but they are short on time, or the time window to get your time slot to merge things gets very short if you are many people on the same code branches. So, that's one thing that we want, to split the code base up. A second thing that we worked a lot with or work a lot with continuously is this system decoupling. Since a telecom system is a large system, we work a lot with managing interfaces between components. How can we give them independent life cycles? Because that is also valuable when it comes to testing these components. We can define functional requirements on a component. We can define characteristics requirements on a component. And we can test them to some degree, not completely, because we still sell a system, on a cheaper, faster, more easily accessible environment. So, those are two of many aspects of decoupling that we worked a lot with.
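[Editor's sketch:] The two kinds of test environment described above, a standardised pool kept busy around the clock and a short-lived lab built on demand, could be modelled roughly as follows. This is only an illustration under stated assumptions: `StandardPool`, `on_demand_lab`, the rig names and the spec fields are all hypothetical and do not describe Ericsson's actual tooling.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class StandardPool:
    """Hypothetical pool of identical rigs for continuous legacy testing."""
    config: str
    free: list = field(default_factory=list)
    busy: set = field(default_factory=set)

    def acquire(self):
        # Every rig has the same configuration, so any free one will do.
        if not self.free:
            return None  # the caller would queue the job until a release
        rig = self.free.pop()
        self.busy.add(rig)
        return rig

    def release(self, rig):
        self.busy.remove(rig)
        self.free.append(rig)

    def utilisation(self):
        # Share of rigs currently running tests, between 0.0 and 1.0.
        total = len(self.free) + len(self.busy)
        return len(self.busy) / total if total else 0.0


@contextmanager
def on_demand_lab(spec, log):
    """Hypothetical short-lived lab tailored to one feature or investigation."""
    lab = {"spec": dict(spec), "state": "up"}
    log.append(("built", spec["name"]))
    try:
        yield lab
    finally:
        # Tear-down always runs, even if the test inside raised an error.
        lab["state"] = "torn down"
        log.append(("torn down", spec["name"]))
```

In this sketch the pool is judged by utilisation, while the on-demand lab is judged by how quickly it can be built and how reliably it is torn down, which is why the two are kept as separate mechanisms rather than one replacing the other.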
And then, when it comes to information systems, we heard that, and apparently, there are a lot of vendors that sell different types of information systems, and we have maybe arrived at the conclusion currently, that might change over time, but it's not necessarily most important to have the one and only information system. It's important that people know where to go for what information. So, we have a number of different information systems but for different purposes, and it's very obvious where you should turn. And it's also very important that data in those different systems is coherent. So, you can compare, what you see from over here relates to what you see when you do a release. Your commit is consistent with what's in the product when you release it, for example. So, those are our learnings. And the last thing that we want to throw in a little bit, again, we heard in the Atlassian talk just before, that a tool itself will not save you. You need a lot of culture on top of it. And I think that was very, very timely that we had Atlassian just before us, because a strong development culture and leadership that we just heard is more important, I would argue, than the tool you use. The tools are enablers, they will make you more efficient, but they won't solve the problem for you. And that's something we have learned, and we like to maybe highlight two things. This is also an area with many different aspects, but especially, embrace change, like we heard Camille talk about. CI and CD and DevOps is an ever-changing story. You don't reach a certain goal and you're done, and you can go home. This is a thing that changes every day, every year, continuously. And you have to follow the products that you are supporting, more or less. And they always change, and they always get better. And the second thing is, celebrate your successes.
Don't forget that you actually do a lot of good things, because it's so easy to forget about CI and CD and your pipelines and your information systems when everything works. But as soon as something breaks, as soon as you push something bad, then you will be on the top of everybody's speed dial for some time. Then, you will be the most unpopular guy in the organisation. So, celebrate when people don't call you, because then you've done something good. Don't forget that.
[Kristofer:] Going back to this with speed and efficiency. Can you be a large company, with a lot of people, and still be fast, or do you have to be really small? Our conclusion is that, yeah, we can be big and fast, but also, especially when reflecting on this, what we should actually say is that you probably need to define what speed is for you in your context. So, some of our reflections here, one type of speed could be deployment frequency, DORA-like. Can you do things very frequently, ship stuff? And yes, we did, we had somewhere around yearly releases, now we can bring out software every day. That we think is a high speed for us. But it's one aspect, and that might not really show customer value. So, we can look into other things, too.
[Conny:] Deploy frequency gives you time slots to fill with value. And value is many times features that you can monetise in some way. It could be capacity in a system for us. How many cell phones can you handle? How many continuous 8K video streams can you stream to your phone or whatever at the same time? So, speed could also be how much customer value, and that's also noticeable value, because if you look at your phone today, for example, you get pushed out a lot of changes, and you're asked to reboot it every week or so or update apps. But at the end of the day, how much value do you actually get out of that? Of course, you get a lot of security updates, so we shouldn't forget about that, but how happy are you after each reboot?
Do you really see a big game changer? So, it's important to ship often, but you also need to fill it with value. And the third thing, I think we heard it from Camille again. It was really good, [chuckles] all these talks before. This course correction that sometimes you have to adapt to a changing reality, changing customer needs from idea to deployable value, that's also a notion of speed. How fast can you change and adapt to changed needs?
[Kristofer:] Summarising our playbook, what we did during this journey, probably, we're not done and never will be, but pooling, some kind of standardising, making sure you can get things, but that they are available in the pool when testing resources, trying to find ways to decouple people so you don't have to wait for others, actually bringing information, or this, making sure you know where to find the right information, and adding some culture part, especially for us, as Conny said, when we usually don't notice when things work really well. That has brought us speed in some different dimensions, and some we still need to work more on. I have not mentioned that much about tools, and we have done [chuckles] some reflections here. We've also gone into if there's one tool to solve it all. We've never really managed to do it, but we definitely see tools as important enablers to be able to, for example, do these kinds of things, but they bring a capability, and it might not be that it's only this, also kind of looping back, not one silver bullet, but you need good things that give you good capabilities, and of course, you can't just use anything, you still need to balance with competence and a budget to actually run stuff. And then, we're getting to the end of this. I would be very happy to have some time for discussion now, if someone wants it, or if you grab us by the coffee or challenge us or say that you would like to discuss and learn more from us. So, thank you.
[applause] [outro music] [music stops]