
Bruce Lee on DevOps: Master your tools, deliver better software

Bruce Lee - Photo by Yaopey Yong from Unsplash

Bruce Lee once said, “I fear not the man who has trained 10,000 kicks. I fear the man who has trained a single kick 10,000 times.” In martial arts—and in tech—there’s always the temptation to chase what’s new. Another move, another weapon, another shiny technique that promises an edge. But that urge often distracts from what really matters: mastering the fundamentals.

Chasing novelty, the martial artist loses sight of the necessary groundwork and precision. Training more ways to kick is useless unless you have brought the kicks you already know to a level where they are actually useful.

Piling on new techniques instead of perfecting existing skills means the martial artist is wasting precious training time. It might actually be better to spend that time on a completely different sport than to be distracted from the main goal: getting better at martial arts.

I am not excluding myself from this. “Have you seen the Wu Dang Eight Immortals Sword ‘Ba Xian Jian’? I absolutely need to learn that!” When thoughts like this enter my mind, I need to have a friend nearby who reminds me of Bruce Lee.

The equivalent of 10,000 kicks in tech companies

In most modern DevOps and engineering teams, new tools are introduced too fast and without enough consideration, often to the detriment of the customer-facing product. Time and resources are limited, and spending them on the wrong topic directly impacts the product quality.

A real example is a team that was running a few very small services on VMs: small Python scripts inside Docker containers on stable Ubuntu machines. Maintenance and updates were fast and could be done by literally anyone with a basic understanding of Ubuntu and Python.

One day, the team decided to move these services to Kubernetes in the cloud simply because Kubernetes is awesome. To do that, additional layers of automation were introduced using tools that were not widely known within the team. Then, to make use of the awesomeness of Kubernetes, the VMs were replaced with a Virtual Machine Scale Set and redundant network storage, adding a fair bit of Azure to the team’s portfolio. To monitor the services and Kubernetes properly, the default monitoring backend was replaced, adding yet another database that was previously unknown to the team.

At this point, solving a problem with the small services required so much additional knowledge compared to the previous setup that a knowledge island had effectively been created, because most of the team simply did not have the time to catch up. Further development slowed down, and troubleshooting became a serious undertaking that took hours or days.

All of that for no apparent benefit to the product and a steep drop in productivity. 

Another very typical pattern involves Jenkins: Imagine a company running the ever-popular Jenkins in the typical outdated installation. The team decides to move to GitHub, but because the old pipelines are complicated, it never finds the time to migrate all of them. Now there are two pipeline systems in use. At some point, a few other teams move from GitHub to Azure DevOps, and this encourages the hipster team to make a case for its own solution, which nobody has ever heard of but which supposedly solves all problems because it is new, small, and shiny.

At this point, Jenkins is still a problem that requires exactly the same amount of dedication, and on top of it, the teams have created knowledge islands and barriers.

Should we all go back to Perl, then?

We could fill hundreds of pages with examples, and they would all sound familiar. 

Adding new programming languages to the stack, new monitoring tools, new cloud providers, new automation tools; migrating from one tech stack to another because the new one promises to be better, even though you do not actually know. The list is long.

I am not advocating for only using Perl on Unix just because that combination gets all jobs done. That is too extreme. But in many cases, an addition to the tech stack or a migration to a new stack does not yield the desired results or has unwanted side effects. And you have to admit that those old-school Perl/Unix guys know absolutely every little corner of their environment, which is something nearly nobody else can claim.

I worked in several companies where the tech stack was very limited for various reasons, and the effect was that people understood the code, infrastructure, and product across team boundaries. Those were the only truly agile companies I have seen. 

As a consultant, one of my litmus tests for the maturity of an organization is to look at how many tools they have in place. Are they using a small selection of tools to the maximum value they can get out of them? Or do they have a sprawling tool anarchy? The answer lies on a gradient, of course; there is no black and white in real life.

Why not add new tools

A short list of things that can get worse after adding something new to the stack: complexity, productivity, maintainability, team focus, and knowledge sharing.

Complexity: The silent killer of modern engineering

Complexity erodes productivity faster than bad code. It’s dangerously easy to let your DevOps toolchain grow until nobody fully understands it. A new tool never comes without hidden costs. Additional infrastructure requirements, new types of databases, new build systems, new package managers, building up knowledge, sunsetting old tools, migrating data: all of these and more are hidden costs on top of the new tool itself.

Someone once said that reducing complexity is the most important work of any senior engineer, and that is absolutely true. Most modern companies have lost the fight against complexity and are no longer in control. That makes them slow, reduces stability, drags out troubleshooting, and makes all development expensive. Their usual answer is to add a few more layers of complexity, when what they should do is reduce it.

Productivity: The hidden cost of introducing a new tool.

There is an obvious cost to introducing a new tool: It takes time.

Additionally, there is a hidden cost in the form of increased cognitive load because the increase in complexity requires engineers to split their efforts. Maintenance costs are doubled, time to read documentation is doubled, and engineers need to remember when to use which tool and which data is stored where. Those are small things, but they add up. 

Consider the typical case of a company that has several knowledge bases. Every time knowledge is not found because people look in the wrong tool, productivity is lost directly, and on top of that there is the potential to implement the wrong thing.

Productivity can slow to a crawl when the complexity of the system becomes too great.

Maintainability: If it’s nobody’s job, it will break.

Someone has to maintain the tool. That might be an internal team, or your company decides to outsource it to another company. But either way, there is a direct cost in time and money to maintain the new tool, and this cost is not going away anytime soon. Even for SaaS solutions, once they get integrated into the development workflow, the option to shut them down on demand is purely theoretical.

IT Ops has a bad reputation for making maintenance costs visible by clearly demanding the resources for any system they have to maintain, which is actually a good thing! When nobody understands the tool well enough to do maintenance, regular admin work, or updates, it is only a matter of time before the shiny new tool turns into a big problem. And even if someone does understand it fully, they still need to be allowed to spend time on the tool.

The common case where a happy evangelist for a product advocates the introduction of a tool and is then the sole maintainer of it will end badly when that person leaves the company or is rotated to a new team. When someone tells you that they can maintain a new tool on the side, that is a clear warning sign that something is going to blow up in the future.

Team focus: Simplicity keeps teams aligned.

As an example, having one build system or one monitoring system means that it will always be in focus. Problems or updates will be identified and taken care of when they arise. As soon as there are multiple such systems, engineers and management will find ways to evade the unpopular work of fixing things in one tool and instead do the more rewarding task of rolling out features in the other. Data updates get forgotten, data-gathering jobs keep consuming resources long after they are needed, and productivity is lost when engineers have to switch their focus between different tools. Perhaps worse, especially with monitoring or knowledge-sharing tools, managers can make the wrong decisions because they tend to focus on only a few tools and get their answers from the wrong one.

A subset of this problem is ownership. A team (not a person) must own the tool with regard to administration, updates, and general use cases. This team must keep a close eye on usage and ensure that the tool is not used outside of its specified and agreed-upon purpose. Without such an owner, unfocused work begins around the tool, and over time all sorts of irregular use cases get covered by it, adding a lot of confusion for the overall engineering team and making maintenance difficult.

Without a clear owner of its cloud repositories and without clear guidance on how to use them, one company found out that it was about to hit the storage limit for repositories because one team was uploading lots of binaries. Fixing this and moving the binaries to Artifactory was a lot more costly than using Artifactory in the first place.

Knowledge sharing: Shared understanding beats individual expertise.

The knowledge of how to use the new tool and how to maintain it must be shared and kept up to date in all relevant teams. This might be a visible cost or just appear as delays on tasks, but it will always be there. 

A special case is a new programming language: A good engineer can catch up with any new language in a few days, but it will still take months or years to become really fluent in it, and until then, time-consuming refactoring tasks will keep popping up to bring the new language’s code in line with the existing quality standards and best practices.

The question of how a new tool is used alongside existing tools must be answered carefully, lest wild-west usage make it hard or impossible to track the data flow. Consider the binary example above: that is clearly not a good use of a Git repository. New hires had to spend additional time learning about it the hard way, researching where data was stored took longer because it was not in the obvious place, and storage-limit monitoring (had it been in place) would have needed to know that there are two storage types.

But AI is going to fix it!

While this opinion is popular with many companies that have lost the battle against complexity, it is also wrong. Ultimately, adding AI on top of chaos is just adding another layer of complexity, where you have even less control. If your complexity has grown to a point where no human team has a chance to understand what is going on, you are sitting on a ticking time bomb.

I highly value the use of AI tools in my daily work and recommend using them to get a handle on your complexity and reduce it instead of blindly adding to it.

How to stay focused

Whenever someone wants to add a new tool to your stack, pause and ask:

  • Do we really need it? 
  • What’s the total cost—including maintenance and learning curve? 
  • Can we get satisfactory results by utilizing the features of an existing tool? 
  • What is the actual benefit to the product?

Then, after a while, ask the same questions for every tool you already have and prune your stack the way Bruce Lee trained his kicks—by focusing on what truly works in the fight.

When you actually do introduce a new tool, build an MVP (not a POC) first and ask yourself the questions above again after the MVP. You are likely to have gained new insights, and there is a good chance you will find out that you do not need the new tool after all.

Stop training the 9,999 kicks in favor of the one that is really going to work in a fight.
