Agile metrics provide insight into productivity through the different stages of a software development lifecycle. This helps to assess the quality of a product and track team performance. Good metrics are invaluable, but bad metrics can make an agile team’s life miserable. In this blog, I will look at some of the most common metrics mistakes and offer advice on how to avoid the pitfalls.
What is the point of metrics?
What results do we want? How do we measure success? The possibilities are limitless, and it is not straightforward to choose good metrics to track.
A good metric measures something of value and gives you information that enables value-producing action. Conversely, a metric you can’t act on is simply a vanity metric.
Bad metrics abound, and they can have serious drawbacks, such as:
- They measure something which is not relevant to value creation
- They have a terrible impact on team motivation
- They are subject to gaming
Don’t measure the wrong things
A common mistake is tying metrics to annual performance reviews and monetary bonuses. This can lead to targets being decided 1-1.5 years in advance and then frozen – and we all know this is patently absurd in the agile world.
A frozen bonus plan reduces team agility. The team may actually know of something more valuable that it should be working on, but ends up working (quite rationally) on the things listed in the bonus plan.
It would be far better if team performance bonuses were tied to the actual business results of the product the team is contributing to.
Don't kill team motivation
When a team knows that the metric makes no sense, the more management pays attention to that metric, the more demotivated the team becomes. A good metric is encouraging, not discouraging.
Stop the gaming!
Bad metrics are easily gamed as they encourage a team to do things to make the metrics look good. Things that would offer more value are put on the side. The team may even stop thinking about what creates value and only focus on what is measured, no matter how silly it is.
For example, if we measure errors that have been fixed, it will lead to many errors being fixed.
Is this a bad thing? Maybe not directly, but consider the following scenarios:
- What if the team instinctively starts to report many minor, easily fixable errors? Is fixing these small errors the most valuable use of the team's time?
- What will happen to errors that require more investigation or fixing effort? Does the team work on them more reluctantly? Does the team instinctively stop finding these errors?
- Does the team still try to create designs that produce fewer errors in the first place?
Examples of bad metrics
Here are some metrics that may look sensible on the surface but should be avoided. For some, I also offer alternative approaches to measure the desired results.
Changes to specification
If you start to measure how many times a specification (or an epic description) has changed, what information does this give you? Does a high number of changes tell you that the team is refining the specification based on customer feedback or prototyping?
Or does it simply tell you that the original specification was awful to begin with?
A good metric measures something of value or some activity that leads to value. When someone starts to measure changes to the spec, we can assume they want to see how well the specification is "evolving". However, this is a lousy metric because it can be easily misinterpreted.
Let's look at the activities that lead to good (and valuable) work definitions:
- Identifying uncertainties and testing them
- Getting customer feedback or behavior data early
- Refining the work definitions before final implementation
- Working in small enough chunks of work
- Delivering continuously to production
- Following up on the feature in production
It is difficult to see how the number of changes to specification would ever reflect how well the team is doing these things.
The number of changes is also an easy metric to game. Just change one dot, save, then change another small thing, save…
What could be a more useful metric to measure how well a team responds to feedback? I have a couple in mind:
- Amount of market-facing tests done per week
- Speed of getting statistically significant results from market tests (days)
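The second metric above implies a significance check on each market test. As a minimal sketch of how that could work, assuming a simple A/B test with conversion counts (the function name and all numbers here are invented for illustration), a two-proportion z-test tells you whether the test has reached significance yet:

```python
import math

# Hedged sketch: decide whether an A/B market test has a statistically
# significant result, so a team can track "days until significance".
# All names and figures below are illustrative assumptions.

def z_score(conv_a, n_a, conv_b, n_b):
    # Two-proportion z-test with a pooled standard error.
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# After a few days of collecting data:
z = z_score(conv_a=120, n_a=2400, conv_b=160, n_b=2400)
significant = abs(z) > 1.96  # ~95% confidence, two-sided
print(f"z = {z:.2f}, significant: {significant}")
```

Recording the date a test starts and the date this check first passes gives the "days to statistically significant results" number directly.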
Code changes (code churn)
It is difficult to know whether a large amount of code change is a good or a bad thing. High churn could reflect any of the following:
- The team gets feedback on a proto, demo or release, and decides to change the implementation
- The team continuously refactors the implementation to meet current and future needs
- Early specification and refinement of the feature was so poorly done that there is a continuous need to change the implementation during construction
- Poor testing leads to massive rework late in the release cycle or after the release
Purely measuring the number of changes does not give us valuable information and is also subject to gaming. This is simply a poor metric.
When you start measuring how many commits a developer does, what do you measure?
The purpose of this metric may be to motivate the developers to check in their changes in small batches rather than large ones. However, this metric is gamed very easily, incentivizing developers to make a stream of trivial check-ins just to drive the number up. The metric loses value very quickly and is therefore a bad one.
Unit test coverage
Unit tests are needed, but what is the correct coverage level? 100%? Or 80%? Or is any percentage actually a bad metric? Read this for a funny story on unit test coverage.
If you start to measure unit test coverage, it can easily focus the team's attention in the wrong direction. A better question is: what tests are really needed, and do the developers know and have a routine of building effective unit tests as part of Definition of Done?
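To see why a coverage percentage can point attention in the wrong direction, here is a hypothetical Python sketch (the function and its test are invented for illustration): a single happy-path test executes every line, so a coverage tool would report 100% line coverage for the function, while an obvious bug goes untested.

```python
# Hypothetical illustration: full line coverage, yet a bug slips through.

def average(values):
    # Bug: raises ZeroDivisionError when `values` is empty.
    return sum(values) / len(values)

def test_average():
    # This single assertion executes every line of average(),
    # so line coverage is reported as 100% -- but the empty-list
    # case is never exercised.
    assert average([2, 4, 6]) == 4

test_average()
```

A team chasing a coverage target could ship this code with a green report; a team asking "what tests are really needed?" would add the empty-input case.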
Velocity
Velocity is a good metric for a team, but only in helping the team decide a good level of work commitment. Velocity becomes a flawed metric if the team or management starts to do any comparisons using it as the measuring stick.
The reason for measuring velocity is to decide how much to load to a sprint. Additionally, the Product Owner can use the velocity value to estimate when something would be done (on the product backlog) or when a release scope is ready with a release burn up chart.
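As a sketch of this legitimate use of velocity (all numbers are invented for illustration), the team's recent velocity can be averaged to guide how much to pull into the next sprint, and to give the Product Owner a rough forecast of the sprints remaining for the backlog:

```python
# Hedged sketch: using recent velocity only for sprint loading and a
# rough backlog forecast. The figures below are illustrative assumptions.

recent_velocities = [21, 18, 24, 20]  # story points completed per sprint

# Average velocity guides how much work to load into the next sprint.
avg_velocity = sum(recent_velocities) / len(recent_velocities)

# The Product Owner can estimate when the remaining backlog might be done.
remaining_points = 130
sprints_left = -(-remaining_points // int(avg_velocity))  # ceiling division

print(f"avg velocity: {avg_velocity:.1f} points/sprint")
print(f"estimated sprints to finish backlog: {sprints_left}")
```

Note that this forecast is only meaningful inside one team, with one shared estimation baseline, which is exactly why cross-team comparisons fail.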
Comparing one team to other teams using velocity values is a terrible idea: this might be one of the biggest agile mistakes management can make. Teams cannot be compared with each other using velocity values. The velocity comes from the pre-work effort estimates that are affected by many things. If such velocity comparisons are started, the metric becomes instantly subject to heavy gaming, and the benefits are partially lost.
Management should instead offer help to the teams to find the optimum refinement and pre-planning time investment level. Many teams are stressed and pressured to deliver. As a result, they invest too little time in refining the backlog. To read more about backlog refinement, see this earlier blog post.
Teams should not even make comparisons against their own velocity history. As a team gets better or more efficient, it estimates less effort for the same kind of work than before – velocity self-corrects and cancels out any "we are better now" gains.
Equally bad metrics are "Story Points per developer" or "Story Points per hour". These kinds of metrics should just be avoided.
Lines of Code
If you measure how many lines of code the developers are producing, they will probably write more lines of code than you need.
Think about it: Do you want more code or better code? What should the code review part contribute to the software creation? Should other developers think of ways to further increase the amount of code? Measuring lines of code is simply stupid.
Measuring LoC is also a bad metric because it is easily gamed. Even flipping the goal around – trying to write as little code as possible (with the aim of doing as simple and elegant an implementation as possible) – will likely also backfire.
Hours spent on a task (or time at the office)
Agile teams should estimate the effort of work before starting it. This is the basis of velocity and should be the measuring stick against which a team judges the sprint commitment. But the work toward a task should not be measured after it is started.
After a task has been prioritized high enough to be in the sprint content, the team should see it through to the end until it is done. This means passing all tests, acceptance criteria, the Product Owner's approval, and demo and customer feedback. It should not matter how much effort is spent on it.
The work items that pass the Definition of Ready filter should be small enough that they cannot mushroom into months of work – if they do, this is a rare exception. Certainly, the Product Owner knows about it and can decide if the task still offers a good return on investment.
Logging hours to a task only has downsides:
- It may cause the developer to think that the task must be finished once the estimated amount of hours has been spent.
- Spending less than the estimate feels wrong, and the developer may gold plate the implementation.
- Spending more time than estimated starts to raise eyebrows outside the team – why is this happening? Are you so bad at your job that you have to spend more time than initially estimated on the task?
Solution: do not measure hours spent at all, and simply use the Definition of Ready to ensure that you don't start tasks that are too large.
I have also heard horror stories about companies sometimes measuring "time at work". As if being at the office would somehow automatically lead to good value results. I think it would be less insane to measure "cups of coffee drunk at work" than "hours at the office". At least cups of coffee could indicate opportunities to socialize during the coffee break, which is usually a valuable activity!
An excellent example of how bad (or even mad) it is to measure hours spent: let's assign a task to a teenager – "mow the lawn" and start measuring how long it takes. When you mow the lawn, it takes you 1.5 hours. So how long do we wait until the teenager has probably completed it? Two hours? Four? Watch this and decide.
Issues resolved or bugs fixed
This sounds like a good metric. But measuring how many bugs you have fixed easily leads to the team reporting many easily fixable bugs. It doesn't really encourage good quality work upstream to avoid creating those bugs in the first place, does it?
What to do if your organization is using suspicious metrics
Measuring complex socio-technical processes such as software development is a complicated task. You can easily overlook important social aspects and produce a metric that does more harm than good. If you are measuring the wrong thing, management attention and pressure can hurt rather than help.
If you suspect that you are not measuring the right things, the best thing to do is some serious thinking as a team:
- Why are you using a particular metric?
- What information do you want to get, and why is that information valuable?
- What kind of metric would be more motivating, less susceptible to gaming, and tied to a quantity or activity that leads to value?
When you consider alternative metrics, keep in mind that good metrics:
- Offer valuable information
- Measure activities that we know lead to good outcomes
- Help teams to see if they are going in a good or bad direction
- Help discover problems
Your challenges and problems do not stay the same; they change. Likewise, your metrics should also change. A successful organization thinks and rethinks the most valuable metrics at any given moment.
Eficode can help you design and introduce better metrics for increased product development efficiency – don't hesitate to contact us to review your metrics and help you define better ones.
Published: January 26, 2022