It’s 2016. And still, some business managers seem to think that the number of lines of code (LOC) produced is an appropriate KPI for the productivity of a developer or a software engineering department. Since history has a tendency to repeat itself, I have a feeling that this will still be the case in, say, ten years. Software engineers themselves know, of course, that measuring productivity in LOC is not the way to go. However, non-technical or even semi-technical people do not know this instinctively.

One of the most important skills of a software engineer, and even more so of an engineering manager, is making technical concepts understandable to a non-technical audience. As a saying commonly attributed to Einstein goes:

“If you can’t explain it simply, you don’t understand it well enough.”

So, in the following, I am trying exactly that: explaining to somebody who has never written - or seen - a line of code in their life why LOC will not tell you how productive somebody is.

Let’s first try an example from David Herman’s Effective JavaScript, which I slightly adapted below. It is a function that fetches an array of URLs, using rather low-level functionality, with a sophisticated orchestration of callback functions and status variables. You can skim over it quickly if you want:

/**
 * Takes an array of urls (usually text files) and downloads
 * these files concurrently. If everything goes well, will call
 * the onsuccess callback with an array containing the file
 * contents in the order they were specified.
 */
function downloadAllAsync(urls, onsuccess, onerror) {
    var length = urls.length,
        result = [],
        arrived = 0;
    if (length === 0) {
        setTimeout(onsuccess.bind(null, result), 0);
        return;
    }
    urls.forEach(function (url, i) {
        downloadAsync(url, function (text) {
            if (result) {
                result[i] = text;
                arrived++;
                if (arrived === urls.length) {
                    onsuccess(result);
                }
            }
        }, function (error) {
            if (result) {
                result = null;
                onerror(error);
            }
        });
    });
}

This is quite a mouthful, with over 20 lines of state keeping and callback nesting. Now, since the publication of Effective JavaScript, Promises have been pretty widely adopted. What happens if you use the Promise abstraction, plus a utility function, is quite remarkable: You can reduce the above monster to a one-liner (provided that downloadAsync() now returns a Promise, too):

function downloadAllAsync(urls) {
    return Promise.all(urls.map(downloadAsync));
}

To a skilled JavaScript developer, the second version is instantly comprehensible, while the first version requires a lot more thinking and reasoning. Since code in a long-running project is read much more often than written, replacing the first version with the second version will save a lot of developer time in the future, thus saving the organization money.

Moreover, the single line of version 2 offers far fewer opportunities to introduce errors than the 20+ lines of version 1, so the maintenance cost - the cost of ownership, to use a vocabulary that business people are accustomed to - is much lower for version 2. It will result in fewer bugs, so we can reduce the QA time spent on this portion of the code.

In other words, more lines of code do not mean more value creation. On the contrary: Given identical functionality, fewer lines of code are clearly preferable. That is, as long as you don’t write entire functions on a single line, but I would not go that deep into the topic with non-technical stakeholders (wrong level of abstraction). If two developers implement the exact same functionality, and developer A produces 100 lines of code, while developer B produces only 50, I would prefer developer B’s version.
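For readers who want to try the second version, here is a minimal sketch. It assumes a stand-in downloadAsync() that returns a Promise; the stub below is hypothetical (it fakes the download with a timer and rejects any URL containing "missing") and is not the implementation from Effective JavaScript.

```javascript
// Hypothetical stand-in for downloadAsync(): resolves with fake file
// contents after a short delay, and rejects for URLs containing "missing".
function downloadAsync(url) {
    return new Promise(function (resolve, reject) {
        setTimeout(function () {
            if (url.indexOf("missing") !== -1) {
                reject(new Error("Could not download " + url));
            } else {
                resolve("contents of " + url);
            }
        }, 10);
    });
}

// The short version from above, unchanged.
function downloadAllAsync(urls) {
    return Promise.all(urls.map(downloadAsync));
}

// Usage: the results arrive in the order in which the URLs were specified.
downloadAllAsync(["a.txt", "b.txt"]).then(function (texts) {
    // texts is ["contents of a.txt", "contents of b.txt"]
    console.log(texts);
}, function (error) {
    // Called once, with the first error, if any download fails.
    console.error(error.message);
});
```

Promise.all() takes over the bookkeeping that version 1 did by hand: it keeps the results in order, resolves once every download has arrived, and rejects as soon as one of them fails.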

Once more, without code

Now, some stakeholders might buy the above argument with the code examples. However, others might still be skeptical, because, since they don’t know anything about code, you could tell them anything. These people might mentally shut down if confronted with actual (or pseudo) code, so I will forget about code examples and try an analogy instead.

My analogy is: Counting lines of code in an application is like counting the number of bricks in a building. Both give you a rough idea of the size of the thing. However, are they also suited to measure the value contribution?

Imagine five teams of builders. Each team is asked to build a garage for a car. Let’s assume the first team builds the garage and uses 2000 bricks in doing so. All done, you can put a car in, requirement met.

The second team anticipates that the owner of the car will soon buy a second car, so they build a garage that is large enough to hold two cars. Material used: 3000 bricks.

The third team has a slightly different idea: “We need extra-strong walls for this garage, so that the car is better protected.” So they double the thickness of the walls, and use 4000 bricks.

The fourth team read the requirements wrong. They build a long wall instead of a garage. You can park cars next to the wall, but it is not quite the same. Since they were pretty motivated and the wall is pretty long, their brick count is 5000.

The fifth team, finally, is creative and wants to impress their boss by building for future use cases: “What if somebody wanted to use the roof-top of the garage as a terrace to sit on and sunbathe?” So they build stairs leading up to the roof-top, so that the owner can get up easily. Also, they acknowledge the danger of somebody falling down, so they build a small wall around the roof-top. All in all, they use 6000 bricks.

What about the KPI values of these teams?

  • The first team, even though they met the requirement perfectly and created value (a car can park in the garage), has the lowest KPI of all (2000).
  • The second team did not stick to the specification, since they built a double garage. However, they have a higher KPI (3000) than the first team.
  • The third team stuck to the specification (the garage can hold one car), but interpreted it freely by making the walls stronger. By doing that, they reached an even higher KPI of 4000.
  • The fourth team did not even meet the specification (they built a wall instead of a garage), but their KPI is still higher: 5000.
  • The fifth team suffered from a severe case of scope creep and built things that were not requested and are of questionable use. However, they scored the highest KPI value of 6000.

Which team should actually have the highest KPI score? It depends. The first team created as much or more value than any other team, at a lower cost than any other team (1). So it would be reasonable to say that they should have the highest score.

Suppose it turns out that the third team was right to build stronger walls, because the garage owner is sometimes drunk and runs his car into them. Then, yes, the third team should have a higher KPI score than the first. However, teams four and five score even higher, even though their walls are not as strong.

Or, let us assume that the prediction of the second team comes true, and the garage owner does indeed buy a second car soon. This might justify a higher KPI score for the second team compared to the first. But again, other teams have an even higher KPI score than the second team, but no space for the second car. So even if this random prediction turns out to be correct, more than half of the KPI scores will still not be justified.

In short, there is no correlation whatsoever between the number of bricks used and the value created with them. Double any given KPI score, and what will you get? Stronger walls, maybe, or stairs, both of which nobody asked for. And if you think these examples are far-fetched: They are not. Similar things happen every day in the software industry.

Some developers will find elegant and concise solutions to a given problem, and make steady and straightforward progress. This would be team one. Others will foresee use cases that, in reality, might or might not come. For example, instead of creating a web site as they were asked to, an overzealous developer might also create a web service, or an Android app. Naturally, this comes with more lines of code. Teams two and three fall into this category. Still others, like team four, will use wrong abstractions, or will be unaware of the best available libraries and tools, or will simply fail to fully understand the problem. They might well produce numerous lines of code - but no solution. Finally, there is team five, who, instead of building a web site, decide that they need to build their own framework first (think Asana and Luna), and also fork an open source in-memory cache along the way to optimize it for their needs. A lot of code produced, but not necessarily to the point.

To paraphrase the above statement, there is no correlation whatsoever between the number of lines of code written and the value created with them. Double any given KPI score, and what will you get? Features nobody needs, or an overengineered solution, or a lot of code that does not solve your problem, or a nice side-product that you did not request. Moreover: Since your engineers know that they are measured by lines of code, they have an incentive to be overly verbose, to overengineer, or to produce code just for the sake of producing code, with no respect to the problem to be solved. All of this code comes at a cost of ownership.

Metrics should enable decision making

There is one more aspect that I would like to mention. Eric Ries justly says that the only metrics you should invest in are those that help you make decisions. Everything else is a vanity metric that might make you feel good, but is pretty useless. LOC is such a vanity metric. You feel great when your team churns out 5000 lines of code in one week. However, if your “lines of code per week” metric drops to 4000, what exactly does that tell you? Did developers write fewer tests? Were they caught up in meetings more often? Have they introduced a new framework that allows them to delete a lot of now-unnecessary code (a reason to celebrate)?

Not only does an absolute LOC value fail to tell you much about an application; you also cannot know whether a change in this value is good or bad. It can be interpreted in many ways and is, therefore, easily misused. It can serve as a pretext to fire somebody. It can be used as a reason to exert pressure. It might be an excuse to introduce a new, more restrictive process.
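To see how ambiguous a change in LOC is, consider the following minimal sketch. The helpers countLines() and locDelta() and the two versions of sum() are made up for illustration; they are not taken from any real LOC tool.

```javascript
// Hypothetical helper: counts the non-blank lines in a source string.
function countLines(source) {
    return source.split("\n").filter(function (line) {
        return line.trim() !== "";
    }).length;
}

// Hypothetical helper: the "net lines written" figure that a LOC-based
// KPI would report for a change from one version to another.
function locDelta(before, after) {
    return countLines(after) - countLines(before);
}

// Two versions of the same function with identical behavior.
var before = [
    "function sum(numbers) {",
    "    var total = 0;",
    "    for (var i = 0; i < numbers.length; i++) {",
    "        total = total + numbers[i];",
    "    }",
    "    return total;",
    "}"
].join("\n");

var after = [
    "function sum(numbers) {",
    "    return numbers.reduce(function (a, b) { return a + b; }, 0);",
    "}"
].join("\n");

console.log(locDelta(before, after)); // -4
```

The functionality is unchanged and the code has become shorter, yet a LOC-based KPI would record this change as four lines of negative output.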

Conclusion

While simple and inexpensive to gather, lines of code are not a meaningful metric that allows conclusions about productivity. If you delete lines of code while preserving all functionality, does that mean your productivity is negative? Of course not. I hope that the above brick analogy can help you convince people who have been seduced by the simplicity of the LOC “metric” and are seriously considering introducing it. I hope even more that you will never need that help.

Edit

This is v2 of this article. Shortly after (~10h) publishing the first version, I added the last two paragraphs before the “Metrics should enable decision making” section.

Time investment

I spent about 4 hours on this article. I did not come up with the brick analogy from the start. I used productivity measured in working time first. However, it had its weaknesses, and I liked the brick analogy much better.

Footnotes

1. You might say that the analogy is inaccurate here, because each brick costs money, while lines of code are immaterial and therefore free. However, this is not true. First of all, writing a line of code requires thinking and typing, and thus requires work time, which costs money. Second, since each line of code is read many more times than it is written or rewritten, lines of code come with a sort of tax that has to be paid regularly. It is harder and takes more time to hold a 500-line program in your head than a 100-line program. So, unlike with bricks, you do not just pay once for a line of code, but continuously.