How feature estimates killed Bobo

Software development lacks a single meaningful, objective productivity metric. This is not for lack of trying. A productivity metric would make the life of a software development manager dramatically easier. Performance reviews? Just see who has the highest number. Need to find out how to boost productivity? Find the developer with the best process and have your other developers adopt it as a best practice. And so, managers grasp for measurement. Lines of code is so obviously wrong as a measurement that I’ve mostly heard it brought up for its comedic value. I’m not saying that in the history of software there hasn’t been some misguided manager who actually reviewed their developers by lines of code, but it is certainly more myth than reality in the modern era.

I can’t say the same for effort estimation accuracy. Many otherwise intelligent managers have embraced the accuracy of their developers’ estimates as a defining measure of their developers’ worth. There are varying degrees of vigor attached to this review. The energetic manager maintains a spreadsheet (or enlists a task tracking tool) to record every estimate given by their developers and then what each task actually came in at. At its most simplistic, they might just divide the two numbers, see who deviates the most from 1, and apply the appropriate corrective action.
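To make the spreadsheet approach concrete, here is a minimal sketch of that calculation. The task log, the developer names, and the function names are all invented for illustration; a real tracking tool would supply its own data and may compute the ratio differently.

```python
# Hypothetical task log: (developer, estimated_hours, actual_hours).
# Every name and number here is made up for illustration.
TASKS = [
    ("Filbert", 8, 16),
    ("Filbert", 4, 6),
    ("Gertrude", 10, 10),
    ("Gertrude", 6, 5),
]


def accuracy_ratios(tasks):
    """Return total actual hours divided by total estimated hours, per developer."""
    totals = {}
    for developer, estimated, actual in tasks:
        est_sum, act_sum = totals.get(developer, (0, 0))
        totals[developer] = (est_sum + estimated, act_sum + actual)
    return {dev: act / est for dev, (est, act) in totals.items()}


def rank_by_deviation(ratios):
    """Order developers by distance from a perfect ratio of 1.0, worst first."""
    return sorted(ratios, key=lambda dev: abs(ratios[dev] - 1.0), reverse=True)


ratios = accuracy_ratios(TASKS)
worst_first = rank_by_deviation(ratios)

for dev in worst_first:
    print(f"{dev}: actual/estimate = {ratios[dev]:.2f}")
```

Notice that nothing in this calculation records how much each developer actually delivered; it only compares their guesses to their outcomes.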

Most managers aren’t quite so vigorous though. The belief in holding developers accountable to effort estimate accuracy is frequently enforced more subjectively.

“Filbert, it looks like your past several features have all come in late. You need to start pulling your weight. Part of being a professional software developer is reliably hitting your commitments.”

And it’s true, isn’t it? Promise to do something and then not do it and you’ll lose trust. The problem is, why are software effort estimates treated as commitments in the first place? Most of this comes from the belief that deadlines are a necessary motivator. Modern society revolves around time, starting with grade school. Teachers hand out homework and assign due dates. Tests are given at set times and students are taught to cram. This same philosophy extends to college and then to work. Without the pressure of time, people are trained to slack off. And so we create artificial deadlines as a motivational tool.

And for some lines of work, that is necessary. Because a lot of work is really boring. But creating software is fun! Sure, it has its slow moments, but the best in the field are here because they love it. For an already self-motivated developer, the addition of the deadline constraint doesn’t make any additional work occur. It just makes dates more important than quality.

Hold up, says the man in the back. I’m not making these estimates up; the developer is. I’m just asking them to reliably deliver. If I told my boss I could have a budget out by next week, I’d be fired if I gave it to him a few weeks later. But again you get to the question of why this accountability matters. We always look to latch onto some more well-established parallel for software development to better understand how we should treat it. Let’s flip that around and treat another field like software.

Let’s say that you own a craft Dutch clog workshop. You have two shoemakers, Bobo and Jobo. Bobo says every day he is going to make 100 shoes and ends up making between 25 and 50. Jobo says he is going to make 5 shoes and always makes 5. The quality is exactly the same for both workers. There is such hot demand for the shoes that as soon as one is made it just flies off the shelves at $200 per shoe. Both Bobo and Jobo are paid the same hourly wage of $100/hour and the material cost for each shoe is $10.

From a naive cost perspective, Bobo makes shoes at a cost of $10 + (($100 * 8 hours) / (25 to 50)), or $26 to $42 each. For simplicity, let’s say that Bobo’s daily output is spread symmetrically across that range, so his average cost per shoe is about $31. Jobo makes shoes at a cost of $10 + (($100 * 8 hours) / 5), or $170 each. Given their respective production rates, Bobo makes you a profit of ($200 - $31) * (25 to 50), or around $6,000 per day on average. Jobo makes you a profit of ($200 - $170) * 5, or $150 per day. Now a lean manufacturing guru might say that Jobo is still better because reliability is more important than total throughput. Otherwise you end up with overproduction in one part of the system, which then has associated inventory costs and other wastes. But in this case, that isn’t quite valid because both Bobo and Jobo are making the end product. And there is no inventory cost because the shoes sell as fast as they are made.
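The arithmetic above can be checked with a short sketch. The wage, hours, material cost, and sale price come straight from the example; the function names are my own, and using the midpoint of 25 to 50 as Bobo's average day is an assumption.

```python
HOURLY_WAGE = 100    # dollars per hour, from the example
HOURS_PER_DAY = 8    # from the example
MATERIAL_COST = 10   # dollars per shoe, from the example
SALE_PRICE = 200     # dollars per shoe, from the example


def cost_per_shoe(shoes_per_day):
    """Materials plus a full day's wages spread over the shoes made that day."""
    return MATERIAL_COST + (HOURLY_WAGE * HOURS_PER_DAY) / shoes_per_day


def daily_profit(shoes_per_day):
    """Margin per shoe times the number of shoes made that day."""
    return (SALE_PRICE - cost_per_shoe(shoes_per_day)) * shoes_per_day


# Assumption: Bobo's average day is the midpoint of his 25 to 50 range.
BOBO_AVERAGE = (25 + 50) / 2

for name, shoes in [
    ("Bobo, slow day", 25),
    ("Bobo, average day", BOBO_AVERAGE),
    ("Bobo, good day", 50),
    ("Jobo, every day", 5),
]:
    print(f"{name}: ${cost_per_shoe(shoes):.2f} per shoe, "
          f"${daily_profit(shoes):,.2f} profit per day")
```

At the midpoint this works out to roughly $31.33 per shoe and $6,325 of profit per day for Bobo, against $170 per shoe and $150 per day for Jobo, which is a ratio of about 42 to 1.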

In the end, Bobo makes his employer over 40 times as much money as Jobo does. Now let’s bring in the traditional software manager who uses estimate accuracy as their primary means of driving accountability. Jobo delivers to his estimates 100% of the time and is the model employee. Bobo, though, is a problem case. He is unreliable and is off on his estimates by 2x to 4x. After sending Bobo to estimation training for 2 weeks, Bobo now estimates that he makes 37 shoes per day. And while on average this is correct, Bobo still makes as few as 25 clogs some days and as many as 50 clogs other days. The days he makes 50 clogs he is accused of sandbagging, and the days he makes 25 he is just being lazy. This is still much too unreliable. Jobo is still the model employee and is given a raise. Bobo is put on a performance improvement plan and asked why he can’t be more like Jobo. Bobo eventually gets smart and starts estimating that he can make 25 a day. As soon as he gets to 25 in a day he whips out the hammock and martini and enjoys the rest of his day. Now his estimate accuracy is 100%, but his boss is angry that he sees Bobo in a hammock for a significant portion of every day. This is obviously unacceptable. Jobo, on the other hand, has been given several raises and a company luxury car to ensure his retention. Bobo finally realizes his boss doesn’t even care how many clogs he makes and just wants estimate accuracy. So he starts making 5 shoes a day like Jobo. He staggers the creation of each shoe slowly through the day and makes sure to always look like he is working. His boss is ecstatic: Bobo is finally reliably hitting his estimates and working hard.
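A rough way to see what the manager's metric rewards is to put estimate accuracy next to daily profit at each stage of Bobo's story. This is only a sketch: the stage labels are mine, the 37.5 average is the assumed midpoint of Bobo's 25 to 50 range, and the prices and wages are the ones from the clog example.

```python
HOURLY_WAGE, HOURS_PER_DAY = 100, 8   # dollars per hour, hours per day
MATERIAL_COST, SALE_PRICE = 10, 200   # dollars per shoe


def daily_profit(shoes_made):
    """Revenue minus materials and a full day's wages."""
    return (SALE_PRICE - MATERIAL_COST) * shoes_made - HOURLY_WAGE * HOURS_PER_DAY


# (stage, shoes estimated, shoes actually made on an average day).
# 37.5 is an assumption: the midpoint of Bobo's 25 to 50 range.
STAGES = [
    ("original Bobo", 100, 37.5),
    ("after estimation training", 37, 37.5),
    ("hammock and martini", 25, 25),
    ("imitating Jobo", 5, 5),
]

# Each entry: (stage, fraction of the estimate delivered, profit in dollars).
report = [
    (stage, made / estimated, daily_profit(made))
    for stage, estimated, made in STAGES
]

for stage, delivered, profit in report:
    print(f"{stage}: delivered {delivered:.0%} of estimate, "
          f"${profit:,.0f} profit per day")
```

On the manager's metric the last two stages are perfect, yet average daily profit falls from $6,325 to $3,950 and then to $150 along the way.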

Bobo quits a week later because he realizes all he ever loved was making clogs, and it is mind-numbing to work at a place where estimates matter more than the clogs. Bobo’s manager is lauded for his top-grading efforts. Bobo starts his own clog shop across the street and starts selling his clogs for $150, still making a healthy profit. Bobo’s old clog shop can’t meet this new price without losing money on every clog sold. Jobo’s manager, realizing the error of his ways and that there is no way he can compete with Bobo’s clog shop, murders Bobo and burns down his new shop. Jobo’s clog shop continues on for many years with a small but steady profit.

It all seems so obvious in the world of clogs. But is software really much different, or is it just harder to quantify the unit of production than in a world of widgets? It is often said that it is better to have an imperfect measure than no measure at all. But estimate accuracy isn’t an imperfect measure of software productivity; it is a completely orthogonal measure that actually drives down productivity. The only reason we are left holding developers accountable to feature estimates is that their accuracy is necessary for some other purpose, such as the creation of project launch timelines or cost-benefit analysis. I’ll tackle the flaws of these uses in my future posts.