Lies; Damn Lies; and Forecasting…

NoEstimates in a Nutshell

NoEstimates has gained a lot of traction over the last few years, and with good reason. It is primarily about adopting Agile properly: delivering the valuable work in order of priority and in small chunks, and by doing so eliminating the need for a heavy-duty estimation process. If we are only planning for the next delivery, we can forecast reliably.

But sadly that is generally not good enough, and some level of forecasting is often requested. So NoEstimates came up with a very useful and low-cost method of forecasting. However, it has brought with it a whole host of misunderstandings, most of which are not from the book. The author must be as frustrated as anyone by the misinterpretation of his proposal. This has led many (including me) to resist adopting this method for forecasting. I am all for delivering value quickly and in small chunks of prioritised work, but slogans that are used to excuse bad behaviour are damaging and hard to resolve, especially when they seem so simple.

My biggest bugbear, and one I have covered previously, is that many have interpreted NoEstimates as an excuse to skip story refining entirely. This was not in the book, but nevertheless you can find any number of articles on the internet professing that adopting NoEstimates has saved the authors from wasteful refining meetings. The misconception is that if you don’t need to estimate the story then the act of understanding the story is no longer required. In fact the author was suggesting that you don’t need to refine all work up front and could defer deeper understanding until it became relevant: the last responsible moment.

Story Writing and being Estimable

I encourage those writing stories to use the INVEST model for assessing the suitability of a story, and in that model the ‘E’ is Estimable. That doesn’t mean you must actually estimate the story, just that you ask yourself whether the story is clear enough and well understood enough to estimate if asked. Are there open questions? Is it clear what the acceptance criteria are and that these can be met? There may be a subtle distinction there, but NoEstimates does not offer an alternative to writing and refining good stories. It is just a method for simple forecasting that encourages deferring effort until it is necessary.

How does NoEstimates work?

Caveats aside, I will try to give a very high-level summary of how NoEstimates forecasting works, and when and where it doesn’t work. I shall do so via the medium of potatoes.

Preparing Dinner

I have a pile of potatoes on the side and I am peeling them ready for a big family dinner. My wife asks me how much longer it will take. By counting how many potatoes I have peeled in the last 5 minutes (10) and how many I still have left to do (30), I can quickly and simply calculate a forecast: 10 potatoes in 5 minutes is 2 a minute, so 30 potatoes will take 15 minutes.

That is NoEstimates forecasting in a nutshell. It really is that simple.
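The potato arithmetic can be written down as a minimal sketch. The function name `forecast_minutes` and its parameters are my own illustration, not something taken from the NoEstimates book; the only inputs are the counts from the example above.

```python
def forecast_minutes(done_count, elapsed_minutes, remaining_count):
    """Project the time left from observed throughput (items per minute)."""
    if done_count <= 0 or elapsed_minutes <= 0:
        raise ValueError("need some completed work to measure throughput")
    throughput = done_count / elapsed_minutes  # potatoes per minute
    return remaining_count / throughput


# Figures from the potato example: 10 peeled in 5 minutes, 30 left to do.
print(forecast_minutes(10, 5, 30))  # 15.0
```

Note that nothing in the calculation knows anything about the potatoes themselves, only how many there are. That is both its appeal and the source of the assumptions it depends on.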

Assumptions

However, the mathematics requires a certain set of assumptions:

1. I did not apply any sorting criteria to the potatoes I selected, e.g. I wasn’t picking only the small or only the large potatoes. We assume my selection was random, or at least consistent with how I will behave in the future.

2. The team doing the work doesn’t change. If my son were to take over to finish the job he may very well be faster or slower than me, and my forecast would not be useful.

3. I will not get faster or slower as I go.

4. All potatoes in the backlog will be peeled, and no others will be added. If my wife asks me to peel more potatoes or to do the carrots too, the forecast will no longer apply and will need revising.

So there we have it, a very simple and surprisingly accurate method for forecasting future work. But do you see any flaws in the system?

Flaws in the system

Flaw 1. Comparing potatoes with potatoes

The first flaw is that I am getting potatoes ready for roasting, so I want them to be broadly similar in size. When I peel a potato I am sometimes also slicing it: some potatoes only need peeling, others are sliced once, and others more than once. Some potatoes are bad and I throw them away.

If my wife comes along, sees my pile of potatoes and asks how much longer it will take, I can count the potatoes I have completed in the last 5 minutes (18) and the potatoes I still have left to do (30). The problem is I don’t know how many unpeeled potatoes were needed to produce those 18 peeled and sliced pieces, so I am not comparing like for like. To give this forecast I would have needed to count how many unpeeled potatoes I had peeled, which is information I don’t have. Maybe I could take a guess and then use that guess to extrapolate a forecast, but that sounds like guesswork rather than forecasting.

Flaw 2. Forecasting an unknown

Let’s assume that I am producing 10 peeled potatoes in 5 minutes and I am asked to forecast when I will be done. So far I have been grabbing a handful of potatoes at a time, peeling them and then going back for more, so one could say that my backlog of work is not definitive. We have a whole sack of potatoes but I won’t use them all for this one meal; I am simply adding work as I need it. My aim is to judge when I am satisfied I have done enough and can start cooking. It is very difficult for me to judge when the sack will be empty or when I will have prepared enough for the meal.

Flaw 3.  Changing and evolving work

It is a big family dinner and Uncle Freddie has just called to say he will be coming, so we need to add more food. Aunt Florance eats like a bird, so it is probably not worth doing a full portion for her. And the table isn’t really big enough for everyone, so maybe we should do an early meal for the kids first. The point here is that simple forecasting only works if you have a reasonably good assessment of what work is still to be done. If your backlog is evolving, with work being added or removed, then the forecast will be unstable.

Flaw 4.  Assuming consistency

When selecting work to do next I have a tendency to choose the work that will bring me the most value for the least effort: the highest ROI. In this case I may choose the small potatoes first, as they need less peeling and less chopping. But that means that if I count my completed work and use that to forecast my future work, I will end up underestimating how much is left. The backlog has some really big, awkwardly shaped potatoes that will take far longer to do, but my forecast is based on only doing small, simple potatoes.
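To make the bias concrete, here is a small sketch using invented peeling times; the numbers and the helper `naive_forecast` are purely hypothetical. The forecast is built from the potatoes done so far, which were the small ones, and is then compared with what the rest of the pile really costs.

```python
def naive_forecast(peel_times, done):
    """Forecast the remaining time from the average of the items already done."""
    completed, remaining = peel_times[:done], peel_times[done:]
    average_so_far = sum(completed) / len(completed)
    return average_so_far * len(remaining)


# Hypothetical peeling times in seconds; sorting models "smallest potatoes first".
peel_times = sorted([20, 25, 30, 30, 35, 40, 60, 75, 90, 120])
done = 4  # potatoes peeled so far

forecast = naive_forecast(peel_times, done)
actual = sum(peel_times[done:])
print(forecast, actual)  # 157.5 420
```

With these made-up figures the forecast is well under half of the real remaining effort, simply because the order of work was not representative of the backlog.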

Doesn’t this apply to all forms of estimates and forecasts?

Flaws 2 and 3 apply to any form of forecasting; they are not unique to NoEstimates. Flaw 1 and Flaw 4 could potentially be mitigated with the use of T-shirt sizing or story points, but to do so requires a level of upfront effort. That is effort not spent on peeling potatoes, so it may well be considered waste, unless you see value in a more reliable forecast.

For me Flaw 1 is my main objection to NoEstimates (beyond the belief that refining is unnecessary). When stories are refined and better understood it is normal to split or discard stories, and often to add stories as the subject becomes better understood. So any forecasting tool that uses a metric based on counting refined stories to predict a backlog of unrefined stories risks oversimplifying the problem. And because the maths is so simple, it can lead to a confidence level that exceeds the quality of the data. Assumptions based on flawed data get even worse when you use a tool like Monte Carlo forecasting, which attaches a further confidence level to the forecast. Giving a date combined with a confidence level adds such a degree of validity and assurance that it is easy to forget that a forecast based on duff data will result in a duff estimate, no matter how prettily we dress it up.
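For readers who have not seen it, a Monte Carlo throughput forecast can be sketched in a few lines. This is one common, simple variant (resampling past weekly throughput), not necessarily what any particular tool does, and the throughput history, backlog sizes and function names are all invented for illustration. The second run shows the "duff data" problem: the same backlog, once refined and split into more stories, gives a very different answer at the same confidence level.

```python
import random


def monte_carlo_weeks(throughput_history, backlog_size, runs=10_000, seed=1):
    """Resample past weekly throughput to simulate how many weeks a backlog takes."""
    if max(throughput_history) <= 0:
        raise ValueError("history needs at least one week with completed work")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    results = []
    for _ in range(runs):
        remaining, weeks = backlog_size, 0
        while remaining > 0:
            remaining -= rng.choice(throughput_history)
            weeks += 1
        results.append(weeks)
    return sorted(results)


def percentile(sorted_results, pct):
    """Number of weeks within which pct percent of the simulated runs finished."""
    index = min(len(sorted_results) - 1, int(len(sorted_results) * pct / 100))
    return sorted_results[index]


history = [3, 5, 4, 6, 2, 5]  # hypothetical stories finished per week
as_counted = monte_carlo_weeks(history, backlog_size=40)
after_refining = monte_carlo_weeks(history, backlog_size=60)  # same work, split on refinement

print(percentile(as_counted, 85), percentile(after_refining, 85))
```

Both numbers come out with an "85% confidence" label attached, yet they differ only because the story count changed. The confidence describes the simulation, not the quality of the backlog it was fed.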

Summary

Forecasting is risky at the best of times, especially in Agile where it is our goal to have the work evolve and change in order to give the customer what they truly want. Both parties need to understand the forecast and accept that it is an evolving and changing metric. Anyone expecting a forecast to be a commitment or to be static is likely to be disappointed. Just take a look at the weather forecast: the week ahead changes day by day, and the further away the forecast the more unreliable it becomes. Understanding the limits of the forecasting method is crucial. A simple tool like NoEstimates is fantastic IF the assumptions can be satisfied; if they cannot, then the forecast will be unreliable.

It is probably also true that your forecast will improve if you spend more effort understanding the work. Time spent refining the stories will improve your knowledge. But no forecast can reliably predict work you do not yet know about.  The question as always is “What problem are you trying to solve by forecasting?” That will guide you in determining whether the up front effort is worth it.

Should we re-estimate stories at sprint planning based on better understanding of how to implement a solution?

Estimates should be based on the relative size of the story. For example, say our story is to do a 1000-piece jigsaw puzzle and we have estimated it as an 8 point story. The story takes us 6 hours to complete. We break up the puzzle and put it back in the box.

We are then asked to do the exact same puzzle again as a new story. We have just done it, so we know the difficult bits, we have fresh knowledge and recent experience, and it is highly likely we’d complete the story in much less time. But the story is identical: we still have to do the same puzzle. Last time it was an 8 point story, and this time it is still an 8 point story.

In other words, our experience changes our ability to complete the story; it doesn’t change the relative size of the story. We estimate using relative size because we don’t know who will be doing the story or when it will be done.

Hopefully we learn and get better, and equally it is likely that more experienced or senior developers will complete stories quicker, but none of this changes the relative size of the story.

Story point estimates exist to offer the ability to forecast, and they are accurate in that context and over the long term only. Think of stories like rolling a dice. A 3 point story is like rolling a dice 3 times and totalling the results; an 8 point story is like rolling a dice 8 times and totalling the results. Sometimes a 3 point story will take longer than a 5 point story. But in the long run the average will be 3.5 per roll, that is, 3.5 per point.
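The dice analogy can be checked exactly rather than by rolling. The sketch below assumes a fair six-sided dice and uses helper names of my own choosing; it works out the expected total for a 3 point and a 5 point "story" and the chance that the smaller story ends up taking longer.

```python
from collections import Counter
from fractions import Fraction


def dice_sum_distribution(rolls):
    """Exact probability of each total when a fair six-sided dice is rolled `rolls` times."""
    dist = Counter({0: Fraction(1)})
    for _ in range(rolls):
        nxt = Counter()
        for total, prob in dist.items():
            for face in range(1, 7):
                nxt[total + face] += prob * Fraction(1, 6)
        dist = nxt
    return dist


def expected_total(dist):
    """Long-run average total for the distribution."""
    return sum(total * prob for total, prob in dist.items())


def chance_first_exceeds(dist_a, dist_b):
    """Probability that a total drawn from dist_a is strictly greater than one from dist_b."""
    return sum(pa * pb for a, pa in dist_a.items() for b, pb in dist_b.items() if a > b)


three, five = dice_sum_distribution(3), dice_sum_distribution(5)
print(float(expected_total(three)), float(expected_total(five)))  # 10.5 17.5
print(float(chance_first_exceeds(three, five)))  # small, but not zero
```

The averages are exactly 3.5 per point, while the second figure shows that any individual 3 point story still has a real chance of outlasting a 5 point one.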

I could never guarantee that the next roll of the dice will result in a value of 3.5, but what we can offer is probability, not predictability, over a longer period. By the time you roll the dice (take the story into sprint planning) the story points offer no value or interest to the development team. The forecasting value is gone. The story will take as long as the story takes; we must trust the team to do their job.

Is estimating story points waste?

I occasionally participate in some of the discussions on Scrum/Agile where questions are posed and answered by experts. I have noted that there is a pretty strong contingent that sees estimating as ‘waste’ and advocates against estimating, and there are various techniques proposed that involve not estimating. I will state upfront that I am not one of them. Although I do understand and agree with many of their reasons, I feel that in some situations they may be missing out on a number of subtle and not so subtle benefits that the process of estimation brings.

Do it my way.

My first counter-argument is that I object to anyone saying ‘my method is right and yours is wrong’. In my experience every team, every organisation and every project is different, and suggesting that one size fits all is ludicrous. I believe a good Agile Coach will assess each situation and seek to find a suitable framework for that particular situation. Blindly proposing to change a situation to fit your preferred method is hubris, in my opinion.

I will refer to my experience of ‘estimation’ and how and why I feel it is not waste and does actually add both direct and indirect value to a software delivery framework.

How can you assess ROI – without knowing the I?

First and foremost, it is quite simply often necessary. Some are fortunate to be in situations where direct ROI (Return on Investment) is not scrutinised, but I have nearly always worked in situations that are commercially driven or commercially motivated. A feature is prioritised based on its value to the customer (the R in ROI) but not in a bubble: if the cost of providing that benefit (the I) is high, the priority changes, and often the feature can be assessed as not viable. A Product Owner or a business cannot make a judgement on ROI without an estimate of both R and I. If the team isn’t providing that estimate then the Product Owner or BA is doing the estimating themselves on some level, and an estimate of work made outside the team is highly likely to be less accurate, and therefore less useful, than one provided by the team.

Whose waste? Are you eliminating waste or dumping it on someone else?

So if ROI is a factor and someone IS doing an estimate somewhere, then calling it waste and not doing it within the team is just pushing it out to somewhere else. You are not eliminating waste; you are moving the work to somewhere the results are less accurate or reliable. This is not ‘lean’. It is actually just pushing a problem upstream and adding risk.

How much waste?

Even if you still believe that the act of estimating is waste, just how much waste are we talking about? I find story refining vital. One of the biggest hindrances to the flow of getting stories to delivery is not having them refined in advance; if a story is well prepared the team can pick it up and go. So assuming you agree with refining stories, estimating a story is simply a couple of minutes tagged on to the end of a refinement exercise. We are talking minutes. The waste (if you still believe it is waste) is tiny, and yet the commercial value of the estimate to those outside the team is high. If you push it back on the PO or BA their effort will be much greater, so you may even be creating waste by refusing to estimate.

Subtle benefits.

But even ignoring the commercial need for estimating, there are other benefits too. As part of the refining process a story is discussed and assessed until the team feel they have a common understanding. But how can you be sure they have sufficient understanding? When does the conversation end? In Scrum, at some point they are able to estimate the story, and this is a natural destination for the conversation. In this situation the very act of estimating is a tool for focussing a discussion and reaching a consensus; the readiness to estimate is a measure of the conversation concluding.

Is an estimate a Barometer of consensus?

But it can also be a barometer of a consensus of understanding. If, when estimating, there is a wide distribution of estimates or a single estimate that differs from the others, this is immediately an indicator that common understanding has not been achieved and that the requirements may not have been universally understood. I say ‘may’ as we do not know the reason for the difference without further discussion, but without an estimate this discussion would not have occurred. A consensus and narrow distribution of estimates is an indication that there is a common understanding; the fact that there is a relative size estimate that can be used for other purposes is a bonus. In this example, measuring that you have a common understanding of requirements and agreement that a story is ‘ready’ is hugely valuable for promoting flow.
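As a rough illustration of the barometer idea, the sketch below flags a round of estimates for more discussion when the votes are spread too far apart on the scale. The rule (more than one step apart on a Fibonacci scale) and the name `needs_more_discussion` are assumptions of mine for the example, not a Scrum rule; a team would pick whatever threshold suits it.

```python
FIBONACCI_SCALE = [1, 2, 3, 5, 8, 13, 21]


def needs_more_discussion(estimates, max_steps_apart=1):
    """True when the votes span more than `max_steps_apart` positions on the scale."""
    positions = [FIBONACCI_SCALE.index(estimate) for estimate in estimates]
    return max(positions) - min(positions) > max_steps_apart


print(needs_more_discussion([5, 5, 8, 5]))   # False: neighbouring values, broad consensus
print(needs_more_discussion([3, 5, 5, 13]))  # True: someone sees a much bigger story
```

The output is only a prompt to talk. It says nothing about who is right, which is exactly the point made above.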

Is an estimate a conversation starter?

But there is more. An estimate is a conversation piece: it may throw up suggestions of quicker solutions, or ways to combine stories or break down stories. Without an estimate there is no focus for these discussions, and without an idea of scale or scope there is no incentive to find a smaller or more efficient solution.

An estimate is a conversation not a number.

In essence what I am saying is that an estimate is not just the number, it is the result or the output following a conversation and whilst you may debate the value of the output, the value of the conversation is considerable and is so often underestimated.

Velocity

I have seen a number of discussions recently about velocity, and generally the harm it can cause when used incorrectly. Most seem to advocate not using it at all, to avoid misusing it. But whilst it is a blunt instrument, it is just information like any other metric, and for a process that relies on empirical evidence to adapt and evolve it seems odd to ignore a very useful metric. So instead I will try to explain how, when and why velocity could be used to add value.

My car has a trip computer and among its varied metrics it offers an ‘instant fuel economy’ feature. Why, I don’t know; to my mind it is a pretty useless feature. But it is there, and at any time I can take a spot check and it will tell me how much fuel I am using in mpg. When racing up the hill to the Air Balloon roundabout with my foot down it drops to around 20 mpg; when coasting down with my foot off the pedal I get 99 mpg.

If I were to take a single reading half way up the hill and claim that my car does 20 mpg I would get a totally distorted view of the velocity with which I consume fuel, equally if I took a single reading coming down at 99 mpg.    I’m sure most of you would think I was nuts to even consider taking a single reading as an absolute indicator of future velocity. But somehow with ‘sprint velocity’ people think that is okay.

Let me stretch the analogy further. I know that a single measurement is a little crazy to base things on, so let’s take an average over the last 100 miles. That is a pretty solid and reliable metric, most of us would consider that to be a reasonable predictor.   But I have spent the last few weeks only driving around town, stop start, traffic jams, short journeys and so on. But now I have a long trip, mostly motorway, will my average fuel consumption velocity be a good indicator of the next 100 miles?  No it won’t, common sense tells me before I even look that my velocity will be different under different conditions.  A velocity only has meaning if the measured data is representative of the future conditions.  Equally I tend to drive with a heavy foot, but I am quite sure if my father were to drive the same journey in the same car the consumption velocity would be quite different.  The same is true if I drove a different car, the velocity would be different even if my driving conditions were the same.

Velocity tells you only one thing – what was consumed in this car in these conditions. It offers no view of the future, or of alternative conditions.

But I am free to interpret, I can predict, I can guess, I can extrapolate all I like, but it is not the velocity that comes up with these figures – it is me. If I make a prediction and it is wrong, I cannot blame the velocity, the velocity is a measurement of what was, not of what will be.

So how is it useful then?    The most obvious is to say that if I believe that my typical journey over the next 100 miles will be similar to the last 100 miles, similar driver, similar journeys, similar car then it is not unreasonable to expect similar results.  It can be a predictor.

Or let’s say that I am a little bit eccentric and I decide that I want to increase my average MPG and I spend the next 100 miles on similar journeys in a similar car, but I try very hard to improve my driving to be more efficient, in that case it can be a blunt measure of improvement in my driving if the average goes up, I can measure improvement.  I would caution myself very carefully on this particular measurement because it could easily be a change in the road conditions not the driver that is the cause for a change but even so it is a measure – the interpretation of meaning is mine.

Or if nothing changes that I am aware of and the average goes noticeably down or up, it might be an indicator that there is a problem with the car, or my driving, or even just different fuel.

So now I understand all about the velocity of fuel consumption, and my current average is 50 mpg. How much fuel will I consume on my next journey? There is no way to tell. To even make a reasonable estimate I’d need to know an approximate distance and the terrain, and even with all that there are still unknowns: I may hit a delay in traffic, or a diversion, I may get asked to call in and pick up milk, I may even break down. And if I could predict all of that and gave you an estimate, would you trust it to any degree of accuracy? I wouldn’t. I might trust the estimate to a degree, but I wouldn’t plan to fill up on empty on the strength of such an estimate; I’d almost certainly plan a significant buffer to ensure I didn’t run out of fuel.

I hope most of you would agree with my views on fuel economy: there are simply too many unknowns to make accurate predictions.

And yet, I reset my trip computer fairly regularly, every 1000+ miles or so, and on average over that 1000 miles my economy is in the region of 48 mpg; it very rarely fluctuates significantly from this. Over a very long timeframe I can get a very accurate estimate that is consistent 1000 miles after 1000 miles. That still tells me nothing about the next 20 miles or even 100, but long term I can be consistent.

Another interesting observation is that to get this measurement I am using an extremely accurate tool, and I am in a known, unvarying state: same car, same driver, similar journeys. And yet with all this certainty I still don’t trust my predictions to any degree of accuracy; I buffer and I plan contingency. Now let me stop talking about cars for a moment and talk about software, more specifically a team developing complex software with a great many unknowns, where the team members can take leave, be sick, have family troubles, leave the team, be joined by new people, learn new skills or use new tools. There we are expected to estimate large quantities of unknown and often very complex work, often to very specific dates. What is odder still is that those estimates are often taken as commitments.

If a rational person wouldn’t trust a reliable and regular machine like a car to give specific accurate predictions with fairly well known parameters, why do similar rational people expect such accuracy from a domain far more variable, far more complex and far more unpredictable?

My rather long winded discussion on fuel economy is really summed up in one sentence.

There is only one estimate on how long a software project will take that I would trust and that is one given the day after the project is completed.

Otherwise use projection tools like velocity in context. If the future conditions are likely to be unchanged then velocity can be a useful tool, but the longer the average the better. And if your circumstances change (the team, the type of work, the domain, the tools), your current velocity is meaningless until you have enough information to create a new measurement. And think of your car: what value is a snapshot measurement without context?
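The "longer the average the better" point can be seen with a handful of sprint velocities. The figures below are made up for illustration, and the helper names are mine; the sketch compares how far single-sprint readings swing with how far a six-sprint rolling average swings over the same history.

```python
def rolling_averages(values, window):
    """Average of every run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def spread(values):
    """Gap between the highest and lowest reading."""
    return max(values) - min(values)


# Hypothetical points completed per sprint by one stable team.
velocities = [21, 34, 18, 40, 25, 30, 16, 38, 27, 31, 22, 35]

print(spread(velocities))                       # swing between single sprints
print(spread(rolling_averages(velocities, 6)))  # swing between six-sprint averages
```

The single readings are the instant mpg half way up the hill; the rolling figure is the trip computer after a long run. Neither says anything once the team, the work or the tools change.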

The Confusion Over Story Points.

In my last post I referred to the myth that Scrum Masters can make themselves redundant. The other most common and persistent myth I regularly hear is that Story Points are a measure of complexity.

No, no, no. Story points are just about relative size.   That’s it, nothing more. There really is nothing complicated about it.

It is all relative

A small ‘simple’ task is not the same as a large ‘simple’ task.  And a small ‘complex’ task that I can have done in a quarter of the time of a large ‘simple’ task does not warrant a higher story point estimate. It makes no sense to estimate that way.

My advice when estimating stories is to pick a cross section of stories: around a dozen typical stories of varying scope, size, complexity and uncertainty, and any other factors that might affect the time and effort involved. Discuss them, get a feel for the stories, and then sort them smallest to largest.

Next, take a story in the middle and stick it in the middle of a whiteboard, on a horizontal line drawn across the board. Then take each remaining story, estimate how much effort it needs relative to the stories already on the board, and mark it on the line as if it were a number line. Hopefully you will get into a flow after a while. E.g. the next story is 3/4 the size of the first, so I put it 3/4 of the way between the end of the line and the first story.

When all your stories are on the line, pick a scale that you think looks right. Just try it and see. Each set of stories will have a different distribution, so if you initially pick a number too low and things feel squeezed, just try again. E.g. you pick the story in the middle and, using Fibonacci numbers, call it 13 pts. Where does this leave the rest? Do they fall easily into other Fibonacci numbers? Stories around 1/4 the size of the first would be a 3, and so on.

That is now your scale, and regardless of how your definition of done changes, or the team changes, the relative scale remains.

If you are lucky enough to have a team room with space to leave those stories up on the wall on the scale line, that’s great; if not, take a picture. From now on when estimating any story you can ask how it compares to ‘x’: x was 13 pts and this is similar, or this is 50% bigger than ‘z’ and that was a 3, so this is a 5. By keeping everything relative to the initial stories you keep the estimates consistent.
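The comparisons above amount to scaling a reference story and snapping to the nearest value on the scale. A minimal sketch, assuming a Fibonacci scale and a helper I have called `to_points` (both are illustrative choices, not a prescribed method):

```python
FIBONACCI_SCALE = [1, 2, 3, 5, 8, 13, 21, 34]


def to_points(reference_points, relative_size):
    """Scale a reference story's points and snap to the nearest value on the scale.

    On an exact tie the smaller value wins, because min() keeps the first match.
    """
    raw = reference_points * relative_size
    return min(FIBONACCI_SCALE, key=lambda points: abs(points - raw))


print(to_points(13, 0.25))  # 3: a quarter of the 13 point reference story
print(to_points(3, 1.5))    # 5: 50% bigger than a 3 point story
print(to_points(13, 1.0))   # 13: about the same as the reference
```

In practice nobody does this arithmetic in a refinement session. The team simply looks at the wall and says "a bit bigger than that one", which lands in the same place.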

But what about complexity?

Complexity, uncertainty and time-consuming interactions between layers or departments are all just considerations that affect the size. “That task is complex, it may take me a bit longer”; “That task is unclear, I’ll need to take some time to understand this aspect of it, so it will take longer”. All we are saying is that it is relatively larger; there is no trick to it. It might make life easier to say: this story looks very similar to 5-point story ‘x’, but has some uncertainty (or I’ll need to liaise with another department), therefore we’ll estimate an 8. It is the fact that complexity adds time that makes it a bigger estimate, not anything else. It is just another factor in comparing the relative size.

Comparisons are so much easier than raw estimates.

When estimating new stories all you have to do is pick a story and ask “will this take longer than reference story x?” or “will it be less than reference y?” With enough reference stories there should be a suitable comparator: find a similar sized story and give the new one the same points value, or a bit more or a bit less based on a considered factor. After a very short time you will instinctively know the scale, and estimating becomes quicker and easier. Your refining becomes a series of comparisons, e.g. “this involves more testing than story x” or “it is like ‘y’ but with a bit more to the UI”. Comparisons are so much easier than raw estimates.

What about ideal hours or ideal days?

Some teams use ideal days or hours for estimates, but those are subject to change according to the team: the team changes and the estimate changes. What if someone is off sick or on holiday? More confusingly, it implies a commitment to a particular amount of effort. What if your DoD changes and suddenly you are writing automated regression tests for each story? Suddenly the scale is blown, but a relative estimate is always relative.

Sometimes after a while teams begin to drift. It may be worth having a refresher of the initial stories and initial estimates periodically, just to calibrate your estimates, and maybe even adding new reference stories to keep the scale fresh.

Just remember it is all relative

Just remember it is all relative, and then it is easy!