Software estimation in hours with Scrum by Simon van Dyk

With fair warning, this process involves all the things your scrum master probably doesn’t want to hear: estimating in hours instead of points and writing tasks instead of user stories. You can, however, very easily use this estimation strategy while running a Scrum process.

The gist

An oversimplified version: List what needs to be done as tasks no larger than 4 hours each. Then group them by how they can be tested by someone else.

I am currently using this strategy and, when I explain it to my colleagues, they ask many of the same questions that might be forming in your mind. Let me attempt to address these briefly. If you break the strategy down it is essentially based on some really important concepts. These concepts were introduced to my team at work in our sprint cycles via The Agile Retrospective (I love these, but retros are a discussion better kept for another time).

The first important concept we introduced was the fact that we estimate work and make projections in hours, instead of story points.

Software estimation in hours

You might very well say, “Simon, we can’t use this because our estimations are currently in points, and everyone’s points are different when converted to time. Why did you use hours?”

Herein lies the precise reason I chose to use hours. The most important characteristic of story points is that they are relative to the team that assigns them. I chose hours because they are absolute; an hour for you is an hour for me.

The caveat, however, is this: This principle is not true across project teams. As a result, one cannot use this strategy for higher level estimations that are independent of the team that will implement them. I don’t ever find those estimations to be accurate anyway. In fact, I feel that they are even more damaging than simply not doing them at all.

In the next section I discuss story points and velocity. If you’re unsure of what these are, take a break and read up on them.

I believe this change towards estimating in hours has resulted in an important shift in understanding what our team is actually doing in a sprint cycle. Previously, only work that represented new software was estimated, assigned points and signed off, so only that work resulted in a higher velocity. Now, in contrast, I believe that a software team should estimate all valuable work on a project, whether the work is developing new software features, completing housekeeping tasks, or squashing bugs.

Estimation in our team is a quantitative measure of “effort”, represented in hours. To play devil’s advocate here, ask yourself the question: “Do I consider work that does not result in new features in the system to be valuable?”
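To make the idea of estimating all valuable work concrete, here is a minimal sketch that totals estimated hours per kind of work. The work items, their names and their hour values are hypothetical and exist only for illustration; they are not taken from our actual sprints.

```python
from collections import defaultdict

# Hypothetical sprint: (kind of work, task name, estimated hours)
work_items = [
    ("feature", "Export report as CSV", 4),
    ("housekeeping", "Upgrade test framework", 3),
    ("bug", "Fix timezone offset in invoices", 2),
    ("feature", "Add report filters", 4),
]

# Sum the estimated effort for every kind of work, not only new features.
hours_by_kind = defaultdict(float)
for kind, _name, hours in work_items:
    hours_by_kind[kind] += hours

total_hours = sum(hours_by_kind.values())

print(dict(hours_by_kind))  # {'feature': 8.0, 'housekeeping': 3.0, 'bug': 2.0}
print(total_hours)          # 13.0
```

In this made-up sprint, 5 of the 13 estimated hours would be invisible if only new features were estimated.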

I have sat in planning and grooming sessions where teams play estimation poker and make wild guesses about how ‘complex’ a feature might be just to come to a number. Based on this magical number, they end up ‘committing’ to completing the feature without fully understanding the risks involved in building it. I don’t exactly find this useful because the result is based on a “guesstimate” of the average group assessment of the risk or complexity involved in implementing a feature.

Points are a loaded topic! But if you currently use points, you can naively equate points to hours in order to use this estimation strategy. I’ll show you how in this super easy 2-step tutorial:

To calculate how many hours a developer spends per point, you need 1) the average velocity of a typical developer in your team and 2) a somewhat accurate number for your team’s average effective dev time per sprint.

Effective dev time is the total number of hours your team puts towards completing “pointed work” (work that has points assigned to it in your tool of choice). Once you have both numbers, the calculation is simple.

Here’s an example:

# Assuming a 1-week sprint (oh my, so short!)
avg_effective_dev_time_per_sprint = (40 - 16) = 24 hours
avg_dev_velocity = 6 points
hours_per_point = avg_effective_dev_time_per_sprint / avg_dev_velocity
                = 24 hours / 6 points
                = 4 hours per point
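If you would rather script this than do it by hand, the same arithmetic can be sketched in Python. The function names `hours_per_point` and `points_to_hours` are my own for this illustration, and the conversion is exactly as naive as described above.

```python
def hours_per_point(effective_hours_per_sprint: float, velocity_points: float) -> float:
    """Average number of hours a developer spends per story point."""
    if velocity_points <= 0:
        raise ValueError("velocity must be a positive number of points")
    return effective_hours_per_sprint / velocity_points


def points_to_hours(points: float, rate: float) -> float:
    """Naively convert a pointed estimate into hours using the rate above."""
    return points * rate


# Same figures as the worked example: 40 - 16 = 24 effective hours, 6 points.
rate = hours_per_point(effective_hours_per_sprint=40 - 16, velocity_points=6)

print(rate)                      # 4.0
print(points_to_hours(3, rate))  # 12.0
```

With these numbers, a 3-point story becomes roughly 12 hours of work, which you would then break down into tasks.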

The second important concept underpinning this strategy is that the team is forced to estimate in great detail, because they need to break down large tasks into smaller tasks of no more than 4 hours each.

Estimating in detail

I have experienced some resistance to what some consider the extreme detail required in this estimation process. To address this criticism, let me share another concept that our team uses to discuss the extent of the detail we should be estimating, and how much work we should be estimating right now. It’s called the horizon of software estimation.

You might be curious, nevertheless, about why we use 4-hour chunks specifically.

There is no science here. I blindly tested this based on Hiten Shah’s article on engineering estimates and it seemed to work really well for us. In my experience, we have found a 4-hour task to be just small enough to force us into answering questions that would be left unanswered if the estimates were larger. You don’t want to go too small here either, otherwise, you’re likely to be spending too much time in planning by way of micro-estimating.
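A quick way to enforce the 4-hour ceiling during planning is to flag any task whose estimate exceeds it. This is only a sketch: the task names and estimates are hypothetical, and `oversized_tasks` is a helper invented for this example.

```python
MAX_TASK_HOURS = 4  # the ceiling our team settled on; adjust to taste


def oversized_tasks(estimates: dict[str, float], limit: float = MAX_TASK_HOURS) -> list[str]:
    """Return the names of tasks estimated above the limit, in input order."""
    return [name for name, hours in estimates.items() if hours > limit]


# Hypothetical estimates in hours
estimates = {
    "Add login form": 3,
    "Integrate payment provider": 10,
    "Write migration for users table": 2,
}

print(oversized_tasks(estimates))  # ['Integrate payment provider']
```

Anything this returns is a prompt to ask more questions and split the task further, not a failure.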

Another important challenge in describing software is having a shared understanding of when a feature is complete. If the various roles on your team do not know what constitutes success, you will surely run into frustration during implementation and testing.

Defining success

The last step in the process, once you have an exhaustive list of tasks, is to group them into the smallest batches possible, such that each batch of tasks is able to be completely tested by someone else. This is important because it forces your team to describe what success looks like. Doing so will bring clarity to everyone about what your team is working towards, and a shared knowledge of what it looks like when a particular feature is complete. As a bonus, it also helps the work move through any QA processes you may have in place.

Our experience is that grouping tasks in this way results in smaller tickets in our project management tool that can be tested in isolation. It also produces a list of subtasks, which makes measuring the completeness of tasks easy. Even if a team member has not completed all their assigned tasks by the end of the sprint cycle, it’s clear to see how far they are and possibly where they are getting stuck.
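The grouping and the completeness measurement can be sketched as a small data model. The `Task` and `Batch` classes, the success criteria text and the task names below are all hypothetical; your project management tool will have its own equivalents.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    hours: float
    done: bool = False


@dataclass
class Batch:
    """A group of tasks that someone else can test as one unit."""
    success_criteria: str
    tasks: list[Task] = field(default_factory=list)

    def total_hours(self) -> float:
        return sum(task.hours for task in self.tasks)

    def progress(self) -> float:
        """Fraction of the estimated hours that are done, from 0.0 to 1.0."""
        total = self.total_hours()
        if total == 0:
            return 0.0
        return sum(task.hours for task in self.tasks if task.done) / total


batch = Batch(
    success_criteria="A registered user can reset their password by email",
    tasks=[
        Task("Add reset-request endpoint", 3, done=True),
        Task("Send reset email", 2, done=True),
        Task("Build new-password form", 3),
    ],
)

print(batch.total_hours())  # 8
print(batch.progress())     # 0.625
```

Measuring progress by hours rather than by task count is a choice made for this sketch; counting finished subtasks works just as well when every task is at most 4 hours.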

Practically speaking, there are many ways to describe success that you are likely already familiar with: acceptance criteria, a definition of done, the goal of the feature. I recommend that your team write this up alongside the groups of tasks so that when they are complete, it is easy to refer back to what the goal of the feature group was. If these criteria end up changing, that is OK. Just make sure you change your estimates based on your new knowledge about how the system should work - if the trade-off is still worth the metric.

An example of what this process may look like is shown in the image below. On the left we’ve made a list of tasks; on the right we’ve grouped them by how they are to be tested, either together (revealing a dependency) or in isolation.

I’m quite sure that I have left you with more than a few questions relating to the software development process. This was inevitable, as estimation is the cornerstone of great projections. Our team went through many iterations to find something that worked for us. The point here is that you shouldn’t be afraid to experiment. If your responsibility is to move the project forward, the best way you can do that is by simultaneously building it and improving your processes continuously - to me, this is the essence of Agile.

It’s ok to fail, just adapt

Estimating when software is going to be done is difficult. I failed at it for years, until I started using some common sense.

The truth is, whether you use points or hours to measure development progress on your software project, good enough estimation is extremely tricky and very few teams actually get it right 100% of the time. Lastly, look at the data with the intention of improving. Metrics always need qualifying comments, but accurate, complete data can be used to improve things when interpreted well.

Questions? I’d be humbled and happy to help.