Improved Agile Release Burndown Metric Reveals More Stories about Teams

September 28, 2015

The Done-Done Completion Rate tells us some interesting things about how our agile teams are doing on a sprint-by-sprint basis.  However, it doesn’t help us understand whether they will actually hit their next milestone.  The trouble is that the version they’re currently working on may not have all of its scope defined yet.  That’s all very agile and fine, but the reality is that sometimes we want to know whether we’re going to hit a date or not.

As mentioned previously, here at Central 1 we use JIRA to manage our agile backlogs.  This tool comes with a handy release burndown chart, which shows how a team is progressing against their release goal.  For example, the chart below illustrates a team that started their version before adding any stories to it.  Once started, however, they burned a portion of the release with each sprint, eventually leading to a successful release.  In sprint 27, they added a little scope, effectively moving the goalposts.

[Figure: JIRA release burndown chart]

The trouble with this chart is that it assumes the team is planning their releases (versions in JIRA).  What about teams that have multiple concurrent releases, or those that aren’t really using releases at all?  Are the teams leaking defects into the backlog?  Are they expanding the scope of the current release?

In order to answer these questions, we need to include the unversioned backlog.  I’m considering a metric that I have given the catchy moniker, “Net Burndown for the Most Active Committed Release.”  This starts out with the chart above for the release upon which the team is making the most progress.  Any change in any other releases is ignored, so a team could be planning the next release without affecting their net burndown.  However, if they leak stories, defects or tasks into the backlog without associating them with a release, those items are assumed to be in the current release and included in the net burndown.  Sometimes that’s bad, and sometimes it’s good.  Finally, any unestimated items are given the average number of points for items of that type.
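To make the bookkeeping concrete, here is a rough sketch of how the metric could be computed from exported issue data.  The field names (fix_version, sprint_created, and so on) are placeholders of my own, not real JIRA fields, and picking the “most active” release is left outside the function.

    from collections import defaultdict
    from statistics import mean

    def net_burndown(issues, release):
        """Net burndown sketch: the chosen release plus the unversioned backlog.

        `issues` is a list of dicts with hypothetical keys: 'fix_version',
        'type', 'points' (None if unestimated), 'sprint_created' and
        'sprint_resolved' (None while open).  Items in other releases are
        ignored; items with no fix version count against `release`.
        """
        # Average points per issue type, used to size unestimated items.
        by_type = defaultdict(list)
        for i in issues:
            if i['points'] is not None:
                by_type[i['type']].append(i['points'])
        avg = {t: mean(pts) for t, pts in by_type.items()}

        def size(issue):
            if issue['points'] is not None:
                return issue['points']
            return avg.get(issue['type'], 0)

        in_scope = [i for i in issues if i['fix_version'] in (release, None)]
        sprints = sorted({i['sprint_created'] for i in in_scope} |
                         {i['sprint_resolved'] for i in in_scope
                          if i['sprint_resolved'] is not None})

        net = {}
        for s in sprints:
            added = sum(size(i) for i in in_scope if i['sprint_created'] == s)
            done = sum(size(i) for i in in_scope if i['sprint_resolved'] == s)
            net[s] = added - done   # positive: scope grew faster than it burned
        return net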

Here is how the chart looks for one team.  This team is somewhat disciplined about using versions, but as discussed before, they leak quite a few defects, and may or may not assign them to a version.  In the chart, the blue area represents the same information as the JIRA release burndown chart.  The red area adds in the backlog outside the release, and you can see that it tends to dominate the blue for this team.  Finally, the green area represents all the unestimated issues (mostly bugs).

[Figure: net burndown for Team 12]

In some cases, like August 12th, the negative progress in the backlog (about -50 points) dominates what appears to have been pretty good progress in the release (about 50 points).  The unestimated issues (about -30 points) leaked onto the backlog make the story even bleaker, and we can see that the team is not making substantial progress toward their release after all.

Contrast this team with a second team, who take a more disciplined approach overall.  This team made some sort of huge change to their versions in June, which cuts off the chart.  Since then, however, we can see that they leave very few unestimated issues, and tend to assign issues to releases rather than working on the backlog directly.  They’re not perfect, though, and occasionally struggle to maintain their velocity; comparing with their Done-Done Completion chart, we can see that this was a genuine struggle rather than a planned dip for vacations.  They also seem to be letting go of the versions a little in more recent sprints.

[Figure: net burndown for Team 5]


Done-Done Completion Rate reveals interesting stories for agile teams

September 22, 2015

I wrote about the Done-Done Completion Rate a couple of weeks ago. Since then, I’ve plugged in some data from some of my teams, and revealed some interesting stories, which I would like to share today.

First, let’s look at Team 12.  They have a completion rate (points completed / points committed) that tends to vary between about 60% and 80% (light blue line).  It could be better, but 80% is not bad, considering you want to be reaching a little bit in each sprint.

[Figure: completion rates for Team 12]

However, Team 12 tends to leak quite a few bugs with each sprint.  They have produced estimates for many of these defects, so the team is aware that their backlog is growing almost as quickly as they are burning it.  As far as they’re concerned, they have an effective completion rate ((points completed – points added) / points committed) that is lower (medium blue line).
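As a quick illustration with made-up numbers (not Team 12’s actual figures), the two rates come out like this:

    # Hypothetical sprint figures, for illustration only.
    committed = 40    # points committed at sprint planning
    completed = 30    # points finished by sprint end
    leaked = 12       # points of new defects raised during the sprint

    completion_rate = completed / committed              # 0.75
    effective_rate = (completed - leaked) / committed    # 0.45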

There is, unfortunately, a hidden trap in the unestimated defects.  Allowing for these, and using an average number of points per defect, they even made negative progress for one sprint back in May.  The good news is that they appear to be improving, and since July, they appear to have kept their sprints much cleaner, and we can expect that they have a better handle on when they will complete their current milestone.

These unestimated defects can be the undoing of a team.  Consider, for example, Team 7, which has a number of new members and may still be forming.  About every third sprint, this team seems to fail to meet their commitments, and indeed loses substantial ground.  As a manager, it’s important to dig into the root cause of this.

[Figure: completion rates for Team 7]

Finally, here is a team that keeps their sprints very clean.  Through extensive automated testing, Team 5 allows almost no defects to escape development.  When one does, they estimate the defect right away.  Notice how their completion rate (light purple) is actually lower than Team 12’s (blue above), but once we allow for defects, this team is completing more of their work.  The result is that this team can count on their velocity to predict their milestones, provided the scope doesn’t change.  Of the three teams, this one is the most predictable.

[Figure: completion rates for Team 5]

An Improved Agile Completion Rate Metric

September 1, 2015

As mentioned in my last post, we’re changing our focus from productivity to predictability until we can actually predict how long releases are going to take.  I still believe that we need a single true metric for productivity, but until we have some predictability, our productivity numbers are too shaky to provide any guidance to teams as they look to improve.  I’m looking for a Contextual Temporary Metric, rather than the One True Metric for Central 1 Product Development.

At Central 1, JIRA provides us with a Control Chart, which maps the cycle time for issues.  This is a powerful chart, and provides many tools to manipulate the data to gain insights.  However, it makes the base assumption that all issues are the same size.  One large story can severely change the rolling average for cycle time going forward.

[Figure: JIRA control chart of cycle time]
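To see how a single oversized story can drag the rolling average around, here is a toy calculation (just arithmetic on made-up cycle times, not a reflection of how JIRA builds its chart):

    from statistics import mean

    cycle_times = [3, 2, 4, 3, 2, 21, 3, 2]   # days per issue; one 21-day outlier

    def rolling_avg(values, window=4):
        return [round(mean(values[max(0, i - window + 1):i + 1]), 1)
                for i in range(len(values))]

    print(rolling_avg(cycle_times))
    # roughly [3, 2.5, 3, 3, 2.8, 7.5, 7.2, 7.0] -- the outlier dominates a full window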

A brief search gives some good ideas at the sprint level.

  • From Leading Agile, Number of Stories Delivered / Number of Stories Committed, and Number of Points Delivered / Number of Points Committed.
  • From Dzone, recent velocity / average velocity or recent throughput / average throughput.
  • From Velocity Partners, Number of stories delivered with zero bugs.

None of these considers predictability at the Release or Version level.  We already have pretty stable velocities in our teams, and when there are hiccoughs in velocity, it is for a known reason, like vacations.  So, I started looking at delivery versus commitment, which is at the crux of predictability.  If a team can’t predict what they can deliver in the next two weeks, there is little hope that they will predict what they can deliver in a few months.

As I started to compile the data for stories and points in each sprint — something that is more difficult than I would like with Jira — I began to see that the teams go through large swings in the number of stories they might attempt, but the number of points stays relatively stable.  Meanwhile, stories are central to productivity, whereas predictability should include all work that the team undertakes, even if that work doesn’t move the product forward.

I therefore focused on the points delivered versus points committed, a number I call the Completion Rate.  The chart below illustrates the outcomes for three teams in the time since April.

[Figure: raw point completion rate for three teams]

It is easy to foresee a team inflating this metric by rushing through their work and leaking a lot of defects into the backlog.  A little further analysis shows that for some teams, like Team 12, this is indeed the case.  Teams 9 and 13, on the other hand, leak relatively few defects into their backlog, as shown by comparing the light and dark solid lines in the chart below.

[Figure: point completion rate, with and without leaked defects]

When a team marks stories (or defects) complete while simultaneously finding a lot of new defects, it becomes difficult to predict how long it will take to stabilize the release.  The outstanding effort for the release keeps growing, and the team must predict not only their velocity in addressing the defects, but also the rate at which new defects will be found (hopefully one is higher than the other!).

I’m calling the value represented by the dark lines in the chart above the Done-Done Completion Rate:

Done-Done Completion Rate = (points completed – points leaked) / points committed

For the purpose of the analysis above, I used the actual estimated points for defects that were raised during the sprint.  However, in practice, those estimates probably don’t exist at the time of the sprint retrospective, when the team should want to review how they did.  In that case, I would use the average number of points per defect for those defects that haven’t been estimated.
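A minimal sketch of that calculation, assuming we know the count of unestimated defects raised during the sprint and have a history of past defect estimates to average over (the function and its inputs are mine, not a real JIRA query):

    from statistics import mean

    def done_done_completion_rate(points_committed, points_completed,
                                  leaked_estimates, unestimated_leaked,
                                  historical_defect_points):
        """(points completed - points leaked) / points committed, where
        unestimated defects are charged at the historical average per defect."""
        avg_defect = mean(historical_defect_points) if historical_defect_points else 0
        points_leaked = sum(leaked_estimates) + unestimated_leaked * avg_defect
        return (points_completed - points_leaked) / points_committed

    # 40 points committed, 32 completed, 6 points of estimated defects leaked,
    # plus two unestimated defects at a historical average of 3 points each:
    rate = done_done_completion_rate(40, 32, [2, 4], 2, [3, 2, 4, 3])   # 0.5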

What this metric doesn’t capture is the story that isn’t really complete, yet the team marks it as complete while creating a new story for the remaining work.  I don’t know if a metric pulled from Jira could accurately detect this situation without penalizing agility; we want to be able to create new stories.

In the absence of a Release-level predictability metric, the Done-Done Completion Rate could help a team to see if they are moving in a direction that would enable them to predict their releases.

Look for Predictability before Productivity for Agile Teams

August 15, 2015

Last week I proposed a productivity metric based on the proportion of a release that has been completed, accounting for story creep in the release.  Not all of our teams are using the version feature in Jira, but the several that do enabled me to perform a little analysis.  So far, I don’t think we have the maturity level required to make this metric work for us, but the data are instructive all the same.

Two teams in particular stand out.  Team 13 was our initial pilot team on agile, and they’ve been practising for about three years now.  Some of their releases look pretty close to what I expected from a productivity standpoint.  Here, for example, is a five-sprint release that shows a ramp down at the end.  The last sprint contributed only a little productivity because there was only one story left at the end to wrap up.

[Figure: Team 13, a five-sprint release]

This made me wonder what was going on with their releases, so I mapped all of them and found a much more chaotic chart.  The chart below shows the raw number of stories completed in each sprint.  The different colours denote different releases (light blue stories have no release).

[Figure: Team 13, stories completed per sprint, coloured by release]

Now Team 13 actually has a legitimate reason to work on multiple concurrent releases: they manage change to multiple pieces of software.  So, my model would have to accumulate value produced across concurrent releases.  This analysis currently takes a certain amount of Excelling, and so I looked at Team 12, which has only three releases:

[Figure: Team 12, proposed productivity metric for three releases]

This is a chart of the proposed productivity metric for three releases.  As can be seen, each release kicked off with a killer sprint in which the team produced over ten per cent of the stories initially defined for the release.  After that, a combination of deferred stories and (probably) bug fixing killed them.  The result is a metric that is too unstable to provide feedback to the team.

Reviewing the shape of their velocity chart confirms the story and also raises some more questions and concerns.  In particular, this team’s velocity is too unstable to make for safe predictions.  So, while this productivity metric might be relevant for long-established teams with stable velocities, I suspect the majority of our teams need to concentrate on predictability first.

A Mobile Usability Testing Filming Rig

August 12, 2015

Yesterday, a couple of my interaction design folks came to my office with a webcam and a cheap light from Ikea.  They had had the brilliant idea of mounting the webcam on the light so they could film usability testing on our mobile app.  The masking tape version they had assembled worked fine for internal testing, but tomorrow they’re heading to a branch at VanCity to test with real members, and they wanted something a little more professional-looking.

It turns out this Ikea lamp is made to be hacked with the niceEshop webcam.  All we had to do was take the reflector out, along with the socket, switch and bulb.  Then it was easy to thread the webcam wire through the hole where the lamp switch had been. The webcam wire has a little rheostat along its length to adjust the light brightness, and this needed to be taken apart and reassembled to make it through the hole in the lamp.

I took the reflector home last night to expose it to my hack saw and Dremel tool for fifteen minutes to get rid of the parabolic part of the reflector and to make a place where we can reach the camera on-off button.  Then this morning, I re-installed the reflector with some Blu-Tak to keep the camera from moving around.  If I wanted to be professional about it, I might have used some black silicone, but nobody will see the Blu-Tak anyway.

Don’t get me wrong, I love managing a team of developers, designers and testers.  But occasionally I get to play MacGyver, and that is really fun.

How Productive was our Sprint? A Proposal

August 11, 2015

My search for a good productivity metric continues.  As mentioned, Isaac Montgomery suggests a metric for productivity that relies on releases.  Release 60% of the value of the product, divide by the cost to acquire that value, and you have a productivity metric.

This metric has a few nice features:

  • It doesn’t incur much overhead.  We already know the value of our projects and initiatives at Central 1, or we could come up with something with relatively little cost.
  • It encourages breaking projects into milestones and assigning value to those milestones.  Milestones matter at a macro reporting level: when we speak to our customers, it’s nice to be able to point to concrete artifacts that we have completed.
  • It is easy to normalize and compare across teams, or at least across time.  The individual teams would not be involved in assessing the overall value of their initiatives, and by centralizing we have a hope of equalizing the value assigned across teams.  The alternative that I’m familiar with, value points, relies on teams or individual product owners assigning the same value points to the same story.

On the other hand, Montgomery’s metric doesn’t provide the rapid feedback that you would want if you were making small adjustments to your process.  In order to determine if you were more productive, you would need to pass several milestones, and that could easily mean a six-month lag between the time when the change is made and the effects are known.  It would be far better if this lag were only a few sprints.

What if we combine my story productivity metric with Montgomery’s metric?  It would work like this: during release planning, we divide the value of the project into releases or milestones, per Montgomery.  At this point, we have a story map, and we could say that if one of the releases were worth $100, say, and it had ten stories in it, then each of those stories is worth $10.  Go nuts!  The challenge with this is that I know that the number of stories in a release grows as we get into it.  Those first few stories were good conceptually, but missed a lot of the nuances, and responding to those nuances is what agile is all about.

To allow for this, we could assign a part of the residual value to the new stories that are added after the first sprint.  In the example, if we produced one story in the first sprint (10%), there is 90% of the value left and 9 stories remaining.  If we then add a new story, each of the ten outstanding stories is worth (90% / 10 =) 9%.  Eventually, we stop adding stories, and the team completes the remaining ones so we can complete the release.
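Here is that scheme as a small sketch, using my own reading and naming; edge cases (like a release with nothing left outstanding) are glossed over:

    def per_story_value(release_value, initial_stories, sprints):
        """Spread the residual release value over the outstanding stories.

        `sprints` is a list of (stories_completed, stories_added) pairs.
        Completed stories bank value at the current per-story rate; stories
        added later share the residual value rather than inflating the release.
        """
        remaining = release_value
        outstanding = initial_stories
        history = []
        for completed, added in sprints:
            if outstanding <= 0:
                break
            value_each = remaining / outstanding
            delivered = completed * value_each      # this sprint's "productivity"
            remaining -= delivered
            outstanding = outstanding - completed + added
            history.append((value_each, delivered))
        return history

    # The example from the text: a $100 release with ten stories, one story done
    # in sprint 1 and one new story added -- each outstanding story is now worth $9.
    print(per_story_value(100, 10, [(1, 1), (2, 0)]))   # [(10.0, 10.0), (9.0, 18.0)]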

[Figure: theoretical productivity curve over a release]

Based on this narrative, I would expect productivity to follow a 1/x type of curve over time, eventually stabilizing for the release.  I shall be interested to see how it pans out with some actual numbers from our teams.

Measuring Productivity Using Stories

July 3, 2015

About a month ago, I attended some training on leading high performance teams. There I learned that a single well-defined metric that is perfectly aligned with the team’s performance can help to ignite their performance.  Among other things, this reignited my interest in actually measuring the productivity of my agile teams.

Despite many claims that productivity metrics are a fool’s errand (McAllister, Fowler, Hodges), I’ve been trying to measure it for years, at least ever since I came to Central 1, and possibly before. Without measuring productivity, the many easily grasped quality metrics are unbalanced, and the team can find themselves in constant-improvement mode, without actually producing anything new.  Without measuring productivity, how do we know that we are being strangled by technical debt?

For several years, I used the number of changes per hour of development. This got better when we jumped into agile with both feet back in 2014; prior to that, there was too much variability in the size of a JIRA ticket – it might represent a small bug fix or a whole project.  By the end of 2014, we were looking solely at the number of stories developed per hour (the reciprocal, hours per story, is more intuitive, but early on we sometimes spent time without producing any stories).

I was often asked why stories instead of story points.  The reason was value.  A very complex story would have a high number of story points, but might have little business value.  Stories, on the other hand, should be the smallest unit of work that can still deliver business value – a quantum of business value.

This metric was pretty good.  It had the immediate benefit of being cheap to produce – simply query JIRA and divide by time.  Moreover, the chart showed a beautiful increase in “productivity” as the team got used to working with agile.

[Figure: stories developed per hour over time]

 

But then a funny thing happened.  We were working on the new mobile branch ATM locator, and the project was producing stories just fine, but it was never concluding.  The problem was in the nature of the stories.  Instead of meaty stories like,

As a user I would like to search for the closest ATM so that I can go get money.

many of them were more like

As a user, I would like the search box to be titled “search” so that I know where to put my query.

Clearly, not all stories are created equal.  I don’t think the team was deliberately gaming the system (there was no benefit in doing so), and small stories are a hallmark of a healthy agile team, but surely we cannot ascribe the same level of productivity to a team that is changing the value of a label as to one that is enabling search. More to the point, the team was not completing the project!

I feel that the unevenness in story value probably averages out over a sufficiently large team.  However, measuring over the larger team has little benefit in terms of motivating a single agile team.  Across the larger development organization (Central 1 has about 50 developers and 20 testers), we might expect to see an effect if we make a change for everyone, as we did when we moved to adopt agile in early 2014.  However, because the values are not steady, it takes four to six months to be sure of a trend.  On the other hand, it is very difficult to dissect what is happening if no change has occurred, but a trend is detected anyway.

Looking forward, there is a promising-looking blog post from Isaac Montgomery at Rally.  It has the benefit of measuring true productivity, but requires valuation for initiatives, which at Central 1 would be difficult.

 

Does the CAP Theorem have a Second Order?

May 25, 2014

A couple of years ago, we decided at Central 1 that our services should fall on the Availability-Partition Tolerance (AP) side of the CAP Theorem. The assertion at the time was that, at a business level, it is reasonable to accept eventual consistency if we can be always available and partition tolerant. With our old systems, we made that tradeoff all the time, and sorted out the reconciliation issues the next day.

Recently, we were working on implementing Interac Online Payments, which has a fairly complex message flow that includes the POS switching network. The details aren’t important here, but the net result was that we needed to handle a scenario where the first part of a transaction might come to one data center, and the second part would come to the other. Conceptually, it was a bit like the propose and commit in 2-phase commit coming to different data centers.

The system is based on an Active-Active database server pair with two-way replication between them. Unfortunately, we were seeing the commit message arrive at the remote data center before the propose message had been replicated there. Our solution is to try to route the commit message to the same data center as the original propose message. The result is that if the service is unavailable at the location that received the propose message (even if the propose was replicated), we respond negatively to the commit: we answer inconsistently. Having said that, we can always receive a message, and our system continues to function if the network gets partitioned.
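Loosely, the routing rule amounts to something like the sketch below.  Everything here is hypothetical (the names, the health check, the forwarding); in particular, how each data center learns where the propose landed is exactly the part that replication lag makes hard.

    # Illustrative only: route the commit leg back to the data center that
    # accepted the propose leg, and decline if that data center is unavailable.

    propose_home = {}   # transaction id -> data center that handled the propose

    def service_up(data_center):
        """Stub for a health check on the peer data center."""
        return True

    def handle_propose(txn_id, local_dc):
        propose_home[txn_id] = local_dc     # remember where the first leg landed
        return "ACCEPTED"

    def handle_commit(txn_id):
        home = propose_home.get(txn_id)
        if home is None or not service_up(home):
            # Even if the propose has already replicated here, we decline:
            # trading a little consistency to stay available and partition tolerant.
            return "DECLINED"
        return f"COMMIT routed to {home}"   # the second leg follows the first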

This leads me to wonder if the CAP Theorem has a second order. That is, if I have a data service that is AP, is it impossible for me to create a service on top of it that is Available-Consistent or Consistent-Partition Tolerant?

Structural Quality and the Cost of Maintenance

November 19, 2012


This short (4:40) interview with a couple of Cap Gemini execs largely speaks to the value of measuring structural quality of code.  CG is using a tool called CAST, while here at Central 1, we use SonarGraph, but I expect they accomplish more or less the same thing.  Right at the end of the interview they propose the idea of using CAST to help them predict the cost of maintenance of an application.

This is an interesting idea, and it speaks to structural problems being the more costly type of technical debt.  That is, it is the debt on which we pay the most interest: working within a code base that is poorly designed is slow and error-prone.

North Shore Outlook Investigates Declining Enrollment

March 9, 2012

The Outlook published an article today that investigates the declining enrollment in North Vancouver schools.

(Thanks to Norwood Queens CA for pointing it out)