Archive for the ‘management’ Category

Improved Agile Release Burndown Metric Reveals More Stories about Teams

September 28, 2015

The Done-Done Completion Rate tells us some interesting information about how our agile teams are doing on a sprint-by-sprint basis.  However, it doesn’t help us to understand whether they will actually hit their next milestone.  The trouble is, the current version that they’re working on may still not have all its scope defined.  That’s all agile and fine and everything, but the reality is, sometimes we want to know if we’re going to hit a date or not.

As mentioned previously, here at Central 1, we use JIRA to manage our agile backlogs.  This tool comes with a handy release burndown chart, which shows how a team is progressing against their release goal.  For example, the chart below illustrates a team who started their version before adding any stories to it.  However, once started, they burned a portion of the release with each sprint, eventually leading to a successful release.  In sprint 27, they added a little scope, effectively moving the goalpost.

[Figure: JIRA release burndown chart for the team]

The trouble with this chart is that it supposes that the team is planning their releases (versions in JIRA).  What about teams that have multiple concurrent releases, or those that aren’t really using releases at all?  Are the teams leaking defects into the backlog?  Are they expanding the scope of the current release?

In order to answer these questions, we need to include the unversioned backlog.  I’m considering a metric that I have given the catchy moniker, “Net Burndown for the Most Active Committed Release.”  This starts out with the chart above for the release upon which the team is making the most progress.  Any change in any other releases is ignored, so a team could be planning the next release without affecting their net burndown.  However, if they leak stories, defects or tasks into the backlog without associating them with a release, those items are assumed to be in the current release and included in the net burndown.  Sometimes that’s bad, and sometimes it’s good.  Finally, any unestimated items are given the average number of points for items of that type.
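To make the bookkeeping concrete, here is a minimal sketch of how the net burndown delta for a single sprint might be computed, assuming the sprint’s issues have already been exported from JIRA into simple records.  The Issue fields, the average_points helper and the net_burndown_delta function are my own illustrative names, not anything JIRA provides.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Issue:
    issue_type: str           # "Story", "Bug" or "Task"
    points: Optional[float]   # None if the issue is unestimated
    release: Optional[str]    # fix version, or None if unversioned

def average_points(all_issues, issue_type):
    """Average estimate for issues of a type, used as a stand-in for unestimated items."""
    estimated = [i.points for i in all_issues
                 if i.issue_type == issue_type and i.points is not None]
    return sum(estimated) / len(estimated) if estimated else 0.0

def net_burndown_delta(completed, added, all_issues, active_release):
    """Points burned this sprint, net of anything leaked into the active release
    or into the unversioned backlog (which is assumed to belong to that release)."""
    def effective_points(issue):
        if issue.points is not None:
            return issue.points
        return average_points(all_issues, issue.issue_type)

    burned = sum(effective_points(i) for i in completed
                 if i.release in (active_release, None))
    leaked = sum(effective_points(i) for i in added
                 if i.release in (active_release, None))
    return burned - leaked   # changes to any other release are ignored
```

Everything outside the most active release is simply filtered out, which is what lets a team plan their next release without moving their own goalposts.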

Here is how the chart looks for one team.  This team is somewhat disciplined about using versions, but as discussed before, they leak quite a few defects, and may or may not assign them to a version.  In the chart, the blue area represents the same information as the JIRA release burndown chart.  The red area adds in the backlog outside the release, and you can see that it tends to dominate the blue for this team.  Finally, the green area represents all the unestimated issues (mostly bugs).

[Figure: net burndown chart for Team 12]

In some cases, like August 12th, the negative progress in the backlog (about -50 points) dominates what appears to have been pretty good progress in the release (about 50 points).  The unestimated issues (about -30 points) leaked onto the backlog make the story even bleaker, and we can see that the team is not making substantial progress toward their release after all.

Contrast this team with a second team, who take a more disciplined approach overall.  This team made some sort of sweeping change to their versions in June, which is where the chart cuts off.  Since then, though, we can see that the team leaves very few unestimated issues, and tends to assign issues to releases rather than working on the backlog directly.  They’re not perfect, however, and struggle occasionally to maintain their velocity; comparing with their Done-Done Completion chart, we can see that this was a genuine struggle, as opposed to a planned dip for vacations.  They also seem to be letting go of the versions a little in more recent sprints.

[Figure: net burndown chart for Team 5]

Done-Done Completion Rate reveals interesting stories for agile teams

September 22, 2015

I wrote about the Done-Done Completion Rate a couple of weeks ago.  Since then, I’ve plugged in data from some of my teams, and it has revealed some interesting stories, which I would like to share today.

First, let’s look at Team 12.  They have a completion rate (points completed / points committed) that tends to vary between about 60% and 80% (light blue line).  It could be better, but 80% is not bad, considering you want to be reaching a little bit in each sprint.

[Figure: point completion rates for Team 12]

However, Team 12 tends to leak quite a few bugs with each sprint.  They have produced estimates for many of these defects, and so the team is aware that their backlog is growing almost as quickly as they are burning it.  As far as they’re concerned, they have an effective completion rate ((points completed – points added) / points committed) that is lower (medium blue line).

There is, unfortunately, a hidden trap in the unestimated defects.  Allowing for these, and using an average number of points per defect, they even made negative progress for one sprint back in May.  The good news is that they appear to be improving: since July they have kept their sprints much cleaner, and we can expect that they have a better handle on when they will complete their current milestone.
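To make the arithmetic concrete with some invented numbers: suppose a team commits 40 points and completes 30, but leaks 8 points of estimated defects plus three unestimated defects at the team’s average of 4 points each.  The raw completion rate is 30 / 40 = 75%, while the effective, done-done rate is (30 – 8 – 12) / 40 = 25%.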

These unestimated defects can be the undoing of a team.  Consider, for example, Team 7, which has a number of new members, and may still be forming.  About every third sprint, this team seems to fail to meet its commitments, and indeed loses substantial ground.  As a manager, I find it important to dig into the root cause of this.

[Figure: point completion rates for Team 7]

Finally, here is a team that keeps their sprints very clean.  Through extensive automated testing, Team 5 allows almost no defects to escape development, and when one does, they estimate it right away.  Notice how their completion rate (light purple) is actually lower than Team 12’s (blue above), but once we allow for defects, this team is completing more of their work.  The result is that this team can count on their velocity to predict their milestones, provided the scope doesn’t change.  Of the three teams, this one is the most predictable.

[Figure: point completion rates for Team 5]

An Improved Agile Completion Rate Metric

September 1, 2015

As mentioned in my last post, we’re changing our focus from productivity to predictability until we can actually predict how long releases are going to take.  I still believe that we need a single true metric for productivity, but until we have some predictability, our productivity numbers are too shaky to provide any guidance to teams as they look to improve.  I’m looking for a Contextual Temporary Metric, rather than the One True Metric for Central 1 Product Development.

At Central 1, JIRA provides us with a Control Chart, which maps the cycle time for issues.  This is a powerful chart, and provides many tools to manipulate the data to gain insights.  However, it makes the base assumption that all issues are the same size.  One large story can severely change the rolling average for cycle time going forward.

[Figure: JIRA control chart of issue cycle times]
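As a toy illustration of that skew (the cycle times below are invented), a single oversized story drags the rolling average up and keeps it there for the length of the window:

```python
# One 21-day story among a stream of 2-4 day issues.
cycle_times_in_days = [2, 3, 2, 4, 3, 21, 2, 3, 2, 3]

window = 5
rolling_average = []
for i in range(len(cycle_times_in_days)):
    recent = cycle_times_in_days[max(0, i - window + 1): i + 1]
    rolling_average.append(sum(recent) / len(recent))

# The average sits around 3 days, then jumps above 6 when the large story lands
# and stays elevated until it falls out of the window.
print([round(a, 1) for a in rolling_average])
```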

A brief search gives some good ideas at the sprint level.

  • From Leading Agile, Number of Stories Delivered / Number of Stories Committed, and Number of Points Delivered / Number of Points Committed.
  • From Dzone, recent velocity / average velocity or recent throughput / average throughput.
  • From Velocity Partners, Number of stories delivered with zero bugs.

None of these considers predictability at the Release or Version level.  We already have pretty stable velocities in our teams, and when there are hiccoughs in velocity, it is for a known reason, like vacations.  So, I started looking at delivery versus commitment, which is at the crux of predictability.  If a team can’t predict what they can deliver in the next two weeks, there is little hope that they will predict what they can deliver in a few months.

As I started to compile the data for stories and points in each sprint — something that is more difficult than I would like with Jira — I began to see that the teams go through large swings in the number of stories they might attempt, but the number of points stays relatively stable.  Meanwhile, stories are central to productivity, whereas predictability should include all work that the team undertakes, even if that work doesn’t move the product forward.

I therefore focused on the points delivered versus points committed, a number I call the Completion Rate.  The chart below illustrates the outcomes for three teams in the time since April.

[Figure: raw point completion rates for three teams since April]

It is easy to foresee that a team might affect this metric by rushing through their work and leaking a lot of defects into the backlog.  A little further analysis shows that for some teams, like Team 12, this is indeed the case.  Teams 9 and 13, on the other hand, leak relatively few defects into their backlog, as shown by comparing the light and dark solid lines in the chart below.

[Figure: point completion rates with and without leaked defects]

When a team marks a story (or defect) complete while simultaneously finding a lot of defects, it becomes difficult to predict how long it will take to stabilize the release.  The outstanding effort for the release keeps growing, and the team must predict not only their velocity in addressing the defects, but also the rate at which new defects will be found (hopefully one is higher than the other!).
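For illustration, with invented numbers: if a team burns defects at 10 points per sprint but new defects surface at 6 points per sprint, the net progress is only 4 points per sprint, so 40 outstanding points implies roughly ten more sprints to stabilize rather than four.  And if the discovery rate ever exceeds the fix rate, the release never converges at all.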

I’m calling the value represented by the dark lines in the chart above the Done-Done Completion Rate:

Done-Done Completion Rate = (points completed – points leaked) / points committed

For the purpose of the analysis above, I used the actual estimated points for defects that were raised during the sprint.  However, in practice, those estimates probably don’t exist at the time of the sprint retrospective, when the team should want to review how they did.  In that case, I would use the average number of points per defect for those defects that haven’t been estimated.
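Here is a minimal sketch of the calculation as it might be run at retrospective time; the function and its arguments are illustrative, and unestimated defects are simply given the team’s average points per defect, as described above.

```python
def done_done_completion_rate(points_committed, points_completed,
                              leaked_defect_points, average_defect_points):
    """Done-Done Completion Rate = (points completed - points leaked) / points committed.

    leaked_defect_points lists the estimates of defects raised during the sprint;
    None marks a defect that has not been estimated yet, and the team's average
    points per defect is substituted for it.
    """
    points_leaked = sum(p if p is not None else average_defect_points
                        for p in leaked_defect_points)
    return (points_completed - points_leaked) / points_committed

# e.g. 40 points committed, 30 completed, defects of 3 and 5 points plus one
# unestimated defect at an average of 4 points: (30 - 12) / 40 = 45%.
rate = done_done_completion_rate(40, 30, [3, 5, None], average_defect_points=4)
```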

What this metric doesn’t capture is the story that isn’t actually complete, yet the team marks it as complete and creates a new story for the remaining work.  I don’t know if a metric pulled from Jira could accurately detect this situation without penalizing agility; we want to be able to create new stories.

In the absence of a Release-level predictability metric, the Done-Done Completion Rate could help a team to see if they are moving in a direction that would enable them to predict their releases.

Look for Predictability before Productivity for Agile Teams

August 15, 2015

Last week I proposed a productivity metric based on the proportion of a release that has been completed, accounting for story creep in the release.  Not all of our teams are using the version feature in Jira, but the several that are gave me enough data to perform a little analysis.  So far, I don’t think we have the maturity level required to make this metric work for us, but the data are instructive all the same.

Two teams in particular stand out.  Team 13 was our initial pilot team on agile, and they’ve been practising for about three years now.  Some of their releases look pretty close to what I expected from a productivity standpoint.  Here, for example, is a five-sprint release that shows a ramp down at the end.  The last sprint contributed only a little productivity because there was only one story left to wrap up.

[Figure: productivity by sprint for Team 13, release 3.6]

This made me wonder what was going on with the rest of their releases, so I mapped all of them and found a much more chaotic picture.  The chart below shows the raw number of stories completed in each sprint.  The different colours denote different releases (light blue stories have no release).

[Figure: stories completed per sprint by Team 13, coloured by release]

Now, Team 13 actually has a legitimate reason to work on multiple concurrent releases: they manage change to multiple pieces of software.  So, my model would have to accumulate value produced across concurrent releases.  That analysis currently takes a certain amount of Excelling, and so I looked at Team 12, which has only three releases:

[Figure: proposed productivity metric for Team 12’s three releases]

This is a chart of the proposed productivity metric for three releases.  As can be seen, each release kicked off with a killer sprint in which the team produced over ten per cent of the stories initially defined for the release.  After that, a combination of deferred stories and, probably, bug-fixing killed them.  The result is a metric that is too unstable to provide feedback to the team.

Reviewing the shape of their velocity chart confirms the story and also raises some more questions and concerns.  In particular, this team’s velocity is too unstable to make for safe predictions.  So, while this productivity metric might be relevant for long-established teams with stable velocities, I suspect the majority of our teams need to concentrate on predictability first.

How Productive was our Sprint? A Proposal

August 11, 2015

My search for a good productivity metric continues.  As mentioned, Isaac Montgomery suggests a metric for productivity that relies on releases.  Release, say, 60% of the value of the product, divide that value by the cost of acquiring it, and you have a productivity metric.

This metric has a few nice features:

  • It doesn’t incur much overhead.  We already know the value of our projects and initiatives at Central 1, or we could come up with something with relatively little cost.
  • It encourages breaking projects into milestones and assigning value to those milestones.  Milestones matter at a macro reporting level: when we speak to our customers, it’s nice to be able to point to concrete artifacts that we have completed.
  • It is easy to normalize and compare across teams, or at least across time.  The individual teams would not be involved in assessing the overall value of their initiatives, and by centralizing we stand a chance of equalizing the value assigned across teams.  The alternative that I’m familiar with, value points, relies on teams or individual product owners assigning the same value points to the same story.

On the other hand, Montgomery’s metric doesn’t provide the rapid feedback that you would want if you were making small adjustments to your process.  In order to determine if you were more productive, you would need to pass several milestones, and that could easily mean a six-month lag between the time when the change is made and the effects are known.  It would be far better if this lag were only a few sprints.

What if we combine my story productivity metric with Montgomery’s metric?  It would work like this: during release planning, we divide the value of the project into releases or milestones, per Montgomery.  At this point, we have a story map, and we could say that if one of the releases were worth $100, say, and it had ten stories in it, then each of those stories is worth $10.  Go nuts!  The challenge with this is that I know that the number of stories in a release grows as we get into it.  Those first few stories were good conceptually, but missed a lot of the nuances, and responding to those nuances is what agile is all about.

To allow for this, we could assign a part of the residual value to the new stories that are added after the first sprint.  In the example, if we produced one story in the first sprint (10%), there are 90% of the value left and 9 stories.  If we then add a new story, each of the ten outstanding stories is worth (90%/10) 9%.  Eventually, we stop adding stories, and the team completes the remaining ones so we can complete the release.
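Here is a small sketch of that redistribution, using the $100 release from the example; the function and the convention that new stories are picked up at the start of a sprint are purely illustrative.

```python
def value_per_story_by_sprint(total_value, initial_stories, sprints):
    """Redistribute the remaining release value over the outstanding stories each sprint.

    sprints is a list of (stories_completed, stories_added) pairs.  Returns the
    value credited to each completed story in each sprint.
    """
    remaining_value = total_value
    outstanding = initial_stories
    credited = []
    for completed, added in sprints:
        outstanding += added                      # new stories share the residual value
        value_per_story = remaining_value / outstanding
        credited.append(value_per_story)
        remaining_value -= completed * value_per_story
        outstanding -= completed
    return credited

# Sprint 1: complete 1 of 10 stories ($10 each).  Sprint 2: add 1 story and complete 1,
# so the remaining $90 is spread over 10 outstanding stories ($9 each).
print(value_per_story_by_sprint(100, 10, [(1, 0), (1, 1), (2, 0)]))  # [10.0, 9.0, 9.0]
```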

[Figure: theoretical productivity curve for a release]

Based on this narrative, I would expect productivity to follow a 1/x type of curve over time, eventually stabilizing for the release.  I shall be interested to see how it pans out with some actual numbers from our teams.

Measuring Productivity Using Stories

July 3, 2015

About a month ago, I attended some training on leading high performance teams. There I learned that a single well-defined metric that is perfectly aligned with the team’s performance can help to ignite their performance.  Among other things, this reignited my interest in actually measuring the productivity of my agile teams.

Despite many claims that productivity metrics are a fool’s errand (McAllister, Fowler, Hodges), I’ve been trying to measure it for years, at least ever since I came to Central 1, and possibly before.  Without measuring productivity, the many easily grasped quality metrics are unbalanced, and the team can find themselves in constant-improvement mode, without actually producing anything new.  Without measuring productivity, how do we know whether we are being strangled by technical debt?

For several years, I used the number of changes per hour of development.  This got better when we jumped into agile with both feet back in 2014; prior to that, there was too much variability in the size of a JIRA ticket – one might represent a small bug fix or a whole project.  By the end of 2014, we were looking solely at the number of stories developed per hour (the reciprocal, hours per story, is more intuitive, but early on we sometimes spent time without producing any stories).

I was often asked why stories instead of story points.  The reason was value.  A very complex story would have a high number of story points, but might have little business value.  Stories, on the other hand, should be the smallest unit of work that can still deliver business value – a quantum of business value.

This metric was pretty good.  It had the immediate benefit of being cheap to produce – simply query JIRA and divide by time.  Moreover, the chart showed a beautiful increase of “productivity” as the team got used to working with agile.
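As a rough sketch of what “simply query JIRA and divide by time” might look like: JIRA’s REST search endpoint can return just the count of matching issues, and the hours come from our timesheets rather than from JIRA.  The URL, JQL, credentials and hours below are all placeholders.

```python
import requests

JIRA_BASE = "https://jira.example.com"   # placeholder
JQL = ('issuetype = Story AND status = Done '
       'AND resolved >= "2014-12-01" AND resolved <= "2014-12-31"')

# maxResults=0 asks the search endpoint for the count only, not the issues themselves.
response = requests.get(
    f"{JIRA_BASE}/rest/api/2/search",
    params={"jql": JQL, "maxResults": 0},
    auth=("user", "password"),           # placeholder credentials
)
stories_completed = response.json()["total"]

hours_of_development = 640               # from timesheets, not JIRA
print(stories_completed / hours_of_development, "stories per hour")
```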

[Figure: stories developed per hour over time]


But then a funny thing happened.  We were working on the new mobile branch ATM locator, and the project was producing stories just fine, but it was never concluding.  The problem was in the nature of the stories.  Instead of meaty stories like,

As a user I would like to search for the closest ATM so that I can go get money.

many of them were more like

As a user, I would like the search box to be titled “search” so that I know where to put my query.

Clearly, not all stories are created equal.  I don’t think the team was deliberately gaming the system (there was no benefit if they had), and small stories are a hallmark of a healthy agile team, but surely we cannot ascribe the same level of productivity to a team that is changing the value of a label as to one that is enabling search.  More to the point, the team was not completing the project!

I feel that the unevenness in story value probably averages out over a sufficiently large team.  However, measuring over the larger team has little benefit in terms of motivating a single agile team.  Across the larger development organization (Central 1 has about 50 developers and 20 testers), we might expect to see an effect if we make a change for everyone, as we did when we moved to adopt agile in early 2014.  However, because the values are not steady, it takes four to six months to be sure of a trend.  On the other hand, it is very difficult to dissect what is happening if no change has occurred, but a trend is detected anyway.

Looking forward, there is a promising-looking blog post from Isaac Montgomery at Rally.  It has the benefit of measuring true productivity, but requires valuation for initiatives, which at Central 1 would be difficult.


Inspired by Atlassian’s Fedex Day

December 21, 2011

My team has been after me for years to implement something like the Google 20.  Well, I’ve never felt I can afford to work only four days a week on delivering what the business asks for — 6 would be preferable.  So, we never did it.  However, Atlassian came up with the idea for Fedex days a few years ago, and this seemed a much more sellable idea, especially if we did it in December when things are starting to slow down a bit.  This year we tried it out.

We changed it a little from their format, but looking at their FAQ, there are some things we should adopt, like grabbing screenshots on the last morning in case the project breaks a few minutes before the deadline.  We also made it a two-day event, rather than just 24 hours.  Our environment is complex, and it could easily take a day just to get started.

Noon on Wednesday hit and the energy on the development floor went through the roof! Suddenly little teams of two or three formed all over the place, laptops emerged so people could work at each others’ desks, developers were huddling. Work continued into the wee hours of the morning both days. It was great!

Being the director, I decided to lead by example, and came up with my own project.  Part of my time was eaten by meetings that I couldn’t avoid, but for much of those two days I managed to roll up my sleeves and do some development.  True to form, I decided to start by learning a new language and development environment, and implemented my project in Grails.

By the end of Wednesday afternoon, I’d gone through all the tutorials I felt I needed and started on my actual project, which was to call the Innotas API to create a tool to simplify accepting timesheets.  That’s more or less when I found out that Grails is not all that much help for calling web services.  Oh well, I persevered, and thanks to Adrian Brennan, who was working on another integration with Innotas, I got my application to talk to Innotas by the time I went home, around 3 AM.

The Innotas API is a poster child for the worst API ever.  To do remarkably simple things, you need to cross the network tens of times.  It’s like traversing an XML document one node at a time over the Internet.  But I digress.

Thursday dawned earlier than expected and some of the teams were starting to struggle, including me.  I had more than half the day devoted to meetings that I couldn’t avoid.  Worse, there were no good blocks of time to get in the zone.  I was experiencing first-hand the difficulty with context-switching that my developers go through every day.  Indeed, I only got about two hours of productive time during the day, and came back in the evening.  When I left at 2 AM, I wasn’t the last to leave, and I suspect there were more working from home.

Friday morning flew by, and some of the organizational items that I’d left until the last minute became minor crises – mental note for next year!  However, I managed to get a partial demo working, which meant that at least I wouldn’t embarrass myself in the afternoon.

Suddenly it was noon, and a mountain of pizza was being delivered to our largest meeting room, which attracted the whole team very effectively.  Everyone grabbed some pizza and we called into the conference bridge for the handful of remote workers.  The afternoon would be long.

Atlassian limits their demos to three minutes.  We didn’t limit the demos this year, but next year we will.  A couple of people chose to show documents or presentations that they’d worked on, which I feel is counter to the spirit of the event.  We won’t accept those next year either.

One of the things I’d left until the last minute was figuring out exactly how we would finagle our way into the development VLAN from the conference room.  The challenges of seeing demos on various developer machines while simultaneously using join.me or gotomeeting ate up too much time.  So next year we’ll do a little practice in the week before, and we’ll get two computers going so we don’t have to wait for each demo to set up.  Well, lessons learned.

I hoped for team engagement, skills development and demonstration, and we got those in spades.  I thought we might perhaps get a product idea or two, but I was completely blown away by the number of projects that resulted in something that is almost usable in our products.  We got way more value out of this initiative than I expected, and I fully expect several projects to graduate into our products after a little refinement.

If you’ve thought about Fedex Days for your organization, I heartily recommend finding a quiet time of the year and going for it.

The Myth of Governance

December 5, 2011

After the previous post regarding requirements, it is tempting to think that you could avoid prescriptive or unnecessary requirements with a proper governance structure in place.  In fact, that is the fashionable reaction when any project artifact is found to have deviated from the path.  If only we had a proper review and sign-off procedure, everything would stay the course.

Now, anyone knows that review and sign-off takes time.  If you want my time to review a 50-page document, you’ll be waiting three days to a week.  If it’s 100 pages, I’ll get it back to you in at least a week.

The requirements document in the previous post was about 200 pages long.  Think about that.  200 pages is the length of a novel.  Except if you picked up a novel that had the same character development and story arc as a typical work document, you’d put it down after reading the first chapter.  The quality of the attention you’re able to give the work drops off significantly after about 40 pages.

Even the author can’t pay attention past page 40. That’s why it’s common to find documents that contradict themselves.

This, along with a desire to parallelize writing and reviewing, is why we often see these big documents released in chapters.  But then we gain opportunities to miss whole aspects of the subject.  The document really needs to be considered as a whole.

So, governance in the form of review and sign-off is slow and error-prone.  You might be able to compensate for the errors and inattention by slowing down further.  Give me more time to review, and maybe I’ll be more careful and won’t miss things.

The real problem, however, is that review-based governance doesn’t scale. If the overall direction sits with one person, and they must review every decision, then the organization is limited to that reviewer’s capacity.

Well, obviously you scale by adding more reviewers.  But how do you ensure that the reviewers all agree on the same direction and vision?  Even if they all think they agree on the direction and vision, they will have to interpret it and apply it in specific circumstances.  Who watches the watchers?

In the end, we introduce documentation and review because we don’t know how else to ensure that our staff are producing what we expect.  However, if we think we’re going to actually ensure they produce what we expect through review, we’re dreaming.

What we really want is self-government, and I think a few organizations have done this well.  With self-government, the leadership clearly communicate a broader vision or path toward the future, and then motivate their staff to work toward the shared goal.  If you can sufficiently communicate the idea, and convince everyone to support it, then you should not need governance.

Technical Debt and Interest

August 9, 2011

Since installing Sonar over a year ago, we’ve been working to reduce our technical debt.  In some of our applications, which have been around for nigh on a decade, we have accumulated huge amounts of technical debt.  I don’t hold much faith in the numbers produced by Sonar in absolute terms, but it is encouraging to see the numbers go down little by little.

Our product management team seems to have grabbed onto the notion of technical debt.  Being from a financial institution, they even get the notion that bad code isn’t so much a debt as an un-hedged call option, but they also recognize that it’s much easier to explain (and say) “technical debt” than “technical unhedged call option.”  They get this idea, and like it, but the natural question they should be asking is, “How much interest should we expect to pay if we take on some amount of technical debt?”

In the real world, debt upon which we pay no interest is like free money: you could take that loan and invest it in a sure-win investment, and repay your debt later, pocketing whatever growth you were able to get from the investment.  It’s the same with code: technical debt on which you pay no interest was probably incurred to get the code out faster, leaving budget and time for other money-making features.

How do we calculate interest, then?  The interest is a measure of how much longer it takes to maintain the code than it would if the code were idealized.  If the debt itself, the principal as it were, corresponds to the amount of time it would take to rectify the bad code, the interest is only slightly related to the principal.  And thus you see, product management’s question is difficult to answer.

Probably the easiest technical debt and interest to understand is that from duplicate code.  The principal for duplicate code is the time it would take to extract a method and replace both duplicates with a call to the method.  The interest is the time it takes to determine that duplicate code exists and replicate and test the fix in both places.  The tough part is determining that the duplicate code exists, and this may not happen until testing or even production.  Of course, if we never have to change the duplicate code, then there is no effort for fixing it, and so, in that case, the interest is zero.

So, I propose that the technical interest is something like

Technical Interest = Cost of Maintaining Bad Code * Probability that Maintenance is Required

You quickly realize then that it’s not enough to talk about the total debt in the system; indeed, it’s useless to talk about the total debt as some of it is a zero-interest, no down-payment type of loan.  What is much more interesting is to talk about the total interest payments being made on the system, and for that, you really need to decompose the source code into modules and analyze which modules incur the most change.
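As a back-of-the-envelope sketch of that kind of analysis (the modules, hours and probabilities below are invented; in practice the probabilities would come from change history, such as commits per module per quarter):

```python
# Rank modules by estimated technical interest rather than by total debt.
# Technical Interest = cost of maintaining the bad code * probability that maintenance is required.
modules = [
    # (module, extra maintenance hours if touched, probability of being touched per quarter)
    ("payments-core",  40, 0.90),
    ("legacy-reports", 120, 0.05),
    ("atm-locator",    15, 0.60),
]

def technical_interest(extra_hours, change_probability):
    return extra_hours * change_probability

for name, hours, probability in sorted(
        modules, key=lambda m: technical_interest(m[1], m[2]), reverse=True):
    print(f"{name}: ~{technical_interest(hours, probability):.0f} hours of expected interest per quarter")
```

On these made-up numbers, the module carrying the largest pile of debt (legacy-reports) incurs the least interest, because it is almost never touched.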

It’s also useful to look at the different types of debt and decide which of them are incurring the most interest.  Duplicate code in a quickly changing codebase, for example, is probably incurring more interest than even an empty catch block in the same codebase.  However, they both take about the same amount of time to fix.  Which should you fix first?  Because the interest on technical debt compounds, you should always pay off the high-interest loan first.

The Problem with Templates

March 17, 2010

As technical teams mature, one of the remedies for the many ills that come from growth is the addition of process.  These processes call for documentation, and someone generally kicks off a template to make these documents easier to produce.  As we learn more, we add sections to the templates to ensure we don’t repeat mistakes, or at least remember to consider the factors in subsequent initiatives.

So far so good.  The organization is learning and improving with every project.

Unfortunately, document templates often wind up looking a lot like forms.  That makes people want to fill in all the sections (often improperly), and that leads to bloated documents that don’t even fulfill their purpose.

Take, for example, a fairly typical waterfall model of software development.  There is a requirements document, followed by a design document.  Often the design document template will include a section called something like, “architecturally significant use cases.”  It is tempting to simply grab all the use cases from the requirements document and paste them into this section, especially when there are sections on logical, physical, deployment, data and code architecture yet to write.

Apart from the obvious problem with cut and paste, the inclusion of all the use cases fails at the most basic level to communicate the significant use cases.   The document fails.

I don’t have a good answer to this, other than to provide only a high-level template for much of the document along with a description of how the document should work.

For example, that design document starts with architecturally significant use cases that drive the choice of logical components.  The logical components find places to live in executables and libraries, which are documented in the physical architecture section, and those executables find homes in the deployment architecture.  In order to write a sensible design document, an author has to understand this flow, and seeing the headings in the template isn’t going to help.

In most cases, the document template is not the place to learn.  It should stay high-level, and force its authors to think through the process of writing the document.  We still need a place to ensure projects can impart their wisdom to subsequent projects, but the place to do this is in a checklist, not in a document template.

So, if you’re thinking of creating a template, think about creating a short (!) explanation of how a document of this type should be organized so that it communicates.  Add a checklist to the explanation, and do it all in a wiki so that those who come after you can help the organization learn.