Tuesday, March 09, 2010

Goals, Behaviours, ROI and Testing Ideas

You're developing a great web application, training course or social software strategy. You cost it and it looks like a substantial piece of work, so you need to get executive approval. Putting together a list of the benefits, you find to your dismay that it is full of terms like "easier", "more efficient", "more usable" and "collaborative". You know your executive will ask questions this list doesn't answer: "How will you know you're successful?" and "What's it worth to us?".

There are two common (and wrong) approaches to this question:

  1. It's Web 2.0/training/social, it's intangible and can't be measured.
  2. It should add to the bottom line, so just measure revenue changes.

The first one incorrectly assumes that intangibles can't be assessed. Market analysts know that they certainly can be, and regularly are; just ask what Google's stock valuation is based on.

The second one jumps straight to the über-goal and ignores the fact that, since everything the organisation does can potentially affect revenue, it will be nigh on impossible to know whether any change in the bottom line is the result of this particular web application, training course or social software strategy (a great success in one area can easily be cancelled out by a horrible mess in another).

Fortunately this problem is well known. Unfortunately the answer is hard. That is, hard as in deciding what you want for your birthday, not hard as in needing a PhD in Applied Rocket Science.

Find your desired user behaviours

The bottom line is that you need to find what change in user/trainee behaviours will support your business goals and then use that to derive metrics that can be measured (and eventually valued).

David Maister talked about the importance of identifying desired behaviours in his article Why (Most) Training Is Useless.

There is no point putting on skills training if there is no incentive for the behavior; the people don’t believe in it and they don’t yet know exactly what it is they are supposed to be good at! ...

What behaviors by top management need to change to convince people that the new behaviors are really required, not just encouraged? If the behavior is going to be optional, then so should the training be.

In other words, if we don't know what behaviours we want training to encourage, or if we don't actually support the display of those behaviours (e.g. through measurement and feedback) then we can't expect our training to actually help our business goals.

In exactly the same way, software applications (not just web or social ones!) are often built without thinking about what change in user behaviour is really desired, or how that change will be measured - thus guaranteeing that even if success is achieved there is no way of valuing it.

ROI for difficult stuff

Adaptive Path introduced a way of valuing user experience by reflecting on the changes in behaviour we want in order to solve our business problem. They gave us the graphic below as a way of envisaging how business problems map to user behaviours, and how those in turn can be mapped to valuable financial metrics.

(Image: the Adaptive Path value chain. Source: Adaptive Path)

via Marina Chiovetti at ThoughtWorks

  • Business Problem = a specific problem you want to affect
  • User Behaviour = a change in user behaviour that would produce the desired effect
  • Behaviour Metric = a way you can measure the change in user behaviour
  • Value Metric = the dollar value you can apply to the behaviour metric
  • Financial Metric = the expected amount of user behaviour change from the project, multiplied by the value metric and compared to the expected cost of the project

A specific example might help make this clearer. Imagine you want to increase leads and you decide to do this by improving your website's ability to elicit customer responses via the contact us form:

(Image: the value chain applied to the leads example. Source: Adaptive Path)

If we add some case studies to the website, and place the contact us form at the bottom of each one as a call to action, then we might expect some increase in leads (assuming this is a new initiative for our website).

The behaviour metric for this is the number of leads received from the website contact form per month. Our sales team has estimated that 1 in 10 leads becomes a sale, worth on average $1,000 to us (whether you count revenue or profit depends on what you are most interested in - the smart money is on profit though). That makes our value metric roughly $100 per lead.

Based on our website's traffic patterns we expect to increase our leads from 7 per month to 20 per month by implementing this measure. If we are happy to get a breakeven return on investment (ROI) over the next 3 months then we could invest $100 * (20 - 7) * 3 = $3,900 in the project.
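To make the arithmetic explicit, here is the same breakeven calculation as a small Ruby sketch; the figures are the hypothetical ones from the example above, not real data.

    # Break-even budget for the "more leads via the contact us form" example.
    value_per_sale      = 1000.0   # average value of a sale
    conversion_rate     = 0.1      # 1 in 10 leads becomes a sale
    value_per_lead      = value_per_sale * conversion_rate   # value metric: $100 per lead

    leads_before        = 7        # leads per month today
    leads_after         = 20       # leads per month expected after the change
    months_to_breakeven = 3

    extra_leads_per_month = leads_after - leads_before        # 13
    breakeven_budget = value_per_lead * extra_leads_per_month * months_to_breakeven
    puts breakeven_budget          # => 3900.0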

Before someone tries to tell me I don't know how to value project returns, I know that you could use Net Present Value (NPV), Internal Rate of Return (IRR) or other methods to ensure this is comparing apples with apples across projects, time and investment opportunities - but that's not the point of this post.

Now we might well spend only a fraction of that on case studies and a contact form, but if we wanted to we could now justify getting a professional copywriter to help shape up our case studies, and spending a bit on usability testing to ensure our contact forms get out of the way and give us the leads as easily as possible.

Okay, okay, this is a fairly simple example with straightforward metrics. In practice we might not know how much a lead is worth, and often we're asked to solve business problems far more complex than this one.

What if I’ve got no history (i.e. I’m a startup)?

We have used this process with startups to help them prioritise wildly diverse feature lists when they do not have the budget to build everything within their launch timeframe.

(Aside: There is a lesson here about launching a simple, compelling product and then iteratively adding value to it based on user feedback. The problem is that many startup ideas have no simple, compelling offering at their core. I blame funding requirements, but more about that in another post.)

Startups do not have a financial history to fall back on, and even with good financial modelling they might only have a guess to give you for their value metric. In that case we use the prioritisation the exercise gives us, and keep the rest of the information both for analysing the startup's success down the track and (just as importantly) for identifying which changes we should look at implementing later.

What if you don’t know what to change?

The methodology is still useful even when you have a complex situation with incomplete information. If nothing else it focuses your attention where it belongs: on the problem you are solving and the user/trainee behaviours you think will help solve it, rather than on the application technology or the subject matter of your training courses.

Of course you often don't know what to do to create a particular desired behaviour. In that case a bit of prototyping and A/B testing can go a long way. Google Website Optimizer does a great job of helping you do this with content or separate functional pages, but it does require that two separate pages exist, and sometimes you just want to tweak the way a particular feature works (e.g. adding a couple of extra fields to a form).

Assaf Arkin has created a Rails plugin called Vanity that takes a rather different approach to A/B testing: it gives developers an easy way to embed the tests in their code and run them until a set number of iterations is reached and/or one option exceeds a set probability of being better than the other.

With a simple API and elegant admin functionality, Vanity provides a very viable way of testing one idea against another. Writing a change as an A/B test takes longer than simply making the change, but the marginal extra cost is small enough that it is worth paying to find out which option really works better.
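As a rough illustration, here is the shape a Vanity experiment takes. This is a sketch from memory of Vanity's DSL around that time, and the experiment, alternatives and metric names are made up for the case study example above, so check the Vanity documentation for the exact details.

    # experiments/case_study_cta.rb -- hypothetical experiment comparing two
    # calls to action at the bottom of a case study page.
    ab_test "Case study CTA" do
      description "Which call to action produces more contact-form leads?"
      alternatives "Contact us", "Get a free consultation"
      metrics :lead
    end

    # experiments/metrics/lead.rb -- the metric the experiment tracks.
    metric "Lead (contact form submitted)" do
      description "Counts completed contact-form submissions."
    end

    # In the view, ab_test(:case_study_cta) returns whichever alternative this
    # visitor has been assigned, e.g. <%= ab_test(:case_study_cta) %>.
    # In the controller action that receives the form, record the conversion
    # with: track! :lead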

Assaf has a great post explaining how this is really Experiment Driven Development (EDD) in action. He explains the cost/benefit tradeoff this way in the comments section:

“You start with an idea for a change that will improve your software. Your baseline cost of development is having both alternatives — before and after the change. Without EDD these alternatives will be separate in time, with EDD you’re going to have some overlap (the duration of the experiment).

For the experiment, you only need a skeletal implementation, you’re not committed to fully developing the feature until after it proves itself. For small changes it makes little difference, the cost is the same.

For complex changes, you can save a lot by not fully developing features that don’t matter. You’re going to know whether a feature matters or not quickly enough, and with data to back up, that you can make the decision to *not* develop it further.

You can also kill unused features early. So these are two ways to reduce development costs using EDD.”

Summary

Look behind whatever you are doing and try to understand the underlying business goals and the user behaviours that would support them, then derive the ROI from those. If you are not sure which change would best produce the desired user behaviours, try A/B testing to establish which way you should jump. Whatever you do, don't allow yourself to be sucked into doing something just because your competitors did it, or to be "simpler" or more "usable" in some undefined, unaccountable way.

Thursday, March 04, 2010

Sydney Scrum Meetup (March)

Jeff Sutherland and Jens Østergaard visited the Sydney Scrum Meetup last Thursday. We had some great pizza and then they got everyone doing the Nokia test!

My Test Results

Each line below is the Nokia test question, our answer, and the score out of 10 (with notes in parentheses); the total is the average across the questions.

  • Iteration length, 2 weeks – 10
  • Testing, customer acceptance testing – 8
  • Agile specifications, poor user stories – 4 (good enabling specifications might be 3-5 pages long, written before sprint planning; for venture companies Jeff finds that 2 sprints planned is necessary)
  • Product owner, product owner with backlog – 5 (or 2 with the last project!)
  • Product backlog, single product backlog – 3
  • Estimates, backlog estimated by BA – 0 (Jeff said that usually as teams get better their stories get smaller and eventually are about the size of tasks, and then estimating changes to use points vs hours – surprising what can be untangled and done separately)
  • Sprint burndown chart, no chart, but team knows velocity – 0 + 3 + 2 (Jeff mentioned that partial completion of tasks creates a high-risk environment)
  • Team disruption, project leaders telling people what to do – 3 (self-organise to maximise velocity)
  • Team, no emergent leadership – 1

Total = 4.33

Most of the room scored between 2 and 5. Apparently most of OpenView's venture companies start out around 4, but when they work on it they end up around 6. It looks like TIDC might have a great Scrum team, as someone from there scored 8 for their team (the only score higher than 5)!

If your reference stories change then so do the story points. But Jeff said it is really the delta in velocity that is interesting. One of the symptoms of hyper-productive teams is that you get asked to slow down!

  • Sustainable pace
  • Quality
  • Velocity
  • “Balanced Scorecard” (not the right term, but Jeff couldn’t remember what it was)

Jens recommends the XP game for teams to learn about velocity.

How long does it take to fix a bug found by CI/automated testing? If longer than 2 hours, create an impediment list and get the Scrum master to remove the impediments. Another simple metric: track the day you start a story and the day you finish it, and calculate the elapsed days versus the theoretical ideal days based on your velocity. This gives your process efficiency (usually around 20%), which measures the quality of your backlog and the time to ‘done’. Raising the efficiency means getting the backlog detailed well enough that you spend less time waiting while doing a particular item.

The speed of testing is usually the bottleneck, and it matters more than the speed of coding, so interrupt devs immediately with a bug rather than leaving them to finish what they’re doing. Another way of thinking about it is maximising the availability of testable code.

Evaluating just the code that is in production is one way of incentivising people.

I asked how to handle client projects using Scrum. Jeff mentioned that Systematic is doing big fixed-price contracts using Scrum; they provide two bids on every project, with the Scrum bid at around half the price.

In response to someone’s question about Scrum, Jeff mentioned he’d written about the answer in an article titled Future of Scrum from the Agile 2005 conference.

Metrics to Live By

Jeff got me hooked on metrics: it is obvious when talking to him that empirical process control requires metrics to be successful. Some of these metrics assume that you estimate using story points, which we have done with some success on one project. To do this well you need a range of reference stories of various sizes, and a rough sizing scale (e.g. Fibonacci numbers) that the team can agree on.

Here are some of the metrics Jeff mentioned:

Backlog Velocity = story points/sprint

Sprint Velocity = story points ‘done’/day

Efficiency = story’s ideal days/actual elapsed days from ‘start’ to ‘done’
(if an item should have taken 2 ideal days, but it actually took 10 days from the ‘start’ to the ‘done’ date, then you have 20% efficiency)

Churn = % of ‘done’ items that testers send back to developers for fixing
(if all items churn once then you have 100% churn, and if half of those churn again then you have 150% churn! Item size is ignored when calculating)
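As a quick sanity check on the efficiency and churn definitions, here is a small Ruby sketch; the numbers are the illustrative ones from the examples above, not real project data.

    # Efficiency: a story's ideal days divided by the elapsed calendar days
    # from 'start' to 'done'.
    def efficiency(ideal_days, elapsed_days)
      ideal_days.to_f / elapsed_days
    end

    puts efficiency(2, 10)     # => 0.2, i.e. 20% efficiency as in the example

    # Churn: total bounce-backs from testers divided by the number of 'done'
    # items, ignoring item size.
    def churn(bounce_counts_per_item)
      bounce_counts_per_item.sum.to_f / bounce_counts_per_item.size
    end

    # Four items: all bounce once, and half of those bounce a second time,
    # giving 6 bounces across 4 items = 150% churn.
    puts churn([2, 2, 1, 1])   # => 1.5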

Jeff has an upcoming conference session on this very topic and he discusses it further in his Excel Spreadsheet for Hyperproductive Scrum Teams post. The spreadsheet seems a little hard to get into, but I’ve given it to our Solutions team to review.