Suppose 3 UI changes go out. How do you decide how much each increased users?
Measure the change in usage for each screen or section of the app that each team worked on.
Perhaps one was trying to drop support calls and another was trying to increase users.
Measure support calls per user (which is the right way to measure it).
You thought that developers saw QA as a roadblock before? Wait until they see QA keeping them from the check they think they are going to get.
This has never been a problem for us.
How do you gently break the news that you're not even going to let the developer try to get that dangled reward?
These kinds of tests can be easily randomized. If product management really thought the UI was a bad idea, they could keep the first test's sample really small and only increase it if the design proved itself.
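As a concrete illustration of how such a small, randomized rollout might work, here's a minimal sketch assuming hash-based bucketing; the experiment name, the user id, and the 1% starting allocation are made up for the example.

    import hashlib

    def in_experiment(user_id: str, experiment: str, rollout_percent: float) -> bool:
        """Deterministically assign a user to an experiment at the given traffic share."""
        digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 10_000        # stable bucket in [0, 9999]
        return bucket < rollout_percent * 100    # e.g. 1.0 -> buckets 0..99, i.e. 1% of users

    # Start the risky UI at 1% of traffic and raise the allocation only if it proves itself.
    print(in_experiment("user-42", "new-checkout-ui", rollout_percent=1.0))

Because the assignment is a pure function of the user id and experiment name, each user keeps seeing the same variant as the allocation is dialed up.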
If I worked in an organization that announced a policy like this, I'd start shopping my resume around.
We would definitely only do it for one small part of the team that volunteered for the experiment.
Overall, any kind of performance-based pay depends on the metric. If the metric is robust, the scheme can succeed. If the metric is not robust, the whole thing fails. Will the metrics I proposed be robust? I can answer your concerns and attempt to make the metrics more robust, but I don't know if they would be robust enough. I think it would be interesting to run the experiment and find out for real.
Let's see. Your response to the 3 UI changes question is "Measure the change in usage for each screen or section of the app that each team worked on." From that I conclude that you've probably never done A/B testing and discovered how difficult those deltas are to accurately measure, or discovered how changes upstream waterfall down through your site. The amount of data you need to narrow down performance changes to within, say, 5% is much more than most people realize. And from a business view you don't want to do that, for the simple reason that once you have a demonstrated improvement, continuing the measurement to figure out the exact win is costing you real money.
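To put a rough number on that, here's a back-of-the-envelope sketch using the standard two-proportion sample-size formula; the 10% baseline conversion rate, 5% significance level, and 80% power are assumptions chosen purely to illustrate the scale.

    from statistics import NormalDist

    def samples_per_arm(baseline_rate, relative_lift, alpha=0.05, power=0.80):
        """Approximate users needed in EACH arm to detect the given relative lift."""
        p1 = baseline_rate
        p2 = baseline_rate * (1 + relative_lift)
        z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance threshold
        z_power = NormalDist().inv_cdf(power)           # desired statistical power
        variance = p1 * (1 - p1) + p2 * (1 - p2)
        return (z_alpha + z_power) ** 2 * variance / (p1 - p2) ** 2

    # A screen that converts 10% of users, where we want to detect a 5% relative lift:
    print(round(samples_per_arm(0.10, 0.05)))   # roughly 58,000 users per arm

That's close to 60,000 users per variant just to reliably see a 5% relative change on one screen, before any of the attribution problems described next.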
If, as at several of my employers, there's a long lag between a user engaging with your website and actually making money for you, the tracking problems get MUCH more challenging. When you add complex social dynamics on top of that (such as person A getting person B to make a purchase), you're in for a world of fun. (Yes, I've worked on a site that had to solve exactly this problem.)
Measuring support calls per user is also not as simple as it looks. Different parts of your site generate support calls at different rates. And it isn't always obvious what part of the site generated the call. This gets worse if you have the long lead time issue that I've faced multiple times.
As for whether QA is viewed as a roadblock, that depends strongly on team dynamics. But in the places I've seen where that mindset took hold, things can get very, very bad.
On randomizing the tests, sure, you can limit how many users are shown the option. But when someone makes changes that introduce technical debt on the back end, that debt and the resulting bugs can't be segregated out so easily. Sometimes you really just need to say no to features. Anything that limits your freedom to do that is a Bad Thing.
Now don't get me wrong. I like the idea of performance-based pay. My current employer (Google) is all about performance pay. But you have to structure the incentives right. Paying people after the fact, based on feedback from the people around them, creates incentives not just to optimize a flawed metric, but to work together as a team and get stuff done.
This matters enough to me that, while I'd read with interest what the experience of running this experiment was like, I personally wouldn't want to be involved in anything like it until I'd been convinced that it really works the way you hope it will.
From that I conclude that you've probably never done A/B testing and discovered how difficult those deltas are to accurately measure, or discovered how changes upstream waterfall down through your site.
We've done it before. We've launched new screens, measured the usage, then made a bunch of changes to the screen and seen usage go up. We also have a lot of good data on which screens generate support calls, data on the conversions of all the different screens, etc.
But again, your points are all good. We have pretty good data, but it's nowhere near as simple and clear as the same type of measurement for a salesperson. Would the data be good enough to base pay on? I don't know.