Maybe it’ll help you out?
Previously on Locally Sourced: does anybody read this part? Hi, it’s Noel, how are you? Last time, I wrote about pair programming, and there were a couple of discussions on Twitter (where you can follow me @noelrap) that came down to the question of what would make you choose pair programming for your team, and how you would evaluate it. I thought it was worth a follow-up…
After I posted about pair programming, a discussion I had on Twitter made me think about how I evaluate processes and tools, and how I think about team performance.
This is much, much more structured than I actually am in practice – just the act of writing this down makes me seem way more systematic than I really am. And I probably left something out. But it’s broadly the way I approach technical change on my teams. To make this more concrete, I’ll ground the discussion in two specific changes. One is an actual decision to use Stimulus over React in a previous project, and the other is a more hypothetical decision to do more pair programming on a new team.
In general, any decision I make about how a team runs falls back to the following three things.
- You get what you measure, but you may get it in the worst way possible. This is doubly true if you make a big deal out of the measurement, because so many coding metrics can be easily gamed. Pick a few different things, and don’t obsess over any single one of them. I mentioned in a previous essay which metrics I’d pick.
- All else being equal, it’s better to have too little process than too much process. You can always make up for too little process as needed, while time spent on too much process is gone forever.
- All else being equal, pick the side that makes the developers more satisfied. You can take this one too far, but erring on the side of the people doing the work is a decent place to start.
Questions to ask
When looking at a new tool or new process, here are some of the things I think about.
What problem are we trying to solve? Is it the most important problem the team has?
This may be the question I ask in these kinds of meetings more than any other. Being specific about the problem helps.
For the Stimulus choice, we had a new team, a set of requirements that included a complex form with lots of client-side behavior, and we were looking for a tool to implement them.
Adding pairing could be a potential solution to a lot of problems: is it to improve communication? Productivity? Code review? To allow onboarding of new team members? Any of those are plausible, but you’d evaluate them differently. For the sake of argument, let’s say we are trying to improve code quality, and therefore long-term productivity.
How does this tool or process solve our problem? Do existing users of the tool or process have similar problems? Do those users have similar scale?
A real thing to watch out for is picking up a tool that is being used by teams with different problems. A rule of thumb: if the tool is primarily being advanced by teams that are an order of magnitude bigger than yours, there’s a really good chance it’s not for you. Big teams have way more process needs than small teams due to the quadratic increase in communication paths. Using big-team process on a small team leads to over-process or over-engineering.
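The growth in communication paths is easy to make concrete with a quick back-of-the-envelope calculation (team sizes here are illustrative, not from any specific team):

```javascript
// Number of pairwise communication paths on a team of n people: n choose 2.
const communicationPaths = (n) => (n * (n - 1)) / 2;

console.log(communicationPaths(4));  // 6
console.log(communicationPaths(8));  // 28
console.log(communicationPaths(40)); // 780
```

An order-of-magnitude jump in team size is roughly a hundredfold jump in paths, which is why process built for a big team rarely maps down cleanly to a small one.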
In the React/Stimulus case, both tools would solve our technical problem. React seemed to be pitched more at single-page apps, which we didn’t want, whereas Stimulus seemed to be targeted more at the kind of app we were building. It turned out that our app was more complicated than we originally thought, but not so much more complicated that I wished we had done a single-page app.
For pair programming, there are teams that do use pair programming heavily to manage code quality, but there aren’t a lot of them. The size of those teams is roughly consistent with the size of our team, so pairing basically passes this question.
What are the costs of incorporating the thing? Who needs to buy in to it?
For costs, you should try to think of up-front costs to get started, plus medium-term costs while the project is active, plus long-term costs as the project continues.
For Stimulus/React, there’s the initial cost of learning the tool, since the team was not very familiar with either (much higher for React). There’s the mid-term cost of dealing with the code, which we suspected would be higher for React because we expected the React code to be more complex. Long-term costs might include finding future developers, which I considered roughly even: there’s a better chance of finding React devs, but Stimulus is easier to learn. Oh, and I had final sign-off on the tool choice, so only the devs on the team needed to buy in. (I really, really wanted to try Elm, actually, but decided the short-term cost of teaching it and the long-term cost of maintaining it would outweigh any mid-term cost improvement from using it.)
For pair programming, it appears at first that there’s no start-up cost. But I think there are mid-term costs. There’s going to be some adjustment to the new way of working, some negotiation of exactly what gets paired on and what doesn’t. I’d almost certainly want to do some training on effective pair programming. And, of course, there’s the potential cost of limiting the number of parallel things that the team can work on at a time. Buy-in needs to come not just from the team; typically some management needs to sign off on the adjustment as well. Enough cost to make me think twice, not enough to turn me away at this point.
What happens if the solution doesn’t work? How will I know if it isn’t working? What can I do to mitigate the risk of it not working?
I often find it easier to imagine the failure state than the success state. Or maybe it’s that I find it easier to imagine a unique failure state, whereas the success state tends to look similar for all these tools: better code, a more satisfied team. Also, the goal isn’t to run a perfect project, it’s to run a successful project, so failure-avoidance is a reasonable strategy.
For the Stimulus/React decision, I had slightly different concerns in each case. In Stimulus, the concern was that we would need to do something that was off the edge of the tool, and therefore hard or impossible. For React, the concern was that the code would get very complex, very quickly, and it would be hard to make future changes. In both cases, the “not working” signal would be pretty clear. It seemed like the Stimulus failure case would be somewhat easier to walk out of given our scope. To test this a little, I did a spike of some of the form code in both, which confirmed to me that the React complexity issue would be real. (In retrospect, I probably could have mitigated some of the complexity by using a central value store, which I originally thought would be overkill, but by the end of the project probably would have been reasonable). After all that, I felt that Stimulus was less risky, and we went forward with it.
For pairing to improve code, we have the problem that “code quality” is hard to operationalize. There are some failure states that are easy to see – we might find that nobody wants to pair, or that certain people are not good at pairing, or that the team doesn’t have enough trust to make pairing work. Those signals, I think, would be easy to spot. If the team’s velocity dropped in half, I think that would also be easy to see (though maybe not if it was a brand-new project and we had nothing to compare it to). But I think it’d be harder to see a subtle degradation in velocity, so I’d want to think about other things to look for. I would be concerned if we started to see hard bugs, or if we did not seem to get the knowledge sharing we expected. My mitigation strategy here would definitely include frequent retrospectives to get the team’s opinion on whether pairing was effective.
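One hedged way to make “subtle degradation in velocity” concrete is to compare recent velocity against an earlier baseline. Everything here – the window sizes, the sprint numbers – is hypothetical, and a drift number is a prompt for a retro conversation, not a verdict:

```javascript
// Average velocity over the most recent sprints versus an earlier
// baseline window. A persistently negative drift is a signal worth
// raising in a retrospective, not proof of anything on its own.
const average = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;

function velocityDrift(velocities, baselineSize = 4, recentSize = 4) {
  const baseline = average(velocities.slice(0, baselineSize));
  const recent = average(velocities.slice(-recentSize));
  return (recent - baseline) / baseline; // -0.15 means a 15% drop
}

// Hypothetical sprint velocities before and after a process change.
console.log(velocityDrift([20, 22, 21, 21, 19, 18, 18, 17]));
```

The caveats about velocity from later in this post apply in full: it’s gameable and noisy, so this is at best one input among several.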
Ways to evaluate
This is the tricky part.
The real spirit of agility, I think, comes in here, and involves:
- Frequently checking in to see what the team thinks the ongoing problems are.
- Making an action item that will incrementally improve the situation.
Often it’s hard to get your head around trying to solve the problem all at once. It’s easier to think of something that will make the problem slightly better. Sometimes, you’ll get lucky and your incremental improvements will make the real problem clear and hopefully more manageable.
A couple of things to look for:
We all say that developers are bad at estimating, but I think that can be overstated. Developers tend to be pretty good at understanding the face-value complexity of a task, but often get tripped up on the logistics and the complexity of integrating with the rest of the system. As such, if the team as a whole is continually missing estimates, that often means there is an increasing amount of hidden complexity in the code.
Teams I’ve been on have had dramatically different approaches to team metrics, from gathering none at all, to tracking all kinds of things. I can’t really say there’s a correlation – I’ve been on good and bad teams that didn’t formally track metrics, and the difference was largely in how the team communicated and tried to improve. Velocity is a reasonable thing to track, as long as you understand that it’s limited and don’t take it too seriously.
Developer satisfaction is a reasonable thing to track, but making it replicable by asking the same questions over and over tends to bore people. You get more interesting answers by asking different questions, but they’re less objectively measurable.
Another thing I like is a risk board, where people on the team keep track of the biggest risks to the process and then discuss how they might be mitigated or avoided. By making those kinds of concerns explicit, you put them in a place where they can be discussed.
I think that a lot of code and process issues are manifestations of a lack of trust or communication inside the team, and it’s often worth trying to figure out the underlying issue before trying to improve it.
Pulling the examples forward: in the real-world case, we went with Stimulus and mostly didn’t look back. We did keep an eye on whether we were still able to deliver the front-end work as it got more complicated, but although we wound up doing some fairly weird things, nothing was a huge blocker. By the end of the project, we were getting close to the point where we might have been better served by React, though looking at the new Turbolinks work, I think that would have been even better.
In the hypothetical pairing-for-quality case, I think I’d do a retro specifically about pairing practices after about a month, and look to the results to improve our practices. I’d specifically check in with junior members, or anybody else I thought might have a problem with pairing, outside a group setting. As for code quality, metrics are hard, but you should be able to see the difference between a team of ten and a team of five somewhere in the velocity. My guess is that if the pairing wasn’t working at all, that’d show up in velocity never quite being what you expect. The tricky case would be if it was only kind of working; that would be hard to see. I’d defer to developer satisfaction – developers often have a good sense of whether things are working (within limits – people are often reluctant to try new things, and can be enamored of complexity for complexity’s sake).
I hope you found that helpful. Please let me know what you think by commenting here or by following me on Twitter @noelrap. And if you like this, please share and tell your friends and colleagues. Thanks!