Experimentation: Measuring AI's Impact on Developer Productivity

Oct 27, 2024

Samuel Akinwunmi

Over the past few weeks, I've spoken with about 40 engineering leaders in San Francisco, and the same question keeps cropping up: "How do we measure the impact of AI on developer productivity?"

Experiments have always been the most reliable way to measure the impact of any intervention. In this article, I'll summarize a couple of research papers I've read; they're good guides for setting up experiments with your own team.

What does the research tell us?

A research team from Princeton, MIT, Microsoft, and UPenn recently published a comprehensive study on Copilot's impact on developer productivity. The scale is significant: three experiments spanning over 5,000 developers across Microsoft, Accenture, and a Fortune 100 electronics company.

The numbers are compelling. Across all experiments, teams using Copilot saw:

  • 26% increase in completed pull requests

  • 13.55% increase in commits

  • 38.38% increase in builds

Perhaps most interesting is how the impact varied by experience level. Less experienced developers not only adopted the tool more readily (9.5% higher adoption rate), but also saw substantially higher productivity gains (21-40%) compared to their senior counterparts (7-16%).

One notable insight: 30-40% of developers opted not to use Copilot at all. Access alone doesn't drive adoption.

Real-world validation

These findings align closely with what we're seeing in production environments. Recent data from a US fintech company we are working with (1.5K engineers) shows that developers actively using Copilot (defined as weekly usage) ship 24% more pull requests than non-users. These PRs also tend to be larger in terms of code changes.

The time savings are tangible: 28% of their engineers report saving at least an hour per week, with 11% saving two or more hours. For a team of this size, that translates to significant engineering capacity unlocked.
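As a rough sanity check on that claim (my own back-of-the-envelope arithmetic, assuming the 28% bucket includes the 11% who save two or more hours), a conservative lower bound on hours unlocked per week looks like:

```python
engineers = 1500

# Shares reported by the fintech company.
# Assumption: the 28% "at least an hour" group includes the 11% "2+ hours" group.
share_one_plus = 0.28
share_two_plus = 0.11

# Lower bound: count 1 hour for the 1-2h group, 2 hours for the 2h+ group.
hours_saved = engineers * ((share_one_plus - share_two_plus) * 1
                           + share_two_plus * 2)
print(f"~{hours_saved:.0f} engineer-hours saved per week (lower bound)")
```

Even with these conservative assumptions, that's roughly 585 engineer-hours per week, or about 15 full-time engineers' worth of capacity.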

Setting Up Your Own Experiments

Here's how I'd consider setting up experiments for measuring AI's impact in your organization:

  1. Define Clear Metrics:

    • Quantitative metrics: PR completion rates, commit frequency, build success rates

    • Qualitative metrics: Developer surveys, time-saving estimates

    • Code quality indicators: Build success rates, code review feedback

  2. Create Control Groups:

    • Consider staggered rollouts

    • Compare active vs. non-active users

    • Account for experience levels and tenure

  3. Account for Variables:

    • Developer experience and seniority

    • Project complexity

    • Team size and structure

    • Previous exposure to AI tools
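Once you have treatment and control cohorts, the core analysis can be as simple as comparing mean outcomes within each experience stratum, so that seniority doesn't confound the result. Here's a minimal sketch with illustrative numbers (all names and figures below are hypothetical, not from the studies cited above):

```python
from statistics import mean

# Hypothetical weekly merged-PR counts per developer, keyed by
# (cohort, experience stratum). Replace with your own exported data.
data = {
    ("copilot", "junior"): [6, 7, 5, 8],
    ("copilot", "senior"): [9, 10, 8, 9],
    ("control", "junior"): [4, 5, 4, 6],
    ("control", "senior"): [8, 9, 8, 8],
}

def lift_by_stratum(data, stratum):
    """Percentage lift in mean PR throughput for the treated group
    within a single experience stratum."""
    treated = mean(data[("copilot", stratum)])
    control = mean(data[("control", stratum)])
    return 100 * (treated - control) / control

for stratum in ("junior", "senior"):
    print(f"{stratum}: {lift_by_stratum(data, stratum):.1f}% lift")
```

In a real rollout you'd want enough developers per stratum for statistical power, and a significance test (e.g. a t-test or a regression with cohort and stratum terms) rather than raw means; but stratifying first is what keeps the junior-vs-senior differences the research highlights from washing out in an aggregate number.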

Key Takeaways

  1. Impact Varies by Experience: Both studies show that junior developers tend to benefit more from AI tools. This suggests a need for targeted training and support based on experience level.

  2. Adoption Requires Strategy: With 30-40% of developers not adopting AI tools, organizations need to focus on:

    • Clear guidelines for tool usage

    • Training programs

    • Culture of knowledge sharing

    • Understanding and addressing adoption barriers

  3. Measurement Must Be Comprehensive: Don't rely on a single metric. Combine:

    • Quantitative productivity metrics

    • Qualitative feedback

    • Code quality indicators

    • Time-saving measurements

Conclusion

The evidence from both academic research and real-world implementation suggests that AI tools can significantly boost developer productivity when properly implemented. However, the impact varies based on factors like developer experience, adoption rates, and implementation strategy.

For engineering leaders looking to measure AI's impact, the key is to set up structured experiments with clear metrics while accounting for various factors that might influence the results. Remember that successful implementation goes beyond just providing access to tools: it requires a comprehensive strategy for adoption, training, and measurement.


At Bilanc, we help engineering teams measure and improve their developer experience and productivity through data-driven insights. Our platform integrates with GitHub and Linear, and uses AI-powered surveys to give you comprehensive visibility into your team's performance. Setup takes just 5 minutes – reach out to learn more!