Experimentation: Measuring AI's Impact on Developer Productivity
Oct 27, 2024
Over the past few weeks, I've spoken with about 40 engineering leaders in San Francisco, and the same question keeps coming up: "how do we measure the impact of AI on developer productivity?"
Experiments have always been the most reliable way to measure the impact of any intervention. In this article, I'll summarize some recent research I've read; it's a good guide for setting up experiments with your own team.
What does the research tell us?
A research team from Princeton, MIT, Microsoft, and UPenn recently published a comprehensive study on Copilot's impact on developer productivity. The scale is significant: three experiments spanning more than 5,000 developers across Microsoft, Accenture, and a Fortune 100 electronics company.
The numbers are compelling. Across all experiments, teams using Copilot saw:
26% increase in completed pull requests
13.55% increase in commits
38.38% increase in builds
Perhaps most interesting is how the impact varied by experience level. Less experienced developers not only adopted the tool more readily (9.5% higher adoption rate), but also saw substantially higher productivity gains (21-40%) compared to their senior counterparts (7-16%).
One notable insight: 30-40% of developers opted not to use Copilot at all. Access alone doesn't drive adoption.
Real-world validation
These findings align closely with what we're seeing in production environments. Recent data from a US fintech company we're working with (1.5K engineers) shows that developers actively using Copilot (defined as weekly usage) ship 24% more pull requests than non-users. These PRs also tend to be larger in terms of code changes.
The time savings are tangible: 28% of their engineers report saving at least an hour per week, with 11% saving two or more hours. For a team of this size, that translates to significant engineering capacity unlocked; a rough back-of-the-envelope calculation is below.
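Here's a minimal sketch of that calculation. It assumes the reported percentages apply across all 1.5K engineers and that the 11% saving two or more hours are a subset of the 28% (both assumptions on my part):

```python
engineers = 1500

# Survey shares reported above
share_one_hour = 0.28   # save at least 1 hour/week
share_two_hours = 0.11  # save 2+ hours/week (assumed subset of the 28%)

# Conservative lower bound: count 1 hour for everyone at the >=1h threshold,
# plus 1 extra hour for everyone at the >=2h threshold.
hours_saved = engineers * share_one_hour + engineers * share_two_hours
print(hours_saved)       # 585.0 hours/week
print(hours_saved / 40)  # ~14.6 full-time engineers' worth of capacity
```

Even under these conservative assumptions, that's roughly fourteen engineers' worth of weekly capacity.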
Setting Up Your Own Experiments
Here's how I'd consider setting up experiments for measuring AI's impact in your organization:
Define Clear Metrics (a sketch of computing these follows the list):
Quantitative metrics: PR completion rates, commit frequency, build counts
Qualitative metrics: Developer surveys, time-saving estimates
Code quality indicators: Build success rates, code review feedback
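For the quantitative side, here's a minimal sketch of computing these metrics per group. The data shape and column names are illustrative, not a real GitHub or CI schema; in practice you'd pull this from your version control and CI APIs:

```python
import pandas as pd

# Hypothetical per-developer activity over the experiment window.
# Column names are made up for illustration.
activity = pd.DataFrame({
    "developer":    ["ana", "ben", "carla", "dev"],
    "group":        ["copilot", "copilot", "control", "control"],
    "prs_merged":   [14, 9, 11, 7],
    "commits":      [80, 55, 60, 41],
    "builds":       [120, 90, 100, 70],
    "builds_green": [108, 77, 88, 63],
})

# Code quality indicator: share of builds that pass
activity["build_success_rate"] = activity["builds_green"] / activity["builds"]

# Compare group means for each metric
summary = activity.groupby("group")[
    ["prs_merged", "commits", "builds", "build_success_rate"]
].mean()
print(summary)
```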
Create Control Groups (a simple comparison sketch follows this list):
Consider staggered rollouts
Compare active vs. non-active users
Account for experience levels and tenure
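Once you have treatment and control groups (say, wave 1 vs. waitlisted developers in a staggered rollout), a simple starting point is a two-sample comparison. The numbers below are made up for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical PRs merged per developer during the experiment window
treated = np.array([14, 9, 12, 11, 15, 8, 13])  # received Copilot in wave 1
control = np.array([10, 7, 11, 9, 8, 10, 6])    # waitlisted until wave 2

# Welch's t-test: is the difference in means likely to be real?
t_stat, p_value = stats.ttest_ind(treated, control, equal_var=False)
print(f"lift: {treated.mean() - control.mean():.2f} PRs, p = {p_value:.3f}")
```

A staggered rollout also sets you up for stronger designs like difference-in-differences, since the waitlisted group eventually gets the tool too.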
Account for Variables (a regression sketch that controls for these follows the list):
Developer experience and seniority
Project complexity
Team size and structure
Previous exposure to AI tools
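One way to account for these variables is a regression that includes them as controls. A sketch using statsmodels, with entirely hypothetical data and a deliberately simplified set of covariates:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical per-developer data; 'copilot' is 1 for the treatment group.
df = pd.DataFrame({
    "prs_merged":   [14, 9, 12, 11, 10, 7, 11, 9],
    "copilot":      [1, 1, 1, 1, 0, 0, 0, 0],
    "years_tenure": [1, 5, 2, 8, 1, 6, 3, 7],
    "team_size":    [6, 6, 9, 9, 6, 6, 9, 9],
})

# The 'copilot' coefficient estimates the lift while holding
# tenure and team size fixed.
model = smf.ols("prs_merged ~ copilot + years_tenure + team_size", data=df).fit()
print(model.summary())
```

With real data you'd add more covariates (project complexity, prior AI exposure) and far more observations than this toy example.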
Key Takeaways
Impact Varies by Experience: Both the academic study and our production data show that junior developers tend to benefit more from AI tools. This suggests a need for targeted training and support based on experience level.
Adoption Requires Strategy: With 30-40% of developers not adopting AI tools, organizations need to focus on:
Clear guidelines for tool usage
Training programs
Culture of knowledge sharing
Understanding and addressing adoption barriers
Measurement Must Be Comprehensive: Don't rely on a single metric. Combine the following (a sketch of joining these data sources comes after the list):
Quantitative productivity metrics
Qualitative feedback
Code quality indicators
Time-saving measurements
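One way to combine these is to join survey responses onto the quantitative metrics so each developer has both, then look at where the signals agree or diverge. A minimal sketch with made-up data:

```python
import pandas as pd

# Hypothetical quantitative metrics and survey responses
metrics = pd.DataFrame({
    "developer":  ["ana", "ben", "carla", "dev"],
    "prs_merged": [14, 9, 11, 7],
})
survey = pd.DataFrame({
    "developer":         ["ana", "ben", "carla", "dev"],
    "hours_saved_week":  [2.0, 1.0, 0.0, 0.5],
    "satisfaction_1to5": [5, 4, 3, 3],
})

# One row per developer with both hard metrics and self-reported signals;
# places where they disagree are worth investigating.
combined = metrics.merge(survey, on="developer")
print(combined.drop(columns="developer").corr())
```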
Conclusion
The evidence from both academic research and real-world implementation suggests that AI tools can significantly boost developer productivity when properly implemented. However, the impact varies based on factors like developer experience, adoption rates, and implementation strategy.
For engineering leaders looking to measure AI's impact, the key is to set up structured experiments with clear metrics while accounting for various factors that might influence the results. Remember that successful implementation goes beyond just providing access to tools – it requires a comprehensive strategy for adoption, training, and measurement.
At Bilanc, we help engineering teams measure and improve their developer experience and productivity through data-driven insights. Our platform integrates with GitHub and Linear, and uses AI-powered surveys to give you comprehensive visibility into your team's performance. Setup takes just 5 minutes – reach out to learn more!