Inside Amazon’s New Framework for Measuring Developer Productivity

The News

At AWS re:Invent, Bethany Otto, a subject-matter expert on the Software Builder Experience team, briefed analysts on Amazon’s internal framework for measuring development productivity at scale. The conversation covered the origins of Amazon’s cost-to-serve model for software delivery, the role of AI in improving developer outcomes, and how Amazon is reshaping developer experience as an organizational competency instead of a tooling initiative.

Analysis

“Developer Productivity” to “Development Productivity”

One of the most important themes from the discussion was Amazon’s explicit move away from individual developer productivity metrics toward system-level development productivity. Their insight was that software development is a team sport, and individual metrics regress toward team averages over time. Measuring teams instead of individuals reduces cultural risk, avoids vanity metrics, and aligns incentives across CI/CD, review processes, quality, and operational practices.

This echoes a broader industry shift. Organizations are rediscovering that productivity is an emergent property of the socio-technical system, not the output of any one engineer. 

Cost-to-Serve Becomes the North Star Metric

Amazon adapted its physical logistics “cost-to-serve” model to software development, creating a simple but powerful equation:

Numerator:

  • Developer headcount
  • Infrastructure costs for build systems

Denominator:

  • Software Delivery Units (SDUs), tailored to each team (normalized deployments, code reviews reaching production, or commits, depending on architectural style)

This structure allows Amazon to evaluate whether thousands of developers across AWS and Amazon are becoming more efficient or more burdened over time without micromanaging individuals or dictating specific practices.
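As a rough sketch, the ratio described above could be computed per team as follows. The cost figures, SDU weights, and function names here are hypothetical illustrations, not Amazon's actual model, which has not been published:

```python
# Illustrative sketch of the cost-to-serve ratio: (headcount cost +
# build-infrastructure cost) divided by Software Delivery Units.
# All figures and weights below are invented for illustration.

def cost_to_serve(headcount_cost: float, infra_cost: float, sdus: float) -> float:
    """Cost per Software Delivery Unit for one team."""
    if sdus <= 0:
        raise ValueError("a team must ship at least one Software Delivery Unit")
    return (headcount_cost + infra_cost) / sdus

def sdus(deployments: int, reviews_to_prod: int, commits: int,
         w_deploy: float = 1.0, w_review: float = 0.5, w_commit: float = 0.1) -> float:
    """Normalize several delivery signals into SDUs; each team could choose
    weights suited to its own architectural style (weights here are arbitrary)."""
    return deployments * w_deploy + reviews_to_prod * w_review + commits * w_commit

units = sdus(deployments=120, reviews_to_prod=300, commits=2400)   # 510.0
print(cost_to_serve(headcount_cost=2_400_000, infra_cost=150_000, sdus=units))
```

Tracking this number quarter over quarter, rather than any individual's output, is what lets leadership see whether thousands of developers are becoming more efficient or more burdened.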

For leadership, cost-to-serve ties directionally to return on invested capital (ROIC), a controllable input to long-term enterprise value. That alignment gives developer productivity a board-level rationale that most engineering organizations still lack.

Balancing Speed and Safety with “Tension Metrics”

To prevent velocity-only optimization, Otto’s team pairs cost-to-serve with real-time balancing metrics that detect quality regressions. One of the most effective is human-action high-severity tickets per normalized deployment, a signal that survived as Amazon learned which metrics held up under scrutiny.
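A balancing metric like this is a simple ratio tracked alongside cost-to-serve. The sketch below shows the idea; the alert threshold and function names are assumptions for illustration, not Amazon's actual values:

```python
# Sketch of a speed/safety "tension metric": high-severity tickets requiring
# human action, per normalized deployment. Threshold is invented.

def tension_metric(human_action_hisev_tickets: int,
                   normalized_deployments: int) -> float:
    """Quality-regression signal paired with a velocity metric."""
    if normalized_deployments == 0:
        return float("inf")  # shipping nothing hides the signal entirely
    return human_action_hisev_tickets / normalized_deployments

def velocity_is_safe(tickets: int, deployments: int,
                     threshold: float = 0.02) -> bool:
    """Flag a regression when the ratio exceeds the (hypothetical) threshold."""
    return tension_metric(tickets, deployments) <= threshold

print(velocity_is_safe(tickets=3, deployments=200))   # 0.015 -> True
print(velocity_is_safe(tickets=9, deployments=200))   # 0.045 -> False
```

The point of pairing the two metrics is that a team cannot improve its cost-to-serve simply by shipping faster if the tension metric deteriorates at the same time.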

This mirrors what we see in ECI survey data. Organizations want to move quickly, but they’re deeply concerned about regressions in reliability, compliance, and maintainability as AI accelerates code output.

AI Demonstrably Improves Cost-to-Serve

Otto shared one of the clearest, data-backed validations we’ve heard. Teams that adopted AI broadly (moving from ~20% utilization to ~80%) shipped more and reduced cost-to-serve, returning meaningful headcount capacity to the business.

Importantly, AI’s influence is not limited to code generation:

  • Amazon Q Code Transformation eliminated entire categories of developer toil, like large-scale Java version upgrades, without developer intervention.
  • Automated code migrations improved as more teams used them, meaning the value compounds across the organization.
  • Platform teams must now challenge themselves; before launching any new “campaign” of required work, they must prove why a code transformation can’t be used first.

This illustrates a pattern emerging across the industry. AI doesn’t shrink engineering teams; it shrinks undifferentiated work, allowing developers to focus on innovation and reducing churn triggered by operational drudgery.

Leadership Gets a “Report Card” 

The cost-to-serve model became a governance tool. Teams no longer have to fight for CI/CD modernization or platform improvements; they can show leaders the measurable “headcount left on the table” when those improvements aren’t prioritized.

This gives teams the political capital to invest in long-term health, not just net-new features. And it exposes leadership decision patterns that amplify or suppress productivity across the org.

Measurement Without Punishment 

Because Amazon is a highly skeptical, data-driven culture, Otto’s team validated the psychological and performance impacts of measurement. Their research confirmed what most DevEx leaders intuitively know:

  • Developers do not want individual measurement.
  • Team-level measurement produces better outcomes and better sentiment.
  • Productivity increases when teams know they’re being measured but aren’t individually targeted, so the Hawthorne effect becomes a feature, not a risk.

Amazon therefore frames its approach not as developer productivity, but as development productivity.

Developer Experience as a Force Multiplier for ROIC

Amazon’s model ties directly into business levers that CFOs and CEOs care about. Their 15.9% reduction in cost-to-serve becomes a financial proof point for investing in DevEx, platform engineering, AI adoption, and SDLC modernization. It reframes developer experience teams from “tooling providers” to value creation engines.

Looking Ahead

Amazon aims to build a repeatable, data-backed framework for understanding how large-scale engineering organizations accelerate delivery without compromising quality. As agentic AI expands and code-generation becomes ambient, this structure will matter even more:

  • The definition of a Software Delivery Unit will continue to broaden as agents participate in SDLC activities.
  • Tension metrics will become increasingly important as AI-generated code floods pipelines.
  • Cost-to-serve may evolve into a new industry benchmark, particularly for enterprises seeking financial justification for platform engineering investments.
  • AI-native SDLC workflows, from automated refactors to transformation-as-a-service, will likely spread across Amazon’s internal tooling ecosystem.

For application development leaders, Otto’s framework offers a practical, CFO-aligned path to quantify DevEx improvements and AI-driven boosts in team throughput. And as organizations shift from pilots to operational AI, models like this will help determine which teams are truly ready to scale.

Author

  • Paul Nashawaty

    Paul Nashawaty, Practice Leader and Lead Principal Analyst, specializes in application modernization across build, release and operations. With a wealth of expertise in digital transformation initiatives spanning front-end and back-end systems, he also possesses comprehensive knowledge of the underlying infrastructure ecosystem crucial for supporting modernization endeavors. With over 25 years of experience, Paul has a proven track record in implementing effective go-to-market strategies, including the identification of new market channels, the growth and cultivation of partner ecosystems, and the successful execution of strategic plans resulting in positive business outcomes for his clients.
