AI Can Write More Code. People Still Make It Shippable.

As widely reported, AI development tools excel at producing large volumes of code. But the data points to a practical limit: increases in code activity do not translate one-for-one into increases in shipped software. The chart below traces a single AI coding assistant across the development pipeline, and the pattern is consistent: the productivity gain is enormous where the work is mechanical and shrinks at every stage that depends on human judgment.

Source: Demirer, Musolff & Yang (2026), Table 5, a forthcoming MIT Sloan working paper. Figures reflect the sync agent (“AI coding assistant”), weeks 21–30.

Leadership at federal agencies and in GovCon might reasonably ask: “If AI is producing the code, why does it matter that the boost to code written is so much larger than the boost to software shipped? We’re still releasing more.”

That’s fair. An increase in shipped software is good — if your releases rise 20%, you’re delivering more to users. But treating the rest of the pipeline as someone else’s problem works only for a while. Eventually, the strain shows up where the judgment lives.

The “Bottleneck” Framing — and Why It Misreads the Problem

Look at the data and one reading jumps out: AI writes the code in seconds, but people still have to review it, test it, integrate it, and secure it — so people must be the bottleneck. The easy conclusion follows quickly. As software engineer Subhash Jha describes the impulse: “The fashionable solution to the human-as-bottleneck problem is to remove the human. Let the agent write the code. Let the agent review the code. Let the agent ship the code.”

The bottleneck framing does get one thing right. AI can generate code far faster than any team can responsibly ship it, because review, QA, integration, and security audits still set the pace at which work reaches production. Left unmanaged, the imbalance is corrosive: a flood of AI-generated code wears reviewers down, and overwhelmed reviewers start rubber-stamping approvals to clear the queue, letting bugs and vulnerabilities slip into production.

But it doesn’t follow that the answer is to automate the process and let agents review and ship the code. The downstream stages aren’t clogging a pipe; they’re where human judgment converts raw code into software that’s correct, secure, and accountable. People bring what the model can’t: deep context on the existing codebase, the broader tech ecosystem, the mission, and the consequences of getting it wrong. Automating those stages away doesn’t remove the bottleneck; it removes the part of the process that makes the output trustworthy and valuable to users.

In Government, Removing the Human Isn’t an Option

In federal environments this isn’t only bad engineering — it’s disqualifying. Automating the reviewer out of the loop creates what Jha calls an accountability gap: when an agent writes, reviews, and ships its own work, no one can answer for what reached production. That’s poor practice in private enterprise. In government it’s unallowable. Authorization boundaries, security controls, and statutory accountability all assume a responsible human at the decision points — not a closed loop of agents signing off on each other.

So the question for agency and GovCon leadership isn’t “how do we get the human out of the way?” It’s the opposite: how do we keep human judgment exactly where it matters most, while still capturing the speed AI offers everywhere else?

Design the Pipeline Around Human Judgment, Not Against It

That reframing changes the goal. The objective isn’t an unobstructed pipeline — it’s the most efficient pipeline that preserves human judgment at the points where it carries the most weight: review, QA, integration, security. Everywhere else, let AI move fast. At those gates, make the human’s work faster and better-supported rather than removing it.

In practice that means starting small, defining clear rules, and building a record of every decision and change as AI-generated work moves through the pipeline, so reviewers spend their attention where it counts and every approval has a name and a rationale behind it. That’s exactly what we designed TCG’s Glassbox AI to do. None of this means moving slowly. It means moving fast without breaking things, and without breaking the chain of accountability.
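To make the idea of "a name and a rationale behind every approval" concrete, here is a minimal Python sketch of what such a record might look like. It is an illustration under stated assumptions, not a description of how Glassbox AI or any particular product works: the names Approval, record_approval, is_shippable, and the example identifiers are all hypothetical, and the four gates are simply the ones named in this article.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Gates named in the article; treating them as a fixed list is an assumption.
GATES = ("review", "qa", "integration", "security")


class ApprovalError(ValueError):
    """Raised when a sign-off does not meet the accountability rules."""


@dataclass(frozen=True)
class Approval:
    """One hypothetical audit-log entry: who approved what, and why."""
    change_id: str
    gate: str
    author: str      # who or what produced the change, e.g. "ai-agent"
    reviewer: str    # the named human who signed off
    rationale: str   # the reviewer's stated reason for approving
    approved_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def record_approval(log, change_id, gate, author, reviewer, rationale,
                    human_reviewers):
    """Append an approval to the log, rejecting closed-loop sign-offs."""
    if gate not in GATES:
        raise ApprovalError(f"unknown gate: {gate}")
    if reviewer not in human_reviewers:
        raise ApprovalError("reviewer must be a named, accountable human")
    if reviewer == author:
        raise ApprovalError("author cannot approve their own change")
    if not rationale.strip():
        raise ApprovalError("an approval needs a rationale")
    approval = Approval(change_id, gate, author, reviewer, rationale.strip())
    log.append(approval)
    return approval


def is_shippable(log, change_id):
    """A change ships only after a recorded approval at every gate."""
    passed = {a.gate for a in log if a.change_id == change_id}
    return passed.issuperset(GATES)
```

In this sketch an agent can author a change, but a sign-off is rejected unless it comes from someone on the list of accountable humans and carries a written rationale. The list of human reviewers is passed in explicitly because who counts as accountable is a policy decision, not something the code should guess.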

For more on how TCG approaches this, visit tcg.com/glassbox. To continue the conversation, find us on LinkedIn @TCG.