The code nobody owns #101

We are shipping software faster than anyone can verify it, and the discipline that closes the gap is a deterministic check that no probabilistic system can run on itself.

Jun 22, 2026

Adoption stopped being the question

Spend a week inside any enterprise that builds software, in banking, in telco, in manufacturing or in energy, and you notice quickly that nobody argues about whether to use artificial intelligence anymore. That conversation is over. The figures now describe a baseline rather than a trend. In the 2025 Stack Overflow Developer Survey, 84% of developers said they already use AI tools or plan to, up from 76% a year earlier. The Italian market for artificial intelligence grew by roughly fifty percent in 2025 and reached about 1.8 billion euros, according to the Osservatorio Artificial Intelligence of the Politecnico di Milano, which also recorded that 84% of large Italian companies bought at least one generative AI licence during the year, a jump of 31%. “Agentic AI”, a phrase that until recently lived only in research papers, was named Italian word of the year, and job postings asking for AI skills grew by 93%. None of this reads like a pilot. It reads like infrastructure.

Italian companies are doing all of this from behind. Eurostat put Italian enterprise AI use at 8.2% in 2024 against a European average of 13.5%, and the 2025 estimates suggest the country roughly doubled that figure inside 12 months. The room is adopting faster than it is learning to govern what it adopts, and the velocity of the catch-up is exactly what makes the gap dangerous. When a market doubles its usage in a year, the controls do not double with it, because controls are unglamorous, invisible when they work, and easy to postpone whenever a demonstration looks impressive. From the field I see the same pattern repeat across three postures. There are the conservative adopters, using a Copilot-style assistant cautiously and within narrow scope, treating it as a productivity nudge with low perceived risk. There are the deliberate teams, dedicated groups running agentic coding with the whole lifecycle in mind. And there are the full-throttle deployments, where agents are handed to everyone, speed comes first, and quality and security debt accumulate out of sight. I have watched two customers with identical tools end up in different places: a large bank that paired its generative AI rollout with strict verification gates and kept control, and a manufacturer that skipped them and watched its review queues balloon. The difference was never the model. It was the controls.

Coding is one slice of a longer chain

The word “agentic” conceals a ladder, and most teams are standing somewhere in the middle of it without ever having decided to be. At the bottom sits autocomplete, where the tool proposes the next line and the human drives. One rung higher is the assistant inside the editor, where you ask, it drafts, and you decide. Higher still is the agent that plans and runs multi-step tasks while a person stays in the loop, and at the top sit multi-agent systems, where agents call other agents with very little human oversight for each line of code produced. The further you climb that ladder, the more value you extract and the less human judgment touches any individual decision. The principle that follows is simple and routinely ignored: controls have to scale with autonomy, and in most organisations they flatly do not.

There is a second confusion worth clearing away. AI is spoken about as a coding tool, when in practice it already reaches across the entire software lifecycle, from planning and feasibility and requirements, through system design, prototyping, implementation and testing, into deployment and release, operations and monitoring, maintenance, and finally decommissioning. Coding is the most visible use of frontier models, close to 36% of Claude usage by the Anthropic Economic Index, yet it remains one stage among many. This matters because attention and tooling have piled into the editor, where the hype concentrates, while value and risk are distributed across the whole chain. Optimise the editor in isolation and every other stage stays ungoverned, which is precisely where, as the evidence below shows, the cost quietly gathers.

The productivity that feels real and is not

Here the data turns against the story the industry tells about itself. Adoption keeps climbing while trust falls. The same Stack Overflow survey found that only 29% of developers now trust the accuracy of AI output, down from 40% a year earlier, and that 46% actively distrust it. Sixty-six percent report losing time fixing AI code that is almost right, the single most cited frustration in the profession. Almost right is the most expensive category of wrong, because it survives a quick glance and then fails later, in production, absorbed one manual correction at a time.

Then comes the productivity paradox, and it is the finding I would pin to the wall of every engineering leader who has signed an AI mandate on the strength of a feeling. In a randomised controlled trial published by METR in 2025, experienced open-source developers working on their own repositories took 19% longer to complete tasks when they were allowed to use AI tools. They had predicted a 24% speed-up, and even after finishing, having lived the slowdown, they still believed they had been 20% faster. The gain was felt. It was not delivered. An honest caveat belongs here, because a METR follow-up in early 2026 complicated the picture once many of the same developers refused to work without AI at all, which says less about measured speed than about how fast dependence sets in. The DORA research from Google Cloud had already pointed the same direction a year earlier, associating a 25% rise in AI adoption with lower delivery throughput and reduced stability, even as three quarters of teams reported feeling more productive. And when generation accelerates while understanding does not, the bottleneck does not disappear; it moves downstream. Faros AI, looking across more than ten thousand developers, recorded a 98% rise in pull requests opened and a 91% rise in total review time on the same teams. Writing code stopped being the constraint. Safely understanding it became one, and a senior engineer now spends several times longer reviewing a machine’s code than a colleague’s.

Quality and security degrade quietly

Speed without verification does not erase cost. It defers it and multiplies it. GitClear analysed 211 million changed lines of code between 2020 and 2024, drawn from Google, Microsoft, Meta and a range of enterprises, and found code churn climbing from 5.5 to 7.9 percent, meaning a growing share of new code is reverted or rewritten within two weeks of being committed. More code arrives, less craft survives, and once you count the rework the net gain drifts toward zero.

Security is probably the most critical issue. Veracode’s 2025 GenAI Code Security Report tested more than a hundred models and found that 45% of AI-generated code introduced a vulnerability from the OWASP Top 10, the standard list of critical web-application risks, producing 2.74 times more vulnerabilities than human-written code. The detail that should end the “it will fix itself” argument is that newer and larger models were no safer than smaller, older ones. The weakness is structural, built into how these systems generate code, not a defect the next release quietly retires. Layer on the supply-chain surface known as slopsquatting, where roughly 22% of the packages suggested by open-source models simply do not exist and 43% of those invented names recur on every run, a predictable and therefore exploitable target (USENIX Security 2025), and you have risk manufactured at machine speed.
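A deterministic answer to invented package names can be very small. The sketch below, in Python, flags any dependency that is not on a reviewed allowlist before anything is installed. It is an illustration under stated assumptions, not a complete supply-chain control: the allowlist contents, the function names and the simplified requirements parsing are all hypothetical, and a real setup would more likely draw on an internal registry or a reviewed lockfile.

```python
import re

# Hypothetical allowlist. In practice this would come from an internal
# registry or a lockfile that a person has already reviewed.
APPROVED_PACKAGES = {"requests", "numpy", "flask"}


def normalize(name: str) -> str:
    """Normalise a package name (case and -, _, . runs), as PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_requirement_names(requirements_text: str) -> list[str]:
    """Extract package names from requirements-style text.

    Simplified on purpose: comments are stripped, and lines that do not
    start with a package name (for example option lines such as "-r other.txt")
    are skipped.
    """
    names = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(normalize(match.group(0)))
    return names


def unapproved_packages(requirements_text: str,
                        approved: set[str] = APPROVED_PACKAGES) -> list[str]:
    """Return the sorted names that are not on the allowlist."""
    approved_normalized = {normalize(name) for name in approved}
    requested = set(parse_requirement_names(requirements_text))
    return sorted(requested - approved_normalized)
```

Run against a requirements file that an assistant has edited, the function returns the same list every time for the same input, and a non-empty list is a reason to stop the build and ask a person.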

The stakes climbed again in 2026, because the attacker now holds the same capability. Anthropic’s Project Glasswing, using a preview of its Claude Mythos model, autonomously found a denial-of-service bug in OpenBSD that had survived 27 years of expert review, alongside more than 20,000 potential flaws across a thousand open-source projects, of which over a thousand were confirmed at high or critical severity. The lesson is the opposite of the obvious one. These vulnerability-hunting agents are themselves probabilistic; their output fluctuates from run to run, and they complement rather than replace a stable baseline. That is the whole case for a deterministic verification layer, not an argument against it. Under NIS2 and DORA, shipping exploitable code is no longer merely technical debt for a bank or a telco. It is regulatory exposure, and it lands as money: the average cost of a data breach in 2025 reached 4.44 million dollars globally and 10.22 million in the United States, by IBM’s count. The same defect costs a fraction at review, more in testing, and the most in production. Speed that skips verification simply chooses when to pay, and adds interest.

We are starting to forget how to read code

The effect I find most serious is the slowest one, and it is now measured rather than suspected. A study by Microsoft Research and Carnegie Mellon, presented at CHI 2025 and built on 319 knowledge workers, found that people report investing less effort in the core thinking parts of their work when they use generative AI, 72% on analysis and 76% on synthesis. The same study found that the more a person trusts the AI, the less critical thinking they bring to bear, an inversion I have described in earlier issues as a quiet surrender of cognitive sovereignty. Higher confidence in oneself produces the opposite effect; higher confidence in the machine switches the human off.

This is where the verification problem and the themes I keep returning to in this newsletter meet on the same page. The design property that makes these systems feel fluent, agreeable and trustworthy, the artificial empathy I have written about at length, is the very thing that persuades us to hand over the judgment that would otherwise catch their mistakes. Developers ship code they never struggled over, each layer resting on code that nobody fully owns, and the structure erodes over time. The people who can spot what is wrong are exactly the experts whose instincts are wearing thin, which raises the awkward question of who reviews the reviewer once the reviewer has stopped practising. If juniors begin their careers from generated code, they absorb patterns without the reasoning beneath them, and the debugging intuition that flags an AI’s subtle failure never forms. Three to five years out, the seniors who catch those failures grow scarcer at exactly the moment AI output scales up. That is an organisational risk wearing the costume of a personal habit, and it is the kind of cognitive lock-in that stays silent until it becomes expensive to reverse.

Probabilistic power needs a deterministic check

So here is my position, stated as the line I now repeat to every person who asks why a separate check is needed when the model keeps improving. Probabilistic power needs a deterministic check. A probabilistic system returns a different output to the same prompt, is clever yet blind to your context, and makes mistakes that its own makers acknowledge openly. A deterministic check returns the same result to the same input, is repeatable, auditable and explainable, and remains necessary even when the AI rarely fails, because “rarely” is not a property you can demonstrate to an auditor. The control is not a brake on innovation. It is the mechanism that lets adoption be fast and safe at the same time, which is the only version of fast worth having in a regulated sector.
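To make "the same result to the same input" concrete, here is a minimal sketch of a rule-based check in Python. It parses source code with the standard-library ast module and reports calls to functions on a banned list. The rule set and the function name are hypothetical, chosen only for illustration; real static analysers cover far more than direct calls by name.

```python
import ast

# Hypothetical rule set for illustration only.
BANNED_CALLS = {"eval", "exec"}


def find_banned_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, function name) for each direct call to a banned name.

    The source is parsed, never executed, so the result depends only on the
    text passed in: the same input always yields the same findings.
    """
    tree = ast.parse(source)
    findings = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in BANNED_CALLS
        ):
            findings.append((node.lineno, node.func.id))
    return sorted(findings)
```

The point is less the rule than the property. Each finding carries a line number and a named rule, so it can be explained to a reviewer and reproduced for an auditor, which a second model's opinion cannot guarantee.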

The bar, in any case, is no longer only technical. It is legal, and it has arrived precisely in the sectors filling these rooms. NIS2 has imposed cybersecurity duties on essential and important entities since transposition began in October 2024, with fines reaching ten million euros or 2% of global turnover. DORA has applied to banks, insurers, investment firms and their technology providers since January 2025. And the EU AI Act’s high-risk obligations, after the Digital Omnibus agreement reached in May 2026, now apply from December 2027 for standalone systems and August 2028 for AI embedded in regulated products, carrying duties of transparency, human oversight, logging and record-keeping. The thread running through all three is identical: you must be able to show, on demand, that your software is controlled, auditable and resilient. A deterministic verification layer is how you evidence that claim rather than assert it.

What disciplined adoption looks like in practice is deliberately unglamorous, which is part of why it works. You guide the agent first, giving it the context and the quality and architecture expectations up front, so that it writes the right code the first time rather than something plausible you must later untangle. You verify every output against repeatable, auditable rules, instead of stacking a second probabilistic opinion on top of the first and hoping two guesses average out to a fact. You remediate close to where the issue appears, before it reaches production, where the same defect is most expensive to remove. No single gate catches everything, so the controls overlap like slices stacked against each other, each one covering the holes in the next, until together they form one complete verification layer. And you begin small, with one team, one pipeline, one gate, then scale only what the data tells you is actually working. You do not need to boil the ocean to gain control; you need to start measuring.
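The overlapping-slices idea can be sketched in a few lines of Python. Each gate is a plain function that returns a list of findings, and a change passes only when every gate comes back empty. The two gates here are deliberately naive and entirely hypothetical; they stand in for whatever linters, scanners and policy checks a team actually runs.

```python
from typing import Callable

# A gate takes source text and returns human-readable findings.
Gate = Callable[[str], list[str]]


def secret_gate(source: str) -> list[str]:
    """Naive illustration: flag lines that appear to assign a password."""
    return [
        f"line {number}: possible hard-coded credential"
        for number, line in enumerate(source.splitlines(), 1)
        if "password=" in line.replace(" ", "").lower()
    ]


def todo_gate(source: str) -> list[str]:
    """Naive illustration: flag unfinished work left in the change."""
    return [
        f"line {number}: unresolved TODO"
        for number, line in enumerate(source.splitlines(), 1)
        if "TODO" in line
    ]


def run_gates(source: str, gates: dict[str, Gate]) -> dict[str, list[str]]:
    """Run every gate and keep the findings grouped by gate name."""
    return {name: gate(source) for name, gate in gates.items()}


def passes(report: dict[str, list[str]]) -> bool:
    """A change passes only if no gate reported anything."""
    return all(not findings for findings in report.values())
```

Starting small then means starting with one entry in the gates dictionary on one pipeline, and adding the next gate only when the report shows what the first one misses.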

Measurement is the part most teams skip, so I will name the few signals that separate “we feel faster” from “we can prove it is safe”. Watch the quality-gate pass rate, whether generated code clears your standards before merge on the first attempt. Watch new security findings per release and how quickly they are closed. Watch code churn and duplication, the lines rewritten within two weeks and the ratio of cloning to refactoring. And watch review time per change, the honest test of whether verification is keeping pace with generation or falling behind it. What gets measured gets governed, and these four turn a comfortable feeling into an auditable fact.
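As a rough sketch of how those signals could be computed, the Python below summarises a list of merged changes. The record fields are assumptions about what a pipeline might export, not a standard schema, and the sketch covers only part of the list above: it leaves out time-to-close for security findings and the cloning-to-refactoring ratio.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class ChangeRecord:
    """One merged change. All field names are hypothetical."""
    passed_gate_first_try: bool
    new_security_findings: int
    lines_added: int
    lines_rewritten_within_14_days: int
    review_minutes: float


def summarize(changes: list[ChangeRecord]) -> dict[str, float]:
    """Compute gate pass rate, security findings, churn and review time."""
    if not changes:
        raise ValueError("no changes to summarise")
    total_added = sum(c.lines_added for c in changes)
    rewritten = sum(c.lines_rewritten_within_14_days for c in changes)
    return {
        # Share of changes that cleared the quality gate on the first attempt.
        "gate_pass_rate": sum(c.passed_gate_first_try for c in changes) / len(changes),
        # New security findings across the release.
        "security_findings": float(sum(c.new_security_findings for c in changes)),
        # Share of added lines rewritten within two weeks.
        "churn_rate": rewritten / total_added if total_added else 0.0,
        # Median review time per change, in minutes.
        "median_review_minutes": float(median([c.review_minutes for c in changes])),
    }
```

Tracked release over release, the direction of these numbers matters more than any single value: a falling pass rate or a rising median review time is the early sign that verification is losing ground to generation.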

I would leave any team that builds software with three questions, and I mean them as a diagnostic rather than a flourish.

Where is AI already writing code that you do not verify deterministically?
If your best reviewers left tomorrow, would your checks still catch bad code?
Could you show an auditor, today, that your AI-assisted code is controlled?

If the answers make you uneasy, the discipline is missing, and the speed you are enjoying right now is a loan drawn against a bill that has not yet arrived.

So the question is: who really owns the code when nobody has fully understood it?

(Service Announcement)

This newsletter (which now has over 6,000 subscribers and many more readers, as it’s also published online) is free and entirely independent.
It has never accepted sponsors or advertisements, and is made in my spare time.
If you like it, you can contribute by forwarding it to anyone who might be interested, or promoting it on social media.
Many readers, whom I sincerely thank, have become supporters by making a donation.

Thank you so much for your support!

Future Scouting & Innovation
