Toward automated verification of unreviewed AI-generated code – Peter Lavigne

March 18, 2026


I’ve been wondering what it would take for me to use unreviewed AI-generated code in a production setting.

To that end, I ran an experiment that has changed my mindset from “I must always review AI-generated code” to “I must always verify AI-generated code.” By “review” I mean reading the code line by line. By “verify” I mean confirming the code is correct, whether through review, machine-enforceable constraints, or both.

Source

“Code generation was never the bottleneck” is a refrain we hear daily from people skeptical about the use of LLMs for software engineering.

So what are the other bottlenecks? One of them is verification: ensuring that the generated code is of sufficient quality. We’ve long had many techniques for doing this as humans, including code review, but much of it is automated with linters, compilers, test suites, and so on.

I think a really important question any software engineer should ask is: what signals would make me feel comfortable accepting some code?

I also think that’s not a one-and-done answer. There’s plenty of software I’d write internally to speed up a process that was previously done manually, and what I’m concerned with there, almost solely, is what comes out the other end of that process. But if I’m driving an autonomous car, I would like to think that the code that went into it, whether generated by a human or an LLM, was more rigorously verified.
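One way to make that question concrete is to write the answer down as a policy. The sketch below is only an illustration: the risk tiers, the signal names, and the mapping between them are all assumptions of mine, not something taken from Peter's experiment. It shows the idea that the set of signals required before accepting code can grow with the stakes.

```python
from enum import Enum


class Risk(Enum):
    """Hypothetical risk tiers, invented for this illustration."""
    INTERNAL_TOOL = 1
    PRODUCTION = 2
    SAFETY_CRITICAL = 3


# Assumed policy: which signals must pass before code is accepted.
# The signal names are placeholders, not the output of any real tool.
REQUIRED_SIGNALS = {
    Risk.INTERNAL_TOOL: {"output_check"},
    Risk.PRODUCTION: {"compiles", "lint", "tests"},
    Risk.SAFETY_CRITICAL: {"compiles", "lint", "tests", "human_review"},
}


def missing_signals(risk: Risk, passed: set[str]) -> set[str]:
    """Return the required signals that have not passed for this tier."""
    return REQUIRED_SIGNALS[risk] - passed


def is_acceptable(risk: Risk, passed: set[str]) -> bool:
    """Code is acceptable only when no required signal is missing."""
    return not missing_signals(risk, passed)
```

With a policy like this, an internal script is accepted once its output checks out, while the same set of automated signals is not enough for the safety-critical tier until a human review is recorded as well.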

What answer do you have?

Comments from Mastodon

  1. sam says:

    @conffab "Verify, not review" is the right mental model shift. Line-by-line review doesn't scale when AI generates thousands of lines. Machine-enforceable constraints do.

    That's the approach we took at repofortify.com — instead of reviewing code, we verify structural signals: CI exists, tests are present, secrets aren't hardcoded, dependencies are managed. Binary checks that scale regardless of code volume.