What should an AI agent verify after coding?

At minimum, it should run the project build or test command and check the user-facing behavior listed in the spec.

Is a passing build enough?

No. A passing build proves the project compiles. You still need to inspect the actual behavior, especially for UI and content changes.

How do I make AI-generated code safer to merge?

Keep the task small, write verification steps before coding, run the checks, and review the diff against the spec.


verification · AI code review · Astro

My AI agent kept breaking old code until I made it prove the change

A practical story about adding verification to AI-assisted coding so every agent run ends with evidence, not hope.

May 13, 2026 · 6 min read

The problem

The agent finished the task and said it was done.

I wanted to believe it.

The page loaded. The new section looked fine. Then I clicked a different page and found the break. A shared component had changed. A route still built, but the layout was off. The agent had solved the visible task and quietly damaged something nearby.

That is the uncomfortable part of AI-assisted coding: the agent can sound confident before the project is safe.

What I used to do

I used to review the final answer more than the final state.

If the agent said “I updated the component and everything should work,” I treated that like progress. But “should work” is not evidence. It is a vibe with better grammar.

The fix was to stop asking the agent if it was done and start asking it to prove what changed.

The new rule

Every spec gets a verification section before coding starts.

For a static Astro project, the minimum proof is simple:

npm run build

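That catches broken imports, invalid content, and bad routes. It catches TypeScript errors only if the build script also runs a type check such as astro check, because astro build on its own does not type-check. It does not catch everything, but it catches enough that skipping it is silly.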
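To make "run the build and keep the result" concrete, here is a minimal sketch of an evidence collector. Everything in it is an assumption for illustration: the `CommandRunner` type, the `collectEvidence` function, and the fake runner are hypothetical names, not part of Astro or npm. In a real project the runner would wrap an actual shell call.

```typescript
// Hypothetical shapes for illustration; none of these names come from Astro or npm.
type CommandResult = { exitCode: number; output: string };
type CommandRunner = (command: string) => CommandResult;
type Evidence = { command: string; passed: boolean; output: string };

// Run each verification command in order and record what actually happened.
// Stops at the first failure so later output is not mistaken for a clean run.
function collectEvidence(commands: string[], run: CommandRunner): Evidence[] {
  const evidence: Evidence[] = [];
  for (const command of commands) {
    const result = run(command);
    const passed = result.exitCode === 0;
    evidence.push({ command, passed, output: result.output });
    if (!passed) break;
  }
  return evidence;
}

// A fake runner standing in for a real shell call, so the sketch runs anywhere.
const fakeRunner: CommandRunner = (command) =>
  command === "npm run build"
    ? { exitCode: 0, output: "build complete" }
    : { exitCode: 1, output: `unknown command: ${command}` };

const evidence = collectEvidence(["npm run build"], fakeRunner);
console.log(evidence);
```

Injecting the runner keeps the sketch testable without touching a real shell. The point is the record: a command, an exit status, and the output, instead of a sentence that says it should work.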

For UI work, I add behavior checks:

- Does the page render at the route I changed?
- Does the link go to the right place?
- Does the copy button still copy the right text?
- Does mobile layout still make sense?
- Did the agent touch files outside the spec?
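The last check is the easiest to automate. Below is a rough sketch, assuming the spec lists allowed paths as plain prefixes (a hypothetical convention, not something Astro defines) and that the list of changed files comes from something like `git diff --name-only`. The file paths are made up for the example.

```typescript
// Return the changed files that fall outside the paths the spec allows.
// `allowedPrefixes` is a hypothetical spec field: plain path prefixes, no globs.
function filesOutsideSpec(changedFiles: string[], allowedPrefixes: string[]): string[] {
  return changedFiles.filter(
    (file) => !allowedPrefixes.some((prefix) => file.startsWith(prefix)),
  );
}

// Example paths are invented for illustration.
const changed = ["src/pages/pricing.astro", "src/components/Header.astro"];
const allowed = ["src/pages/pricing.astro", "src/components/pricing/"];

console.log(filesOutsideSpec(changed, allowed));
// → ["src/components/Header.astro"]
```

Prefix matching is deliberately crude: a prefix like `src/pages/about` would also allow `src/pages/about-us.astro`. For a small spec that is usually acceptable, and ending directory prefixes with a slash avoids most surprises.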

Now the agent has to end with evidence.

The happy path

The workflow became:

1. Write a small spec.
2. Add verification steps to the spec.
3. Let the agent implement one unit.
4. Run the checks.
5. Review the diff against the spec.
6. Commit only when the evidence matches the claim.
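"The evidence matches the claim" can be checked mechanically, at least in part. Here is one possible sketch that compares what the agent reported with what was actually observed. The `Claim` and `Observed` shapes and the `mismatches` function are hypothetical, and the human diff review still has to happen on top of this.

```typescript
// What the agent says it did (hypothetical shape).
type Claim = { filesChanged: string[]; commandsRun: string[] };
// What the repo and the command log actually show (hypothetical shape).
type Observed = { filesChanged: string[]; passedCommands: string[] };

// List every place where the claim and the observed state disagree.
function mismatches(claim: Claim, observed: Observed): string[] {
  const problems: string[] = [];
  for (const file of observed.filesChanged) {
    if (!claim.filesChanged.includes(file)) problems.push(`changed but not reported: ${file}`);
  }
  for (const file of claim.filesChanged) {
    if (!observed.filesChanged.includes(file)) problems.push(`reported but not changed: ${file}`);
  }
  for (const command of claim.commandsRun) {
    if (!observed.passedCommands.includes(command)) {
      problems.push(`no passing run recorded for: ${command}`);
    }
  }
  return problems;
}

// Example data is invented for illustration.
const found = mismatches(
  { filesChanged: ["src/pages/pricing.astro"], commandsRun: ["npm run build"] },
  { filesChanged: ["src/pages/pricing.astro"], passedCommands: ["npm run build"] },
);
console.log(found.length === 0 ? "claim matches evidence" : found);
```

An empty list is not permission to commit. It only means the agent's summary and the project state agree, which is the precondition for a useful review.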

This changed the tone of the work.

Before, I was reading AI output and trying to decide if I trusted it. After, I was reading build output and inspecting a smaller diff. That is much easier.

The prompt I use

Read the project context and the current unit spec.
Implement only the files needed for this unit.
When done, run every verification step from the spec.
In your final note, include:
- files changed
- commands run
- any checks you could not complete
- any files you touched outside the spec, with a reason

The last line matters. It makes scope creep visible.
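If the agent can return its final note as structured data, the four items in the prompt map onto a small type, and the "with a reason" requirement becomes checkable. This is only a sketch under that assumption: `FinalNote`, `OutOfScopeFile`, and `noteProblems` are hypothetical names, and the example note is invented.

```typescript
// Hypothetical structure mirroring the four items the prompt asks for.
type OutOfScopeFile = { path: string; reason: string };
type FinalNote = {
  filesChanged: string[];
  commandsRun: string[];
  checksNotCompleted: string[];
  filesOutsideSpec: OutOfScopeFile[];
};

// Flag a final note that skips the parts that make scope creep visible.
function noteProblems(note: FinalNote): string[] {
  const problems: string[] = [];
  if (note.filesChanged.length === 0) problems.push("no files listed as changed");
  if (note.commandsRun.length === 0) problems.push("no commands listed as run");
  for (const entry of note.filesOutsideSpec) {
    if (entry.reason.trim() === "") problems.push(`no reason given for ${entry.path}`);
  }
  return problems;
}

// Example note is invented for illustration.
const note: FinalNote = {
  filesChanged: ["src/pages/pricing.astro", "src/components/Header.astro"],
  commandsRun: ["npm run build"],
  checksNotCompleted: ["mobile layout"],
  filesOutsideSpec: [{ path: "src/components/Header.astro", reason: "" }],
};

console.log(noteProblems(note));
// → ["no reason given for src/components/Header.astro"]
```

An empty `checksNotCompleted` list is allowed on purpose. The goal is an honest note, not a note that always claims full coverage.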

What verification does not solve

Verification does not make AI code automatically good.

A build can pass with bad UX. Tests can pass while the product feels wrong. A screenshot can look fine while the copy is confusing.

But verification changes the default from “the agent says it is done” to “the project gave us evidence.”

That is a better place to start review.

Questions people ask

Should the agent run commands itself?

Yes, when the command is safe and part of the repo workflow. For this project, npm run build is expected. For commands that deploy, publish, delete, spend money, or message people, stop and ask first.
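One way to encode that rule is an allowlist, so anything not explicitly marked safe falls through to asking first. The sketch below assumes exact-match commands and uses hypothetical names (`SAFE_COMMANDS`, `decide`); a real agent harness may expose its own permission settings instead.

```typescript
type Decision = "run" | "ask-first";

// Commands the repo workflow expects the agent to run on its own.
// Hypothetical list; only `npm run build` comes from this project's workflow.
const SAFE_COMMANDS: string[] = ["npm run build"];

// Default to asking: deploy, publish, delete, spend, and message commands
// never need to be enumerated because nothing unlisted is allowed through.
function decide(command: string): Decision {
  return SAFE_COMMANDS.includes(command.trim()) ? "run" : "ask-first";
}

console.log(decide("npm run build")); // → "run"
console.log(decide("npm publish"));   // → "ask-first"
```

An allowlist fails closed. A blocklist of dangerous commands would have to anticipate every risky command in advance, which is exactly the kind of list that goes stale.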

What if the build fails?

Good. You found the problem before the user did. Give the agent the error and keep the fix inside the same spec boundary.

Should I commit after every passing build?

Commit after a passing build and a human diff review. The build is evidence, not permission.

The lesson

A confident agent is nice.

A verified change is better.
