AI Agents, meet Test Driven Development

April 22, 2025

Figure: The Test Driven AI Development Process, a circular flow of five stages (Planning, Experiment, Evaluate, Deploy, Observe). It starts with Planning to define the problem, then loops through experimentation, evaluation, deployment, and observation.

Historically, with Test Driven Development (TDD), the thing you’re testing is predictable: given a known set of inputs, you expect the same outputs.

With AI agents, it’s not that simple. Outcomes vary, so tests need flexibility. Instead of checking for exact answers, you’re evaluating behaviors, reasoning, and decision-making (e.g., tool selection). This requires nuanced success criteria such as scores, ratings, and user satisfaction, not just pass/fail tests.
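To make graded criteria concrete, here is a minimal Python sketch that scores a single agent run on a 0 to 1 scale instead of passing or failing it. The `AgentResult` and `EvalCase` shapes, the `score_run` function, and the default 50/50 split between tool choice and answer content are hypothetical choices for illustration, not part of any specific eval framework.

```python
from dataclasses import dataclass


@dataclass
class AgentResult:
    """Hypothetical shape of one agent run: the tool it picked and its final answer."""
    tool: str
    answer: str


@dataclass
class EvalCase:
    """One hypothetical test case: the prompt, the expected tool, and facts the answer should mention."""
    prompt: str
    expected_tool: str
    required_facts: list[str]


def score_run(case: EvalCase, result: AgentResult, tool_weight: float = 0.5) -> float:
    """Return a score in [0, 1] instead of a binary pass/fail.

    `tool_weight` of the credit goes to choosing the expected tool; the rest is
    the fraction of required facts found in the answer (case-insensitive).
    """
    tool_score = 1.0 if result.tool == case.expected_tool else 0.0
    if case.required_facts:
        answer = result.answer.lower()
        hits = sum(1 for fact in case.required_facts if fact.lower() in answer)
        fact_score = hits / len(case.required_facts)
    else:
        # No content requirements: only the tool choice matters.
        fact_score = 1.0
    return tool_weight * tool_score + (1 - tool_weight) * fact_score
```

With this scheme, an agent that calls the wrong tool but still mentions half the required facts earns partial credit (0.25 with the default weights) rather than a flat failure, which lets you track gradual improvement across iterations.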

And internal evals aren’t enough. You need to make continuous adjustments based on real-world feedback.
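One way to close that loop, sketched here under assumptions: collect thumbs-up/thumbs-down ratings from production, compare live satisfaction between agent versions, and turn thumbs-down prompts into new eval cases. The `(agent_version, prompt, thumbs_up)` record format, the function names, and the 0.05 tolerance are all hypothetical.

```python
from collections import defaultdict

# Each record is a hypothetical (agent_version, prompt, thumbs_up) tuple
# collected from real users after deployment.
Feedback = list[tuple[str, str, bool]]


def satisfaction_by_version(feedback: Feedback) -> dict[str, float]:
    """Fraction of thumbs-up ratings for each agent version."""
    ups: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for version, _prompt, thumbs_up in feedback:
        totals[version] += 1
        ups[version] += int(thumbs_up)
    return {version: ups[version] / totals[version] for version in totals}


def needs_attention(rates: dict[str, float], baseline: str, candidate: str,
                    tolerance: float = 0.05) -> bool:
    """True if the candidate's live satisfaction trails the baseline by more than `tolerance`."""
    return rates[candidate] < rates[baseline] - tolerance


def failing_prompts(feedback: Feedback, version: str) -> list[str]:
    """Prompts that got a thumbs-down for `version`, as candidates for new eval cases."""
    return sorted({prompt for v, prompt, thumbs_up in feedback
                   if v == version and not thumbs_up})
```

Feeding `failing_prompts` back into the offline eval set is what keeps internal evals aligned with how the agent is actually used.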

So how can you build a process around this?


When software is deterministic, testing the same functionality with the same inputs will always give the same outputs.

But LLM output is nondeterministic. So what does testing look like?
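One common pattern, shown here as a minimal sketch rather than the article's prescribed method, is to run the same prompt many times and assert a minimum pass rate instead of a single exact output. `make_flaky_agent` is a hypothetical stand-in for a real LLM call, and the 200 trials and 0.8 threshold are illustrative assumptions.

```python
import random
from typing import Callable


def pass_rate(run_agent: Callable[[str], str], prompt: str,
              check: Callable[[str], bool], trials: int = 20) -> float:
    """Run the same prompt `trials` times and return the fraction of outputs that pass `check`."""
    passes = sum(1 for _ in range(trials) if check(run_agent(prompt)))
    return passes / trials


def make_flaky_agent(seed: int, accuracy: float = 0.9) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM agent that answers correctly with probability `accuracy`."""
    rng = random.Random(seed)

    def run_agent(prompt: str) -> str:
        # Ignores the prompt; simulates nondeterministic answers.
        return "Paris" if rng.random() < accuracy else "Lyon"

    return run_agent


def test_capital_question() -> None:
    """Pytest-style test: require a minimum pass rate instead of one exact output."""
    agent = make_flaky_agent(seed=42)
    rate = pass_rate(agent, "What is the capital of France?",
                     check=lambda out: "paris" in out.lower(), trials=200)
    assert rate >= 0.8, f"pass rate {rate:.2f} is below the 0.8 threshold"
```

Both the threshold and the number of trials are judgment calls: more trials give a steadier estimate of the pass rate at the cost of more model calls.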