TDD with AI: Supervising Agents Through Tests

Part 8 of 9 in the series: Unit Testing — A Behavior-First Approach

This post explores how TDD applies to modern development with AI coding assistants. Each post stands on its own, but reading the full series will deepen your understanding.

Reading Time: ~5 minutes

Context for New Readers

We've been exploring a shift in how we think about testing — from counting coverage metrics to focusing on behavior. The key insight is that tests aren't just safety nets; they're contracts that specify what code should do. This becomes particularly powerful when you're working with AI agents.


The Problem: AI Without Guidance

When I first started using AI coding assistants, I'd ask them to "implement a feature" and they'd generate code, but I had no way to verify its correctness or to steer the AI toward a better design.

The problem? There was no specification. I was reviewing implementation without understanding the intended behavior.

The Solution: Tests as Contracts

When you ask AI to write tests first, then implement, you get:

  1. A clear specification — The test describes exactly what should happen
  2. Verifiable correctness — The test passes or fails
  3. Design pressure — The AI must design a clean API
  4. A reviewable artifact — You can review the test to ensure it captures the right behavior

TDD Workflow with AI: A Real Example

Me: "Write a test for blocking a task when it depends on another incomplete task."

AI generates:

describe('Feature: Block Task', () => {
  test('Scenario: Task is blocked when it depends on an incomplete task', () => {
    const testDsl = TaskTestDsl();

    const blockingTask = testDsl.generate
      .task()
      .withId('blocking-task')
      .withStatus('IN_PROGRESS')
      .build();

    const dependentTask = testDsl.generate
      .task()
      .withId('dependent-task')
      .withStatus('PENDING')
      .build();

    dependentTask.blockDueTo(blockingTask);

    testDsl.assert
      .task(dependentTask)
      .exists()
      .and()
      .isBlocked()
      .and()
      .isBlockedBy(blockingTask);
  });
});
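To ground the example, here is a minimal sketch of the DSL the test assumes. The surface names (TaskTestDsl, withId, withStatus, blockDueTo, isBlocked, isBlockedBy) come from the test above; everything inside them is an illustrative guess on my part, not the real implementation.

```typescript
type Status = 'PENDING' | 'IN_PROGRESS' | 'COMPLETED';

class Task {
  private blockers: Task[] = [];
  constructor(public id: string, public status: Status) {}

  // Record a dependency; an already-completed task cannot block.
  blockDueTo(other: Task): void {
    if (other.status !== 'COMPLETED') {
      this.blockers.push(other);
    }
  }

  isBlocked(): boolean {
    return this.blockers.length > 0;
  }

  isBlockedBy(other: Task): boolean {
    return this.blockers.includes(other);
  }
}

// Fluent builder behind testDsl.generate.task()
class TaskBuilder {
  private id = '';
  private status: Status = 'PENDING';
  withId(id: string): this { this.id = id; return this; }
  withStatus(status: Status): this { this.status = status; return this; }
  build(): Task { return new Task(this.id, this.status); }
}

// Chained assertions behind testDsl.assert.task(...)
class TaskAssertion {
  constructor(private subject: Task) {}
  exists(): this {
    if (!this.subject) throw new Error('expected task to exist');
    return this;
  }
  and(): this { return this; } // readability only
  isBlocked(): this {
    if (!this.subject.isBlocked()) throw new Error('expected task to be blocked');
    return this;
  }
  isBlockedBy(other: Task): this {
    if (!this.subject.isBlockedBy(other)) {
      throw new Error(`expected task to be blocked by ${other.id}`);
    }
    return this;
  }
}

function TaskTestDsl() {
  return {
    generate: { task: () => new TaskBuilder() },
    assert: { task: (t: Task) => new TaskAssertion(t) },
  };
}
```

The point of a DSL like this is that the test reads as a specification, while the builder and assertion plumbing stays out of sight.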

Me: "The test looks good, but add a scenario for when the blocking task is already completed."

AI adds the test, I approve, then: "Now implement the code to make both tests pass."

The entire interaction was guided by tests — the test specified what we wanted, I reviewed the test, and the AI implemented to satisfy it.

Supervising AI Through Test Review

  1. Review the test first — Does it capture the right behavior?
  2. Verify test quality — Does it follow our guidelines?
  3. Use tests to correct AI — Point to test failures, not code bugs
  4. Iterate through tests — Add/modify tests to guide AI

When the AI makes mistakes, I don't explain the problem in implementation terms. I point to the test:

"The test expects isBlocked() to return true, but your implementation returns false. Fix the implementation to make the test pass."

The test failure guides the correction. The test is the specification — it's unambiguous.

The TDD Cycle with AI

  1. Red: Ask AI to write a failing test
  2. Review: Review the test to ensure it captures the right behavior
  3. Green: Ask AI to implement code to make the test pass
  4. Verify: Run the test to confirm correctness
  5. Refactor: If needed, ask AI to refactor while keeping tests passing
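Run end to end, the cycle looks something like this in miniature. The scenario is the completed-blocker case discussed earlier; the Task shape here is a small stand-in of my own, not the real model.

```typescript
// Red: start from a failing assertion. Scenario: a task that depends on
// an already-completed task should NOT be blocked.
type Status = 'PENDING' | 'IN_PROGRESS' | 'COMPLETED';

class Task {
  private blockers: Task[] = [];
  constructor(public id: string, public status: Status) {}

  // Green: the change that makes the assertion pass — a completed
  // task is never recorded as a blocker.
  blockDueTo(other: Task): void {
    if (other.status !== 'COMPLETED') {
      this.blockers.push(other);
    }
  }

  isBlocked(): boolean {
    return this.blockers.length > 0;
  }
}

// The test drives the design: before this scenario existed, blockDueTo
// had no reason to look at the blocker's status at all.
const blocking = new Task('blocking-task', 'COMPLETED');
const dependent = new Task('dependent-task', 'PENDING');
dependent.blockDueTo(blocking);
console.assert(dependent.isBlocked() === false); // passes once the status guard exists
```

The refactor step then reshapes the internals however it likes, as long as this assertion and the earlier ones stay green.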

The Future: AI Agents and Human Leads

In a world where AI agents write most of the code, TDD with AI becomes the standard workflow:

  • Leads write or review tests — Ensuring behaviors are correctly specified
  • AI implements to satisfy tests — Following the contract
  • Tests verify correctness — Providing immediate feedback
  • Leads review through tests — Understanding intent without reading implementation

This workflow scales because tests are faster to review than implementation, they document intent clearly, and they enable parallel work.

What's Next

In the final post, we'll reflect on the entire journey — from coverage metrics to behavior-first testing. We'll summarize the key lessons and the mindset shifts that transformed how we think about testing and development.