Testing at the Layer of Behavior

Part 4 of 9 in the series: Unit Testing — A Behavior-First Approach

Welcome to Part 4! You can read this post independently, but it's part of a weekly series. Each post builds on lessons learned from mistakes in the field.

Reading Time: ~6 minutes

A Quick Context for New Readers

If you're jumping in here, here's what you need to know: We started by discovering that "test every class" creates brittle, unmaintainable tests. We fixed brittleness with Builders and Assertion Chains. Now we're tackling the harder question: what should we test in the first place?

The answer is elegant: test behaviors, not implementation. This reduces test count while increasing confidence.


The Problem: Testing Implementation Details

Let's say we're building a Kanban board where multiple users can drag and drop tasks to reorder them. Here's what we might build:

import { FractionalIndexingUtility } from './fractional-indexing.utility';

class Task {
  private priorityIndex: string;

  public prioritize(place: {
    below: Task | 'TOP_OF_LIST';
    above: Task | 'END_OF_LIST';
  }): void {
    this.priorityIndex = FractionalIndexingUtility.generateKeyBetween(
      place.below === 'TOP_OF_LIST' ? undefined : place.below.priorityIndex,
      place.above === 'END_OF_LIST' ? undefined : place.above.priorityIndex
    );
  }

  static PRIORITY_COMPARATOR(taskA: Task, taskB: Task): number {
    if (taskA.priorityIndex === taskB.priorityIndex) return 0;
    return taskA.priorityIndex < taskB.priorityIndex ? -1 : 1;
  }
}

With a "test every class" mindset, we'd write separate test files for Task and FractionalIndexingUtility. This creates tests coupled to implementation that break on refactors.

The Solution: Test Behavior, Not Implementation

Instead, test the behavior — that tasks can be prioritized and sorted correctly:

describe('Feature: Prioritize Tasks', () => {
  it('Scenario: Task is placed between two other tasks', () => {
    const testDsl = TaskTestDsl();

    /**
     * Given three tasks have been created
     * and are ordered as A, B, C
     */
    const taskA = testDsl.generate
      .task()
      .withId('task-a')
      .prioritizedToTopOfList()
      .build();

    const taskB = testDsl.generate
      .task()
      .withId('task-b')
      .prioritizedAfter(taskA)
      .build();

    const taskC = testDsl.generate
      .task()
      .withId('task-c')
      .prioritizedAfter(taskB)
      .build();

    /**
     * When task C is moved between task A and task B
     */
    taskC.prioritize({ below: taskA, above: taskB });

    /**
     * Then the tasks are ordered as A, C, B
     */
    testDsl.assert
      .tasks([taskA, taskB, taskC])
      .sortWith(Task.PRIORITY_COMPARATOR, (c) =>
        c.toExpectedOrder([taskA, taskC, taskB])
      );
  });
});

Notice what we're testing: that tasks can be prioritized and sorted correctly. Notice what we're NOT testing: how fractional indexing works internally.

A Common Objection

"But if I don't test every class, how do I know my utility works?"

You test it through the behavior it supports. The utility isn't shipping to customers — the prioritization feature is. If prioritization tests pass, the utility works.

The confusion often comes from mixing essential complexity (the problem you're solving) with accidental complexity (how you solve it). Tests should be written against essential complexity.

Refactoring Without Breaking Tests

Following TDD's Red-Green-Refactor cycle, we can extract the fractional indexing logic to a utility class and our tests don't break. Because we're testing behavior, not implementation.

This is the power of testing at the layer of behavior: refactoring implementation details doesn't break tests.

When to Test What

Test the Behavior:

  • Public methods that express business behavior
  • State transitions and business rules
  • Interfaces and contracts
  • Outcomes and results

Don't Test the Implementation:

  • Private methods
  • Helper classes and utilities (test them through the behavior they support)
  • Static utility methods
  • Internal algorithms (test that they produce correct outcomes)

The Real-World Impact

We reduced test count by 60-70% while increasing confidence because we were testing behaviors, not implementation details.

What's Next

We've covered how to write resilient tests and what to test. In the next post, we'll explore how tests serve as living documentation — how well-written tests communicate intent, document behaviors, and help both developers and stakeholders understand what the system does and why.