Why Paul is a Socialist*

2024-02-21

* when writing unit tests

Info

Blog adaptation of my notes on unit testing.

Testers of the system, unit [sic]! You have nothing to lose but your (mock dependency) chains!

…in which we discover unit test philosophy and enlightenment. And may end up on a list for googling the communist manifesto.

Terminology

Like most software development terminology, [unit testing is] very ill-defined, and I see confusion can often occur when people think that it’s more tightly defined than it actually is.

— Martin Fowler

We need a common language for conversation. These are my own definitions.

Term	Definition
unit	system that loses its identity and function when divided
interface	defined set of messages a unit accepts and emits
inputs	messages sent to the unit e.g. intake queues, runtime stack
outputs	messages sent from the unit e.g. output queues, runtime stack, includes exceptions
unit client	system that sends messages to or receives messages from the unit
side effects	any unit-affected change that is observable outside its defined interface, e.g. I/O
dependency	an external system the unit relies on to do work e.g. other units via message, the system clock
system under test (SUT)	set of units tested together; the set can be singular
unit test	program that executes and observes a SUT
solitary unit test	a unit test with isolated dependencies and inputs
sociable unit test	a unit test that relies on other units to fulfill some behavior
mock	a stand-in object for a dependency or input; for purposes of this page, mocks are automatically generated via DSL or framework and provide an API to make assertions on the data they observe
integration test	test that observes side effects
functional test	test with user-visible effects and assertions
pure function	function or sub-routine with no side effects; the outputs are determined entirely by the inputs

Philosophy

The fundamental question

Should a unit test’s scope be identical to the named unit under test, or is the unit better understood as “system under test” - unit plus dependencies?

Two schools prefer Solitary and Sociable unit tests, respectively.

Martin Fowler calls these the mockist and classic styles.

See section Why Paul is a Socialist.

The Solitary Tester

The solitary tester asserts the unit under test should be observed in isolation with explicitly controlled inputs, outputs, and dependencies.
They believe it is possible to consistently layer abstractions to facilitate this style of testing.
Ergo, tests should prefer mocking out all dependencies and inputs.

Advantages

Fast unit tests
Repeatable unit tests
Encourages system design broken into discrete parts

Disadvantages

Mocking libraries are not zero cost. They add cognitive complexity and execution overhead.
- Writing tests can become an exercise in fighting the type checker and the mocking library to approximate real world dependencies.
Mocks are not always faster than production code.
- In many cases, the overhead of a mock library is equivalent to or even dwarfs the cost of real dependencies.
- This includes some “I/O” cases. The following are fast enough and reliable enough that a fake is often not warranted.
  - local pipes and sockets
  - loopback
  - in-process DBs like LevelDB, RocksDB, and SQLite
Mocking libraries + a dependency injection framework (DI) encourage hiding production logic in singletons that can only practically be used in the context of the DI framework.
- The ease of DI encourages deeply nested, sprawling dependency graphs.
- A powerful mocking framework makes it easy to construct these graphs for tests, even in the absence of explicit interfaces.
- Combined, they encourage a production coding style with wide, deeply nested dependency graphs. This makes units difficult to use outside a DI or mocking framework context.
The interfaces and data shapes of the production code must be encoded N+1 times: once in the production code and once at every point a test depends on those interfaces and data shapes.
- If mocks are fully specified - types, and values - this pattern is brittle in the face of refactoring.
- If mocks are not fully specified - e.g. liberal use of any() and lenient() - the tests tend toward false signal.
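A minimal sketch of the false-signal case, in Python. `PriceService` and `order_total` are hypothetical names invented for illustration, and `unittest.mock.ANY` stands in for the `any()` style of lenient matching mentioned above. The unit has its arguments swapped; the lenient mock never notices, while the real dependency fails on the first call.

```python
from unittest.mock import ANY, Mock


class PriceService:
    """Hypothetical production dependency."""

    def price_in_cents(self, sku: str, quantity: int) -> int:
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        return 250 * quantity


def order_total(prices, sku: str, quantity: int) -> int:
    """Hypothetical unit under test. Bug: the arguments are swapped."""
    return prices.price_in_cents(quantity, sku)


# Leniently specified mock: any arguments are accepted, so the bug is invisible.
lenient = Mock()
lenient.price_in_cents.return_value = 500
assert order_total(lenient, "apple", 2) == 500
lenient.price_in_cents.assert_called_once_with(ANY, ANY)  # passes

# The real dependency exposes the defect: comparing "apple" <= 0 is a TypeError.
try:
    order_total(PriceService(), "apple", 2)
    raised = False
except TypeError:
    raised = True
```

Tightening the assertion to `assert_called_once_with("apple", 2)` would catch the bug, but only by encoding the dependency's signature a second time in the test, which is the brittleness described in the other bullet.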

The Sociable Tester

The sociable tester asserts it is better to include production code dependencies in a test rather than depend on an anemic approximation of the production code.
They believe nothing is a substitute for production dependencies.
Ergo, tests should default to executing production dependencies as much as possible, with mock implementations used as-needed to address execution time and side effect concerns.
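As a sketch of this default, here is a sociable test in Python that runs against a real in-process database rather than a mocked client. It assumes production also uses SQLite; `UserStore` and its schema are hypothetical. The second test checks a constraint that a mocked database client would have to be taught to imitate.

```python
import sqlite3


class UserStore:
    """Hypothetical unit that depends on a SQL database."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users "
            "(id INTEGER PRIMARY KEY, name TEXT UNIQUE)"
        )

    def add(self, name: str) -> int:
        cur = self.conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        self.conn.commit()
        return cur.lastrowid

    def names(self) -> list[str]:
        rows = self.conn.execute("SELECT name FROM users ORDER BY name")
        return [row[0] for row in rows]


def test_add_and_list() -> None:
    # ":memory:" gives each test a fresh, private database.
    store = UserStore(sqlite3.connect(":memory:"))
    store.add("rosa")
    store.add("karl")
    assert store.names() == ["karl", "rosa"]


def test_duplicate_names_are_rejected() -> None:
    store = UserStore(sqlite3.connect(":memory:"))
    store.add("rosa")
    try:
        store.add("rosa")
    except sqlite3.IntegrityError:
        return
    raise AssertionError("expected the UNIQUE constraint to fire")


test_add_and_list()
test_duplicate_names_are_rejected()
```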

Advantages

Tests exercise the unit in context of “real” production code because in practice, units are not units.
Improves practical test coverage.
Encourages design without layers of needless abstractions, making the system easier to understand.

Disadvantages

Execution time - if the production code is slow for some value of slow, cycle time suffers.
Side effects - if the production code is not deterministic or relies on external systems, tests are fragile and unrepeatable.
Units accrete dependence on implementation details.

Why Paul is a Socialist

Units are not units

The fundamental difference between the two styles is one of unit scope.

The solitary tester asserts:

Given unit A with a production dependency on B_1 (A -> B_1) …
Dependency B_2 - a mock - is sufficiently representative of B_1 behavior that …
Tests for system A -> B_2 yield inferences about system A -> B_1

The sociable tester asserts:

Such inferences are highly problematic in practice. Tests for A -> B_2 are merely testing A -> B_2 at best, and performatively testing the mocks at worst.
Languages commonly used in industry do not provide the facilities to adequately control side effects. They creep in.
As a result, tests that heavily rely on mocks are only as good as the mocks are at masquerading as the real thing.

Mocks create an illusion of functional purity that does not exist in the real world.
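The gap between A -> B_2 and A -> B_1 can be sketched in Python. All names here (`Inventory`, `in_stock`) are hypothetical. The mock encodes an assumption about B_1, that unknown keys yield `None`, which the real B_1 does not honor, so the green test says nothing about production behavior.

```python
from unittest.mock import Mock


class Inventory:
    """Hypothetical production dependency B_1."""

    def __init__(self, stock: dict[str, int]) -> None:
        self._stock = stock

    def count(self, sku: str) -> int:
        return self._stock[sku]  # raises KeyError for an unknown SKU


def in_stock(inventory, sku: str) -> bool:
    """Hypothetical unit A, written against the mock's behavior."""
    count = inventory.count(sku)
    return count is not None and count > 0


# A -> B_2: the mock answers None for unknown SKUs, so the test is green.
b2 = Mock()
b2.count.return_value = None
assert in_stock(b2, "unknown") is False

# A -> B_1: the real dependency raises instead, which A never handles.
try:
    in_stock(Inventory({"apple": 3}), "unknown")
    outcome = "returned"
except KeyError:
    outcome = "raised KeyError"
```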

But, but …

what if we are using a language or design that adequately controls side effects?
what if B_1 is a pure function?
or just pure data?
isn’t there value in using a mock B_2 so tests for A are completely isolated?

Is B_1 execution time slow?
If not, a deterministic B_1 is not inferior to a mock. Mocking out pure data or pure functions is as performative as it gets from both a practical and theoretical view.
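A small Python sketch of the pure-function case, with hypothetical names (`vat`, `gross`). The solitary test can only confirm arithmetic on a number the test itself made up; the sociable test is just as fast and deterministic, and checks A and B_1 together.

```python
from unittest.mock import Mock


def vat(net_cents: int) -> int:
    """Hypothetical pure dependency B_1: 20% tax, rounded down."""
    return net_cents * 20 // 100


def gross(net_cents: int, tax=vat) -> int:
    """Hypothetical unit under test A."""
    return net_cents + tax(net_cents)


# Solitary: the expected value is derived from the mock's invented return value.
fake_tax = Mock(return_value=7)
assert gross(100, tax=fake_tax) == 107

# Sociable: same speed and determinism, and the rounding rule is covered too.
assert gross(100) == 120
assert gross(99) == 118  # 99 * 20 // 100 == 19
```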

Integration, End-To-End Tests, Functional tests

Relying on end-to-end tests to cover the unintended blind spots created by mock-heavy unit tests is problematic.

Integration and functional tests tend to be slow, killing cycle time.
In practice, integration and functional tests are often an anemic afterthought.

Lemma

A fast set of “unit” tests that overlaps integration and functional test use cases is more valuable than a fast set of “unit” tests that only test pure functional properties.

YAGNI - Ya Ain’t Gonna Need It

A common argument against testing dependency code in unit tests is that it encourages leaky abstractions. This is a fair criticism, and where such implementation dependence impacts test reliability and speed, I believe it is worth providing a substitute implementation.

Outside that constraint, abstraction for the sake of it is performative.

Mocks

Your mock object is a joke; that object is mocking you. For needing it.

— Rich Hickey, JVM Languages Summit 2008

[Tests that heavily rely on mocks] are reliable and fast, but they tend to “lock in” implementation, making refactoring difficult, and they have to be supplemented with broad tests. It’s also easy to make poor-quality tests that are hard to read, or end up only testing themselves.

— James Shore, Testing Without Mocks

Mock alternatives

For performance or test reliability, it’s not always possible to use production code in unit tests. What should be done when this is the case?

Nullables

For an alternative to mocks, see Nullables in Testing Without Mocks. These are typically hand-crafted and do not rely on a mocking framework. Instead they require carefully constructed interfaces around side effects.

Side-effecting dependencies (clients for DBs, queues, workflow engines, etc.) write all their logic against a narrowly defined interface that abstracts the external system. There are two implementations of that interface:

the production implementation
a test implementation that
- only consumes cpu and memory resources
- maintains a side effect log for later verification

Some might even call this a mock.

The important differences from a mock are:

A unit implements all its functionality in terms of the narrow side effect interface.
A unit is responsible for providing a version of itself, built on the test implementation, for its clients to use in their own unit tests.
Unit clients configure their dependencies in this test mode for their own unit tests.
Assertions are made against the resulting system state and side effect log.
In this way, the production dependency code is exercised in client tests and the cohesive system behavior validated.
Execution is fast and deterministic.
The test implementation is not scattered across multiple dependent unit tests in mocks.
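The points above can be sketched in Python. This is one possible shape under stated assumptions, not the reference implementation from Testing Without Mocks: every name here (`Transport`, `NullTransport`, `EventPublisher`, `Signup`, `side_effect_log`) is hypothetical, and the `create` / `create_null` factory naming is a loose adaptation of that article's convention.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Transport(Protocol):
    """Narrow side effect interface (hypothetical)."""

    def send(self, topic: str, payload: str) -> None: ...


class NetworkTransport:
    """Production implementation; the real I/O is elided in this sketch."""

    def send(self, topic: str, payload: str) -> None:
        raise NotImplementedError("real broker call goes here")


@dataclass
class NullTransport:
    """Test implementation: CPU and memory only, keeps a side effect log."""

    log: list[tuple[str, str]] = field(default_factory=list)

    def send(self, topic: str, payload: str) -> None:
        self.log.append((topic, payload))


class EventPublisher:
    """The unit. All of its logic is written against Transport."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    @classmethod
    def create(cls) -> "EventPublisher":
        return cls(NetworkTransport())

    @classmethod
    def create_null(cls) -> "EventPublisher":
        # The unit itself ships the test-mode version for its clients.
        return cls(NullTransport())

    @property
    def side_effect_log(self) -> list[tuple[str, str]]:
        # Only the null transport keeps a log; production yields an empty list.
        return getattr(self._transport, "log", [])

    def publish(self, kind: str, body: str) -> None:
        if not kind:
            raise ValueError("kind is required")
        self._transport.send(f"events.{kind}", body.strip())


class Signup:
    """A unit client. Its test runs the real EventPublisher in null mode."""

    def __init__(self, publisher: EventPublisher) -> None:
        self._publisher = publisher

    def register(self, email: str) -> None:
        self._publisher.publish("signup", f" {email.lower()} ")


publisher = EventPublisher.create_null()
Signup(publisher).register("Paul@Example.com")
assert publisher.side_effect_log == [("events.signup", "paul@example.com")]
```

The client test exercises `EventPublisher`'s real topic naming and trimming logic; only the transport at the edge is swapped, and the assertion is made against the side effect log rather than against recorded mock calls.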

Reference

Testing Without Mocks: A Pattern Language by James Shore
- Clear examples and terminology
- Demonstrates high quality, efficient testing without mocks
Martin Fowler
- unit test
- mocks aren’t stubs

email comments to paul@bauer.codes
