unit testing - Paul's Notes

rel:: [[Resilient Software Engineering MOC]]

# unit testing

## Reference

- [Testing Without Mocks: A Pattern Language](https://www.jamesshore.com/v2/projects/nullables/testing-without-mocks) by [[James Shore]] ^17ec97
  - Clear examples and terminology
  - Demonstrates high quality, efficient testing without mocks
- [[Martin Fowler]]
  - [unit test](https://martinfowler.com/bliki/UnitTest.html)
  - [mocks aren't stubs](https://martinfowler.com/articles/mocksArentStubs.html)

%%
- [[202401101649 Conversation on Unit Test Philosophy]]
%%

## Terminology

> Like most software development terminology, _\[unit testing is\]_ very ill-defined, and I see confusion can often occur when people think that it's more tightly defined than it actually is.
>
> — [[Martin Fowler]] - [Unit Test](https://martinfowler.com/bliki/UnitTest.html)

We need a common language for conversation. These are my own definitions.

| term | definition |
| ----------------------- | ---------- |
| unit | system that loses its identity and function when divided |
| interface | defined set of messages a unit accepts and emits |
| inputs | messages sent to the unit, e.g. intake queues, runtime stack |
| outputs | messages sent from the unit, e.g. output queues, runtime stack; includes exceptions |
| unit client | system that sends messages to or receives messages from the unit |
| side effects | any unit-affected change that is observable outside its defined interface, e.g. I/O |
| dependency | an external system the unit relies on to do work, e.g. other units via message, the system clock |
| system under test (SUT) | set of units tested together; the set can be singular |
| unit test | program that executes and observes a SUT |
| solitary unit test | a unit test with isolated dependencies and inputs |
| sociable unit test | a unit test that relies on other units to fulfill some behavior |
| mock | a stand-in object for a dependency or input; for purposes of this page, mocks are automatically generated via a DSL or framework and provide an API for making assertions on the data they observe |
| integration test | test that observes side effects |
| functional test | test with user-visible effects and assertions |
| pure function | function or sub-routine with no side effects; the outputs are determined entirely by the inputs |

## Philosophy

> [!question] The fundamental question
> Should a unit test's scope be identical to the named unit under test, or is the unit better understood as a "system under test", i.e. [[#Units are not units|unit plus dependencies]]?

Two schools prefer **Solitary** and **Sociable** unit tests, respectively. Martin Fowler calls these [the mockist and classic styles](https://martinfowler.com/articles/mocksArentStubs.html#ClassicalAndMockistTesting). See [[#Why Paul is a Socialist|Why Paul is a Socialist]].

### The Solitary Tester

The solitary tester asserts the unit under test should be observed in isolation with explicitly controlled inputs, outputs, and dependencies. They believe it is possible to consistently layer abstractions to facilitate this style of testing. Ergo, tests should prefer mocking out all dependencies and inputs.

#### Advantages

- Fast unit tests
- Repeatable unit tests
- Encourages system design broken into discrete parts

#### Disadvantages

- Mocking libraries are not zero cost. They add cognitive complexity and execution overhead.
  - Writing tests can become an exercise in fighting the type checker and the mocking library to approximate real world dependencies.
  - Mocks are not always meaningfully faster than production code.
    - In many cases, the overhead of a mock library is equivalent to or even dwarfs the cost of real dependencies.
    - This includes some "I/O" cases. The following are fast enough and reliable enough that reaching for a fake is often not warranted.
      - sockets
      - in-process DBs like LevelDB, [[RocksDB]], and [[SQLite]]
- Mocking libraries plus a dependency injection (DI) framework encourage hiding production logic in singletons that can only practically be used in the context of the DI framework.
  - The ease of DI encourages deeply nested, sprawling dependency graphs.
  - A powerful mocking framework makes it easy to construct these graphs for tests, even in the absence of explicit interfaces.
  - Combined, they encourage a production coding style with wide, deeply nested dependency graphs. This makes units difficult to use outside a DI or mocking framework context.
- The interfaces and data shapes of the production code must be encoded N+1 times: once in the production code and once at every point a test depends on those interfaces and data shapes.
  - If mocks are fully specified (types and values), this pattern is brittle in the face of refactoring.
  - If mocks are not fully specified, e.g. liberal use of `any()` and `lenient()`, the tests tend toward false signal.

### The Sociable Tester

The sociable tester asserts it is better to include production code dependencies in a test than to depend on an anemic approximation of the production code. They believe nothing is a substitute for production dependencies. Ergo, tests should default to executing production dependencies as much as possible, with mock implementations used as needed to address execution time and side effect concerns.

#### Advantages

- Tests exercise the unit in the context of "real" production code because in practice, [[#Units are not units|units are not units]].
- Improves [[#Integration, End-To-End Tests, Functional tests|practical test coverage]].
- Encourages design without layers of [[#YAGNI - Ya Ain't Gonna Need It|needless abstractions]], making the system easier to understand.

#### Disadvantages

- Execution time: if the production code is slow for some value of slow, cycle time suffers.
- Side effects: if the production code is not deterministic or relies on external systems, tests are fragile and unrepeatable.
- Units accrete dependence on implementation details.

## Why Paul is a Socialist

> Testers of the system, unit \[_sic_\]! You have nothing to lose but your (mock dependency) chains!

### Units are not units

The fundamental difference between the two styles is one of unit scope.

The solitary tester asserts:

1. Given unit `A` with a production dependency on `B_1` (`A -> B_1`) ...
2. Dependency `B_2` - a mock - is sufficiently representative of `B_1` behavior that ...
3. Tests for system `A -> B_2` yield inferences about system `A -> B_1`

The sociable tester asserts:

- Such inferences are highly problematic in practice. Tests for `A -> B_2` are merely testing `A -> B_2` at best, and performatively testing the mocks at worst.
- Languages commonly used in industry **do not provide the facilities to adequately control side effects**. They creep in.
- As a result, tests that heavily rely on mocks are only as good as the mocks are at masquerading as the real thing. **Mocks create an illusion of functional purity that does not exist in the real world.**

> But, what if we are using a language _or design_ that adequately controls side effects?
> What if `B_1` is a pure function?
> Or just pure data?
> Isn't there value in using a mock `B_2` so tests for `A` are completely isolated?

Is `B_1` slow to execute? If not, a deterministic `B_1` is not inferior to a mock. Mocking out pure data or pure functions is as performative as it gets, from both a practical and a theoretical view.
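To make the `A -> B_1` versus `A -> B_2` argument concrete, here is a minimal Python sketch. Every name in it (`parse_port`, `describe_endpoint`, the host string) is hypothetical and chosen only for illustration. The mock is built with the standard library's `unittest.mock.Mock`, standing in for whatever mocking framework a project actually uses.

```python
from unittest.mock import Mock


def parse_port(text: str) -> int:
    """B_1: the production dependency. Pure, fast, deterministic."""
    port = int(text)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


def describe_endpoint(host: str, port_text: str, parse=parse_port) -> str:
    """A: the unit under test, which depends on a parser (B)."""
    return f"{host}:{parse(port_text)}"


# Solitary style, A -> B_2: the mock returns whatever it is told to return,
# so A appears to work for an input that production B_1 rejects.
b_2 = Mock(return_value=99999)
solitary_result = describe_endpoint("example.test", "99999", parse=b_2)
b_2.assert_called_once_with("99999")

# Sociable style, A -> B_1: the same input exercises the real parser,
# and the test observes what production code actually does.
try:
    describe_endpoint("example.test", "99999")
    sociable_error = None
except ValueError as exc:
    sociable_error = str(exc)
```

The solitary run only tells us about `A -> B_2`. Because `parse_port` is already pure and fast, swapping it for a mock buys no speed or determinism and hides the out-of-range behavior.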
### Integration, End-To-End Tests, Functional tests

Relying on end-to-end tests to cover the unintended blind spots created by mock-heavy unit tests is problematic.

1. Integration and functional tests tend to be slow, killing cycle time.
2. In practice, integration and functional tests are often an anemic afterthought.

> [!note] Lemma
> A fast set of "unit" tests that overlaps integration and functional test use cases is more valuable than a fast set of "unit" tests that only test pure functional properties.

### YAGNI - Ya Ain't Gonna Need It

A common argument against testing dependency code in unit tests is that it encourages leaky abstractions. _This is a fair criticism_, and where such implementation dependence impacts test reliability and speed, I believe it is worth providing [[#Mock alternatives|a substitute implementation]]. Outside that constraint, abstraction for the sake of it is performative.

## Mocks

> Your mock object is a joke; that object is mocking you. For needing it.
>
> — [[Rich Hickey]] at [JVM Languages Summit, 2008](https://www.infoq.com/presentations/hickey-clojure/)

> \[Tests that heavily rely on mocks\] are reliable and fast, but they tend to “lock in” implementation, making refactoring difficult, and they have to be supplemented with broad tests. It’s also easy to make poor-quality tests that are hard to read, or end up only testing themselves.
>
> — [[#^17ec97|Testing Without Mocks]]

### Mock alternatives

For reasons of performance or test reliability, it's not always possible to use production code in unit tests. What should be done when this is the case?

#### Nullables

For an alternative to mocks, see Nullables in [[#^17ec97|Testing Without Mocks]]. These are typically hand-crafted and do not rely on a mocking framework. Instead they require carefully constructed interfaces around side effects.
Side-effecting dependencies (clients for DBs, queues, workflow engines, etc.) write all their logic against a narrowly defined interface abstracting the external system. There are two implementations of that interface:

- the production implementation
- a test implementation that
  - only consumes CPU and memory resources
  - maintains a side effect log for later verification

Some might even call this a mock. The important differences from a mock are:

- A unit implements all its functionality in terms of the narrow side effect interface.
- A unit is responsible for providing a version of itself using the test implementation for its clients to use in their own unit tests.
- Unit clients configure their dependencies in this test mode for their own unit tests.
- Assertions are made against the resulting system state and side effect log.
- In this way, the production dependency code is exercised in client tests and the cohesive system behavior is validated.
- Execution is fast and deterministic.
- The test implementation is not scattered across multiple dependent unit tests in mocks.
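The arrangement above can be sketched in a few lines of Python (3.9+). This is a minimal illustration under stated assumptions, not necessarily how Testing Without Mocks structures its Nullables: `QueueTransport`, `NetworkTransport`, `RecordingTransport`, `OrderPublisher`, `Checkout`, and the `create` / `create_null` factory names are all hypothetical, and the production transport is a placeholder that does no real I/O.

```python
from dataclasses import dataclass, field
from typing import Protocol


class QueueTransport(Protocol):
    """Narrow side effect interface: the only path to the external queue."""

    def send(self, topic: str, body: str) -> None: ...


class NetworkTransport:
    """Production implementation (placeholder; a real one would do I/O)."""

    def send(self, topic: str, body: str) -> None:
        raise NotImplementedError("would talk to a real broker")


@dataclass
class RecordingTransport:
    """Test implementation: CPU and memory only, keeps a side effect log."""

    log: list[tuple[str, str]] = field(default_factory=list)

    def send(self, topic: str, body: str) -> None:
        self.log.append((topic, body))


class OrderPublisher:
    """The unit: all of its logic is written against QueueTransport."""

    def __init__(self, transport: QueueTransport) -> None:
        self._transport = transport

    @classmethod
    def create(cls) -> "OrderPublisher":
        return cls(NetworkTransport())

    @classmethod
    def create_null(cls) -> tuple["OrderPublisher", RecordingTransport]:
        # The unit itself hands clients a test-mode version plus its log.
        transport = RecordingTransport()
        return cls(transport), transport

    def publish(self, order_id: int, total_cents: int) -> None:
        if total_cents < 0:
            raise ValueError("total must be non-negative")
        self._transport.send("orders", f"{order_id}:{total_cents}")


class Checkout:
    """A unit client that uses the real OrderPublisher logic in its tests."""

    def __init__(self, publisher: OrderPublisher) -> None:
        self._publisher = publisher
        self.completed: list[int] = []

    def complete(self, order_id: int, prices_cents: list[int]) -> None:
        self._publisher.publish(order_id, sum(prices_cents))
        self.completed.append(order_id)


# A client test: production publishing logic runs, only the I/O edge is swapped.
publisher, transport = OrderPublisher.create_null()
checkout = Checkout(publisher)
checkout.complete(7, [250, 100])
```

Assertions would then target `checkout.completed` (system state) and `transport.log` (the side effect log). The validation and message formatting in `OrderPublisher.publish` still run, which is the part a framework-generated mock would have skipped.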