Ten years in software

Last updated December 15th, 2016

I have been working professionally as a Software Engineer for the past 10 years. In that time, I've learned a huge amount, gained a bit of confidence, and largely ignored the social nature of our field. I haven't given back to the community and now feel like it's a good time to change that. I've been very lucky in my career thus far and want to share the broad lessons that I've learned along the way.

This is part five of a series of pieces reflecting on my career.

The value of a test

When writing software, you'll eventually need to verify that it does what you intended. Sometimes this is a manual process, and other times you'll automate it with code. For better or worse, at every place I've worked so far, it was up to the judgement of the developer whether to write automated tests or to manually verify behavior. At Sendio, we tended to lean toward manual testing. At IMVU, we wrote lots of tests (often in advance of implementation) and aimed to eliminate slow or intermittent tests. At Etsy, we landed somewhere in between.

Regardless of the practice, a test is only valuable when it fails. A test's failure tells you something. The more reliable information a test failure gives you, the more valuable that test is.

I believe there are exactly three kinds of tests, and combining qualities of each will only cause trouble:

Unit tests: Tests that verify the contract upheld by individual components or functions.
Acceptance tests: Tests that simulate user behavior to verify success criteria.
Fuzz tests: Tests that look for bugs in a function or component by generating unexpected input.
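
Of the three, fuzz tests are probably the least familiar, so here is a minimal sketch of one in Python. The run-length encoder (rle_encode, rle_decode) is a hypothetical function under test that I made up for illustration. The test generates awkward input from a seeded random generator, so that a failure can be reproduced, and checks a single invariant: decoding what was encoded returns the original input.

```python
import random


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Hypothetical function under test: run-length encode a string."""
    runs: list[tuple[str, int]] = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Hypothetical inverse of rle_encode."""
    return "".join(ch * count for ch, count in runs)


def test_fuzz_rle_round_trip(seed: int = 1234, iterations: int = 500) -> None:
    rng = random.Random(seed)  # fixed seed: a failing input can be regenerated
    alphabet = "ab\x00\n\u00e9 "  # includes NUL, newline, and a non-ASCII letter
    for _ in range(iterations):
        length = rng.randint(0, 40)
        text = "".join(rng.choice(alphabet) for _ in range(length))
        decoded = rle_decode(rle_encode(text))
        if decoded != text:
            raise AssertionError(
                f"round trip changed {text!r} into {decoded!r} (seed={seed})"
            )


test_fuzz_rle_round_trip()
```

Note that the failure message reports the offending input and the seed, which is what you would need to turn a fuzz failure into a permanent unit test.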

Good tests are fast, reliable, informative, easy to read, and will help you predictably deliver better software. Bad tests are slow, unpredictable, useless when they fail, hard to decipher, and will strike fear into your heart when you have to work with them.

Here's what I've learned about writing and maintaining these kinds of tests.

Names matter

Tests are like everything else in the world: coming up with a good name is difficult, but doing so will make things much easier in the future. A test's name should describe the behavior that it is verifying, not the value that it's verifying. To demonstrate, here are a bunch of good and bad names for tests. Read each and ask yourself if the name helps give context:

MutexHandler::test_locking_twice_fails
AtlasComponent::test_happypath_1
EmailValidator::test_fuzz_rfc5322_addr_spec
AccountManagement::test_logged_out_user_cannot_see_account_details
Random::test_entropy_quality
TwoPC::test_double_ack_does_not_succeed_twice
UiToolKit::test_bad_data_fails
FrontendAnalytics::test_multiple_beacon_requests_are_aggregated
PasswordManager::test_batch_bcrypt_rehash_finishes_quickly
LoginForm::test_form_submit_does_the_right_thing

Failure messages matter

If a test always passes, it's mostly useless.

When writing a test, it's important to consider what the operator will see when the test fails:

Does the failure say that the result doesn't match expected values (Expected "1" but got "0"), or does it describe the invariant which is broken (session token missing)?
Does the failure show that a value is incorrectly used (TypeError: null is not an object) or that a specific computation resulted in a bad value (user preferences are missing)?
Does the failure show that an item was not present in a collection (Item3 not found in [Item1, Item10]) or does it just say that something failed (Expected false but got true)?

Take the time to use the appropriate assertion for your tests. If you need to assert business-specific logic, take the time to write assertions which use the language of your business.
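
As a rough illustration in Python, here is what an assertion written in the language of the business might look like. The helper names (assert_has_session_token, assert_contains) and the cookie layout are assumptions I invented for this sketch; they do not come from any particular test framework.

```python
def assert_has_session_token(response_cookies: dict[str, str]) -> None:
    """Hypothetical business-level assertion: names the invariant that broke."""
    token = response_cookies.get("session_token")
    if not token:
        raise AssertionError(
            "session token missing after login; "
            f"cookies present: {sorted(response_cookies)}"
        )


def assert_contains(collection, item) -> None:
    """Report both the missing item and what the collection actually held."""
    if item not in collection:
        raise AssertionError(f"{item!r} not found in {list(collection)!r}")


# What the operator sees when the invariant is broken:
try:
    assert_has_session_token({"theme": "dark"})
except AssertionError as failure:
    print(failure)
# prints: session token missing after login; cookies present: ['theme']
```

Compare that message with what a bare equality check on the token would have produced: the operator learns which invariant broke and what was there instead, without opening the test.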

Only test public methods

An interface provides the verbs that you can use to operate a component. An object's contract is the set of guarantees that its public interface provides. Tests are meant to uphold guarantees. This means your test should only interact with objects through their public interface.

If you're trying to assert behavior (without checking an object's side effects) without using a public interface, you're testing how it does it, not what it does.

Regressions happen when what it does changes.

Nobody will notice if how it does it changes.

If you change all of the internal data structures abstracted away by an object, your tests should still pass. If you think you really need to test the internal data structures, you probably should be measuring the performance characteristics instead (more on that later).
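
To make that concrete, here is a small Python sketch. RecentSearches is a hypothetical component invented for this example. The tests only call record() and latest(), so the internal deque could be swapped for a list or an ordered dictionary and they should still pass.

```python
from collections import deque


class RecentSearches:
    """Hypothetical component: remembers the last `limit` distinct searches."""

    def __init__(self, limit: int) -> None:
        self._entries = deque()  # internal detail; free to change
        self._limit = limit

    def record(self, query: str) -> None:
        if query in self._entries:
            self._entries.remove(query)
        self._entries.appendleft(query)
        while len(self._entries) > self._limit:
            self._entries.pop()  # drop the oldest entry

    def latest(self) -> list[str]:
        """Most recent search first."""
        return list(self._entries)


def test_repeated_search_moves_to_front_without_duplicating() -> None:
    searches = RecentSearches(limit=3)
    for query in ["cats", "dogs", "cats"]:
        searches.record(query)
    assert searches.latest() == ["cats", "dogs"]


def test_oldest_search_is_dropped_when_limit_is_exceeded() -> None:
    searches = RecentSearches(limit=2)
    for query in ["a", "b", "c"]:
        searches.record(query)
    assert searches.latest() == ["c", "b"]


test_repeated_search_moves_to_front_without_duplicating()
test_oldest_search_is_dropped_when_limit_is_exceeded()
```

A test that reached into searches._entries to check the deque directly would fail the moment the storage changed, even though nothing a caller can observe had changed.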

You can control all of the preconditions

If a test relies on an external value (the database, the network, the system time, or the filesystem), it's prone to failures due to preexisting conditions. To avoid this, use dependency injection.

Provide your code with interfaces which implement access to these external values. In this way, a test can use in-memory data to verify that a "database" contains the correct value, or that the current local time is on a leap second or a daylight saving time boundary, or that a file already exists with unexpected data.

In addition to making your tests more reliable, this technique also removes I/O, making your tests extremely fast.
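
Here is a minimal Python sketch of injecting the system time. The names Clock, SystemClock, FixedClock, and TrialPeriod are all hypothetical, and the 30-day trial is an invented business rule. The point is only that the test decides what "now" is, so it can sit exactly on a boundary such as a trial that started on a leap day.

```python
from datetime import datetime, timedelta, timezone
from typing import Protocol


class Clock(Protocol):
    def now(self) -> datetime: ...


class SystemClock:
    """Production implementation: reads the real system time."""

    def now(self) -> datetime:
        return datetime.now(timezone.utc)


class FixedClock:
    """Test implementation: always reports the instant it was given."""

    def __init__(self, instant: datetime) -> None:
        self._instant = instant

    def now(self) -> datetime:
        return self._instant


class TrialPeriod:
    """Hypothetical component: a trial that lasts 30 days from its start."""

    def __init__(self, started_at: datetime, clock: Clock) -> None:
        self._ends_at = started_at + timedelta(days=30)
        self._clock = clock

    def is_active(self) -> bool:
        return self._clock.now() < self._ends_at


def test_trial_started_on_leap_day_ends_thirty_days_later() -> None:
    started = datetime(2016, 2, 29, 12, 0, 0, tzinfo=timezone.utc)
    last_second = datetime(2016, 3, 30, 11, 59, 59, tzinfo=timezone.utc)
    first_expired = datetime(2016, 3, 30, 12, 0, 0, tzinfo=timezone.utc)
    assert TrialPeriod(started, FixedClock(last_second)).is_active()
    assert not TrialPeriod(started, FixedClock(first_expired)).is_active()


test_trial_started_on_leap_day_ends_thirty_days_later()
```

Production code would construct TrialPeriod with SystemClock(); nothing else about the component changes.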

Don't use mocks/stubs for individual functions/methods

Here, "mocks" and "stubs" refer to the practice of altering the execution of specific functions at test runtime.

If you need to mock out a method in order for a test to pass, you're testing how it does it, not what it does. This distinction is really important: doing things wrong is not the same as doing things differently.

If a test tells you that your component is doing things differently, you'll resent your tests when you refactor your codebase or perform perfectly valid optimizations. However, if your test tells you that your component is doing things wrong, that's something which will save you and your customers from failure.

Relying on mocks or stubs for individual functions/methods will make a system more fragile over time. Avoid this as much as possible.

Verify your fakes with the same test suite

Here, "fakes" refer to alternate implementations of components which uphold the interface's contract—an example being an in-memory filesystem fake.

Let's say you have a flaky test which has an external dependency on user account data which is stored in the database. One way to reduce the flakiness would be to remove its dependence on the database itself. To do this, you could extract an interface which is an abstraction around access to that user account data and then provide two implementations of that data: one which uses the real database and the other which uses an in-memory data structure. You can then use dependency injection so that the test uses the in-memory implementation. This way, the test removes the external dependency and can always control the user account data.

However, that's not good enough. Now your test depends on the in-memory implementation. If the database implementation were to change in a breaking way (and not the in-memory implementation), the test would falsely pass, but production would fail!

To guard against this, you can write a test suite which verifies the user account interface's contract. The exact same test suite can then be run against the "fake" in-memory implementation and the "real" I/O-backed implementation.

This will ensure that the two implementations cannot silently drift apart, and that you can use the faster and more reliable in-memory implementation wherever you need to access user data.
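
One way to sketch a shared contract suite in Python is a mixin that holds the tests, plus one small test class per implementation. Everything here is an assumption made for illustration: the save/email_for interface, the class names, and the use of SQLite (from the standard library) as a stand-in for the real database.

```python
import sqlite3
import unittest
from typing import Optional


class InMemoryAccountStore:
    """The fake: keeps user emails in a dictionary."""

    def __init__(self) -> None:
        self._emails: dict[int, str] = {}

    def save(self, user_id: int, email: str) -> None:
        self._emails[user_id] = email

    def email_for(self, user_id: int) -> Optional[str]:
        return self._emails.get(user_id)


class SqliteAccountStore:
    """Stand-in for the real, database-backed implementation."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._db = connection
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS accounts "
            "(user_id INTEGER PRIMARY KEY, email TEXT NOT NULL)"
        )

    def save(self, user_id: int, email: str) -> None:
        self._db.execute(
            "INSERT OR REPLACE INTO accounts (user_id, email) VALUES (?, ?)",
            (user_id, email),
        )
        self._db.commit()

    def email_for(self, user_id: int) -> Optional[str]:
        row = self._db.execute(
            "SELECT email FROM accounts WHERE user_id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


class AccountStoreContract:
    """The contract. Not a TestCase itself, so it never runs on its own."""

    def make_store(self):
        raise NotImplementedError

    def test_unknown_user_has_no_email(self):
        self.assertIsNone(self.make_store().email_for(404))

    def test_saved_email_can_be_read_back(self):
        store = self.make_store()
        store.save(1, "a@example.com")
        self.assertEqual(store.email_for(1), "a@example.com")

    def test_saving_again_replaces_the_previous_email(self):
        store = self.make_store()
        store.save(1, "old@example.com")
        store.save(1, "new@example.com")
        self.assertEqual(store.email_for(1), "new@example.com")


class InMemoryAccountStoreTest(AccountStoreContract, unittest.TestCase):
    def make_store(self):
        return InMemoryAccountStore()


class SqliteAccountStoreTest(AccountStoreContract, unittest.TestCase):
    def make_store(self):
        connection = sqlite3.connect(":memory:")
        self.addCleanup(connection.close)
        return SqliteAccountStore(connection)
```

Running the module with python -m unittest executes all three contract tests once per implementation. If someone changes the SQLite-backed store in a way that breaks the contract, its half of the suite fails even though the fake still passes.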

Speed can be an invariant

Sometimes systems need to be fast. Enforcing this can be the same as enforcing the contract that your system upholds.

Let's say you're building a 3D engine: there will be components which must perform their behavior within a fixed amount of wall clock time (on your minimum supported hardware). Rendering one frame of a 3D scene must take less than one frame interval (at 60 Hz, 1/60 s ≈ 16.67 ms).

Write a test to ensure this performance constraint is met! Generate a big load of data which represents your worst case, and measure the wall clock time of rendering that scene.

This isn't limited to realtime systems. Let's say you're refactoring a system to be more generalized, but don't want to unintentionally introduce performance regressions. Take a known workload for that system, measure the time it takes to execute it, and write an automated test to verify that the processing of that same workload finishes in the same or less wall clock time than the original system. This can be extremely valuable, as unintentional regressions can be detected early in development.
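
A rough Python sketch of such a test follows. The workload function render_worst_case_scene is a hypothetical stand-in (it only does arithmetic), and taking the fastest of several runs is one assumption about how to reduce noise from the machine running the test, not the only reasonable choice.

```python
import time

FRAME_BUDGET_SECONDS = 1 / 60  # one frame at 60 Hz, roughly 16.67 ms


def best_wall_clock_seconds(workload, repeats: int = 5) -> float:
    """Run the workload several times and return the fastest run."""
    timings = []
    for _ in range(repeats):
        started = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - started)
    return min(timings)


def render_worst_case_scene() -> int:
    """Hypothetical stand-in for rendering the worst-case scene."""
    return sum(i * i for i in range(2_000))


def test_worst_case_scene_renders_within_one_frame() -> None:
    elapsed = best_wall_clock_seconds(render_worst_case_scene)
    if elapsed >= FRAME_BUDGET_SECONDS:
        raise AssertionError(
            f"worst-case scene took {elapsed * 1000:.2f} ms; "
            f"budget is {FRAME_BUDGET_SECONDS * 1000:.2f} ms"
        )


test_worst_case_scene_renders_within_one_frame()
```

For the refactoring case, the budget would be the measured time of the original system on the same workload and hardware rather than a fixed frame interval.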