Part 2: Testing

If writing and debugging code are the top two ways an engineer spends their day, then testing is probably a close third. Whether it's a frontend engineer manually exercising the UI after implementing a new feature or a backend engineer writing unit tests for some business logic, there is a wide range of activities that count as 'testing' on modern software teams.

Unit Testing

Unit testing really is the cornerstone of the software testing world. It is the first line of defense against regressions as a codebase grows, and it serves as useful documentation as well. It also provides a useful metric known as 'code coverage': the percentage of a codebase that is exercised by tests.
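
To make this concrete, here is a minimal sketch of a unit test for a pure pricing function, using Vitest (the runner choice is an assumption on my part, and the function is hypothetical; Jest's API is nearly identical):

```typescript
// A minimal unit test for a pure function with no side effects.
import { describe, it, expect } from "vitest";

// The function under test: plain business logic, trivially testable.
export function applyDiscount(total: number, percent: number): number {
  if (percent < 0 || percent > 100) throw new RangeError("invalid percent");
  return Math.round(total * (1 - percent / 100) * 100) / 100;
}

describe("applyDiscount", () => {
  it("reduces the total by the given percentage", () => {
    expect(applyDiscount(200, 25)).toBe(150);
  });

  it("rejects out-of-range percentages", () => {
    expect(() => applyDiscount(100, 150)).toThrow(RangeError);
  });
});
```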

The appeal of unit testing is the sense of proof that comes with a fully tested feature. Mathematically minded engineers who are used to working with proofs will feel right at home writing tests for their code, and many like to write the tests before any implementation code, a practice known as test-driven development (TDD).

TDD is a great habit for engineers to build, and it often results in focused, well-architected code. However, unit testing is not always worth the effort, especially on a lot of frontend projects. Unit tests shine for algorithmic code that implements complex business logic, but things start to get a little more interesting once you introduce side effects such as HTTP requests, database queries, and async workflows.

For side effects such as database queries, extensive mocking is typically required. This is where the value of unit testing starts to come into question, at least for me. One of the consistent arguments I hear for TDD is that it results in well-structured code, but what that really means is testable code, which is somewhat circular. Ironically, because imperative-style programming is so prevalent, nobody questions that their code freely mixes in side effects with little thought to how they are handled. Instead, they just mock those side effects out in tests, further justifying the bad practice.
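
Here is a sketch of what that mocking pattern typically looks like in practice; the repository interface and names are hypothetical, and Vitest's `vi.fn()` stands in for whatever mocking utility your runner provides:

```typescript
// The unit under test mixes business logic with a database side effect,
// so the test has to stub the repository out.
import { it, expect, vi } from "vitest";

interface UserRepo {
  findById(id: string): Promise<{ id: string; email: string } | null>;
}

// Business logic entangled with a side effect (the DB lookup).
async function getUserEmail(repo: UserRepo, id: string): Promise<string> {
  const user = await repo.findById(id);
  if (!user) throw new Error(`user ${id} not found`);
  return user.email.toLowerCase();
}

it("returns a normalized email", async () => {
  // The mock proves nothing about the real database; the test only
  // verifies the code against behavior we invented for the stub.
  const repo: UserRepo = {
    findById: vi.fn().mockResolvedValue({ id: "1", email: "A@B.COM" }),
  };
  expect(await getUserEmail(repo, "1")).toBe("a@b.com");
});
```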

For those not familiar with Elm or CycleJS, these frontend frameworks take a functional approach instead and keep your code side-effect free by moving all side effects to the 'boundaries' of the application, i.e. the edges. They both do this in an elegant way that keeps your code pure, and therefore actually testable AND truly well-architected.
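
You don't need Elm to borrow the idea. Here is a rough TypeScript sketch of the pattern (this is the general shape, not Elm's or CycleJS's actual API): a pure core computes the next state plus a *description* of the effect to run, and a thin impure shell at the edge executes it:

```typescript
type State = { count: number };
type Event = { type: "increment" } | { type: "saved" };
type Effect = { type: "save"; payload: State } | { type: "none" };

// Pure core: exhaustively unit-testable with plain assertions, no mocks.
function update(state: State, event: Event): [State, Effect] {
  switch (event.type) {
    case "increment": {
      const next = { count: state.count + 1 };
      return [next, { type: "save", payload: next }];
    }
    case "saved":
      return [state, { type: "none" }];
  }
}

// Impure shell at the edge: the only place a real HTTP call happens.
async function runEffect(effect: Effect): Promise<Event | null> {
  if (effect.type === "save") {
    await fetch("/api/state", {
      method: "POST",
      body: JSON.stringify(effect.payload),
    });
    return { type: "saved" };
  }
  return null;
}
```

Because `update` only returns a description of the side effect rather than performing it, all of the interesting logic stays pure.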

In summary, to me 'mocking' can be a code smell, albeit a common practice in the imperative and OOP worlds. Even in functional languages you may still need the occasional mock, but those languages tend to do a better job of isolating side effects from pure code, meaning you can mostly focus on testing pure functions instead of imperative code riddled with side effects.

Integration Testing

Integration testing makes the most sense for API development: APIs are heavily integrated with third-party services and other APIs, they work extensively with databases, and a well-designed API should be well abstracted and highly declarative, leaving little code that is actually worth unit testing.

For example, with a GraphQL API you are typically defining a schema (declarative), creating database models that are just a configuration of fields (declarative), and using some ORM or query builder based on those models, which should already be well-tested. If you don't have complex business rules or access permissions and don't integrate with many third-party services, your API should be largely declarative, which is a good thing.
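
As a sketch of how little imperative code that can leave behind, here is a toy GraphQL endpoint using the `graphql` reference implementation (the fields and the ORM stand-in are hypothetical):

```typescript
import { buildSchema, graphql } from "graphql";

// Stand-in for an ORM/query builder (e.g. Prisma, Knex), which ships
// with its own test suite; you shouldn't need to re-test it.
const db = {
  users: {
    findById: async (id: string) => ({ id, email: "user@example.com" }),
  },
};

// The schema is pure configuration: there is nothing here to unit test.
const schema = buildSchema(`
  type User {
    id: ID!
    email: String!
  }
  type Query {
    user(id: ID!): User
  }
`);

// The resolver is a thin pass-through to the ORM.
const rootValue = {
  user: ({ id }: { id: string }) => db.users.findById(id),
};

// Execute a query directly, without even standing up an HTTP server.
const result = await graphql({
  schema,
  source: `{ user(id: "1") { email } }`,
  rootValue,
});
console.log(result.data);
```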

This is why integration tests make more sense than unit tests for APIs. If half of your time writing unit tests is spent mocking services, what certainty are they going to provide? You're better off hitting the real services and making sure the full integration works end-to-end.

The main problem with this approach is that integration tests are usually more tedious to write, because you often end up parsing large JSON documents or seeding databases with fixture data. In general they are larger in scope than unit tests, which is what makes them more complex, but that larger scope is also what provides a higher degree of confidence in the systems you are testing.
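
Here is a sketch of what such a test might look like, again with Vitest and plain `fetch`; the base URL, the seeding endpoint, and the fixture are all assumptions:

```typescript
// An integration test against a *running* API: no mocks, just real
// HTTP against a local instance seeded with known data.
import { beforeAll, it, expect } from "vitest";

const BASE_URL = process.env.API_URL ?? "http://localhost:4000";

beforeAll(async () => {
  // Seed the test database with a known fixture; this setup work is
  // exactly the tedium mentioned above, but it buys real confidence.
  await fetch(`${BASE_URL}/test/seed`, { method: "POST" });
});

it("fetches a user through the full stack (HTTP -> ORM -> DB)", async () => {
  const res = await fetch(`${BASE_URL}/api/users/1`);
  expect(res.status).toBe(200);
  const body = await res.json();
  expect(body.email).toBe("seeded-user@example.com");
});
```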

E2E Testing

If testing a running API with real HTTP requests (REST, GraphQL, etc.) is considered an integration test, then add in the UI and you've upgraded to an end-to-end (E2E) test. Depending on who you talk to, the lines between integration and E2E tests are a bit blurry, but as a rule of thumb I think of E2E as testing the final product the user interacts with, while integration tests cover a technical component that is not necessarily user-facing, such as an API or data pipeline.
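
As an illustration, here is a short E2E sketch using Playwright (the tool choice, URL, and selectors are assumptions; Cypress or similar would work just as well):

```typescript
import { test, expect } from "@playwright/test";

test("a user can sign in and see their dashboard", async ({ page }) => {
  await page.goto("https://staging.example.com/login");

  // Interact with the app exactly as a user would.
  await page.getByLabel("Email").fill("qa@example.com");
  await page.getByLabel("Password").fill("correct horse battery staple");
  await page.getByRole("button", { name: "Sign in" }).click();

  // Assert on what the user actually sees, not on internal state.
  await expect(page.getByRole("heading", { name: "Dashboard" })).toBeVisible();
});
```

Notice the test knows nothing about the implementation: it fills in the same form and reads the same screen a real user would.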

E2E tests provide the most value, in my opinion, but they are also the most open-ended as far as what to test and how to test it. I think this is the underlying tension in testing: the higher-level a test is (the entire product instead of a single function), the more it covers and the more that can go wrong to fail it, but also the more it reflects the actual user experience. The more granular a test, the more focused it can be and the more predictable its outcome, but the less value it provides in the grander scheme.

It doesn't really matter if all the unit tests pass, or if all the integration tests pass. It doesn't matter if they fail either. What really matters is whether or not the end user has been affected. E2E tests most closely approximate this, while unit and integration tests provide confidence at a different level, in a technical sense rather than a product sense.

There is an additional benefit to E2E tests: they can be, and typically are, written by someone other than the engineer. This is valuable for several reasons. First, engineers are notoriously bad at testing their own code, because they 'test' it the way they wrote it, not the way someone else might use the feature. Second, it frees developers to focus on writing code while someone else tests it. It also gives the business an easy way to put a dollar amount on testing, rather than it being an opaque part of an engineer's job, where anywhere from 0 to 40%+ of their time might go to testing their own code.

Performance Testing

It's easy to see how performance testing might not get prioritized in a lot of organizations when there is already so much testing to be done at the unit, integration, and E2E levels. Realistically, most medium and large projects will have all three of these types of testing to some degree, which is already a big investment. Testing performance is definitely a lower priority than making sure the product works at all, but in some cases it can be equally important.

There is a fine line when optimizing performance for the future. You don't want to overengineer your software for success it may never reach, nor do you want to be caught with your pants down should you go viral or be struck with good fortune that sends your business growth exponential. The last thing you want is for your tech to buckle under all the success you worked so hard for.

The reality is you can often compensate by throwing more server power at the problem. That can get costly, but it does mitigate the risk of the 'best-case scenario' of rapid growth. After all, the buckets of money you should be pulling in can easily cover the increased server costs. Still, you want to be proactive and efficient with your server resources.

The best way to be proactive about performance early on is to build load testing into your stack, with regular runs that are recorded and plotted on a graph. Blow up your servers every morning at 3:30am when nobody is using them, push your applications to their limits, and track how they respond. You'll be able to see when a release introduces bloat by the drop in performance, and actually measure the optimizations that improve it. Establishing a baseline from the very beginning will help keep your software from slowing down as it grows in functionality.
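
As a sketch of what that nightly run might look like, here is a k6 load test (the tool choice, URL, and thresholds are assumptions); scheduled on a cron and with its summary metrics stored, it gives you exactly that baseline to plot release over release:

```typescript
import http from "k6/http";
import { check, sleep } from "k6";

export const options = {
  // Ramp up until the system is genuinely stressed, then back off.
  stages: [
    { duration: "2m", target: 200 }, // ramp to 200 virtual users
    { duration: "5m", target: 200 }, // hold at peak
    { duration: "1m", target: 0 },   // ramp down
  ],
  // Fail the run if p95 latency regresses past the chosen baseline.
  thresholds: {
    http_req_duration: ["p(95)<500"],
  },
};

export default function () {
  const res = http.get("https://staging.example.com/api/health");
  check(res, { "status is 200": (r) => r.status === 200 });
  sleep(1);
}
```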