Sane Programming
Programmers have argued about the best way to program for a long time, maybe since the very beginning of the craft. There is an abundance of opinions on the subject, so I haven’t been in a rush to add mine. However, I recently thought about what I have learned during the past decade of professional software development and decided to write it down. The following thoughts are focused on object oriented (OO) programming, but at least partially apply to imperative and functional programming as well. As I couldn’t come up with a fancy name for my style of programming, I’m just going to call it “sane programming”.
The first key element of sane programming is to forget about most OO principles and any related high-brow philosophy which you might have picked up over the years. The truth is that OO has not achieved its promised goals. That would be fine, except that many programmers are still trying to somehow unlock these promised advantages and keep themselves paralyzed in the futile quest for true OO. This is both a distraction and a waste of time. Instead of trying to live up to the textbook definition, just accept that the theory doesn’t work. Don’t try overly hard to model the domain in your source code, as your code will rarely match the domain. Don’t add unneeded abstractions just because they might become useful in the future. Adding abstractions just makes everything more confusing, which the layered Java standard I/O classes like BufferedReader or BufferedInputStream nicely illustrate: to read text from a file efficiently, you traditionally had to wrap a FileInputStream in an InputStreamReader and then wrap that in a BufferedReader. Also, don’t try to model your code in a formal notation like UML before implementing it, as the proper design will only reveal itself during the implementation. If you want to brainstorm before writing the code, a simple sketch on a whiteboard should be more than enough.
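To make the point about layered I/O classes concrete, here is a minimal sketch that reads the lines of a text file twice: once through the classic chain of FileInputStream, InputStreamReader and BufferedReader, and once with the `java.nio.file.Files.readAllLines` convenience method. The class name and temporary file are made up for the example, and `Files.writeString` assumes Java 11 or later.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class ReadLinesDemo {
    // Classic style: three wrapper objects stacked on top of each other.
    static List<String> readLayered(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file.toFile()), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // The same result with a single call from java.nio.file.
    static List<String> readDirect(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sane", ".txt");
        Files.writeString(file, "first\nsecond\n", StandardCharsets.UTF_8);
        System.out.println(readLayered(file).equals(readDirect(file))); // prints true
        Files.delete(file);
    }
}
```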
Another pillar of sane programming is to avoid class inheritance. This advice isn’t really new: back in 1995, the (in)famous Gang of Four book already advised favoring object composition over class inheritance. My experience is that inheriting behavior via class inheritance just isn’t that useful. How often do you want one class to do almost the same thing as another class, after all? You might use it for mocks (or test doubles in general), but that is usually handled by a framework and can also be achieved via type inheritance (a.k.a. implementing an interface). Inheriting behavior can lead to very hard-to-understand code if you have a deep inheritance hierarchy and/or lots of switching between the subclass(es) and the superclass. Behavior inheritance is the primary reason for the hyperbole that in object oriented programming, everything important happens somewhere else. Without it, your code should become much easier to understand. That said, I recognize that behavior inheritance might be useful if you’re programming a library.
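A minimal sketch of the alternative, using hypothetical names (`Checkout`, `PriceSource`, `PriceList`): `Checkout` reuses behavior by holding a `PriceSource` (composition) rather than extending a base class, and a test double is simply another implementation of the interface, here written as a lambda.

```java
import java.util.List;
import java.util.Map;

// Composition: Checkout delegates to a PriceSource instead of extending a base class.
class Checkout {
    private final PriceSource prices;

    Checkout(PriceSource prices) {
        this.prices = prices;
    }

    long totalInCents(List<String> items) {
        long total = 0;
        for (String item : items) {
            total += prices.priceInCents(item);
        }
        return total;
    }

    public static void main(String[] args) {
        Checkout real = new Checkout(new PriceList(Map.of("apple", 50L, "pear", 70L)));
        System.out.println(real.totalInCents(List.of("apple", "pear", "apple"))); // prints 170

        // A test double is just another implementation: every item costs 100 cents.
        Checkout withFake = new Checkout(item -> 100L);
        System.out.println(withFake.totalInCents(List.of("x", "y"))); // prints 200
    }
}

// Type inheritance only: the interface carries no behavior.
interface PriceSource {
    long priceInCents(String item);
}

// Production implementation backed by a fixed price list.
class PriceList implements PriceSource {
    private final Map<String, Long> prices;

    PriceList(Map<String, Long> prices) {
        this.prices = Map.copyOf(prices);
    }

    @Override
    public long priceInCents(String item) {
        Long price = prices.get(item);
        if (price == null) {
            throw new IllegalArgumentException("Unknown item: " + item);
        }
        return price;
    }
}
```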
To produce sane object oriented code, you also have to accept that some of your classes will just be data containers. Combining behavior and state is one of the core ideas of OO, and while this works well for some classes, it is a terrible fit for others. This problem is especially painful in languages like Java, where a method can return only a single value, so you often have to create extra classes to return structured data. These kinds of classes often don’t have any methods except getters, and trying to somehow define more sophisticated methods will just make the code worse. Even if it violates the OO textbooks, using objects just as fancy structs works just fine. In fact, it works so well that Oracle introduced records to cover this use case (as a preview feature in Java 14, finalized in Java 16).
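A minimal sketch of such a data container as a record (Java 16 or later). The `MinMax` record and the `Stats.minMax` method are made-up examples of a method that needs to return two values at once.

```java
import java.util.List;

class Stats {
    // Returns both the smallest and the largest element in a single call.
    static MinMax minMax(List<Integer> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("values must not be empty");
        }
        int min = values.get(0);
        int max = values.get(0);
        for (int value : values) {
            min = Math.min(min, value);
            max = Math.max(max, value);
        }
        return new MinMax(min, max);
    }

    public static void main(String[] args) {
        MinMax result = minMax(List.of(4, -2, 9, 3));
        System.out.println(result); // prints MinMax[min=-2, max=9]
    }
}

// A plain data carrier: the compiler generates the constructor, the accessors
// min() and max(), equals, hashCode and toString.
record MinMax(int min, int max) {}
```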
Another part of sane programming is to minimize state as much as possible. Avoid using member variables to store intermediate data, as mutable fields make your objects unsafe to use from multiple threads at once. Instead, pass the data around as parameters between private methods. If you absolutely have to store state in your object, try to make it immutable. That way you can safely share the object with other threads. What you should store as member variables are the dependencies of the class. Of course, you should try to avoid having state in these as well.
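A minimal sketch of this style, using a hypothetical `WordCounter` whose stop-word set plays the role of its dependency: the only field is final and immutable, the per-call data lives in a local map that is passed to a private method, and so one instance can be shared by several threads without locking.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class WordCounter {
    // The only member is an immutable dependency (lower-case stop words), never per-call data.
    private final Set<String> stopWords;

    WordCounter(Set<String> stopWords) {
        this.stopWords = Set.copyOf(stopWords);
    }

    Map<String, Integer> count(String text) {
        // Intermediate results live in a local variable that is passed to private methods.
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            addWord(counts, word);
        }
        return counts;
    }

    private void addWord(Map<String, Integer> counts, String word) {
        // split() can yield an empty first element, e.g. for leading punctuation.
        if (!word.isEmpty() && !stopWords.contains(word)) {
            counts.merge(word, 1, Integer::sum);
        }
    }

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        WordCounter counter = new WordCounter(Set.of("the", "a"));
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // One instance is shared by two threads without any locking.
            Future<Map<String, Integer>> first = pool.submit(() -> counter.count("The cat saw a cat"));
            Future<Map<String, Integer>> second = pool.submit(() -> counter.count("A dog"));
            System.out.println(first.get().get("cat")); // prints 2
            System.out.println(second.get().get("dog")); // prints 1
        } finally {
            pool.shutdown();
        }
    }
}
```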
The next core idea of sane programming is to simplify your code as much as possible. Some developers like to make their code future-proof by adding all kinds of frills and indirections just in case they might need them later. Modern multi-paradigm languages offer a plethora of tools for this kind of fortune telling. The end result is unnecessarily confusing code that reveals its uselessness only on close inspection. Predicting the future is difficult at best, and any fortune-telling abilities are better used on the stock market than in the codebase.
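A deliberately small, hypothetical illustration (the class names and the pricing rule are invented): `Shipping` states a single shipping rule directly, while the speculative variant wraps the very same rule in an interface, an implementation and a factory just in case more rules appear one day.

```java
// Simple version: the rule is readable in one place.
class Shipping {
    static int costInCents(int weightInGrams) {
        return weightInGrams <= 1000 ? 500 : 900;
    }

    public static void main(String[] args) {
        int simple = costInCents(1200);
        int speculative = ShippingCostStrategyFactory.create().costInCents(1200);
        System.out.println(simple == speculative); // prints true
    }
}

// Speculative version: an interface, an implementation and a factory for a single rule.
interface ShippingCostStrategy {
    int costInCents(int weightInGrams);
}

class FlatRateStrategy implements ShippingCostStrategy {
    @Override
    public int costInCents(int weightInGrams) {
        return weightInGrams <= 1000 ? 500 : 900;
    }
}

class ShippingCostStrategyFactory {
    static ShippingCostStrategy create() {
        return new FlatRateStrategy();
    }
}
```

If a second rule really does show up later, extracting an interface at that point is a small refactoring, especially with a good test suite in place.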
Next, let’s talk about class and method sizes. Methods and classes that are too large are difficult to understand, as you need to keep too much information in your head. On the flip side, methods and classes that are too small are also confusing, as you have to jump around all the time to understand the big picture. Having micro classes of just 20 to 50 lines of code is a great way to make your code unreadable. So, what’s the sweet spot? As a rule of thumb, I like to have about 5 to 20 lines of code for a method and 100 to 200 lines for a class. Obviously, these are not hard rules, and there will be situations where you’ll violate them. If a class is too big, you need to split it into multiple ones. For me, the best sign that a class is too big is that it is difficult to unit test. As soon as you feel the desire to directly test a private method of a class, it is time to split the class and turn the private method into a public method of the new class.
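A minimal sketch of such a split, with hypothetical names and a made-up line format: the parsing logic that would otherwise be a private helper of `InvoiceService` lives in its own `InvoiceLineParser` class, where its public method can be unit tested directly, and `InvoiceService` receives the parser through its constructor.

```java
import java.math.BigDecimal;
import java.util.List;

class InvoiceService {
    private final InvoiceLineParser parser;

    InvoiceService(InvoiceLineParser parser) {
        this.parser = parser;
    }

    BigDecimal total(List<String> lines) {
        BigDecimal total = BigDecimal.ZERO;
        for (String line : lines) {
            total = total.add(parser.parseLineTotal(line));
        }
        return total;
    }

    public static void main(String[] args) {
        InvoiceService service = new InvoiceService(new InvoiceLineParser());
        System.out.println(service.total(List.of("pen;2;1.50", "pad;1;3.25"))); // prints 6.25
    }
}

// Extracted class: formerly a private method of InvoiceService, now testable on its own.
class InvoiceLineParser {
    // Parses lines of the form "description;quantity;unitPrice" and returns quantity * unitPrice.
    BigDecimal parseLineTotal(String line) {
        String[] parts = line.split(";");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed line: " + line);
        }
        BigDecimal quantity = new BigDecimal(parts[1].trim());
        BigDecimal unitPrice = new BigDecimal(parts[2].trim());
        return quantity.multiply(unitPrice);
    }
}
```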
While we’re on the subject of classes: I strongly recommend using constructor injection to get the dependencies into your classes. This approach works well even if you don’t use a dependency injection framework like Spring. It also makes your code easily testable. If you want to have a default constructor for some reason, you can always have two constructors: one that takes all the dependencies as parameters, and a no-argument one that creates the default dependencies and passes them on to the first. As long as your object is completely initialized after the constructor has run, everything is fine.
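A minimal sketch of the two-constructor variant, assuming a hypothetical `GreetingService` that depends on `java.time.Clock`: the first constructor receives the dependency as a parameter, the no-argument constructor supplies the system clock and delegates to it, so the object is fully initialized either way. A test can pass a fixed clock instead.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneOffset;

class GreetingService {
    private final Clock clock;

    // Constructor injection: all dependencies are passed in explicitly.
    GreetingService(Clock clock) {
        this.clock = clock;
    }

    // Convenience constructor: supplies the default dependency and delegates.
    GreetingService() {
        this(Clock.systemDefaultZone());
    }

    String greeting() {
        LocalTime now = LocalTime.now(clock);
        return now.isBefore(LocalTime.NOON) ? "Good morning" : "Good afternoon";
    }

    public static void main(String[] args) {
        // Production code can use the no-argument constructor.
        System.out.println(new GreetingService().greeting());

        // A test pins the time to 09:00 UTC with a fixed clock.
        Clock nineAm = Clock.fixed(Instant.parse("2024-01-01T09:00:00Z"), ZoneOffset.UTC);
        System.out.println(new GreetingService(nineAm).greeting()); // prints Good morning
    }
}
```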
As mentioned, constructor injection makes your code easily testable. A well-written automated test suite is the biggest productivity booster in programming that I know of. Even though writing automated tests (or at least claiming to do it) has become very common in the industry, I still encounter lots of code without any test coverage as well as many poorly built tests. It is important to keep in mind that not all automated tests are created equal: Unit tests are much faster, have better defect localization and are much more stable than integration tests and system tests. As a result, integration and system tests are much more difficult to build well, and a poorly written integration test might even decrease your productivity because you’ll spend so much time fixing it. Hence, I recommend focusing on unit tests and only sprinkling in some integration tests for things which cannot be tested on a unit level (e.g., some fancy SQL statement). The better your test suite, the higher your confidence when making a code change. In the end, the goal of your automated tests is to keep your productivity high, as you can be sure that old bugs don’t slip back into the system. Naturally, this only works if you diligently write automated tests for every discovered bug. Whether you write the tests before the production code or later doesn’t matter. Also, avoid paying too much attention to code coverage metrics, as they are deceiving: You can have a code coverage of 100% and still miss critical corner cases. On the flip side, a code coverage of 80% can be perfectly fine if the missed code is irrelevant.
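A small, made-up example of why full line coverage can be deceiving: the single check on `{2, 4}` executes every line of `averageBuggy`, yet an empty array still makes it throw an `ArithmeticException`. The fixed `average` handles that corner case, and the empty-array check is the kind of regression test worth keeping once such a bug has been found. The plain `check` helper in `main` stands in for a test framework such as JUnit.

```java
import java.util.OptionalDouble;

class AverageDemo {
    // Buggy version: integer division by zero for an empty array.
    static int averageBuggy(int[] values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum / values.length;
    }

    // Fixed version: an empty input has no average.
    static OptionalDouble average(int[] values) {
        if (values.length == 0) {
            return OptionalDouble.empty();
        }
        long sum = 0;
        for (int value : values) {
            sum += value;
        }
        return OptionalDouble.of((double) sum / values.length);
    }

    static void check(boolean condition, String name) {
        if (!condition) {
            throw new AssertionError("Test failed: " + name);
        }
    }

    public static void main(String[] args) {
        // This single check covers every line of averageBuggy...
        check(averageBuggy(new int[] {2, 4}) == 3, "buggy average of 2 and 4");
        // ...but averageBuggy(new int[0]) would still throw ArithmeticException.

        // Tests for the fixed version, including the regression test for the empty array.
        check(average(new int[] {2, 4}).getAsDouble() == 3.0, "average of 2 and 4");
        check(average(new int[0]).isEmpty(), "regression: empty array");
        System.out.println("all checks passed");
    }
}
```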
A nice suite of automated tests also enables you to refactor your code if its structure doesn’t fit anymore. You should always question the existing code structure when making a code change. Often, you’ll discover that subtly changing it will make your current task easier and make the code as a whole easier to understand. This way you prevent degradation of the codebase over time and gradually improve it. You should always leave the code in better shape than you found it. To do this, you need confidence that the code will still work after your changes. This confidence comes from your automated tests and from the clarity of your code, which is why these two things are so important.
Static code checks are another important way to achieve code quality. However, by default most of these checks only produce warnings, which is insufficient as many developers ignore warnings. Hence, I recommend only using static code checks which produce errors. Using a tool like Jenkins for your continuous integration pipeline allows you to break the build if static code checks report any issues (including warnings), which is great. Of course, it is difficult to retrofit an existing codebase with such strict checks, as you need to fix all the issues in the codebase before you can turn the checks on. My recommendation is to use an iterative approach where you turn on one check at a time, fix all the raised issues and then commit the changes. If you continue to increase the number of active checks over time, your code will slowly improve as well, or at least not get any worse.
For large projects, it is also very important to properly modularize your code. Working with a very large repository causes all kinds of problems and also increases coordination costs between different teams. To avoid this, the code needs to be split on a semantic basis and, ideally, in a way that also matches the organizational structure (see also Conway’s law). Naturally, the former is more important than the latter, as the organizational structure might change over time. However, we need to find a balance when splitting the code: A single repository with one million lines of code is obviously too big, but each additional repository adds a fixed base cost, so having a thousand repositories with a thousand lines of code each is bad as well. There are no hard rules to find this balance. But how do you know that you need to modularize? My main cue to modularize further is a slow and/or unstable continuous integration pipeline. Paying attention to modularization early pays off, as it is very difficult and time-consuming to develop proper boundaries and APIs as an afterthought. You can avoid a lot of pain by keeping this goal in mind.
That concludes my overview of sane programming. I hope you found it helpful. If you liked this blog post, please share it with someone. You can also follow me on Twitter/X.