Ten years in software
I have been working professionally as a Software Engineer for the past 10 years. In that time, I've learned a huge amount, gained a bit of confidence, and largely ignored the social nature of our field. I haven't given back to the community and now feel like it's a good time to change that. I've been very lucky in my career thus far and want to share the broad lessons that I've learned along the way.
This is part two of a series of pieces written reflecting on my career:
- Part one: How to solve problems
- Part two: Study other people's code
- Part three: Burnout is self-inflicted
- Part four: Fear is the mind-killer
- Part five: The value of a test
tl;dr: Study other people's code
I believe that if you want to be a strong software engineer, you need to study other people's code. Fortunately, almost all the software that powers the Internet and makes our computers work is open source and completely free to study. Surprisingly, studying other people's code is not emphasized in academic settings (that I'm aware of), nor is it something that I've heard my coworkers discuss much.
The Architecture of Open Source Applications is a freely available (user contributed) set of books which outline the architecture of various open source applications. Each chapter is a broad overview of a single piece of open source software: how it is structured/layered, what the teams have learned, as well as pitfalls/surprises which have occurred throughout its life. It's a great read.
Learning at the workplace
My first "real" job was as an engineer at Sendio, a very small company (only 2-4 engineers while I worked there). Sendio built, sold, and supported a drop-in Linux-based host which acted as an SMTP gateway/"proxy". Many of the tools we used were written by Dan Bernstein (aka djb), a cryptographer who is also the author of many incredibly efficient, flexible, and resilient pieces of software. We used qmail, daemontools, and djbdns. On the web, he is brash, rigid, headstrong, idealistic, and often makes ad-hominem attacks on his perceived competitors. In short, he's obnoxious. I'm not a fan of his personal affect, but if you can look past that, his software is remarkably well designed. I learned a lot by working with his software, and believe his work should be studied closely.
A lot has been written about DJB's software (the DJB legacy, the DJB way), and what follows is my set of takeaways from my experience using, debugging, and studying his software as we used it at Sendio.
Question your core tools
C programs are notorious for having security vulnerabilities because the standard library is filled with dangerous functions. Misusing printf can cause writes to arbitrary memory locations, null-terminated strings are extremely hard to handle correctly, and UNIX/POSIX time behaves surprisingly with respect to leap seconds. These all stem from surprising behavior in the POSIX standard and the C standard library. I'd argue that these are design flaws.
Take a step back and question your language's standard library, core primitives, or even your operating system's interface. It may be unintentionally causing subtle, dangerous bugs in your software's design and behavior. Take for instance array allocation, reallocation, concatenation, and truncation. Typically you would have to deal with the memory primitives of malloc/calloc/realloc, free, and memcpy/memmove to perform these operations. But these are very error prone and difficult to get right.
But you must remember that robust software can be built from more fragile or error-prone primitives. Just like how the strong delivery guarantees of TCP are built on an unreliable IP transport, it's possible to build a better interface on top of these not so great primitives. As an example, djb implemented an array library to perform these common array operations at a higher level, without having to concern yourself with the manual calculation of the size of the arrays. It's simpler to write (you say what you mean), simpler to read (the operations are of a large granularity), and harder to get wrong (the names map directly to the operations being performed on an array).
Remember that you are not limited to the standard library interface. You can build a better interface. Doing so can make your code safer, easier to write, and easier to read.
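As a rough sketch of what such a higher-level interface might look like, here is a small growable byte buffer in C. This is not djb's actual API: the buf type and the buf_* function names are made up for illustration, and the growth policy (doubling from 16 bytes) is an arbitrary choice. The point is only that callers say "append" or "truncate" and never compute an allocation size themselves.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer; names are illustrative only. */
typedef struct {
    char  *data;  /* heap storage, NULL until first use */
    size_t len;   /* bytes in use */
    size_t cap;   /* bytes allocated */
} buf;

/* Ensure room for at least `need` bytes. Returns 1 on success, 0 on failure. */
int buf_ready(buf *b, size_t need)
{
    assert(b != NULL);
    if (need <= b->cap) return 1;

    size_t cap = b->cap ? b->cap : 16;
    while (cap < need) {
        if (cap > (size_t)-1 / 2) return 0;  /* doubling would overflow */
        cap *= 2;
    }

    char *p = realloc(b->data, cap);
    if (p == NULL) return 0;                 /* old storage is still valid */
    b->data = p;
    b->cap = cap;
    return 1;
}

/* Append n bytes from src. src must not point into b's own storage. */
int buf_append(buf *b, const void *src, size_t n)
{
    assert(b != NULL);
    if (n == 0) return 1;
    if (n > (size_t)-1 - b->len) return 0;   /* total length would overflow */
    if (!buf_ready(b, b->len + n)) return 0;
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 1;
}

/* Concatenate src onto dst. dst and src must be distinct buffers. */
int buf_cat(buf *dst, const buf *src)
{
    return buf_append(dst, src->data, src->len);
}

/* Shrink to n bytes; refuses to "truncate" to a larger size. */
int buf_truncate(buf *b, size_t n)
{
    if (n > b->len) return 0;
    b->len = n;
    return 1;
}

/* Release storage and reset to the empty state. */
void buf_free(buf *b)
{
    free(b->data);
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}
```

A caller starts from a zero-initialized buf (for example, buf b = {0};), calls buf_append as many times as needed, and checks each return value. All size arithmetic and overflow checks live in one place instead of being repeated at every call site.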
Write simple tools that work well together
The program envdir is deceptively simple for its utility. It runs a program with a modified set of environment variables defined within a directory. The provided directory can contain any number of files whose names are the environment variable names and whose contents become the environment variable contents. When used correctly, it lets your program rely on a configuration directory instead of a configuration file. For example, you could have an environment directory for a web application which holds the following files:
my-envdir/
├── DB_USER
├── DB_PASSWORD
├── DEPLOY_COMMIT_ID
└── THIRD_PARTY_APP_SECRET
Running this application via envdir my-envdir my-application --arg1 --arg2 will launch my-application --arg1 --arg2 with the environment variables DB_USER, DB_PASSWORD, DEPLOY_COMMIT_ID, and THIRD_PARTY_APP_SECRET set to the contents of the corresponding files on disk. The application can then do whatever it needs to with its configuration environment. A nice benefit of keeping these items as separate files in the filesystem is that the files can have separate permissions for additional layered security, and they can be populated from a version-controlled repository for easy auditing.
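To make the idea concrete, here is a simplified sketch in C of the behavior described above. It is not envdir's source code. The function names load_envdir and run_with_envdir are hypothetical, the 4096-byte limits are arbitrary, and the edge cases (using only the first line, trimming trailing spaces and tabs, unsetting the variable when the file is empty) follow my reading of the envdir documentation. Subdirectories and embedded NUL bytes are not handled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: copy every file in `dir` into the environment.
   File name -> variable name; first line of the file -> value.
   Returns 0 on success, -1 on error. */
int load_envdir(const char *dir)
{
    assert(dir != NULL);
    DIR *d = opendir(dir);
    if (d == NULL) return -1;

    int rc = 0;
    struct dirent *e;
    while (rc == 0 && (e = readdir(d)) != NULL) {
        const char *name = e->d_name;
        if (name[0] == '.' || strchr(name, '=') != NULL)
            continue;                     /* skip dotfiles and invalid names */

        char path[4096];
        int n = snprintf(path, sizeof path, "%s/%s", dir, name);
        if (n < 0 || (size_t)n >= sizeof path) { rc = -1; break; }

        FILE *f = fopen(path, "r");
        if (f == NULL) { rc = -1; break; }

        char value[4096];
        if (fgets(value, sizeof value, f) == NULL) {
            rc = unsetenv(name);          /* empty file: remove the variable */
        } else {
            size_t len = strcspn(value, "\n");   /* keep the first line only */
            while (len > 0 && (value[len - 1] == ' ' || value[len - 1] == '\t'))
                len--;                           /* trim trailing blanks */
            value[len] = '\0';
            rc = setenv(name, value, 1);
        }
        fclose(f);
    }
    closedir(d);
    return rc;
}

/* Load the directory, then replace this process with the given command.
   Only returns (with -1) if loading or exec fails. */
int run_with_envdir(const char *dir, char *const argv[])
{
    if (load_envdir(dir) != 0) return -1;
    execvp(argv[0], argv);
    return -1;
}
```

Because the work ends in an exec, the launched program needs no knowledge of how its environment was assembled. It reads DB_USER and the rest with getenv like any other variable.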
In addition to envdir, there are several complementary programs which provide similarly small pieces of reusable functionality for bootstrapping an application:
- tcpserver just listens for TCP connections and launches a child process for each connection. It can accept/reject whitelisted/blacklisted hosts, limit concurrency, launch the child process under a less privileged user/group, and more.
- multilog routes its input to any number of rotated log files within directories according to a simple script. It can also automatically timestamp log lines, pipe log lines through external programs, and more.
- argv0 runs a program with argv0 set to a specified string. This is useful for formatting how various processes are viewed via ps, top, or other /proc interfaces.
- setlock runs a program after obtaining an exclusive lock on a file.
- recordio runs a program while logging its input and output to a separate location.
Paired with these programs, simple shell scripts can easily be turned into powerful, rate-limited, privilege-reduced, debuggable, and concurrency-constrained network services which can be inspected easily with other system monitoring tools.
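Most of these tools share one shape: adjust a single piece of process state, then exec the next program in the chain. As a sketch of that shape, here is a C approximation of what a setlock-style wrapper does. It is not setlock's source code. The names lock_exclusive and run_locked are hypothetical, and the sketch uses POSIX fcntl record locks as a stand-in for whatever locking primitive the real tool uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Open (creating if needed) `path` and block until an exclusive lock is held.
   Returns the locked descriptor, or -1 on error. */
int lock_exclusive(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd == -1) return -1;

    struct flock fl = {0};
    fl.l_type = F_WRLCK;      /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;             /* 0 means "to end of file": the whole file */

    if (fcntl(fd, F_SETLKW, &fl) == -1) {  /* F_SETLKW waits for the lock */
        close(fd);
        return -1;
    }
    return fd;
}

/* Take the lock, then replace this process with the given command.
   The descriptor stays open across exec, so the lock is held until the
   command exits. Only returns (with -1) if locking or exec fails. */
int run_locked(const char *path, char *const argv[])
{
    if (lock_exclusive(path) == -1) return -1;
    execvp(argv[0], argv);
    return -1;
}
```

Since the wrapper disappears into the program it launches, wrappers of this kind can be stacked on one command line, each contributing one constraint, without the final program being aware of any of them.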
Document expected behavior
One of the trends I've noticed with popular software packages is the emphasis on quick-start guides and how-to write-ups as a primary means for documentation. This type of documentation is ideal for inflexible tools which serve a singular purpose. Often, these tools have many knobs for configuration within a domain, but fall flat when used in a way which was unanticipated by the authors. Within DJB's software, his documentation is a concise description of the behavior and interface. In some cases, it's so brief that it feels as though a piece might be missing.
I think this is the hallmark of flexible software, where examples and how-tos would limit the realm of possibilities that the software provides. Documented behavior is more valuable than a how-to when you have flexible tools.