In my last article on the 2-pizza team rule, I talked about how software development practices should be examined and evaluated before being blindly applied. In this article, I will present another practice that often gets overlooked: software versioning. Specifically, I will talk about how software versioning has been used traditionally and how it gets misapplied in today’s software service development, and I will provide a practical example of versioning that fits the development workflow.
What is software versioning?
Software versioning is the practice of assigning unique identifiers to different releases of software. A software release is typically an executable for a compute platform, but it can also be just its source code.
The primary use of versioning is to distinguish software releases from each other, but version identifiers may serve other purposes as well. For example:
- Sequence-based versioning indicates the order in which releases were made. For example, version 2 comes after version 1.
- A variant of sequence-based versioning assigns significance to the change between versions. This can be used, for instance, to find out whether a release breaks compatibility in the API. In semantic versioning, each version consists of three numbers, and each number represents the significance of the change.
- The release date can also be used as a version to indicate when a particular release was made. Naturally, it simultaneously provides the benefits of sequence-based versioning.
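As a small sketch of how the semantic versioning scheme above can be used programmatically, the three numbers can be parsed and compared as integers. This simplified example ignores pre-release tags and build metadata, which the full specification also defines:

```python
# Parse a MAJOR.MINOR.PATCH version into a tuple of integers so that
# tuple comparison gives the release order.
def parse_semver(version: str) -> tuple[int, int, int]:
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)

def breaks_compatibility(old: str, new: str) -> bool:
    # In semantic versioning, a major version bump signals a
    # backwards-incompatible change.
    return parse_semver(new)[0] > parse_semver(old)[0]

print(parse_semver("2.10.3") > parse_semver("2.9.14"))  # True
print(breaks_compatibility("1.4.2", "2.0.0"))           # True
```

Note that integer parsing is what makes `2.10.3` correctly sort after `2.9.14`; a plain string comparison would get this wrong.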
Software versioning can also be mostly or entirely arbitrary and based on feelings. Version numbers can be used for marketing purposes to signal how new, mature, or significant a software release is. Versions may also be skipped for superstitious reasons (e.g. some might consider 13 an unlucky version number), for aesthetic reasons (e.g. the version looks ugly), or based on gut feeling (e.g. the maintainer just felt like there are enough changes for a new release).
Software versioning can also be a requirement. Some regulated fields require versioning to be present. For example, medical devices delivered as software must have a version number visible to the software consumers for identification and traceability purposes. Versioning may also be a part of a contractual obligation to a specific customer.
Challenges of versioning today
Applying well-known software versioning strategies to much of today’s software creates a few challenges. Specifically, these strategies are not helpful to the consumers of the software, and they are cumbersome to integrate into modern development workflows.
Traditionally, software applications are run on someone else’s computer. New software releases are made available for users or administrators to download and install on their computers. In this case, a traditional versioning pattern such as semantic versioning (along with release notes) can help the software consumer to determine how the new release compares to what they are running at the moment, and make a more informed decision on when to upgrade the software. This applies to both executables and libraries.
Moreover, a traditional versioning pattern allows parallel updates to be performed for different versions of the software, which benefits those who wish to avoid risky changes. For example, there can be two parallel major versions that both still receive security patches, so that users of both old and new major versions can keep on using their preferred software versions safely.
However, a lot of the software produced today follows a service model, which is built and consumed in a very different way compared to traditional software. This includes web services and in some cases modern desktop and mobile applications. For example, Google Docs, Spotify, Netflix, and Instagram follow this model.
Service software is usually run by the same people who build it, or at least by people very close to those who build it (e.g. an operations team in the same organization). This means that there is a direct line of communication between the two parties, which helps resolve compatibility issues.
Instead of arbitrary user environments, service software is often run on a handful (or even just one) of homogeneous compute platforms, which most likely run the exact same version of the software. This limits the work required to support different versions of the software or to figure out different upgrade paths.
The running software is constantly updated when rapid deployment practices such as CI/CD and feature flags are used. The software might even get multiple updates per day. For the developers, manually bumping the version number slows down deployment, as the developer has to make up a version number before the release goes out. Practices such as Conventional Commits can help by making version numbering automatable, but supporting them still requires effort and discipline. The use of feature flags blurs the line between versions anyway, as features can be switched on and off without new releases.
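To illustrate how Conventional Commits make version numbering automatable, here is a simplified sketch that derives the next semantic version from commit message prefixes. Real tooling handles many more cases (scopes, footers, configuration), so treat this as an illustration of the idea only:

```python
import re

# Derive the next semantic version from conventional commit messages:
# "feat:" bumps the minor number, anything else bumps the patch number,
# and a "!" after the type (or a BREAKING CHANGE footer) bumps major.
def next_version(current: str, messages: list[str]) -> str:
    major, minor, patch = (int(p) for p in current.split("."))
    bump = "patch"
    for msg in messages:
        if re.match(r"^\w+(\(.+\))?!:", msg) or "BREAKING CHANGE" in msg:
            bump = "major"
            break
        if msg.startswith("feat"):
            bump = "minor"
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"

print(next_version("1.2.3", ["fix: correct typo", "feat: add export"]))  # 1.3.0
```

Even with this automated, the discipline lives in the commit messages: one mislabeled commit produces a misleading version number.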
The consumers who depend on the APIs of the software care about API compatibility but not about the software version itself. As long as the API is not broken, or the consumer can migrate to a new version of the API, it does not matter what version of the software they are consuming. This often means that the application has to support multiple versions of the API simultaneously to provide consumers with a graceful migration path. This also makes it difficult for developers to use software versioning as a meaningful way to express compatibility.
The consumers of service software do not have any control over which version of the software they run. Updates to the software are installed whether they like it or not, which means that the software consumer can no longer make an informed decision on whether to upgrade. Not that this is a huge loss anyway, since the consumers of service software most likely do not even care about the version they interact with: how many times have you been interested in which version of YouTube is currently running? There are, of course, exceptions to this, such as the regulated fields mentioned earlier.
Qualities of versioning for services
Now that we have identified the issues in applying traditional versioning to services, let’s examine the situation from the ground up: Why do we need versioning? What if we do not have it at all? If we do not have versioning, then we lose the following qualities.
- Ability to distinguish instances of software from one another.
- Ability to trace a running application to the source code that produced it.
- Ability to tell which software release comes after another.
- Ability to tell how big of an update I am going to have when upgrading from one version to another.
There may be more qualities to consider, but these are the ones that I often think of as features of versioning.
Without the first quality, there is no convenient way to know whether all the instances of my software are in sync or whether some instances still need to be updated. Such drift is very likely, since deployments can (and eventually will) fail and leave some instances of the software running the old version. New changes to the software may also appear during an upgrade, which further complicates the situation. If there is no way to know which instances need to be updated, there is no way to converge on a single version among all the instances.
The second quality helps tie issues in the software to the revision of the software’s source code. Without it, it is impossible to understand where a bug might be coming from and what changes have been made afterward. Moreover, reproducing the bug in a testing environment becomes impossible as well, which limits the ability to show that the issue is fixed in a later version. Technically, the link to the software revision could be embedded in the software release in some way (e.g. exposed in a special API endpoint or as plain text in a special file), but being able to make the connection from the software version alone is more convenient because it requires less introspection on the release.
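As a minimal sketch of the API endpoint option just mentioned, the revision could be exposed over plain HTTP. The `APP_REVISION` environment variable and the `/version` path are hypothetical choices; in practice the revision would be baked into the release at build time:

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical setup: the build pipeline sets APP_REVISION when it
# produces the release; "unknown" is a fallback for local runs.
def get_revision() -> str:
    return os.environ.get("APP_REVISION", "unknown")

class VersionHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/version":
            body = get_revision().encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

# To serve: HTTPServer(("", 8000), VersionHandler).serve_forever()
```

With this in place, anyone debugging an instance can query `/version` instead of inspecting the deployed artifact.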
The third quality helps understand whether the next version of the software to deploy is an upgrade or a downgrade, but it is not necessary. The mindset I use for services is to always have the latest working version running. If the latest version does not work, I roll back and keep running the last version that did (downgrade), and fix the issue in the next version (upgrade). While knowing when an upgrade or a downgrade happens is useful, I do not need the versioning system to keep track of it for me.
The idea behind the fourth quality is to be able to tell whether the next version will patch something under the hood or produce breaking changes. Ideally, there would never be any changes that break compatibility with the last version, because it can cause disruptions to the consumers using the service. However, as services evolve, we often need a way to push consumers to new, incompatible APIs as well. To achieve this, we can build the service in such a way that it supports multiple versions of the API simultaneously from the same software, thus providing consumers a graceful path to migrate to the newer APIs. Once there are no users for the old APIs (confirmed using telemetry data), the API support can be dropped without disruption. The API versions can be decoupled from the software version itself, so we do not need to use the software version to track compatibility changes.
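A minimal sketch of serving two API versions from the same software might route requests by a version prefix. The endpoint names and response shapes here are purely illustrative:

```python
# Route by API version prefix so /v1 and /v2 coexist in one process;
# consumers migrate at their own pace, and the /v1 route can be deleted
# once telemetry shows it is no longer used.
def handle_v1(path: str) -> str:
    return '{"greeting": "hello"}'            # legacy response shape

def handle_v2(path: str) -> str:
    return '{"message": {"text": "hello"}}'   # new, incompatible shape

ROUTES = {"/v1": handle_v1, "/v2": handle_v2}

def dispatch(path: str):
    for prefix, handler in ROUTES.items():
        if path.startswith(prefix):
            return handler(path)
    return None  # unknown API version -> would map to a 404
```

Note that neither handler knows anything about the software version; the API version lives entirely in the URL, which is what decouples API compatibility from release numbering.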
In summary, the first two qualities are important for good operations and maintenance practices, while the rest of them do not provide much value. Therefore, we can build our versioning on the first two.
Versioning to fit the service development workflow
In addition to the qualities mentioned above, it is beneficial to find a versioning solution that fits the development workflow used for building and deploying the service software. If a versioning strategy makes releasing software complicated or too hard to implement and maintain, then it also negatively affects the ability to deliver software.
A typical workflow for developing and delivering software for me works like this:
- I push software code changes to a feature branch
- The continuous integration (CI) system pulls the changes, then tests and checks them
- A colleague reviews the changes (if necessary)
- I merge the changes to the main branch
- The CI system creates a new release
- The continuous delivery (CD) system either automatically deploys the release or has a one-click button to deploy it
- If feature flags are used, additional features might be toggled using them
This flow is repeated multiple times a week or even a day depending on the amount of development activity, which is why there is heavy reliance on automating the flow as far as possible. Introducing additional manual steps may bring too much overhead. For this reason, the version of the software should be decided automatically as well.
Additionally, it is desirable for the automation around versioning to be as easy as possible to understand, use, and maintain. Preferably, it should not require a lot of thinking about how it works, debugging, or fixing.
Example: Version control revision
For service software, I have found source code’s version control revisions to be useful for software versioning. This means that the revision ID (e.g. the commit hash in Git) is used as the version of the software. Let’s examine how this fulfills the requirements.
Since the revision ID is a unique identifier calculated for each change in the version control system, it is granular enough to distinguish software versions from each other. Of course, this holds only if all software changes are recorded in version control, which I consider a best practice anyway.
By using the revision ID as the version, we directly link the version to the source code that produced the software release. Therefore, we can trace a software version to its source code. This also makes it easy to understand why the version is what it is.
The revision ID is also easy to obtain for automated software releases. Many CI platforms already provide this information via environment variables (e.g. `CI_COMMIT_SHA` in GitLab), or it can be obtained using version control tools (e.g. the command `git rev-parse HEAD`). No manual steps from developers or anyone else are required to decide the version. There also does not need to be a careful analysis of the source code or source history to determine the version.
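Putting this together, resolving the version automatically could look like the following sketch. The `env` parameter exists only to make the function easy to test; in a pipeline it would simply read the real environment:

```python
import os
import subprocess

# Resolve the software version with no manual step: prefer the commit
# SHA exposed by the CI platform (CI_COMMIT_SHA on GitLab), and fall
# back to asking Git directly when running outside CI.
def resolve_version(env=os.environ) -> str:
    sha = env.get("CI_COMMIT_SHA")
    if sha:
        return sha
    return subprocess.run(
        ["git", "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
```

The resulting string can then be baked into the release artifact, e.g. as a build argument or an embedded file.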
There are, of course, downsides to this strategy. Just by looking at the different versions, it is impossible to understand which version comes after which or what changes they contain. Also, the long revision ID is hard to use in discussions (try reading the ID out loud to someone). However, these are qualities I have found worth sacrificing.
Overall, the very simple nature of this versioning strategy makes it especially useful when there is a desire to frequently publish changes and there are no other known requirements for versioning.
Software versioning patterns such as semantic versioning still have their place when it comes to releasing software for a wide audience to run in their own computing environments, i.e. in situations where you are not the only consumer. This includes end-user software, CLI tools, software libraries, infrastructure software, and so on.
However, when the software is provided as a service via an API, the familiar versioning patterns start to become counterproductive. For this reason, it is important to reconsider from first principles what versioning is needed for, and to design or select a versioning system based on those needs. In this article, I presented how I found a suitable versioning strategy by applying this method.
There is no one versioning pattern that fits all cases. Each one has its pros, cons, and situations where it fits best. Just because one pattern is popular does not mean that it applies to all situations.
Thanks for reading!