Branch/Merge diagram showing a bug fix propagating
from the old release branch to the ongoing development
branch via another release branch.

The Artifact

An artifact is a set of files with the following properties:

The main distinction between some random set of files and an artifact is the documentation requirement. It's a major bonus if that information is machine-readable.
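As an illustration of what machine-readable documentation could look like, here is a minimal Python sketch that writes a JSON manifest next to a set of files. The file name `MANIFEST.json` and the fields `revision`, `inputs` and `files` are my own assumptions for this example, not an established format.

```python
import hashlib
import json
from pathlib import Path

MANIFEST_NAME = "MANIFEST.json"  # hypothetical file name, chosen for this sketch


def build_manifest(root, revision, inputs):
    """Describe every file under `root` and what went into building it."""
    root = Path(root)
    files = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if rel == MANIFEST_NAME:
            continue  # the manifest does not describe itself
        files[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return {
        "revision": revision,  # assumed field: source revision that was built
        "inputs": inputs,      # assumed field: other artifacts used in the build
        "files": files,        # relative path -> SHA-256 of the file contents
    }


def write_manifest(root, revision, inputs):
    """Store the manifest inside the artifact so the two travel together."""
    manifest = build_manifest(root, revision, inputs)
    text = json.dumps(manifest, indent=2, sort_keys=True)
    (Path(root) / MANIFEST_NAME).write_text(text)
    return manifest
```

Keeping the manifest inside the file set means whoever receives the artifact also receives its documentation, and a script can check the recorded digests without any human reading a web page.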

Note that being able to reproduce the build is not a strict requirement. Reproducibility is very difficult to achieve in practice, so it is not reasonable to build a process that assumes it. Instead, it is better to design a process that allows the exact same artifact to be used in different settings (testing, staging and production), thereby avoiding rebuilds altogether.
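The idea of moving one artifact through all settings can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `promote` function, the directory-per-stage layout and the use of SHA-256 are invented for the example. The point is that the bytes that were tested are the bytes that get deployed, and nothing is rebuilt along the way.

```python
import hashlib
import shutil
from pathlib import Path

STAGES = ("testing", "staging", "production")  # the settings named in the text


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def promote(artifact, expected_digest, deploy_root, stage):
    """Copy the already-built artifact into `stage`; refuse if its bytes changed."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    if sha256_of(artifact) != expected_digest:
        raise RuntimeError("artifact differs from the one that was tested")
    target = Path(deploy_root) / stage  # assumed layout: one directory per stage
    target.mkdir(parents=True, exist_ok=True)
    return shutil.copy2(artifact, target)
```

The digest would be recorded once, right after the build, and then passed unchanged to every later call of `promote`.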

Examples

Installers on a download page

The download page can fulfill the documentation requirement if it has links or other means to identify the revisions and other artifacts used in the construction of the installer. For example, this download page contains a variety of links detailing exactly what went into building the three installers. Drill down on the "Changes since last good build" and see how it includes not only the changes to the product itself, but also changes to various dependencies. You can drill down further, all the way to the source code itself. Note that the information is also available in JSON format.

Source Packages

Since these contain just the source files and their build scripts, they trivially fulfill the documentation requirement. They are usually not very useful artifacts for the end user, but they can often serve as prerequisite artifacts for building artifacts that are useful to the end user.

Java JAR or WAR files

By themselves, these files do not quite carry enough information. Various systems (Ivy, Maven) have been built to attach the required documentation and turn them into proper artifacts.

Notes

Why it's hard to reproduce builds

At first, it is not clear why it should be that hard. Part of it is that we generally are willing to accept "obvious" differences, as long as a reproduced build is "mostly" the same. After all, if builds were totally random, no software engineering could get done. The question is more one of risk analysis. What are the risks involved in rebuilding? Would you deploy a rebuilt artifact into production without testing?
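One small, concrete source of such "obvious" differences is the timestamp that many packaging formats embed. The following Python sketch uses the standard `gzip` module to package the same payload twice with different build times. The `package` helper and the two timestamps are made up for the example; the embedded modification time is a real part of the gzip header.

```python
import gzip

payload = b"print('hello')\n"  # stand-in for identical build output


def package(data, build_time):
    """Compress `data`, stamping the gzip header with the build time."""
    return gzip.compress(data, mtime=build_time)


first = package(payload, build_time=1_000_000_000)
second = package(payload, build_time=1_000_000_060)  # "rebuilt" a minute later

# The unpacked content is the same, but the packaged bytes are not.
same_content = gzip.decompress(first) == gzip.decompress(second)
same_bytes = first == second
print("same content:", same_content)
print("same bytes:  ", same_bytes)
```

A checksum taken over the first package will not match the second one, even though nobody would call the two builds different. Timestamps are only one such source, and each of them has to be found and pinned before a rebuild is byte-for-byte identical.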

One of my major misgivings about the otherwise excellent Maven build system is its release plugin, which makes the basic mistake of tagging before building and therefore requires you to rebuild for release. The obvious workaround is to always build for release, but that would be much slower.

The Problem with Source Packages

Source packages are a common way to distribute open source software. Download the tarball, unpack, type "./configure; make; sudo make install" and you're done - most of the time. Assuming this works, there still is trouble:

Source packages are best used only as build-time dependencies to produce your own binary artifact, which should then travel between test, staging and production without modification.