The H2 Wiki


Reproducible tar

If you want to build a tar file in a reproducible way, how would you do it? For the sake of argument, say you want to preserve

reproducible-builds.org suggests

# requires GNU Tar 1.28+
$ tar --sort=name \
      --mtime="@${SOURCE_DATE_EPOCH}" \
      --owner=0 --group=0 --numeric-owner \
      -cf product.tar build
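The command above assumes that SOURCE_DATE_EPOCH is already exported by the build system. As a minimal sketch, the script below falls back to the Unix epoch when the variable is unset. The fallback value of 0, the temporary directory, and the build/hello.txt file are assumptions made for illustration only.

```shell
#!/bin/sh
set -e

# Assumption: default to the Unix epoch if the build system did not
# export SOURCE_DATE_EPOCH.
: "${SOURCE_DATE_EPOCH:=0}"
export SOURCE_DATE_EPOCH

# Hypothetical build output, created in a scratch directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/build"
printf 'hello\n' > "$workdir/build/hello.txt"

# Same flags as the command above (requires GNU Tar 1.28+ for --sort).
tar --sort=name \
    --mtime="@${SOURCE_DATE_EPOCH}" \
    --owner=0 --group=0 --numeric-owner \
    -cf "$workdir/product.tar" \
    -C "$workdir" build

# List the archive; TZ=UTC keeps the displayed dates independent of the host.
TZ=UTC tar --numeric-owner -tvf "$workdir/product.tar"
```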

If you have GNU Tar < 1.28, you can replace the --sort flag with find and sort. You might also want to use --mode="go-rwx,u-rw", which strips every permission bit except the owner's executable bit. Additionally, I see no reason to allow the mtime to vary at all, so I pin it to a fixed date. All in, I suggest

find <files> -print0 \
| LC_ALL=C sort -z \
| tar -cf <output>.tar \
      --format=gnu \
      --numeric-owner \
      --owner=0 \
      --group=0 \
      --mode="go-rwx,u-rw" \
      --mtime='1970-01-01 00:00:00Z' \
      --no-recursion \
      --null \
      --files-from -
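One way to check a recipe like this is to build the same tree twice, changing timestamps and read/write permissions in between, and compare checksums. The following is a rough sketch rather than a definitive test. It assumes GNU Tar, uses a made-up src tree and a hypothetical make_archive helper, passes --format=gnu in line with the conclusion this page reaches about pax, and writes the mtime as '@0', which is the epoch regardless of the local time zone.

```shell
#!/bin/sh
set -e

# Hypothetical source tree in a scratch directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/src/bin"
printf '#!/bin/sh\necho hi\n' > "$workdir/src/bin/run"
printf 'data\n' > "$workdir/src/readme.txt"
chmod 755 "$workdir/src/bin/run"
chmod 644 "$workdir/src/readme.txt"

# Hypothetical helper: archive src/ into the path given as $1.
make_archive() {
    (
        cd "$workdir"
        find src -print0 \
        | LC_ALL=C sort -z \
        | tar -cf "$1" \
              --format=gnu \
              --numeric-owner --owner=0 --group=0 \
              --mode="go-rwx,u-rw" \
              --mtime='@0' \
              --no-recursion --null --files-from -
    )
}

make_archive "$workdir/first.tar"

# Disturb the metadata that the flags are supposed to normalise.
touch "$workdir/src/readme.txt"
chmod 600 "$workdir/src/readme.txt"

make_archive "$workdir/second.tar"

first_sum=$(cksum < "$workdir/first.tar")
second_sum=$(cksum < "$workdir/second.tar")

if [ "$first_sum" = "$second_sum" ]; then
    echo "archives are identical"
else
    echo "archives differ"
fi
```

File contents are still part of the archive, so this only shows that the metadata has been normalised, not that the build producing the files is itself reproducible.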

Pax

GNU Tar uses a GNU-specific file format by default. There’s a somewhat more capable format called “pax”, defined in the POSIX.1-2001 specification and selected in GNU Tar with --format=posix. The GNU Tar manual is somewhat worrying because it says that

[The posix] archive format will be the default format for future versions of GNU tar.

If you don’t want to use a file format that will lose its default status in the future, you might be tempted to switch to pax now. Unfortunately, pax seems to have a lot of downsides for reproducible builds. The Wikipedia entry doesn’t describe the format, and the pax tool does not even support the pax format! The best place to learn about the pax specification is possibly The Open Group Base Specifications Issue 7.

The single biggest downside is that a pax archive can contain a lot of additional fields, and it can be hard to persuade your archive creation program to write them in a reproducible way. For example, if you create a pax archive containing one empty file, thus

touch example \
 && tar -cf output.tar \
        --format=posix \
        --numeric-owner \
        --owner=0 \
        --group=0 \
        --mode="go-rwx,u-rw" \
        --mtime='1970-01-01 00:00:00Z' \
        example \
  && hexdump -C output.tar

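then you will see that GNU Tar writes atime and ctime records into pax extended headers. These record when the file was last accessed and when its metadata last changed, so they can differ from one build to the next. I cannot find any way to tell GNU Tar to turn these off.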
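If reading a hex dump is inconvenient, the extended header records can also be pulled out as text. This is a small sketch assuming GNU Tar and GNU grep. It relies on pax records having the form "length keyword=value", and the scratch directory and variable names are illustrative.

```shell
#!/bin/sh
set -e

workdir=$(mktemp -d)
touch "$workdir/example"

tar -cf "$workdir/output.tar" \
    --format=posix \
    --numeric-owner \
    --owner=0 \
    --group=0 \
    --mode="go-rwx,u-rw" \
    --mtime='@0' \
    -C "$workdir" example

# -a treats the binary archive as text, -o prints only the matching part.
pax_records=$(LC_ALL=C grep -a -o -E '(atime|ctime)=[0-9.]+' "$workdir/output.tar" || true)

printf 'timestamp records found in the pax header:\n%s\n' "$pax_records"
```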

In conclusion, reproducible builds are currently best done with the GNU Tar format.
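To compare the two formats side by side, the sketch below archives the same empty file twice in each format, changing the file's metadata between runs. It assumes GNU Tar; the archive_sum helper and the variable names are hypothetical. With the flags used on this page, the expectation is that the gnu checksums match while the posix ones do not.

```shell
#!/bin/sh
set -e

workdir=$(mktemp -d)
touch "$workdir/example"

# Hypothetical helper: archive "example" in format $1 and print a checksum.
archive_sum() {
    tar -cf - \
        --format="$1" \
        --numeric-owner --owner=0 --group=0 \
        --mode="go-rwx,u-rw" \
        --mtime='@0' \
        -C "$workdir" example \
    | cksum
}

gnu_first=$(archive_sum gnu)
posix_first=$(archive_sum posix)

# Let the clock advance, then change the file's ctime and mtime.
sleep 1
chmod 600 "$workdir/example"
touch "$workdir/example"

gnu_second=$(archive_sum gnu)
posix_second=$(archive_sum posix)

[ "$gnu_first" = "$gnu_second" ] && echo "gnu: identical" || echo "gnu: differ"
[ "$posix_first" = "$posix_second" ] && echo "posix: identical" || echo "posix: differ"
```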