tar
If you want to build a tar file in a reproducible way how would you do it? For the sake of argument say you want to preserve
reproducible-builds.org suggests:
# requires GNU Tar 1.28+
$ tar --sort=name \
--mtime="@${SOURCE_DATE_EPOCH}" \
--owner=0 --group=0 --numeric-owner \
-cf product.tar build
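The command above expects SOURCE_DATE_EPOCH to already hold a Unix timestamp; tar does not set it for you. Here is a minimal sketch of one way to populate it. It assumes the build runs inside a git checkout and that the last commit time is an acceptable timestamp. The fallback value of 0 is also my assumption, not part of the suggested recipe.

```shell
# Assumption: prefer the time of the last git commit, if there is one.
if ts=$(git log -1 --pretty=%ct 2>/dev/null) && [ -n "$ts" ]; then
    SOURCE_DATE_EPOCH=$ts
else
    # Assumption: outside a git checkout, fall back to the Unix epoch.
    SOURCE_DATE_EPOCH=0
fi
export SOURCE_DATE_EPOCH
echo "SOURCE_DATE_EPOCH=${SOURCE_DATE_EPOCH}"
```

Any fixed value works as long as every build of the same source uses the same one.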
If you have GNU Tar < 1.28 then you can replace the --sort flag with find and sort. You might also want to use --mode="go-rwx,u-rw" to preserve only the executable bit of the file permissions. Additionally, I see no reason to allow the mtime to vary at all. All in, I suggest:
find <files> -print0 \
| LC_ALL=C sort -z \
| tar -cf <output>.tar \
--format=posix \
--numeric-owner \
--owner=0 \
--group=0 \
--mode="go-rwx,u-rw" \
--mtime='1970-01-01 00:00:00Z' \
--no-recursion \
--null \
--files-from -
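One way to check that a pipeline like this is reproducible is to build the archive twice, change a timestamp in between, and compare the results byte for byte. The sketch below does that in a scratch directory. The file names and the build_archive helper are made up for illustration. It sets LC_ALL=C for sort and gives --mtime an explicit UTC time so that locale and time zone cannot affect the result. It also pins --format=gnu, a deliberate departure from the command above, so that pax extended headers stay out of the comparison. It assumes GNU Tar and a find and sort that support -print0 and -z.

```shell
set -eu

# Scratch tree with hypothetical contents.
work=$(mktemp -d)
mkdir -p "$work/build/sub"
printf 'hello\n' > "$work/build/a.txt"
printf 'world\n' > "$work/build/sub/b.txt"

build_archive() {
    # $1: absolute path of the archive to write.
    (
        cd "$work"
        find build -print0 \
            | LC_ALL=C sort -z \
            | tar -cf "$1" \
                --format=gnu \
                --numeric-owner --owner=0 --group=0 \
                --mode="go-rwx,u-rw" \
                --mtime='1970-01-01 00:00:00Z' \
                --no-recursion --null --files-from -
    )
}

build_archive "$work/first.tar"
# Change modification times on disk; the archive should not notice.
touch -t 200101010000 "$work/build/a.txt" "$work/build/sub"
build_archive "$work/second.tar"

if cmp -s "$work/first.tar" "$work/second.tar"; then
    result=identical
else
    result=different
fi
echo "$result"
rm -rf "$work"
```

If the options do their job, the two archives should compare as identical even though the files on disk changed their timestamps between runs.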
GNU Tar uses a GNU-specific file format by default. There’s a somewhat more capable format called “pax”, defined in the POSIX.1-2001 specification. The GNU Tar manual is somewhat worrying because it says:
[The posix] archive format will be the default format for future versions of GNU tar.
If you don’t want to use a file format that’s losing its default status in the future you might be tempted to switch to pax now. Unfortunately, pax seems to have a lot of downsides for reproducible builds. The Wikipedia entry doesn’t describe the format and the pax tool does not even support the pax format! The best place to learn about the pax specification is possibly from the Open Group Base Specification Issue 7.
The single biggest downside is that pax can contain a lot of additional fields and it might be hard to persuade your archive creation program to create a file in a reproducible way. For example, if you try to create a pax archive containing one empty file, thus
touch example \
&& tar -cf - \
--format=posix \
--numeric-owner \
--owner=0 \
--group=0 \
--mode="go-rwx,u-rw" \
--mtime='1970-01-01 00:00:00Z' \
example \
| hexdump -C
then you will see that pax creates atime and ctime fields in extended pax headers. I cannot find any way to tell GNU Tar to turn these off.
In conclusion, reproducible builds are currently best done with GNU Tar format.
ivfsurtm contacted me to point out that it is possible to omit the atime and ctime fields with the following command:
touch example \
&& tar --format=posix \
--pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime,delete=mtime \
--mtime='1970-01-01 00:00:00Z' \
--sort=name \
--numeric-owner \
--owner=0 \
--group=0 \
--mode="go-rwx,u-rw" \
-cvf - \
example \
| hexdump -C
The options are documented in the GNU Tar manual.
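Rather than reading hexdump output by eye, you can count the atime= and ctime= records in the archive stream directly. The sketch below builds the same one-file pax archive with and without --pax-option and prints both counts. The count_time_records helper is a made-up name, and the whole thing assumes GNU Tar and a grep that supports -a and -o.

```shell
set -eu

work=$(mktemp -d)
cd "$work"
touch example

count_time_records() {
    # Read a tar stream on stdin and count "atime=" / "ctime=" records.
    # -a treats the binary stream as text, -o prints one line per match.
    grep -a -o -E '(atime|ctime)=' | wc -l | tr -d ' '
}

plain=$(tar -cf - --format=posix \
    --numeric-owner --owner=0 --group=0 \
    --mode="go-rwx,u-rw" \
    --mtime='1970-01-01 00:00:00Z' \
    example | count_time_records)

stripped=$(tar -cf - --format=posix \
    --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime,delete=mtime \
    --numeric-owner --owner=0 --group=0 \
    --mode="go-rwx,u-rw" \
    --mtime='1970-01-01 00:00:00Z' \
    example | count_time_records)

echo "time records without --pax-option: $plain"
echo "time records with --pax-option:    $stripped"
```

The second count should be zero. A non-zero first count confirms the behaviour described earlier for the GNU Tar version you have installed.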