4mLIBARCHIVE-FORMATS24m(5)	   File Formats Manual	      4mLIBARCHIVE-FORMATS24m(5)

1mNAME0m
       libarchive-formats  —  archive  formats supported by the libarchive li‐
       brary

1mDESCRIPTION0m
       The 4mlibarchive24m(3) library reads  and  writes  a  variety  of  streaming
       archive formats.	 Generally speaking, all of these archive formats con‐
       sist  of a series of “entries”.	Each entry stores a single file system
       object, such as a file, directory, or symbolic link.

       The following provides a brief description of each format supported  by
       libarchive,  with some information about recognized extensions or limi‐
       tations of the current library support.	Note that just because a  for‐
       mat  is supported by libarchive does not imply that a program that uses
       libarchive will support that format.  Applications that use  libarchive
       specify which formats they wish to support, though many programs do use
       libarchive convenience functions to enable all supported formats.

   1mTar Formats0m
       The  4mlibarchive24m(3)	library	 can  read most tar archives.  It can write
       POSIX-standard “ustar” and “pax interchange” formats as well as v7  tar
       format and a subset of the legacy GNU tar format.

       All  tar formats store each entry in one or more 512-byte records.  The
       first record is used for file metadata, including filename,  timestamp,
       and  mode  information,	and  the  file	data  is  stored in subsequent
       records.	 Later variants have extended this by either appropriating un‐
       defined areas of the header record, extending the  header  to  multiple
       records,	 or  by storing special entries that modify the interpretation
       of subsequent entries.

       1mgnutar	 22mThe	 4mlibarchive24m(3)  library  can  read	 most	GNU-format   tar
	       archives.   It  currently  supports the most popular GNU exten‐
	       sions, including modern long filename and linkname support,  as
	       well  as atime and ctime data.  The libarchive library does not
	       support multi-volume archives, nor the old  GNU	long  filename
	       format.	It can read GNU sparse file entries, including the new
	       POSIX-based formats.

	       The  4mlibarchive24m(3)	library can write GNU tar format, including
	       long filename and linkname support, as well as atime and	 ctime
	       data.

       1mpax	 22mThe	 4mlibarchive24m(3)  library  can read and write POSIX-compliant
	       pax  interchange	 format	 archives.   Pax  interchange	format
	       archives are an extension of the older ustar format that adds a
	       separate	 entry	with additional attributes stored as key/value
	       pairs immediately before each regular entry.  The  presence  of
	       these additional entries is the only difference between pax in‐
	       terchange  format and the older ustar format.  The extended at‐
	       tributes are of unlimited length and are stored as  UTF-8  Uni‐
	       code strings.  Keywords defined in the standard are in all low‐
	       ercase;	vendors are allowed to define custom keys by preceding
	       them with the vendor name in all uppercase.  When  writing  pax
	       archives,  libarchive  uses  many of the SCHILY keys defined by
	       Joerg Schilling's “star” archiver and a	few  LIBARCHIVE	 keys.
	       The  libarchive	library	 can  read most of the SCHILY keys and
	       most of the GNU keys introduced by GNU tar.   It	 silently  ig‐
	       nores any keywords that it does not understand.

	       The  pax	 interchange  format converts filenames to Unicode and
	       stores them using the UTF-8 encoding.  Prior to libarchive 3.0,
	       libarchive erroneously assumed that the	system	wide-character
	       routines	 natively  supported  Unicode.	This caused it to mis-
	       handle non-ASCII filenames on systems that did not satisfy this
	       assumption.

       1mrestricted pax0m
	       The libarchive library can also write pax archives in which  it
	       attempts	 to  suppress  the  extended attributes entry whenever
	       possible.  The result will be identical to a ustar archive  un‐
	       less  the extended attributes entry is required to store a long
	       file name, long linkname, extended ACL, file flags, or  if  any
	       of  the	standard  ustar data (user name, group name, UID, GID,
	       etc) cannot be fully represented in the ustar header.   In  all
	       cases,  the  result  can	 be dearchived by any program that can
	       read POSIX-compliant pax interchange format archives.  Programs
	       that correctly read ustar format (see below) will also be  able
	       to  read this format; any extended attributes will be extracted
	       as separate files stored in 4mPaxHeader24m directories.

       1mustar	 22mThe libarchive library can both read  and  write  this  format.
	       This format has the following limitations:
	       1m•   22mDevice	major  and  minor  numbers  are limited to 21 bits.
		   Nodes with larger numbers will not be added to the archive.
	       1m•   22mPath names  in	the  archive  are  limited  to	255  bytes.
		   (Shorter  if	 there	is no / character in exactly the right
		   place.)
	       1m•   22mSymbolic links and hard links are  stored  in  the  archive
		   with the name of the referenced file.  This name is limited
		   to 100 bytes.
	       1m•   22mExtended  attributes,  file flags, and other extended secu‐
		   rity information cannot be stored.
	       1m•   22mArchive entries are limited to 8 gigabytes in size.
	       Note that the pax interchange format has none of these restric‐
	       tions.  The ustar format is old and widely  supported.	It  is
	       recommended when compatibility is the primary concern.

       1mv7	 22mThe	 libarchive  library  can  read and write the legacy v7 tar
	       format.	This format has the following limitations:
	       1m•   22mOnly regular files, directories, and symbolic links can	 be
		   archived.   Block  and  character  device nodes, FIFOs, and
		   sockets cannot be archived.
	       1m•   22mPath names in the archive are limited to 100 bytes.
	       1m•   22mSymbolic links and hard links are  stored  in  the  archive
		   with the name of the referenced file.  This name is limited
		   to 100 bytes.
	       1m•   22mUser and group information are stored as numeric IDs; there
		   is no provision for storing user or group names.
	       1m•   22mExtended  attributes,  file flags, and other extended secu‐
		   rity information cannot be stored.
	       1m•   22mArchive entries are limited to 8 gigabytes in size.
	       Generally, users should prefer the ustar format for portability
	       as the v7 tar format is both less useful and less portable.

       The libarchive library also reads a variety of commonly-used extensions
       to the basic tar format.	 These extensions are recognized automatically
       whenever they appear.

       Numeric extensions.
	       The POSIX standards require fixed-length numeric fields	to  be
	       written	with some character position reserved for terminators.
	       Libarchive allows these fields to be written without terminator
	       characters.  This extends the allowable range;  in  particular,
	       ustar archives with this extension can support entries up to 64
	       gigabytes  in size.  Libarchive also recognizes base-256 values
	       in most numeric fields.	This essentially removes  all  limita‐
	       tions on file size, modification time, and device numbers.

       Solaris extensions
	       Libarchive  recognizes ACL and extended attribute records writ‐
	       ten by Solaris tar.

       The first tar program appeared in Seventh Edition Unix  in  1979.   The
       first  official	standard for the tar file format was the “ustar” (Unix
       Standard Tar) format defined by POSIX in 1988.	POSIX.1-2001  extended
       the ustar format to create the “pax interchange” format.

   1mCpio Formats0m
       The libarchive library can read and write a number of common cpio vari‐
       ants.  A cpio archive stores each entry as a fixed-size header followed
       by a variable-length filename and variable-length data.	Unlike the tar
       format, the cpio format does only minimal padding of the header or file
       data.   There  are several cpio variants, which differ primarily in how
       they store the initial header: some store the values as octal or	 hexa‐
       decimal numbers in ASCII, others as binary values of varying byte order
       and length.

       1mbinary	 22mThe	 libarchive library transparently reads both big-endian and
	       little-endian variants of the the two binary cpio formats;  the
	       original	 one  from  PWB/UNIX, and the later, more widely used,
	       variant.	 This format used 32-bit binary values for  file  size
	       and  mtime, and 16-bit binary values for the other fields.  The
	       formats support only the file types present in UNIX at the time
	       of their creation.  File sizes are limited to 24	 bits  in  the
	       PWB format, because of the limits of the file system, and to 31
	       bits in the newer binary format, where signed 32 bit longs were
	       used.

       1modc	 22mThis  is  the  POSIX  standardized	format, which is officially
	       known as the “cpio interchange format” or  the  “octet-oriented
	       cpio  archive format” and sometimes unofficially referred to as
	       the “old character format”.  This format stores the header con‐
	       tents as octal values in ASCII.	It is standard, portable,  and
	       immune  from  byte-order	 confusion.   File sizes and mtime are
	       limited to 33 bits (8GB file size), other fields are limited to
	       18 bits.

       1mSVR4/newc0m
	       The libarchive library can read both CRC and  non-CRC  variants
	       of  this	 format.  The SVR4 format uses eight-digit hexadecimal
	       values for all header fields.  This limits file	size  to  4GB,
	       and  also  limits  the  mtime and other fields to 32 bits.  The
	       SVR4 format can optionally include a CRC of the file  contents,
	       although libarchive does not currently verify this CRC.

       Cpio  first appeared in PWB/UNIX 1.0, which was released within AT&T in
       1977.  PWB/UNIX 1.0 formed the basis of System III Unix, released  out‐
       side  of	 AT&T  in 1981.	 This makes cpio older than tar, although cpio
       was not included in Version 7 AT&T Unix.	 As a result, the tar  command
       became  much better known in universities and research groups that used
       Version 7.  The combination of the 1mfind  22mand  1mcpio  22mutilities	provided
       very  precise  control  over file selection.  Unfortunately, the format
       has many limitations that make it unsuitable for widespread use.	  Only
       the  POSIX format permits files over 4GB, and its 18-bit limit for most
       other fields makes it unsuitable for modern systems.  In addition, cpio
       formats only store numeric UID/GID  values  (not	 usernames  and	 group
       names), which can make it very difficult to correctly transfer archives
       across systems with dissimilar user numbering.

   1mShar Formats0m
       A “shell archive” is a shell script that, when executed on a POSIX-com‐
       pliant  system, will recreate a collection of file system objects.  The
       libarchive library can write two different kinds of shar archives:

       1mshar	 22mThe traditional shar format uses a limited set  of	POSIX  com‐
	       mands, including 4mecho24m(1), 4mmkdir24m(1), and 4msed24m(1).  It is suitable
	       for  portably  archiving small collections of plain text files.
	       However, it is not generally  well-suited  for  large  archives
	       (many  implementations  of  4msh24m(1)  have limits on the size of a
	       script) nor should it be used with non-text files.

       1mshardump0m
	       This  format  is	 similar  to  shar  but	 encodes  files	 using
	       4muuencode24m(1)	 so  that  the result will be a plain text file re‐
	       gardless of the file contents.	It  also  includes  additional
	       shell  commands	that attempt to reproduce as many file attrib‐
	       utes as possible, including owner, mode, and flags.  The	 addi‐
	       tional  commands	 used to restore file attributes make shardump
	       archives less portable than plain shar archives.

   1mISO9660 format0m
       Libarchive can read and extract from files containing ISO9660-compliant
       CDROM images.  In many cases, this can remove the need to burn a physi‐
       cal CDROM just in order to read the files contained in an  ISO9660  im‐
       age.  It also avoids security and complexity issues that come with vir‐
       tual  mounts and loopback devices.  Libarchive supports the most common
       Rockridge extensions and has partial support for Joliet extensions.  If
       both extensions are present, the Joliet extensions will be used and the
       Rockridge extensions will be ignored.  In particular, this  can	create
       problems	 with hardlinks and symlinks, which are supported by Rockridge
       but not by Joliet.

       Libarchive reads ISO9660 images using a streaming strategy.   This  al‐
       lows  it	 to read compressed images directly (decompressing on the fly)
       and allows it to read images directly from network sockets, pipes,  and
       other  non-seekable  data  sources.  This strategy works well for opti‐
       mized ISO9660 images created by many popular programs.	Such  programs
       collect all directory information at the beginning of the ISO9660 image
       so it can be read from a physical disk with a minimum of seeking.  How‐
       ever, not all ISO9660 images can be read in this fashion.

       Libarchive  can also write ISO9660 images.  Such images are fully opti‐
       mized with the directory information preceding all file data.  This  is
       done  by storing all file data to a temporary file while collecting di‐
       rectory information in memory.  When the image is finished,  libarchive
       writes  out the directory structure followed by the file data.  The lo‐
       cation used for the temporary file can be changed by the usual environ‐
       ment variables.

   1mZip format0m
       Libarchive can read and write zip  format  archives  that  have	uncom‐
       pressed	entries	 and  entries compressed with the “deflate” algorithm.
       Other zip compression algorithms are not supported.  It can extract jar
       archives, archives that use Zip64 extensions  and  self-extracting  zip
       archives.   Libarchive  can  use either of two different strategies for
       reading Zip archives: a streaming strategy which is fast and can handle
       extremely large archives, and a seeking strategy	 which	can  correctly
       process	self-extracting Zip archives and archives with deleted members
       or other in-place modifications.

       The streaming reader processes Zip archives as they are read.   It  can
       read  archives  of arbitrary size from tape or network sockets, and can
       decode Zip archives that have been separately  compressed  or  encoded.
       However,	 self-extracting  Zip archives and archives with certain types
       of modifications cannot be correctly handled.   Such  archives  require
       that  the reader first process the Central Directory, which is ordinar‐
       ily located at the end of a Zip archive and is thus inaccessible to the
       streaming reader.  If the program using	libarchive  has	 enabled  seek
       support,	 then libarchive will use this to processes the central direc‐
       tory first.

       In particular, the seeking reader must  be  used	 to  correctly	handle
       self-extracting	archives.  Such archives consist of a program followed
       by a regular Zip archive.  The streaming reader cannot parse  the  ini‐
       tial program portion, but the seeking reader starts by reading the Cen‐
       tral  Directory	from  the end of the archive.  Similarly, Zip archives
       that have been modified in-place can  have  deleted  entries  or	 other
       garbage	data that can only be accurately detected by first reading the
       Central Directory.

   1mArchive (library) file format0m
       The Unix archive format (commonly created by the 4mar24m(1) archiver)  is  a
       general-purpose	format	which  is  used	 almost exclusively for object
       files to be read by the link editor 4mld24m(1).	The  ar	 format	 has  never
       been  standardised.   There are two common variants: the GNU format de‐
       rived from SVR4, and the BSD format, which first	 appeared  in  4.4BSD.
       The  two differ primarily in their handling of filenames longer than 15
       characters: the GNU/SVR4 variant writes a filename table at the	begin‐
       ning of the archive; the BSD format stores each long filename in an ex‐
       tension	area  adjacent	to the entry.  Libarchive can read both exten‐
       sions, including archives that may include both	types  of  long	 file‐
       names.	Programs  using	 libarchive  can write GNU/SVR4 format if they
       provide an entry called 4m//24m containing a filename table  to	be  written
       into  the  archive  before any of the entries.  Any entries whose names
       are not in the filename table will  be  written	using  BSD-style  long
       filenames.  This can cause problems for programs such as GNU ld that do
       not support the BSD-style long filenames.

   1mmtree0m
       Libarchive can read and write files in 4mmtree24m(5) format.  This format is
       not  a  true archive format, but rather a textual description of a file
       hierarchy in which each line specifies the name of a file and  provides
       specific metadata about that file.  Libarchive can read all of the key‐
       words  supported	 by  both the NetBSD and FreeBSD versions of 4mmtree24m(8),
       although many  of  the  keywords	 cannot	 currently  be	stored	in  an
       archive_entry  object.	When  writing,	libarchive supports use of the
       4marchive_write_set_options24m(3) interface to specify which keywords should
       be included in the output.  If libarchive was compiled with  access  to
       suitable	 cryptographic	libraries  (such as the OpenSSL libraries), it
       can compute hash entries such as 1msha512 22mor 1mmd5  22mfrom  file  data  being
       written to the mtree writer.

       When  reading  an  mtree file, libarchive will locate the corresponding
       files on disk using the 1mcontents 22mkeyword  if  present  or  the  regular
       filename.  If it can locate and open the file on disk, it will use that
       to  fill	 in  any metadata that is missing from the mtree file and will
       read  the  file	contents  and  return  those  to  the  program	 using
       libarchive.   If it cannot locate and open the file on disk, libarchive
       will return an error for any attempt to read the entry body.

   1m7-Zip0m
       Libarchive can read and write 7-Zip format archives.  TODO:  Need  more
       information

   1mCAB0m
       Libarchive  can read Microsoft Cabinet ( “CAB”) format archives.	 TODO:
       Need more information.

   1mLHA0m
       TODO: Information about libarchive's LHA support

   1mRAR0m
       Libarchive has limited support for reading RAR format  archives.	  Cur‐
       rently,	libarchive  can read RARv3 format archives which have been ei‐
       ther created uncompressed, or compressed using any of  the  compression
       methods	supported by the RARv3 format.	Libarchive can also read self-
       extracting RAR archives.

   1mWarc0m
       Libarchive can read and write “web archives”.  TODO: Need more informa‐
       tion

   1mXAR0m
       Libarchive can read and write the XAR format used by many Apple	tools.
       TODO: Need more information

1mSEE ALSO0m
       4mar24m(1),  4mcpio24m(1), 4mmkisofs24m(1), 4mshar24m(1), 4mtar24m(1), 4mzip24m(1), 4mzlib24m(3), 4mcpio24m(5),
       4mmtree24m(5), 4mtar24m(5)

Debian			       December 27, 2016	 4mLIBARCHIVE-FORMATS24m(5)