Behind the phrase “data archiving” is the basic idea of backing up files or entire directories and storing them in a secure location, often in compressed form. For reasons of data security, archiving became an important factor in server environments early on: originally, server data was stored on tape drives, a backup method that is still used for large data volumes. To make this archiving method as efficient as possible, the packing programme tar (short for tape archiver) was developed for Unix systems in 1979. With tar, files and directories can still be packed into a single archive file today and later restored with their user permissions intact, as long as both the source and the target system support Unix-style file permissions.

To save additional storage space, .tar files are often compressed with tools such as gzip, bzip2, or lzop. But how do these compression programmes and formats differ? And why does tar remain so important on Linux today?

The most popular compression programmes for Linux

There are a number of free compression tools for Linux distributions that all have one thing in common: they can be operated from the command line or terminal. Short commands quickly compress files, such as HTML documents, to save storage space and bandwidth when sending them over networks or the internet. In addition, there are graphical interfaces for these tools, as well as archive managers that combine several compression programmes (each of which must also be installed) in a single visual user interface. A graphical interface requires additional system resources, however, which is why the terminal generally remains the best choice for compression.

The main difference between the individual programmes is the compression rate, which goes hand in hand with different compression times. In most cases, however, the tool itself offers different modes that favour either the best possible size reduction or the shortest possible compression time. Another distinguishing feature is the output format: because the programmes use different algorithms, their compressed files have different formats and require specific programmes to be unpacked.
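The trade-off between speed-oriented and size-oriented modes can be tried out with a short sketch. It uses gzip as one example of such a tool; the file name sample.txt and the generated test data are assumptions made purely for illustration, and the exact sizes will differ from system to system.

```shell
# Sketch: compare gzip's fastest (-1) and strongest (-9) mode on the same input.
# sample.txt and the temporary working directory are made up for this example.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Generate some compressible sample text (the numbers 1 to 200000, one per line).
seq 1 200000 > sample.txt

# -c writes the compressed data to standard output, so sample.txt stays untouched.
gzip -1 -c sample.txt > fast.gz
gzip -9 -c sample.txt > best.gz

# Strip the padding that some wc implementations add around the number.
size_orig=$(wc -c < sample.txt | tr -d ' ')
size_fast=$(wc -c < fast.gz | tr -d ' ')
size_best=$(wc -c < best.gz | tr -d ' ')

echo "original: $size_orig bytes"
echo "gzip -1:  $size_fast bytes"
echo "gzip -9:  $size_best bytes"
```

Prefixing the two gzip calls with `time` additionally shows how much longer the stronger mode takes on larger inputs.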

gzip

gzip (GNU zip) is one of the most widely used compression tools on Linux and plays an especially important role in web development. It is based on the Deflate algorithm and was originally developed for the GNU project as a successor to the veteran Unix tool compress. Today, the application, which is written in C, can be used for packing and extracting files not only on Linux, but also on Windows and macOS. Deflate searches for repeated strings within a sliding window of 32,768 bytes (32 KB), which is small compared to the block and dictionary sizes of more modern compression programmes.

In terms of speed, the free packing programme is still among the top options, which is why common web server software such as Apache, IIS, or NGINX usually supports it through dedicated modules in order to answer user queries with compressed data as quickly as possible. Additional information about the functionality and use of the GPL-licensed compression tool can be found in our article on the programme.

| Benefits | Drawbacks |
|---|---|
| Fast compression process | Works with only 32 KB of data at a time |
| Standard in popular web server software | Low compression ratio |

bzip2

For lossless, high-quality compression of files under Linux, bzip2 is available under a BSD-like license. The application uses a three-stage compression method: first, the input is split into blocks of up to 900,000 bytes (900 KB), and each block is rearranged using the Burrows-Wheeler transformation. The result then undergoes a move-to-front transformation. Finally, Huffman coding performs the actual compression of the data. Files packed with bzip2 are given the extension .bz2.

The programme, developed by Julian Seward, clearly outperforms faster tools such as gzip in terms of compression ratio, but also takes considerably more time to complete the process. One of its biggest advantages is that partially damaged .bz2 archives can still be processed: with the help of bzip2recover, you can at least extract and unpack all readable blocks. bzip2 is the official successor of bzip, which used arithmetic coding and was not developed further for patent reasons.

| Benefits | Drawbacks |
|---|---|
| Strong compression rate | Very slow |
| Unpacking partially damaged archives possible | |

p7zip

p7zip is a port of the free, LGPL-licensed 7-Zip archive programme for POSIX platforms. The port is the only solution under Linux that fully supports the .7z format. The packing programme is based on the Lempel-Ziv-Markov algorithm (LZMA), developed by Igor Pavlov in 1998, which works with a dictionary method and can, in principle, be regarded as a further development of Deflate (with approximately 50% stronger compression). A created archive can be split into as many parts as required and protected with a password; encryption with AES-256, including encryption of the archive header, is optional.

LZMA delivers excellent results with its high compression rate and also performs well in terms of speed. However, the archiving tool places very high demands on system performance: a good processor (at least 2 GHz) and sufficient memory (2 GB or more) are basic prerequisites, especially for high compression levels. Besides use via the terminal or an archive manager, the ported 7-Zip application also has its own graphical interface, p7zip-gui.

| Benefits | Drawbacks |
|---|---|
| Excellent ratio of compression and duration | Very high system requirements |
| Password protection and header encryption possible | |

lzop

The compression programme lzop (Lempel-Ziv-Oberhumer packer) focuses, like gzip, on the speed of packing and unpacking, and on average achieves even better results than the GNU tool in this respect. It is based on its namesake, the Lempel-Ziv-Oberhumer algorithm (LZO), which was published in 1996 under the GNU General Public License (GPL). The resource-efficient compression works according to the dictionary method: repeated strings are replaced by a symbol that points to the first recorded occurrence of the same string in the dictionary. The files are processed in blocks of 256,000 bytes (256 KB). By default, the original file is kept after compression.
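The effect of the dictionary method can be made visible with a small experiment: input that consists of repeated strings shrinks dramatically, while input without repetitions barely shrinks at all. Because lzop is not installed everywhere, the following sketch uses gzip as a stand-in, on the assumption that it is available; its Deflate algorithm likewise replaces repeated strings with back-references. The file names are invented for the example.

```shell
# Sketch: dictionary-based compression thrives on repetition.
# gzip stands in for lzop here; repetitive.txt and random.bin are made-up names.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# 100,000 bytes made of one line repeated 5,000 times (19 characters + newline)...
awk 'BEGIN { for (i = 0; i < 5000; i++) print "the same line again" }' > repetitive.txt
# ...and 100,000 bytes of random data without any repeating structure.
head -c 100000 /dev/urandom > random.bin

# Compress both inputs to standard output and keep the originals.
gzip -c repetitive.txt > repetitive.gz
gzip -c random.bin > random.gz

size_rep=$(wc -c < repetitive.gz | tr -d ' ')
size_rand=$(wc -c < random.gz | tr -d ' ')

echo "repetitive input compressed to $size_rep bytes"
echo "random input compressed to     $size_rand bytes"
```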

Besides top-level compression speed and compatibility with gzip, the development of lzop focused on the portability of the software. For this reason, versions exist for virtually all platforms, including macOS and Windows. Compressed files have the extension .lzo.

| Benefits | Drawbacks |
|---|---|
| Very quick compression | Compression ratio rather low due to the high speed |
| High portability | |

Popular tools and formats: A tabular comparison

| | gzip | bzip2 | p7zip | lzop |
|---|---|---|---|---|
| Operating systems | Cross-platform | Linux/Unix, Windows | Unix-like | Cross-platform |
| License | GNU GPL | BSD-like | GNU LGPL | GNU GPL |
| Compression procedure | Deflate algorithm | Burrows-Wheeler transformation, move-to-front transformation, Huffman coding | LZMA algorithm | LZO algorithm |
| Data format | .gz | .bz2 | .7z | .lzo |
| Encryption | - | - | AES-256 | - |
| Compression mode | 1–9 | 1–9 | 0–9 | 1, 3, 7–9 |
| Strengths | Very fast | Very good compression rate | Superb compression rate, compresses file directories | Very fast, compresses file directories |
| Weaknesses | Only compresses single files | Moderate speed, only compresses single files | High system performance demands | Weak compression rate |

The overview makes it clear that there is no single best compression tool; instead, the choice of programme depends on the use case. p7zip, for example, has clear advantages, such as its strong compression rate and the option of AES-256 encryption, which is valuable when security plays a large role. Additionally, p7zip and lzop can both compress entire directories, while gzip and bzip2 can only compress single files. On the other hand, p7zip places high demands on system performance, making it less suitable for small, quick compression jobs.

How data compression works with Linux tools

The packing programmes mentioned differ significantly in terms of compression rate and speed. When it comes to syntax and use, though, the similarities are noticeable: all of them can be used from the command line, without a graphical interface or archive manager, and beginners can quickly become accustomed to the different parameters and commands. As an example, we’ll show you how to compress files with bzip2 under Linux and then unpack files in the .bz2 format.

The general syntax of bzip2 has the following form:

bzip2 [options] file(s)

For the standard compression process, it’s not necessary to specify options. They are only required if you want to change compression settings, display the help overview, or unpack a .bz2 file. For example, to pack the text document test.txt, you just need to run the following command:

bzip2 test.txt

This deletes the original file and replaces it with the compressed file test.txt.bz2. By listing several documents one after another, you can also pack multiple files with a single command:

bzip2 test.txt test2.txt test3.txt

If you want to decompress a packed document, it’s necessary, as mentioned earlier, to set the corresponding option (-d) and to pass the name of the compressed file:

bzip2 -d test.txt.bz2

Here’s an overview of some other bzip2 command options:

| Command | Description |
|---|---|
| -1 … -9 | Sets the block size from 100 KB (-1) to 900 KB (-9) and with it the compression strength, where 1 is the weakest and 9 is the strongest; the default value is 9 |
| -f | Starts the compression even if a .bz2 file of the same name already exists; in this case, the existing file is overwritten |
| -c | Writes the packed data to standard output (usually the terminal) instead of to a file |
| -q | Suppresses non-essential bzip2 warning messages |
| -v | Shows additional information, like the compression rate for all processed files |
| -t | Checks the integrity of the selected file |
| -k | Keeps the original file during compression or decompression |
| -h | Displays the help overview |
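Several of these options can be combined in a short round trip. The following is a minimal sketch that assumes bzip2 is installed; the test data and the file name restored.txt are made up for the example.

```shell
# Sketch: compress with bzip2, verify the result, and restore the content.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Create a small sample document.
seq 1 5000 > test.txt

# -k keeps test.txt next to the new test.txt.bz2; -9 selects the largest block size.
bzip2 -k -9 test.txt

# -t only checks integrity and returns a non-zero exit status for a damaged file.
bzip2 -t test.txt.bz2 && echo "archive is intact"

# -d decompresses; -c sends the result to standard output, which is redirected to a new file.
bzip2 -d -c test.txt.bz2 > restored.txt

# cmp stays silent and exits with 0 if both files are identical.
cmp test.txt restored.txt && echo "round trip successful"
```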

Reasons for high tar demand

The archiving programme tar has been in use for more than four decades and has hardly lost any of its value. This is partly because the tool allows data to be archived while retaining file attributes such as permissions. Mainly, though, it’s because tar can pack complete directories. This makes it the perfect partner for compression tools like gzip and bzip2, which can only compress single files.

In the first step, the packing programme combines all files in a selected directory into a single archive file, without removing any of the original files. In the second step, the archive is compressed using one of the compression programmes. As a result of this form of compression, which is described as progressive, compact, or solid, the archive files are given compound extensions, such as .tar.gz (.tgz for short) or .tar.bz2 (.tbz2 for short). tar can also be used to unpack such files again (e.g. file type .tar.gz).
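The two steps can be carried out by hand to show what a .tar.gz file actually is. The sketch below assumes tar and gzip are installed; the directory name project and the archive names are invented for the example. It builds one archive in two separate steps and one with tar’s built-in gzip option (-z), then compares their listings.

```shell
# Sketch: a .tar.gz file is simply a tar archive that has been compressed with gzip.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# A small sample directory with two files.
mkdir project
echo "first" > project/test.txt
echo "second" > project/test2.txt

# Step 1: bundle the directory into one uncompressed archive.
tar -cf project.tar project
# Step 2: compress the archive; -c writes to standard output and keeps project.tar.
gzip -c project.tar > two-step.tar.gz

# The same result in a single command, using tar's gzip option.
tar -czf one-step.tar.gz project

# List the members of both archives and compare them.
list_two=$(tar -tzf two-step.tar.gz | sort)
list_one=$(tar -tzf one-step.tar.gz | sort)
echo "$list_one"
[ "$list_two" = "$list_one" ] && echo "both archives contain the same entries"
```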

Tar archive: How to (un)pack .tar.gz and Co. under Linux

Combining tar with a compression tool isn’t required, so you can also bundle files in an archive that you haven’t previously compressed or don’t want to compress. For example, if you want to bundle the uncompressed test documents test.txt and test2.txt in an archive named archive.tar, the following command will suffice:

tar -cf archive.tar test.txt test2.txt

To unpack this archive on Linux, replace the -c (create new archive) parameter with -x (extract files from archive). If you want to extract the whole archive rather than individual members, the file name(s) can be omitted:

tar -xf archive.tar

Alternatively, if you want to create a compressed archive, for example one based on gzip compression with the compound extension .tar.gz, tar offers corresponding options. Since the programme has built-in options for compression and decompression with the bzip2, xz, compress, and gzip packing programmes, this is possible with a single command:

tar -czf archive.tar.gz test.txt test2.txt

The command to unpack .tar.gz differs from the equivalent for uncompressed archives only in that it specifies the parameter for the packing programme:

tar -xzf archive.tar.gz
Tip

The parameter -f, which specifies the respective archive file, must always come last in a group of combined options, because the argument that follows it is always interpreted as the name of the archive file.
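Put together, packing and unpacking a .tar.gz archive can be sketched as follows. The example assumes tar and gzip are installed; the file contents and the target directory restored are invented for illustration. The option -C, which makes tar change to the given directory before extracting, is used here so that the extracted copies don’t overwrite the originals.

```shell
# Sketch: create a gzip-compressed tar archive and extract it into another directory.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf 'hello\n' > test.txt
printf 'world\n' > test2.txt

# -f is the last letter of the option group, so the next argument is the archive name.
tar -czf archive.tar.gz test.txt test2.txt

# Extract into a separate directory instead of the current one.
mkdir restored
tar -xzf archive.tar.gz -C restored

# cmp exits with 0 only if the extracted copies match the originals byte for byte.
cmp test.txt restored/test.txt
cmp test2.txt restored/test2.txt
echo "both files restored unchanged"
```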

The most important commands of the archiving application

In addition to the previously listed options for simple file archiving, there are several additional parameters that control the packing or unpacking process. These include the compression methods already mentioned, options for handling directories, and options for checking and previewing tar archives:

| Command | Description |
|---|---|
| --help | Displays the tar help overview |
| -c | Creates a new archive |
| -d | Compares the files in the archive with those in the file system |
| -f | Specifies the file name of the archive that the selected files are written to or that the data is read from |
| -j | Compresses the archive with bzip2 or decompresses such an archive |
| -J | Compresses the archive with xz or decompresses such an archive |
| -k | Prevents existing files from being overwritten when files are extracted from an archive |
| -p | Ensures that access permissions are preserved during extraction |
| -r | Appends files to a previously created archive |
| -t | Lists the contents of the selected archive |
| -u | Adds only those files to an archive that are newer than their version in the archive |
| -x | Extracts files from an archive |
| -z | Compresses the archive with gzip or decompresses such an archive |
| -Z | Compresses the archive with compress or decompresses such an archive |
| -A | Appends the contents of another archive to the archive |
| -C | Changes to the specified directory before the operation is carried out, e.g. to extract the selected archive there |
| -M | Creates, lists, or extracts a multi-part archive |
| -W | Verifies the archive after the archiving process |
Tip

Some options, like adding files to an existing archive (-r), don’t work with compressed archives. These have to be decompressed first.
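One possible workaround is to decompress the archive, append the file, and compress the archive again. The following sketch assumes gzip and a tar version that supports -r (such as GNU tar); the file names are made up for the example.

```shell
# Sketch: append a file to a gzip-compressed tar archive via a detour.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

echo "one" > test.txt
echo "two" > test2.txt
echo "three" > test3.txt
tar -czf archive.tar.gz test.txt test2.txt

# 1. Decompress: archive.tar.gz is replaced by archive.tar.
gzip -d archive.tar.gz
# 2. Append the new file to the now uncompressed archive.
tar -rf archive.tar test3.txt
# 3. Compress again: archive.tar is replaced by archive.tar.gz.
gzip archive.tar

# List the members of the updated archive.
contents=$(tar -tzf archive.tar.gz)
echo "$contents"
```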

Examples:

Display the content of an archive

tar -tf archive.tar

Update contents of an archive (doesn’t include subdirectories!)

tar -uf archive.tar file(s)

Expand contents of an archive

tar -rf archive.tar new_file

Compare contents of an archive with the file system (run from the directory in which the archive was created!)

tar -dvf archive.tar
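The comparison is easiest to understand when a file is changed on purpose. The sketch below assumes GNU tar, which exits with a non-zero status when -d finds differences; other tar implementations may not support this option. The file contents are invented for the example.

```shell
# Sketch: let tar -d detect that a file has changed since it was archived.
# Assumes GNU tar; the comparison is run from the directory the archive was created in.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

echo "original" > test.txt
tar -cf archive.tar test.txt

# Directly after archiving, the file and its archived copy are identical.
if tar -df archive.tar; then before="identical"; else before="different"; fi

# Change the file, then compare again; tar reports what differs.
echo "changed content that is longer" > test.txt
if tar -df archive.tar; then after="identical"; else after="different"; fi

echo "before the change: $before"
echo "after the change:  $after"
```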

File Roller: the archive manager for GNOME

File Roller is a graphical user interface for various compression tools and packing programmes that are otherwise operated from the command line. The archive manager is available for the GNOME and Unity desktop environments and has been distributed under the GNU General Public License since 2001. It allows you to view the contents of various archive files and to extract, delete, or add files. It is also possible to create new compressed or uncompressed archives and to convert them to another format. For this purpose, the main window of the software offers various buttons and menus alongside a drag-and-drop function. In addition to tar archive formats like tar.gz, File Roller supports the following formats:

  • .7z
  • .tar
  • .gz
  • .bz2
  • .ar
  • .jar
  • .cpio

File Roller is preinstalled on some Linux distributions, such as Ubuntu, but can also be installed manually using the respective package manager or from the official homepage. An alternative for the KDE desktop environment is Ark.
