Working with Archives

Most Windows users are familiar with the graphical user interface for working with .zip files, and the 7-Zip application handles other formats on Windows. Linux administrators typically work with archives from the command line.

Learning Objectives

You should be able to:

  • Create archives in the Linux terminal
  • Extract archives in the Linux terminal

Video Walkthrough

Use this video to follow along with the steps in this lab.

Tar

The tar program gets its name from "tape archive." Tar files are often used to distribute applications and data.

The tar file format combines multiple files into a single file but does not compress them. To save space, that single tar file is then compressed with a separate algorithm such as gzip or bzip2. The compression algorithm is typically appended to the file extension, e.g., archive.tar.gz or archive.tar.bz2.
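The two stages described above can be run separately to see what each one does. This is a minimal sketch, not part of the lab steps: the scratch directory and the file names a.txt, b.txt, and bundle.tar are made up for illustration.

```shell
# Scratch directory so nothing in your home directory is touched.
demo=$(mktemp -d)
echo "first file" > "$demo/a.txt"
echo "second file" > "$demo/b.txt"

# Stage 1: bundle the files into one uncompressed tar file.
tar -cf "$demo/bundle.tar" -C "$demo" a.txt b.txt

# Stage 2: compress that single file. gzip replaces bundle.tar with bundle.tar.gz.
gzip "$demo/bundle.tar"

# List the members of the compressed archive.
tar -tzf "$demo/bundle.tar.gz"
```

The z option used later in this lab performs both stages in one command. Remove the scratch directory with rm -rf "$demo" when you are done.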

Create Files for an Archive

The purpose of this section is just to create some data that will be archived.

  • Create a new folder named compressme in your home directory.
cd ~
mkdir compressme
cd compressme
  • Create 3 files with .txt extensions. You can create these files in several ways. Use whichever method you want.
  • Here's one way:
touch one.txt
nano one.txt
  • (Edit the text in nano, save it with control+o, and exit with control+x.)
  • Here's another way:
echo "This goes in #2" >> two.txt
  • Here's another way:
vim three.txt

(Press i to enter insert mode. Add text. Press escape to enter normal mode, then :wq to save and exit.)
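If you would rather not open an editor, the three files can also be created non-interactively. The sketch below assumes you are in ~/compressme, and the text written to each file is only sample content.

```shell
# Run inside ~/compressme.
# '>' creates the file, or overwrites it if it already exists.
echo "This goes in #1" > one.txt

# printf gives control over newlines; \n ends each line.
printf 'This goes in #2\nA second line\n' > two.txt

# A here-document writes every line up to the closing EOF marker.
cat > three.txt <<'EOF'
This goes in #3
Another line
EOF
```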

File Verification

Before going on, it is important to be able to check whether everything has worked the way you wanted it to work. Being able to verify your work is critical.

  • Ensure that you are in the ~/compressme directory using pwd.
  • Use ls to verify the three files you expect are present.
  • Use cat to verify the contents of the three files.

If anything is not as you expected, fix the issues before moving to the next steps where you will create an archive.
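The same checks can be scripted so they are easy to repeat. The following is a sketch only: check_files is a hypothetical helper name, not a standard command, and it assumes the three file names used in this lab.

```shell
# check_files DIR
# Reports whether each expected file exists and is non-empty in DIR.
# Returns 0 if all three are fine and 1 otherwise.
check_files() {
    check_dir=$1
    check_result=0
    for name in one.txt two.txt three.txt; do
        if [ -s "$check_dir/$name" ]; then
            echo "ok:      $name"
        else
            echo "problem: $name is missing or empty"
            check_result=1
        fi
    done
    return "$check_result"
}

# Check the lab directory and print a reminder if anything is wrong.
check_files "$HOME/compressme" || echo "Fix the files listed above before continuing."
```

The -s test is true only when a file exists and has a size greater than zero, so it also catches a file that was created with touch but never edited.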

Create an Archive

The tar command can be used to work with archives. It has a lot of options.

  • Use the following command to create a gzipped archive.
tar -zcvf myfiles.tar.gz *.txt
  • The tar arguments are explained below.
    • z tells tar to use the gzip compression algorithm.
    • c tells tar to create a new archive.
    • v tells tar to be "verbose." It will provide more status updates in the terminal while processing files.
    • f tells tar which archive filename to use. Because myfiles.tar.gz comes directly after the f, tar will use that filename for the archive.
    • *.txt tells tar to include every file in the current directory with a .txt extension.
  • Use the following command to create an archive with the bzip2 compression algorithm.
tar -jcvf myfiles.tar.bz2 *.txt
  • The Linux system you are using may not have the bzip2 compression program installed. If that's the case, you can install it with the following commands.
sudo apt update
sudo apt install bzip2
  • Then run the command again to compress the file using the bzip2 algorithm.
tar -jcvf myfiles.tar.bz2 *.txt
  • There are only two differences between this tar command using bzip2 and the previous one using gzip. First, the j option is used instead of z. Second, the archive's file extension is different. The extension helps us humans understand what the file contains. Linux does not care what the file is named.
  • List the files. There should be five now.
ls

Your output should be similar to the following.

myfiles.tar.bz2 myfiles.tar.gz one.txt three.txt two.txt
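It can also help to look inside an archive without extracting it. The sketch below uses tar's t (list) option on a throwaway archive built in a scratch directory. The sample files are made up for illustration, and you can run the same tar -tzf command against your own myfiles.tar.gz.

```shell
# Build a small sample archive in a scratch directory.
demo=$(mktemp -d)
echo "sample" > "$demo/one.txt"
echo "sample" > "$demo/two.txt"
tar -zcf "$demo/myfiles.tar.gz" -C "$demo" one.txt two.txt

# t lists the archive's members instead of extracting them.
tar -tzf "$demo/myfiles.tar.gz"

# Adding v also shows permissions, owner, size, and date for each member.
tar -tzvf "$demo/myfiles.tar.gz"
```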

Extract an Archive

  • To make sure the extraction works, delete the text files.
rm *.txt
  • Verify that the text files are gone.
ls
  • The text files should be gone. Only the two archives should remain.
  • Extract the .gz file with the following command.
tar -xvf myfiles.tar.gz
  • x tells the tar command to extract the files. v is still for verbose. And f tells tar which archive file to work with.
  • Use ls to verify that the files exist.
ls
  • The text files should be present.
  • Display the contents of the files to ensure that they were fully restored.
cat one.txt
cat two.txt
cat three.txt
  • Delete the files.
rm *.txt
  • Extract the files from the .bz2 archive.
tar -xvf myfiles.tar.bz2
  • Note that you did not have to specify the compression algorithm. The tar command figured it out from the archive's contents.
  • Verify that all files were extracted correctly using ls and cat.
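The commands above extract into the current directory. As a sketch of two common variations, the example below extracts into a different directory with the -C option and then extracts a single member by naming it. The scratch directory and the restored and single folder names are assumptions made for this illustration.

```shell
# Build a small sample archive in a scratch directory.
demo=$(mktemp -d)
echo "This goes in #1" > "$demo/one.txt"
echo "This goes in #2" > "$demo/two.txt"
tar -zcf "$demo/myfiles.tar.gz" -C "$demo" one.txt two.txt

# -C makes tar change to the target directory before extracting.
# The directory must already exist.
mkdir "$demo/restored"
tar -xvf "$demo/myfiles.tar.gz" -C "$demo/restored"

# Naming a member after the archive extracts only that file.
mkdir "$demo/single"
tar -xvf "$demo/myfiles.tar.gz" -C "$demo/single" two.txt

# Compare what landed in each directory.
ls "$demo/restored" "$demo/single"
```

Extracting into a separate directory is a safe habit when you do not know what an archive contains, because its files cannot overwrite anything in your working directory.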

Downloading and Extracting Archives

In this section, you will download and extract the source code for PuTTY.

  • Download the archive with wget.
wget https://the.earth.li/~sgtatham/putty/latest/putty-0.79.tar.gz
  • If the download did not work, check that you're using the URL from the website. There may be a newer version of PuTTY, and so the version in the URL would be different.
  • Use the file command to inspect the properties.
file putty-0.79.tar.gz

You should see information like the following.

putty-0.79.tar.gz: gzip compressed data, from Unix, original size modulo 2^32 10516480
  • The file command does not care about the file name's extension. Instead, the file command performs various tests on a file's contents to determine what kind of file it is: archive, web page, image, video, etc. Sometimes, you might accidentally download a web page instead of an archive, and tar might give you an error indicating that the archive is invalid. You might use file to check that the file you downloaded is an archive.
  • Extract the files.
tar -xvf putty-0.79.tar.gz
  • If tar could not find the file, double-check that the archive file name matches what you downloaded. You may need to modify the filename.
  • Run ls. You should see a new folder, named something like putty-0.79. This indicates that the operation worked. If it worked, you can delete the archive.
rm pu*.gz

Instead of writing the entire file name, a wildcard was used to match the archive file name.

  • Navigate to the new folder (using the name of the directory on your system, which may differ from the example command below).
cd putty-0.79
  • (Hint: type cd pu and then press Tab. This should autocomplete the correct directory name.)
  • View some of the files. For example, output the README file.
less README
  • Press q to quit the less program.
  • Browse other files using vim, nano, cat, less, and/or more.
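The file command inspects a download's contents. A related check, sketched below, is to ask tar itself whether it can read the archive before you extract it. This complements file rather than replacing it. The name is_valid_targz is a hypothetical helper, and the "bad" file simulates a web page saved under an archive-like name.

```shell
# Build one real archive and one fake one in a scratch directory.
demo=$(mktemp -d)
echo "sample" > "$demo/readme.txt"
tar -zcf "$demo/good.tar.gz" -C "$demo" readme.txt
echo "<html><body>Not Found</body></html>" > "$demo/bad.tar.gz"

# is_valid_targz FILE
# Succeeds only if tar can list FILE as a gzipped archive.
is_valid_targz() {
    tar -tzf "$1" > /dev/null 2>&1
}

for archive in "$demo/good.tar.gz" "$demo/bad.tar.gz"; do
    if is_valid_targz "$archive"; then
        echo "looks valid: $archive"
    else
        echo "not a readable .tar.gz: $archive"
    fi
done
```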

Cleanup

Save any screenshots of your work. If you want to delete the entire compressme folder, run the following command.

cd ~
rm -rf compressme

Challenge

  • Create an archive of archives.
  • Explore the different compression algorithms available.
man tar
  • Find out what XKCD thinks about tar.

Reflection

  • Why would it be helpful to work with archives from the terminal instead of a graphical user interface?

Key Terms

  • Archive: A file that contains one or more files and directories, stored in a single file for easier management and transfer. Archives are often used for backup, distribution, and storage purposes. Common archive formats include .tar, .zip, and .rar.
  • tar: A Unix-based utility and file format used to create and manipulate archive files. The name tar stands for "tape archive." The command is commonly used to bundle multiple files and directories into a single archive file, often with a .tar extension. It does not compress files by itself but is often used in combination with compression tools like gzip or bzip2.
  • Compression Algorithm: A method used to reduce the size of files by encoding data more efficiently. Compression algorithms can be lossless, preserving the original data exactly, or lossy, where some data is discarded for higher compression rates.
  • zip: A widely used archive file format that supports lossless data compression. The zip format can contain multiple files and directories, and it compresses them to reduce storage space.
  • gzip: A file compression utility and format based on the DEFLATE algorithm. The gzip command is used to compress files, resulting in a .gz extension. It is commonly used in combination with tar to create compressed archive files with a .tar.gz extension.
  • bzip2: A file compression utility and format based on the Burrows-Wheeler transform, which typically achieves higher compression ratios than gzip. The bzip2 command is used to compress files, resulting in a .bz2 extension. It is also used with tar to create compressed archive files with a .tar.bz2 extension.