Working with Archives
Most Windows users are familiar with the graphical user interface for working with .zip files. The 7-zip application can work with other formats on Windows. Linux administrators typically work with archives from the command line.
Learning Objectives
You should be able to:
- Create archives in the Linux terminal
- Extract archives in the Linux terminal
Video Walkthrough
Use this video to follow along with the steps in this lab.
Tar
The tar program gets its name from "tape archive." Tar files are often used to distribute applications and data.
The tar file format combines multiple files into a single file but does not compress them. To save space, that single tar file is usually compressed with a separate algorithm such as gzip or bzip2. The compression algorithm is typically appended to the file extension, e.g., archive.tar.gz or archive.tar.bz2.
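The two stages, bundling and then compressing, can be run separately to see what each one does. The following is a minimal sketch, assuming tar and gzip are installed; the scratch directory and the file names a.txt, b.txt, and bundle.tar are made up for the illustration.

```shell
# Work in a throwaway directory so nothing in your home folder is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Two small sample files (hypothetical names).
echo "first file" > a.txt
echo "second file" > b.txt

# Stage 1: tar bundles the files into one uncompressed archive.
tar -cf bundle.tar a.txt b.txt

# Stage 2: gzip compresses that single file.
# -c writes to standard output, so bundle.tar itself is left in place.
gzip -c bundle.tar > bundle.tar.gz

# Compare the sizes of the uncompressed and compressed archives.
ls -l bundle.tar bundle.tar.gz
```

Options such as z and j, covered later in this lab, perform both stages in a single tar command.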
Create Files for an Archive
The purpose of this section is just to create some data that will be archived.
- Create a new folder named compressme in your home directory.
cd ~
mkdir compressme
cd compressme
- Create 3 files with .txt extensions. You can create these files in several ways. Use whichever method you want.
- Here's one way:
touch one.txt
nano one.txt
- (Edit the text in nano, save it with control+o, and exit with control+x.)
- Here's another way:
echo "This goes in #2" >> two.txt
- Here's another way:
vim three.txt
(Press i to enter insert mode. Add text. Press escape to enter normal mode, then :wq to save and exit.)
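If you would rather not open an editor at all, the files can also be created entirely from the command line. This is only a sketch: it runs in a scratch directory (in the lab you would already be in ~/compressme), and the file contents are placeholder text.

```shell
# Scratch directory for the demo; in the lab you would already be in ~/compressme.
scratch=$(mktemp -d)
cd "$scratch"

# printf writes the text; > creates the file or overwrites an existing one.
printf 'This goes in #1\n' > one.txt

# >> appends to the file, creating it if it does not exist yet.
echo "This goes in #2" >> two.txt

# A here-document is handy for several lines of text.
cat > three.txt <<'EOF'
This goes in #3
It has a second line
EOF

ls
```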
File Verification
Before going on, it is important to be able to check whether everything has worked the way you wanted it to work. Being able to verify your work is critical.
- Ensure that you are in the ~/compressme directory using pwd.
- Use ls to verify the three files you expect are present.
- Use cat to verify the contents of the three files.
If anything is not as you expected, fix the issues before moving to the next steps where you will create an archive.
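The same checks can be scripted. The sketch below is one possible approach, not part of the lab itself: it builds stand-in files in a scratch directory, then uses the shell's -s test to confirm that each expected file exists and is not empty. The variable name problems is made up for the example.

```shell
# Scratch stand-in for ~/compressme with three sample files.
check_dir=$(mktemp -d)
cd "$check_dir"
echo "one" > one.txt
echo "two" > two.txt
echo "three" > three.txt

# Count files that are missing or empty.
problems=0
for f in one.txt two.txt three.txt; do
  if [ -s "$f" ]; then
    echo "ok: $f"
  else
    echo "problem: $f is missing or empty"
    problems=$((problems + 1))
  fi
done

echo "files with problems: $problems"
```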
Create an Archive
The tar command can be used to work with archives. It has a lot of options.
- The following command creates a gzipped archive.
tar -zcvf myfiles.tar.gz *.txt
- The tar arguments are explained below.
- z tells tar to use the gzip compression algorithm.
- c tells tar to create a new archive.
- v tells tar to be "verbose." It will provide more status updates in the terminal while processing files.
- f tells tar which archive filename to use. Because myfiles.tar.gz comes directly after the f, tar will use that filename for the archive.
- *.txt tells tar to include every file in the current directory with a .txt extension.
- The following command creates the archive using the bzip2 compression algorithm.
tar -jcvf myfiles.tar.bz2 *.txt
- The Linux system you are using may not have the bzip2 compression utility installed. If that's the case, you can install it with the following commands.
sudo apt update
sudo apt install bzip2
- Then run the command again to compress the file using the bzip2 algorithm.
tar -jcvf myfiles.tar.bz2 *.txt
- There are only 2 differences between this tar command using bzip2 and the previous one using gzip. First, the j option is used instead of z. Second, the archive's file extension is different. The extensions help us humans understand what the file contains; Linux does not care.
- List the files. There should be five now.
ls
Your output should be similar to the following.
myfiles.tar.bz2 myfiles.tar.gz one.txt three.txt two.txt
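Besides checking that the archive file exists, you can look inside it without extracting anything. The sketch below uses tar's t option, which lists an archive's contents. It runs in a scratch directory with placeholder files so it does not disturb your lab files.

```shell
# Scratch directory with two sample text files.
list_dir=$(mktemp -d)
cd "$list_dir"
echo "alpha" > one.txt
echo "beta" > two.txt

# Create a gzipped archive (no v this time, so tar stays quiet).
tar -zcf myfiles.tar.gz *.txt

# t lists the file names stored in the archive; nothing is extracted.
tar -tzf myfiles.tar.gz
```

Adding v to the list command (tar -tzvf) also shows permissions, sizes, and dates, much like ls -l.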
Extract an Archive
- To make sure the extraction works, delete the text files.
rm *.txt
- Verify that the text files are gone.
ls
- The text files should be gone. Only the two archives should remain.
- Extract the .gz file with the following command.
tar -xvf myfiles.tar.gz
- x tells the tar command to extract files. v is still for verbose. And f tells tar which archive file to work with.
- Use ls to verify that the files exist.
ls
- The text files should be present.
- Display the contents of the files to ensure that they were fully restored.
cat one.txt
cat two.txt
cat three.txt
- Delete the files.
rm *.txt
- Extract the files from the .bz2 archive.
tar -xvf myfiles.tar.bz2
- Note that you did not have to specify the compression algorithm. The tar command figured it out.
- Verify that all files were extracted correctly using ls and cat.
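Another way to confirm that extraction restores files exactly, without deleting the originals first, is to extract into a separate directory and compare byte for byte. The sketch below assumes tar's -C option (change to a directory before extracting) and the cmp utility; the restored directory and the result variable are names invented for this example.

```shell
# Scratch directory with one sample file and its archive.
rt_dir=$(mktemp -d)
cd "$rt_dir"
echo "This goes in #2" > two.txt
tar -zcf myfiles.tar.gz two.txt

# Extract into a separate directory so the original stays untouched.
mkdir restored
tar -xzf myfiles.tar.gz -C restored

# cmp -s is silent and succeeds only if the two files are byte-for-byte equal.
if cmp -s two.txt restored/two.txt; then
  result="identical"
else
  result="different"
fi
echo "original and restored copy are $result"
```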
Downloading and Extracting Archives
In this section, you will download and extract the source code for PuTTY.
- Search the internet for putty download. You should be taken to a URL like https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html.
- Find the download link to the .tar.gz source archive.
- Copy the download link to the .tar.gz file. It should be something like https://the.earth.li/~sgtatham/putty/latest/putty-0.79.tar.gz, but the version number will likely be different.
- Run the following command to download the file. It's about 3 megabytes, so the download should be quick.
wget https://the.earth.li/~sgtatham/putty/latest/putty-0.79.tar.gz
- If the download did not work, check that you're using the URL from the website. There may be a newer version of PuTTY, and so the version in the URL would be different.
- Use the file command to inspect the properties.
file putty-0.79.tar.gz
You should see information like the following.
putty-0.79.tar.gz: gzip compressed data, from Unix, original size modulo 2^32 10516480
- The file command does not care about the file name's extension. Instead, the file command performs various tests on a file's contents to determine what kind of file it is: archive, web page, image, video, etc. Sometimes, you might accidentally download a web page instead of an archive, and tar might give you an error indicating that the archive is invalid. You can use file to check that the file you downloaded is an archive.
- Extract the files.
tar -xvf putty-0.79.tar.gz
- If tar could not find the file, double-check that the archive file name matches what you downloaded. You may need to modify the filename.
- Run ls. You should see a new folder, named something like putty-0.79. This indicates that the operation worked. If it worked, you can delete the archive.
rm pu*.gz
Instead of writing the entire file name, a wildcard was used to match the archive file name.
- Navigate to the new folder (using the name of the directory on your system, which may differ from the example command below).
cd putty-0.79
- (Hint: type cd pu and then hit tab. This should autocomplete the correct directory name.)
- View some of the files. For example, output the README file.
less README
- Press q to quit the less program.
- Browse other files using vim, nano, cat, less, and/or more.
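The case mentioned earlier in this section, where a web page is saved under an archive's name, can be reproduced offline. The sketch below is an illustration only: it fakes both a bad and a good download in a scratch directory (the file names are invented) and uses the exit status of tar's list option to tell them apart.

```shell
# Scratch directory holding one fake and one real "download".
dl_dir=$(mktemp -d)
cd "$dl_dir"

# A bad download: an HTML page saved under an archive name.
echo "<html><body>Not Found</body></html>" > fake-download.tar.gz

# A good download: a genuine gzipped tar archive.
echo "readme text" > README
tar -zcf real-download.tar.gz README

# tar exits with a non-zero status when it cannot read the archive.
for f in fake-download.tar.gz real-download.tar.gz; do
  if tar -tzf "$f" > /dev/null 2>&1; then
    echo "$f: looks like a valid gzipped tar archive"
  else
    echo "$f: not a valid gzipped tar archive"
  fi
done
```

Running a check like this before extracting is harmless because listing an archive never writes any of its files to disk.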
Cleanup
Save any screenshots of your work. If you want to delete the entire compressme folder, run the following command.
cd ~
rm -rf compressme
Challenge
- Create an archive of archives.
- Explore the different compression algorithms available.
man tar
- Find out what XKCD thinks about tar.
Reflection
- Why would it be helpful to work with archives from the terminal instead of a graphical user interface?
Key Terms
- Archive: A file that contains one or more files and directories, stored in a single file for easier management and transfer. Archives are often used for backup, distribution, and storage purposes. Common archive formats include .tar, .zip, and .rar.
- tar: A Unix utility and file format used to create and manipulate archive files. The name tar stands for "tape archive." The tar command is commonly used to bundle multiple files and directories into a single archive file, often with a .tar extension. It does not compress files by itself but is often used in combination with compression tools like gzip or bzip2.
- Compression Algorithm: A method used to reduce the size of files by encoding data more efficiently. Compression algorithms can be lossless, preserving the original data exactly, or lossy, where some data is discarded for higher compression rates.
- zip: A widely used archive file format that supports lossless data compression. The zip format can contain multiple files and directories, and it compresses them to reduce storage space.
- gzip: A file compression utility and format based on the DEFLATE algorithm. The gzip command is used to compress files, resulting in a .gz extension. It is commonly used in combination with tar to create compressed archive files with a .tar.gz extension.
- bzip2: A file compression utility and format that uses the Burrows-Wheeler algorithm, which typically achieves higher compression ratios than gzip. The bzip2 command is used to compress files, resulting in a .bz2 extension. It is also used with tar to create compressed archive files with a .tar.bz2 extension.