Searching For Text

Linux provides tools for searching files for specific content. For example, you might look for the word "error" in log files, or you might try to find a specific customer's email address in a database dump. The grep command is a powerful tool for searching text, and it can also filter the output of other Linux commands.

Learning Objectives

You should be able to:

Search file contents using grep
Search the output of other commands using grep

Video Walkthrough

Use this video to follow along with the steps in this lab.

Use Cases

There are many instances when you might need to search for text. Below are a few common examples.

Troubleshooting: When something goes wrong, you might need to search log files for error messages.
Data Analysis: You might need to search a database dump for specific information.
Security: You might need to search for specific information in a file to ensure that sensitive information is not exposed. For example, you might need to search for social security numbers in a file to ensure that they are not stored in plain text.

Regular expressions are powerful tools for searching text. They allow you to create flexible patterns to match text, and they are used in tools like Microsoft Excel, Google Sheets, Notepad++, Visual Studio Code, and many programming languages.

Load Files

Run the following command in the terminal to ensure you are in your home directory.

cd ~

Run the following command to delete the cyfunfiles directory (if it exists).

rm -rf cyfunfiles

Run the following command to download files from the internet. This will create a new folder called cyfunfiles in your home directory.

git clone https://github.com/jimmarq/cyfunfiles.git

Change directories to the cyfunfiles directory.

cd cyfunfiles

Change directories to the linux_finding subdirectory.

cd linux_finding

Look at the files in the folder with ls.

ls

Contents of the cyfunfiles directory

Searching for File Content

The grep command looks inside files for specific contents. The syntax is grep [options] [search pattern] [file(s) to search]. This syntax says, "Look for this pattern in this place."

Run the following command to look for the word celery in the file veggies.txt.

grep "celery" veggies.txt

If the output is not blank, it means the text was found.

Search for the word "strawberry" in veggies.txt.

grep strawberry veggies.txt

In this case, there is no output, meaning that the search did not return any results. Notice that no quotation marks were put around the word strawberry. Quotation marks are optional here because the word contains no spaces and no characters that are special to the shell. A pattern that contains spaces must be quoted so that the shell passes it to grep as a single argument.

Search results for celery and strawberry
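The quoting rule and the "no output means no match" behavior can be tried on a throwaway file. This is a minimal sketch: the file sample_veggies.txt and its contents are made up for illustration and are not part of cyfunfiles. It also relies on grep's exit status, which is 0 when a match is found and 1 when none is found.

```shell
# Hypothetical sample file, created only for this sketch.
workdir=$(mktemp -d)
printf 'celery\ngreen beans\ncarrot\n' > "$workdir/sample_veggies.txt"

# A pattern that contains a space must be quoted so the shell passes it
# to grep as a single argument.
spaced_match=$(grep "green beans" "$workdir/sample_veggies.txt")
echo "$spaced_match"

# grep exits with status 0 when at least one line matches and 1 when none do.
# The -q option suppresses output so that only the exit status is used.
if grep -q strawberry "$workdir/sample_veggies.txt"; then
    strawberry_status="found"
else
    strawberry_status="not found"
fi
echo "strawberry: $strawberry_status"

# Remove the temporary directory.
rm -rf "$workdir"
```

Checking the exit status is handy in scripts, where a blank screen is harder to interpret than a clear "found" or "not found."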

Run the following command to look for the word banana in all files in the current folder.

grep banana ./*

The "./" tells Linux to look in the current folder, and the asterisk tells Linux to match any file name. The grep command should find the word banana in fruits.txt.
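When grep searches more than one file, it prefixes each matching line with the name of the file it came from. The sketch below shows this on two made-up files (sample_fruits.txt and sample_veggies.txt are assumptions for illustration, not the lab files).

```shell
# Hypothetical sample files, created only for this sketch.
workdir=$(mktemp -d)
printf 'apple\nbanana\n' > "$workdir/sample_fruits.txt"
printf 'celery\ncarrot\n' > "$workdir/sample_veggies.txt"

# The shell expands the asterisk into the list of files in the directory.
# Because grep receives more than one file, each matching line is printed
# as file_name:matching_line.
multi_match=$(grep banana "$workdir"/*)
echo "$multi_match"

# Remove the temporary directory.
rm -rf "$workdir"
```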

Useful Options

The following are useful grep options.

-i - ignore case. By default, grep searches are case-sensitive. This option tells grep to ignore the case.

grep -i BaNaNa ./*

This would match the word "banana" no matter what case each character used. Without the -i option, the search would only match "BaNaNa."

The following command does not match any content, because no file contains "Banana" with a capital B.

grep Banana ./*

-r - search recursively. This option tells grep to search the given directory and all of its subdirectories. Example:

cd ~
grep -r banana .

Note that this search is case-sensitive. To make it case-insensitive, add the -i option.

grep -ri BANANA .

-n - show line numbers. This option tells grep to show the line number where the search term was found. Example:

grep -rni banana .

Line numbers are helpful if you want to edit the file and need to know where to make the change.
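The three options can be compared side by side on a small directory tree. This is only a sketch: the produce directory and its files are hypothetical and exist just to show how the output changes.

```shell
# Hypothetical directory tree, created only for this sketch.
workdir=$(mktemp -d)
mkdir -p "$workdir/produce/tropical"
printf 'apple\nBanana\n' > "$workdir/produce/sample_fruits.txt"
printf 'mango\npapaya\nbanana\n' > "$workdir/produce/tropical/sample_more.txt"

# Case-sensitive recursive search: only the lowercase "banana" matches.
sensitive=$(grep -r banana "$workdir/produce")
echo "$sensitive"

# -r searches subdirectories, -n adds line numbers, and -i ignores case,
# so both files match. Output format is file_name:line_number:line.
insensitive=$(grep -rni banana "$workdir/produce")
echo "$insensitive"

# Remove the temporary directory.
rm -rf "$workdir"
```

Single-letter options can be combined after one dash, so -rni means the same thing as -r -n -i.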

Searching in Command Output

Some Linux commands produce a lot of output, and it can be time-consuming to scroll through the output to find what you are looking for. The grep command can be used to search the output of other commands.

Run the following command to list the contents of the /etc/passwd file.

cat /etc/passwd

There is a lot of content. But perhaps you only want to look at the line that contains a user named root, or a user named ubuntu. Run the following commands to search for those usernames in the output of the cat command.

cat /etc/passwd | grep root
cat /etc/passwd | grep ubuntu

In this case, the grep command is missing the file name. This is because the grep command is getting its input from the cat command. The | symbol is called a pipe. It tells Linux to take the output of the command on the left and use it as input for the command on the right.

List the files in the /dev/ directory. The /dev/ directory contains files that represent hardware devices.

ls /dev/

There are a lot of files in this directory. Perhaps you only want to see the entries whose names contain a particular character. Run the following command to search for the character c in the output of the ls command.

ls /dev/ | grep c

Many Linux commands produce text output. Grep searches text. Therefore, grep can be paired with many Linux commands.
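The same pipe technique works with any command that prints text. In the sketch below, printf stands in for a command that produces a lot of output. The three passwd-style lines are invented for illustration and are not taken from a real system.

```shell
# Hypothetical passwd-style lines standing in for real command output.
sample_output='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
ubuntu:x:1000:1000:Ubuntu:/home/ubuntu:/bin/bash'

# No file name is given, so grep reads the piped text from standard input.
ubuntu_line=$(printf '%s\n' "$sample_output" | grep ubuntu)
echo "$ubuntu_line"

# The -c option counts the matching lines instead of printing them.
bash_users=$(printf '%s\n' "$sample_output" | grep -c bash)
echo "$bash_users"
```

Here two of the three sample lines contain "bash," so the count is 2.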

Regular Expressions

Regular expressions are search patterns. There is nothing "regular" about them in the everyday sense. The name is a historical artifact that comes from formal language theory. Regular expressions are used in many programming languages and tools, including Linux commands such as grep.

Here are some common elements of regular expressions:

. - match any single character
* - match zero or more of the previous character
+ - match one or more of the previous character (with grep, use the -E option)
? - match zero or one of the previous character (with grep, use the -E option)
^ - match the beginning of a line
$ - match the end of a line
[] - match any one character listed in the brackets
\ - escape character. This is used to escape special characters. For example, \. would match a literal period.

There are more regular expression elements, but these are common.
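Several of these elements can be tried on a short word list before using them on real files. This is a minimal sketch with made-up words. It uses grep -E so that + and ? are treated as special characters, and it wraps each pattern in single quotes so the shell leaves it untouched.

```shell
# Hypothetical word list, one word per line.
words='color
colour
colouur
cat
c.t'

# ? makes the previous character optional: matches color and colour.
optional_u=$(printf '%s\n' "$words" | grep -E '^colou?r$')

# + requires one or more of the previous character: colour and colouur.
one_or_more_u=$(printf '%s\n' "$words" | grep -E '^colou+r$')

# . matches any single character: both cat and c.t match.
any_char=$(printf '%s\n' "$words" | grep -E '^c.t$')

# \. matches only a literal period: only c.t matches.
literal_dot=$(printf '%s\n' "$words" | grep -E '^c\.t$')

printf '%s\n' "$optional_u" "$one_or_more_u" "$any_char" "$literal_dot"
```

The ^ and $ in each pattern anchor the match to the whole line, which keeps a short pattern from matching in the middle of a longer word.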

Ensure that your working directory is the linux_finding directory.

cd ~/cyfunfiles/linux_finding

The ,s[a-z]*@ regular expression searches for email addresses in the file that start with the letter s.

grep -E ',s[a-z]*@' customer_data.txt

The email address is always preceded by a comma in this particular file, so the search starts with a comma. Then, the letter s is used to indicate that an s should immediately follow. Brackets are used to define character sets. In this case, [a-z] says to match any character from a to z. The * means "zero or more of the previous character." The @ is a literal @ symbol. So in plain English, look for a comma followed by an s, followed by zero or more letters from a to z, followed by an @ symbol.

The following grep command looks for a single digit at the start of a line, followed by a comma.

grep '^[0-9],' customer_data.txt

To match a number of any length followed by a comma, add the *, which allows zero or more digits.

grep '^[0-9]*,' customer_data.txt

The following grep command looks for any ID that ends with the number 5.

grep '^[0-9]*5,' customer_data.txt

The following grep command looks for any line that ends with the letter "x."

grep 'x$' customer_data.txt

Here, the dollar sign means "end of line."
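The patterns in this section can be checked against a few known rows to confirm what each one matches. The rows below are invented for this sketch and only assume a layout of ID, name, email, company. The real customer_data.txt may be laid out differently. Each pattern is wrapped in single quotes so that the shell does not treat [, *, or $ as its own special characters before grep sees them.

```shell
# Hypothetical rows: id,name,email,company
rows='5,Ana,ana@example.com,Acme
15,Sam,sam@example.com,Vortex
27,Sue,sue@example.com,Initech'

# A single digit at the start of the line, then a comma: only ID 5.
single_digit_id=$(printf '%s\n' "$rows" | grep '^[0-9],')

# Zero or more digits, then a 5, then a comma: IDs 5 and 15.
ends_in_5=$(printf '%s\n' "$rows" | grep '^[0-9]*5,')

# A comma, an s, lowercase letters, then @: the sam and sue rows.
s_emails=$(printf '%s\n' "$rows" | grep -E ',s[a-z]*@')

# An x at the end of the line: only the Vortex row.
ends_with_x=$(printf '%s\n' "$rows" | grep 'x$')

printf '%s\n' "$single_digit_id" "$ends_in_5" "$s_emails" "$ends_with_x"
```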

Challenge

Find the lines in /var/www/html/index.html that contain the word "please."
- Find the lines that contain "Please" with a capital P in the same file.
- Look at the grep manual to find a way to perform a case-insensitive search.
How many employees work for the company "Tavu?" The file customer_data.txt in the linux_finding directory contains customer information.
Use grep to look for hardware devices that have "cd" in the name. (The /dev/ directory contains files that represent hardware devices.)
The following regular expression to look for a social security number is flawed. How can you fix it?

echo "123-45-3820" | grep [0-9]?-[0-9]?-[0-9]?

When would it be most useful to search by file name?
When would it be most useful to search for content within files?

Key Terms

grep: A command-line utility in Unix-like operating systems used to search for specific patterns within files. It stands for "Global Regular Expression Print" and allows users to filter and display lines in a file that match a given regular expression or string.
Regular Expressions: A sequence of characters that define a search pattern, often used for pattern matching within strings. Regular expressions are powerful tools for text processing, allowing complex search and replace operations. They are used in various programming languages and tools, including grep, sed, and text editors.