2.5.2. Searching files

Now that you know how you can inspect files in a directory and what the output of ls -l means, it’s time to learn how to search for specific files. There are a large number of files on a typical Unix system, so finding that one file you want can be hard.

Here we present two general strategies:

  • Searching for particular files by their name, extension, or some other properties.

  • Searching in files, to find files that contain some specific text.

We’ll discuss useful commands for each strategy separately.

2.5.2.1. Searching for particular files

If you know what file or what kind of file you are looking for, there are various useful commands that you could use:

  • ls -R: using ls with the -R (for recursive) option will show you all files in all subdirectories of the directory you’re listing. This can quickly give a lot of information!

    Exercise 2.54

    Try this in the root directory:

    $ ls -R /
    

    What do you think ls -lR will do? Try it.

  • which: the output of which name is the full path of the program name that the shell would run if you typed name (which searches the directories listed in your PATH variable). Use this if you want to find out which version of a certain program you're actually using.

  • whereis: does more or less the same as which, but it searches a fixed list of standard places where Unix stores its programs (some versions also search the directories in PATH), so it may not find programs you made yourself. When it finds a program, it also tries to print the location of its man page.

    Exercise 2.55

    Use which and whereis to find out where the ls, which and whereis commands are located.

  • find: the Swiss army knife of search tools. This command has a lot of options. The most often used one is -name, for example:

    $ find / -name "*.jpg"
    

    This instructs find to look at all file and directory names in the root directory /, or any of its subdirectories (so it searches recursively through the directory tree, similar to ls -R), and to report only the names that end in .jpg. The quotes around *.jpg matter: without them, the shell could replace the pattern with matching file names from the current directory before find ever sees it. If you want to exclude the possibility that find reports a directory whose name ends in .jpg (uncommon, but possible), you can explicitly ask for regular files only with the -type f argument.

    Exercise 2.56

    Find all files below /usr/share which end in .txt. Also find all files below /etc.

    Other useful options of find let you select only files newer or older than a certain date, files of certain types or sizes, files with certain permissions, files belonging to a certain user or group, etc. See man find for the complete overview. For example:

    $ find . -size +80k -ctime +100
    

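    This recursively finds everything in and below the current directory that is larger than 80 kilobytes and whose status was last changed more than 100 days ago. Note that -ctime looks at the time the file's status (contents, permissions, owner, etc.) last changed; use -mtime to select on the time the contents were last modified.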

  • locate: finds a file regardless of where it is on the system. Instead of actually searching the disk, it consults a database of file names, which is usually updated once a day (by the updatedb command). This is much faster, but it won't show files that were created after the last update.

  • file: shows information on file types. Just by the name, it's often hard to judge what a file contains: it might be a program, some text, some program source code, a figure, an image, a sound, etc. The file command examines the contents of a file and tries to give an accurate description of them.

    Exercise 2.57

    Try this in your home directory:

    $ file *
    

    You can use file to find out whether you can view a file on the screen or not. File types you can show on the screen include ASCII text, C++ source, Bourne shell script text, Python scripts, etc. File types you cannot view this easily include data, directory, symbolic link to …, executable, etc.
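If you want to see how the find options above behave without searching the whole disk, you can try them on a small throwaway directory tree. The sketch below assumes a POSIX shell with GNU or BSD find; every file and directory name in it (photos, old.jpg, notes.txt, big.bin, etc.) is made up for the example, and the tree is deleted at the end.

```shell
# Build a throwaway directory tree; all names here are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/photos/old.jpg" "$demo/docs"   # note: old.jpg is a *directory*
touch "$demo/photos/beach.jpg" "$demo/photos/cat.JPG" "$demo/docs/notes.txt"
# A 100 KiB file of zero bytes, to try -size on.
dd if=/dev/zero of="$demo/docs/big.bin" bs=1024 count=100 2>/dev/null
# Set the modification time of notes.txt back to 1 January 2020.
touch -t 202001010000 "$demo/docs/notes.txt"

# -name is case-sensitive and matches directories as well as files.
by_name=$(find "$demo" -name "*.jpg" | sort)
# -type f restricts the results to regular files.
jpg_files=$(find "$demo" -type f -name "*.jpg")
# Regular files larger than 80 kilobytes.
big=$(find "$demo" -type f -size +80k)
# Contents last modified more than 100 days ago.
old_mtime=$(find "$demo" -type f -mtime +100)
# Status last changed more than 100 days ago: nothing, because
# touch and dd changed the status of every file just now.
old_ctime=$(find "$demo" -type f -ctime +100)

echo "$by_name"
echo "$jpg_files"
echo "$big"
echo "$old_mtime"
echo "${old_ctime:-(no files)}"
rm -rf "$demo"
```

Here cat.JPG is not reported because -name is case-sensitive (GNU and BSD find also offer -iname for case-insensitive matching), and the empty -ctime result shows that -ctime and -mtime can give quite different answers for the same files.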

2.5.2.2. Searching through files

Besides looking for particular files, you may also want to find files containing particular pieces of text. In this case, the command to use is grep. This command takes as parameters a string to search for and one or more files:

$ grep "19" myfile.txt

The search string can be just a piece of text; here it's 19. It's a good idea to put it between quotes ("), otherwise the shell splits a search string that contains spaces into several parameters, and grep would treat every word after the first one as a file name.
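To try this out without relying on files that may or may not exist on your system, you can create a small sample file first. In this sketch the sample lines are made up, and the temporary file is removed at the end.

```shell
# A small, made-up sample file to search in.
sample=$(mktemp)
printf '%s\n' "born in 1984" "room 19" "page 42" "in 2019 and 2020" > "$sample"

# Every line containing the characters 19 anywhere (1st, 2nd and 4th line).
with_19=$(grep "19" "$sample")
# The quotes make "room 19" a single parameter. Without them, grep would
# search for "room" in two files: one named 19 and the sample file.
room=$(grep "room 19" "$sample")

echo "$with_19"
echo "$room"
rm -f "$sample"
```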

However, you can also supply grep with a regular expression. This is an expression matching various strings according to a number of rules. Regular expressions use so-called meta characters:

  • "^string" matches string only when it is located at the start of a line.

  • "string$" matches string only when it is located at the end of the line.

  • "." matches any single character.

  • "[abc]" matches an a, b or c.

  • "[0123456789]" or "[0-9]" matches any single digit.

  • "[^abc]" matches any single character except a, b or c.

  • "[abc]?" matches zero or one occurrences of any of the characters a, b or c.

  • "[abc]*" matches zero or more occurrences of any of the characters a, b or c.

  • "[abc]+" matches one or more occurrences of any of the characters a, b or c.

  • "[abc]{n}" matches n occurrences of any of the characters a, b or c.
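The sketch below tries most of these metacharacters on a made-up sample file. Note that it uses grep -E for the patterns with +, ? and {n}: those operators belong to the extended regular expression syntax, which grep only uses when you pass -E.

```shell
# A made-up sample file with one word or phrase per line.
sample=$(mktemp)
printf '%s\n' apple banana cherry abc123 "year 2024" > "$sample"

starts_a=$(grep "^a" "$sample")      # lines that start with a
ends_a=$(grep "a$" "$sample")        # lines that end with a
one_any=$(grep "ch.rry" "$sample")   # . stands for any single character
digit=$(grep "[0-9]" "$sample")      # lines containing at least one digit
not_ab=$(grep "^[^ab]" "$sample")    # first character is neither a nor b
# ?, + and {n} are extended syntax, hence grep -E:
maybe_b=$(grep -E "^ab?c" "$sample")          # a, an optional b, then c
only_abn=$(grep -E "^[abn]+$" "$sample")      # nothing but a, b and n
four_digits=$(grep -E "[0-9]{4}" "$sample")   # four digits in a row

echo "$not_ab"
echo "$four_digits"
rm -f "$sample"
```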

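Note that the last three operators (?, + and {n}) belong to the extended regular expression syntax, which you switch on with grep -E. In grep's default basic syntax these characters have no special meaning; there you write \{n\} instead, and GNU grep also accepts \? and \+.

To search for any of the characters used as meta characters, you can put a \ in front. For example, to look for some text between square brackets, use grep "\[.*\]".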
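The effect of such a backslash is easiest to see with a dot, which otherwise matches any character. In this sketch the three sample lines are made up for the example.

```shell
# Made-up sample lines: a real dot, a look-alike, and square brackets.
sample=$(mktemp)
printf '%s\n' "pi is 3.14" "room 3x14" "see [note 1]" > "$sample"

dot_any=$(grep "3.14" "$sample")       # unescaped . also matches the x
dot_literal=$(grep "3\.14" "$sample")  # \. matches only a real dot
brackets=$(grep "\[.*\]" "$sample")    # some text between square brackets

echo "$dot_any"
echo "$dot_literal"
echo "$brackets"
rm -f "$sample"
```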

Exercise 2.58

Use grep to find the following expressions in /usr/share/common-licenses/GPL. In each case, try to predict what the output will be:

  • "[A-Z]"

  • " [A-Z] "

  • " [A-Z][a-z]*"

  • "U[a-z]* "

  • "U[a-z]\{3\} "

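Note that in the last pattern, the { and }-signs are “escaped” by putting a \ in front of them. In grep's default basic regular expression syntax, a plain { and } just match themselves, so "U[a-z]{3} " would look for the literal text {3}; the escaped \{3\} means “exactly three of the preceding item”. (With grep -E you would write "U[a-z]{3} " instead.) The double quotes, in turn, make sure the shell passes the whole pattern to grep unchanged; more about this later, in Section 2.7.4.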
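You can check the difference between the basic and the extended syntax on a few made-up lines. This sketch assumes GNU or BSD grep; the sample text is invented so that each pattern matches a different line.

```shell
# Made-up sample lines; only one of them contains literal braces.
sample=$(mktemp)
printf '%s\n' "Under the hood" "Upon the hill" "see Ux{3} here" > "$sample"

# Basic syntax: \{3\} means "exactly three of the preceding item".
bre_escaped=$(grep "U[a-z]\{3\} " "$sample")
# Extended syntax (grep -E): the same pattern, written without backslashes.
ere=$(grep -E "U[a-z]{3} " "$sample")
# Basic syntax without backslashes: {3} is just the literal text "{3}".
bre_plain=$(grep "U[a-z]{3} " "$sample")

echo "$bre_escaped"
echo "$ere"
echo "$bre_plain"
rm -f "$sample"
```

"Under the hood" matches none of the patterns: after the U come four lowercase letters, not three followed by a space.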

It is also possible to search in multiple files, simply by using a wildcard for the filename, for example:

$ grep "42" *.txt
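When grep searches more than one file, it puts the name of the file in front of every matching line. The sketch below shows this on a made-up directory; the file names a.txt, b.txt and c.log are hypothetical.

```shell
# A made-up directory with two .txt files and one .log file.
dir=$(mktemp -d)
printf '%s\n' "answer 42" "nothing here" > "$dir/a.txt"
printf '%s\n' "room 142" > "$dir/b.txt"
printf '%s\n' "42, but not a .txt file" > "$dir/c.log"

# The shell expands *.txt to "a.txt b.txt" before grep runs, so c.log
# is never searched. With several files, grep prefixes the file name.
hits=$(cd "$dir" && grep "42" *.txt)

echo "$hits"
rm -rf "$dir"
```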

Exercise 2.59

Find the lines containing the word Yoyodyne in all licenses in /usr/share/common-licenses.

We can combine finding files with searching in them using a clever find option, -exec, which makes find execute a command for every file it finds. The command to execute is specified by the arguments after -exec, up to a final \; argument, which marks the end of that command. (The backslash is needed because the shell would otherwise treat ; as the end of the whole find command line.) Within this command, find replaces each {} argument by the path it found.

For example, we can run a grep command separately on all files found by find like this:

$ find /bin/ -type f -exec grep -H "# Copyright" {} \;
/bin/zmore:# Copyright (C) 2001, 2002, 2007 Free Software Foundation
/bin/zmore:# Copyright (C) 1992, 1993 Jean-loup Gailly
/bin/zdiff:# Copyright (C) 1998, 2002, 2006-2007, 2009-2022 Free Software Foundation, Inc.
/bin/zdiff:# Copyright (C) 1993 Jean-loup Gailly

(This probably produces many more results than shown here.)

So in this example, find /bin/ -type f produces a list of file paths, such as /bin/zmore, /bin/zdiff, etcetera. As a result, find executes grep -H "# Copyright" /bin/zmore, then grep -H "# Copyright" /bin/zdiff, etcetera (the -H option makes grep print the filename in front of every matching line).
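Because the output for /bin differs from system to system, you may want to try -exec on a tree whose contents you know. In this sketch the directory layout, file names and copyright lines are all made up; sort is added only because find lists files in whatever order the directories happen to store them.

```shell
# A made-up tree: two files with a copyright line, one without.
demo=$(mktemp -d)
mkdir -p "$demo/src/lib"
printf '%s\n' "# Copyright (C) 2024 Example Author" "echo hi" > "$demo/src/run.sh"
printf '%s\n' "# Copyright (C) 2023 Someone Else" > "$demo/src/lib/helper.sh"
printf '%s\n' "no notice in this file" > "$demo/src/lib/util.sh"

# For every regular file, find runs: grep -H "# Copyright" <path>
# ({} is replaced by the path, and \; ends the command).
hits=$(find "$demo" -type f -exec grep -H "# Copyright" {} \; | sort)

echo "$hits"
rm -rf "$demo"
```

grep is also run on util.sh, but since that file has no matching line it contributes nothing to the output.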