Mastering the Shell and Advanced Command-line Techniques

What is a shell?

This document is about using your shell to its full potential. It's about the features of a shell that can save you a lot of time and effort, to the point of making things possible that would otherwise be unthinkable. In a sense, this is almost like an introduction to shell programming, but instead of exhaustively covering all capabilities and syntax, we'll focus on becoming familiar with tools and techniques that we can use on the fly.

So what's a shell, anyway? You use one, even if you don't know what it is. If you're running Linux, chances are that your shell is Bash. So what is it exactly, and what does it do for you?

A shell is a program that serves as a command-line interface between the user and the computer. It's the program that runs inside of a terminal window.

Most new users don't realize that the shell is a standalone program. When they open up a terminal, they think that they're getting to the heart of the computer (confusing the shell with the kernel), but that isn't a very accurate view of what's happening. I'll give you a real-life example in order to demonstrate what the shell is and what it's doing.

What the shell is thinking

You're in an "Introduction to C" class, and you have a project due tomorrow. You have to write a program that prints out "Hello, world!" to your screen. So you open up a terminal, and start to work. Maybe your session looks something like this:

[amcnabb@iocaine amcnabb]$ mkdir mynewfolder
[amcnabb@iocaine amcnabb]$ cd mynewfolder/
[amcnabb@iocaine mynewfolder]$ vi hello.c
[amcnabb@iocaine mynewfolder]$ gcc hello.c -o hello
[amcnabb@iocaine mynewfolder]$ ./hello
Hello, world!
[amcnabb@iocaine mynewfolder]$

That should look familiar. If it were a Java class, then it would have been Hello.java instead of hello.c and javac instead of gcc, and if you have serious problems it may have been emacs instead of vi, but it's the same idea.

Now, you can't understand computers without a little healthy personification, so let's look at what Mr. Bash is thinking:

"Let's give the user a prompt by printing out '[amcnabb@iocaine amcnabb]$ '.
Okay, now they've typed 'mkdir mynewfolder.' Let's fork and execute the 'mkdir' program for them.
Very nice. Now let's print out another prompt.
Okay, now they've said, 'cd mynewfolder/'--that means we have to change
our current working directory to ./mynewfolder.
Very good. Now we'll print out another prompt.
Now they've said 'vi hello.c' so we have to run the 'vi' program and wait for them to write their program."
etc., etc.

Hopefully you get the idea. As you can see, the shell is processing all of the commands that the user is typing in.
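The difference between a command the shell forks and executes (like mkdir) and one it has to handle itself (like cd, which changes the shell's own working directory) can be checked from the prompt. The following is a minimal sketch, assuming a Bash-like shell; the exact wording that type prints varies between shells.

```shell
# Ask the shell how it would run each command name.
cd_kind=$(type cd)               # cd changes the shell's own directory, so it is built in
mkdir_path=$(command -v mkdir)   # mkdir is a separate program found through $PATH

echo "$cd_kind"
echo "$mkdir_path"
```

In Bash the first line of output typically says that cd is a shell builtin, while the second is the path of the mkdir executable.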

Time Saver

So far, we showed what the shell is, and what it's doing, and it all looks pretty simple: we type stuff, and the shell goes and does it. If it's all so very simple, how am I going to save any time?

The secret is that shells have been designed to do more than just run commands that the users type in. In fact, your shell is actually a '''simple programming language'''. I don't think you understand how significant that concept is, so I'm going to make a new section break to get your attention.

The significance of the shell being a programming language

So, every time you type something in a terminal, you are unwittingly writing a computer program on the fly. Cool, eh? I thought so, too.

You can take a lot of features you've learned from other programming languages such as loops, variables, tests, etc. and use them to save you time in everyday command-line life. There are also a ton of cool features that you don't see much in other programming languages such as pipes, file redirection, interaction with system tools and more.

The hard way

So, let's look at a situation: you just wrote a cool program. Because your mom is so proud of you whenever you make an accomplishment you decide to email her a copy of your code, so she can see how cool it is for herself. You have four C files, and a header file, and you want to send each of them as an individual message. You decide to use the command-line mail program to send the files. In the following session, you run mail five times, once for each file:

[amcnabb@iocaine mycoolprogram]$ mail mom@yahoo.com
Subject: coolprogram.c
    (You copy and paste coolprogram.c here)
.
[amcnabb@iocaine mycoolprogram]$ mail mom@yahoo.com
Subject: coolprogram.h
    (You copy and paste coolprogram.h here)
.
[amcnabb@iocaine mycoolprogram]$ mail mom@yahoo.com
Subject: mygui.c
    (You copy and paste mygui.c here)
.
[amcnabb@iocaine mycoolprogram]$ mail mom@yahoo.com
Subject: mynetcode.c
    (You copy and paste mynetcode.c here)
.
[amcnabb@iocaine mycoolprogram]$ mail mom@yahoo.com
Subject: mydatabase.c
    (You copy and paste mydatabase.c here)
.
[amcnabb@iocaine mycoolprogram]$

That was pretty long and tedious. Mom had better like your code or you're going to be really mad with all of the work you put into it. But there's an easier way.

The easy way

The following session shows how you could have saved yourself a lot of work by efficiently programming on the fly (and it's not even difficult!):

[amcnabb@iocaine mycoolprogram]$ for i in coolprogram.c coolprogram.h mygui.c mynetcode.c mydatabase.c; do
> mail -s "$i" mom@yahoo.com <"$i"
> done
[amcnabb@iocaine mycoolprogram]$

Man, that was so much quicker! And in fact, if we were using tab-completion, then we did almost no typing at all.
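If you want to try the loop without actually mailing anybody, here is a rough sketch of the same idea. It makes a few assumptions that are not part of the session above: send_file is a hypothetical stand-in for mail that only prints what it would do, the files are created in a scratch directory, and the file list comes from a glob (covered later in this document) instead of being typed out.

```shell
# Stand-in for "mail -s SUBJECT ADDRESS <FILE" so the sketch sends nothing.
# send_file is a hypothetical helper, not a real command.
send_file() {
    printf 'would mail %s to %s\n' "$1" "$2"
}

workdir=$(mktemp -d)                         # scratch directory for the demo
printf 'int main(void) { return 0; }\n' >"$workdir/coolprogram.c"
printf '#define COOL 1\n' >"$workdir/coolprogram.h"

sent=0
for f in "$workdir"/*.c "$workdir"/*.h; do   # the globs expand to the file list
    send_file "$f" mom@yahoo.com
    sent=$((sent + 1))
done
echo "messages: $sent"

rm -rf "$workdir"
```

Swapping echo or a stand-in for the real command is a cheap way to check a loop before letting it loose.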

Now, you may say, "My mom is computer illiterate. She doesn't want to look at my code. This is a stupid help document." But if you're any more computer literate than she is, then you can just imagine that there are countless situations where you could save yourself a ton of time and typing. In the rest of this document, we'll review a lot of cool shell tricks.

Keystroke Saving Tricks

Tab Completion

Tab completion is one of the coolest and simplest things out there. When you're typing a command name or filename on the command line and you hit the tab key, the shell will look at what you've already typed and try to complete the name. Let's look at an example:

[amcnabb@iocaine amcnabb]$ vi myp

Now we hit the tab key and...:

[amcnabb@iocaine amcnabb]$ vi myprogramwhoselongfilenameusedtomakeitannoyingtoedit.c

I think that's pretty straightforward. The only other issue with tab completion is the situation where you have several files that match. Let's assume that after the previous example we create a new file called myprogram.h. Let's see what happens this time:

[amcnabb@iocaine amcnabb]$ vi myp

Then we hit tab:

[amcnabb@iocaine amcnabb]$ vi myprogram

At this point the shell has completed the name as far as it can because there are two filenames that match what we've given. Now when we hit the tab key two more times it gives us a handy little menu, showing us the names of all the files we might want:

[amcnabb@iocaine amcnabb]$ vi myprogram
myprogram.h
myprogramwhoselongfilenameusedtomakeitannoyingtoedit.c
[amcnabb@iocaine amcnabb]$ vi myprogram

So now we just type the next letter of the filename we want and then hit tab again:

[amcnabb@iocaine amcnabb]$ vi myprogram
myprogram.h
myprogramwhoselongfilenameusedtomakeitannoyingtoedit.c
[amcnabb@iocaine amcnabb]$ vi myprogramwhoselongfilenameusedtomakeitannoyingtoedit.c

Be aware that certain shells have additional tab-completion features when so configured. Zsh will let you hit tab repeatedly to cycle through the menu of possible files. Bash and Zsh will both let you tab complete when using scp to copy files from a remote system (the shell will connect and use a directory listing to find possible matching files).

Globbing (Pathname Expansion)

Globbing is the shell's feature that lets you refer to "hello*" and get every file that begins with "hello". This saves the effort of typing each filename individually. Matching with "*" is only the beginning.

All normal shells support basic globbing, which includes *, ?, and []. Most shells also offer some form of extended globbing with additional features. Unfortunately, every shell has its own unique system of extended globbing. Read man pages for more information.

In basic globbing, the question mark (?) matches any single character. So "ab?" matches "abc", "abd", "aba", etc., but it doesn't match "ab" or "abcd". The star (*) matches any character any number of times (zero or more). The pattern "ab*" matches "ab", "abcd", and "abthiswillbethelastexample".

Note that "*" will not match a leading "." (for hidden files). However, ".*" will match all hidden files. Older versions of Bash will also match "." and ".." given ".*" (Bash 5.2 and later skip them by default). This often leads to unpredictable results. Be very careful. In Bash, the pattern ".??*" will not match "." or "..", but it will match most hidden files (it misses names with only one character after the dot). Zsh will never match any glob expression to "." or "..", which is much better behavior.

The final element of basic globbing is the character class, denoted by brackets ([]). The character class is like a question mark (?) except that instead of matching any character, it matches any character in a given set. If you want to match a file called "aax", "aay", or "aaz", the correct expression is "aa[xyz]" or "aa[x-z]". As a more complex example, the class [a-e0-9yz+-] will match any letter from "a" to "e", any digit, "y", "z", "+", or "-". There are also predefined character classes (POSIX character classes), defined as follows:

[[:alnum:]]: Alphanumeric characters (numbers and letters): [A-Za-z0-9]
[[:alpha:]]: Alphabetic characters (letters): [A-Za-z]
[[:ascii:]]: ASCII characters (codes 0 to 127)
[[:blank:]]: Space or tab (not newline, etc.)
[[:cntrl:]]: Control characters (if you don't know what it is, you don't want it)
[[:digit:]]: Numerical characters: [0-9]
[[:graph:]]: Graphic printable characters (ASCII 33 to 126); the same as [[:print:]] without the space character.
[[:lower:]]: Lowercase letters
[[:print:]]: Printable characters (ASCII 32 to 126); the same as [[:graph:]] but with the space character.
[[:punct:]]: Punctuation characters
[[:space:]]: Whitespace (space, tab, newline, etc.)
[[:upper:]]: Uppercase letters
[[:word:]]: Word character: [A-Za-z0-9_]
[[:xdigit:]]: Hexadecimal digits (very cool): [0-9A-Fa-f]

You can also negate character classes. The expression [^[:punct:]] will match any character that is not a punctuation character. (POSIX writes the negation with "!", as in [![:punct:]]; Bash accepts both forms.)
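To see these patterns in action without touching your real files, here is a small sketch that builds a scratch directory and expands a few globs in it. The filenames are made up for illustration, and the sketch assumes a POSIX-style shell such as Bash.

```shell
demo=$(mktemp -d)            # scratch directory
cd "$demo" || exit 1
touch ab abc abd abcd aax aay aa1 .hidden

q=$(echo ab?)                # exactly one character after "ab"
s=$(echo ab*)                # zero or more characters after "ab"
c=$(echo aa[x-z])            # one character from the range x to z
n=$(echo aa[![:digit:]])     # one character that is not a digit
h=$(echo .??*)               # hidden names, but never "." or ".."

printf '%s\n' "$q" "$s" "$c" "$n" "$h"

cd "$OLDPWD" && rm -rf "$demo"
```

Using echo with a pattern is a handy way to preview what a glob will expand to before you hand it to a command that changes things.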

Aliases

An alias is a nickname for a command. You can create an alias on the fly, but most often people create aliases for commands they commonly use and store them in their .bashrc. See the Bash Configuration howto for information on how to do that. We'll give a few examples here.

If I use the Mutt mail reading program, and typing mutt every day is just too long and tedious for me, I could create an alias like this:

[amcnabb@iocaine amcnabb]$ alias m="mutt"
[amcnabb@iocaine amcnabb]$

So from now on, any time I type m it will start up my mail reader. Note once again that you must put this in your .bashrc if you want it to continue to work the next time you log in.

The following is an alias that many, many people use. As you know ls -l gives you a detailed directory listing, and it is a command that is very commonly used:

[amcnabb@iocaine howto]$ alias ll="ls -l"
[amcnabb@iocaine howto]$ ll
total 36
drwxr-xr-x    2 amcnabb  admin        4096 Jul  3 11:35 CVS
-rw-r--r--    1 amcnabb  admin         351 Jun 26 10:25 audience.var
-rw-r--r--    1 amcnabb  admin          12 Jun 26 10:25 authors.var
-rw-r--r--    1 amcnabb  admin         382 Jun 26 10:25 descr.var
lrwxrwxrwx    1 amcnabb  admin          13 Jul 17 16:32 footer.inc -> ../footer.inc
lrwxrwxrwx    1 amcnabb  admin          13 Jul 17 16:32 header.inc -> ../header.inc
-rw-r--r--    1 amcnabb  admin       15999 Jun 26 10:25 src.html
-rw-r--r--    1 amcnabb  admin          39 Jul  3 11:35 title.var
[amcnabb@iocaine howto]$

And it never hurts to have a nice complicated example:

[amcnabb@iocaine amcnabb]$ alias trumpetps="ssh -t trumpet ps aux '|' grep"
[amcnabb@iocaine amcnabb]$ trumpetps bash
amcnabb@trumpet's password:
amcnabb  22026  0.0  0.2  2048  872 pts/0    S    10:30   0:00 bash -c ps aux |
amcnabb  22029  0.0  0.1  1504  444 pts/0    S    10:30   0:00 grep bash
Connection to trumpet closed.
[amcnabb@iocaine amcnabb]$

If you're going to do anything more complicated than that, you'll have to use functions or shell scripts. It's good stuff, but we won't cover it here. The man page will help you out.
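Just for orientation, here is a rough sketch of a case that an alias can't handle but a function can: using the same argument twice. The name mkcd is hypothetical (it is not a standard command), and the sketch runs in a scratch directory.

```shell
# mkcd is a hypothetical name: make a directory and change into it.
# An alias only substitutes text at the front of the command line,
# so it cannot reuse its argument the way "$1" is reused here.
mkcd() {
    mkdir -p -- "$1" && cd -- "$1"
}

base=$(mktemp -d)            # scratch directory
cd "$base" || exit 1
mkcd mynewfolder
here=$(basename "$PWD")      # name of the directory we ended up in
echo "$here"

cd / && rm -rf "$base"
```

Like an alias, a function defined at the prompt disappears when the shell exits unless you put it in your .bashrc.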

Redirection, Pipes, and Backticks

About stdin, stdout, and stderr

Unix associates three special "files" with each program: stdin (Standard Input), stdout (Standard Output), and stderr (Standard Error). Normally these files are mapped to your terminal, so that when you type something, the program does a read on stdin, and when it writes output on stdout, you see it on your screen. Let's look at mail again, this time paying attention to stdin and stdout. First, here's what the session looks like:

[amcnabb@iocaine amcnabb]$ mail ihatewindows@microsoft.com
Subject: Windows is Stupid!!
Windows sure is stupid!!!

Have a nice day.
.
Cc: billgates@microsoft.com
[amcnabb@iocaine amcnabb]$

So what's mail thinking here? This time we'll look at that same session from mail's perspective, with a pseudo-log of writes and reads:

write-to-stdout: "Subject: "
read-from-stdin: "Windows is Stupid!!"
read-from-stdin: "Windows sure is stupid!!!"
read-from-stdin: ""
read-from-stdin: "Have a nice day."
read-from-stdin: "."
write-to-stdout: "Cc: "
read-from-stdin: "billgates@microsoft.com"

That should be really, really clear.

Introduction to Redirection

This is where it starts to get cool. You can redirect Standard Input, Output, and Error. In other words, you can tell a program to use some other file for its input and output instead of your terminal. One example (of many) where you would want to do this is if you want to log something. We can redirect the output of ls, thus saving our directory listing to a file. Here's how that works:

[amcnabb@iocaine amcnabb]$ ls /usr >usr-listing
[amcnabb@iocaine amcnabb]$ cat usr-listing
X11R6
bin
dict
doc
etc
games
i386-glibc21-linux
i486-linux-libc5
include
java
kerberos
lib
libexec
local
man
network
sbin
share
src
tmp
[amcnabb@iocaine amcnabb]$

Pretty cool, eh? Now here's an example of redirecting standard input. We'll use mail again:

[amcnabb@iocaine amcnabb]$ mail -s "My own personal hatemail to Bill Gates" billgates@microsoft.com <hatemailfile
[amcnabb@iocaine amcnabb]$

So, in that example we sent a message to Bill Gates with the subject "My own personal hatemail to Bill Gates" and we took the body of our message from the file hatemailfile. Before you know it, he'll have launched the competitionkeeper missiles (Dilbert reference).

Details of Redirection

There are various kinds of redirection, so we'll look at some examples here.

command >hello:
 Take the standard output of command and write it to file hello, writing over any existing file with that name.
command <hello:
 Use file hello as standard input for command.
command 2>hello:
 Take the standard error of command and write it to file hello, writing over any existing file with that name.
command &>hello:
 Take both the standard output and standard error of command and write it to file hello, writing over any existing file with that name.
command >>hello:
 Take the standard output of command and write it to file hello, appending to the end of any existing file with that name.
command <<WORD:
 This one is especially useful from shell scripts. It gets standard input by reading the following lines up to a line containing only WORD (which can be any string).

File redirects can get much more complicated, but that's beyond the scope of this document. If you need more detail the best thing to do is to check out the man page for your shell.
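The redirections listed above can be tried out safely with temporary files. This is only a sketch: the path /nonexistent-path-for-demo is an assumption chosen because it should not exist, so that ls has something to complain about on standard error.

```shell
out=$(mktemp)   # will collect standard output
err=$(mktemp)   # will collect standard error

echo "first line" >"$out"       # > truncates the file, then writes
echo "second line" >>"$out"     # >> appends to the end

# 2> captures only the error message; "|| true" ignores the failure status.
ls /nonexistent-path-for-demo 2>"$err" || true

lines=$(wc -l <"$out")          # < feeds the file to wc as standard input

# <<END: every line up to the one containing only END becomes stdin for cat.
heredoc=$(cat <<END
hello from a here-document
END
)

if [ -s "$err" ]; then captured=yes; else captured=no; fi
echo "lines: $lines"
echo "error message captured: $captured"
echo "$heredoc"

rm -f "$out" "$err"
```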

Introduction to Pipes

Pipes are where this really gets cool. A pipe is a connection between the output of one program and the input of another. Let's start out with an example. We'll send an email containing a listing of the /usr directory by taking the output of ls and using it as the input for mail:

[amcnabb@iocaine amcnabb]$ ls /usr |mail -s "Listing of /usr" myfriend@hotmail.com
[amcnabb@iocaine amcnabb]$

That's it. You just put a vertical bar in between the two commands and the rest happens automatically.

There's nothing stopping you from stringing 17 commands together with pipes. In fact, that's very common practice, and you can do some really cool tricks that way. A string of commands connected by pipes is known as a pipeline.

Before we move on, let's look at an example of a complex pipeline. Say we want to send our boss a list of the names of all of the users with accounts on our machine who use the shell /bin/bash. We could either go through the /etc/passwd file line by line, copying and pasting the users' names to another file, which we then mail off to our boss, or we can do a simple pipeline:

[amcnabb@iocaine amcnabb]$ grep '/bin/bash$' /etc/passwd |cut -d':' -f5 |mail -s "Users who use /bin/bash" boss@mycompany.com
[amcnabb@iocaine amcnabb]$

The possibilities are endless, especially when you are familiar with all of the UNIX tools that are available. We'll discuss these tools in detail shortly.
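Here is a sketch of the same kind of pipeline that you can run without mailing anyone or depending on your real /etc/passwd. The three accounts are made up for illustration, and a final sort stage stands in for mail.

```shell
# Sample data in /etc/passwd format; the accounts are invented.
passwd_sample=$(mktemp)
cat >"$passwd_sample" <<'EOF'
carol:x:1002:1002:Carol Example:/home/carol:/bin/bash
bob:x:1001:1001:Bob Example:/home/bob:/bin/zsh
alice:x:1000:1000:Alice Example:/home/alice:/bin/bash
EOF

# Stage 1 keeps the lines ending in /bin/bash, stage 2 cuts out field 5
# (the full name), and stage 3 sorts the names. Each stage reads the
# previous stage's standard output as its standard input.
bash_users=$(grep '/bin/bash$' "$passwd_sample" | cut -d':' -f5 | sort)
echo "$bash_users"

rm -f "$passwd_sample"
```

Building a pipeline one stage at a time, and looking at the output after each stage, is usually the quickest way to get it right.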

Backticks

The backtick is the backwards apostrophe left of the 1 key. Backticks in shell programming serve a purpose similar to pipes. In the same way that pipes let you take the output of one command as the '''input''' of another, backticks let you take the output of one command as an '''argument''' to another.

Let's do a quick but cool example with backticks. As I mentioned earlier, I prefer zsh to bash. Suppose one day I get in a bad mood, and I'm mad at bash. In fact, I'm furious. In fact, I absolutely hate anyone who uses bash. So I decide to take revenge (don't try this at home!!):

iocaine# rm -rf `grep '/bin/bash$' /etc/passwd |cut -d':' -f6`
iocaine#

Is that cool or what?? If you didn't follow that, I just completely deleted all of the files of all of the bash-users on my system. Good stuff! Anyway, we'll talk about grep and cut in the next section.

You should also be aware that in many shells $(ls) is equivalent to `ls`. Some people prefer one to the other, but they work the same.
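A safer way to experiment with command substitution is to put echo in front of the outer command, so that you see the arguments it would receive instead of running it. The following sketch assumes made-up account data in a temporary file and uses the $(...) form.

```shell
sample=$(mktemp)
cat >"$sample" <<'EOF'
alice:x:1000:1000:Alice Example:/home/alice:/bin/bash
bob:x:1001:1001:Bob Example:/home/bob:/bin/zsh
EOF

# The inner pipeline runs first; its output becomes arguments to the
# outer command. The leading echo turns the dangerous command into a
# harmless preview of what would have been run.
preview=$(echo rm -rf $(grep '/bin/bash$' "$sample" | cut -d':' -f6))
echo "$preview"

# $(...) nests without the backslash escaping that nested backticks need.
owner=$(basename "$(dirname /home/alice/src)")
echo "$owner"

rm -f "$sample"
```

The easy nesting is the main practical reason some people prefer $(...) over backticks.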

Xargs

Xargs is in this part of the document instead of under UNIX Tools because it goes right along with backticks. It also demonstrates pipes and other wonderful principles.

You can look at Xargs' functionality as a sort of specialized form of backticks. The key difference is that instead of being a part of the shell it is a standalone program. Xargs takes its input and uses it as arguments to some new program that it executes. Here's an example:

[amcnabb@iocaine amcnabb]$ ls *.java |grep -v retarded |xargs javac
[amcnabb@iocaine amcnabb]$

So, we just compiled any Java files that don't contain "retarded" in their filename (so hello.java will compile but thisisretarded.java won't). This is equivalent to the following example using backticks (which we covered in the last section):

[amcnabb@iocaine amcnabb]$ javac `ls *.java |grep -v retarded`
[amcnabb@iocaine amcnabb]$

When would you want to use backticks, and when would you want to use Xargs? In most cases it's simply up to your taste, but in certain cases there is a significant difference. You should use Xargs if you're passing a large number of arguments, as it is very flexible. It will call several instances of your program so that there aren't too many arguments at any one time, and you can even set it up to call several instances in parallel (at the same time), which can really speed it up. However, in some cases the backticks are more flexible. It's good to know both of them.

Go ahead and man xargs for more detail.
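To watch Xargs split its input into several invocations, you can ask for a small batch size. This is a minimal sketch using echo as the program being run; the letters are placeholder input.

```shell
# -n 2 tells xargs to pass at most two arguments per invocation of echo,
# so five input items lead to three separate echo runs.
batches=$(printf '%s\n' a b c d e | xargs -n 2 echo)
echo "$batches"

# Each echo run printed one line, so counting lines counts invocations.
runs=$(printf '%s\n' "$batches" | wc -l)
echo "echo was run $runs times"
```

GNU and BSD versions of Xargs also accept -P to run several invocations in parallel; check your own man page before relying on it.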

UNIX Tools

Using UNIX Tools in Everyday Life

A large part of the UNIX philosophy is the concept of "tools." These are small programs that only do one thing, but they do it well. These tools can be used alone, but more frequently they are strung together as a pipeline, as we explained in the previous section. We'll look at some detailed examples after we've introduced the tools.

Many people use a proprietary operating system that operates under a different philosophy: the bloated application philosophy, where you have one or two programs that try to do everything but fail miserably. Now, don't get me wrong. That other operating system has other problems, too. Anyway, the reason I bring up this difference in philosophies is to point out that some stupid programmer in Washington cannot predict every task that I will need to carry out on my system. In fact, I doubt they could do that even if they hired some smart ones! When you have an array of system tools available as under UNIX, you can piece together a quick and dirty solution to your own problem in seconds. Things that are absolutely impossible in that other operating system are incredibly easy with UNIX.

In this section we'll look at many (but certainly not all) of the system tools that are standard on UNIX and UNIX-like operating systems. You don't have to memorize all of the syntax for each program, but you should have an idea of what's out there, and then when you need to use one you can refer to its man page or this document for details.

Another good resource for these tools is the GNU Coreutils info page. Info is like man with links. Do "pinfo coreutils" from the command line, and you'll get a nice categorized list of utilities, with links to detailed documentation. If you'd like a web version, see the GNU Coreutils Manual.

Grep

Grep is one of the most commonly used command-line programs in UNIX. It is a search tool, allowing you to find text in a file or set of files. Historically, the name comes from a command in the Ed editor: "g/re/p", meaning Globally search for the Regular Expression and Print the lines that match. Good stuff.

Regular expressions are beyond the scope of this document, so here we'll just look at searching for simple strings. Be aware that Grep can do extremely complicated searches, and if you want to use them then you can look at its man page or do a Google search for help on regular expressions.

Here's a quick example, and then we'll look at syntax and options. Say we wrote a Java class a few days ago, and now we just want to see a quick list of all of the public methods and variables in our class. Let's look at a really easy way to do that with grep:

[amcnabb@iocaine parser]$ grep public LexicalAnalyzer.java
public class LexicalAnalyzer {
    public static String KEY_SCHEMES = "Schemes";
    public static String KEY_FACTS = "Facts";
    public static String KEY_RULES = "Rules";
    public static String KEY_QUERIES = "Queries";
    public static Token getTokens(InputStreamReader inputstream) throws java.io.IOException {
[amcnabb@iocaine parser]$

So what it really did was to print out every line that contains the word "public." We could have given grep several files and it would have outputted all matches in all of them.

One of the other main situations where you want to use grep is when you want to know which file some string is in. For example, if you can't remember which file in the "/etc" directory is the config file that holds your hostname (in all of our examples so far we've been on the machine "iocaine") we can run the following command:

[amcnabb@iocaine amcnabb]$ grep -sR iocaine /etc
/etc/dhcpc/dhcpcd-eth0.info:HOSTNAME=iocaine
/etc/dhcpc/dhcpcd-eth0.info.old:HOSTNAME=iocaine
/etc/dhcpcd/dhcpcd-eth0.info:HOSTNAME=iocaine
/etc/dhcpcd/dhcpcd-eth0.info.old:HOSTNAME=iocaine
[amcnabb@iocaine amcnabb]$

So that command recursively searched through all files in all subdirectories in the /etc directory and printed out all matches. We see that there were four files that matched, and each of them had one occurrence. Note that it prints out the filename, then a colon, then the matching line from the file.

Grep is one of the most commonly used tools in UNIX. It is incredibly simple but extremely useful. Here is an abbreviation of the syntax of grep. To see a full list of options run grep --help. Note that if you don't specify a file, it will search in standard input, which makes Grep one of the most useful tools in pipelines:

Usage: grep [OPTION]... PATTERN [FILE] ...
Search for PATTERN in each FILE or standard input.
Example: grep -i 'hello world' menu.h main.c

   -i, --ignore-case         ignore case distinctions
   -s, --no-messages         suppress error messages
   -v, --invert-match        select non-matching lines
   -R, -r, --recursive       equivalent to --directories=recurse
   -c, --count               only print a count of matching lines per FILE
   -l, --files-with-matches  only print FILE names containing matches

For more information run man grep. Later on in this document we'll look at some examples of Grep used in pipelines, which is very, very cool.
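The options in the table above are easy to try on a couple of throwaway files. This is a sketch only; the file names and contents are invented for the demonstration.

```shell
dir=$(mktemp -d)
printf 'Hello World\nhello again\ngoodbye\n' >"$dir/a.txt"
printf 'nothing to see\n' >"$dir/b.txt"

ci=$(grep -i -c 'hello' "$dir/a.txt")         # -i -c: count matching lines, any case
inv=$(grep -v 'hello' "$dir/a.txt")           # -v: lines that do NOT match
names=$(grep -l 'hello' "$dir"/*.txt)         # -l: only names of files with a match
piped=$(printf 'x\nhello\n' | grep -c hello)  # no file given, so grep reads stdin

echo "case-insensitive matches: $ci"
echo "non-matching lines:"
echo "$inv"
echo "file with a match: $(basename "$names")"
echo "matches from the pipe: $piped"

rm -rf "$dir"
```

Note that the -v search is case sensitive, so the line "Hello World" counts as not matching "hello".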

Cat

If Grep seems just a little too complicated for you, then you'll love Cat. Cat is short for "concatenate." It takes the files you give it and strings them together. If you only give it one file then it just prints out the file. Not too bad, eh? Here's part of the output from cat --help:

Usage: cat [OPTION] [FILE]...
Concatenate FILE(s), or standard input, to standard output.

   -n, --number             number all output lines
   -v, --show-nonprinting   use ^ and M- notation, except for LFD and TAB
       --help     display this help and exit

With no FILE, or when FILE is -, read standard input.

There are a bunch of other options, too. Anyway, Cat is a very important and useful, though simple, program.

Split and Csplit

Split is the complement of Cat. It takes a file and... splits it into several files. You can specify how many bytes or how many lines to put in each file. For example, the command split -l 20 hello new- will split up the file hello into files containing 20 lines each (they'll be named new-aa, new-ab, etc.).
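To see that Split and Cat really are complements, here is a small round trip: split a file, glue the pieces back together, and compare the result with the original. The 50-line file is generated just for this sketch.

```shell
work=$(mktemp -d)               # scratch directory
cd "$work" || exit 1

# Build a 50-line file named hello.
i=1
while [ "$i" -le 50 ]; do
    echo "line $i"
    i=$((i + 1))
done >hello

split -l 20 hello new-          # creates new-aa, new-ab and new-ac
pieces=$(ls new-* | wc -l)
last=$(wc -l <new-ac)           # the final piece holds the leftover lines

# cat joins the pieces in glob (alphabetical) order; cmp -s compares
# the joined stream on standard input with the original file.
if cat new-* | cmp -s - hello; then same=yes; else same=no; fi
echo "pieces=$pieces last=$last same=$same"

cd "$OLDPWD" && rm -rf "$work"
```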

Csplit is like split, except that it splits based on the Context (hence the "C") instead of the size. It uses regular expressions (which we mentioned with grep) to tell where to split the file. You give it a pattern enclosed in slashes (like this: /pattern/), and when it finds that pattern it will create a new file. Let's do an example, and first I'll give you the contents of inputfile:

hello there
this will be in the first file
this line contains the word mypattern so it will be in the 2nd file.
hello again
hi
this line has mypattern again but it's still in the second file
because we didn't tell it to match mypattern multiple times.

So here's the command we run and its output:

[amcnabb@iocaine amcnabb]$ csplit -f new- inputfile '/mypattern/'
43
209
[amcnabb@iocaine amcnabb]$

Csplit is informing us that it made one file that is 43 bytes long and another that is 209 bytes long (new-00 and new-01). Even though mypattern shows up several times in inputfile it only outputs to two files because we didn't tell it to repeat. This next example shows what happens if you want it to split every time it sees mypattern:

[amcnabb@iocaine amcnabb]$ csplit -f new- inputfile '/mypattern/' '{*}'
43
84
64
61
[amcnabb@iocaine amcnabb]$

You'll find more information in the man page, but that's the gist of it.

Head and Tail

Head and Tail are two programs which print the first x or last x lines of a given file or standard input. If you don't specify a number, they default to 10. So, head -n 42 mycoolfile will print out the first 42 lines of mycoolfile. Also, tail myotherfile will print out the last 10 lines of myotherfile.

One great use of Tail is to print all but the first few lines. When a program prints out a row of column headers, you can strip it with tail -n +2, which prints starting with the second line of input. Analogously, with GNU head, head -n -4 prints all but the last four lines of input.

This is all pretty simple, so we're not going to give any other examples. But one other really cool feature is that you can do tail -f /var/log/messages, which will print out the last 10 lines of /var/log/messages, but instead of quitting, it will then constantly check and print out anything that is added to the file. This can be really handy.
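For anyone who wants to check their understanding anyway, here is a quick sketch on a made-up four-line table with a header row.

```shell
table=$(mktemp)
printf 'NAME SIZE\nalpha 10\nbeta 20\ngamma 30\n' >"$table"

body=$(tail -n +2 "$table")               # from line 2 on: the header is stripped
last_row=$(tail -n 1 "$table")            # just the final line
second=$(head -n 2 "$table" | tail -n 1)  # line 2 only, by chaining the two

echo "$body"
echo "last row: $last_row"
echo "second line: $second"

rm -f "$table"
```

Chaining head and tail like this is a simple way to pull one specific line, or a range of lines, out of a file.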

Sort

Guess what Sort does. Yep, it sorts. It takes a file or standard input and sorts the lines. You can configure the order (it has support for dictionary order, numeric order, month order, reverse order, etc.). You can also specify which field to sort on. If you have several files you can have it sort them and merge them together.

We'll look at one example under a few conditions. We'll run w, which shows all currently logged in users and what they're doing, and we'll sort it by what they're doing, which happens to be the eighth field, instead of by when they logged in. The -b is important because it tells sort to ignore leading whitespace in the sort key:

[amcnabb@iocaine amcnabb]$ w -h |sort -b -k8
amcnabb  :0       -                 9:34am   ?     0.00s  0.47s  gnome-session
root     pts/4    -                10:45am 18:46   0.03s  0.00s  man rm
amcnabb  pts/5    -                10:47am 27:28   0.07s  0.01s  mutt
amcnabb  pts/6    -                10:47am 27:00   0.07s  0.04s  vim hello.c
amcnabb  pts/0    :0.0              9:36am 37.00s  1.47s  1.41s  vim src.html
amcnabb  pts/7    -                10:49am  0.00s  0.18s  0.01s  w -h
root     pts/3    -                11:01am  5:52   0.06s  0.07s  xterm
[amcnabb@iocaine amcnabb]$

Now we'll sort by username and what they're doing. So, whenever the username is the same it will subsort by what they're doing:

[amcnabb@iocaine amcnabb]$ w -h |sort -b -k1,1 -k8,8
amcnabb  :0       -                 9:34am   ?     0.00s  0.47s  gnome-session
amcnabb  pts/5    -                10:47am 27:58   0.07s  0.01s  mutt
amcnabb  pts/0    :0.0              9:36am 11.00s  1.47s  1.41s  vim src.html
amcnabb  pts/6    -                10:47am 27:30   0.07s  0.04s  vim hello.c
amcnabb  pts/7    -                10:49am  0.00s  0.18s  0.01s  w -h
root     pts/4    -                10:45am 19:16   0.03s  0.00s  man rm
root     pts/3    -                11:01am  6:22   0.06s  0.07s  xterm
[amcnabb@iocaine amcnabb]$

Note: if we had done -k1 -k8 instead of -k1,1 -k8,8, we would have had a problem. It would have sorted from field 1 to the end of the line first, and then it wouldn't even worry about field 8.

So that's sort. It can be extremely useful, and once you get the hang of it it's pretty easy to use.
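Since the output of w changes from moment to moment, here is a sketch of key-based sorting on fixed, made-up data. It also shows the numeric flag (n) on a key, which the examples above did not need. LC_ALL=C is set only to make the ordering the same on every system.

```shell
data=$(mktemp)
printf '%s\n' 'root 30 xterm' 'amcnabb 200 vim' 'amcnabb 9 mutt' 'root 4 man' >"$data"

# Field 1 as text, then field 2 as a number among lines with the same user.
by_number=$(LC_ALL=C sort -k1,1 -k2,2n "$data")

# Same keys, but field 2 compared as text: "200" sorts before "9".
by_text=$(LC_ALL=C sort -k1,1 -k2,2 "$data")

echo "numeric second key:"
echo "$by_number"
echo "text second key:"
echo "$by_text"

rm -f "$data"
```

Forgetting the n flag on a numeric column is one of the most common surprises with Sort.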

Wc

Wc is short for "Word Count." It counts the number of words in a file, and while it's at it, it counts the number of lines and the number of characters, too. By default it prints 4 columns: 1) number of lines, 2) number of words, 3) number of bytes, 4) filename. You can limit it to just part of that information with the -l, -w, and -c options. If you don't want the filename listed just do wc <filename. So here's an example:

[amcnabb@iocaine shell_tricks]$ wc *
wc: CVS: Is a directory
   0       0       0 CVS
   2      21     110 audience.var
   1       2      14 authors.var
   6      63     370 descr.var
  22      63     751 footer.inc
 280     505    5265 header.inc
 798    4782   31534 src.html
   1       7      45 title.var
1110    5443   38089 total
[amcnabb@iocaine shell_tricks]$
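When you want a bare number to use somewhere else, reading from standard input and picking a single option is the usual approach. A minimal sketch on a made-up two-line file:

```shell
f=$(mktemp)
printf 'one two three\nfour five\n' >"$f"

lines=$(wc -l <"$f")    # reading from stdin keeps the filename out of the output
words=$(wc -w <"$f")
bytes=$(wc -c <"$f")

# In a pipeline, wc -l counts whatever the previous command printed.
piped=$(printf 'a\nb\nc\n' | wc -l)

echo "lines=$lines words=$words bytes=$bytes piped=$piped"
rm -f "$f"
```

Some versions of Wc pad the number with leading spaces, so the exact spacing of the output can differ between systems.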

Find

Find is a flexible tool used for recursive file searches. It has the ability to perform advanced searches involving any file attributes. It can give customized output or even execute other UNIX commands. We'll look at a few examples here, but the man page (run man find) has much more information.

For our first example, we'll search for a file by name. Let's say we're looking for a file called outline.pdf, which we know is somewhere in our school files. Here's how we'll find it:

/users/admin/amcnabb> find school -name outline.pdf
school/2003fall/phys220/outline.pdf
/users/admin/amcnabb>

The following example has us searching in everything underneath the current directory for any file with "openssl" in the filename. Note that we have to escape the star: we want the shell to ignore it because we're passing it to find. Also note that it matches for both files and directories:

/usr/include> find . -name \*openssl\*
./kde/kopenssl.h
./openssl
./openssl/opensslconf.h
./openssl/opensslv.h
/usr/include>

Hopefully those examples were pretty clear. They were simple examples where we searched by the "name" attribute of files. Find allows searching with 34 different tests, and you can combine them in complex ways. In general if you want to do something fancy, you'll want to spend some time in the man page looking at the details. We'll give a few examples here to give an idea of how it works:

/users/admin/amcnabb> touch my-new-file
/users/admin/amcnabb> find . -type f -mmin -5
./cs/docproj/docs/shelladv/.src.html.swp
./cs/docproj/docs/shelladv/src.html
./my-new-file
/users/admin/amcnabb>

In this example we searched for all regular files that have been modified within the last five minutes. Note that when you string tests together, find ANDs them together. The following example will show some more complex operators. Note that certain symbols, like (, ), and ; have to be escaped from the shell:

/users/admin/amcnabb> find /usr/include -type f -size +17k \( -name '*net*' -o -name '*crypt*' \)
/usr/include/kde/netwm.h
/usr/include/openssl/crypto.h
/usr/include/netdb.h
/usr/include/libnet/libnet-functions.h
/usr/include/libnet/libnet-headers.h
/users/admin/amcnabb>

This example would read in English, "Find in /usr/include all regular files larger than 17 KB which contain either 'net' or 'crypt' in their filenames."
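If you'd like to experiment with grouping without digging through /usr/include, here is a sketch that builds a tiny made-up directory tree first. All of the file and directory names are invented for the example.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/sub" "$tmp/netdir"   # netdir matches *net* but is a directory
touch "$tmp/netdb.h" "$tmp/sub/crypto.h" "$tmp/sub/stdio.h"

# Tests listed side by side are ANDed. -o means OR, and the escaped
# parentheses group the two -name tests so that -type f applies to both.
found=$(cd "$tmp" && find . -type f \( -name '*net*' -o -name '*crypt*' \) | sort)
printf '%s\n' "$found"

rm -r "$tmp"
```

The directory netdir is left out because of -type f, and stdio.h is left out because it matches neither name.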

The last topic we'll cover is using the output from find. Find has a number of ways to process what it finds, and it calls them "actions." The default action is "-print", which just prints out the filenames, and we've seen examples of it a few times now. First we'll look at ways of using the print action to get things done, and then we'll look at some other actions.

One way to use find's output is to pipe the output of find to another command. For example, to make a tar file containing all of the files we found, we would do this: find cs/docproj -mtime +5 |tar cfT atleast5dayoldfiles.tar - where the "T -" option tells tar to get its list of files from standard input.
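The same idea can be tried safely in a scratch directory. This is only a sketch: the directory and file names are made up, and it assumes a tar that accepts "-T -" (GNU tar and bsdtar both do).

```shell
tmp=$(mktemp -d)
mkdir "$tmp/docs"
touch "$tmp/docs/a.txt" "$tmp/docs/b.txt" "$tmp/docs/skip.log"

# find prints one path per line, and tar reads that list from
# standard input because of "-T -".
(cd "$tmp" && find docs -name '*.txt' | tar -cf found.tar -T -)

# List the archive contents to confirm what went in.
listing=$(tar -tf "$tmp/found.tar" | sort)
printf '%s\n' "$listing"

rm -r "$tmp"
```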

We can also send the output to xargs (which we discussed previously). For example, find . -name 'hello*' |xargs rm -f will delete all files under the current directory whose names start with "hello".
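One thing to watch out for: xargs splits its input on whitespace, so a filename with a space in it turns into two arguments. A common way around this, sketched below, is the -print0 action together with xargs -0. Both are supported by the GNU and BSD versions of these tools, though older systems may lack them. The file names are made up for the example.

```shell
tmp=$(mktemp -d)
touch "$tmp/hello world" "$tmp/hello2" "$tmp/keep me"

# -print0 ends each name with a NUL byte and xargs -0 splits on NUL,
# so "hello world" reaches rm as one argument instead of two.
find "$tmp" -name 'hello*' -print0 | xargs -0 rm -f

remaining=$(ls "$tmp")
echo "$remaining"

rm -r "$tmp"
```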

Find can also execute an arbitrary command on each file. This is similar to using xargs, except that sometimes you need to run a command once on each file and not on all of them as a group. The action used for this is "-exec". You give the exec action a command to run, and wherever you say "{}" it will put the filename it found, and ";" signals the end of the command. In the following example, note that touch is creating the files we specify. Also note that "-maxdepth 1" tells find to only look at the first level, so it doesn't recurse:

/users/admin/amcnabb> touch hello1 hello2 hello3 hello4
/users/admin/amcnabb> ls hello*
hello1 hello2 hello3 hello4
/users/admin/amcnabb> find . -maxdepth 1 -name hello\* -exec mv {} {}.old \;
/users/admin/amcnabb> ls hello*
hello1.old hello2.old hello3.old hello4.old
/users/admin/amcnabb>
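Before letting -exec loose on real files, it can help to do a dry run. One way, sketched here, is to put echo in front of the command so that find prints each command line instead of running it. The scratch files are made up. Replacing "{}" inside a longer word like "{}.old" is something GNU find does, but other versions of find may not.

```shell
tmp=$(mktemp -d)
touch "$tmp/hello1" "$tmp/hello2" "$tmp/other"

# Quote the pattern so find, not the shell, expands it, and escape the
# semicolon so the shell passes it through to find.
# The leading echo makes this a dry run: nothing is renamed.
preview=$(cd "$tmp" && find . -maxdepth 1 -name 'hello*' -exec echo mv {} {}.old \; | sort)
printf '%s\n' "$preview"

rm -r "$tmp"
```

Once the printed commands look right, remove the echo and run it for real.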

This is great, but if you want to do input redirection or a pipeline in your command it won't work, since -exec just executes an individual command. If you want to do something more complex, you'll have to use the printf action and a fancy trick. Printf prints a formatted string. It substitutes information when it sees a % sign. In the following example, note that %p prints out the path to the file it found and \n prints a newline character:

/users/admin/amcnabb> find . -maxdepth 1 -name hello\* -printf "mail -s subject amcnabb@cs.byu.edu <%p\n"
mail -s subject amcnabb@cs.byu.edu <./hello1
mail -s subject amcnabb@cs.byu.edu <./hello2
mail -s subject amcnabb@cs.byu.edu <./hello3
mail -s subject amcnabb@cs.byu.edu <./hello4
/users/admin/amcnabb>

To complete the trick, we'll tell bash to treat that output as a script and to run it:

/users/admin/amcnabb> find . -maxdepth 1 -name hello\* -printf "mail -s subject amcnabb@cs.byu.edu <%p\n" |bash -s
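The printf trick pastes filenames straight into a script, so it can misbehave if a name contains spaces or characters that mean something to the shell. One alternative, sketched here and not the approach used above, is to have -exec start a small sh of its own and hand it the filename as an argument. The scratch files are made up, and wc stands in for mail so that the example is harmless to run.

```shell
tmp=$(mktemp -d)
printf 'abc\n' > "$tmp/hello1"
printf 'abcdef\n' > "$tmp/hello2"

# find passes each path to a small inline script as $1. The redirection
# is interpreted by that inner sh, not by find.
sizes=$(cd "$tmp" && find . -maxdepth 1 -name 'hello*' -exec sh -c 'wc -c < "$1"' sh {} \; | sort -n)

# Unquoted on purpose, so the counts end up on one line without padding.
echo $sizes

rm -r "$tmp"
```

The extra "sh" after the script fills in $0, which is why the filename shows up as $1.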

If you want to do more complex examples, have fun with the man page.

Cut, Paste, and Join

Cut, Paste, and Join let you work with fields. By default they expect fields to be tab-delimited, though you could change that. Here is a colon-delimited file (I'm giving it to you to help you get the feeling for what fields are all about):

Joe Schmoe:jschmoe:04/14/1973:801-333-2366
Fred Smith:fsmith:03/11/1966:801-622-3819
Frank Jones:jones3:11/23/1970:801-111-2893

We have four fields here which are delimited by colons. Cut, Paste, and Join will let us manipulate files like this. With them we can extract fields, insert fields, and join files.

Cut is generally more useful than the others, so we'll only give an example of it. To grab the username field of the above file, which is named "hi.txt", we need to note which field we need (field 2) and what the delimiter is (":"). Then we run cut:

amcnabb@iocaine:~/hi% cut -f2 -d: hi.txt
jschmoe
fsmith
jones3
amcnabb@iocaine:~/hi%

The default delimiter is the tab character, and you'll usually want to specify another one. If it's a space, you'll either escape it with a backslash (-d\ ) or put it in quotes (-d' '). You can also specify multiple fields and ranges of fields, like so: -f2,8,3,4-5,9- (print the 2nd, 8th, 3rd, 4th through 5th fields and from the 9th field to the end of the line). Note that cut always prints the fields in the order they appear in the file, no matter what order you list them in.
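Here is a short sketch of field lists and ranges, using the same three-line hi.txt from above (recreated in a scratch directory so the example stands on its own).

```shell
tmp=$(mktemp -d)
cat > "$tmp/hi.txt" <<'EOF'
Joe Schmoe:jschmoe:04/14/1973:801-333-2366
Fred Smith:fsmith:03/11/1966:801-622-3819
Frank Jones:jones3:11/23/1970:801-111-2893
EOF

# A list plus an open-ended range: field 1, then field 3 through the end.
picked=$(cut -d: -f1,3- "$tmp/hi.txt" | head -n 1)
echo "$picked"

# Asking for 3,1 gives the same order as 1,3: cut does not rearrange fields.
swapped=$(cut -d: -f3,1 "$tmp/hi.txt" | head -n 1)
echo "$swapped"

rm -r "$tmp"
```

If you need the fields in a different order, Awk is the usual tool to reach for.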

One other problem you might have with cut is if you are working with a file where fields are separated by varying numbers of spaces--it could be a space, or five spaces, or three spaces, and it's different on every line. Cut always assumes that every time it sees a delimiter it's a new field, so in this case it will think there are lots of empty fields in between each of the spaces. The following output of w demonstrates this situation (the tail -n +2 is just to get rid of the uptime header):

amcnabb@iocaine:~/hi% w |tail -n +2
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
amcnabb  :0       -                11:42   ?xdm?   1:49   0.02s /bin/sh /users/admin/amcn
mikew    pts/842  li6-32.members.l 12:21    2:08   0.11s  0.05s ssh root@mail
lurch    pts/843  hemlock.cs.byu.e 12:22   31:03   0.16s  0.16s -zsh
lurch    pts/845  hemlock.cs.byu.e 12:22   30:46   0.17s  0.17s -zsh
amcnabb@iocaine:~/hi% w |tail -n +2 |cut -d' ' -f 3

:0



amcnabb@iocaine:~/hi%

That sure didn't give us the third field (FROM). One way to deal with this is to use Sed before Cut (you could also use Awk). This will look very strange because we're using regular expressions. All sed is doing is replacing any group of one or more spaces it finds with a single space:

amcnabb@iocaine:~/hi% w |tail -n +2 |sed 's/ \+/ /g'
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
amcnabb :0 - 11:42 ?xdm? 1:48 0.02s /bin/sh /users/
mikew pts/842 li6-32.members.l 12:21 22:22 0.11s 0.05s ssh root@mail
lurch pts/843 hemlock.cs.byu.e 12:22 23:16 0.16s 0.16s -zsh
lurch pts/845 hemlock.cs.byu.e 12:22 22:59 0.17s 0.17s -zsh
amcnabb@iocaine:~/hi% w |tail -n +2 |sed 's/ \+/ /g' |cut -d' ' -f3
FROM
-
li6-32.members.l
hemlock.cs.byu.e
hemlock.cs.byu.e
amcnabb@iocaine:~/hi%

Success!
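For comparison, here is a sketch of two other ways to get the same result: Awk, which splits on runs of whitespace by default, and tr -s, which squeezes repeated spaces into one. The two sample lines are made up to look like the output of w.

```shell
# Two made-up lines padded with uneven runs of spaces.
sample='lurch    pts/843  hemlock   12:22
mikew  pts/842      li6-32  12:21'

# Awk treats any run of blanks as a single field separator.
third_awk=$(printf '%s\n' "$sample" | awk '{ print $3 }')

# tr -s squeezes each run of spaces down to one, and then cut works.
third_tr=$(printf '%s\n' "$sample" | tr -s ' ' | cut -d' ' -f3)

printf '%s\n' "$third_awk"
```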

Compound Commands: Loops, etc.

Introduction to Compound Commands

Most people would probably say that information about compound commands in shells would be much more appropriate in a shell programming document than in a text about using shells. If I couldn't use these techniques on-the-fly every day, I might as well be using Windows!! These principles are essential if you want to be efficient in your shell.

We'll cover some techniques that you're probably familiar with from other programming languages as well as others that you may never have heard of before. We'll go over expressions, lists, for loops, for-in (aka foreach) loops, and while loops. Shells also have other features that would fall under this category, such as if-statements, switch statements, function definitions, and other wonderful features, but we won't go over them because you're less likely to use them from the command line. Be aware that they exist, and if you're interested you can do some research on shell programming.

Over time, shells have developed more complex syntax that adds new features. An example of this is the C-style for loop. However, this really isn't needed. In UNIX there is a little program for everything, and these programs are powerful enough to make fancy syntax unnecessary. In general, when given a choice, we'll cover the older way of doing things because 1) it's cooler and 2) it's more portable to different shells.

For-in (aka foreach)

Do not confuse for-in with the for found in most other programming languages. The for we're talking about is parallel to the foreach found in Perl. In Zsh for and foreach are synonyms, but in Bash you can only do for, so we'll stick with for. When a for-in loop runs over the list Evan, Byron, and Clint, the first time through the loop the variable is set to Evan, the second time it's set to Byron, and the third time it's set to Clint. In shells you always dereference variables with a $ sign.
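A for-in loop over those three names can be sketched like this. The variable name "name", the greeting, and the counter are made up for illustration.

```shell
count=0
# The loop body runs once per word in the list, with $name set to that word.
for name in Evan Byron Clint; do
    echo "Hello, $name!"
    count=$((count + 1))
done
echo "Greeted $count people."
```

Note that the variable is written without the $ sign where it's being set (for name in ...) and with the $ sign where it's being read.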

In our next example we'll use the output from ls. Remember that backticks take the output of a command and use it inline:

[root@iocaine root]# for terminal in `ls /dev/pts/*`; do echo 'The Russians are coming!' >$terminal; done
The Russians are coming!
[root@iocaine root]#

So, this takes the output of ls /dev/pts/*, which is a listing of all open ttys, and it writes the message, "The Russians are coming!" to each one. In other words, it's broadcasting a message to all users who are currently logged in. Now, in this particular case we could also have done echo 'The Russians are coming!' |tee /dev/pts/* with the same effect, but this still serves as a good example of how for-in works.
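As a side note, when the list is just a set of filenames, the shell can expand the wildcard itself and ls isn't strictly needed. Here is a sketch with made-up files in a scratch directory, so that nobody's terminal gets spammed.

```shell
tmp=$(mktemp -d)
touch "$tmp/a.c" "$tmp/b.c" "$tmp/notes.txt"

# The shell expands the glob into a list of paths before the loop starts.
seen=""
for f in "$tmp"/*.c; do
    seen="$seen$(basename "$f") "
done
echo "$seen"

rm -r "$tmp"
```

This form also keeps filenames with spaces in one piece, which the backtick version does not.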

By the way, you should be aware that you don't have to write the whole list of commands on one line. I'll give you one more example of a for-in, this time also demonstrating how the shell will let you continue your input on several lines:

[amcnabb@iocaine amcnabb]$ for i in `grep -l myglobalvar *.c`
> do
> echo "Mailing file $i..."
> mail -s "File $i, which refers to myglobalvar" mymom@hotmail.com <$i
> done
Mailing file blah.c...
Mailing file hello.c...
Mailing file ihateemacs.c...
[amcnabb@iocaine amcnabb]$

Recall that grep -l returns a list of matching files. This command emails all matching files to mymom@hotmail.com and outputs a handy little message as it's doing each one. As you can see, for-in is a very useful thing.

Seq (the shell-style equivalent of C-style for loops)

You can actually do the equivalent of C-style for loops in normal shell for loops by using a program called Seq. Technically, several shells, including Bash and Zsh, can do C-style for loops directly. The syntax is just a little bit different:

amcnabb@iocaine:~/hi% for (( i=0; i<5; i++ )); do echo $i; done
0
1
2
3
4
amcnabb@iocaine:~/hi%

However, the traditional way, using Seq, is much cooler and shorter. Here's an example:

amcnabb@iocaine:~/hi% for i in `seq 1 5`; do echo $i; done
1
2
3
4
5
amcnabb@iocaine:~/hi%
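Seq can also count in steps if you give it three numbers: the start, the increment, and the end. Here is a small sketch. The running total is only there to show that the loop variable can be used in arithmetic.

```shell
total=0
# seq FIRST INCREMENT LAST: count from 0 to 10 in steps of 5.
for i in `seq 0 5 10`; do
    echo "i is $i"
    total=$((total + i))
done
echo "total: $total"
```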

Subshells

Subshells let you add another layer of sophistication to pipelines. A subshell is just a shell process which is subservient to the main shell. So why use a subshell--why not just have the main shell do everything? The reason for using a subshell is that since it's a separate process, it has its own environment. Most importantly, it has its own working directory, and it has its own standard input and output.

One of the best uses of subshells is to build a pipe where the programs on the two ends are in different working directories. Here's an example involving Tar. Recall that Tar combines a set of files into a single archive. Tar files are very commonly used for distributing UNIX programs and source code. Tar can also create an archive to standard output and read an archive from standard input, which makes it very useful on the command line. Here's a way to use tar to copy a whole directory hierarchy from one location to another:

amcnabb@strychnine:~% (cd hello1; tar cf - .) | (cd hello2; tar xvf -)
./
./hi1
./dir/
./dir/hi2
amcnabb@strychnine:~%
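Here is a sketch of the same copy that you can run anywhere, since it builds its own hello1 and hello2 in a scratch directory. It also records the working directory before and after, to show that the cd commands inside the parentheses never touch the main shell. Using && instead of ; is a small precaution, so that tar doesn't run in the wrong place if a cd fails.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/hello1" "$tmp/hello1/dir" "$tmp/hello2"
touch "$tmp/hello1/hi1" "$tmp/hello1/dir/hi2"

before=$PWD
# Each parenthesized group is its own process, so each cd stays local to it.
(cd "$tmp/hello1" && tar cf - .) | (cd "$tmp/hello2" && tar xf -)
after=$PWD

# Check what arrived in hello2.
copied=$(cd "$tmp/hello2" && find . -type f | sort)
printf '%s\n' "$copied"

rm -r "$tmp"
```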

While

A while loop in the shell works about the same as one in any other language. The syntax is "while [something]; do [something]; done". An infinite loop printing out "Hello" would look like this:

amcnabb@iocaine:~% while true; do echo "Hello"; done
Hello
Hello
Hello
Hello
Hello
...

The contents of the loop can be more complex. This one will only print "Hello" once a second:

amcnabb@iocaine:~% while true
> do
> echo "Hello"
> sleep 1;
> done
Hello
Hello
Hello
Hello

Note that it makes no difference whether you use a newline or a semicolon to separate statements. This last example could have been written on one line as while true; do echo "Hello"; sleep 1; done with the exact same effect.

More complex while loops will use the test command (or the equivalent [[ test-expression ]]). We'll look at one example and leave the rest up to the man pages:

amcnabb@iocaine:~% while test ! -f /tmp/aoeuaoeu; do sleep 1; done; echo The file exists now.
The file exists now.
amcnabb@iocaine:~%

In this example, we check once a second for the file /tmp/aoeuaoeu and hang until it exists. Once the file is created, we print, "The file exists now." Test lets you do complex conditional expressions on strings, integers, and files.
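Test works on integers as well as files. Here is a minimal sketch of a counting loop. The variable name and the limit of 3 are arbitrary.

```shell
i=0
# -lt means "less than". The loop stops as soon as the test fails.
while test "$i" -lt 3; do
    echo "pass $i"
    i=$((i + 1))
done
echo "finished with i=$i"
```

The condition could also be written as [ "$i" -lt 3 ], which is just another name for the test command.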

Where to find more information

If you want to learn more about shells and shell programming, there's a lot of information out there. Here are some places to start:

Man and Info pages:
 This is the best place to go with syntactical questions. Start up a terminal and run 'pinfo bash' or 'man bash' for information about Bash. The man and info pages also cover command-line utility programs, like cat, head, chmod, tee, sort, etc.
Shell Hacks:
 This sums up a lot of techniques that have been discussed here by giving examples of cool and/or useful shell hacks.
Bash Programming Introduction HOWTO:
 "This article intends to help you to start programming basic-intermediate shell scripts. It does not intend to be an advanced document (see the title)."
Advanced Bash Scripting Guide:
 "This tutorial assumes no previous knowledge of scripting or programming, but progresses rapidly toward an intermediate/advanced level of instruction ... all the while sneaking in little snippets of UNIX wisdom and lore. It serves as a textbook, a manual for self-study, and a reference and source of knowledge on shell scripting techniques. The exercises and heavily-commented examples invite active reader participation, under the premise that the only way to really learn scripting is to write scripts."