Linux命令行 - 中英文对照版



All Unix-like operating systems rely heavily on text files for several types of data
storage. So it makes sense that there are many tools for manipulating text. In this
chapter, we will look at programs that are used to “slice and dice” text. In the next
chapter, we will look at more text processing, focusing on programs that are used to
format text for printing and other kinds of human consumption.

所有类 Unix 的操作系统都严重依赖于几种数据存储类型的文本文件。所以,

This chapter will revisit some old friends and introduce us to some new ones:


  • cat – Concatenate files and print on the standard output

  • cat – 连接文件并且打印到标准输出

  • sort – Sort lines of text files

  • sort – 给文本行排序

  • uniq – Report or omit repeated lines

  • uniq – 报告或者省略重复行

  • cut – Remove sections from each line of files

  • cut – 从每行中删除文本区域

  • paste – Merge lines of files

  • paste – 合并文件文本行

  • join – Join lines of two files on a common field

  • join – 基于某个共享字段来联合两个文件的文本行

  • comm – Compare two sorted files line by line

  • comm – 逐行比较两个有序的文件

  • diff – Compare files line by line

  • diff – 逐行比较文件

  • patch – Apply a diff file to an original

  • patch – 给原始文件打补丁

  • tr – Translate or delete characters

  • tr – 翻译或删除字符

  • sed – Stream editor for filtering and transforming text

  • sed – 用于筛选和转换文本的流编辑器

  • aspell – Interactive spell checker

  • aspell – 交互式拼写检查器


So far, we have learned a couple of text editors (nano and vim), looked at a bunch of
configuration files, and have witnessed the output of dozens of commands, all in text.
But what else is text used for? For many things, it turns out.

到目前为止,我们已经知道了一对文本编辑器(nano 和 vim),看过一堆配置文件,并且目睹了
许多命令的输出都是文本格式。但是文本还被用来做什么? 它可以做很多事情。


Many people write documents using plain text formats. While it is easy to see how a
small text file could be useful for keeping simple notes, it is also possible to write large
documents in text format, as well. One popular approach is to write a large document in
a text format and then use a markup language to describe the formatting of the finished
document. Many scientific papers are written using this method, as Unix-based text
processing systems were among the first systems that supported the advanced
typographical layout needed by writers in technical disciplines.

因为基于 Unix 的文本处理系统位于支持技术学科作家所需要的高级排版布局的一流系统之列。


The world’s most popular type of electronic document is probably the web page. Web
pages are text documents that use either HTML (Hypertext Markup Language) or XML
(Extensible Markup Language) as markup languages to describe the document’s visual format.

世界上最流行的电子文档类型可能就是网页了。网页是文本文档,它们使用 HTML(超文本标记语言)或者是 XML


Email is an intrinsically text-based medium. Even non-text attachments are converted
into a text representation for transmission. We can see this for ourselves by downloading
an email message and then viewing it in less. We will see that the message begins with
a header that describes the source of the message and the processing it received during its
journey, followed by the body of the message with its content.

从本质上来说,email 是一个基于文本的媒介。为了传输,甚至非文本的附件也被转换成文本表示形式。
我们能看到这些,通过下载一个 email 信息,然后用 less 来浏览它。我们将会看到这条信息开始于一个标题,


On Unix-like systems, output destined for a printer is sent as plain text or, if the page
contains graphics, is converted into a text format page description language known as
PostScript, which is then sent to a program that generates the graphic dots to be printed.

在类 Unix 的系统中,输出会以纯文本格式发送到打印机,或者如果页面包含图形,其会被转换成
一种文本格式的页面描述语言,以 PostScript 著称,然后再被发送给一款能产生图形点阵的程序,


Many of the command line programs found on Unix-like systems were created to support
system administration and software development, and text processing programs are no
exception. Many of them are designed to solve software development problems. The
reason text processing is important to software developers is that all software starts out as
text. Source code, the part of the program the programmer actually writes, is always in
text format.

在类 Unix 系统中会发现许多命令行程序被用来支持系统管理和软件开发,并且文本处理程序也不例外。


Back in Chapter 7 (Redirection), we learned about some commands that are able to
accept standard input in addition to command line arguments. We only touched on them
briefly then, but now we will take a closer look at how they can be used to perform text
processing.



The cat program has a number of interesting options. Many of them are used to help
better visualize text content. One example is the -A option, which is used to display non-
printing characters in the text. There are times when we want to know if control
characters are embedded in our otherwise visible text. The most common of these are tab
characters (as opposed to spaces) and carriage returns, often present as end-of-line
characters in MS-DOS style text files. Another common situation is a file containing
lines of text with trailing spaces.

这个 cat 程序具有许多有趣的选项。其中许多选项用来帮助更好的可视化文本内容。一个例子是-A 选项,
最常用的控制字符是 tab 字符(而不是空格)和回车字符,在 MS-DOS 风格的文本文件中回车符经常作为

Let’s create a test file using cat as a primitive word processor. To do this, we’ll just
enter the command cat (along with specifying a file for redirected output) and type our
text, followed by Enter to properly end the line, then Ctrl-d, to indicate to cat that
we have reached end-of-file. In this example, we enter a leading tab character and follow
the line with some trailing spaces:

让我们创建一个测试文件,用 cat 程序作为一个简单的文字处理器。为此,我们将键入 cat 命令(随后指定了
用于重定向输出的文件),然后输入我们的文本,最后按下 Enter 键来结束这一行,然后按下组合键 Ctrl-d,
来指示 cat 程序,我们已经到达文件末尾了。在这个例子中,我们文本行的开头和末尾分别键入了一个 tab 字符以及一些空格。

[me@linuxbox ~]$ cat > foo.txt
    The quick brown fox jumped over the lazy dog.
[me@linuxbox ~]$

Next, we will use cat with the -A option to display the text:

下一步,我们将使用带有-A 选项的 cat 命令来显示这个文本:

[me@linuxbox ~]$ cat -A foo.txt
^IThe quick brown fox jumped over the lazy dog.       $
[me@linuxbox ~]$

As we can see in the results, the tab character in our text is represented by ^I. This is a
common notation that means “Control-I” which, as it turns out, is the same as a tab
character. We also see that a $ appears at the true end of the line, indicating that our text
contains trailing spaces.

在输出结果中我们看到,这个 tab 字符在我们的文本中由^I 字符来表示。这是一种常见的表示方法,意思是
“Control-I”,结果证明,它和 tab 字符是一样的。我们也看到一个$字符出现在文本行真正的结尾处,
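The same check can be reproduced without typing at the keyboard. The following is a minimal sketch, assuming GNU cat (the -A option is not available in every cat implementation); printf is used only to make the leading tab and the three trailing spaces explicit:

```shell
# printf writes a leading tab (\t), the sentence, three trailing spaces, and a newline.
# cat -A then shows the tab as ^I and marks the true end of the line with $.
out=$(printf '\tThe quick brown fox jumped over the lazy dog.   \n' | cat -A)
printf '%s\n' "$out"
```

The spaces between the final period and the $ are the trailing spaces that would otherwise be invisible.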

MS-DOS Text Vs. Unix Text

MS-DOS 文本 Vs. Unix 文本

One of the reasons you may want to use cat to look for non-printing characters
in text is to spot hidden carriage returns. Where do hidden carriage returns come
from? DOS and Windows! Unix and DOS don’t define the end of a line the
same way in text files. Unix ends a line with a linefeed character (ASCII 10)
while MS-DOS and its derivatives use the sequence carriage return (ASCII 13)
and linefeed to terminate each line of text.

可能你想用 cat 程序在文本中查看非打印字符的一个原因是发现隐藏的回车符。那么
隐藏的回车符来自于哪里呢?它们来自于 DOS 和 Windows!Unix 和 DOS 在文本文件中定义每行
结束的方式不相同。Unix 通过一个换行符(ASCII 10)来结束一行,然而 MS-DOS 和它的
衍生品使用回车(ASCII 13)和换行字符序列来终止每个文本行。

There are several ways to convert files from DOS to Unix format. On many
Linux systems, there are programs called dos2unix and unix2dos, which can
convert text files to and from DOS format. However, if you don’t have
dos2unix on your system, don’t worry. The process of converting text from
DOS to Unix format is very simple; it simply involves the removal of the
offending carriage returns. That is easily accomplished by a couple of the
programs discussed later in this chapter.

有几种方法能够把文件从 DOS 格式转变为 Unix 格式。在许多 Linux 系统中,有两个
程序叫做 dos2unix 和 unix2dos,它们能在两种格式之间转变文本文件。然而,如果你
的系统中没有安装 dos2unix 程序,也不要担心。文件从 DOS 格式转变为 Unix 格式的过程非常
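As a rough sketch of that removal, the tr program from this chapter's list can delete every carriage return. The file names and the temporary directory below are hypothetical, chosen only for illustration:

```shell
tmp=$(mktemp -d)                                      # scratch directory for the example files
printf 'line one\r\nline two\r\n' > "$tmp/dos.txt"    # two lines ending in CR LF (DOS style)
tr -d '\r' < "$tmp/dos.txt" > "$tmp/unix.txt"         # delete each carriage return
dos_bytes=$(wc -c < "$tmp/dos.txt")
unix_bytes=$(wc -c < "$tmp/unix.txt")
echo "before: $dos_bytes bytes, after: $unix_bytes bytes"
```

Each line loses exactly one byte. Note that tr -d deletes carriage returns wherever they occur, not only at the ends of lines.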

cat also has options that are used to modify text. The two most prominent are -n,
which numbers lines, and -s, which suppresses the output of multiple blank lines. We
can demonstrate thusly:

cat 程序也包含用来修改文本的选项。最著名的两个选项是-n,其给文本行添加行号和-s,

[me@linuxbox ~]$ cat > foo.txt
The quick brown fox


jumped over the lazy dog.
[me@linuxbox ~]$ cat -ns foo.txt
1   The quick brown fox
2
3   jumped over the lazy dog.
[me@linuxbox ~]$

In this example, we create a new version of our foo.txt test file, which contains two
lines of text separated by two blank lines. After processing by cat with the -ns options,
the extra blank line is removed and the remaining lines are numbered. While this is not
much of a process to perform on text, it is a process.

在这个例子里,我们创建了一个测试文件 foo.txt 的新版本,其包含两行文本,由两个空白行分开。
经由带有-ns 选项的 cat 程序处理之后,多余的空白行被删除,并且对保留的文本行进行编号。


The sort program sorts the contents of standard input, or one or more files specified on
the command line, and sends the results to standard output. Using the same technique
that we used with cat, we can demonstrate processing of standard input directly from
the keyboard:

这个 sort 程序对标准输入的内容,或命令行中指定的一个或多个文件进行排序,然后把排序
结果发送到标准输出。使用与 cat 命令相同的技巧,我们能够演示如何用 sort 程序来处理标准输入:

[me@linuxbox ~]$ sort > foo.txt
c
b
a
[me@linuxbox ~]$ cat foo.txt
a
b
c

After entering the command, we type the letters “c”, “b”, and “a”, followed once again by
Ctrl-d to indicate end-of-file. We then view the resulting file and see that the lines
now appear in sorted order.

输入命令之后,我们键入字母“c”,“b”,和“a”,然后再按下 Ctrl-d 组合键来表示文件的结尾。

Since sort can accept multiple files on the command line as arguments, it is possible to
merge multiple files into a single sorted whole. For example, if we had three text files
and wanted to combine them into a single sorted file, we could do something like this:

因为 sort 程序能接受命令行中的多个文件作为参数,所以有可能把多个文件合并成一个有序的文件。例如,

sort file1.txt file2.txt file3.txt > final_sorted_list.txt

sort has several interesting options. Here is a partial list:

sort 程序有几个有趣的选项。这里只是一部分列表:

Table 21-1: Common sort Options
Option  Long Option                 Description
-b      --ignore-leading-blanks     By default, sorting is performed on the entire line, starting with the first character in the line. This option causes sort to ignore leading spaces in lines and calculate sorting based on the first non-whitespace character on the line.
-f      --ignore-case               Makes sorting case insensitive.
-n      --numeric-sort              Performs sorting based on the numeric evaluation of a string. Using this option allows sorting to be performed on numeric values rather than alphabetic values.
-r      --reverse                   Sort in reverse order. Results are in descending rather than ascending order.
-k      --key=field1[,field2]       Sort based on a key field located from field1 to field2 rather than the entire line. See discussion below.
-m      --merge                     Treat each argument as the name of a presorted file. Merge multiple files into a single sorted result without performing any additional sorting.
-o      --output=file               Send sorted output to file rather than standard output.
-t      --field-separator=char      Define the field separator character. By default fields are separated by spaces or tabs.

表21-1: 常见的 sort 程序选项
选项 长选项 描述
-b --ignore-leading-blanks 默认情况下,对整行进行排序,从每行的第一个字符开始。这个选项导致 sort 程序忽略 每行开头的空格,从第一个非空白字符开始排序。
-f --ignore-case 让排序不区分大小写。
-n --numeric-sort 基于字符串的数值来排序。使用此选项允许根据数字值执行排序,而不是字母值。
-r --reverse 按相反顺序排序。结果按照降序排列,而不是升序。
-k --key=field1[,field2] 对从 field1到 field2之间的字符排序,而不是整个文本行。看下面的讨论。
-m --merge 把每个参数看作是一个预先排好序的文件。把多个文件合并成一个排好序的文件,而没有执行额外的排序。
-o --output=file 把排好序的输出结果发送到文件,而不是标准输出。
-t --field-separator=char 定义域分隔字符。默认情况下,域由空格或制表符分隔。

Although most of the options above are pretty self-explanatory, some are not. First, let’s
look at the -n option, used for numeric sorting. With this option, it is possible to sort
values based on numeric values. We can demonstrate this by sorting the results of the du
command to determine the largest users of disk space. Normally, the du command lists
the results of a summary in pathname order:

虽然以上大多数选项的含义是不言自喻的,但是有些也不是。首先,让我们看一下 -n 选项,被用做数值排序。
通过这个选项,有可能基于数值进行排序。我们通过对 du 命令的输出结果排序来说明这个选项,du 命令可以
确定最大的磁盘空间用户。通常,这个 du 命令列出的输出结果按照路径名来排序:

[me@linuxbox ~]$ du -s /usr/share/* | head
252     /usr/share/aclocal
96      /usr/share/acpi-support
8       /usr/share/adduser
196     /usr/share/alacarte
344     /usr/share/alsa
8       /usr/share/alsa-base
12488   /usr/share/anthy
8       /usr/share/apmd
21440   /usr/share/app-install
48      /usr/share/application-registry

In this example, we pipe the results into head to limit the results to the first ten lines.
We can produce a numerically sorted list to show the ten largest consumers of space this
way:
在这个例子里面,我们把结果管道到 head 命令,把输出结果限制为前 10 行。我们能够产生一个按数值排序的
列表,来显示 10 个最大的空间消费者:

[me@linuxbox ~]$ du -s /usr/share/* | sort -nr | head
509940         /usr/share/locale-langpack
242660         /usr/share/doc
197560         /usr/share/fonts
179144         /usr/share/gnome
146764         /usr/share/myspell
144304         /usr/share/gimp
135880         /usr/share/dict
76508          /usr/share/icons
68072          /usr/share/apps
62844          /usr/share/foomatic

By using the -nr options, we produce a reverse numerical sort, with the largest values
appearing first in the results. This sort works because the numerical values occur at the
beginning of each line. But what if we want to sort a list based on some value found
within the line? For example, the results of an ls -l:

通过使用此 -nr 选项,我们产生了一个反向的数值排序,最大数值排列在第一位。这种排序起作用是
例如,命令 ls -l 的输出结果:

[me@linuxbox ~]$ ls -l /usr/bin | head
total 152948
-rwxr-xr-x 1 root   root     34824  2008-04-04  02:42 [
-rwxr-xr-x 1 root   root    101556  2007-11-27  06:08 a2p

Ignoring, for the moment, that ls can sort its results by size, we could use sort to sort
this list by file size, as well:

此刻,忽略 ls 程序能按照文件大小对输出结果进行排序,我们也能够使用 sort 程序来完成此任务:

[me@linuxbox ~]$ ls -l /usr/bin | sort -nr -k 5 | head
-rwxr-xr-x 1 root   root   8234216  2008-04-07 17:42 inkscape
-rwxr-xr-x 1 root   root   8222692  2008-04-07 17:42 inkview

Many uses of sort involve the processing of tabular data, such as the results of the ls
command above. If we apply database terminology to the table above, we would say that
each row is a record and that each record consists of multiple fields, such as the file
attributes, link count, filename, file size and so on. sort is able to process individual
fields. In database terms, we are able to specify one or more key fields to use as sort keys.
In the example above, we specify the n and r options to perform a reverse numerical sort
and specify -k 5 to make sort use the fifth field as the key for sorting.

sort 程序的许多用法都涉及到处理表格数据,例如上面 ls 命令的输出结果。如果我们
例如文件属性,链接数,文件名,文件大小等等。sort 程序能够处理独立的字段。在数据库术语中,
n 和 r 选项来执行相反的数值排序,并且指定 -k 5,让 sort 程序使用第五字段作为排序的关键值。
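The difference between a text comparison and a numeric comparison on a key field is easy to see on a small sample. The sketch below uses three hypothetical records (a name and a size) rather than real ls output, and fixes the locale to C so that the character ordering is predictable:

```shell
export LC_ALL=C    # fixed locale so the text ordering is the same everywhere
# Three hypothetical records: a name and a size, separated by a space.
data='alpha 10
beta 9
gamma 100'
by_chars=$(printf '%s\n' "$data" | sort -k 2)       # compares the second field as text
by_value=$(printf '%s\n' "$data" | sort -nr -k 2)   # compares it as a number, largest first
printf 'text order:\n%s\n\nnumeric order:\n%s\n' "$by_chars" "$by_value"
```

In the text ordering, 100 lands between 10 and 9 because the comparison proceeds character by character; with -n the values are ordered by magnitude.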

The k option is very interesting and has many features, but first we need to talk about
how sort defines fields. Let’s consider a very simple text file consisting of a single line
containing the author’s name:

这个 k 选项非常有趣,而且还有很多特点,但是首先我们需要讲讲 sort 程序怎样来定义字段。

William      Shotts

By default, sort sees this line as having two fields. The first field contains the characters:

默认情况下,sort 程序把此行看作有两个字段。第一个字段包含字符:


“William”

and the second field contains the characters:

“      Shotts”

meaning that whitespace characters (spaces and tabs) are used as delimiters between
fields and that the delimiters are included in the field when sorting is performed.
Looking again at a line from our ls output, we can see that a line contains eight fields
and that the fifth field is the file size:

包含在字段当中。再看一下 ls 命令的输出,我们看到每行包含八个字段,并且第五个字段是文件大小:

-rwxr-xr-x 1 root root 8234216 2008-04-07 17:42 inkscape

For our next series of experiments, let’s consider the following file containing the history
of three popular Linux distributions released from 2006 to 2008. Each line in the file has
three fields: the distribution name, version number, and date of release in
MM/DD/YYYY format:

让我们考虑用下面的文件,其包含从 2006 年到 2008 年三款流行的 Linux 发行版的发行历史,来做一系列实验。
文件中的每一行都有三个字段:发行版的名称,版本号,和 MM/DD/YYYY 格式的发行日期:

SUSE            10.2   12/07/2006
Fedora          10     11/25/2008
SUSE            11.0   06/19/2008
Ubuntu          8.04   04/24/2008
Fedora          8      11/08/2007
SUSE            10.3   10/04/2007

Using a text editor (perhaps vim), we’ll enter this data and name the resulting file
distros.txt.

使用一个文本编辑器(可能是 vim),我们将输入这些数据,并把产生的文件命名为 distros.txt。

Next, we’ll try sorting the file and observe the results:


[me@linuxbox ~]$ sort distros.txt
Fedora          10     11/25/2008
Fedora          5     03/20/2006
Fedora          6     10/24/2006
Fedora          7     05/31/2007
Fedora          8     11/08/2007

Well, it mostly worked. The problem occurs in the sorting of the Fedora version
numbers. Since a “1” comes before a “5” in the character set, version “10” ends up at the
top while version “9” falls to the bottom.

恩,大部分正确。问题出现在 Fedora 的版本号上。因为在字符集中 “1” 出现在 “5” 之前,版本号 “10” 在
最顶端,然而版本号 “9” 却掉到底端。

To fix this problem we are going to have to sort on multiple keys. We want to perform an
alphabetic sort on the first field and then a numeric sort on the second field. sort allows
multiple instances of the -k option so that multiple sort keys can be specified. In fact, a
key may include a range of fields. If no range is specified (as has been the case with our
previous examples), sort uses a key that begins with the specified field and extends to
the end of the line. Here is the syntax for our multi-key sort:

第三个字段执行数值排序。sort 程序允许多个 -k 选项的实例,所以可以指定多个排序关键值。事实上,
一个关键值可能包括一个字段区域。如果没有指定区域(如同之前的例子),sort 程序会使用一个键值,

[me@linuxbox ~]$ sort --key=1,1 --key=2n distros.txt
Fedora         5     03/20/2006
Fedora         6     10/24/2006
Fedora         7     05/31/2007

Though we used the long form of the option for clarity, -k 1,1 -k 2n would be
exactly equivalent. In the first instance of the key option, we specified a range of fields
to include in the first key. Since we wanted to limit the sort to just the first field, we
specified 1,1 which means “start at field one and end at field one.” In the second
instance, we specified 2n, which means that field two is the sort key and that the sort
should be numeric. An option letter may be included at the end of a key specifier to
indicate the type of sort to be performed. These option letters are the same as the global
options for the sort program: b (ignore leading blanks), n (numeric sort), r (reverse
sort), and so on.

虽然为了清晰,我们使用了选项的长格式,但是 -k 1,1 -k 2n 格式是等价的。在第一个 key 选项的实例中,
我们指定了一个字段区域。因为我们只想对第一个字段排序,我们指定了 1,1,
意味着“始于并且结束于第一个字段。”在第二个实例中,我们指定了 2n,意味着第二个字段是排序的键值,
选项字母和 sort 程序的全局选项一样:b(忽略开头的空格),n(数值排序),r(逆向排序),等等。
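The multi-key sort can be tried on a few lines without creating the full file. This is a sketch under stated assumptions: the five-line sample is a hypothetical subset in the same tab-separated layout as distros.txt, and the locale is fixed to C:

```shell
export LC_ALL=C
# Hypothetical five-line sample: name, version, release date, separated by tabs.
sample=$(printf 'Fedora\t10\t11/25/2008\nFedora\t5\t03/20/2006\nSUSE\t10.2\t12/07/2006\nFedora\t9\t05/13/2008\nUbuntu\t8.04\t04/24/2008')
# First key: field 1 only, compared as text. Second key: field 2, compared numerically.
sorted=$(printf '%s\n' "$sample" | sort -k 1,1 -k 2n)
printf '%s\n' "$sorted"
```

With the second key marked numeric, Fedora 10 now follows Fedora 5 and Fedora 9 instead of preceding them.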

The third field in our list contains a date in an inconvenient format for sorting. On
computers, dates are usually formatted in YYYY-MM-DD order to make chronological
sorting easy, but ours are in the American format of MM/DD/YYYY. How can we sort
this list in chronological order?

我们列表中第三个字段包含的日期格式不利于排序。在计算机中,日期通常设置为 YYYY-MM-DD 格式,
这样使按时间顺序排序变得容易,但是我们的日期为美国格式 MM/DD/YYYY。那么我们怎样能按照

Fortunately, sort provides a way. The key option allows specification of offsets within
fields, so we can define keys within fields:

幸运地是,sort 程序提供了一种方式。这个 key 选项允许在字段中指定偏移量,所以我们能在字段中

[me@linuxbox ~]$ sort -k 3.7nbr -k 3.1nbr -k 3.4nbr distros.txt
Fedora         10    11/25/2008
Ubuntu         8.10  10/30/2008
SUSE           11.0  06/19/2008

By specifying -k 3.7 we instruct sort to use a sort key that begins at the seventh
character within the third field, which corresponds to the start of the year. Likewise, we
specify -k 3.1 and -k 3.4 to isolate the month and day portions of the date. We also
add the n and r options to achieve a reverse numeric sort. The b option is included to
suppress the leading spaces (whose numbers vary from line to line, thereby affecting the
outcome of the sort) in the date field.

通过指定 -k 3.7,我们指示 sort 程序使用一个排序键值,其始于第三个字段中的第七个字符,对应于
年的开头。同样地,我们指定 -k 3.1和 -k 3.4来分离日期中的月和日。
我们也添加了 n 和 r 选项来实现一个逆向的数值排序。这个 b 选项用来删除日期字段中开头的空格(
行与行之间的空格数迥异,因此会影响 sort 程序的输出结果)。
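The character offsets can be checked on a small sample. The sketch below assumes the same hypothetical tab-separated subset of distros.txt and prints only the date column after sorting, so the resulting order is easy to inspect:

```shell
export LC_ALL=C
# Hypothetical five-line sample: name, version, release date (MM/DD/YYYY), separated by tabs.
sample=$(printf 'Fedora\t10\t11/25/2008\nFedora\t5\t03/20/2006\nSUSE\t10.2\t12/07/2006\nFedora\t9\t05/13/2008\nUbuntu\t8.04\t04/24/2008')
# Keys, all reverse numeric with leading blanks ignored:
#   3.7 = year (7th character of field 3), 3.1 = month, 3.4 = day.
by_date=$(printf '%s\n' "$sample" | sort -k 3.7nbr -k 3.1nbr -k 3.4nbr)
dates=$(printf '%s\n' "$by_date" | cut -f 3)
printf '%s\n' "$dates"
```

The newest release comes first: the year decides the order, then the month, then the day.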

Some files don’t use tabs and spaces as field delimiters; for example, the /etc/passwd
file:

一些文件不会使用 tabs 和空格做为字段界定符;例如,这个 /etc/passwd 文件:

[me@linuxbox ~]$ head /etc/passwd

The fields in this file are delimited with colons (:), so how would we sort this file using a
key field? sort provides the -t option to define the field separator character. To sort
the passwd file on the seventh field (the account’s default shell), we could do this:

这个文件的字段之间通过冒号分隔开,所以我们怎样使用一个 key 字段来排序这个文件?sort 程序提供
了一个 -t 选项来定义分隔符。按照第七个字段(帐户的默认 shell)来排序此 passwd 文件,我们可以这样做:

[me@linuxbox ~]$ sort -t ':' -k 7 /etc/passwd | head
gdm:x:106:114:Gnome Display Manager:/var/lib/gdm:/bin/false
hplip:x:104:7:HPLIP system user,,,:/var/run/hplip:/bin/false
pulse:x:107:116:PulseAudio daemon,,,:/var/run/pulse:/bin/false

By specifying the colon character as the field separator, we can sort on the seventh field.
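To experiment without reading the real /etc/passwd, the same command can be run on a few made-up records. The three entries below are hypothetical and only mimic the seven-field layout:

```shell
export LC_ALL=C
# Three hypothetical passwd-style records; the seventh colon-separated field is the shell.
records='root:x:0:0:root:/root:/bin/bash
gdm:x:106:114:Gnome Display Manager:/var/lib/gdm:/bin/false
me:x:1000:1000:me:/home/me:/bin/bash'
sorted=$(printf '%s\n' "$records" | sort -t ':' -k 7)
logins=$(printf '%s\n' "$sorted" | cut -d ':' -f 1)   # keep only the login names
printf '%s\n' "$logins"
```

The two /bin/bash accounts sort ahead of the /bin/false account. Where the key is identical, sort falls back to comparing the whole line, which is why me precedes root.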



Compared to sort, the uniq program is a lightweight. uniq performs a seemingly
trivial task. When given a sorted file (including standard input), it removes any duplicate
lines and sends the results to standard output. It is often used in conjunction with sort
to clean the output of duplicates.

与 sort 程序相比,这个 uniq 程序是个轻量级程序。uniq 执行一个看似琐碎的行为。当给定一个
排好序的文件(包括标准输出),uniq 会删除任意重复行,并且把结果发送到标准输出。
它常常和 sort 程序一块使用,来清理重复的输出。

Tip: While uniq is a traditional Unix tool often used with sort, the
GNU version of sort supports a -u option, which removes duplicates from the
sorted output.

uniq 程序是一个传统的 Unix 工具,经常与 sort 程序一块使用,但是这个 GNU 版本的 sort 程序支持一个 -u 选项,其可以从排好序的输出结果中删除重复行。

Let’s make a text file to try this out:


[me@linuxbox ~]$ cat > foo.txt

Remember to type Ctrl-d to terminate standard input. Now, if we run uniq on our
text file:

记住输入 Ctrl-d 来终止标准输入。现在,如果我们对文本文件执行 uniq 命令:

[me@linuxbox ~]$ uniq foo.txt

the results are no different from our original file; the duplicates were not removed. For
uniq to actually do its job, the input must be sorted first:

输出结果与原始文件没有差异;重复行没有被删除。实际上,uniq 程序能完成任务,其输入必须是排好序的数据,

[me@linuxbox ~]$ sort foo.txt | uniq

This is because uniq only removes duplicate lines which are adjacent to each other.
uniq has several options. Here are the common ones:

这是因为 uniq 只会删除相邻的重复行。uniq 程序有几个选项。这里是一些常用选项:

Table 21-2: Common uniq Options
Option  Description
-c      Output each line preceded by the number of times it occurs.
-d      Only output repeated lines, rather than unique lines.
-f n    Ignore n leading fields in each line. Fields are separated by whitespace as they are in sort; however, unlike sort, uniq has no option for setting an alternate field separator.
-i      Ignore case during the line comparisons.
-s n    Skip (ignore) the leading n characters of each line.
-u      Only output lines that are not repeated. (With no options, uniq outputs one copy of every line.)

表21-2: 常用的 uniq 选项
选项 说明
-c 输出所有的重复行,并且每行开头显示重复的次数。
-d 只输出重复行,而不是特有的文本行。
-f n 忽略每行开头的 n 个字段,字段之间由空格分隔,正如 sort 程序中的空格分隔符;然而, 不同于 sort 程序,uniq 没有选项来设置备用的字段分隔符。
-i 在比较文本行的时候忽略大小写。
-s n 跳过(忽略)每行开头的 n 个字符。
-u 只输出独有的文本行。这是默认的。

Here we see uniq used to report the number of duplicates found in our text file, using
the -c option:

这里我们看到 uniq 被用来报告文本文件中重复行的次数,使用这个-c 选项:

[me@linuxbox ~]$ sort foo.txt | uniq -c
        2 a
        2 b
        2 c
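The whole uniq experiment can also be run non-interactively. This is a sketch, assuming a hypothetical foo.txt whose duplicate lines are not adjacent, written to a temporary directory so that no existing file is overwritten:

```shell
export LC_ALL=C
tmp=$(mktemp -d)
printf 'a\nb\nc\na\nb\nc\n' > "$tmp/foo.txt"         # duplicates exist, but none are adjacent
unsorted_count=$(uniq "$tmp/foo.txt" | wc -l)        # nothing adjacent, so nothing is removed
sorted_count=$(sort "$tmp/foo.txt" | uniq | wc -l)   # sorting makes the duplicates adjacent
only_repeated=$(printf 'a\na\nb\n' | uniq -d)        # lines that occur more than once
never_repeated=$(printf 'a\na\nb\n' | uniq -u)       # lines that occur exactly once
echo "uniq alone:" $unsorted_count "lines; sort | uniq:" $sorted_count "lines"
echo "-d prints: $only_repeated; -u prints: $never_repeated"
```

Running uniq alone leaves all six lines in place, while sort | uniq reduces them to three. The last two commands contrast -d and -u on input that is already sorted.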


The next three programs we will discuss are used to peel columns of text out of files and
recombine them in useful ways.



The cut program is used to extract a section of text from a line and output the extracted
section to standard output. It can accept multiple file arguments or input from standard
input.

这个 cut 程序被用来从文本行中抽取文本,并把其输出到标准输出。它能够接受多个文件参数或者

Specifying the section of the line to be extracted is somewhat awkward and is specified
using the following options:


Table 21-3: cut Selection Options
Option          Description
-c char_list    Extract the portion of the line defined by char_list. The list may consist of one or more comma-separated numerical ranges.
-f field_list   Extract one or more fields from the line as defined by field_list. The list may contain one or more fields or field ranges separated by commas.
-d delim_char   When -f is specified, use delim_char as the field delimiting character. By default, fields must be separated by a single tab character.
--complement    Extract the entire line of text, except for those portions specified by -c and/or -f.

表21-3: cut 程序选择项
选项 说明
-c char_list 从文本行中抽取由 char_list 定义的文本。这个列表可能由一个或多个逗号 分隔开的数值区间组成。
-f field_list 从文本行中抽取一个或多个由 field_list 定义的字段。这个列表可能 包括一个或多个字段,或由逗号分隔开的字段区间。
-d delim_char 当指定-f 选项之后,使用 delim_char 做为字段分隔符。默认情况下, 字段之间必须由单个 tab 字符分隔开。
--complement 抽取整个文本行,除了那些由-c 和/或-f 选项指定的文本。

As we can see, the way cut extracts text is rather inflexible. cut is best used to extract
text from files that are produced by other programs, rather than text directly typed by
humans. We’ll take a look at our distros.txt file to see if it is “clean” enough to be
a good specimen for our cut examples. If we use cat with the -A option, we can see if
the file meets our requirements of tab separated fields:

正如我们所看到的,cut 程序抽取文本的方式相当不灵活。cut 命令最好用来从其它程序产生的文件中
抽取文本,而不是从人们直接输入的文本中抽取。我们将会看一下我们的 distros.txt 文件,看看
是否它足够 “整齐” 成为 cut 实例的一个好样本。如果我们使用带有 -A 选项的 cat 命令,我们能查看是否这个
文件符号由 tab 字符分离字段的要求。

[me@linuxbox ~]$ cat -A distros.txt

It looks good. No embedded spaces, just single tab characters between the fields. Since
the file uses tabs rather than spaces, we’ll use the -f option to extract a field:

看起来不错。字段之间仅仅是单个 tab 字符,没有嵌入空格。因为这个文件使用了 tab 而不是空格,
我们将使用 -f 选项来抽取一个字段:

[me@linuxbox ~]$ cut -f 3 distros.txt

Because our distros file is tab-delimited, it is best to use cut to extract fields rather
than characters. This is because when a file is tab-delimited, it is unlikely that each line
will contain the same number of characters, which makes calculating character positions
within the line difficult or impossible. In our example above, however, we now have
extracted a field that luckily contains data of identical length, so we can show how
character extraction works by extracting the year from each line:

因为我们的 distros 文件是由 tab 分隔开的,最好用 cut 来抽取字段而不是字符。这是因为一个由 tab 分离的文件,

[me@linuxbox ~]$ cut -f 3 distros.txt | cut -c 7-10

By running cut a second time on our list, we are able to extract character positions 7
through 10, which correspond to the year in our date field. The 7-10 notation is an
example of a range. The cut man page contains a complete description of how ranges
can be specified.

通过对我们的列表再次运行 cut 命令,我们能够抽取从位置7到10的字符,其对应于日期字段的年份。
这个 7-10 表示法是一个区间的例子。cut 命令手册包含了一个如何指定区间的完整描述。
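The two-stage extraction can be sketched on a couple of lines. The sample below is a hypothetical two-line excerpt in the tab-separated layout of distros.txt:

```shell
# Hypothetical two-line sample: name, version, release date, separated by single tabs.
sample=$(printf 'Fedora\t10\t11/25/2008\nSUSE\t10.2\t12/07/2006')
dates=$(printf '%s\n' "$sample" | cut -f 3)       # third tab-separated field: the date
years=$(printf '%s\n' "$dates" | cut -c 7-10)     # characters 7 through 10 of each date: the year
printf '%s\n' "$years"
```

The character range is only reliable here because every date has the same length; applied to the whole line, positions 7-10 would fall in different places on each line.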

Expanding Tabs

展开 Tabs

Our distros.txt file is ideally formatted for extracting fields using cut. But
what if we wanted a file that could be fully manipulated with cut by characters,
rather than fields? This would require us to replace the tab characters within the
file with the corresponding number of spaces. Fortunately, the GNU Coreutils
package includes a tool for that. Named expand, this program accepts either
one or more file arguments or standard input, and outputs the modified text to
standard output.

distros.txt 的文件格式很适合使用 cut 程序来抽取字段。但是如果我们想要 cut 程序
代替 tab 字符。幸运地是,GNU 的 Coreutils 软件包有一个工具来解决这个问题。这个
程序名为 expand,它既可以接受一个或多个文件参数,也可以接受标准输入,并且把

If we process our distros.txt file with expand, we can use cut -c to
extract any range of characters from the file. For example, we could use the
following command to extract the year of release from our list, by expanding the
file and using cut to extract every character from the twenty-third position to the
end of the line:

如果我们通过 expand 来处理 distros.txt 文件,我们能够使用 cut -c 命令来从文件中抽取
此文件,再使用 cut 命令,来抽取从位置 23 开始到行尾的每一个字符:

[me@linuxbox ~]$ expand distros.txt | cut -c 23-

Coreutils also provides the unexpand program to substitute tabs for spaces.

Coreutils 软件包也提供了 unexpand 程序,用 tab 来代替空格。
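A one-line sketch shows why position 23 works. It assumes the default tab stops of eight columns and a hypothetical line whose name and version are each shorter than eight characters:

```shell
# One tab-separated line; expand pads each tab out to the next multiple of 8 columns.
expanded=$(printf 'Fedora\t10\t11/25/2008\n' | expand)
year=$(printf '%s\n' "$expanded" | cut -c 23-)    # the date starts in column 17, the year in column 23
printf '%s\n%s\n' "$expanded" "$year"
```

If a name or version were eight characters or longer, the date would start at a later tab stop and the fixed position 23 would no longer point at the year.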

When working with fields, it is possible to specify a different field delimiter rather than
the tab character. Here we will extract the first field from the /etc/passwd file:

当操作字段的时候,有可能指定不同的字段分隔符,而不是 tab 字符。这里我们将会从/etc/passwd 文件中

[me@linuxbox ~]$ cut -d ':' -f 1 /etc/passwd | head

Using the -d option, we are able to specify the colon character as the field delimiter.

使用-d 选项,我们能够指定冒号做为字段分隔符。


The paste command does the opposite of cut. Rather than extracting a column of text
from a file, it adds one or more columns of text to a file. It does this by reading multiple
files and combining the fields found in each file into a single stream on standard output.
Like cut, paste accepts multiple file arguments and/or standard input. To demonstrate
how paste operates, we will perform some surgery on our distros.txt file to
produce a chronological list of releases.

这个 paste 命令的功能正好与 cut 相反。它会添加一个或多个文本列到文件中,而不是从文件中抽取文本列。
它通过读取多个文件,然后把每个文件中的字段整合成单个文本流,输入到标准输出。类似于 cut 命令,
paste 接受多个文件参数和 / 或标准输入。为了说明 paste 是怎样工作的,我们将会对 distros.txt 文件

From our earlier work with sort, we will first produce a list of distros sorted by date
and store the result in a file called distros-by-date.txt:

从我们之前使用 sort 的工作中,首先我们将产生一个按照日期排序的发行版列表,并把结果
存储在一个叫做 distros-by-date.txt 的文件中:

[me@linuxbox ~]$ sort -k 3.7nbr -k 3.1nbr -k 3.4nbr distros.txt > distros-by-date.txt

Next, we will use cut to extract the first two fields from the file (the distro name and
version), and store that result in a file named distros-versions.txt:

下一步,我们将会使用 cut 命令从文件中抽取前两个字段(发行版名字和版本号),并把结果存储到
一个名为 distro-versions.txt 的文件中:

[me@linuxbox ~]$ cut -f 1,2 distros-by-date.txt > distros-versions.txt
[me@linuxbox ~]$ head distros-versions.txt
Fedora     10
Ubuntu     8.10
SUSE       11.0
Fedora     9
Ubuntu     8.04
Fedora     8
Ubuntu     7.10
SUSE       10.3
Fedora     7
Ubuntu     7.04

The final piece of preparation is to extract the release dates and store them in a file named
distros-dates.txt:

最后的准备步骤是抽取发行日期,并把它们存储到一个名为 distro-dates.txt 文件中:

[me@linuxbox ~]$ cut -f 3 distros-by-date.txt > distros-dates.txt
[me@linuxbox ~]$ head distros-dates.txt

We now have the parts we need. To complete the process, use paste to put the column
of dates ahead of the distro names and versions, thus creating a chronological list. This is
done simply by using paste and ordering its arguments in the desired arrangement:

现在我们拥有了我们所需要的文本了。为了完成这个过程,使用 paste 命令来把日期列放到发行版名字
和版本号的前面,这样就创建了一个年代列表。通过使用 paste 命令,然后按照期望的顺序来安排它的

[me@linuxbox ~]$ paste distros-dates.txt distros-versions.txt
11/25/2008  Fedora     10
10/30/2008  Ubuntu     8.10
06/19/2008  SUSE       11.0
05/13/2008  Fedora     9
04/24/2008  Ubuntu     8.04
11/08/2007  Fedora     8
10/18/2007  Ubuntu     7.10
10/04/2007  SUSE       10.3
05/31/2007  Fedora     7
04/19/2007  Ubuntu     7.04
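The column merge can be reproduced in miniature. In this sketch the two input files are hypothetical two-line excerpts, created in a temporary directory:

```shell
tmp=$(mktemp -d)
printf '11/25/2008\n10/30/2008\n' > "$tmp/dates.txt"          # one date per line
printf 'Fedora\t10\nUbuntu\t8.10\n' > "$tmp/versions.txt"     # name and version, tab-separated
# paste joins line N of each file, in argument order, with a tab between them.
merged=$(paste "$tmp/dates.txt" "$tmp/versions.txt")
printf '%s\n' "$merged"
```

paste matches lines purely by position, so both files must list the releases in the same order.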


In some ways, join is like paste in that it adds columns to a file, but it uses a unique
way to do it. A join is an operation usually associated with relational databases where
data from multiple tables with a shared key field is combined to form a desired result.
The join program performs the same operation. It joins data from multiple files based
on a shared key field.

在某些方面,join 命令类似于 paste,它会往文件中添加列,但是它使用了独特的方法来完成。
一个 join 操作通常与关系型数据库有关联,在关系型数据库中来自多个享有共同关键域的表格的
数据结合起来,得到一个期望的结果。这个 join 程序执行相同的操作。它把来自于多个基于共享

To see how a join operation is used in a relational database, let’s imagine a very small
database consisting of two tables each containing a single record. The first table, called
CUSTOMERS, has three fields: a customer number (CUSTNUM), the customer’s first
name (FNAME) and the customer’s last name (LNAME):

为了知道在关系数据库中是怎样使用 join 操作的,让我们想象一个很小的数据库,这个数据库由两个
表格组成,每个表格包含一条记录。第一个表格,叫做 CUSTOMERS,有三个数据域:一个客户号(CUSTNUM),

CUSTNUM     FNAME       LNAME
========    =====       ======
4681934     John        Smith

The second table is called ORDERS and contains four fields: an order number
(ORDERNUM), the customer number (CUSTNUM), the quantity (QUAN), and the item
ordered (ITEM).

第二个表格叫做 ORDERS,其包含四个数据域:订单号(ORDERNUM),客户号(CUSTNUM),数量(QUAN),

ORDERNUM        CUSTNUM     QUAN ITEM
========        =======     ==== ====
3014953305      4681934     1    Blue Widget

Note that both tables share the field CUSTNUM. This is important, as it allows a
relationship between the tables.

注意两个表格共享数据域 CUSTNUM。这很重要,因为它使表格之间建立了联系。

Performing a join operation would allow us to combine the fields in the two tables to
achieve a useful result, such as preparing an invoice. Using the matching values in the
CUSTNUM fields of both tables, a join operation could produce the following:

执行一个 join 操作将允许我们把两个表格中的数据域结合起来,得到一个有用的结果,例如准备
一张发货单。通过使用两个表格 CUSTNUM 数字域中匹配的数值,一个 join 操作会产生以下结果:

FNAME       LNAME       QUAN ITEM
=====       =====       ==== ====
John        Smith       1    Blue Widget

To demonstrate the join program, we’ll need to make a couple of files with a shared
key. To do this, we will use our distros-by-date.txt file. From this file, we will
construct two additional files, one containing the release date (which will be our shared
key for this demonstration) and the release name:

为了说明 join 程序,我们需要创建一对包含共享键值的文件。为此,我们将使用我们的 distros.txt 文件。

[me@linuxbox ~]$ cut -f 1,1 distros-by-date.txt > distros-names.txt
[me@linuxbox ~]$ paste distros-dates.txt distros-names.txt > distros-key-names.txt
[me@linuxbox ~]$ head distros-key-names.txt
11/25/2008 Fedora
10/30/2008 Ubuntu
06/19/2008 SUSE
05/13/2008 Fedora
04/24/2008 Ubuntu
11/08/2007 Fedora
10/18/2007 Ubuntu
10/04/2007 SUSE
05/31/2007 Fedora
04/19/2007 Ubuntu

and the second file, which contains the release dates and the version numbers:


[me@linuxbox ~]$ cut -f 2,2 distros-by-date.txt > distros-vernums.txt
[me@linuxbox ~]$ paste distros-dates.txt distros-vernums.txt > distros-key-vernums.txt
[me@linuxbox ~]$ head distros-key-vernums.txt
11/25/2008 10
10/30/2008 8.10
06/19/2008 11.0
05/13/2008 9
04/24/2008 8.04
11/08/2007 8
10/18/2007 7.10
10/04/2007 10.3
05/31/2007 7
04/19/2007 7.04

We now have two files with a shared key (the “release date” field). It is important to
point out that the files must be sorted on the key field for join to work properly.

现在我们有两个具有共享键值( “发行日期” 数据域 )的文件。有必要指出,为了使 join 命令

[me@linuxbox ~]$ join distros-key-names.txt distros-key-vernums.txt | head
11/25/2008 Fedora 10
10/30/2008 Ubuntu 8.10
06/19/2008 SUSE 11.0
05/13/2008 Fedora 9
04/24/2008 Ubuntu 8.04
11/08/2007 Fedora 8
10/18/2007 Ubuntu 7.10
10/04/2007 SUSE 10.3
05/31/2007 Fedora 7
04/19/2007 Ubuntu 7.04

Note also that, by default, join uses whitespace as the input field delimiter and a single
space as the output field delimiter. This behavior can be changed with options; for
example, -t sets the character that separates fields in both the input and the output.
See the join man page for details.

也要注意,默认情况下,join 命令使用空白字符做为输入字段的界定符,一个空格作为输出字段
的界定符。这种行为可以通过指定的选项来修改。详细信息,参考 join 命令手册。
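
As a small sketch of the two points above (sorted input and a custom delimiter), the following uses made-up sample files, names.txt and vernums.txt, whose lines are deliberately out of order. Setting LC_ALL=C is an assumption made here so that sort and join agree on the ordering of the key.

```shell
# Work in a scratch directory so no existing files are touched.
cd "$(mktemp -d)"

# Hypothetical tab-separated sample data, intentionally unsorted.
printf '10/30/2008\tUbuntu\n11/25/2008\tFedora\n06/19/2008\tSUSE\n' > names.txt
printf '11/25/2008\t10\n06/19/2008\t11.0\n10/30/2008\t8.10\n' > vernums.txt

# Use the same collation order for sort and join.
export LC_ALL=C
tab=$(printf '\t')

# Sort both files on the key field (field 1) before joining.
sort -t "$tab" -k 1,1 names.txt > names.sorted
sort -t "$tab" -k 1,1 vernums.txt > vernums.sorted

# -t makes join split on tabs and also use a tab in its output.
joined=$(join -t "$tab" names.sorted vernums.sorted)
printf '%s\n' "$joined"
```

Each output line carries the date, the name and the version number, and the lines come out in the sorted order of the key rather than in the order of the original files.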


It is often useful to compare versions of text files. For system administrators and
software developers, this is particularly important. A system administrator may, for
example, need to compare an existing configuration file to a previous version to diagnose
a system problem. Likewise, a programmer frequently needs to see what changes have
been made to programs over time.



The comm program compares two text files and displays the lines that are unique to each
one and the lines they have in common. To demonstrate, we will create two nearly
identical text files using cat:

这个 comm 程序会比较两个文本文件,并且会显示每个文件特有的文本行和共有的文本行。
为了说明问题,通过使用 cat 命令,我们将会创建两个内容几乎相同的文本文件:

[me@linuxbox ~]$ cat > file1.txt
[me@linuxbox ~]$ cat > file2.txt

Next, we will compare the two files using comm:

下一步,我们将使用 comm 命令来比较这两个文件:

[me@linuxbox ~]$ comm file1.txt file2.txt
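
For a concrete picture of the three columns, here is a sketch that can be run non-interactively. It assumes that file1.txt contains the lines a, b, c, d and that file2.txt contains b, c, d, e; these contents are an assumption chosen to agree with the diff listings later in this section.

```shell
# Work in a scratch directory so no existing files are touched.
cd "$(mktemp -d)"

# Assumed file contents (see the note above).
printf 'a\nb\nc\nd\n' > file1.txt
printf 'b\nc\nd\ne\n' > file2.txt

# Column 1 has no indent, column 2 is indented by one tab,
# and column 3 is indented by two tabs.
out=$(comm file1.txt file2.txt)
printf '%s\n' "$out"
```

Under these assumptions, a appears in the first column, e in the second, and b, c and d in the third.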

As we can see, comm produces three columns of output. The first column contains lines
unique to the first file argument; the second column, the lines unique to the second file
argument; the third column contains the lines shared by both files. comm supports
options in the form -n where n is either 1, 2 or 3. When used, these options specify
which column(s) to suppress. For example, if we only wanted to output the lines shared
by both files, we would suppress the output of columns one and two:

正如我们所见到的,comm 命令产生了三列输出。第一列包含第一个文件独有的文本行;第二列
包含第二个文件独有的文本行;第三列包含两个文件共有的文本行。comm 支持 -n 形式的选项,这里 n 代表
1,2 或 3。这些选项使用的时候,指定了要隐藏的列。例如,如果我们只想输出两个文件共享的文本行,

[me@linuxbox ~]$ comm -12 file1.txt file2.txt
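
The suppression options can be combined in other ways too. The sketch below again assumes file1.txt holds a, b, c, d and file2.txt holds b, c, d, e (assumed contents, not taken from the listing above), and collects each column on its own.

```shell
# Work in a scratch directory so no existing files are touched.
cd "$(mktemp -d)"

# Assumed file contents.
printf 'a\nb\nc\nd\n' > file1.txt
printf 'b\nc\nd\ne\n' > file2.txt

# Suppress columns 1 and 2: only lines common to both files remain.
common=$(comm -12 file1.txt file2.txt)

# Suppress columns 2 and 3: only lines unique to file1.txt remain.
only_first=$(comm -23 file1.txt file2.txt)

# Suppress columns 1 and 3: only lines unique to file2.txt remain.
only_second=$(comm -13 file1.txt file2.txt)

printf 'common:\n%s\n' "$common"
printf 'only in file1.txt: %s\n' "$only_first"
printf 'only in file2.txt: %s\n' "$only_second"
```

When a single column is left, comm prints it without any leading tabs, which makes the result easy to pass to other programs.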


Like the comm program, diff is used to detect the differences between files. However,
diff is a much more complex tool, supporting many output formats and the ability to
process large collections of text files at once. diff is often used by software developers
to examine changes between different versions of program source code, and thus has the
ability to recursively examine directories of source code, often referred to as source trees.
One common use for diff is the creation of diff files or patches that are used by
programs such as patch (which we’ll discuss shortly) to convert one version of a file (or
files) to another version.

类似于 comm 程序,diff 程序被用来监测文件之间的差异。然而,diff 是一款更加复杂的工具,它支持
许多输出格式,并且一次能处理许多文本文件。软件开发员经常使用 diff 程序来检查不同程序源码
版本之间的更改,diff 能够递归地检查源码目录,经常称之为源码树。diff 程序的一个常见用例是
创建 diff 文件或者补丁,它会被其它程序使用,例如 patch 程序(我们一会儿讨论),来把文件

If we use diff to look at our previous example files:

如果我们使用 diff 程序,来查看我们之前的文件实例:

[me@linuxbox ~]$ diff file1.txt file2.txt
1d0
< a
4a4
> e

we see its default style of output: a terse description of the differences between the two
files. In the default format, each group of changes is preceded by a change command in
the form of range operation range to describe the positions and type of changes required
to convert the first file to the second file:

我们看到 diff 程序的默认输出风格:对两个文件之间差异的简短描述。在默认格式中,
每组的更改之前都是一个更改命令,其形式为 range operation range

Table 21-4: diff Change Commands

Change   Description
r1ar2    Append the lines at the position r2 in the second file to the position r1 in the first file.
r1cr2    Change (replace) the lines at position r1 with the lines at the position r2 in the second file.
r1dr2    Delete the lines in the first file at position r1, which would have appeared at range r2 in the second file.

表21-4: diff 更改命令
改变 说明
r1ar2 把第二个文件中位置 r2 处的文件行添加到第一个文件中的 r1 处。
r1cr2 用第二个文件中位置 r2 处的文本行更改(替代)位置 r1 处的文本行。
r1dr2 删除第一个文件中位置 r1 处的文本行,这些文本行将会出现在第二个文件中位置 r2 处。

In this format, a range is a comma-separated list of the starting line and the ending line
(or a single line number when the range covers only one line).
While this format is the default (mostly for POSIX compliance and backward
compatibility with traditional Unix versions of diff), it is not as widely used as other,
optional formats. Two of the more popular formats are the context format and the unified
format.

为了服从 POSIX 标准且向后与传统的 Unix diff 命令兼容),
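
To see a change command of the "c" kind, here is a minimal sketch using two made-up files, old.txt and new.txt, that differ only in their second line. The file names and contents are assumptions for illustration.

```shell
# Work in a scratch directory so no existing files are touched.
cd "$(mktemp -d)"

# Hypothetical files that differ only on line 2.
printf 'a\nb\nc\n' > old.txt
printf 'a\nB\nc\n' > new.txt

# diff exits with status 1 when the files differ, so "|| true"
# keeps the non-zero status from being treated as a failure.
changes=$(diff old.txt new.txt || true)
printf '%s\n' "$changes"
# 2c2
# < b
# ---
# > B
```

Here 2c2 says that line 2 of the first file must be changed to match line 2 of the second file. The lines marked < come from the first file and the lines marked > come from the second.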

When viewed using the context format (the -c option), we will see this:

当使用上下文模式(带上 -c 选项),我们将看到这些:

[me@linuxbox ~]$ diff -c file1.txt file2.txt
*** file1.txt    2008-12-23 06:40:13.000000000 -0500
--- file2.txt   2008-12-23 06:40:34.000000000 -0500
***************
*** 1,4 ****
- a
  b
  c
  d
--- 1,4 ----
  b
  c
  d
+ e

The output begins with the names of the two files and their timestamps. The first file is
marked with asterisks and the second file is marked with dashes. Throughout the
remainder of the listing, these markers will signify their respective files. Next, we see
groups of changes, including the default number of surrounding context lines. In the first
group, we see:


*** 1,4 ****

which indicates lines one through four in the first file. Later we see:


--- 1,4 ----

which indicates lines one through four in the second file. Within a change group, lines
begin with one of four indicators:


Table 21-5: diff Context Format Change Indicators

Indicator   Meaning
blank       A line shown for context. It does not indicate a difference between the two files.
-           A line deleted. This line will appear in the first file but not in the second file.
+           A line added. This line will appear in the second file but not in the first file.
!           A line changed. The two versions of the line will be displayed, each in its respective section of the change group.

表21-5: diff 上下文模式更改指示符
指示符 意思
blank 上下文显示行。它并不表示两个文件之间的差异。
- 删除行。这一行将会出现在第一个文件中,而不是第二个文件内。
+ 添加行。这一行将会出现在第二个文件内,而不是第一个文件中。
! 更改行。将会显示某个文本行的两个版本,每个版本会出现在更改组的各自部分。
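
The "!" indicator shows up when a line is changed rather than added or deleted. The following sketch uses two made-up files, old.txt and new.txt, that differ only in their second line (assumed names and contents). The tail command skips the two file header lines and the row of asterisks, since the timestamps vary from run to run.

```shell
# Work in a scratch directory so no existing files are touched.
cd "$(mktemp -d)"

# Hypothetical files that differ only on line 2.
printf 'a\nb\nc\n' > old.txt
printf 'a\nB\nc\n' > new.txt

# Drop the first three lines (two file headers and the separator)
# and keep the change group itself.
ctx=$(diff -c old.txt new.txt | tail -n +4)
printf '%s\n' "$ctx"
# *** 1,3 ****
#   a
# ! b
#   c
# --- 1,3 ----
#   a
# ! B
#   c
```

The changed line appears twice, once in the section for the first file and once in the section for the second, each time marked with "!".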

The unified format is similar to the context format, but is more concise. It is specified
with the -u option:

这个统一模式相似于上下文模式,但是更加简洁。通过 -u 选项来指定它:

[me@linuxbox ~]$ diff -u file1.txt file2.txt
--- file1.txt 2008-12-23 06:40:13.000000000 -0500
+++ file2.txt 2008-12-23 06:40:34.000000000 -0500
@@ -1,4 +1,4 @@
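
To see the lines that follow the @@ marker, here is a sketch that assumes file1.txt holds a, b, c, d and file2.txt holds b, c, d, e (assumed contents). The tail command drops the two header lines, whose timestamps differ from run to run.

```shell
# Work in a scratch directory so no existing files are touched.
cd "$(mktemp -d)"

# Assumed file contents.
printf 'a\nb\nc\nd\n' > file1.txt
printf 'b\nc\nd\ne\n' > file2.txt

# Skip the "---" and "+++" header lines and keep the change group.
uni=$(diff -u file1.txt file2.txt | tail -n +3)
printf '%s\n' "$uni"
# @@ -1,4 +1,4 @@
# -a
#  b
#  c
#  d
# +e
```

Each shared line is printed only once here, where the context format would print it in both sections.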

The most notable difference between the context and unified formats is the elimination of
the duplicated lines of context, making the results of the unified format shorter than the
context format. In our example above, we see file timestamps like those of the context
format, followed by the string @@ -1,4 +1,4 @@. This indicates the lines in the first
file and the lines in the second file described in the change group. Following this are the
lines themselves, with the default three lines of context. Each line starts with one of three
possible characters:
