

GNU Core-utils


7.1 sort: Sort text files

sort sorts, merges, or compares all the lines from the given files, or from standard input if no files are given or if a file is `-'. By default, sort writes the results to standard output. Synopsis:

 
sort [option]... [file]...

sort has three modes of operation: sort (the default), merge, and check for sortedness. The following options change the operation mode:

`-c'
`--check'
Check whether the given files are already sorted: if they are not all sorted, print an error message and exit with a status of 1. Otherwise, exit successfully.

`-m'
`--merge'
Merge the given files by sorting them as a group. Each input file must already be individually sorted. Sorting instead of merging always works; merging is provided because it is faster when the inputs are already sorted.
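The two mode options above can be tried out as follows. This is a minimal sketch: the scratch directory and the file names (`a.txt', `b.txt', `unsorted.txt') are made up for the illustration, and LC_ALL=C is set only to make the ordering predictable.

```shell
# Scratch directory and file names are hypothetical, chosen for this example.
dir=$(mktemp -d)
printf 'apple\ncherry\n' > "$dir/a.txt"
printf 'banana\ndate\n'  > "$dir/b.txt"
printf 'pear\nfig\n'     > "$dir/unsorted.txt"

# -c: exit status 0 means the file is already sorted, 1 means it is not.
sorted_status=0
LC_ALL=C sort -c "$dir/a.txt" 2>/dev/null || sorted_status=$?
unsorted_status=0
LC_ALL=C sort -c "$dir/unsorted.txt" 2>/dev/null || unsorted_status=$?

# -m: merge two files that are each already sorted.
merged=$(LC_ALL=C sort -m "$dir/a.txt" "$dir/b.txt")
printf '%s\n' "$merged"

rm -r "$dir"
```

The `|| status=$?' form is used so that the nonzero exit from `-c' does not abort a script running under `set -e'.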

A pair of lines is compared as follows: if any key fields have been specified, sort compares each pair of fields, in the order specified on the command line, according to the associated ordering options, until a difference is found or no fields are left. Unless otherwise specified, all comparisons use the character collating sequence specified by the LC_COLLATE locale.

If any of the global options `bdfgiMnr' are given but no key fields are specified, sort compares the entire lines according to the global options.

Finally, as a last resort when all keys compare equal (or if no ordering options were specified at all), sort compares the entire lines. The last resort comparison honors the `--reverse' (`-r') global option. The `--stable' (`-s') option disables this last-resort comparison so that lines in which all fields compare equal are left in their original relative order. If no fields or global options are specified, `--stable' (`-s') has no effect.
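The effect of the last-resort comparison and of `--stable' can be seen with a small made-up input in which two lines tie on the key. This is a sketch under the assumption of the C locale, set here only for reproducibility.

```shell
# Three lines; the last two tie on the numeric second field.
input=$(printf 'c 0\nb 1\na 1\n')

# Default: a tie on the key falls back to comparing whole lines,
# so "a 1" is placed ahead of "b 1".
with_last_resort=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2,2n)

# --stable: tied lines keep their input order, so "b 1" stays first.
stable=$(printf '%s\n' "$input" | LC_ALL=C sort -s -k 2,2n)

printf 'default:\n%s\nstable:\n%s\n' "$with_last_resort" "$stable"
```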

GNU sort (as specified for all GNU utilities) has no limits on input line length or restrictions on bytes allowed within lines. In addition, if the final byte of an input file is not a newline, GNU sort silently supplies one. A line's trailing newline is not part of the line for comparison purposes.

Upon any error, sort exits with a status of `2'.

If the environment variable TMPDIR is set, sort uses its value as the directory for temporary files instead of `/tmp'. The `--temporary-directory' (`-T') option in turn overrides the environment variable.

The following options affect the ordering of output lines. They may be specified globally or as part of a specific key field. If no key fields are specified, global options apply to comparison of entire lines; otherwise the global options are inherited by key fields that do not specify any special options of their own. In pre-POSIX versions of sort, global options affect only later key fields, so portable shell scripts should specify global options first.

`-b'
`--ignore-leading-blanks'
Ignore leading blanks when finding sort keys in each line. The LC_CTYPE locale determines character types.

`-d'
`--dictionary-order'
Sort in phone directory order: ignore all characters except letters, digits and blanks when sorting. The LC_CTYPE locale determines character types.

`-f'
`--ignore-case'
Fold lowercase characters into the equivalent uppercase characters when comparing so that, for example, `b' and `B' sort as equal. The LC_CTYPE locale determines character types.

`-g'
`--general-numeric-sort'
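Sort numerically, using the standard C function strtod to convert a prefix of each line to a double-precision floating point number. This allows floating point numbers to be specified in scientific notation, like 1.0e-34 and 10e100. The LC_NUMERIC locale determines the decimal-point character. Do not report overflow, underflow, or conversion errors. Use the following collating sequence: lines that do not start with numbers (all considered to be equal); NaNs ("Not a Number" values, in IEEE floating point arithmetic) in a consistent but machine-dependent order; minus infinity; finite numbers in ascending numeric order (with -0 and +0 equal); plus infinity.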

Use this option only if there is no alternative; it is much slower than `--numeric-sort' (`-n') and it can lose information when converting to floating point.

`-i'
`--ignore-nonprinting'
Ignore nonprinting characters. The LC_CTYPE locale determines character types.

`-M'
`--month-sort'
An initial string, consisting of any amount of whitespace, followed by a month name abbreviation, is folded to UPPER case and compared in the order `JAN' < `FEB' < ... < `DEC'. Invalid names compare low to valid names. The LC_TIME locale category determines the month spellings.

`-n'
`--numeric-sort'
Sort numerically: the number begins each line; specifically, it consists of optional whitespace, an optional `-' sign, and zero or more digits possibly separated by thousands separators, optionally followed by a decimal-point character and zero or more digits. The LC_NUMERIC locale specifies the decimal-point character and thousands separator.

Numeric sort uses what might be considered an unconventional method to compare strings representing floating point numbers. Rather than first converting each string to the C double type and then comparing those values, sort aligns the decimal-point characters in the two strings and compares the strings a character at a time. One benefit of using this approach is its speed. In practice this is much more efficient than performing the two corresponding string-to-double (or even string-to-integer) conversions and then comparing doubles. In addition, there is no corresponding loss of precision. Converting each string to double before comparison would limit precision to about 16 digits on most systems.

Neither a leading `+' nor exponential notation is recognized. To compare such strings numerically, use the `--general-numeric-sort' (`-g') option.

`-r'
`--reverse'
Reverse the result of comparison, so that lines with greater key values appear earlier in the output instead of later.
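The difference between `-n' and `-g', and the effect of `-r', can be sketched with a few made-up values. The input is chosen so that no two lines tie, and the C locale is assumed so that the decimal-point and thousands-separator rules do not interfere.

```shell
# Made-up input mixing plain digits, a leading '+', and exponent notation.
numbers=$(printf '1e3\n2\n+5\n10\n')

# -n stops at the first character that is not part of a plain number:
# "1e3" is read as 1 and "+5" as having no digits at all (value 0).
by_n=$(printf '%s\n' "$numbers" | LC_ALL=C sort -n)

# -g converts with strtod, so "1e3" is 1000 and "+5" is 5.
by_g=$(printf '%s\n' "$numbers" | LC_ALL=C sort -g)

# -r reverses the result of the numeric comparison.
by_nr=$(printf '%s\n' "$numbers" | LC_ALL=C sort -nr)

printf -- '-n:\n%s\n-g:\n%s\n-nr:\n%s\n' "$by_n" "$by_g" "$by_nr"
```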

Other options are:

`-o output-file'
`--output=output-file'
Write output to output-file instead of standard output. If necessary, sort reads input before opening output-file, so you can safely sort a file in place by using commands like sort -o F F and cat F | sort -o F.

On newer systems, `-o' cannot appear after an input file if POSIXLY_CORRECT is set, e.g., `sort F -o F'. Portable scripts should specify `-o output-file' before any input files.
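As a minimal sketch of sorting a file in place, with `-o' placed before the input file as recommended above (the scratch directory is an assumption of the example; the file is named F to match the text):

```shell
# Hypothetical scratch file standing in for F.
dir=$(mktemp -d)
f="$dir/F"
printf 'pear\napple\nfig\n' > "$f"

# sort reads all of F before opening it for output, so this is safe.
# A shell redirection such as "sort F > F" would instead truncate F
# before sort gets to read it.
LC_ALL=C sort -o "$f" "$f"

in_place=$(cat "$f")
printf '%s\n' "$in_place"
rm -r "$dir"
```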

`-S size'
`--buffer-size=size'
Use a main-memory sort buffer of the given size. By default, size is in units of 1024 bytes. Appending `%' causes size to be interpreted as a percentage of physical memory. Appending `K' multiplies size by 1024 (the default), `M' by 1,048,576, `G' by 1,073,741,824, and so on for `T', `P', `E', `Z', and `Y'. Appending `b' causes size to be interpreted as a byte count, with no multiplication.

This option can improve the performance of sort by causing it to start with a larger or smaller sort buffer than the default. However, this option affects only the initial buffer size. The buffer grows beyond size if sort encounters input lines larger than size.

`-t separator'
`--field-separator=separator'
Use character separator as the field separator when finding the sort keys in each line. By default, fields are separated by the empty string between a non-whitespace character and a whitespace character. That is, given the input line ` foo bar', sort breaks it into fields ` foo' and ` bar'. The field separator is not considered to be part of either the field preceding or the field following. But note that sort fields that extend to the end of the line, as `-k 2', or sort fields consisting of a range, as `-k 2,3', retain the field separators present between the endpoints of the range.
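The following sketch illustrates both halves of this description with made-up data: an explicit separator given with `-t', and the default splitting, in which a field keeps its leading blanks unless `-b' is given. The C locale is assumed so that a blank compares lower than any letter.

```shell
# Explicit separator: sort colon-separated records on the numeric second field.
records=$(printf 'carol:30\nalice:25\nbob:100\n')
by_second=$(printf '%s\n' "$records" | LC_ALL=C sort -t : -k 2,2n)

# Default separator: field 2 of "a  z" is "  z" (two leading blanks) and
# field 2 of "b y" is " y", so the blanks decide the comparison.
spaced=$(printf 'b y\na  z\n')
with_blanks=$(printf '%s\n' "$spaced" | LC_ALL=C sort -k 2,2)

# With -b the leading blanks are ignored and "y" sorts before "z".
without_blanks=$(printf '%s\n' "$spaced" | LC_ALL=C sort -b -k 2,2)

printf '%s\n---\n%s\n---\n%s\n' "$by_second" "$with_blanks" "$without_blanks"
```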

`-T tempdir'
`--temporary-directory=tempdir'
Use directory tempdir to store temporary files, overriding the TMPDIR environment variable. If this option is given more than once, temporary files are stored in all the directories given. If you have a large sort or merge that is I/O-bound, you can often improve performance by using this option to specify directories on different disks and controllers.
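A sketch of how `-S' and `-T' might be passed together. The 1M buffer size and the scratch directory are arbitrary choices for the illustration; an input this small never spills to temporary files, so the options only show the syntax here.

```shell
# Hypothetical scratch directory to hold any temporary files.
scratch=$(mktemp -d)

# Start with a 1 MiB sort buffer and place temporary files in $scratch.
out=$(printf '3\n1\n2\n' | LC_ALL=C sort -n -S 1M -T "$scratch")

printf '%s\n' "$out"
rm -r "$scratch"
```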

`-u'
`--unique'
Normally, output only the first of a sequence of lines that compare equal. For the `--check' (`-c') option, check that no pair of consecutive lines compares equal.
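Both uses of `-u' can be sketched with made-up input. The first command combines it with `-f' so that differently cased spellings compare equal; the second combines it with `-c'.

```shell
# Three spellings of the same word compare equal under -f,
# so -u keeps only one of them.
words=$(printf 'Apple\nbanana\napple\nAPPLE\n')
unique_folded=$(printf '%s\n' "$words" | LC_ALL=C sort -f -u)

# -c -u: the input is in order, but two consecutive lines compare equal,
# so the check fails with status 1.
dup_status=0
printf 'a\na\nb\n' | LC_ALL=C sort -c -u 2>/dev/null || dup_status=$?

# For comparison, -c alone accepts the same input.
plain_status=0
printf 'a\na\nb\n' | LC_ALL=C sort -c 2>/dev/null || plain_status=$?

printf '%s\n' "$unique_folded"
```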

`-k pos1[,pos2]'
`--key=pos1[,pos2]'
Specify a sort field that consists of the part of the line between pos1 and pos2 (or the end of the line, if pos2 is omitted), inclusive. Fields and character positions are numbered starting with 1. So to sort on the second field, you'd use `--key=2,2' (`-k 2,2'). See below for more examples.

`-z'
`--zero-terminated'
Treat the input as a set of lines, each terminated by a zero byte (ASCII NUL (Null) character) instead of an ASCII LF (Line Feed). This option can be useful in conjunction with `perl -0' or `find -print0' and `xargs -0', which do the same in order to reliably handle arbitrary pathnames (even those which contain Line Feed characters).
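A sketch of the `find -print0' pipeline mentioned above. The scratch directory and its file names are made up, and basename is used only to make the result easy to read.

```shell
# Hypothetical directory with three files created out of order.
dir=$(mktemp -d)
touch "$dir/b.txt" "$dir/a.txt" "$dir/c.txt"

# find emits NUL-terminated paths, sort -z keeps NUL as the line
# terminator, and xargs -0 hands each path to basename intact.
listing=$(find "$dir" -type f -print0 | LC_ALL=C sort -z | xargs -0 -n 1 basename)

printf '%s\n' "$listing"
rm -r "$dir"
```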

Historical (BSD and System V) implementations of sort have differed in their interpretation of some options, particularly `-b', `-f', and `-n'. GNU sort follows the POSIX behavior, which is usually (but not always!) like the System V behavior. According to POSIX, `-n' no longer implies `-b'. For consistency, `-M' has been changed in the same way. This may affect the meaning of character positions in field specifications in obscure cases. The only fix is to add an explicit `-b'.

A position in a sort field specified with the `-k' option has the form `f.c', where f is the number of the field to use and c is the number of the first character from the beginning of the field. In a start position, an omitted `.c' stands for the field's first character. In an end position, an omitted or zero `.c' stands for the field's last character. If the `-b' option was specified, the `.c' part of a field specification is counted from the first nonblank character of the field.

A sort key position may also have any of the option letters `Mbdfinr' appended to it, in which case the global ordering options are not used for that particular field. The `-b' option may be independently attached to either or both of the start and end positions of a field specification, and if it is inherited from the global options it will be attached to both. Keys may span multiple fields.
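The `f.c' form and the per-key option letters can be sketched as follows, using made-up rows whose first field carries a digit in its second character. The C locale is assumed for reproducibility.

```shell
# Field 1 is a letter, a digit, and a letter; field 2 is a word.
rows=$(printf 'x9a beta\nx1c alpha\nx5b gamma\n')

# Key: character 2 of field 1 only, compared numerically.
by_digit=$(printf '%s\n' "$rows" | LC_ALL=C sort -k 1.2,1.2n)

# The key carries its own option letter (n), so the global -r is not
# applied to it and the digits still come out in ascending order.
key_overrides_global=$(printf '%s\n' "$rows" | LC_ALL=C sort -r -k 1.2,1.2n)

# With no option letters on the key, the global -r is inherited.
key_inherits_global=$(printf '%s\n' "$rows" | LC_ALL=C sort -r -k 1.2,1.2)

printf '%s\n---\n%s\n---\n%s\n' "$by_digit" "$key_overrides_global" "$key_inherits_global"
```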

On older systems, sort supports an obsolete origin-zero syntax `+pos1 [-pos2]' for specifying sort keys. POSIX 1003.1-2001 (see section 2.5 Standards conformance) does not allow this; use `-k' instead.

Here are some examples to illustrate various combinations of options.
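One such combination, as a sketch with made-up colon-separated records (name, age, team): sort by the second field numerically in descending order, then break ties on the first field in ascending order.

```shell
# Hypothetical records in the form name:age:team.
people=$(printf 'dave:42:ops\nalice:30:dev\ncarol:30:ops\nbob:42:dev\n')

# -t : sets the separator; the first -k is the primary key (numeric,
# reversed) and the second -k is consulted only when the first one ties.
ranked=$(printf '%s\n' "$people" | LC_ALL=C sort -t : -k 2,2nr -k 1,1)

printf '%s\n' "$ranked"
```

Giving each key an explicit end position (`2,2' rather than `2') keeps the comparison from running on into the following fields.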



Copyright 2003 by The Free Software Foundation. Updated Jun 2003.