Date: Sun, 10 Mar 1996 08:31:03 +0200 (IST) From: Eli Zaretskii To: djgpp-workers AT delorie DOT com Subject: Library docs add-ons Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII I've added docs for a few library functions. Please review the additions to the docs of `signal' to see whether I got it right. The docs of regex functions are just texinfo-ized man pages. Please see my other message which asks a few questions I have after going through the sources. *** ansi/stdio/fopen.t~0 Mon Jul 10 09:39:42 1995 --- ansi/stdio/fopen.txh Sat Mar 9 15:27:44 1996 *************** *** 37,42 **** --- 37,50 ---- Force the file to be open in binary mode instead of the default mode. + When called to open the console in binary mode, @code{fopen} will + disable the generation of @code{SIGINT} when you press @kbd{Ctrl-C} + (@kbd{Ctrl-Break} will still cause @code{SIGINT}), because many programs + that use binary reads from the console will also want to get the + @samp{^C} characters. You can use the @code{__djgpp_set_ctrl_c} library + function (@pxref{__djgpp_set_ctrl_c}) if you want @kbd{Ctrl-C} to + generate interrupts while console is read in binary mode. + @item t Force the file to be open in text mode instead of the default mode. *** dos/io/setmode.t~0 Tue Jul 25 12:16:18 1995 --- dos/io/setmode.txh Sat Mar 9 20:36:42 1996 *************** *** 15,20 **** --- 15,29 ---- into either cooked or raw mode accordingly, and set any @code{FILE*} objects that use this file into text or binary mode. + When called to put @var{file} that refers to the console into binary + mode, @code{setmode} will disable the generation of @code{SIGINT} when + you press @kbd{Ctrl-C} (@kbd{Ctrl-Break} will still cause + @code{SIGINT}), because many programs that use binary reads from the + console will also want to get the @samp{^C} characters. 
You can use the + @code{__djgpp_set_ctrl_c} library function (@pxref{__djgpp_set_ctrl_c}) + if you want @kbd{Ctrl-C} to generate interrupts while console is read in + binary mode. + Note that, for buffered streams (@code{FILE*}), you must call @code{fflush} (@pxref{fflush}) before @code{setmode}, or call @code{setmode} before writing anything to the file, for proper *** go32/dpmiexcp.t~0 Mon Jul 10 09:40:42 1995 --- go32/dpmiexcp.txh Sat Mar 9 20:45:02 1996 *************** *** 5,43 **** @example #include <signal.h> ! int raise(int _sig); @end example @subheading Description ! This function raises the given signal (see @code{<signal.h>} for a ! list). @xref{signal}. @subheading Return Value ! 0 on success. @c ---------------------------------------------------------------------- @node signal, signal @subheading Syntax @example #include <signal.h> ! void (*signal(int _sig, void (*_func)(int)))(int); @end example @subheading Description ! This function registers signal handlers. Signal numbers are 0..255 ! for software interrupts, 256..287 for exceptions (exception number ! plus 256) or as specified in @code{<signal.h>}. ! ! You may pass SIG_DFL to reset the default handling, SIG_ERR to force ! an error when that signal happens, or SIG_IGN to ignore that signal. ! Signal handlers are regular C functions, and may call any function ! that the ANSI/POSIX specs say are valid for signal handlers. Signal ! handlers for hardware interrupts need special handling. @subheading Return Value ! The previous handler. --- 5,295 ---- @example #include <signal.h> ! int raise(int sig); @end example @subheading Description ! This function raises the given signal @var{sig}. ! @xref{the list of possible signals, signal}. @subheading Return Value ! 0 on success, -1 for illegal value of @var{sig}. ! @c ---------------------------------------------------------------------- + @node signal, signal @subheading Syntax @example #include <signal.h> ! void (*signal(int sig, void (*func)(int)))(int); @end example @subheading Description ! 
Signals are generated in response to some exceptional behavior of the ! program, such as division by 0. A signal can also report some ! asynchronous event outside the program, such as someone pressing the ! Ctrl-Break key combination. ! ! Signals are numbered 0..255 for software interrupts and 256..287 for ! exceptions (exception number plus 256); other implementation-specific ! codes are specified in @code{<signal.h>} (see below). Every signal is ! given a mnemonic which you should use for portable programs. ! ! The default handling for all the signals is to print a traceback (a ! stack dump which describes the sequence of function calls leading to the ! generation of the signal) and abort the program. ! ! This function allows you to change the default behavior for a specific ! signal. It registers @var{func} as a signal handler for signal number ! @var{sig}. After you register your function as the handler for a ! particular signal, it will be called when that signal occurs. The ! execution of the program will be suspended until the handler returns or ! calls @code{longjmp} (@pxref{longjmp}). ! ! You may pass SIG_DFL as the value of @var{func} to reset the signal ! handling for the signal @var{sig} to default (also ! @xref{__djgpp_exception_toggle}, for a quick way to restore all the ! signals' handling to default), SIG_ERR to force an error when that ! signal happens, or SIG_IGN to ignore that signal. Signal handlers that ! you write are regular C functions, and may call any function that the ! ANSI/POSIX specs say is valid for signal handlers. For maximum ! portability, a handler for hardware interrupts and processor exceptions ! should only make calls to @code{signal}, assign values to data objects ! of type @code{volatile sig_atomic_t} (defined as @code{int} in ! @code{<signal.h>}) and return. Handlers for hardware interrupts must ! also be locked in memory (so that the virtual memory ! mechanism won't swap them out), @xref{locking memory regions, ! 
__dpmi_lock_linear_region}. Handlers for software interrupts can also ! terminate by calling @code{abort}, @code{exit} or @code{longjmp}. ! ! The following signals are defined on @code{<signal.h>}: ! ! @table @code{} ! ! @item SIGABRT ! ! The Abort signal. Currently only used by the @code{assert} macro to ! terminate the program when an assertion fails (@pxref{assert}). ! ! @item SIGFPE ! ! The Floating Point Error signal. Generated in case of divide by zero ! exception (Int 00h), overflow exception (Int 04h), and any x87 ! co-processor exception, either generated by the CPU (Int 10h), or by the ! co-processor itself (Int 75h). ! ! @item SIGILL ! ! The Invalid Execution signal. Currently only generated for ! unknown/invalid exceptions. ! ! @item SIGINT ! ! The Interrupt signal. Generated when a @kbd{Ctrl-C} or @kbd{Ctrl-Break} ! (Int 1Bh) key is hit. Note that when you open the console in binary ! mode, or switch it to binary mode by a call to @code{setmode} ! (@pxref{setmode}), generation of @code{SIGINT} as result of @kbd{Ctrl-C} ! key is disabled. This is so for programs (such as Emacs) which want to ! be able to read the @samp{^C} character as any other character. Use the ! library function @code{__djgpp_set_ctrl_c} to restore @code{SIGINT} ! generation when @kbd{Ctrl-C} is hit, if you need this. ! @xref{__djgpp_set_ctrl_c}, for details on how this should be done. ! @kbd{Ctrl-Break} always generates @code{SIGINT}. ! ! DJGPP hooks the keyboard hardware interrupt (Int 09h) to be able to ! generate @code{SIGINT} in response to @kbd{Ctrl-C} key; you should be ! aware of this when you install a handler for the keyboard interrupt. ! ! @item SIGSEGV ! ! The invalid storage access (Segmentation Violation) signal. Generated ! in response to any of the following exceptions: Bound range exceeded in ! BOUND instruction (Int 05h), Double Exception or an exception in the ! exception handler (Int 08h), Segment Boundary violation by co-processor ! 
(Int 09h), Segment Not Present (Int 0Bh), Stack Fault (Int 0Ch), General ! Protection Violation (Int 0Dh), or Page Fault (Int 0Eh). Note that Int ! 09h is only generated on 80386 processor; i486 and later CPUs cause Int ! 0Dh when the co-processor accesses memory out of bounds. ! ! @item SIGTERM ! ! The Termination Request signal. Currently unused. ! ! ! The signals below this are not defined by ANSI C, and cannot be used ! when compiling under @samp{-ansi} option to @samp{gcc}. ! ! ! @item SIGALRM ! ! The Alarm signal. Generated after certain time period has passed after ! a call to @code{alarm} library function (@pxref{alarm}). ! ! @item SIGHUP ! ! The Hang-up signal. Currently unused. ! ! @item SIGKILL ! ! The Kill signal. Currently unused. ! ! @item SIGPIPE ! ! The Broken Pipe signal. Currently unused. ! ! @item SIGQUIT ! ! The Quit signal. Currently unused. ! ! @item SIGUSR1 ! ! User-defined signal no. 1. ! ! @item SIGUSR2 ! ! User-defined signal no. 2. ! ! ! The signals below are not defined by ANSI C and POSIX, and cannot be ! used when compiling under either @samp{-ansi} or @samp{-posix} options ! to @samp{gcc}. ! ! ! @item SIGTRAP ! ! The Trap Instruction signal. Generated in response to the Debugger ! Exception (Int 01h) or Breakpoint Exception (Int 03h). ! ! @item SIGNOFP ! ! The No Co-processor signal. Generated if a co-processor (floating-point) ! instruction is encountered when no co-processor is installed (Int 07h). ! ! @item SIGTIMR ! ! The Timer signal. Used by the @code{itimer} and @code{alarm} functions ! (@xref{itimer}, @xref{alarm}). ! ! @item SIGPROF ! ! The Profiler signal. Used by the execution profile gathering code in a ! program compiled with @samp{-pg} option to @samp{gcc}. ! ! @end table ! @subheading Return Value ! The previous handler for signal @var{sig}, or @code{SIG_ERR} if the ! value of @var{sig} is outside legal limits. ! ! @subheading Signal Mechanism Implementation Notes ! ! 
Due to subtle aspects of the operation of protected-mode programs under MS-DOS, ! signal handlers cannot be safely called from hardware interrupt ! handlers. Therefore, the DJGPP exception-handling mechanism arranges for ! the signal handler to be called on the first occasion that the program ! is in protected mode and touches any of its data. This means that if ! the exception occurs while the processor is in real mode, like when your ! program calls some DOS service, the signal handler won't be called until ! that call returns. For instance, if you call @code{read} (or ! @code{scanf}, or @code{gets}) to read text from the console and press ! @kbd{Ctrl-C}, you will have to press @kbd{Enter} to terminate the ! @code{read} call to cause the signal handler for @code{SIGINT} to be ! called. Another significant implication of this implementation is that ! when the program isn't touching any of its data (like in very tight ! loops which only use values in the registers), it cannot be interrupted. ! ! @c------------------------------------------------------------------------- ! ! @node __djgpp_set_ctrl_c, signal ! @subheading Syntax ! ! @example ! #include <sys/exceptn.h> ! ! int __djgpp_set_ctrl_c(int enable); ! @end example ! ! @subheading Description ! ! This function sets and resets the bit which controls whether ! @code{SIGINT} (@pxref{SIGINT, signal}) will be raised when you press ! @kbd{Ctrl-C}. By default @kbd{Ctrl-C} generates an interrupt signal ! which, if uncaught by a signal handler, will abort your program. ! However, when you call the @code{setmode} library function to switch the ! console reads to binary mode, or open the console in binary mode for ! reading, this generation of interrupt signal is turned off, because some ! programs want to get the @samp{^C} characters as any other character and ! handle them by themselves. ! ! @code{__djgpp_set_ctrl_c} lets you explicitly determine the effect of ! @kbd{Ctrl-C}. When called with a non-zero value of @var{enable}, it ! 
arranges for @kbd{Ctrl-C} to generate an interrupt; if you call it with ! a zero in @var{enable}, @kbd{Ctrl-C} is treated as a normal character. ! ! Note that the effect of @kbd{Ctrl-Break} key is unaffected by this ! function; use the @code{_go32_want_ctrl_break} library function to ! control it. ! ! Also note that in DJGPP, the effect of the interrupt signal will only be ! seen when the program is in protected mode (@xref{Signal Mechanism, ! signal}, for more details). Thus, if you press @kbd{Ctrl-C} while your ! program calls DOS (e.g., when reading from the console), the ! @code{SIGINT} signal handler will only be called after that call ! returns. ! ! @subheading Return Value ! ! The previous state of the @kbd{Ctrl-C} effect: 0 if the generation of ! @code{SIGINT} by @kbd{Ctrl-C} was disabled, 1 if it was enabled. ! ! @subheading Example ! ! @example ! ! setmode(fileno(stdin), O_BINARY); ! if (isatty(fileno(stdin))) ! __djgpp_set_ctrl_c(1); ! ! @end example ! ! @c------------------------------------------------------------------------- ! ! @node __djgpp_exception_toggle, signal ! @subheading Syntax ! ! @example ! ! #include <sys/exceptn.h> ! ! void __djgpp_exception_toggle(void); ! ! @end example ! ! ! @subheading Description ! ! This function is automatically called when the program exits, to restore ! handling of all the exceptions to their normal state. You may also call ! it from your program, around the code fragments where you need to ! temporarily restore @strong{all} the exceptions to their default ! handling. One example of such a case might be a call to library ! functions that spawn child programs, when you don't want to handle ! signals generated while the child runs (by default, those signals are ! also passed to the parent). ! ! @subheading Example ! ! @example ! ! __djgpp_exception_toggle(); ! system("myprog"); ! __djgpp_exception_toggle(); ! ! 
@end example *** posix/fcntl/open.t~0 Mon Jul 10 09:40:44 1995 --- posix/fcntl/open.txh Sat Mar 9 15:25:54 1996 *************** *** 54,59 **** --- 54,67 ---- The file is opened in binary mode. + When called to open the console in binary mode, @code{open} will disable + the generation of @code{SIGINT} when you press @kbd{Ctrl-C} + (@kbd{Ctrl-Break} will still cause @code{SIGINT}), because many programs + that use binary reads from the console will also want to get the + @samp{^C} characters. You can use the @code{__djgpp_set_ctrl_c} library + function (@pxref{__djgpp_set_ctrl_c}) if you want @kbd{Ctrl-C} to + generate interrupts while console is read in binary mode. + @end table If the file is created by this call, it will be given the read/write *** posix/glob/glob.t~0 Sun Jul 23 10:48:52 1995 --- posix/glob/glob.txh Sat Mar 9 15:09:30 1996 *************** *** 4,47 **** @example #include <glob.h> ! int glob(const char *_pattern, int _flags, ! int (*_errfunc)(const char *_epath, int _eerrno), glob_t *_pglob); @end example @subheading Description ! This function performs command-line wildcard expansion. The pattern ! to be expanded is passed as @var{pattern}, and a pointer to a ! structure is passed via @var{_pglob}. This structure is like this: ! @example ! typedef struct @{ ! size_t gl_pathc; ! char **gl_pathv; ! size_t gl_offs; ! @} glob_t; ! @end example ! The @code{gl_pathc} and @code{gl_pathv} fields define a list of ! matches. The @code{gl_offs} field indicates that the list should be ! offset. ! The structure is filled in with information about the files that ! matched the wildcard. Values for @var{_flags} are as follows: @table @code @item GLOB_APPEND ! Append matches to a pre-existing structure. @item GLOB_DOOFFS ! Skip _pglob->gl_offs entries in gl_pathv. @item GLOB_ERR ! Stop when an unreadable directory is encountered. @item GLOB_MARK --- 4,80 ---- @example #include <glob.h> ! int glob(const char *pattern, int flags, ! 
int (*errfunc)(const char *epath, int eerrno), glob_t *pglob); @end example @subheading Description ! This function expands a filename wildcard which is passed as ! @var{pattern}. The pattern may include these special characters: ! @table @code ! ! @item * ! ! Matches zero or more characters. ! ! @item ? ! ! Matches exactly one character (any character). ! ! @item [...] ! Matches one character from a group of characters. If the first ! character is @code{!}, matches any character @emph{not} in the group. A ! group is defined as a list of characters between the brackets, ! e.g. @code{[dkl_]}, or by two characters separated by @code{-} to ! indicate all characters between and including these two. For example, ! @code{[a-d]} matches @code{a}, @code{b}, @code{c}, or @code{d}, and ! @code{[!a-zA-Z0-9]} matches any character that is not alphanumeric. ! @item ... ! ! Matches all the subdirectories, recursively (VMS aficionados, ! rejoice!). ! ! @item \ ! ! Causes the next character to not be treated as special. For example, ! @code{\[} matches a literal @samp{[}. If @var{flags} includes ! @code{GLOB_NOESCAPE}, this quoting is disabled and @samp{\} is handled ! as a simple character. ! ! @end table ! ! The variable @var{flags} controls certain options of the expansion ! process. Possible values for @var{flags} are as follows: @table @code @item GLOB_APPEND ! Append the matches to those already present in the array ! @code{pglob->gl_pathv}. By default, @code{glob} discards all previous ! contents of @code{pglob->gl_pathv} and allocates a new memory block for ! it. If you use @code{GLOB_APPEND}, @code{pglob} should point to a ! structure returned by a previous call to @code{glob}. @item GLOB_DOOFFS ! Skip @code{pglob->gl_offs} entries in @code{gl_pathv} and put new ! matches after that point. By default, @code{glob} puts the new matches ! beginning at @code{pglob->gl_pathv[0]}. You can use this flag both with ! 
@code{GLOB_APPEND} (in which case the new matches will be put after the ! first @code{pglob->gl_offs} matches from the previous call to @code{glob}), ! or without it (in which case the first @code{pglob->gl_offs} entries in ! @code{pglob->gl_pathv} will be filled by @code{NULL} pointers). @item GLOB_ERR ! Stop when an unreadable directory is encountered and call user-defined ! function @var{errfunc}. This cannot happen under DOS (and thus ! @var{errfunc} is never used). @item GLOB_MARK *************** *** 50,66 **** @item GLOB_NOCHECK If no matches are found, return the pattern itself as the only match. @item GLOB_NOESCAPE ! Disable blackslash as an escape character. @item GLOB_NOSORT ! Do not sort the returned list. @end table @subheading Return Value ! Zero on success. --- 83,194 ---- @item GLOB_NOCHECK If no matches are found, return the pattern itself as the only match. + By default, @code{glob} doesn't change @code{pglob} if no matches are + found. @item GLOB_NOESCAPE ! Disable backslash as an escape character. By default, backslash quotes ! special meta-characters in wildcards described above. @item GLOB_NOSORT ! Do not sort the returned list. By default, the list is sorted ! alphabetically. This flag causes the files to be returned in the order ! they were found in the directory. @end table + Given the pattern and the flags, @code{glob} expands the pattern and + returns a list of files that match the pattern in the structure + pointed to by @var{pglob}. This structure is like this: + + @example + typedef struct @{ + size_t gl_pathc; + char **gl_pathv; + size_t gl_offs; + @} glob_t; + @end example + + In the structure, the @code{gl_pathc} field holds the number of + filenames in @code{gl_pathv} list; this includes the filenames produced + by this call, plus any previous filenames if @code{GLOB_APPEND} or + @code{GLOB_DOOFFS} were set in @var{flags}. 
The list of matches is + returned as an array of pointers to the filenames; @code{gl_pathv} holds + the address of the array. Thus, the filenames which match the pattern + can be accessed as @code{gl_pathv[0]}, @code{gl_pathv[1]}, etc. If + @code{GLOB_DOOFFS} was set in @var{flags}, the new matches begin at + offset given by @code{gl_offs}. + @subheading Return Value ! Zero on success, or one of these codes: ! ! @table @code{} ! ! @item GLOB_ABORTED ! ! Not used in DJGPP implementation. ! ! @item GLOB_NOMATCH ! ! No files matched the given pattern. ! ! @item GLOB_NOSPACE ! ! @item GLOB_ERR ! ! Not enough memory to accommodate expanded filenames. ! ! @end table ! ! @subheading Notes ! ! @code{glob} will not match names of volume labels. ! ! Filenames are matched case-insensitively. The list of expanded ! filenames will be returned in lower case, if all the characters of the ! pattern (except those between brackets [...]) are lower-case; if they ! are upper-case, the expanded filenames will be also in upper case. ! ! @subheading Example ! ! @example ! ! #include <stdlib.h> ! #include <string.h> ! #include <glob.h> ! ! /* Convert a wildcard pattern into a list of blank-separated ! filenames which match the wildcard. */ ! ! char * glob_pattern(char *wildcard) ! @{ ! char *gfilename; ! size_t cnt, length = 0; /* length must start at zero: it is summed below */ ! glob_t glob_results; ! char **p; ! ! glob(wildcard, GLOB_NOCHECK, 0, &glob_results); ! ! /* How much space do we need? */ ! for (p = glob_results.gl_pathv, cnt = glob_results.gl_pathc; ! cnt; p++, cnt--) ! length += strlen(*p) + 1; ! ! /* Allocate the space and generate the list. */ ! gfilename = (char *) calloc(length, sizeof(char)); ! for (p = glob_results.gl_pathv, cnt = glob_results.gl_pathc; ! cnt; p++, cnt--) ! @{ ! strcat(gfilename, *p); ! if (cnt > 1) ! strcat(gfilename, " "); ! @} ! ! globfree(&glob_results); ! return gfilename; ! @} ! ! 
@end example *** posix/regex/regex.t~0 Sat Mar 9 20:57:00 1996 --- posix/regex/regex.txh Sat Mar 9 11:05:00 1996 *************** *** 0 **** --- 1,551 ---- + @node regcomp, string + @subheading Syntax + + @example + #include <sys/types.h> + #include <regex.h> + + int regcomp(regex_t *preg, const char *pattern, int cflags); + @end example + + @subheading Description + + This function is part of the implementation of POSIX 1003.2 regular + expressions (@dfn{RE}s). + + @code{regcomp} compiles the regular expression contained in the + @var{pattern} string, subject to the flags in @var{cflags}, and places + the results in the @code{regex_t} structure pointed to by @var{preg}. + (The regular expression syntax, as defined by POSIX 1003.2, is described + below.) + + The parameter @var{cflags} is the bitwise OR of zero or more of the + following flags: + + @table @code{} + + @item REG_EXTENDED + + Compile modern (@dfn{extended}) REs, rather than the obsolete + (@dfn{basic}) REs that are the default. + + @item REG_BASIC + + This is a synonym for 0, provided as a counterpart to + @code{REG_EXTENDED} to improve readability. + + @item REG_NOSPEC + + Compile with recognition of all special characters turned off. All + characters are thus considered ordinary, so the RE in @var{pattern} is a + literal string. This is an extension, compatible with but not specified + by POSIX 1003.2, and should be used with caution in software intended to + be portable to other systems. @code{REG_EXTENDED} and @code{REG_NOSPEC} + may not be used in the same call to @code{regcomp}. + + @item REG_ICASE + + Compile for matching that ignores upper/lower case distinctions. See + the description of regular expressions below for details of + case-independent matching. + + @item REG_NOSUB + + Compile for matching that need only report success or failure, not what + was matched. + + @item REG_NEWLINE + + Compile for newline-sensitive matching. 
By default, newline is a + completely ordinary character with no special meaning in either REs or + strings. With this flag, @samp{[^} bracket expressions and @samp{.} + never match newline, a @samp{^} anchor matches the null string after any + newline in the string in addition to its normal function, and the + @samp{$} anchor matches the null string before any newline in the string + in addition to its normal function. + + @item REG_PEND + + The regular expression ends, not at the first NUL, but just before the + character pointed to by the @code{re_endp} member of the structure + pointed to by @var{preg}. The @code{re_endp} member is of type + @samp{const char *}. This flag permits inclusion of NULs in the RE; + they are considered ordinary characters. This is an extension, + compatible with but not specified by POSIX 1003.2, and should be used + with caution in software intended to be portable to other systems. + + @end table + + When successful, @code{regcomp} returns 0 and fills in the structure + pointed to by @var{preg}. One member of that structure (other than + @code{re_endp}) is publicized: @code{re_nsub}, of type @code{size_t}, + contains the number of parenthesized subexpressions within the RE + (except that the value of this member is undefined if the + @code{REG_NOSUB} flag was used). + + Note that the length of the RE does matter; in particular, there is a + strong speed bonus for keeping RE length under about 30 characters, + with most special characters counting roughly double. 
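As an illustration of how the cflags values described above combine, here is a minimal C sketch, assuming a POSIX-conforming <regex.h>. The helper name re_matches is hypothetical and is not part of the library. It compiles the pattern with REG_EXTENDED | REG_NOSUB (only success or failure is needed), tests one string with regexec, and releases the compiled RE with regfree.

```c
#include <sys/types.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper, for illustration only.
   Returns 1 if TEXT matches the extended RE in PATTERN,
   0 if it does not, and -1 if PATTERN fails to compile.  */
int re_matches(const char *pattern, const char *text)
{
  regex_t re;
  int rc;

  /* REG_NOSUB: we only want a yes/no answer, not match offsets.  */
  rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
  if (rc != 0)
    {
      char msg[128];

      /* Turn the non-zero error code into a readable message.  */
      regerror(rc, &re, msg, sizeof msg);
      fprintf(stderr, "regcomp failed: %s\n", msg);
      return -1;
    }

  /* nmatch is 0 and pmatch is NULL because REG_NOSUB was used.  */
  rc = regexec(&re, text, 0, NULL, 0);
  regfree(&re);
  return rc == 0 ? 1 : 0;
}
```

A caller might write re_matches("^ab+c$", "abbbc") to test a string against an anchored pattern. A pattern with an unbalanced bracket, such as "[a", should make regcomp fail, in which case the sketch returns -1.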
+ + @subheading Return Value + + If @code{regcomp} succeeds, it returns zero; if it fails, it returns a + non-zero error code, which is one of these: + + @table @code{} + + @item REG_BADPAT + + invalid regular expression + + @item REG_ECOLLATE + + invalid collating element + + @item REG_ECTYPE + + invalid character class + + @item REG_EESCAPE + + @samp{\} applied to unescapable character + + @item REG_ESUBREG + + invalid backreference number (e.g., larger than the number of + parenthesized subexpressions in the RE) + + @item REG_EBRACK + + brackets [ ] not balanced + + @item REG_EPAREN + + parentheses ( ) not balanced + + @item REG_EBRACE + + braces { } not balanced + + @item REG_BADBR + + invalid repetition count(s) in { } + + @item REG_ERANGE + + invalid character range in [ ] + + @item REG_ESPACE + + ran out of memory (an RE like, say, + @samp{((((a@{1,100@})@{1,100@})@{1,100@})@{1,100@})@{1,100@}'} will + eventually run almost any existing machine out of swap space) + + @item REG_BADRPT + + ?, *, or + operand invalid + + @item REG_EMPTY + + empty (sub)expression + + @item REG_ASSERT + + ``can't happen'' (you found a bug in @code{regcomp}) + + @item REG_INVARG + + invalid argument (e.g. a negative-length string) + + @end table + + @subheading Regular Expressions' Syntax + + Regular expressions (@dfn{RE}s), as defined in POSIX 1003.2, come in two + forms: modern REs (roughly those of @code{egrep}; 1003.2 calls these + @emph{extended} REs) and obsolete REs (roughly those of @code{ed}; + 1003.2 @emph{basic} REs). Obsolete REs mostly exist for backward + compatibility in some old programs; they will be discussed at the end. + 1003.2 leaves some aspects of RE syntax and semantics open; `(*)' marks + decisions on these aspects that may not be fully portable to other + 1003.2 implementations. + + A (modern) RE is one(*) or more non-empty(*) @emph{branches}, separated + by @samp{|}. It matches anything that matches one of the branches. 
+ + A branch is one(*) or more @emph{pieces}, concatenated. It matches a + match for the first, followed by a match for the second, etc. + + A piece is an @emph{atom} possibly followed by a single(*) `*', `+', + `?', or @emph{bound}. + An atom followed by `*' matches a sequence of 0 or more matches of the atom. + An atom followed by `+' matches a sequence of 1 or more matches of the atom. + An atom followed by `?' matches a sequence of 0 or 1 matches of the atom. + + A @emph{bound} is `{' followed by an unsigned decimal integer, possibly + followed by `,' possibly followed by another unsigned decimal integer, + always followed by `}'. The integers must lie between 0 and + @code{RE_DUP_MAX} (255(*)) inclusive, and if there are two of them, the + first may not exceed the second. An atom followed by a bound containing + one integer @samp{i} and no comma matches a sequence of exactly @samp{i} + matches of the atom. An atom followed by a bound containing one integer + @samp{i} and a comma matches a sequence of @samp{i} or more matches of + the atom. An atom followed by a bound containing two integers @samp{i} + and @samp{j} matches a sequence of @samp{i} through @samp{j} (inclusive) + matches of the atom. + + An atom is a regular expression enclosed in `()' (matching a match for the + regular expression), an empty set of `()' (matching the null string(*)), + a @emph{bracket expression} (see below), `.' (matching any single + character), `^' (matching the null string at the beginning of a line), + `$' (matching the null string at the end of a line), a `\\' followed by + one of the characters `^.[$()|*+?{\\' (matching that character taken as + an ordinary character), a `\\' followed by any other character(*) + (matching that character taken as an ordinary character, as if the `\\' + had not been present(*)), or a single character with no other + significance (matching that character). 
A `{' followed by a character + other than a digit is an ordinary character, not the beginning of a + bound(*). It is illegal to end an RE with `\\'. + + A @emph{bracket expression} is a list of characters enclosed in `[]'. + It normally matches any single character from the list (but see below). + If the list begins with `^', it matches any single character (but see + below) @emph{not} from the rest of the list. If two characters in the + list are separated by `-', this is shorthand for the full @emph{range} + of characters between those two (inclusive) in the collating sequence, + e.g. `[0-9]' in ASCII matches any decimal digit. It is illegal(*) for + two ranges to share an endpoint, e.g. `a-c-e'. Ranges are very + collating-sequence-dependent, and portable programs should avoid relying + on them. + + To include a literal `]' in the list, make it the first character + (following a possible `^'). To include a literal `-', make it the + first or last character, or the second endpoint of a range. To use a + literal `-' as the first endpoint of a range, enclose it in `[.' and + `.]' to make it a collating element (see below). With the exception of + these and some combinations using `[' (see next paragraphs), all other + special characters, including `\\', lose their special significance + within a bracket expression. + + Within a bracket expression, a collating element (a character, a + multi-character sequence that collates as if it were a single character, + or a collating-sequence name for either) enclosed in `[.' and `.]' + stands for the sequence of characters of that collating element. The + sequence is a single element of the bracket expression's list. A + bracket expression containing a multi-character collating element can + thus match more than one character, e.g. if the collating sequence + includes a `ch' collating element, then the RE @samp{[[.ch.]]*c} matches + the first five characters of ``chchcc''. 
+ + Within a bracket expression, a collating element enclosed in `[=' and + `=]' is an equivalence class, standing for the sequences of characters + of all collating elements equivalent to that one, including itself. + (If there are no other equivalent collating elements, the treatment is + as if the enclosing delimiters were `[.' and `.]'.) For example, if o + and @^o are the members of an equivalence class, then `[[=o=]]', + `[[=@^o=]]', and `[o@^o]' are all synonymous. An equivalence + class may not(*) be an endpoint of a range. + + Within a bracket expression, the name of a @emph{character class} + enclosed in `[:' and `:]' stands for the list of all characters + belonging to that class. + Standard character class names are: + + @example + alnum digit punct + alpha graph space + blank lower upper + cntrl print xdigit + @end example + + These stand for the character classes defined by @code{isalnum} + (@pxref{isalnum}), @code{isdigit} (@pxref{isdigit}), @code{ispunct} + (@pxref{ispunct}), @code{isalpha} (@pxref{isalpha}), @code{isgraph} + (@pxref{isgraph}), @code{isspace} (@pxref{isspace}) (@code{blank} is the + same as @code{space}), @code{islower} (@pxref{islower}), @code{isupper} + (@pxref{isupper}), @code{iscntrl} (@pxref{iscntrl}), @code{isprint} + (@pxref{isprint}), and @code{isxdigit} (@pxref{isxdigit}), + respectively. A locale may provide others. A character class may not + be used as an endpoint of a range. + + There are two special cases(*) of bracket expressions: the bracket + expressions `[[:<:]]' and `[[:>:]]' match the null string at the + beginning and end of a word respectively. A word is defined as a + sequence of word characters which is neither preceded nor followed by + word characters. A word character is an @code{alnum} character (as + defined by @code{isalnum} library function) or an underscore. 
This is + an extension, compatible with but not specified by POSIX 1003.2, and + should be used with caution in software intended to be portable to other + systems. + + In the event that an RE could match more than one substring of a given + string, the RE matches the one starting earliest in the string. If the + RE could match more than one substring starting at that point, it + matches the longest. Subexpressions also match the longest possible + substrings, subject to the constraint that the whole match be as long as + possible, with subexpressions starting earlier in the RE taking priority + over ones starting later. Note that higher-level subexpressions thus + take priority over their lower-level component subexpressions. + + Match lengths are measured in characters, not collating elements. A + null string is considered longer than no match at all. For example, + @samp{bb*} matches the three middle characters of @samp{abbbc}, + @samp{(wee|week)(knights|nights)} matches all ten characters of + @samp{weeknights}, when @samp{(.*).*} is matched against @samp{abc} the + parenthesized subexpression matches all three characters, and when + @samp{(a*)*} is matched against `bc' both the whole RE and the + parenthesized subexpression match the null string. + + If case-independent matching is specified, the effect is much as if all + case distinctions had vanished from the alphabet. When an alphabetic + that exists in multiple cases appears as an ordinary character outside a + bracket expression, it is effectively transformed into a bracket + expression containing both cases, e.g. `x' becomes `[xX]'. When it + appears inside a bracket expression, all case counterparts of it are + added to the bracket expression, so that (e.g.) `[x]' becomes `[xX]' and + `[^x]' becomes `[^xX]'. + + No particular limit is imposed on the length of REs(*). 
Programs + intended to be portable should not employ REs longer than 256 bytes, + as an implementation can refuse to accept such REs and remain + POSIX-compliant. + + Obsolete (@emph{basic}) regular expressions differ in several respects. + `|', `+', and `?' are ordinary characters and there is no equivalent + for their functionality. The delimiters for bounds are `\\@{' and + `\\@}', with `@{' and `@}' by themselves ordinary characters. The + parentheses for nested subexpressions are `\(' and `\)', with `(' and + `)' by themselves ordinary characters. `^' is an ordinary character + except at the beginning of the RE or(*) the beginning of a parenthesized + subexpression, `$' is an ordinary character except at the end of the RE + or(*) the end of a parenthesized subexpression, and `*' is an ordinary + character if it appears at the beginning of the RE or the beginning of a + parenthesized subexpression (after a possible leading `^'). + Finally, there is one new type of atom, a @emph{back reference}: + `\\' followed by a non-zero decimal digit @emph{d} matches the same + sequence of characters matched by the @emph{d}th parenthesized + subexpression (numbering subexpressions by the positions of their + opening parentheses, left to right), so that (e.g.) @samp{\\([bc]\\)\\1} + matches `bb' or `cc' but not `bc'. + + @c------------------------------------------------------------------------- + + @node regexec, string + @subheading Syntax + + @example + #include <sys/types.h> + #include <regex.h> + + int regexec(const regex_t *preg, const char *string, + size_t nmatch, regmatch_t pmatch[], int eflags); + @end example + + + @subheading Description + + @code{regexec} matches the compiled RE pointed to by @var{preg} against + the @var{string}, subject to the flags in @var{eflags}, and reports + results using @var{nmatch}, @var{pmatch}, and the returned value. The + RE must have been compiled by a previous invocation of @code{regcomp} + (@pxref{regcomp}).
The compiled form is not altered during execution of + @code{regexec}, so a single compiled RE can be used simultaneously by + multiple threads. + + By default, the NUL-terminated string pointed to by @var{string} is + considered to be the text of an entire line, minus any terminating + newline. + + The @var{eflags} argument is the bitwise OR of zero or more of the + following flags: + + @table @code{} + + @item REG_NOTBOL + + The first character of the string is not the beginning of a line, so the + `^' anchor should not match before it. This does not affect the + behavior of newlines under @code{REG_NEWLINE} (@pxref{REG_NEWLINE, + regcomp}). + + @item REG_NOTEOL + + The NUL terminating the string does not end a line, so the `$' anchor + should not match before it. This does not affect the behavior of + newlines under @code{REG_NEWLINE} (@pxref{REG_NEWLINE, regcomp}). + + @item REG_STARTEND + + The string is considered to start at @var{@w{string + pmatch[0].rm_so}} + and to have a terminating @code{NUL} located at + @var{@w{string + pmatch[0].rm_eo}} (there need not actually be a + @code{NUL} at that location), regardless of the value of @var{nmatch}. + See below for the definition of @var{pmatch} and @var{nmatch}. This is + an extension, compatible with but not specified by POSIX 1003.2, and + should be used with caution in software intended to be portable to other + systems. Note that a non-zero @code{rm_so} does not imply + @code{REG_NOTBOL}; @code{REG_STARTEND} affects only the location of the + string, not how it is matched. + + @item REG_TRACE + + trace execution (printed to stdout) + + @item REG_LARGE + + force large representation + + @item REG_BACKR + + force use of backref code + + @end table + + @xref{Regular Expressions' Syntax, regcomp}, for a discussion of what is + matched in situations where an RE or a portion thereof could match any + of several substrings of @var{string}. 
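One common use of @code{REG_NOTBOL} is scanning a line for successive matches: every call after the first starts in the middle of the line, so `^' must not match there. The following is only a sketch under that assumption; the function name @code{count_matches} is hypothetical, and the portable flags are used rather than the @code{REG_STARTEND} extension.

```c
#include <stddef.h>
#include <sys/types.h>
#include <regex.h>

/* Hypothetical helper: counts non-overlapping matches of the extended
   RE `pattern' in `line'.  Returns -1 if the RE fails to compile.  */
int count_matches(const char *pattern, const char *line)
{
  regex_t re;
  regmatch_t m;
  const char *p = line;
  int count = 0;

  if (regcomp(&re, pattern, REG_EXTENDED) != 0)
    return -1;

  /* Only the first call starts at a true beginning of line; later
     calls pass REG_NOTBOL so that `^' cannot match at `p'.  */
  while (regexec(&re, p, 1, &m, p == line ? 0 : REG_NOTBOL) == 0)
    {
      count++;
      if (m.rm_eo > 0)
        p += m.rm_eo;          /* resume after the match */
      else if (*p != '\0')
        p++;                   /* empty match: step one character */
      else
        break;                 /* empty match at end of string */
    }
  regfree(&re);
  return count;
}
```

With this sketch, @code{count_matches("^a", "aaa")} finds one match, whereas dropping @code{REG_NOTBOL} would let `^a' match again at each restart position.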
+ + If @code{REG_NOSUB} was specified in the compilation of the RE + (@pxref{REG_NOSUB, regcomp}), or if @var{nmatch} is 0, @code{regexec} + ignores the @var{pmatch} argument (but see below for the case where + @code{REG_STARTEND} is specified). Otherwise, @var{pmatch} should point + to an array of @var{nmatch} structures of type @code{regmatch_t}. Such + a structure has at least the members @code{rm_so} and @code{rm_eo}, both + of type @code{regoff_t} (a signed arithmetic type at least as large as + an @code{off_t} and a @code{ssize_t}), containing respectively the offset + of the first character of a substring and the offset of the first + character after the end of the substring. Offsets are measured from the + beginning of the @var{string} argument given to @code{regexec}. An + empty substring is denoted by equal offsets, both indicating the + character following the empty substring. + + When @code{regexec} returns, the 0th member of the @var{pmatch} array is + filled in to indicate what substring of @var{string} was matched by the + entire RE. Remaining members report what substring was matched by + parenthesized subexpressions within the RE; member @code{i} reports + subexpression @code{i}, with subexpressions counted (starting at 1) by + the order of their opening parentheses in the RE, left to right. Unused + entries in the array---corresponding either to subexpressions that did + not participate in the match at all, or to subexpressions that do not + exist in the RE (that is, @code{@w{i > preg->re_nsub}})---have both + @code{rm_so} and @code{rm_eo} set to @code{-1}. If a subexpression + participated in the match several times, the reported substring is the + last one it matched. (Note, as an example in particular, that when the + RE @samp{(b*)+} matches @samp{bbb}, the parenthesized subexpression + matches each of the three `b's and then an infinite number of empty + strings following the last `b', so the reported substring is one of the + empties.)
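The @code{rm_so} and @code{rm_eo} offsets described above can be turned into a copy of the matched text as in the sketch below. The helper name @code{copy_submatch} and the limit @code{MAX_MATCHES} are assumptions made for this example only.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <regex.h>

#define MAX_MATCHES 8   /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: copies the text matched by subexpression `idx'
   (0 means the whole RE) into `out', truncating to fit `out_size'.
   Returns the number of characters copied, or -1 if the RE does not
   compile, does not match, or the subexpression is unused.  */
int copy_submatch(const char *pattern, const char *text, size_t idx,
                  char *out, size_t out_size)
{
  regex_t re;
  regmatch_t pm[MAX_MATCHES];
  int len = -1;

  if (idx >= MAX_MATCHES || out_size == 0)
    return -1;
  if (regcomp(&re, pattern, REG_EXTENDED) != 0)
    return -1;

  /* Unused entries have rm_so == -1, as described above.  */
  if (regexec(&re, text, MAX_MATCHES, pm, 0) == 0 && pm[idx].rm_so != -1)
    {
      size_t n = (size_t)(pm[idx].rm_eo - pm[idx].rm_so);

      if (n >= out_size)
        n = out_size - 1;
      /* Offsets are measured from the start of `text'.  */
      memcpy(out, text + pm[idx].rm_so, n);
      out[n] = '\0';
      len = (int)n;
    }
  regfree(&re);
  return len;
}
```

Called with the RE @samp{([a-z]+)=([0-9]+)} and the string @samp{x key=42;}, index 0 yields @samp{key=42}, index 1 yields @samp{key}, and index 2 yields @samp{42}; index 3 refers to a subexpression that does not exist, so the helper reports failure.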
+ + If @code{REG_STARTEND} is specified in @var{eflags}, @var{pmatch} must + point to at least one @code{regmatch_t} variable (even if @var{nmatch} + is 0 or @code{REG_NOSUB} was specified in the compilation of the RE, + @pxref{REG_NOSUB, regcomp}), to hold the input offsets for + @code{REG_STARTEND}. Use for output is still entirely controlled by + @var{nmatch}; if @var{nmatch} is 0 or @code{REG_NOSUB} was specified, + the value of @code{pmatch[0]} will not be changed by a successful + @code{regexec}. + + @var{nmatch} exceeding 0 is expensive; @var{nmatch} exceeding 1 is + worse. Back references are massively expensive. + + @subheading Return Value + + Normally, @code{regexec} returns 0 for success and the non-zero code + @code{REG_NOMATCH} for failure. Other non-zero error codes may be + returned in exceptional situations. The list of possible error return + values is below: + + @table @code{} + + @item REG_ESPACE + + ran out of memory + + @item REG_BADPAT + + the passed argument @var{preg} doesn't point to an RE compiled by + @code{regcomp} + + @item REG_INVARG + + invalid argument(s) (e.g., @var{@w{string + pmatch[0].rm_eo}} is less + than @var{@w{string + pmatch[0].rm_so}}) + + @end table + + @c---------------------------------------------------------------------- + + @node regerror, string + @subheading Syntax + + @example + + #include <sys/types.h> + #include <regex.h> + + size_t regerror(int errcode, const regex_t *preg, + char *errbuf, size_t errbuf_size); + + @end example + + @subheading Description + + @code{regerror} maps a non-zero value of @var{errcode} from either + @code{regcomp} (@pxref{Return Value, regcomp}) or @code{regexec} + (@pxref{Return Value, regexec}) to a human-readable, printable message.
+ + If @var{preg} is non-@code{NULL}, the error code should have arisen from + use of the variable of the type @code{regex_t} pointed to by @var{preg}, + and if the error code came from @code{regcomp}, it should have been the + result from the most recent @code{regcomp} using that @code{regex_t} + variable. (@code{regerror} may be able to supply a more detailed + message using information from the @code{regex_t} than from + @var{errcode} alone.) @code{regerror} places the @code{NUL}-terminated + message into the buffer pointed to by @var{errbuf}, limiting the length + (including the @code{NUL}) to at most @var{errbuf_size} bytes. If the + whole message won't fit, as much of it as will fit before the + terminating @code{NUL} is supplied. In any case, the returned value is + the size of buffer needed to hold the whole message (including + terminating @code{NUL}). If @var{errbuf_size} is 0, @var{errbuf} is + ignored but the return value is still correct. + + If the @var{errcode} given to @code{regerror} is first ORed with + @code{REG_ITOA}, the ``message'' that results is the printable name of + the error code, e.g. ``REG_NOMATCH'', rather than an explanation + thereof. If @var{errcode} is @code{REG_ATOI}, then @var{preg} shall be + non-NULL and the @code{re_endp} member of the structure it points to + must point to the printable name of an error code + (e.g. ``REG_ECOLLATE''); in this case, the result in @var{errbuf} is the + decimal representation of the numeric value of the error code (0 if the + name is not recognized). @code{REG_ITOA} and @code{REG_ATOI} are + intended primarily as debugging facilities; they are extensions, + compatible with but not specified by POSIX 1003.2, and should be used + with caution in software intended to be portable to other systems. Be + warned also that they are considered experimental and changes are + possible.
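Because @code{regerror} reports the required buffer size even when @var{errbuf_size} is 0, a caller can size the buffer with a first call and fetch the message with a second. The sketch below shows that pattern; the helper name @code{re_error_message} is hypothetical, and the non-portable @code{REG_ITOA} and @code{REG_ATOI} extensions are deliberately not used.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <regex.h>

/* Hypothetical helper: returns a malloc'ed copy of the message for
   `errcode', or NULL if memory runs out.  The caller frees it.  */
char *re_error_message(int errcode, const regex_t *preg)
{
  /* First call: errbuf_size is 0, so errbuf is ignored and only the
     needed size (including the terminating NUL) is returned.  */
  size_t need = regerror(errcode, preg, NULL, 0);
  char *msg = malloc(need);

  /* Second call: the buffer is now large enough for the whole text.  */
  if (msg != NULL)
    regerror(errcode, preg, msg, need);
  return msg;
}
```

A typical caller would pass the non-zero value returned by a failed @code{regcomp} together with the address of the same @code{regex_t}, print the result, and then @code{free} it.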
+ + @subheading Return Value + + The size of buffer needed to hold the message (including terminating + @code{NUL}) is always returned, even if @var{errbuf_size} is zero. + + @c-------------------------------------------------------------------------- + + @node regfree, string + @subheading Syntax + + @example + + #include <sys/types.h> + #include <regex.h> + + void regfree(regex_t *preg); + + @end example + + @subheading Description + + @code{regfree} frees any dynamically-allocated storage associated with + the compiled RE pointed to by @var{preg}. The remaining @code{regex_t} + is no longer a valid compiled RE and the effect of supplying it to + @code{regexec} or @code{regerror} is undefined.