Mail Archives: djgpp-workers/1996/03/10/01:39:12

Date: Sun, 10 Mar 1996 08:31:03 +0200 (IST)
From: Eli Zaretskii <eliz AT is DOT elta DOT co DOT il>
To: djgpp-workers AT delorie DOT com
Subject: Library docs add-ons
Message-Id: <Pine.SUN.3.91.960310082639.2390N-100000@is>
Mime-Version: 1.0

I've added docs for a few library functions.  Please review the additions 
to the docs of `signal' to see whether I got it right.  The docs of regex 
functions are just texinfo-ized man pages.

Please see my other message which asks a few questions I have after going
through the sources. 

*** ansi/stdio/fopen.t~0	Mon Jul 10 09:39:42 1995
--- ansi/stdio/fopen.txh	Sat Mar  9 15:27:44 1996
***************
*** 37,42 ****
--- 37,50 ----
  
  Force the file to be open in binary mode instead of the default mode.
  
+ When called to open the console in binary mode, @code{fopen} will
+ disable the generation of @code{SIGINT} when you press @kbd{Ctrl-C}
+ (@kbd{Ctrl-Break} will still cause @code{SIGINT}), because many programs
+ that use binary reads from the console will also want to get the
+ @samp{^C} characters.  You can use the @code{__djgpp_set_ctrl_c} library
+ function (@pxref{__djgpp_set_ctrl_c}) if you want @kbd{Ctrl-C} to
+ generate interrupts while the console is read in binary mode.
+ 
  @item t
  
  Force the file to be open in text mode instead of the default mode.
*** dos/io/setmode.t~0	Tue Jul 25 12:16:18 1995
--- dos/io/setmode.txh	Sat Mar  9 20:36:42 1996
***************
*** 15,20 ****
--- 15,29 ----
  into either cooked or raw mode accordingly, and set any @code{FILE*}
  objects that use this file into text or binary mode. 
  
+ When called to put @var{file} that refers to the console into binary
+ mode, @code{setmode} will disable the generation of @code{SIGINT} when
+ you press @kbd{Ctrl-C} (@kbd{Ctrl-Break} will still cause
+ @code{SIGINT}), because many programs that use binary reads from the
+ console will also want to get the @samp{^C} characters.  You can use the
+ @code{__djgpp_set_ctrl_c} library function (@pxref{__djgpp_set_ctrl_c})
+ if you want @kbd{Ctrl-C} to generate interrupts while the console is
+ read in binary mode.
+ 
  Note that, for buffered streams (@code{FILE*}), you must call
  @code{fflush} (@pxref{fflush}) before @code{setmode}, or call
  @code{setmode} before writing anything to the file, for proper
*** go32/dpmiexcp.t~0	Mon Jul 10 09:40:42 1995
--- go32/dpmiexcp.txh	Sat Mar  9 20:45:02 1996
***************
*** 5,43 ****
  @example
  #include <signal.h>
  
! int	raise(int _sig);
  @end example
  
  @subheading Description
  
! This function raises the given signal (see @code{<signal.h>} for a
! list). @xref{signal}.
  
  @subheading Return Value
  
! 0 on success.
  @c ----------------------------------------------------------------------
  @node signal, signal
  @subheading Syntax
  
  @example
  #include <signal.h>
  
! void	(*signal(int _sig, void (*_func)(int)))(int);
  @end example
  
  @subheading Description
  
! This function registers signal handlers.  Signal numbers are 0..255
! for software interrupts, 256..287 for exceptions (exception number
! plus 256) or as specified in @code{<signal.h>}.
! 
! You may pass SIG_DFL to reset the default handling, SIG_ERR to force
! an error when that signal happens, or SIG_IGN to ignore that signal.
! Signal handlers are regular C functions, and may call any function
! that the ANSI/POSIX specs say are valid for signal handlers.  Signal
! handlers for hardware interrupts need special handling.
  
  @subheading Return Value
  
! The previous handler.
--- 5,295 ----
  @example
  #include <signal.h>
  
! int	raise(int sig);
  @end example
  
  @subheading Description
  
! This function raises the given signal @var{sig}.
! @xref{the list of possible signals, signal}.
  
  @subheading Return Value
  
! 0 on success, -1 for illegal value of @var{sig}.
! 
  @c ----------------------------------------------------------------------
+ 
  @node signal, signal
  @subheading Syntax
  
  @example
  #include <signal.h>
  
! void	(*signal(int sig, void (*func)(int)))(int);
  @end example
  
  @subheading Description
  
! Signals are generated in response to some exceptional behavior of the
! program, such as division by 0.  A signal can also report some
! asynchronous event outside the program, such as someone pressing the
! @kbd{Ctrl-Break} key combination.
! 
! Signals are numbered 0..255 for software interrupts and 256..287 for
! exceptions (exception number plus 256); other implementation-specific
! codes are specified in @code{<signal.h>} (see below).  Every signal is
! given a mnemonic which you should use for portable programs.
! 
! The default handling for all the signals is to print a traceback (a
! stack dump which describes the sequence of function calls leading to the
! generation of the signal) and abort the program.
! 
! This function allows you to change the default behavior for a specific
! signal.  It registers @var{func} as a signal handler for signal number
! @var{sig}.  After you register your function as the handler for a
! particular signal, it will be called when that signal occurs.  The
! execution of the program will be suspended until the handler returns or
! calls @code{longjmp} (@pxref{longjmp}).
! 
! You may pass SIG_DFL as the value of @var{func} to reset the signal
! handling for the signal @var{sig} to default (also
! @pxref{__djgpp_exception_toggle}, for a quick way to restore all the
! signals' handling to default), SIG_ERR to force an error when that
! signal happens, or SIG_IGN to ignore that signal.  Signal handlers that
! you write are regular C functions, and may call any function that the
! ANSI/POSIX specs say is valid for signal handlers.  For maximum
! portability, a handler for hardware interrupts and processor exceptions
! should only make calls to @code{signal}, assign values to data objects
! of type @code{volatile sig_atomic_t} (defined as @code{int} in
! @code{<signal.h>}) and return.  Handlers for hardware interrupts must
! also be locked in memory (so that the virtual memory mechanism won't
! swap them out).  @xref{locking memory regions,
! __dpmi_lock_linear_region}.  Handlers for software interrupts can also
! terminate by calling @code{abort}, @code{exit} or @code{longjmp}.
! 
! The following signals are defined in @code{<signal.h>}:
! 
! @table @code{}
! 
! @item SIGABRT
! 
! The Abort signal.  Currently only used by the @code{assert} macro to
! terminate the program when an assertion fails (@pxref{assert}).
! 
! @item SIGFPE
! 
! The Floating Point Error signal.  Generated in case of divide by zero
! exception (Int 00h), overflow exception (Int 04h), and any x87
! co-processor exception, either generated by the CPU (Int 10h), or by the
! co-processor itself (Int 75h).
! 
! @item SIGILL
! 
! The Invalid Execution signal.  Currently only generated for
! unknown/invalid exceptions.
! 
! @item SIGINT
! 
! The Interrupt signal.  Generated when a @kbd{Ctrl-C} or @kbd{Ctrl-Break}
! (Int 1Bh) key is hit.  Note that when you open the console in binary
! mode, or switch it to binary mode by a call to @code{setmode}
! (@pxref{setmode}), generation of @code{SIGINT} by the @kbd{Ctrl-C} key
! is disabled.  This is done for programs (such as Emacs) which want to
! be able to read the @samp{^C} character like any other character.  Use the
! library function @code{__djgpp_set_ctrl_c} to restore @code{SIGINT}
! generation when @kbd{Ctrl-C} is hit, if you need this.
! @xref{__djgpp_set_ctrl_c}, for details on how this should be done.
! @kbd{Ctrl-Break} always generates @code{SIGINT}.
! 
! DJGPP hooks the keyboard hardware interrupt (Int 09h) to be able to
! generate @code{SIGINT} in response to the @kbd{Ctrl-C} key; you should be
! aware of this when you install a handler for the keyboard interrupt.
! 
! @item SIGSEGV
! 
! The invalid storage access (Segmentation Violation) signal.  Generated
! in response to any of the following exceptions: Bound range exceeded in
! BOUND instruction (Int 05h), Double Exception or an exception in the
! exception handler (Int 08h), Segment Boundary violation by co-processor
! (Int 09h), Segment Not Present (Int 0Bh), Stack Fault (Int 0Ch), General
! Protection Violation (Int 0Dh), or Page Fault (Int 0Eh).  Note that Int
! 09h is only generated on the 80386 processor; i486 and later CPUs cause Int
! 0Dh when the co-processor accesses memory out of bounds.
! 
! @item SIGTERM
! 
! The Termination Request signal.  Currently unused.
! 
! 
! The signals below this point are not defined by ANSI C, and cannot be
! used when compiling with the @samp{-ansi} option to @samp{gcc}.
! 
! 
! @item SIGALRM
! 
! The Alarm signal.  Generated after a certain time period has passed after
! a call to the @code{alarm} library function (@pxref{alarm}).
! 
! @item SIGHUP
! 
! The Hang-up signal.  Currently unused.
! 
! @item SIGKILL
! 
! The Kill signal.  Currently unused.
! 
! @item SIGPIPE
! 
! The Broken Pipe signal.  Currently unused.
! 
! @item SIGQUIT
! 
! The Quit signal.  Currently unused.
! 
! @item SIGUSR1
! 
! User-defined signal no. 1.
! 
! @item SIGUSR2
! 
! User-defined signal no. 2.
! 
! 
! The signals below are not defined by ANSI C or POSIX, and cannot be
! used when compiling with either the @samp{-ansi} or @samp{-posix} option
! to @samp{gcc}.
! 
! 
! @item SIGTRAP
! 
! The Trap Instruction signal.  Generated in response to the Debugger
! Exception (Int 01h) or Breakpoint Exception (Int 03h).
! 
! @item SIGNOFP
! 
! The No Co-processor signal.  Generated if a co-processor (floating-point)
! instruction is encountered when no co-processor is installed (Int 07h).
! 
! @item SIGTIMR
! 
! The Timer signal.  Used by the @code{itimer} and @code{alarm} functions
! (@pxref{itimer}, and @pxref{alarm}).
! 
! @item SIGPROF
! 
! The Profiler signal.  Used by the execution profile gathering code in a
! program compiled with the @samp{-pg} option to @samp{gcc}.
! 
! @end table
! 
  
  @subheading Return Value
  
! The previous handler for signal @var{sig}, or @code{SIG_ERR} if the
! value of @var{sig} is outside legal limits.
! 
! @subheading Signal Mechanism Implementation Notes
! 
! Due to subtle aspects of protected-mode program operation under MS-DOS,
! signal handlers cannot be safely called from hardware interrupt
! handlers.  Therefore, the DJGPP exception-handling mechanism arranges for
! the signal handler to be called on the first occasion that the program
! is in protected mode and touches any of its data.  This means that if
! the exception occurs while the processor is in real mode, like when your
! program calls some DOS service, the signal handler won't be called until
! that call returns.  For instance, if you call @code{read} (or
! @code{scanf}, or @code{gets}) to read text from the console and press
! @kbd{Ctrl-C}, you will have to press @kbd{Enter} to terminate the
! @code{read} call to cause the signal handler for @code{SIGINT} to be
! called.  Another significant implication of this implementation is that
! when the program isn't touching any of its data (like in very tight
! loops which only use values in the registers), it cannot be interrupted.
! 
! @c-------------------------------------------------------------------------
! 
! @node __djgpp_set_ctrl_c, signal
! @subheading Syntax
! 
! @example
! #include <sys/exceptn.h>
! 
! int __djgpp_set_ctrl_c(int enable);
! @end example
! 
! @subheading Description
! 
! This function sets and resets the bit which controls whether
! @code{SIGINT} (@pxref{SIGINT, signal}) will be raised when you press
! @kbd{Ctrl-C}.  By default @kbd{Ctrl-C} generates an interrupt signal
! which, if uncaught by a signal handler, will abort your program.
! However, when you call the @code{setmode} library function to switch the
! console reads to binary mode, or open the console in binary mode for
! reading, this generation of interrupt signal is turned off, because some
! programs want to get the @samp{^C} characters as any other character and
! handle them by themselves.
! 
! @code{__djgpp_set_ctrl_c} lets you explicitly determine the effect of
! @kbd{Ctrl-C}.  When called with a non-zero value of @var{enable}, it
! arranges for @kbd{Ctrl-C} to generate an interrupt; if you call it with
! a zero in @var{enable}, @kbd{Ctrl-C} is treated as a normal character.
! 
! Note that the effect of the @kbd{Ctrl-Break} key is unaffected by this
! function; use the @code{_go32_want_ctrl_break} library function to
! control it.
! 
! Also note that in DJGPP, the effect of the interrupt signal will only be
! seen when the program is in protected mode (@pxref{Signal Mechanism,
! signal}, for more details).  Thus, if you press @kbd{Ctrl-C} while your
! program calls DOS (e.g., when reading from the console), the
! @code{SIGINT} signal handler will only be called after that call
! returns.
! 
! @subheading Return Value
! 
! The previous state of the @kbd{Ctrl-C} effect: 0 if the generation of
! @code{SIGINT} by @kbd{Ctrl-C} was disabled, 1 if it was enabled.
! 
! @subheading Example
! 
! @example
! 
!   setmode(fileno(stdin), O_BINARY);
!   if (isatty(fileno(stdin)))
!     __djgpp_set_ctrl_c(1);
! 
! @end example
! 
! @c-------------------------------------------------------------------------
! 
! @node __djgpp_exception_toggle, signal
! @subheading Syntax
! 
! @example
! 
! #include <sys/exceptn.h>
! 
! void __djgpp_exception_toggle(void);
! 
! @end example
! 
! 
! @subheading Description
! 
! This function is automatically called when the program exits, to restore
! handling of all the exceptions to their normal state.  You may also call
! it from your program, around the code fragments where you need to
! temporarily restore @strong{all} the exceptions to their default
! handling.  One example of such a case might be a call to a library
! function that spawns child programs, when you don't want to handle
! signals generated while the child runs (by default, those signals are
! also passed to the parent).
! 
! @subheading Example
! 
! @example
! 
!   __djgpp_exception_toggle();
!   system("myprog");
!   __djgpp_exception_toggle();
! 
! @end example
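
Reviewer's note on the signal text above: the portable handler pattern
it recommends (store into a volatile sig_atomic_t object and return) can
be sketched as follows.  This is only an illustration in plain ISO C,
using nothing but signal() and raise(); the names got_signal, on_signal
and catch_once are made up for the example, and none of the
DJGPP-specific behaviour (delayed delivery, the Ctrl-C switch) is
exercised here.

```c
#include <signal.h>

/* Written by the handler.  volatile sig_atomic_t is the only kind of
   object a maximally portable handler may assign to. */
static volatile sig_atomic_t got_signal = 0;

/* Hypothetical handler: record which signal arrived and return. */
static void on_signal(int sig)
{
  got_signal = sig;
}

/* Hypothetical helper: install on_signal for SIG, raise SIG once, then
   put the previous handler back.  Returns the signal number the handler
   saw, or -1 if signal() or raise() reported an error. */
int catch_once(int sig)
{
  void (*previous)(int);
  int raised;

  got_signal = 0;
  previous = signal(sig, on_signal);
  if (previous == SIG_ERR)
    return -1;

  raised = raise(sig);          /* 0 on success */
  signal(sig, previous);        /* restore whatever was installed before */

  return raised == 0 ? (int) got_signal : -1;
}
```

A caller would write, for instance, catch_once(SIGINT) and compare the
result with SIGINT.  Restoring the saved handler, rather than always
passing SIG_DFL, keeps the helper from disturbing a handler the program
installed earlier.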
*** posix/fcntl/open.t~0	Mon Jul 10 09:40:44 1995
--- posix/fcntl/open.txh	Sat Mar  9 15:25:54 1996
***************
*** 54,59 ****
--- 54,67 ----
  
  The file is opened in binary mode.
  
+ When called to open the console in binary mode, @code{open} will disable
+ the generation of @code{SIGINT} when you press @kbd{Ctrl-C}
+ (@kbd{Ctrl-Break} will still cause @code{SIGINT}), because many programs
+ that use binary reads from the console will also want to get the
+ @samp{^C} characters.  You can use the @code{__djgpp_set_ctrl_c} library
+ function (@pxref{__djgpp_set_ctrl_c}) if you want @kbd{Ctrl-C} to
+ generate interrupts while the console is read in binary mode.
+ 
  @end table
  
  If the file is created by this call, it will be given the read/write
*** posix/glob/glob.t~0	Sun Jul 23 10:48:52 1995
--- posix/glob/glob.txh	Sat Mar  9 15:09:30 1996
***************
*** 4,47 ****
  @example
  #include <glob.h>
  
! int  glob(const char *_pattern, int _flags,
!           int (*_errfunc)(const char *_epath, int _eerrno), glob_t *_pglob);
  @end example
  
  @subheading Description
  
! This function performs command-line wildcard expansion.  The pattern
! to be expanded is passed as @var{pattern}, and a pointer to a
! structure is passed via @var{_pglob}.  This structure is like this:
  
! @example
! typedef struct @{
!   size_t gl_pathc;
!   char **gl_pathv;
!   size_t gl_offs;
! @} glob_t;
! @end example
  
! The @code{gl_pathc} and @code{gl_pathv} fields define a list of
! matches.  The @code{gl_offs} field indicates that the list should be
! offset.
  
! The structure is filled in with information about the files that
! matched the wildcard.  Values for @var{_flags} are as follows:
  
  @table @code
  
  @item GLOB_APPEND
  
! Append matches to a pre-existing structure.
  
  @item GLOB_DOOFFS
  
! Skip _pglob->gl_offs entries in gl_pathv.
  
  @item GLOB_ERR
  
! Stop when an unreadable directory is encountered.
  
  @item GLOB_MARK
  
--- 4,80 ----
  @example
  #include <glob.h>
  
! int  glob(const char *pattern, int flags,
!           int (*errfunc)(const char *epath, int eerrno), glob_t *pglob);
  @end example
  
  @subheading Description
  
! This function expands a filename wildcard which is passed as
! @var{pattern}.  The pattern may include these special characters:
  
! @table @code
! 
! @item *
! 
! Matches zero or more characters.
! 
! @item ?
! 
! Matches exactly one character (any character).
! 
! @item [...]
  
! Matches one character from a group of characters.  If the first
! character is @code{!}, matches any character @emph{not} in the group.  A
! group is defined as a list of characters between the brackets,
! e.g. @code{[dkl_]}, or by two characters separated by @code{-} to
! indicate all characters between and including these two.  For example,
! @code{[a-d]} matches @code{a}, @code{b}, @code{c}, or @code{d}, and
! @code{[!a-zA-Z0-9]} matches any character that is not alphanumeric. 
  
! @item ...
! 
! Matches all the subdirectories, recursively (VMS aficionados,
! rejoice!).
! 
! @item \
! 
! Causes the next character to not be treated as special.  For example,
! @code{\[} matches a literal @samp{[}.  If @var{flags} includes
! @code{GLOB_NOESCAPE}, this quoting is disabled and @samp{\} is handled
! as a simple character. 
! 
! @end table
! 
! The variable @var{flags} controls certain options of the expansion
! process.  Possible values for @var{flags} are as follows:
  
  @table @code
  
  @item GLOB_APPEND
  
! Append the matches to those already present in the array
! @code{pglob->gl_pathv}.  By default, @code{glob} discards all previous
! contents of @code{pglob->gl_pathv} and allocates a new memory block for
! it.  If you use @code{GLOB_APPEND}, @code{pglob} should point to a
! structure returned by a previous call to @code{glob}.
  
  @item GLOB_DOOFFS
  
! Skip @code{pglob->gl_offs} entries in @code{gl_pathv} and put new
! matches after that point.  By default, @code{glob} puts the new matches
! beginning at @code{pglob->gl_pathv[0]}.  You can use this flag both with
! @code{GLOB_APPEND} (in which case the new matches will be put after the
! first @code{pglob->gl_offs} matches from the previous call to @code{glob}),
! or without it (in which case the first @code{pglob->gl_offs} entries in
! @code{pglob->gl_pathv} will be filled by @code{NULL} pointers).
  
  @item GLOB_ERR
  
! Stop when an unreadable directory is encountered and call user-defined
! function @var{errfunc}.  This cannot happen under DOS (and thus
! @var{errfunc} is never used).
  
  @item GLOB_MARK
  
***************
*** 50,66 ****
  @item GLOB_NOCHECK
  
  If no matches are found, return the pattern itself as the only match.
  
  @item GLOB_NOESCAPE
  
! Disable blackslash as an escape character.
  
  @item GLOB_NOSORT
  
! Do not sort the returned list.
  
  @end table
  
  @subheading Return Value
  
! Zero on success.
--- 83,194 ----
  @item GLOB_NOCHECK
  
  If no matches are found, return the pattern itself as the only match.
+ By default, @code{glob} doesn't change @code{pglob} if no matches are
+ found.
  
  @item GLOB_NOESCAPE
  
! Disable backslash as an escape character.  By default, backslash quotes
! special meta-characters in wildcards described above.
  
  @item GLOB_NOSORT
  
! Do not sort the returned list.  By default, the list is sorted
! alphabetically.  This flag causes the files to be returned in the order
! they were found in the directory.
  
  @end table
  
+ Given the pattern and the flags, @code{glob} expands the pattern and
+ returns the list of matching files in the structure that @var{pglob}
+ points to.  This structure looks like this:
+ 
+ @example
+ typedef struct @{
+   size_t gl_pathc;
+   char **gl_pathv;
+   size_t gl_offs;
+ @} glob_t;
+ @end example
+ 
+ In the structure, the @code{gl_pathc} field holds the number of
+ filenames in the @code{gl_pathv} list; this includes the filenames produced
+ by this call, plus any previous filenames if @code{GLOB_APPEND} or
+ @code{GLOB_DOOFFS} were set in @var{flags}.  The list of matches is
+ returned as an array of pointers to the filenames; @code{gl_pathv} holds
+ the address of the array.  Thus, the filenames which match the pattern
+ can be accessed as @code{gl_pathv[0]}, @code{gl_pathv[1]}, etc.  If
+ @code{GLOB_DOOFFS} was set in @var{flags}, the new matches begin at the
+ offset given by @code{gl_offs}.
+ 
  @subheading Return Value
  
! Zero on success, or one of these codes:
! 
! @table @code{}
! 
! @item GLOB_ABORTED
! 
! Not used in DJGPP implementation.
! 
! @item GLOB_NOMATCH
! 
! No files matched the given pattern.
! 
! @item GLOB_NOSPACE
! 
! @item GLOB_ERR
! 
! Not enough memory to accommodate expanded filenames.
! 
! @end table
! 
! @subheading Notes
! 
! @code{glob} will not match names of volume labels.
! 
! Filenames are matched case-insensitively.  The list of expanded
! filenames will be returned in lower case, if all the characters of the
! pattern (except those between brackets [...]) are lower-case; if they
! are upper-case, the expanded filenames will be also in upper case.
! 
! @subheading Example
! 
! @example
! 
! #include <stdlib.h>
! #include <string.h>
! #include <glob.h>
! 
! /* Convert a wildcard pattern into a list of blank-separated
!    filenames which match the wildcard.  */
! 
! char * glob_pattern(char *wildcard)
! {
!   char *gfilename;
!   size_t cnt, length = 0;
!   glob_t glob_results;
!   char **p;
! 
!   glob(wildcard, GLOB_NOCHECK, 0, &glob_results);
! 
!   /* How much space do we need?  */
!   for (p = glob_results.gl_pathv, cnt = glob_results.gl_pathc;
!        cnt; p++, cnt--)
!     length += strlen(*p) + 1;
! 
!   /* Allocate the space and generate the list.  */
!   gfilename = (char *) calloc(length, sizeof(char));
!   for (p = glob_results.gl_pathv, cnt = glob_results.gl_pathc;
!        cnt; p++, cnt--)
!     {
!       strcat(gfilename, *p);
!       if (cnt > 1)
!         strcat(gfilename, " ");
!     }
! 
!   globfree(&glob_results);
!   return gfilename;
! }
! 
! @end example
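
Reviewer's note on the glob text above: the gl_pathc / gl_pathv
description and the GLOB_NOCHECK behaviour can be checked with a short
sketch like the one below.  It assumes a POSIX-conforming glob() and was
not run against the DJGPP library; the helper name first_match is made
up for the example.

```c
#include <glob.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: expand PATTERN and copy the first entry of
   gl_pathv into BUF (at most SIZE bytes, always NUL-terminated).
   Because GLOB_NOCHECK is set, a pattern that matches nothing comes
   back as the only entry.  Returns gl_pathc, or -1 on failure. */
long first_match(const char *pattern, char *buf, size_t size)
{
  glob_t results;
  long count;

  if (size == 0 || glob(pattern, GLOB_NOCHECK, NULL, &results) != 0)
    return -1;

  /* With GLOB_NOCHECK and a zero return there is at least one entry. */
  count = (long) results.gl_pathc;
  strncpy(buf, results.gl_pathv[0], size - 1);
  buf[size - 1] = '\0';

  globfree(&results);           /* release the list glob allocated */
  return count;
}
```

Calling first_match with a pattern that matches no file should report
one entry and leave the pattern itself in the buffer, which is the
GLOB_NOCHECK case described in the flag table.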
*** posix/regex/regex.t~0	Sat Mar  9 20:57:00 1996
--- posix/regex/regex.txh	Sat Mar  9 11:05:00 1996
***************
*** 0 ****
--- 1,551 ----
+ @node regcomp, string
+ @subheading Syntax
+ 
+ @example
+ #include <sys/types.h>
+ #include <regex.h>
+ 
+ int regcomp(regex_t *preg, const char *pattern, int cflags);
+ @end example
+ 
+ @subheading Description
+ 
+ This function is part of the implementation of POSIX 1003.2 regular
+ expressions (@dfn{RE}s).
+ 
+ @code{regcomp} compiles the regular expression contained in the
+ @var{pattern} string, subject to the flags in @var{cflags}, and places
+ the results in the @code{regex_t} structure pointed to by @var{preg}.
+ (The regular expression syntax, as defined by POSIX 1003.2, is described
+ below.)
+ 
+ The parameter @var{cflags} is the bitwise OR of zero or more of the
+ following flags:
+ 
+ @table @code{}
+ 
+ @item REG_EXTENDED
+ 
+ Compile modern (@dfn{extended}) REs, rather than the obsolete
+ (@dfn{basic}) REs that are the default.
+ 
+ @item REG_BASIC
+ 
+ This is a synonym for 0, provided as a counterpart to
+ @code{REG_EXTENDED} to improve readability.
+ 
+ @item REG_NOSPEC
+ 
+ Compile with recognition of all special characters turned off.  All
+ characters are thus considered ordinary, so the RE in @var{pattern} is a
+ literal string.  This is an extension, compatible with but not specified
+ by POSIX 1003.2, and should be used with caution in software intended to
+ be portable to other systems.  @code{REG_EXTENDED} and @code{REG_NOSPEC}
+ may not be used in the same call to @code{regcomp}.
+ 
+ @item REG_ICASE
+ 
+ Compile for matching that ignores upper/lower case distinctions.  See
+ the description of regular expressions below for details of
+ case-independent matching.
+ 
+ @item REG_NOSUB
+ 
+ Compile for matching that need only report success or failure, not what
+ was matched.
+ 
+ @item REG_NEWLINE
+ 
+ Compile for newline-sensitive matching.  By default, newline is a
+ completely ordinary character with no special meaning in either REs or
+ strings.  With this flag, @samp{[^} bracket expressions and @samp{.}
+ never match newline, a @samp{^} anchor matches the null string after any
+ newline in the string in addition to its normal function, and the
+ @samp{$} anchor matches the null string before any newline in the string
+ in addition to its normal function.
+ 
+ @item REG_PEND
+ 
+ The regular expression ends, not at the first NUL, but just before the
+ character pointed to by the @code{re_endp} member of the structure
+ pointed to by @var{preg}.  The @code{re_endp} member is of type
+ @samp{const char *}.  This flag permits inclusion of NULs in the RE;
+ they are considered ordinary characters.  This is an extension,
+ compatible with but not specified by POSIX 1003.2, and should be used
+ with caution in software intended to be portable to other systems.
+ 
+ @end table
+ 
+ When successful, @code{regcomp} returns 0 and fills in the structure
+ pointed to by @var{preg}.  One member of that structure (other than
+ @code{re_endp}) is publicized: @code{re_nsub}, of type @code{size_t},
+ contains the number of parenthesized subexpressions within the RE
+ (except that the value of this member is undefined if the
+ @code{REG_NOSUB} flag was used).
+ 
+ Note that the length of the RE does matter; in particular, there is a
+ strong speed bonus for keeping RE length under about 30 characters,
+ with most special characters counting roughly double.
+ 
+ @subheading Return Value
+ 
+ If @code{regcomp} succeeds, it returns zero; if it fails, it returns a
+ non-zero error code, which is one of these:
+ 
+ @table @code{}
+ 
+ @item REG_BADPAT
+ 
+ invalid regular expression
+ 
+ @item REG_ECOLLATE
+ 
+ invalid collating element
+ 
+ @item REG_ECTYPE
+ 
+ invalid character class
+ 
+ @item REG_EESCAPE
+ 
+ @samp{\} applied to unescapable character
+ 
+ @item REG_ESUBREG
+ 
+ invalid backreference number (e.g., larger than the number of
+ parenthesized subexpressions in the RE)
+ 
+ @item REG_EBRACK
+ 
+ brackets [ ] not balanced
+ 
+ @item REG_EPAREN
+ 
+ parentheses ( ) not balanced
+ 
+ @item REG_EBRACE
+ 
+ braces @{ @} not balanced
+ 
+ @item REG_BADBR
+ 
+ invalid repetition count(s) in @{ @}
+ 
+ @item REG_ERANGE
+ 
+ invalid character range in [ ]
+ 
+ @item REG_ESPACE
+ 
+ ran out of memory (an RE like, say,
+ @samp{((((a@{1,100@})@{1,100@})@{1,100@})@{1,100@})@{1,100@}} will
+ eventually run almost any existing machine out of swap space)
+ 
+ @item REG_BADRPT
+ 
+ ?, *, or + operand invalid
+ 
+ @item REG_EMPTY
+ 
+ empty (sub)expression
+ 
+ @item REG_ASSERT
+ 
+ ``can't happen'' (you found a bug in @code{regcomp})
+ 
+ @item REG_INVARG
+ 
+ invalid argument (e.g. a negative-length string)
+ 
+ @end table
+ 
+ @subheading Regular Expressions' Syntax
+ 
+ Regular expressions (@dfn{RE}s), as defined in POSIX 1003.2, come in two
+ forms: modern REs (roughly those of @code{egrep}; 1003.2 calls these
+ @emph{extended} REs) and obsolete REs (roughly those of @code{ed};
+ 1003.2 @emph{basic} REs).  Obsolete REs mostly exist for backward
+ compatibility in some old programs; they will be discussed at the end.
+ 1003.2 leaves some aspects of RE syntax and semantics open; `(*)' marks
+ decisions on these aspects that may not be fully portable to other
+ 1003.2 implementations.
+ 
+ A (modern) RE is one(*) or more non-empty(*) @emph{branches}, separated
+ by @samp{|}.  It matches anything that matches one of the branches.
+ 
+ A branch is one(*) or more @emph{pieces}, concatenated.  It matches a
+ match for the first, followed by a match for the second, etc.
+ 
+ A piece is an @emph{atom} possibly followed by a single(*) `*', `+',
+ `?', or @emph{bound}.
+ An atom followed by `*' matches a sequence of 0 or more matches of the atom.
+ An atom followed by `+' matches a sequence of 1 or more matches of the atom.
+ An atom followed by `?' matches a sequence of 0 or 1 matches of the atom.
+ 
+ A @emph{bound} is `@{' followed by an unsigned decimal integer, possibly
+ followed by `,' possibly followed by another unsigned decimal integer,
+ always followed by `@}'.  The integers must lie between 0 and
+ @code{RE_DUP_MAX} (255(*)) inclusive, and if there are two of them, the
+ first may not exceed the second.  An atom followed by a bound containing
+ one integer @samp{i} and no comma matches a sequence of exactly @samp{i}
+ matches of the atom.  An atom followed by a bound containing one integer
+ @samp{i} and a comma matches a sequence of @samp{i} or more matches of
+ the atom.  An atom followed by a bound containing two integers @samp{i}
+ and @samp{j} matches a sequence of @samp{i} through @samp{j} (inclusive)
+ matches of the atom.
+ 
+ An atom is a regular expression enclosed in `()' (matching a match for the
+ regular expression), an empty set of `()' (matching the null string(*)),
+ a @emph{bracket expression} (see below), `.' (matching any single
+ character), `^' (matching the null string at the beginning of a line),
+ `$' (matching the null string at the end of a line), a `\' followed by
+ one of the characters `^.[$()|*+?@{\' (matching that character taken as
+ an ordinary character), a `\' followed by any other character(*)
+ (matching that character taken as an ordinary character, as if the `\'
+ had not been present(*)), or a single character with no other
+ significance (matching that character).  A `@{' followed by a character
+ other than a digit is an ordinary character, not the beginning of a
+ bound(*).  It is illegal to end an RE with `\'.
+ 
+ A @emph{bracket expression} is a list of characters enclosed in `[]'.
+ It normally matches any single character from the list (but see below).
+ If the list begins with `^', it matches any single character (but see
+ below) @emph{not} from the rest of the list.  If two characters in the
+ list are separated by `-', this is shorthand for the full @emph{range}
+ of characters between those two (inclusive) in the collating sequence,
+ e.g. `[0-9]' in ASCII matches any decimal digit.  It is illegal(*) for
+ two ranges to share an endpoint, e.g. `a-c-e'.  Ranges are very
+ collating-sequence-dependent, and portable programs should avoid relying
+ on them.
+ 
+ To include a literal `]' in the list, make it the first character
+ (following a possible `^').  To include a literal `-', make it the
+ first or last character, or the second endpoint of a range.  To use a
+ literal `-' as the first endpoint of a range, enclose it in `[.' and
+ `.]' to make it a collating element (see below).  With the exception of
+ these and some combinations using `[' (see next paragraphs), all other
+ special characters, including `\', lose their special significance
+ within a bracket expression.
+ 
+ Within a bracket expression, a collating element (a character, a
+ multi-character sequence that collates as if it were a single character,
+ or a collating-sequence name for either) enclosed in `[.' and `.]'
+ stands for the sequence of characters of that collating element.  The
+ sequence is a single element of the bracket expression's list.  A
+ bracket expression containing a multi-character collating element can
+ thus match more than one character, e.g. if the collating sequence
+ includes a `ch' collating element, then the RE @samp{[[.ch.]]*c} matches
+ the first five characters of ``chchcc''.
+ 
+ Within a bracket expression, a collating element enclosed in `[=' and
+ `=]' is an equivalence class, standing for the sequences of characters
+ of all collating elements equivalent to that one, including itself.
+ (If there are no other equivalent collating elements, the treatment is
+ as if the enclosing delimiters were `[.' and `.]'.)  For example, if o
+ and @^o are the members of an equivalence class, then `[[=o=]]',
+ `[[=@^o=]]', and `[o@^o]' are all synonymous.  An equivalence
+ class may not(*) be an endpoint of a range.
+ 
+ Within a bracket expression, the name of a @emph{character class}
+ enclosed in `[:' and `:]' stands for the list of all characters
+ belonging to that class.
+ Standard character class names are:
+ 
+ @example
+ alnum	digit	punct
+ alpha	graph	space
+ blank	lower	upper
+ cntrl	print	xdigit
+ @end example
+ 
+ These stand for the character classes defined by @code{isalnum}
+ (@pxref{isalnum}), @code{isdigit} (@pxref{isdigit}), @code{ispunct}
+ (@pxref{ispunct}), @code{isalpha} (@pxref{isalpha}), @code{isgraph}
+ (@pxref{isgraph}), @code{isspace} (@pxref{isspace}) (@code{blank} is the
+ same as @code{space}), @code{islower} (@pxref{islower}), @code{isupper}
+ (@pxref{isupper}), @code{iscntrl} (@pxref{iscntrl}), @code{isprint}
+ (@pxref{isprint}), and @code{isxdigit} (@pxref{isxdigit}),
+ respectively.  A locale may provide others.  A character class may not
+ be used as an endpoint of a range.
+ 
+ There are two special cases(*) of bracket expressions: the bracket
+ expressions `[[:<:]]' and `[[:>:]]' match the null string at the
+ beginning and end of a word respectively.  A word is defined as a
+ sequence of word characters which is neither preceded nor followed by
+ word characters.  A word character is an @code{alnum} character (as
+ defined by @code{isalnum} library function) or an underscore.  This is
+ an extension, compatible with but not specified by POSIX 1003.2, and
+ should be used with caution in software intended to be portable to other
+ systems.
+ 
+ In the event that an RE could match more than one substring of a given
+ string, the RE matches the one starting earliest in the string.  If the
+ RE could match more than one substring starting at that point, it
+ matches the longest.  Subexpressions also match the longest possible
+ substrings, subject to the constraint that the whole match be as long as
+ possible, with subexpressions starting earlier in the RE taking priority
+ over ones starting later.  Note that higher-level subexpressions thus
+ take priority over their lower-level component subexpressions.
+ 
+ Match lengths are measured in characters, not collating elements.  A
+ null string is considered longer than no match at all.  For example,
+ @samp{bb*} matches the three middle characters of @samp{abbbc}, 
+ @samp{(wee|week)(knights|nights)} matches all ten characters of
+ @samp{weeknights}, when @samp{(.*).*} is matched against @samp{abc} the
+ parenthesized subexpression matches all three characters, and when
+ @samp{(a*)*} is matched against `bc' both the whole RE and the
+ parenthesized subexpression match the null string.
+ 
+ If case-independent matching is specified, the effect is much as if all
+ case distinctions had vanished from the alphabet.  When an alphabetic
+ that exists in multiple cases appears as an ordinary character outside a
+ bracket expression, it is effectively transformed into a bracket
+ expression containing both cases, e.g. `x' becomes `[xX]'.  When it
+ appears inside a bracket expression, all case counterparts of it are
+ added to the bracket expression, so that (e.g.) `[x]' becomes `[xX]' and
+ `[^x]' becomes `[^xX]'.
+ 
+ No particular limit is imposed on the length of REs(*).  Programs
+ intended to be portable should not employ REs longer than 256 bytes,
+ as an implementation can refuse to accept such REs and remain
+ POSIX-compliant.
+ 
+ Obsolete (@emph{basic}) regular expressions differ in several respects.
+ `|', `+', and `?' are ordinary characters and there is no equivalent
+ for their functionality.  The delimiters for bounds are `\@{' and
+ `\@}', with `@{' and `@}' by themselves ordinary characters.  The
+ parentheses for nested subexpressions are `\(' and `\)', with `(' and
+ `)' by themselves ordinary characters.  `^' is an ordinary character
+ except at the beginning of the RE or(*) the beginning of a parenthesized
+ subexpression, `$' is an ordinary character except at the end of the RE
+ or(*) the end of a parenthesized subexpression, and `*' is an ordinary
+ character if it appears at the beginning of the RE or the beginning of a
+ parenthesized subexpression (after a possible leading `^').
+ Finally, there is one new type of atom, a @emph{back reference}:
+ `\' followed by a non-zero decimal digit @emph{d} matches the same
+ sequence of characters matched by the @emph{d}th parenthesized
+ subexpression (numbering subexpressions by the positions of their
+ opening parentheses, left to right), so that (e.g.) @samp{\([bc]\)\1}
+ matches `bb' or `cc' but not `bc'.
+ 
+ @c-------------------------------------------------------------------------
+ 
+ @node regexec, string
+ @subheading Syntax
+ 
+ @example
+ #include <sys/types.h>
+ #include <regex.h>
+ 
+ int regexec(const regex_t *preg, const char *string,
+             size_t nmatch, regmatch_t pmatch[], int eflags);
+ @end example
+ 
+ 
+ @subheading Description
+ 
+ @code{regexec} matches the compiled RE pointed to by @var{preg} against
+ the @var{string}, subject to the flags in @var{eflags}, and reports
+ results using @var{nmatch}, @var{pmatch}, and the returned value.  The
+ RE must have been compiled by a previous invocation of @code{regcomp}
+ (@pxref{regcomp}).  The compiled form is not altered during execution of
+ @code{regexec}, so a single compiled RE can be used simultaneously by
+ multiple threads.
+ 
+ By default, the NUL-terminated string pointed to by @var{string} is
+ considered to be the text of an entire line, minus any terminating
+ newline.
+ 
+ The @var{eflags} argument is the bitwise OR of zero or more of the
+ following flags:
+ 
+ @table @code
+ 
+ @item REG_NOTBOL
+ 
+ The first character of the string is not the beginning of a line, so the
+ `^' anchor should not match before it.  This does not affect the
+ behavior of newlines under @code{REG_NEWLINE} (@pxref{REG_NEWLINE,
+ regcomp}).
+ 
+ @item REG_NOTEOL
+ 
+ The NUL terminating the string does not end a line, so the `$' anchor
+ should not match before it.  This does not affect the behavior of
+ newlines under @code{REG_NEWLINE} (@pxref{REG_NEWLINE, regcomp}).
+ 
+ @item REG_STARTEND
+ 
+ The string is considered to start at @var{@w{string + pmatch[0].rm_so}}
+ and to have a terminating @code{NUL} located at
+ @var{@w{string + pmatch[0].rm_eo}} (there need not actually be a
+ @code{NUL} at that location), regardless of the value of @var{nmatch}.
+ See below for the definition of @var{pmatch} and @var{nmatch}.  This is
+ an extension, compatible with but not specified by POSIX 1003.2, and
+ should be used with caution in software intended to be portable to other
+ systems.  Note that a non-zero @code{rm_so} does not imply
+ @code{REG_NOTBOL}; @code{REG_STARTEND} affects only the location of the
+ string, not how it is matched.
+ 
+ @item REG_TRACE
+ 
+ Trace execution (the trace is printed to @code{stdout}).
+ 
+ @item REG_LARGE
+ 
+ Force large representation.
+ 
+ @item REG_BACKR
+ 
+ Force use of the back-reference code.
+ 
+ @end table
+ 
+ @xref{Regular Expressions' Syntax, regcomp}, for a discussion of what is
+ matched in situations where an RE or a portion thereof could match any
+ of several substrings of @var{string}.
+ 
+ If @code{REG_NOSUB} was specified in the compilation of the RE
+ (@pxref{REG_NOSUB, regcomp}), or if @var{nmatch} is 0, @code{regexec}
+ ignores the @var{pmatch} argument (but see below for the case where
+ @code{REG_STARTEND} is specified).  Otherwise, @var{pmatch} should point
+ to an array of @var{nmatch} structures of type @code{regmatch_t}.  Such
+ a structure has at least the members @code{rm_so} and @code{rm_eo}, both
+ of type @code{regoff_t} (a signed arithmetic type at least as large as
+ an @code{off_t} and a @code{ssize_t}), containing respectively the offset
+ of the first character of a substring and the offset of the first
+ character after the end of the substring.  Offsets are measured from the
+ beginning of the @var{string} argument given to @code{regexec}.  An
+ empty substring is denoted by equal offsets, both indicating the
+ character following the empty substring.
+ 
+ When @code{regexec} returns, the 0th member of the @var{pmatch} array is
+ filled in to indicate what substring of @var{string} was matched by the
+ entire RE.  Remaining members report what substring was matched by
+ parenthesized subexpressions within the RE; member @code{i} reports
+ subexpression @code{i}, with subexpressions counted (starting at 1) by
+ the order of their opening parentheses in the RE, left to right.  Unused
+ entries in the array---corresponding either to subexpressions that did
+ not participate in the match at all, or to subexpressions that do not
+ exist in the RE (that is, @code{@w{i > preg->re_nsub}})---have both
+ @code{rm_so} and @code{rm_eo} set to @code{-1}.  If a subexpression
+ participated in the match several times, the reported substring is the
+ last one it matched.  (Note, as an example in particular, that when the
+ RE @samp{(b*)+} matches @samp{bbb}, the parenthesized subexpression
+ matches each of the three `b's and then an infinite number of empty
+ strings following the last `b', so the reported substring is one of the
+ empties.)
+ 
+ If @code{REG_STARTEND} is specified in @var{eflags}, @var{pmatch} must
+ point to at least one @code{regmatch_t} variable (even if @var{nmatch}
+ is 0 or @code{REG_NOSUB} was specified in the compilation of the RE,
+ @pxref{REG_NOSUB, regcomp}), to hold the input offsets for
+ @code{REG_STARTEND}.  Use for output is still entirely controlled by
+ @var{nmatch}; if @var{nmatch} is 0 or @code{REG_NOSUB} was specified,
+ the value of @code{pmatch[0]} will not be changed by a successful
+ @code{regexec}.
+ 
+ @var{nmatch} exceeding 0 is expensive; @var{nmatch} exceeding 1 is
+ worse.  Back references are massively expensive.
+ 
+ @subheading Return Value
+ 
+ Normally, @code{regexec} returns 0 for success and the non-zero code
+ @code{REG_NOMATCH} for failure.  Other non-zero error codes may be
+ returned in exceptional situations.  The list of possible error return
+ values is below:
+ 
+ @table @code
+ 
+ @item REG_ESPACE
+ 
+ Ran out of memory.
+ 
+ @item REG_BADPAT
+ 
+ The passed argument @var{preg} doesn't point to an RE compiled by
+ @code{regcomp}.
+ 
+ @item REG_INVARG
+ 
+ Invalid argument(s) (e.g., @var{@w{string + pmatch[0].rm_eo}} is less
+ than @var{@w{string + pmatch[0].rm_so}}).
+ 
+ @end table
+ 
+ @c----------------------------------------------------------------------
+ 
+ @node regerror, string
+ @subheading Syntax
+ 
+ @example
+ 
+ #include <sys/types.h>
+ #include <regex.h>
+ 
+ size_t regerror(int errcode, const regex_t *preg,
+                 char *errbuf, size_t errbuf_size);
+ 
+ @end example
+ 
+ @subheading Description
+ 
+ @code{regerror} maps a non-zero value of @var{errcode} from either
+ @code{regcomp} (@pxref{Return Value, regcomp}) or @code{regexec}
+ (@pxref{Return Value, regexec}) to a human-readable, printable message.
+ 
+ If @var{preg} is non-@code{NULL}, the error code should have arisen from
+ use of the variable of the type @code{regex_t} pointed to by @var{preg},
+ and if the error code came from @code{regcomp}, it should have been the
+ result from the most recent @code{regcomp} using that @code{regex_t}
+ variable.  (@code{regerror} may be able to supply a more detailed
+ message using information from the @code{regex_t} than from
+ @var{errcode} alone.)  @code{regerror} places the @code{NUL}-terminated
+ message into the buffer pointed to by @var{errbuf}, limiting the length
+ (including the @code{NUL}) to at most @var{errbuf_size} bytes.  If the
+ whole message won't fit, as much of it as will fit before the
+ terminating @code{NUL} is supplied.  In any case, the returned value is
+ the size of buffer needed to hold the whole message (including
+ terminating @code{NUL}).  If @var{errbuf_size} is 0, @var{errbuf} is
+ ignored but the return value is still correct.
+ 
+ If the @var{errcode} given to @code{regerror} is first ORed with
+ @code{REG_ITOA}, the ``message'' that results is the printable name of
+ the error code, e.g. ``REG_NOMATCH'', rather than an explanation
+ thereof.  If @var{errcode} is @code{REG_ATOI}, then @var{preg} must be
+ non-@code{NULL} and the @code{re_endp} member of the structure it points to
+ must point to the printable name of an error code
+ (e.g. ``REG_ECOLLATE''); in this case, the result in @var{errbuf} is the
+ decimal representation of the numeric value of the error code (0 if the
+ name is not recognized).  @code{REG_ITOA} and @code{REG_ATOI} are
+ intended primarily as debugging facilities; they are extensions,
+ compatible with but not specified by POSIX 1003.2, and should be used
+ with caution in software intended to be portable to other systems.  Be
+ warned also that they are considered experimental and changes are
+ possible.
+ 
+ @subheading Return Value
+ 
+ The size of buffer needed to hold the message (including terminating
+ @code{NUL}) is always returned, even if @var{errbuf_size} is zero.
+ 
+ @c--------------------------------------------------------------------------
+ 
+ @node regfree, string
+ @subheading Syntax
+ 
+ @example
+ 
+ #include <sys/types.h>
+ #include <regex.h>
+ 
+ void regfree(regex_t *preg);
+ 
+ @end example
+ 
+ @subheading Description
+ 
+ @code{regfree} frees any dynamically-allocated storage associated with
+ the compiled RE pointed to by @var{preg}.  The remaining @code{regex_t}
+ is no longer a valid compiled RE and the effect of supplying it to
+ @code{regexec} or @code{regerror} is undefined.
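
For anyone who wants to try the regexec/regerror/regfree descriptions
above, here is a minimal sketch in C.  It is not part of the patch.  It
assumes a POSIX <regex.h>; the helper name first_group and its buffer
arguments are made up for illustration.  It compiles an extended RE,
asks regexec for two regmatch_t entries (the whole match and
subexpression 1), copies the text between rm_so and rm_eo, reports
unexpected errors through regerror, and releases the compiled RE with
regfree.

```c
#include <sys/types.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not a library function: copy the text matched by
   the first parenthesized subexpression of `pattern' in `text' into
   `out' (truncated to fit `out_size' bytes, always NUL-terminated when
   out_size > 0).  Returns 0 on a match, REG_NOMATCH when nothing
   matched, or another non-zero regcomp/regexec error code.  */
int first_group(const char *pattern, const char *text,
                char *out, size_t out_size)
{
  regex_t re;
  regmatch_t pm[2];   /* pm[0]: whole match, pm[1]: subexpression 1 */
  char msg[128];
  int rc;

  if (out_size > 0)
    out[0] = '\0';

  rc = regcomp(&re, pattern, REG_EXTENDED);
  if (rc != 0)
    {
      /* regerror may use `re' to produce a more detailed message.  */
      regerror(rc, &re, msg, sizeof msg);
      fprintf(stderr, "regcomp: %s\n", msg);
      return rc;      /* nothing to free after a failed regcomp */
    }

  rc = regexec(&re, text, 2, pm, 0);
  if (rc == 0 && pm[1].rm_so != -1 && out_size > 0)
    {
      /* Offsets are measured from the start of `text'.  */
      size_t len = (size_t) (pm[1].rm_eo - pm[1].rm_so);

      if (len >= out_size)
        len = out_size - 1;
      memcpy(out, text + pm[1].rm_so, len);
      out[len] = '\0';
    }
  else if (rc != 0 && rc != REG_NOMATCH)
    {
      regerror(rc, &re, msg, sizeof msg);
      fprintf(stderr, "regexec: %s\n", msg);
    }

  regfree(&re);       /* `re' must not be passed to regexec after this */
  return rc;
}
```

Called as first_group("([0-9]+)-([0-9]+)", "tel 555-1234", buf, sizeof buf),
this should return 0 and leave "555" in buf.  With a pattern that does
not match, it returns REG_NOMATCH and leaves buf empty.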
