libc.a reference


regexec

Syntax

 
#include <sys/types.h>
#include <regex.h>

int regexec(const regex_t *preg, const char *string,
            size_t nmatch, regmatch_t pmatch[], int eflags);

Description

regexec matches the compiled RE pointed to by preg against the string, subject to the flags in eflags, and reports results using nmatch, pmatch, and the returned value. The RE must have been compiled by a previous invocation of regcomp (see section regcomp). The compiled form is not altered during execution of regexec, so a single compiled RE can be used simultaneously by multiple threads.

By default, the NUL-terminated string pointed to by string is considered to be the text of an entire line, with the NUL indicating the end of the line. (That is, any other end-of-line marker is considered to have been removed and replaced by the NUL.)
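As a minimal sketch of the call sequence described above, the following compiles an RE, runs it against one NUL-terminated line, and frees it. The helper name `line_matches` and its 1/0/-1 return convention are illustrative assumptions, not part of the library.

```c
#include <sys/types.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `text` matches the extended RE
   `pattern`, 0 if it does not, and -1 if the pattern fails to compile
   or regexec reports an error other than REG_NOMATCH. */
int line_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* nmatch is 0, so pmatch is ignored and may be NULL;
       only the return value is of interest here. */
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);

    if (rc == 0)
        return 1;
    return (rc == REG_NOMATCH) ? 0 : -1;
}
```

Because the NUL is treated as the end of the line, a call such as `line_matches("^hello$", "hello")` matches, while the same pattern against `"hello world"` does not.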

The eflags argument is the bitwise OR of zero or more of the following flags:

REG_NOTBOL

The first character of the string is not the beginning of a line, so the `^' anchor should not match before it. This does not affect the behavior of newlines under REG_NEWLINE (see section regcomp).

REG_NOTEOL

The NUL terminating the string does not end a line, so the `$' anchor should not match before it. This does not affect the behavior of newlines under REG_NEWLINE (see section regcomp).

REG_STARTEND

The string is considered to start at string + pmatch[0].rm_so and to have a terminating NUL located at string + pmatch[0].rm_eo (there need not actually be a NUL at that location), regardless of the value of nmatch. See below for the definition of pmatch and nmatch. This is an extension, compatible with but not specified by POSIX 1003.2, and should be used with caution in software intended to be portable to other systems. Note that a non-zero rm_so does not imply REG_NOTBOL; REG_STARTEND affects only the location of the string, not how it is matched.

REG_TRACE

trace execution (printed to stdout)

REG_LARGE

force large representation

REG_BACKR

force use of backref code
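To illustrate REG_NOTBOL and REG_NOTEOL from the list above, here is a small sketch for the case where the string handed to regexec is only a fragment of a longer line. The helper `match_fragment` is a hypothetical name introduced for this example.

```c
#include <sys/types.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compiles `pattern` as an extended RE and matches
   it against `fragment` with the given eflags.
   Returns 1 on a match, 0 on REG_NOMATCH, -1 on any other error. */
int match_fragment(const char *pattern, const char *fragment, int eflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, fragment, 0, NULL, eflags);
    regfree(&re);

    if (rc == 0)
        return 1;
    return (rc == REG_NOMATCH) ? 0 : -1;
}
```

With eflags of 0, `^foo` matches `"foobar"`; with REG_NOTBOL it does not, because the start of the fragment is no longer treated as the start of a line. Likewise, `bar$` stops matching once REG_NOTEOL is set. An unanchored pattern such as `foo` is unaffected by either flag.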

See section regcomp (Regular Expressions' Syntax) for a discussion of what is matched in situations where an RE or a portion thereof could match any of several substrings of string.

If REG_NOSUB was specified in the compilation of the RE (see section regcomp), or if nmatch is 0, regexec ignores the pmatch argument (but see below for the case where REG_STARTEND is specified). Otherwise, pmatch should point to an array of nmatch structures of type regmatch_t. Such a structure has at least the members rm_so and rm_eo, both of type regoff_t (a signed arithmetic type at least as large as an off_t and a ssize_t), containing respectively the offset of the first character of a substring and the offset of the first character after the end of the substring. Offsets are measured from the beginning of the string argument given to regexec. An empty substring is denoted by equal offsets, both indicating the character following the empty substring.

When regexec returns, the 0th member of the pmatch array is filled in to indicate what substring of string was matched by the entire RE. Remaining members report what substring was matched by parenthesized subexpressions within the RE; member i reports subexpression i, with subexpressions counted (starting at 1) by the order of their opening parentheses in the RE, left to right. Unused entries in the array, corresponding either to subexpressions that did not participate in the match at all or to subexpressions that do not exist in the RE (that is, i > preg->re_nsub), have both rm_so and rm_eo set to -1. If a subexpression participated in the match several times, the reported substring is the last one it matched. (Note in particular that when the RE `(b*)+' matches "bbb", the parenthesized subexpression matches the three `b's and then an infinite number of empty strings following the last `b', so the reported substring is one of the empties.)
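The pmatch layout described above can be sketched as follows. The names `MAX_GROUPS`, `match_groups`, and `print_groups` are assumptions made for this example; the array size of 4 is arbitrary.

```c
#include <sys/types.h>
#include <regex.h>
#include <stdio.h>

#define MAX_GROUPS 4  /* illustrative: pmatch[0] plus up to three subexpressions */

/* Hypothetical helper: matches `text` against the extended RE `pattern`
   and fills pm[0..MAX_GROUPS-1]. Returns 0 on a match, non-zero if the
   pattern does not compile or the match fails. */
int match_groups(const char *pattern, const char *text,
                 regmatch_t pm[MAX_GROUPS])
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED);

    if (rc != 0)
        return rc;
    rc = regexec(&re, text, MAX_GROUPS, pm, 0);
    regfree(&re);
    return rc;
}

/* Prints each entry; offsets are measured from the start of `text`. */
void print_groups(const char *text, const regmatch_t pm[], size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (pm[i].rm_so == -1) {
            /* Did not participate, or no such subexpression in the RE. */
            printf("pmatch[%zu]: unused\n", i);
        } else {
            int so = (int)pm[i].rm_so;
            int eo = (int)pm[i].rm_eo;
            printf("pmatch[%zu]: [%d,%d) \"%.*s\"\n",
                   i, so, eo, eo - so, text + so);
        }
    }
}
```

For example, matching `(a)|(b)` against `"xb"` leaves pmatch[1] at -1 because the first alternative did not participate, and pmatch[3] at -1 because the RE has only two subexpressions.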

If REG_STARTEND is specified in eflags, pmatch must point to at least one regmatch_t variable (even if nmatch is 0 or REG_NOSUB was specified in the compilation of the RE; see section regcomp), to hold the input offsets for REG_STARTEND. Use for output is still entirely controlled by nmatch; if nmatch is 0 or REG_NOSUB was specified, the value of pmatch[0] will not be changed by a successful regexec.
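A sketch of how REG_STARTEND might be used to search a sub-range of a buffer without copying it. The helper `search_range` is hypothetical. Since REG_STARTEND is an extension, the example falls back to copying the range when the macro is not defined; that fallback is an assumption of this sketch, not something the library provides. The caller is assumed to pass start <= end <= strlen(buf) and an RE compiled without REG_NOSUB.

```c
#include <sys/types.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: searches buf[start..end) for the compiled RE.
   On a match, *out receives offsets measured from buf.
   Returns the regexec result (0 on a match). */
int search_range(const regex_t *re, const char *buf,
                 size_t start, size_t end, regmatch_t *out)
{
    regmatch_t pm[1];
    int rc;

#ifdef REG_STARTEND
    /* pmatch[0] carries the input range and, with nmatch >= 1,
       is overwritten with the match offsets on success. */
    pm[0].rm_so = (regoff_t)start;
    pm[0].rm_eo = (regoff_t)end;
    rc = regexec(re, buf, 1, pm, REG_STARTEND);
#else
    /* Fallback for systems without the extension: copy the range into
       a NUL-terminated buffer, then shift the offsets back. */
    char *tmp = malloc(end - start + 1);

    if (tmp == NULL)
        return REG_ESPACE;
    memcpy(tmp, buf + start, end - start);
    tmp[end - start] = '\0';
    rc = regexec(re, tmp, 1, pm, 0);
    free(tmp);
    if (rc == 0) {
        pm[0].rm_so += (regoff_t)start;
        pm[0].rm_eo += (regoff_t)start;
    }
#endif

    if (rc == 0)
        *out = pm[0];
    return rc;
}
```

Note that the two branches are not equivalent where anchors are involved, so portable code should avoid relying on `^' or `$' behavior at the range boundaries.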

Return Value

Normally, regexec returns 0 for success and the non-zero code REG_NOMATCH for failure. Other non-zero error codes may be returned in exceptional situations. The possible error return values are:

REG_ESPACE

ran out of memory

REG_BADPAT

the passed argument preg doesn't point to an RE compiled by regcomp

REG_INVARG

invalid argument(s) (e.g., string + pmatch[0].rm_eo is less than string + pmatch[0].rm_so)
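One way to distinguish the three outcomes (match, REG_NOMATCH, and a genuine error) is sketched below. The helper `try_match` is a hypothetical name; regerror is the standard companion function for turning an error code into a message.

```c
#include <sys/types.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 on a match and 0 on REG_NOMATCH.
   For any other code it writes a message into errbuf and returns -1. */
int try_match(const regex_t *re, const char *text,
              char *errbuf, size_t errlen)
{
    int rc = regexec(re, text, 0, NULL, 0);

    if (rc == 0)
        return 1;
    if (rc == REG_NOMATCH)
        return 0;               /* ordinary failure, not an error */

    /* Exceptional situation, e.g. REG_ESPACE. */
    regerror(rc, re, errbuf, errlen);
    return -1;
}
```

Treating REG_NOMATCH separately keeps the common "no match" case out of the error path.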

History

This implementation of the POSIX regexp functionality was written by Henry Spencer.

Bugs

regexec performance is poor. nmatch exceeding 0 is expensive; nmatch exceeding 1 is worse. regexec is largely insensitive to RE complexity except that back references are massively expensive. RE length does matter; in particular, there is a strong speed bonus for keeping RE length under about 30 characters, with most special characters counting roughly double.
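Given the costs noted above, a caller that only needs a yes/no answer can compile once with REG_NOSUB and pass nmatch as 0 for every call. The following is a sketch under that assumption; `count_matching` is a hypothetical helper.

```c
#include <sys/types.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: counts how many of the n strings in `lines`
   match the extended RE `pattern`. Returns (size_t)-1 if the pattern
   does not compile. */
size_t count_matching(const char *pattern,
                      const char *const lines[], size_t n)
{
    regex_t re;
    size_t i, count = 0;

    /* Compile once; REG_NOSUB says no substring offsets are wanted. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return (size_t)-1;

    for (i = 0; i < n; i++) {
        /* nmatch of 0 avoids the cost of reporting offsets. */
        if (regexec(&re, lines[i], 0, NULL, 0) == 0)
            count++;
    }

    regfree(&re);
    return count;
}
```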

The implementation of word-boundary matching is a bit of a kludge, and bugs may lurk in combinations of word-boundary matching and anchoring.

Portability

ANSI/ISO C: No
POSIX: 1003.2-1992; 1003.1-2001



  Copyright 2004   by DJ Delorie     Updated Apr 2004