7.1.8 Using Registers

A group in a regular expression can match a (possibly empty) substring of the string that the regular expression as a whole matched. The matcher remembers the beginning and end of the substring matched by each group.

To find out what they matched, pass a nonzero regs argument to a GNU matching or searching function (see section 7.1.3 GNU Matching and 7.1.4 GNU Searching), i.e., the address of a structure of this type, as defined in `regex.h':

struct re_registers
{
  unsigned num_regs;
  regoff_t *start;
  regoff_t *end;
};

Except for (possibly) the num_regs'th element (see below), the ith element of the start and end arrays records information about the ith group in the pattern. (They're declared as C pointers, but this is only because not all C compilers accept zero-length arrays; conceptually, it is simplest to think of them as arrays.)
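As an illustration of reading the start and end arrays, here is a minimal sketch, assuming glibc's `regex.h' with _GNU_SOURCE defined. The helper name group_span is hypothetical, and RE_SYNTAX_POSIX_EXTENDED is chosen only so that `(' and `)' act as the group operators. The sketch also assumes that element 0 describes the match of the whole pattern and that elements 1 through re_nsub describe the groups.

```c
#define _GNU_SOURCE           /* expose the GNU interface in glibc's <regex.h> */
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: search TEXT for PATTERN and report where group
   GROUP matched (group 0 is taken to be the whole match).  Returns 0 on
   success, -1 if compilation fails, nothing matches, or GROUP is out
   of range.  */
int
group_span (const char *pattern, const char *text, unsigned group,
            regoff_t *start, regoff_t *end)
{
  struct re_pattern_buffer buf;
  struct re_registers regs;
  regoff_t len;
  int status = -1;

  assert (pattern != NULL && text != NULL);

  memset (&buf, 0, sizeof buf);    /* no buffer, fastmap or translate table */
  memset (&regs, 0, sizeof regs);  /* keeps the free() calls below safe */

  /* Note: re_set_syntax changes a global setting.  */
  re_set_syntax (RE_SYNTAX_POSIX_EXTENDED);
  if (re_compile_pattern (pattern, strlen (pattern), &buf) != NULL)
    return -1;

  len = (regoff_t) strlen (text);
  if (re_search (&buf, text, len, 0, len, &regs) >= 0
      && group <= buf.re_nsub)
    {
      *start = regs.start[group];  /* index of the first matched character */
      *end = regs.end[group];      /* index just past the last one */
      status = 0;
    }

  /* The matcher allocated the arrays on our behalf; release them.  */
  free (regs.start);
  free (regs.end);
  regfree (&buf);
  return status;
}
```

For example, calling group_span with the pattern `a(b+)c', the string `xxabbbc' and group 1 should store 3 in *start and 6 in *end, since the group matched `bbb'.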

The start and end arrays are allocated in various ways, depending on the value of the regs_allocated field in the pattern buffer passed to the matcher.

The simplest and perhaps most useful way is to let the matcher (re)allocate enough space to record information for all the groups in the regular expression. If regs_allocated is REGS_UNALLOCATED, the matcher allocates 1 + re_nsub elements (re_nsub is another field in the pattern buffer; see section 7.1.1 GNU Pattern Buffers). It sets the extra element to -1 and sets regs_allocated to REGS_REALLOCATE. Then on subsequent calls with the same pattern buffer and regs arguments, the matcher reallocates more space if necessary.

It would perhaps be more logical to make the regs_allocated field part of the re_registers structure, instead of part of the pattern buffer. But in that case the caller would be forced to initialize the structure before passing it. Much existing code doesn't do this initialization, and it's arguably better to avoid it anyway.

re_compile_pattern sets regs_allocated to REGS_UNALLOCATED, so if you use the GNU regular expression functions, you get this behavior by default.
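The default behavior can be observed with a short sketch, again assuming glibc's `regex.h' with _GNU_SOURCE defined. The names alloc_trace and trace_allocation are hypothetical. The sketch records regs_allocated before and after the first search and then reuses the same pattern buffer and registers for a second search. The exact number of elements allocated may differ between implementations, so treat the recorded num_regs as a value that is at least 1 + re_nsub rather than an exact figure.

```c
#define _GNU_SOURCE           /* expose the GNU interface in glibc's <regex.h> */
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record of what the matcher did with the register arrays.  */
struct alloc_trace
{
  unsigned before;    /* regs_allocated right after compilation */
  unsigned after;     /* regs_allocated after the first search */
  unsigned num_regs;  /* number of elements after the first search */
  size_t nsub;        /* number of groups in the pattern (re_nsub) */
};

/* Returns 0 if PATTERN matched somewhere in TEXT, -1 otherwise.  */
int
trace_allocation (const char *pattern, const char *text,
                  struct alloc_trace *trace)
{
  struct re_pattern_buffer buf;
  struct re_registers regs;
  regoff_t len, pos;

  assert (pattern != NULL && text != NULL && trace != NULL);

  memset (&buf, 0, sizeof buf);
  /* The matcher does not need this initialization; it only keeps the
     free() calls below safe when nothing matches.  */
  memset (&regs, 0, sizeof regs);

  re_set_syntax (RE_SYNTAX_POSIX_EXTENDED);   /* global setting */
  if (re_compile_pattern (pattern, strlen (pattern), &buf) != NULL)
    return -1;

  trace->before = buf.regs_allocated;
  trace->nsub = buf.re_nsub;

  len = (regoff_t) strlen (text);
  pos = re_search (&buf, text, len, 0, len, &regs);
  trace->after = buf.regs_allocated;
  trace->num_regs = regs.num_regs;

  /* A later call with the same buffer and registers reuses the arrays,
     growing them only if necessary.  */
  if (pos >= 0)
    re_search (&buf, text, len, pos + 1, len - (pos + 1), &regs);

  free (regs.start);          /* the caller owns the arrays in the end */
  free (regs.end);
  regfree (&buf);
  return pos >= 0 ? 0 : -1;
}
```

The arrays are freed by the caller here because the matcher allocated them with malloc and does not release them itself; regfree only releases the compiled pattern.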

xx document re_set_registers

POSIX, on the other hand, requires a different interface: the caller is supposed to pass in a fixed-length array which the matcher fills. Therefore, if regs_allocated is REGS_FIXED, the matcher simply fills that array.
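A minimal sketch of the fixed-length case follows, assuming glibc's `regex.h' with _GNU_SOURCE defined. The names MAX_REGS and search_fixed are hypothetical. The sketch sets regs_allocated by hand after compiling, because re_compile_pattern resets that field to REGS_UNALLOCATED.

```c
#define _GNU_SOURCE           /* expose the GNU interface in glibc's <regex.h> */
#include <assert.h>
#include <regex.h>
#include <string.h>

#define MAX_REGS 4            /* hypothetical capacity chosen by the caller */

/* Hypothetical helper: search TEXT for PATTERN, recording the match in
   the caller-owned arrays START and END.  Returns 0 on a match, -1 on
   a compilation failure or when nothing matches.  */
int
search_fixed (const char *pattern, const char *text,
              regoff_t start[MAX_REGS], regoff_t end[MAX_REGS])
{
  struct re_pattern_buffer buf;
  struct re_registers regs;
  regoff_t len, pos;

  assert (pattern != NULL && text != NULL);

  memset (&buf, 0, sizeof buf);
  re_set_syntax (RE_SYNTAX_POSIX_EXTENDED);   /* global setting */
  if (re_compile_pattern (pattern, strlen (pattern), &buf) != NULL)
    return -1;

  /* Point the registers at caller-owned storage and tell the matcher
     to fill it without (re)allocating anything.  */
  regs.num_regs = MAX_REGS;
  regs.start = start;
  regs.end = end;
  buf.regs_allocated = REGS_FIXED;

  len = (regoff_t) strlen (text);
  pos = re_search (&buf, text, len, 0, len, &regs);

  regfree (&buf);             /* frees the compiled pattern, not the arrays */
  return pos >= 0 ? 0 : -1;
}
```

With the pattern `([0-9]+)-([0-9]+)' and the string `tel 555-0199', the whole match should be recorded as 4 to 12, the first group as 4 to 7, and the second group as 8 to 12. Because the storage belongs to the caller, nothing needs to be freed except the pattern buffer.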

The following examples illustrate the information recorded in the re_registers structure. (In all of them, `(' represents the open-group and `)' the close-group operator. The first character in the string is at index 0.)

  Copyright 2003   by The Free Software Foundation     Updated Jun 2003