Guile Reference Manual


21.5.1 Regexp Functions

By default, Guile supports POSIX extended regular expressions. This means that characters such as `(', `)', `+' and `?' are special, and must be escaped with a backslash if you wish to match the literal characters.

This regular expression interface was modeled after that implemented by SCSH, the Scheme Shell. It is intended to be upwardly compatible with SCSH regular expressions.

Scheme Procedure: string-match pattern str [start]
Compile the string pattern into a regular expression and compare it with str. The optional numeric argument start specifies the position in str at which to begin matching.

string-match returns a match structure which describes what, if anything, was matched by the regular expression. See section 21.5.2 Match Structures. If str does not match pattern at all, string-match returns #f.

Two examples follow. In the first, the pattern matches the four digits in the target string. In the second, the pattern matches nothing, so string-match returns #f.

 
(string-match "[0-9][0-9][0-9][0-9]" "blah2002")
=> #("blah2002" (4 . 8))

(string-match "[A-Za-z]" "123456")
=> #f
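
The optional start argument can be illustrated with a short sketch. It assumes a Guile in which string-match and the match accessors are loaded from the (ice-9 regex) module; the string and pattern are made up for illustration.

```scheme
(use-modules (ice-9 regex))

(define str "ab12cd345")

;; No start argument: the search begins at index 0 and finds the
;; leftmost run of digits.
(define m1 (string-match "[0-9]+" str))
(match:substring m1)   ; => "12"
(match:start m1)       ; => 2

;; start = 4: the search begins at the "c", so the first run of
;; digits is skipped. Offsets still refer to the whole string.
(define m2 (string-match "[0-9]+" str 4))
(match:substring m2)   ; => "345"
(match:start m2)       ; => 6
```

Note that the positions reported by the match structure are relative to the beginning of str, not to start.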

Each time string-match is called, it must compile its pattern argument into a regular expression structure. This operation is expensive, which makes string-match inefficient if the same regular expression is used several times (for example, in a loop). For better performance, you can compile a regular expression in advance and then match strings against the compiled regexp.
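
As a rough sketch of that advice, the pattern below is compiled once with make-regexp and then reused with regexp-exec for every element of a list (both procedures are documented later in this section). The helper name all-digit-strings and the sample data are hypothetical.

```scheme
(use-modules (srfi srfi-1))   ; for filter

;; Compile the pattern once, outside the loop.
(define digits-rx (make-regexp "^[0-9]+$"))

;; Hypothetical helper: keep only the strings that consist
;; entirely of digits. The compiled regexp is reused for every
;; element instead of being recompiled on each call.
(define (all-digit-strings strings)
  (filter (lambda (s) (regexp-exec digits-rx s)) strings))

(all-digit-strings '("2002" "blah" "42" "4x2"))
;; => ("2002" "42")
```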

Scheme Procedure: make-regexp pat . flags
C Function: scm_make_regexp (pat, flags)
Compile the regular expression described by pat, and return the compiled regexp structure. If pat does not describe a valid regular expression, make-regexp throws a regular-expression-syntax error.

The flags arguments change the behavior of the compiled regular expression. The following flags may be supplied:

regexp/icase
Consider uppercase and lowercase letters to be the same when matching.
regexp/newline
If a newline appears in the target string, then permit the `^' and `$' operators to match immediately after or immediately before the newline, respectively. Also, the `.' and `[^...]' operators will never match a newline character. The intent of this flag is to treat the target string as a buffer containing many lines of text, and the regular expression as a pattern that may match a single one of those lines.
regexp/basic
Compile a basic ("obsolete") regexp instead of the extended ("modern") regexps that are the default. Basic regexps do not consider `|', `+' or `?' to be special characters, and require the `{...}' and `(...)' metacharacters to be backslash-escaped (see section 21.5.3 Backslash Escapes). There are several other differences between basic and extended regular expressions, but these are the most significant.
regexp/extended
Compile an extended regular expression rather than a basic regexp. This is the default behavior; this flag will not usually be needed. If a call to make-regexp includes both regexp/basic and regexp/extended flags, the one which comes last will override the earlier one.
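
The effect of two of these flags can be seen in a small sketch. The strings and patterns are invented for illustration, and match:substring and match:start are assumed to come from the (ice-9 regex) module.

```scheme
(use-modules (ice-9 regex))

(define text "first line\nsecond line")

;; Without regexp/newline, `^' matches only at the very start of
;; the string, so this pattern does not match.
(regexp-exec (make-regexp "^second") text)
;; => #f

;; With regexp/newline, `^' also matches just after the newline.
(define m (regexp-exec (make-regexp "^second" regexp/newline) text))
(match:start m)
;; => 11

;; Extended (default): `+' means "one or more".
(match:substring (regexp-exec (make-regexp "a+") "caa+t"))
;; => "aa"

;; Basic: `+' is an ordinary character, so the pattern matches
;; an "a" followed by a literal plus sign.
(match:substring (regexp-exec (make-regexp "a+" regexp/basic) "caa+t"))
;; => "a+"
```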

Scheme Procedure: regexp-exec rx str [start [flags]]
C Function: scm_regexp_exec (rx, str, start, flags)
Match the compiled regular expression rx against str. If the optional integer start argument is provided, begin matching from that position in the string. Return a match structure describing the results of the match, or #f if no match could be found.

The flags arguments change the matching behavior. The following flags may be supplied:

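regexp/notbol
The `^' operator never matches at the beginning of the string (it can still match after a newline if the regexp was compiled with regexp/newline). Use this when the beginning of the string should not be considered the beginning of a line.
regexp/noteol
The `$' operator never matches at the end of the string (it can still match before a newline if the regexp was compiled with regexp/newline). Use this when the end of the string should not be considered the end of a line.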

 
;; match:substring is provided by the (ice-9 regex) module
(use-modules (ice-9 regex))

;; Regexp to match a (possibly empty) run of uppercase letters
(define r (make-regexp "[A-Z]*"))

;; The same regexp, ignoring case
(define ri (make-regexp "[A-Z]*" regexp/icase))

;; Match "bob" against regexp r
(match:substring (regexp-exec r "bob"))
=> ""                  ; r matches only the empty string at position 0

;; Match "Bob" against regexp ri
(match:substring (regexp-exec ri "Bob"))
=> "Bob"               ; matched case-insensitively
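
The regexp/notbol and regexp/noteol flags described above are passed to regexp-exec after the start position. A minimal sketch, with illustrative patterns and an explicit start of 0:

```scheme
(define rx-begin (make-regexp "^abc"))
(define rx-end   (make-regexp "def$"))

;; By default the start of the string counts as the start of a
;; line, so `^' can match there. The result is a match structure.
(regexp-exec rx-begin "abcdef")

;; With regexp/notbol the start of the string no longer counts
;; as the beginning of a line.
(regexp-exec rx-begin "abcdef" 0 regexp/notbol)
;; => #f

;; Likewise for `$' and the end of the string.
(regexp-exec rx-end "abcdef" 0 regexp/noteol)
;; => #f
```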

Scheme Procedure: regexp? obj
C Function: scm_regexp_p (obj)
Return #t if obj is a compiled regular expression, or #f otherwise.
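
One possible use of regexp? is to let a procedure accept either a pattern string or an already compiled regexp. The helper ensure-regexp below is hypothetical and not part of Guile.

```scheme
(regexp? (make-regexp "[a-z]+"))   ; => #t
(regexp? "[a-z]+")                 ; => #f  (a string, not yet compiled)

;; Hypothetical helper: compile x only if it is not already a
;; compiled regular expression.
(define (ensure-regexp x)
  (if (regexp? x) x (make-regexp x)))

(regexp? (ensure-regexp "[a-z]+"))   ; => #t
```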

Regular expressions are commonly used to find patterns in one string and replace them with the contents of another string.

Scheme Procedure: regexp-substitute port match [item...]
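Write to the output port port selected contents of the match structure match. Each item specifies what should be written, and may be one of the following arguments:

A string. String arguments are written out verbatim.
An integer. The submatch with that number is written.
The symbol `pre'. The portion of the matched string preceding the regexp match is written.
The symbol `post'. The portion of the matched string following the regexp match is written.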

The port argument may be #f, in which case nothing is written; instead, regexp-substitute constructs a string from the specified items and returns that.

The following example takes a regular expression that matches a standard YYYYMMDD-format date such as "20020828". The regexp-substitute call returns a string computed from the information in the match structure, consisting of the fields and text from the original string reordered and reformatted.

 
(define date-regex "([0-9][0-9][0-9][0-9])([0-9][0-9])([0-9][0-9])")
(define s "Date 20020429 12am.")
(define sm (string-match date-regex s))
(regexp-substitute #f sm 'pre 2 "-" 3 "-" 1 'post " (" 0 ")")
=> "Date 04-29-2002 12am. (20020429)"
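
The difference between passing #f and passing a real port can be sketched as follows. The address and pattern are made up, and call-with-output-string is used only so that the text written to the port can be inspected.

```scheme
(use-modules (ice-9 regex))

(define m (string-match "([a-z]+)@([a-z.]+)" "mail bob@example.org now"))

;; port is #f: the result is returned as a string.
(regexp-substitute #f m 'pre "<" 0 ">" 'post)
;; => "mail <bob@example.org> now"

;; port is an output port: the items are written to it. Here the
;; port is a string port, so the written text can be collected.
(call-with-output-string
  (lambda (port)
    (regexp-substitute port m 1 " at " 2)))
;; => "bob at example.org"
```

Since neither 'pre nor 'post appears in the second call, only the two submatches and the literal string are written.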

Scheme Procedure: regexp-substitute/global port regexp target [item...]
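Similar to regexp-substitute, but can be used to perform global substitutions on target. Instead of taking a match structure as an argument, regexp-substitute/global takes two string arguments: regexp, a string describing a regular expression, and target, the string which should be matched against this regular expression.

Each item behaves as in regexp-substitute, with the following exceptions:

A function may be supplied. When this function is called, it is passed one argument: a match structure for a given regular expression match. It should return a string to be written out to port.
The `post' symbol causes regexp-substitute/global to recurse on the unmatched portion of target. It must be supplied in order to perform a global search-and-replace on target; if it is not present among the items, regexp-substitute/global returns after processing a single match.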

The example above for regexp-substitute could be rewritten as follows to remove the string-match stage:

 
(define date-regex "([0-9][0-9][0-9][0-9])([0-9][0-9])([0-9][0-9])")
(define s "Date 20020429 12am.")
(regexp-substitute/global #f date-regex s
  'pre 2 "-" 3 "-" 1 'post " (" 0 ")")
=> "Date 04-29-2002 12am. (20020429)"
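
The date example contains only one match, so it does not show the global behavior. The sketch below uses an invented target with several matches; it also relies on two documented properties of regexp-substitute/global: an item may be a procedure that receives the match structure, and 'post makes the procedure continue with the rest of the target.

```scheme
(use-modules (ice-9 regex))

;; Wrap every run of digits in angle brackets. The procedure item
;; is called once per match and returns the text to write.
(regexp-substitute/global #f "[0-9]+" "a1b22c333"
  'pre
  (lambda (m) (string-append "<" (match:substring m) ">"))
  'post)
;; => "a<1>b<22>c<333>"

;; Without 'post, processing stops after the first match and the
;; rest of the target is not written.
(regexp-substitute/global #f "[0-9]+" "a1b22c333" 'pre "#")
;; => "a#"
```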



Copyright 2003 by The Free Software Foundation. Updated Jun 2003.