Mailing-List: contact cygwin-help@sourceware.cygnus.com; run by ezmlm
List-Subscribe: <mailto:cygwin-subscribe@sources.redhat.com>
List-Archive: <http://sources.redhat.com/ml/cygwin/>
List-Post: <mailto:cygwin@sources.redhat.com>
List-Help: <mailto:cygwin-help@sources.redhat.com>, <http://sources.redhat.com/ml/#faqs>
Sender: cygwin-owner@sources.redhat.com
Delivered-To: mailing list cygwin@sources.redhat.com
From: "Zack Weinberg" <zackw@stanford.edu>
Date: Fri, 8 Jun 2001 09:59:32 -0700
To: Eli Zaretskii <eliz@is.elta.co.il>
Cc: dj@redhat.com, gcc@gcc.gnu.org, gdb@sources.redhat.com,
        binutils@sources.redhat.com, cygwin@sources.redhat.com
Subject: Re: Another RFC: regex in libiberty
Message-ID: <20010608095932.S979@stanford.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <9003-Fri08Jun2001100651+0300-eliz@is.elta.co.il>
User-Agent: Mutt/1.3.18i

On Fri, Jun 08, 2001 at 10:06:51AM +0300, Eli Zaretskii wrote:
> 
> One notorious problem with GNU regex is that it is quite slow for many
> simple jobs, such as matching a simple regular expression with no
> backtracking.  It seems that the main reason for this slowness is the
> fact that GNU regex supports null characters in strings.  For
> example, Sed 3.02 compiled with GNU regex is about 2-4 times slower
> on simple jobs than the same Sed compiled with Spencer's regex
> library.

I think the null characters are a red herring.  I looked into GNU
regex's performance in the context of GCC's fixincludes program, last
year.  On a platform that has mostly-okay headers, fixincludes spends
most of its time matching regular expressions.

The regex.c that came with GDB 4.18, which I think is the one that got
spread around widely, had a bug in its implementation of the POSIX
regcomp/regexec interface, which caused a major performance hit.  That
bug has been fixed in GNU libc for a long time.  When I replaced
fixincludes' copy of regex.c with a more recent version from glibc,
fixincludes was sped up by a factor of nine.  That same bug affects
Sed 3.02; swap the regex.c it ships with for the one from glibc
2.2.x and I bet you'll see better performance.
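
For anyone who wants to check a particular regex.c, here is a minimal
timing sketch that goes through the POSIX regcomp/regexec interface.
It is not the fixincludes code; count_matches, its parameters, and the
choice of flags are made up for illustration.  Build it once against
the regex.c under test and once against a newer one, feed both the same
lines and patterns, and compare the reported times.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: compile `pattern` once with the POSIX interface,
   then run regexec over every line `reps` times.  Returns the total
   number of matching lines, or -1 if the pattern does not compile.  If
   `seconds` is non-NULL, the CPU time spent in the matching loop is
   stored there. */
long count_matches(const char *pattern, const char *const *lines,
                   size_t nlines, int reps, double *seconds)
{
    regex_t re;
    long hits = 0;

    assert(pattern != NULL && lines != NULL);

    /* REG_NOSUB: we only need a yes/no answer, not match offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB | REG_NEWLINE) != 0)
        return -1;

    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        for (size_t i = 0; i < nlines; i++)
            if (regexec(&re, lines[i], 0, NULL, 0) == 0)
                hits++;
    clock_t stop = clock();

    regfree(&re);
    if (seconds != NULL)
        *seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    return hits;
}
```

Use a large `reps` and a realistic set of header lines; with only a
handful of short lines the elapsed time is too small for clock() to
measure reliably.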

There's some discussion in these messages:

http://gcc.gnu.org/ml/gcc-patches/2000-01/msg00764.html
http://gcc.gnu.org/ml/gcc-patches/2000-01/msg00765.html

The relevant fix is in there, too, if you want to pull it out and
apply it.

I did some benchmarking of fixincludes with Spencer's regexp library
as well.  IIRC, it was about the same as the fixed GNU regex.c.

-- 
zw        This is, no doubt, the rational strategy; quite possibly the
          only one that will work.  But it ignores the exigencies of
          the tenure system and is therefore impractical.
          	-- Jerry Fodor, _The Mind Doesn't Work That Way_
