Mail Archives: djgpp/1994/08/17/08:07:38

To: Olly Betts <olly AT mantis DOT co DOT uk>
Cc: djgpp AT sun DOT soe DOT clarkson DOT edu
Subject: Re: Speed tuning programs
Date: Wed, 17 Aug 94 12:42:05 +0300
From: eliz AT is DOT elta DOT co DOT il

> I haven't tried profiling the code recently, so it might be worth doing
> again.  However, this would probably just reduce the times for both
> versions.

Not necessarily.  The libraries and the code generation of the two
compilers (BC and GCC) are quite different, so a hot spot in one
version need not be a hot spot in the other.  For example, imagine that
some library function is much more efficient under one of the
compilers, and that this very function is called in the innermost loop
of your program.

GCC should be much more efficient for long int (i.e. 32-bit) arithmetic,
and especially for working with large buffers (arrays) where in BC you
use far pointers (compact, large or huge memory models).  GCC lets
you keep such pointers in register variables, whereas BC must
access memory (twice) for each reference to them.

On the other hand, BC inlines library functions which move buffers,
such as strcpy(), memcpy(), memset() and memmove(), under -O2, which
GCC does not do.  Also, in BC these work by moving 16-bit words,
whereas the memcpy() which comes with DJGPP moves bytes.  If you
have such calls, you're better off using movedata(), which moves
32-bit double-words.
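
To make the byte versus double-word difference concrete, here is a
rough portable C sketch.  The names copy_bytes() and copy_dwords() are
made up for illustration; neither is a DJGPP library function, and this
is not how movedata() or the DJGPP memcpy() is actually implemented.
It assumes only a C99 compiler with <stdint.h>.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: one byte per loop iteration. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
}

/* Hypothetical helper: four bytes per loop iteration, then the tail.
   The fixed-size memcpy() calls are a portable way to express an
   unaligned 32-bit load and store. */
void copy_dwords(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= sizeof(uint32_t)) {
        uint32_t w;
        memcpy(&w, s, sizeof w);    /* load one double-word  */
        memcpy(d, &w, sizeof w);    /* store one double-word */
        s += sizeof w;
        d += sizeof w;
        n -= sizeof w;
    }
    while (n--)                     /* remaining 0..3 bytes */
        *d++ = *s++;
}
```

For a buffer of n bytes the second loop runs about n/4 times instead
of n times, which is the whole point.  Like memcpy(), neither helper
handles overlapping buffers.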

So you see, profiling could indeed tell you something different
about each of the versions.  You might find that rewriting
a single library function as an in-line assembly function is all
you need.

> There are 32 input files read in with a total size of 54316 bytes.  4
> output files are produced, total size 158695 bytes.  Pretty small
> really.

This means your problem is *not* in the I/O.  So I would concentrate
on the above issues of code efficiency, for which profiling is the
way to start.
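
If a full profiler run is inconvenient, a crude timing harness built on
the standard clock() function can at least compare one suspect routine
across the two builds.  This is only a sketch: time_calls() and
sample_work() are hypothetical names, and sample_work() is a stand-in
for whatever routine you suspect.

```c
#include <time.h>

/* Written to by the stand-in workload so the loop is not optimized away. */
volatile unsigned long sink;

/* Hypothetical stand-in for the routine under suspicion. */
void sample_work(void)
{
    unsigned long i;

    for (i = 0; i < 1000; i++)
        sink += i;
}

/* Hypothetical helper: calls fn() reps times and returns the CPU time
   used, in seconds, as reported by clock(). */
double time_calls(void (*fn)(void), long reps)
{
    clock_t start = clock();
    long i;

    for (i = 0; i < reps; i++)
        fn();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Compile the same file with both compilers and compare the value of
time_calls(sample_work, reps) for a reps large enough that the total
runs for at least a few seconds, since clock() can be quite coarse.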

> There is no software disk cache running, as the machine has a fairly
> good caching disk controller card, so you don't gain anything.

I would try using a software cache anyway.  The cache which sits on the
controller has the disadvantage of talking to the PC via the relatively
slow AT bus, whereas a software cache typically has about 10 times
faster access to system RAM.  So, unless you have many megabytes
of cache on the controller *and* a bus-mastering controller on an
EISA or PCI bus, the hardware cache will always lose.  Evidently,
this issue has nothing to do with your run-time problem, but
a software cache will certainly improve compilation time.

	Eli Zaretskii
