To: djgpp AT sun DOT soe DOT clarkson DOT edu
Subject: Program fast w/BC++, slow w/DJGPP
Date: Mon, 21 Nov 94 18:05:53 +0200
From: "Eli Zaretskii" <eliz AT is DOT elta DOT co DOT il>

SHORT STORY:

How many reasons can you think of that would cause a CPU-intensive
program to run twice as fast when compiled with Borland's C as compared
to the same code compiled with DJGPP 1.12maint2?  I specifically ask
for *any* reason you can possibly think of, because I have such a
program and, after testing every cause I could think of, I'm out
of reasons.

LONG STORY:

I have a program which was originally written and compiled in Borland
C++ 3.1.  Recently, the friend who wrote it asked me to compile it
under GCC (he wants more memory for a hash table his program uses).  My
problem is that when compiled with GNU C++, the program runs only about
half as fast as the BC++ version on the same machine.  I've tried
several things in the hope of finding the culprit (see below), but
found nothing worth mentioning.  So I'm totally confused.
If anybody can suggest new ideas, I would be grateful.

This is a chess-playing program.  From what I've seen, it is quite
CPU-bound; most of the time it just computes possible moves and checks
their scores.  From time to time it writes short (~10 chars) messages
to the screen (with cprintf()), and once every move it writes a line to
a logfile.  Other than this, it doesn't do anything I can think of which
would require a switch to real mode.  Does anybody know reasons other
than file I/O which will cause a mode switch?

The measure of the program's performance is the number of moves it
considers per second; as I said, this is roughly half as large for
the DJGPP-compiled program as for the BC++ one.  Matches typically
take at least 5 minutes, so we are *not* talking about losing several
seconds here and there.

I use the -O3 -funroll-loops optimization switches.  The -O3 is because
the program defines several inline functions, and I understand only -O3
actually performs the inlining.  I've run the profiler and found the
histogram to be fairly flat: the most expensive function takes about
15% of run time.  None of the library functions appears anywhere near
the top of the profile, so the library is not the culprit.  The
program is written in C++, but it doesn't use any classes but its own,
so the class library supplied with the compiler cannot be the reason
for this.

There is one thing I cannot accept as an assumption: that GCC can produce
code so much slower than BCC, for a program which mostly needs CPU.  This
is based on some experience, not only on ideology.  If anybody out there
knows about some circumstances where such a lossage is possible, let him
speak now.

I tried different combinations of other optimization-related switches, but
none produced any significant effect.  I can't say I've tested all the
switches which might be relevant, so if you have a list of such switches
to test, go ahead and tell me, even if the list is long.

I also thought that the GCC-compiled program might be a little larger
and just happen not to fit in the CPU (L1) or secondary (L2) cache, so
I compiled the GCC version with all int's #define'd to be shorts
(BCC's ints are 16 bits wide, like its shorts), but nothing changed.