To: djgpp AT sun DOT soe DOT clarkson DOT edu
Subject: Program fast w/BC++, slow w/DJGPP
Date: Mon, 21 Nov 94 18:05:53 +0200
From: "Eli Zaretskii"

SHORT STORY:

How many reasons can you think of that would make a CPU-intensive program run twice as fast when compiled with Borland's C as compared to the same code compiled with DJGPP 1.12maint2? I specifically ask for *any* reason you can possibly think of, because I have such a program and, after testing every cause I could think of, I'm out of reasons.

LONG STORY:

I have a program which was originally written and compiled in Borland C++ 3.1. Recently, the friend who wrote it asked me to compile it under GCC (he wants more memory for a hash table his program uses). My problem is that when compiled with GNU C++, the program runs only about half as fast as the BC++ version on the same machine. I've tried several things in the hope of finding the culprit (see below), but couldn't find anything worth mentioning. So I'm totally confused. If anybody can suggest new ideas, I would be grateful.

This is a chess-playing program. From what I've seen, it is quite CPU-bound; most of the time it just computes possible moves and checks their scores. From time to time it writes short (~10 chars) messages to the screen (with cprintf()), and once every move it writes a line to a logfile. Other than this, it doesn't do anything I can think of which would require a switch to real mode. Does anybody know of reasons other than file I/O which will cause a mode switch?

The measure of the program's performance is the number of moves it considers per second; as I said, this is roughly half as large for the DJGPP-compiled program as for the BC++ one. Matches typically take at least 5 minutes, so we are *not* talking about losing several seconds here and there.

I use the -O3 -funroll-loops optimization switches. The -O3 is because the program defines several inline functions, and I understand only -O3 actually performs the inlining.

I've run the profiler and found the histogram to be fairly flat: the most expensive function takes about 15% of the run time. No library function appears anywhere near the top of the profile, so the library is not the culprit. The program is written in C++, but it doesn't use any classes but its own, so the class library supplied with the compiler cannot be the reason either.

There is one thing I cannot accept as an assumption: that GCC can produce code so much slower than BCC for a program which mostly needs CPU. This is based on some experience, not only on ideology. If anybody out there knows about some circumstances where such a lossage is possible, let him speak now.

I tried different combinations of other optimization-related switches, but none produced any significant effect. I can't say I've tested all the switches which might be relevant, so if you have a list of such switches to test, go ahead and tell me, even if the list is long.

I also thought that the GCC-compiled program might be a little larger, so that it just happens not to fit in the CPU (L1) or secondary (L2) cache. So I compiled the GCC version with all ints #define'd to be shorts (that's what BCC does), but nothing changed.
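
For anyone who wants to try the int-versus-short experiment and the moves-per-second measurement on a small test case, here is a minimal sketch that builds under either compiler. It is not taken from the chess program: search_stub(), cell_t, nodes_per_second(), measure() and the USE_SHORT macro are all made-up names, and the loop is only a stand-in for a CPU-bound search. It shows nothing more than how the same rate could be computed with clock() in both builds.

```cpp
/* Sketch only: search_stub() is a made-up stand-in for the real search. */
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical switch: build with -DUSE_SHORT to get 16-bit cells,
   the size of an int under BC++ 3.1. */
#ifdef USE_SHORT
typedef short cell_t;
#else
typedef int cell_t;
#endif

/* CPU-bound stand-in workload. It returns a checksum so that the
   result of the loop is actually used by the caller. */
unsigned long search_stub(unsigned long iterations)
{
    cell_t board[64] = { 0 };
    unsigned long checksum = 0;
    unsigned long i;

    for (i = 0; i < iterations; ++i) {
        int sq = (int)(i & 63);
        /* The mask keeps every value inside the range of a 16-bit short. */
        board[sq] = (cell_t)((board[sq] + sq) & 0x7fff);
        checksum += (unsigned long)board[sq];
    }
    return checksum;
}

/* Convert a node count and two clock() readings into nodes per second.
   Returns 0 when the interval is too short for clock() to resolve. */
double nodes_per_second(unsigned long nodes, clock_t start, clock_t stop)
{
    double seconds;

    assert(stop >= start);
    seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    return seconds > 0.0 ? nodes / seconds : 0.0;
}

/* Time one run of the stub and print the rate. */
double measure(unsigned long iterations)
{
    clock_t start = clock();
    unsigned long checksum = search_stub(iterations);
    clock_t stop = clock();
    double rate = nodes_per_second(iterations, start, stop);

    printf("checksum %lu, %.0f nodes/sec\n", checksum, rate);
    return rate;
}
```

Calling measure() from main() with an iteration count large enough to run for several seconds, once with and once without -DUSE_SHORT, gives two rates that can be compared directly. The checksum should be the same in both builds, which is a cheap check that the two builds did the same work.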