To: djgpp AT sun DOT soe DOT clarkson DOT edu
Subject: Re: Program fast w/BC++, slow w/DJGPP
Date: Tue, 22 Nov 94 09:54:36 +0200
From: "Eli Zaretskii"

Thanks to all of you who answered my posting. I don't have any real
solution yet, just a few thoughts about some of the things you
suggested.

DJ Delorie writes:

DJ> Few programs fit in the 8K L1 cache.

I also think this program doesn't fit in the L1 cache.

DJ> Paging might be the culprit, so
DJ> set "GO32=topline" to watch the activity. The R/P in the upper left
DJ> tells you if the CPU is in real or protected mode (non-dpmi only). In
DJ> your case, it should stay at "P". You can also see paging activity in
DJ> the upper right corner.

Well, I don't think paging is the problem here. I forgot to tell you
that the original program was compiled under the *small* memory model,
and the only data item that doesn't fit inside the 64K data segment is
the hash table (which was the reason for moving to DJGPP). One of the
things I tried was to define a small hash table so it would mimic the
one used in the real-mode program, but that didn't change anything.

I did use the topline switch, and from what I remember it almost
always stays at "P", with occasional "R" flickers, which I suppose are
the one-liners the program prints to the logfile every 3 seconds or so.
What exactly should I see on the topline when something is paged out?

DJ> Also, try disabling interrupts during the computes. Although this
DJ> messes up the clock, interrupts are more expensive in protected mode
DJ> than in real mode (well, slightly, and there aren't that many of
DJ> them).

Indeed. What other interrupts are there besides the clock and the
disk? Btw, SmartDrive is installed, so actual disk activity is very
low.

DJ> Also, kbhit() might be more expensive than you think.

The program doesn't use kbhit(). The version I timed is just a
non-interactive variant in which the machine plays against itself.
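For anyone repeating the hash-table experiment described above, here is a minimal sketch of how one source file might switch between a table small enough to share a 64K small-model data segment and a large one that only a flat 32-bit build such as DJGPP can hold. Everything in it is hypothetical: the thread does not show the real program's table, so the entry layout, the MIMIC_SMALL_MODEL macro, the HASH_BITS values and the hash_store/hash_lookup/hash_clear names are assumptions for illustration only.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical entry layout (assumption: the real program's table is
   not shown). Packs into 8 bytes on common 32-bit and 64-bit ABIs. */
typedef struct {
    uint32_t key;    /* full position key, to detect slot collisions */
    int16_t  score;  /* stored evaluation */
    uint8_t  depth;  /* search depth the score is valid for */
    uint8_t  used;   /* nonzero once the slot holds data */
} hash_entry;

/* Build with -DMIMIC_SMALL_MODEL for a table that could share a 64K
   small-model data segment with everything else. Both sizes are
   assumed values, not taken from the original program. */
#ifdef MIMIC_SMALL_MODEL
#define HASH_BITS 11            /* 2048 entries * 8 bytes = 16 KB */
#else
#define HASH_BITS 18            /* 262144 entries * 8 bytes = 2 MB */
#endif
#define HASH_SIZE (1UL << HASH_BITS)
#define HASH_MASK (HASH_SIZE - 1)

static hash_entry table[HASH_SIZE];

/* Map a key to its slot; a power-of-two size lets a mask replace '%'. */
static hash_entry *slot_for(uint32_t key)
{
    return &table[key & HASH_MASK];
}

/* Always-replace policy: a new entry overwrites whatever was there. */
void hash_store(uint32_t key, int16_t score, uint8_t depth)
{
    hash_entry *e = slot_for(key);
    e->key = key;
    e->score = score;
    e->depth = depth;
    e->used = 1;
}

/* Return 1 and fill *score if this exact key is stored at a depth of
   at least min_depth; return 0 otherwise. */
int hash_lookup(uint32_t key, uint8_t min_depth, int16_t *score)
{
    const hash_entry *e = slot_for(key);
    if (e->used && e->key == key && e->depth >= min_depth) {
        *score = e->score;
        return 1;
    }
    return 0;
}

/* Empty every slot, e.g. before a new game. */
void hash_clear(void)
{
    memset(table, 0, sizeof table);
}
```

Because only HASH_BITS changes between the two builds, any speed difference between them should come from the table size rather than from different code paths.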
DJ> Check
DJ> the call profile for functions that take up little CPU but get called
DJ> a lot.

The profile I was talking about is the cumulative part of the gprof
output, so the *total* time spent in any single routine is no more
than 15%.

DJ> Also, short math is slower than long math due to the operand override
DJ> prefix codes. However, long constants are slower than short constants
DJ> if they're not aligned. GCC should automatically align them.

The program uses ``int'', not short, and int is 32 bits in GCC. I
once tried making them shorts, but that didn't help.

DJ> Also, check the gcc info pages for the -f* options to gcc; there may
DJ> be some that -O3 misses that are useful. The -fomit-frame-pointer

The info pages tell me only -funroll-loops isn't included. I don't run
under DPMI, and the program doesn't use floating point, so am I to
understand I can try using -fomit-frame-pointer?

Aaron Ucko writes:

AU> anything requiring a real-mode interrupt to be called (which includes all
AU> I/O except for direct screen writes)

There is no I/O except a single line written once every 3 seconds or
so. Do you really think this can cause a 50% degradation in speed?

AU> Not necessarily; I believe profiling only sees how much time each function
AU> spends in protected mode.

This is true, but the program seems to actually be in protected mode
most of the time.

Thanks again to all who replied. I'll keep you posted about any
significant findings. This bugs me a lot.
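DJ's point about operand-size prefixes can be checked directly by timing the same work done with an int and with a short accumulator. The sketch below is hypothetical: work_int, work_short and seconds_for are illustrative names, the loop body is an arbitrary stand-in for the program's real hot spot, and timing uses only the standard clock() function. With optimization on, GCC may keep the short in a 32-bit register and emit nearly identical code for both, which would be consistent with switching to shorts not helping.

```c
#include <time.h>

/* Hypothetical stand-in for a hot loop (assumption: the real code is
   not shown). Both versions compute the same value; one keeps its
   accumulator in an int (32 bits under GCC), the other in a short,
   whose 16-bit loads and stores need the operand-size prefix in
   32-bit code unless the compiler keeps it in a full register. */
long work_int(long n)
{
    int acc = 0;
    long i;
    for (i = 0; i < n; i++)
        acc = (acc * 31 + (int)(i & 0xff)) & 0x7fff;
    return acc;
}

long work_short(long n)
{
    short acc = 0;
    long i;
    for (i = 0; i < n; i++)
        /* the masked value is at most 0x7fff, so it always fits */
        acc = (short)((acc * 31 + (int)(i & 0xff)) & 0x7fff);
    return acc;
}

/* Run fn(n), store its result, and return the CPU seconds it took. */
double seconds_for(long (*fn)(long), long n, long *result)
{
    clock_t start = clock();
    *result = fn(n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

A driver would call seconds_for(work_int, 50000000L, &r) and seconds_for(work_short, 50000000L, &r) and print both times; building the same file once with -O3 and once with -O3 -fomit-frame-pointer compares that flag the same way.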