Mail Archives: djgpp/1994/11/22/07:50:28

To: djgpp AT sun DOT soe DOT clarkson DOT edu
Subject: Re: Program fast w/BC++, slow w/DJGPP
Date: Tue, 22 Nov 94 09:54:36 +0200
From: "Eli Zaretskii" <eliz AT is DOT elta DOT co DOT il>

Thanks to all of you who replied to my posting.  I don't have
any real solution for now, just a few thoughts about some of
the things you suggested.

DJ Delorie <dj AT stealth DOT ctron DOT com> writes:

DJ> Few programs fit in the 8K L1 cache.

I also think this program doesn't fit in L1 cache.

DJ>                                     Paging might be the culprit, so
DJ> set "GO32=topline" to watch the activity.  The R/P in the upper left
DJ> tells you if the CPU is in real or protected mode (non-dpmi only).  In
DJ> your case, it should stay at "P".  You can also see paging activity in
DJ> the upper right corner.

Well, I don't think paging is the problem here.  I forgot to tell you
that the original program was compiled under the *small* memory model,
and the only data item which doesn't fit inside the 64K data segment is
the hash table (which was the reason for moving to DJGPP).  One of the
things I tried was to define a small hash table, so it would mimic
the one which is used in the real-mode program, but that didn't change
anything.
I did use the topline switch, and from what I remember it almost always
stays at "P", with occasional "R" flickers, which I suppose are the
one-liners it prints to the logfile every 3 seconds or so.  What
exactly should I see on the topline when something is paged out?

DJ> Also, try disabling interrupts during the computes.  Although this
DJ> messes up the clock, interrupts are more expensive in protected mode
DJ> than in real mode (well, slightly, and there aren't that many of
DJ> them).

Indeed.  What other interrupts are there besides the clock and the disk?
Btw, SmartDrive is installed, so actual disk activity is very low.
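
A minimal sketch of the "disable interrupts during the computes"
experiment.  It assumes DJGPP's disable() and enable() from <dos.h>
and a compiler that defines __DJGPP__; busy_work() is a hypothetical
stand-in for the real search routine, not part of the program
discussed here.  On any other compiler the macros expand to nothing,
so the code still builds.

```c
#ifdef __DJGPP__
#include <dos.h>             /* disable() and enable() from DJGPP's libc */
#define IRQ_OFF() disable()
#define IRQ_ON()  enable()
#else
/* Assumption: outside DJGPP there is nothing to mask, so do nothing. */
#define IRQ_OFF() ((void)0)
#define IRQ_ON()  ((void)0)
#endif

/* Hypothetical compute-bound workload standing in for the real search. */
unsigned long busy_work(unsigned long n)
{
    unsigned long acc = 0, i;
    for (i = 0; i < n; i++)
        acc = acc * 31UL + i;
    return acc;
}

/* Same workload, with interrupts masked for its whole duration. */
unsigned long busy_work_no_irq(unsigned long n)
{
    unsigned long result;
    IRQ_OFF();
    result = busy_work(n);
    IRQ_ON();
    return result;
}
```

Timing both variants against an outside clock would show whether
interrupt overhead matters; as DJ notes, the DOS clock itself loses
ticks while interrupts are masked.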

DJ> Also, kbhit() might be more expensive than you think.

The program doesn't use kbhit().  The version I timed is just a non-
interactive variant whereby the machine plays against itself.

DJ> Check
DJ> the call profile for functions that take up little CPU but get called
DJ> a lot.

The profile I was talking about is the cumulative part of the gprof
output, so the *total* time spent in any single routine is no more
than 15%.

DJ> Also, short math is slower than long math due to the operand override
DJ> prefix codes.  However, long constants are slower than short constants
DJ> if they're not aligned.  GCC should automatically align them.

The program uses ``int'', not short, which is 32 bit in GCC.  I just
tried once to make them shorts, but that didn't help.
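
The int-versus-short comparison can be sketched as below.  All names
here (mix_int, mix_short, seconds_for) are hypothetical, and the loop
body is only a stand-in for real work.  An optimizing compiler may
keep the short accumulator in a 32-bit register, so any difference may
be smaller than the operand-size prefix argument suggests.

```c
#include <time.h>

/* Accumulate in a 32-bit int. */
int mix_int(int n)
{
    int acc = 0;
    int i;
    for (i = 0; i < n; i++)
        acc = (acc * 3 + (i & 0xff)) & 0x7fff;
    return acc;
}

/* Same arithmetic, but the accumulator is a 16-bit short. */
int mix_short(int n)
{
    short acc = 0;
    int i;
    for (i = 0; i < n; i++)
        acc = (short)((acc * 3 + (i & 0xff)) & 0x7fff);
    return acc;
}

/* Time one call of fn(n) with clock(); the result is stored in *result
   so the compiler cannot discard the work. */
double seconds_for(int (*fn)(int), int n, int *result)
{
    clock_t t0 = clock();
    *result = fn(n);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Both functions return the same value for the same n, so a harness can
check correctness before comparing the two timings.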

DJ> Also, check the gcc info pages for the -f* options to gcc; there may
DJ> be some that -O3 misses that are useful.  The -fomit-frame-pointer

The info pages tell me that only -funroll-loops isn't included.  I don't
run under DPMI, and the program doesn't use floating point, so am I
to understand I can try using -fomit-frame-pointer?

Aaron Ucko <UCKO AT VAX1 DOT ROCKHURST DOT EDU> writes:

AU> anything requiring a real-mode interrupt to be called (which includes all
AU> I/O except for direct screen writes)

There is no I/O except a single line written once every 3 seconds or so.
Do you really think this can cause a 50% degradation in speed?
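
One way to put a number on the logging cost is to accumulate the time
spent inside the log call and compare it with the total run time.
This is only a sketch: timed_log() and io_fraction() are hypothetical
names, and clock() is coarse on DOS and may report elapsed or CPU time
depending on the C library, so very short writes can measure as zero.

```c
#include <stdio.h>
#include <time.h>

/* Ticks accumulated inside timed_log() since program start. */
static clock_t io_ticks = 0;

/* Write one line to fp and add the time it took to io_ticks.
   Returns whatever fprintf() returned. */
int timed_log(FILE *fp, const char *msg)
{
    clock_t t0 = clock();
    int rc = fprintf(fp, "%s\n", msg);
    fflush(fp);                      /* force the write to happen now */
    io_ticks += clock() - t0;
    return rc;
}

/* Fraction of total_ticks that was spent in timed_log(). */
double io_fraction(clock_t total_ticks)
{
    if (total_ticks <= 0)
        return 0.0;
    return (double)io_ticks / (double)total_ticks;
}
```

If io_fraction() stays close to zero over a full run, the occasional
log line is unlikely to explain a 50% slowdown.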

AU> Not necessarily; I believe profiling only sees how much time each function
AU> spends in protected mode.

This is true, but the program seems to actually be in protected mode
most of the time.

Thanks again to all who replied.  I'll keep you posted about any
significant findings.  This bugs me a lot.
