Mail Archives: djgpp/1998/09/17/15:00:23

From: "John S. Fine" <johnfine AT erols DOT com>
Newsgroups: comp.os.msdos.djgpp
Subject: Optimizations
Date: Thu, 17 Sep 1998 14:34:13 -0400
Lines: 58
Message-ID: <36015625.62D2@erols.com>
Reply-To: johnfine AT erols DOT com
NNTP-Posting-Host: 207-172-241-249.s58.as8.bsd.erols.com
Mime-Version: 1.0
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp

I am working on some code that needs to be fast and
fairly small on a 486.  The system has slow DRAM and no
L2 cache, so in many cases small will be the best way
to achieve fast (fit more in L1 cache).

  While debugging I noticed many places where gcc has
generated crude code that is both larger and slower
than I would have expected.

  I am using -O2 and no other optimization switches.
Are there other switches that are appropriate to this
project?

  I am using gcc 2.7.2.1.  Would a newer version produce
better 486 code, or do the improvements just help
Pentium+ CPUs?

  The C code needs to be very portable, but I am only
worried about the performance of the gcc based version.
I could use conditionals to support nonportable
optimizations for the gcc version, but I really want
to avoid confusing other people who must look at the
C code.

  My code frequently has expressions of the form
( A << ( (B) & 31 ) )  where B is a subexpression.
GCC always computes B in some poorly chosen register,
then moves it to ecx, ANDs cl with 0x1F and then
does the shift.

  On an x86 there is no need to AND cl with 0x1F before
a shift.  The CPU only uses the low five bits of cl for
the shift anyway.  However, I can't remove the "& 31"
from the source code and have it still be portable.
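
  To make the shift case concrete, here is a minimal sketch of how
the gcc-only variant could be confined to one helper so that the rest
of the C code stays portable.  The name shl_masked is hypothetical,
and the sketch assumes unsigned int is 32 bits wide (as it is under
DJGPP).  The inline asm path only drops the explicit AND; gcc still
has to place the count in ecx, and an asm statement may block other
optimizations such as constant folding, so whether it is a net win
would have to be measured.

```c
/* Hypothetical helper: shift a left by the low five bits of b.
   Assumption: unsigned int is 32 bits wide, as under DJGPP. */
#if defined(__GNUC__) && defined(__i386__)
/* gcc on 32-bit x86: shll takes its count from cl and the CPU
   uses only the low five bits, so no explicit AND is emitted. */
static __inline__ unsigned int shl_masked(unsigned int a, unsigned int b)
{
    __asm__("shll %%cl, %0"
            : "=r" (a)            /* output: shifted value            */
            : "0" (a), "c" (b)    /* inputs: a in same reg, b in ecx  */
            : "cc");              /* the shift modifies the flags     */
    return a;
}
#else
/* Portable fallback: the mask keeps the shift count in 0..31,
   which is required for defined behavior in C. */
static unsigned int shl_masked(unsigned int a, unsigned int b)
{
    return a << (b & 31u);
}
#endif
```

  With this, an expression such as ( A << ( (B) & 31 ) ) would be
written shl_masked(A, B), and readers of the portable code only need
to look at the fallback branch.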

  I understand that gcc includes templates that control
the generation of instructions.  Can a template describe
something like ( A << (B & 31) )?  How hard would it be
for me (I have never recompiled any part of djgpp) to
add that template and recompile?

  GCC also adds NOPs to align many branch targets to
dword boundaries.  In my project, that usually slows
the code down, because the harm done by extra cache
misses outweighs the benefits of aligning.  Can I
individually turn off optimizations like that while
generally optimizing for speed rather than space?

  The most common form of bad code seems to be computing
a value in one register and then moving it to the register
where it is needed.  In all these cases, there was nothing
preventing it from computing the value in the correct
register to begin with.  Are there any options to make it
spend more time during compilation thinking about register
selection, so it won't get those wrong?
-- 
http://www.erols.com/johnfine/
http://www.geocities.com/SiliconValley/Peaks/8600/
