Sender: crough45 AT amc DOT de
Date: Thu, 3 Apr 1997 09:03:15 +0100
From: Chris Croughton
Mime-Version: 1.0
To: jestandi AT cs DOT indiana DOT edu
Cc: djgpp AT delorie DOT com
Subject: Re: How is switch implemented?
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-Id: <97Apr3.090043gmt+0100.21909@internet01.amc.de>

Jeff Standish wrote:

> I'm assuming that most intelligent compilers would try to use an jump
> table, but I don't know for certain, and I'm not in the mood to compile,

Not necessarily.  Most compilers decide based on the number and
distribution of the cases in the switch.  If the number of cases is
small, especially if their range is large, a series of compare/jump
instructions is very often faster and/or smaller.  For instance:

    switch (x)
    {
        case 5:
            ...
            break;
        case 576:
            ...
            break;
    }

If implemented as a jump table, not only would you need a 572-word
lookup table, you would also need comparisons for values under 5 and
over 576, plus the indexing code, making it much bigger and slower than
simply comparing the two values.

With more values it might be more efficient to keep a sorted lookup
table of the case values and do a binary chop on them, or to use a
hashing algorithm.

The "break even" point depends on the target machine architecture and
instruction timings, of course, which is why some code compiled for 486
processors will run more slowly on a Pentium than on its intended
target (an extreme case, but it can happen).

Which method is used also depends on the optimisation options.  I know
Visual C and Borland C each have options to optimise for speed or size,
and they will give different results.  Some compilers even let you
specify which optimisations to perform at a more detailed level, so you
can select what's best for your program (and on a per-module basis).
Some older compilers will also give different results depending on the
order of the case values, generating more efficient code if they are
already sorted, but any modern optimiser should do the sorting
itself...

> disassemble, and try to read the machine code.  Besides, just because
> this might happen for a trivial test program doesn't mean it will also
> work for 40+ cases spread out over several hundred lines of code (which
> is the minimum of what I expect to write).

That last point is the real problem.  You don't need to disassemble,
though, because every compiler I know will generate assembler source
(-S for gcc/DJGPP; other compilers use other flags, so check your
compiler's documentation).

I think the best thing for you to do (and this goes for time/space
critical applications in general) is to write the code and then do
performance analysis (DJGPP and Borland, at least, come with a
'profiler' to help with this) to find the parts of your code that are
slowest.  Having identified those, you can rewrite them to be more
efficient, in assembler if necessary...

Hope that helps.

Chris