Mail Archives: djgpp/1996/11/29/21:24:58

From: Paul Shirley <Paul AT chocolat DOT foobar DOT co DOT uk>
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Optimization
Date: Sat, 30 Nov 1996 00:09:46 +0000
Organization: DrinkSoft
Lines: 31
Distribution: world
Message-ID: <vJlFwEAKt3nyEwc2@chocolat.foobar.co.uk>
References: <57hg9b$or5 AT kannews DOT ca DOT newbridge DOT com>
<329C95AD DOT C3E AT silo DOT csci DOT unt DOT edu> <57k531$5bu AT kannews DOT ca DOT newbridge DOT com>
<329E319E DOT 2A82 AT gbrmpa DOT gov DOT au>
NNTP-Posting-Host: chocolat.foobar.co.uk
Mime-Version: 1.0
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp

In article <329E319E DOT 2A82 AT gbrmpa DOT gov DOT au>, Leath Muller
<leathm AT gbrmpa DOT gov DOT au> writes
>> 
>> Well, my logic is this: You have to move 2x as much data around; this
>> means your L1 cache fills up 2x as fast. This is not good.
>
>Hmmm...I would think that if about 20 people told me one thing, and I
>was the only person considering the other, that I would be wrong... ;)
>
        <cache load description snipped>

There are a lot of reasons to *not* use short ints on P5 and up.
The instruction prefix problem has already been mentioned. The prefix is
also unpairable, giving a worst case of a 4x slowdown: 1 clk for the
prefix, .5 clk for the op before the prefix going unpaired, and an
unpaired 16 bit op instead of 2 paired 32/8 bit ops. It's usually worse
in fact, because the 16 bit load will often be promoted to 32 bit
(movsx/movzx), which is itself a slow operation! gcc mitigates this by
not implementing P5 instruction scheduling ;)

*However*, in real programs cache fill times can far outweigh any of
this. Remember, it can take 70+ clks to fetch a cache line from main
memory, and even an L2->L1 cache fill can be significant. If so,
fetching shorts halves the number of cache fills and can pay off.

You should use the native compiler 'int' type *unless* you have reason
to believe there will be excessive L1 cache thrashing. In that case,
check whether switching to shorts for arrays is faster. If it is, take
steps to force type widening at load time, rather than letting it
happen implicitly during calculations.

-- 
Paul Shirley

  Copyright © 2019   by DJ Delorie     Updated Jul 2019