Date: Thu, 3 Feb 2000 13:46:59 +0100
From: Jan Hubicka
To: pgcc AT delorie DOT com
Subject: Re: pgcc and egcs alignment -- function, basic block and string
Message-ID: <20000203134659.G12247@atrey.karlin.mff.cuni.cz>
References: <38921CD6 DOT 2A725779 AT ix DOT netcom DOT com> <20000129032101 DOT A25630 AT atrey DOT karlin DOT mff DOT cuni DOT cz> <20000129143245 DOT F1024 AT cerebro DOT laendle>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0i
In-Reply-To: <20000129143245.F1024@cerebro.laendle>; from marc@gimp.org on Sat, Jan 29, 2000 at 02:32:45PM +0100
Reply-To: pgcc AT delorie DOT com
Errors-To: dj-admin AT delorie DOT com
X-Mailing-List: pgcc AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com
Precedence: bulk

> On Sat, Jan 29, 2000 at 03:21:01AM +0100, Jan Hubicka wrote:
> > It is. Consider memset/memcpy/strlen expanders. These can work
> > much better when they know that destination is word size aligned.
>
> source you meant ;)

Well, currently the expanders align the destination, except for
strlen (where a destination does not make sense).

> > Again Intel Optimizing Manual recommends this. I believe Intel did some experiments
>
> Well, doing some experiments yourself (as you said) does not hurt
> either (if you have a pentium). While intel recommends relatively large
> alignments, "common knowledge" (linus for example ;) recommends no
> alignments at all.
>
> In _my_ tests large alignment is a very very slight win, but in the real
> world, the increased code size might not be worth it (cache effect, long
> nops, AGI because of lea-nops).
>
> It's a must on 486, though, and a bit better on ppro and later.

Yes, that's the problem. I've measured small wins from alignment in my
benchmarks. But changing the policy needs extensive testing to prove
that Intel is wrong, and I don't think I have enough knowledge or time
to do that.

The current alignment scheme (4,,7) doesn't do much padding on average
(it is 29/16 = 1.81 bytes on average, if I am not mistaken), so I
believe it is not too expensive. The problem is with chained alignments
(where one alignment forces another one). This happens in switches,
where the code is slightly larger than 7 bytes, in loops with tiny
internal loops, where padding of the internal loop hurts, etc.

I've made an experiment with code that shortens alignments that are too
large (i.e. when the loop or block they precede is shorter than the
suggested alignment, the alignment is shortened), and with this
optimization I've got very good results even with more aggressive
alignments on the AMD-Athlon (but it has large caches, so the situation
on Pentiums may be different). The code is available in the egcs mailing
list archives under the "shorten alignments" keyword.

Another experiment I did is on the K6 (which is very touchy about
alignments that are too large), where I implemented a strategy of
aligning loops at least two instructions from the boundary (to not
stall decoding). This patch is also available in the mailing list
archives, somewhere around August, and brings interesting improvements
on that platform. Possibly a similar strategy can apply to PPro/Athlon
as well, but I don't know the exact penalties for decoding near the end
of a boundary there.

Also, egcs now has fresh code for static branch prediction. I would like
to extend it to be able to predict the number of executions of each
basic block (and thus the number of repetitions of loops). (This can be
done easily when we are driven by profiler output, but I am not sure how
to implement it using the prediction algorithms. I may use Markov chains
to compute the expected number of iterations from the branch
probabilities gcc generates, but I doubt it is useful. References to
papers and other docs are welcome. In the end I would like to converge
to a flowgraph with "expected number of repetitions" values for each
edge and basic block.) This may then be used to decide whether alignment
of a basic block is a good idea or not, and to emit far fewer alignments
than we do currently.

Honza

>
> -- 
>       -----==-                                                    |
>       ----==-- _                                                  |
>       ---==---(_)__  __ ____  __       Marc Lehmann             +--
>       --==---/ / _ \/ // /\ \/ /       pcg AT opengroup DOT org |e|
>       -=====/_/_//_/\_,_/ /_/\_\       XX11-RIPE                --+
>     The choice of a GNU generation                              |
>                                                                 |
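
As a rough check of the average-padding estimate in the message, here is a minimal sketch. It assumes that "(4,,7)" means the GNU assembler directive `.p2align 4,,7` (align to 16 bytes, but only when at most 7 padding bytes are needed) and that code offsets are uniformly distributed modulo 16. The function names are made up for illustration.

```python
from fractions import Fraction


def padding(offset: int, log2_align: int, max_skip: int) -> int:
    """Padding bytes emitted at `offset` for an align-with-max-skip request.

    Assumed semantics: if reaching the next 2**log2_align boundary needs
    more than `max_skip` bytes, no padding is emitted at all.
    """
    align = 1 << log2_align
    needed = (-offset) % align  # bytes to the next boundary (0 if already aligned)
    return needed if needed <= max_skip else 0


def average_padding(log2_align: int, max_skip: int) -> Fraction:
    """Mean padding over all offsets modulo the alignment (uniform assumption)."""
    align = 1 << log2_align
    total = sum(padding(off, log2_align, max_skip) for off in range(align))
    return Fraction(total, align)


print(average_padding(4, 7))   # 7/4, i.e. 1.75 bytes
print(average_padding(4, 15))  # 15/2, i.e. 7.5 bytes for unconditional alignment
```

Under these assumptions the average is 28/16 = 1.75 bytes, close to the 29/16 quoted in the message; the small difference may come down to how the cases are counted. Unconditional 16-byte alignment would cost 7.5 bytes on average in the same model.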
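
The "shorten alignments" idea can be illustrated with a small sketch. This is not the code from the egcs archives, only one hypothetical reading of the description: if the block that follows is shorter than the requested alignment, reduce the alignment to the smallest power of two that still covers the block, since a block of at most 2^k bytes placed on a 2^k boundary cannot straddle a larger power-of-two boundary either. The function name and the byte-size input are assumptions.

```python
def shortened_log2_align(block_size: int, requested_log2: int) -> int:
    """Return a possibly reduced log2 alignment for a block of `block_size` bytes.

    Hypothetical rule: never align more than the smallest power of two that
    is >= block_size, and never more than what was requested.
    """
    if block_size <= 1:
        return 0  # a one-byte block cannot straddle any boundary
    covering_log2 = (block_size - 1).bit_length()  # smallest k with 2**k >= block_size
    return min(requested_log2, covering_log2)


# A 5-byte inner loop with a requested 16-byte alignment only needs 8-byte alignment.
print(1 << shortened_log2_align(5, 4))
# A 40-byte block keeps the requested 16-byte alignment.
print(1 << shortened_log2_align(40, 4))
```

With a rule like this, tiny blocks and inner loops cost at most a few bytes of padding, which is the chained-alignment case the message singles out as the expensive one.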
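
The Markov-chain idea mentioned in the message can be sketched as follows. Treat the flowgraph as an absorbing Markov chain whose transition probabilities are the predicted branch probabilities; the expected number of executions x of each block then satisfies x = e_entry + P^T x. The code below solves that linear system exactly with fractions. The graph representation and names are assumptions for illustration, not gcc data structures, and the sketch assumes every loop has a non-zero exit probability (otherwise the system is singular).

```python
from fractions import Fraction


def expected_visits(edges, entry):
    """Expected executions per block.

    edges: {block: {successor: probability}}; every block, including exit
    blocks (with an empty successor dict), must appear as a key.
    """
    blocks = sorted(edges)
    idx = {b: i for i, b in enumerate(blocks)}
    n = len(blocks)
    # Augmented matrix for (I - P^T) x = e_entry.
    a = [[Fraction(0)] * (n + 1) for _ in range(n)]
    for i in range(n):
        a[i][i] = Fraction(1)
    for src, succs in edges.items():
        for dst, prob in succs.items():
            a[idx[dst]][idx[src]] -= Fraction(prob)
    a[idx[entry]][n] = Fraction(1)
    # Gauss-Jordan elimination with exact arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [rv - factor * cv for rv, cv in zip(a[r], a[col])]
    return {b: a[idx[b]][n] for b in blocks}


# A single loop whose back edge is predicted taken with probability 9/10.
simple = {
    "entry": {"loop": Fraction(1)},
    "loop": {"loop": Fraction(9, 10), "exit": Fraction(1, 10)},
    "exit": {},
}
print(expected_visits(simple, "entry")["loop"])  # 10
```

For a single loop this reduces to the familiar 1/(1 - p) for a back edge taken with probability p; the linear system handles nested loops and arbitrary flowgraphs in the same way. Per-edge counts follow by multiplying a block's count by the probability of the outgoing edge.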