Date: Sun, 30 Jan 2000 01:14:44 +0100
From: Jan Hubicka
To: Chris Sears
Cc: pgcc AT delorie DOT com
Subject: Re: pgcc and egcs alignment -- function, basic block and string
Message-ID: <20000130011444.A32728@atrey.karlin.mff.cuni.cz>
References: <38921CD6 DOT 2A725779 AT ix DOT netcom DOT com> <20000129032101 DOT A25630 AT atrey DOT karlin DOT mff DOT cuni DOT cz> <38927310 DOT 2033EED4 AT ix DOT netcom DOT com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0i
In-Reply-To: <38927310.2033EED4@ix.netcom.com>; from cbsears@ix.netcom.com on Fri, Jan 28, 2000 at 08:56:48PM -0800
Reply-To: pgcc AT delorie DOT com
Errors-To: dj-admin AT delorie DOT com
X-Mailing-List: pgcc AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com
Precedence: bulk

> Is there a switch to turn this alignment off so that I could test it?
> -mcode-align?  Or does this turn off alignment of entry points
> as well?

There are switches -malign-jumps and -malign-loops that may do what you want.

> > > In pgcc strings are being aligned to cache lines.
> > > But is alignment even necessary for strings?
> > It is. Consider memset/memcpy/strlen expanders. These can work
> > much better when they know that destination is word size aligned.
>
> I didn't quite understand this.  The string alignment now is to a
> cache line.
>
>         .file   "ioport.c"
>         .version        "01.01"
> gcc2_compiled.:
>         .section        .rodata
> .LC0:
>         .string "eip: %p\n"
>         .align 32
> .LC1:
>         .string "/home/chris/linux/include/asm/spinlock.h"
>
> Admittedly, a cache line is word aligned as well,
> but wouldn't .align 4 suffice to align to a word boundary?

Yes. Richard and I were discussing this recently and we will probably change
this bit. The rationale behind it is to place a string into as few cache lines
as possible (when a string starts near the end of a cache line, it may cross
into one extra line). But this needs some tuning.

> If possible could you send me email telling me what happened.
I haven't had time to test it yet (I am preparing for an exam and I've done
another 100Kb patch to the function calling code, ehh..). I will do that
tomorrow :) and let you know.

> > > So in summary, I think that functions should be aligned to cache lines
> > > and that basic blocks and strings should not be aligned at all.
> > Gcc don't align every basic block. It uses alignments for top of loops, where
> > the alignment to ifetch block is necessary. Top of loop appearing at the very
> > end of ifetch blocks may cause stalls in the decoding process IMO.
> > Second alignment is done after barriers, where situation is in many points
> > of view equivalent to function entry point.
>
> The .p2align 4,,7 is deceptively misleading.  It could probably be better
> read as .align 8 as the 7 represents a limit of 7 nops, which gas usually
> replaces with a do nothing leal and a nop.
>
> So given that this can happen in four cases in a 32 byte cache line:
>
>     bytes 0-7   + 7 gets aligned to bytes 7-15   -- alignment not done
>     bytes 8-15  + 7 gets aligned to 16           -- alignment to 16
>     bytes 16-23 + 7 gets aligned to 23-31        -- alignment not done
>     bytes 24-31 + 7 get aligned to 32            -- alignment to 32
>
> So half of the time it isn't being aligned anyways.  In the second case,
> it seems a waste since the icache line will be in the buffer.  No point.
> In the fourth case, I can see a point, especially if there is an jmp
> instruction and no nops will be executed.

The second case brings speedups similar to the fourth case, at least on CPUs
without out-of-order execution. I will do some tests of how this behaves on
the Pentium. But I believe that code starting near the end of the 16-byte
prefetch buffer will cause stalls too.

> > Aligning to 16 byte boundary can be quite good tradeoff between code size
> > and cache line fetching efficiency. While function starting near end of
> > cache line is catastrophical, function starting in the middle of it is not
> > so bad.
> > Again Intel Optimizing Manual recommends this. I believe Intel did some experiments
> > before saying so.
>
> 16 byte alignment for functions trades memory against cache footprint.
> I would strongly prefer cache and I would urge someone to look at this.
> In this case, I wouldn't take Intel's word.

The problem is that you need to align the function to the largest alignment
used in the code. If you use 32-byte alignment for loops, you need 32-byte
alignment for functions as well. Gas sets the alignment of the whole section
based on the .align directive at the start, and other alignments are done
relative to this value.

At least this is my understanding of things. I may be mistaken in this case.

Honza
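
The padding arithmetic discussed in this thread can be checked with a small sketch. It is only an illustration, not gas or gcc code: the helper names `p2align_padding` and `cache_lines_spanned` are hypothetical, and the 32-byte cache line is the size assumed in the thread. The first helper models `.p2align 4,,7` as "pad to the next 16-byte boundary unless that would take more than 7 bytes"; the second counts how many cache lines an object touches at a given start offset.

```python
def p2align_padding(offset, log2_align, max_skip=None):
    """Padding bytes that a directive like `.p2align log2_align,,max_skip`
    would insert at `offset` (hypothetical helper, simplified model)."""
    align = 1 << log2_align
    pad = (-offset) % align          # bytes needed to reach the next boundary
    if max_skip is not None and pad > max_skip:
        return 0                     # too much padding needed: no alignment at all
    return pad


def cache_lines_spanned(offset, length, line_size=32):
    """Number of cache lines touched by `length` bytes starting at `offset`.
    The 32-byte line size is the assumption used in the thread."""
    if length == 0:
        return 0
    first = offset // line_size
    last = (offset + length - 1) // line_size
    return last - first + 1


if __name__ == "__main__":
    # Effect of `.p2align 4,,7` at every offset within one 32-byte line.
    for offset in range(32):
        pad = p2align_padding(offset, 4, 7)
        print(f"offset {offset:2d}: pad {pad} -> next instruction at {offset + pad}")

    # The quoted path string is 40 characters plus a terminating NUL byte.
    size = len("/home/chris/linux/include/asm/spinlock.h") + 1
    for start in (0, 9, 24):
        print(f"string at offset {start:2d} spans "
              f"{cache_lines_spanned(start, size)} cache line(s)")
```

Under this model, padding happens only when the offset is 9 to 15 bytes past a 16-byte boundary, so somewhat less than half of all positions are actually aligned. The second loop shows the case Honza describes for strings: the same string can touch one extra cache line when it starts near the end of a line.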