Message-Id:
Comments: Authenticated sender is
From: "Salvador Eduardo Tropea (SET)"
Organization: INTI
To: djgpp AT delorie DOT com
Date: Mon, 22 Dec 1997 12:18:13 +0000
MIME-Version: 1.0
Content-type: text/plain; charset=US-ASCII
Content-transfer-encoding: 7BIT
Subject: Bad limitation in BNU Aligment
Precedence: bulk

/*
Hi All:

  I think there is a strong limitation (bug?) in djgpp's BNU 2.8.1 (I think
2.7 is the same):

Problem:
  BNU can't align to a 64-bit (or larger) boundary.

How to see it:
  Just ask for .align 3 or .balign 8 and you'll get only 32-bit alignment.
(Of course you can be lucky enough to land on a 64-bit boundary, but not in
my example/case.)

How it affects the code:
  My inline assembler shows that my CPU executes the same piece of code at
3 different speeds depending on whether I:
1) Align to 32 bits in the second "32 bits nibble" of a 64-bit boundary.
2) Align to 32 bits in the first "32 bits nibble" of a 64-bit boundary.
3) Align to a 64-bit boundary.
  As an example, one of my routines gives:
1) 367 ticks
2) 311 ticks
3) 300 ticks
(a 22% difference between the slowest and the fastest!!!)
Note: I even suspect that the right alignment for the Cx5x86 is in fact 128
bits, because the internal bus (cache to CPU) is 128 bits wide.

How I found it:
  I was testing 3 versions of the routines and the speeds were totally crazy:
the better optimized routines reported worse speeds. After figuring out that
the speed changed just by commenting out one of the routines, I started to
look for what the hell was going on.

The following code shows the misalignment:
(Pepe==foo in my language ;-)
*/
#include <stdio.h>

int main(int argc, char *argv[])
{
 unsigned char *pp;

 /* Ask for 8-byte alignment, then load the label's address into EAX. */
 asm ("\n"
      " .align 3\n"
      "Pepe:\n"
      " movl $Pepe,%%eax\n"
      : "=a" (pp) );
 /* Print the address and its low 3 bits (0 means 64-bit aligned). */
 printf("%X (%u)\n",(unsigned)pp,((unsigned)pp) & 7);
 return 0;
}

/*
The bug(?) is in the linker and not in AS. I tried .balign 16 in a .s file,
assembled it with as and then disassembled it with objdump -d. The .o file is
correctly aligned. But if I make an exe with this file the alignment is
totally broken.
Looking deeper I saw the source of the problem:

*LD aligns to 32 bits when it joins .o files*

  That's the problem: GAS assembles every .o file as if it will start at
address 0 (fully aligned for anything), BUT since ld aligns each .o file to
only 32 bits (like adding .align 2 at the end of the .s file) you can't get
more than that.

  Now: Is there any way to configure that? (The problem is serious because it
can destroy Pentium optimizations.)

  Currently I'm using a workaround that is a little tricky:

1) I declare all the functions that need alignment in a separate section
(.setali). For that we need the section attribute enabled.
2) After each function that I send to this section I add a macro that expands
to asm(".balign 16");
3) I tweaked djgpp.djl to put the .setali section in the code segment, 128
bits aligned with respect to the last section inside the code.

  That works very well but needs a modified gcc and special specs and
djgpp.djl files (specs to be sure that ld isn't using the built-in script).
As an advantage, it wastes memory only in the special section and not in the
whole program.

SET
*/
------------------------------------ 0 --------------------------------
Visit my home page: http://www.geocities.com/SiliconValley/Vista/6552/
Salvador Eduardo Tropea (SET). (Electronics Engineer)
Alternative e-mail: set-sot AT usa DOT net - ICQ: 2951574
Address: Curapaligue 2124, Caseros, 3 de Febrero
Buenos Aires, (1678), ARGENTINA
TE: +(541) 759 0013