Date: Wed, 19 Jan 2000 11:31:06 +0200 (IST)
From: Eli Zaretskii
X-Sender: eliz AT is
To: Dieter Buerssner
cc: djgpp AT delorie DOT com
Subject: Re: gcc optimization (Was: Executable size: limit to acceptability?)
In-Reply-To: <8623d1$26n4p$1@fu-berlin.de>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Reply-To: djgpp AT delorie DOT com
Errors-To: dj-admin AT delorie DOT com
X-Mailing-List: djgpp AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com
Precedence: bulk

On 18 Jan 2000, Dieter Buerssner wrote:

> My CPU is AMD K6-2 266.

I don't know anything about K6. AFAIK, GCC's code is optimized towards
Intel's recommendations; I don't know how well these fit K6.

> gcc 2.9.2: flags -fomit-frame-pointer -ffast-math + indicated flags

You mean 2.95.2, right?

>          -On   -mcpu=k6 -On   -march=k6
> -O     86383          92070       92070
> -O2    85852          86966       87009
> -O3    81476          89791       89814
> -O6    81421          89833       89818
>
> In all three cases -O produces the fastest code.

The differences are small enough to be explained by alignment. I
suggest looking at the code (disassemble it inside a debugger) to see
how many targets of jmp and call instructions are misaligned. Intel
recommends aligning them on 16-byte boundaries, unless that would take
more than 7 bytes of padding. GCC 2.95.2 emits the correct alignment
directives (.balign 16,,7), but your Binutils mess that up, because
each .o file is aligned on a 4-byte boundary instead of a 16-byte one.
In effect, you are disrupting the CPU's prefetch queues, which can have
a significant effect on performance.

> The produced code runs slower than code produced with gcc 2.6.3!
> The same was true for my old 486 66 and 386SX when comparing
> newer versions of gcc with 2.6.3.

You need to experiment with more optimization options than just -mcpu
and -march. GCC has lots of different optimization options, and -O2
turns on almost all of them; you should try to selectively turn on
only some of them.
Section 14.2 of the FAQ refers to this, although it's probably not
up-to-date yet with the latest GCC releases.

Also, GCC tries very hard to align the stack on an 8-byte boundary, and
that causes it to emit a lot of stack-alignment instructions
(subl $4,%esp etc.). This could lose big time if your program doesn't
need this alignment. I suggest experimenting with the alignment-related
options.

> My conclusion is, to useally use -O only, and to still have
> an old version of gcc around.

There's nothing wrong with this conclusion, but I think there's lots
more to check before this conclusion is general enough.
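
To make the alignment point above concrete, here is a minimal C sketch
of the arithmetic involved. It is an illustration only, not code from
GCC or Binutils: the function names (balign_pad, final_address,
is_aligned) and the example addresses are made up, and it assumes the
boundary is a power of two. It models what ".balign 16,,7" asks for
(pad to the next 16-byte boundary, but only if that takes at most 7
bytes), and why a label that is 16-byte aligned relative to the start
of its section can still end up misaligned if the section itself is
placed on a mere 4-byte boundary.

```c
#include <stdint.h>

/* Padding that ".balign boundary,,max_skip" would insert at addr.
   Returns 0 when addr is already aligned, and also when more than
   max_skip bytes would be needed (the directive then does nothing).
   Assumption: boundary is a power of two. */
uint32_t balign_pad(uint32_t addr, uint32_t boundary, uint32_t max_skip)
{
    uint32_t pad = (boundary - (addr & (boundary - 1))) & (boundary - 1);
    return pad <= max_skip ? pad : 0;
}

/* Address of a label after linking: the assembler only knows the
   offset inside the section; the linker chooses the section base. */
uint32_t final_address(uint32_t section_base, uint32_t offset_in_section)
{
    return section_base + offset_in_section;
}

/* Nonzero if addr is a multiple of boundary (a power of two). */
int is_aligned(uint32_t addr, uint32_t boundary)
{
    return (addr & (boundary - 1)) == 0;
}
```

With these definitions, balign_pad(0x1009, 16, 7) asks for 7 bytes of
padding, while balign_pad(0x1008, 16, 7) asks for none, because 8 bytes
would exceed the limit. A label at offset 0x40 is 16-byte aligned when
its section starts at 0x1000, but not when the section starts at
0x1004, which is the situation described above for .o files aligned on
4 bytes.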