From: buers AT gmx DOT de (Dieter Buerssner)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: [long] gcc performance and possible bug
Date: 9 Mar 2000 02:02:18 GMT
Lines: 27
Message-ID: <8a70n9$34mhh$1@fu-berlin.de>
References: <8a65uu$39fkt$1 AT fu-berlin DOT de> <38C6B414 DOT 2D67E404 AT inti DOT gov DOT ar>
NNTP-Posting-Host: pec-44-99.tnt3.s2.uunet.de (149.225.44.99)
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-Trace: fu-berlin.de 952567338 3299889 149.225.44.99 (16 [17104])
X-Posting-Agent: Hamster/1.3.13.0
User-Agent: Xnews/03.02.04
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

salvador) wrote:

>K6 CPUs have a "bug" related to alignment. If some memory address is in a
>0xNNNNNC, you'll have a big penalty to read it. 0, 4 and 8 are ok,
>but C is the worst case (by far), double check you are not hitting this
>limitation.

Do you mean code alignment, data alignment or both?

Anyway, I edited the gcc -O2 -S output of the slower-running version
of my program (with const), changed the .p2align 2 statements to
.p2align 4 (16 byte) for zseed, mul and mwc32 (I think these are
all the data and code alignments that could contribute to the large
performance difference), and recompiled. The program ran faster,
but there was still an order of magnitude difference between the
const and the non-const version. I also double-checked the alignments
with fsdb and objdump (thanks to Hans-Bernhard Broeker for pointing
the objdump method out to me). zseed, mul and mwc32 were 16-byte aligned.

If you have the time and the interest, please try to compile
the source I sent and run the executable. It should take less than
five minutes.

Regards,
Dieter