Date: Fri, 11 May 2001 19:53:17 +0300
From: "Eli Zaretskii"
Sender: halo1 AT zahav DOT net DOT il
To: Michiel de Bondt
Message-Id: <3277-Fri11May2001195316+0300-eliz@is.elta.co.il>
X-Mailer: Emacs 20.6 (via feedmail 8.3.emacs20_6 I) and Blat ver 1.8.9
CC: djgpp AT delorie DOT com
In-reply-to: <3AFBF8AB.C42331EB@sci.kun.nl> (message from Michiel de Bondt on Fri, 11 May 2001 16:35:23 +0200)
Subject: Re: how to use inline push and pop
References: <3AFBF8AB DOT C42331EB AT sci DOT kun DOT nl>
Reply-To: djgpp AT delorie DOT com
Errors-To: nobody AT delorie DOT com
X-Mailing-List: djgpp AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com
Precedence: bulk

> From: Michiel de Bondt
> Newsgroups: comp.os.msdos.djgpp
> Date: Fri, 11 May 2001 16:35:23 +0200
>
> > There's a popular belief that recursive code is terribly slow, but
> > experience shows that this is mostly a myth. Recursive code _might_ be
> > slow, but in many cases it isn't. Because recursive code is usually
> > smaller, it fits better into the CPU caches. It is also simpler, so
> > bugs are less likely, and it lends itself better to compiler
> > optimizations.
>
> I have once seen the opposite: faster recursive code.

Yes, that's what I was saying as well.

> What do you mean by profile? Fine-tuning the code within the
> language itself?

No, I mean use the profiler. Compile and link the program with the
"-pg" compiler switch, then run it, and when it exits, run gprof, the
profiler which is part of the Binutils distribution. It will show you
where your program spends most of its time. If that place is not in
the code you are trying to inline, you are wasting your time.

> I discovered that the base pointer can be used as well, with
> -fno-frame-pointer. This makes an extra register available and my
> code can be sped up in another way.

Yes, this is another optimization switch that you should try; note
that GCC spells it -fomit-frame-pointer.
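To make the profiling advice concrete, here is a minimal sketch of the kind of comparison one might profile: a recursive and an iterative version of the same computation. The function names are hypothetical, and the build commands in the comment assume GCC and gprof as described above; you would still need to add a main() that calls the functions on an array large enough to show up in the profile.

```c
/* Hypothetical example for trying out gprof; the names are
   illustrative, not taken from the original thread.

   Build and profile (under DJGPP the executable is prog.exe):
     gcc -O2 -pg prog.c -o prog
     ./prog              (writes gmon.out when the program exits)
     gprof prog gmon.out
   A main() that calls both functions on a large array is assumed. */
#include <stddef.h>

/* Recursive sum of a[0..n-1]: short and simple, one call per element. */
long sum_recursive(const int *a, size_t n)
{
    return n == 0 ? 0 : a[n - 1] + sum_recursive(a, n - 1);
}

/* Iterative sum of a[0..n-1]: same result, written as a loop. */
long sum_iterative(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

At -O2 GCC may turn the recursive version into a loop by itself, so it is worth comparing the generated code (gcc -O2 -S) as well as the profile before deciding which one is "slow".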
> I started using many Intel inline asms when I discovered that my C
> statements were not translated to the one-liners I had in
> mind. The code gcc generates looks terrible.
> See e.g. the following example:
>
> C code:
>   T.Byte += dd[2];
>   (union {long Long; unsigned char Byte; } T;)
>
> gcc output:
>   movb %cl, %al
>   addb _dd+2, %al
>   movb %al, %cl
>
> one-liner:
>   addb _dd+2, %cl

You did compile this with optimizations, yes? And you do have the
latest GCC version, right? Also, did you use the -march=pentium
option?

You should also look at how long these instructions take to execute.
Sometimes code that looks poor actually runs faster.
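A minimal sketch of how the example above could be written both ways, so the generated code can be compared: once as plain C left to the optimizer, and once with a GCC extended-asm one-liner. The union follows the one in the question; the element type of dd (unsigned char) is an assumption suggested by the addb instruction, and the function names add_c and add_asm are hypothetical. The asm path is compiled only for GCC on x86 and falls back to plain C elsewhere.

```c
/* Sketch only: dd is assumed to hold unsigned char elements, since
   the thread's "addb _dd+2" reads a single byte. */
typedef union {
    long Long;
    unsigned char Byte;  /* overlays the lowest-addressed byte of Long,
                            i.e. the low-order byte on little-endian x86 */
} ByteLong;

unsigned char dd[4];

/* Plain C: adds dd[2] to the low byte of value, as "T.Byte += dd[2]"
   does, and lets the optimizer pick the instructions. */
long add_c(long value)
{
    ByteLong t;
    t.Long = value;
    t.Byte += dd[2];
    return t.Long;
}

/* Same operation with a hand-written addb via GCC extended asm:
   "+q" asks for a byte-addressable register holding t.Byte (read and
   written), "m" passes dd[2] as a memory operand, "cc" notes that the
   flags are modified. */
long add_asm(long value)
{
    ByteLong t;
    t.Long = value;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("addb %1, %0" : "+q"(t.Byte) : "m"(dd[2]) : "cc");
#else
    t.Byte += dd[2];
#endif
    return t.Long;
}
```

Compiling with gcc -O2 -S (adding -march=pentium on a 32-bit x86 target) and reading the .s file shows what the optimizer really produces for add_c; timing both versions then tells whether the hand-written instruction is worth the trouble.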