Mail Archives: djgpp/1997/03/01/10:01:54

From: ao950 AT FreeNet DOT Carleton DOT CA (Paul Derbyshire)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Loop unrolling: Don't bother
Date: 1 Mar 1997 06:36:49 GMT
Organization: The National Capital FreeNet
Lines: 43
Message-ID: <5f8iq1$cor@freenet-news.carleton.ca>
References: <5f5knj$cho AT freenet-news DOT carleton DOT ca> <5f61hc$nkg AT flex DOT uunet DOT pipex DOT com> <331702EF DOT 8EA AT rpi DOT edu>
Reply-To: ao950 AT FreeNet DOT Carleton DOT CA (Paul Derbyshire)
NNTP-Posting-Host: freenet2.carleton.ca
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp

Brian Osman (osmanb AT rpi DOT edu) writes:
> nikki wrote:
>> 
>> hardly a great surprise seeing as the loop above would quite probably fit in
>> the cache when well optimised, but unrolled would thrash it horribly.
>> unrolling loops won't save an enormous amount of time, after all a jump
>> instruction will only take you 3 or 4 cycles at most.
>> 
>> nik
>> 
>> --
>> Graham Tootell
>> nikki AT gameboutique DOT com
> 
> Bear in mind that in many of the newer processors (e.g. PPro) which
> use predictive branching, branches are one of the single worst
> instructions. A mispredicted branch means that all of the pipeline,
> and the cache has to be invalidated and flushed. Not pretty.
> There are some cases where loop unrolling won't help much, but
> it's still a valid and useful optimization technique. I don't
> suppose -O3 is causing any unrolling? :)

No, -O3 does inlining but not unrolling. And I don't trust -funroll-loops
or -funroll-all-loops, because they might unroll loops whose iteration
counts are only known at run time, and such loops will die very horribly if
they run a few times more than they are supposed to, e.g. by accessing a
malloc'ed array out of bounds. So, I did it manually.
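
A minimal sketch of hand-unrolling a loop whose count is only known at run
time, without ever stepping past the end of the array. The function name
sum_unrolled, the unroll factor of 4 and the summing workload are all
assumptions made for illustration; none of them come from the program
discussed in this thread.

```c
#include <stddef.h>

/* Hypothetical example: sum n ints, unrolled by 4.
 * n is only known at run time. */
long sum_unrolled(const int *a, size_t n)
{
    long total = 0;
    size_t i = 0;

    /* Unrolled body: runs only while a full group of 4 remains,
     * so it can never read past a[n - 1]. */
    for (; i + 4 <= n; i += 4) {
        total += a[i];
        total += a[i + 1];
        total += a[i + 2];
        total += a[i + 3];
    }

    /* Remainder loop: picks up the 0 to 3 leftover elements. */
    for (; i < n; i++)
        total += a[i];

    return total;
}
```

The remainder loop is what keeps the unrolled version in bounds when n is
not a multiple of the unroll factor.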

#define XLOOP          stuff in the innermost loop
#define YLOOP        outer loop stuff and about twenty of
                XLOOP;XLOOP;XLOOP etc.
#define ZLOOP      about 14 of YLOOP; YLOOP etc.

(All macros are made "function-like" by leaving the semicolon off the last
statement, so that ZLOOP; etc. is correct syntax when expanded.)
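
The semicolon convention can be sketched on a much smaller scale as follows.
The macro bodies, the repeat counts (4 and 2 rather than about twenty and
about 14) and the function sum8 are hypothetical stand-ins, since the real
loop contents are not shown in this post.

```c
#include <stddef.h>

/* Hypothetical innermost work: add one element and advance. */
#define XLOOP  total += *p++

/* Four copies of XLOOP; no semicolon after the last one. */
#define YLOOP  XLOOP; XLOOP; XLOOP; XLOOP

/* Two copies of YLOOP, i.e. eight copies of XLOOP in total. */
#define ZLOOP  YLOOP; YLOOP

/* Sum exactly 8 ints; the count is fixed at compile time. */
long sum8(const int *p)
{
    long total = 0;
    ZLOOP;              /* the call site supplies the final semicolon */
    return total;
}
```

One caveat with this style: a macro that expands to several statements
misbehaves after an unbraced if or else, because only the first statement
falls under the condition. Wrapping the body in do { ... } while (0) is the
usual way around that.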


--
    .*.  Where feelings are concerned, answers are rarely simple [GeneDeWeese]
 -()  <  When I go to the theater, I always go straight to the "bag and mix"
    `*'  bulk candy section...because variety is the spice of life... [me]
Paul Derbyshire ao950 AT freenet DOT carleton DOT ca, http://chat.carleton.ca/~pderbysh

  Copyright © 2019   by DJ Delorie     Updated Jul 2019