From: Paul Shirley
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Is DJGPP that efficient?
Date: Sat, 21 Dec 1996 07:00:41 +0000
Organization: wot? me?
Lines: 21
Distribution: world
Message-ID:
References: <199612161347 DOT IAA01261 AT delorie DOT com> <32B8749B DOT 6DFD AT nlc DOT net DOT au> <32B8ECAF DOT 5F9F AT gbrmpa DOT gov DOT au> <59bopp$vn3 AT winx03 DOT informatik DOT uni-wuerzburg DOT de>
Reply-To: Paul Shirley
NNTP-Posting-Host: chocolat.foobar.co.uk
Mime-Version: 1.0
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp

In article <59bopp$vn3 AT winx03 DOT informatik DOT uni-wuerzburg DOT de>, Manuel Kessler writes
>Leath Muller (leathm AT gbrmpa DOT gov DOT au) wrote:
>I have no manuals at my hands, but i KNOW that the pentium is capable of
>doing one fmul EVERY cycle, because i DID it. For serious problems you
>don't get that throughput, but something around 2 cycles per flop (fmul
>or fadd/fsub) is possible, if no memory is slowing things down. See the
>BLAS homepage at

The P5 has a 3 clk latency (the time from issuing an op to retiring it)
and a throughput (the time before another op can be issued) of 1 clk,
*unless* you issue consecutive multiplies, in which case it has a 2 clk
throughput. AFAIK the best multiply throughput you can achieve is
2 clks/mul. However, in real code you have to load the next operand or
sum the result anyway, which eats up that otherwise wasted cycle.

The gcc fpu code is actually pretty good.

---
Paul Shirley: shuffle chocolat before foobar for my real email address