Mail Archives: djgpp/1996/12/24/10:39:16

From: Paul Shirley <Paul AT foobar DOT co DOT uk DOT chocolat>
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Is DJGPP that efficient?
Date: Sat, 21 Dec 1996 07:00:41 +0000
Organization: wot? me?
Lines: 21
Distribution: world
Message-ID: <aJQDsHAZs4uyEwDL@chocolat.foobar.co.uk>
References: <199612161347 DOT IAA01261 AT delorie DOT com> <32B8749B DOT 6DFD AT nlc DOT net DOT au>
<32B8ECAF DOT 5F9F AT gbrmpa DOT gov DOT au> <59bopp$vn3 AT winx03 DOT informatik DOT uni-wuerzburg DOT de>
Reply-To: Paul Shirley <junk AT defeating DOT email DOT address>
NNTP-Posting-Host: chocolat.foobar.co.uk
Mime-Version: 1.0
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp

In article <59bopp$vn3 AT winx03 DOT informatik DOT uni-wuerzburg DOT de>, Manuel
Kessler <mlkessle AT cip DOT physik DOT uni-wuerzburg DOT de> writes
>Leath Muller (leathm AT gbrmpa DOT gov DOT au) wrote:
>I have no manuals at my hands, but i KNOW that the pentium is capable of 
>doing one fmul EVERY cycle, because i DID it. For serious problems you
>don't get that throughput, but something around 2 cycles per flop (fmul
>or fadd/fsub) is possible, if no memory is slowing things down. See the
>BLAS homepage at

The P5 has a 3 clk latency (the time from issuing an op to retiring
it) and a throughput (the time before another op can be issued) of
1 clk, *unless* you issue consecutive multiplies, in which case it has
a 2 clk throughput.

AFAIK the best multiply throughput you can achieve is 2 clks/mul.
However, in real code you have to load the next operand or sum the
result anyway, which eats up that otherwise wasted cycle. The gcc FPU
code is actually pretty good.

---
Paul Shirley: shuffle chocolat before foobar for my real email address

  Copyright © 2019   by DJ Delorie     Updated Jul 2019