Message-ID: <32BCFB1F.4FD6@gbrmpa.gov.au>
Date: Sun, 22 Dec 1996 17:10:55 +0800
From: Leath Muller
Reply-To: leathm AT gbrmpa DOT gov DOT au
Organization: Great Barrier Reef Marine Park Authority
MIME-Version: 1.0
To: Manuel Kessler
CC: djgpp AT delorie DOT com
Subject: Re: Is DJGPP that efficient?
References: <199612161347 DOT IAA01261 AT delorie DOT com> <32B8749B DOT 6DFD AT nlc DOT net DOT au>
 <32B8ECAF DOT 5F9F AT gbrmpa DOT gov DOT au> <59bopp$vn3 AT winx03 DOT informatik DOT uni-wuerzburg DOT de>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

> :> Well, I have the Pentium Programmers Manual sitting in front of me in
> :> Acrobat, and it says it _does_ do 3 cycles per mul. If you want proof
> :> of the speed, look at Quake. Even Abrash said he couldn't get the same
> :> performance out of the pentium with fixed point as he could with
> :> floating point.

> I have no manuals at my hands, but i KNOW that the pentium is capable of
> doing one fmul EVERY cycle, because i DID it. For serious problems you
> don't get that throughput, but something around 2 cycles per flop (fmul
> or fadd/fsub) is possible, if no memory is slowing things down. See the
> BLAS homepage at
> http://cip.physik.uni-wuerzburg.de/~mlkessle/blas1.html
> For simple functions like dot product of short vectors coming out of the
> L1 cache it's possible to achieve 79 MFLOP at a P-133. This gives one
> fpu result every 1.6 cycles. Latency for both fmul and fadd is three cycles,
> therefore you have to use heavily fxch, but it's mostly free anyway.
> Of course, it's not very easy to get that performance, but it's
> possible.

And as Lord Shaman says:

> Anyway, even you are wrong, it's 3 clocks for the first mul, if the next
> FP operation is a mul, it goes through in 1 clock.

Which sounds about right... but then I thought this was in the lower
precision modes. Maybe I should go check that... :)

Leathal.
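
For anyone who wants to check the arithmetic quoted above, here is a rough C sketch. The function names cycles_per_flop and dot2 are made up for illustration and come from nobody's actual BLAS code. The first function just divides clock rate by measured MFLOP/s; for a P-133 at 79 MFLOP that works out to roughly 1.68 cycles per result, a touch more than the 1.6 quoted. The second shows the general idea of keeping two independent sums going so that one multiply-add chain does not have to wait out the 3-cycle latency of the other. Whether a given compiler schedules this well with fxch on a Pentium is an assumption and would need measuring.

```c
#include <stddef.h>

/* Cycles per floating-point result, given the clock rate in MHz and a
 * measured rate in MFLOP/s. Hypothetical helper, for illustration only. */
double cycles_per_flop(double clock_mhz, double mflops)
{
    return clock_mhz / mflops;
}

/* Dot product with two independent accumulators. The two sums do not
 * depend on each other, so their operations can overlap in a pipelined
 * FPU instead of each add waiting on the previous result.
 * Note: splitting the sum changes the order of additions, so rounding
 * may differ slightly from a single-accumulator loop. */
double dot2(const double *x, const double *y, size_t n)
{
    double s0 = 0.0, s1 = 0.0;
    size_t i;

    for (i = 0; i + 1 < n; i += 2) {
        s0 += x[i] * y[i];
        s1 += x[i + 1] * y[i + 1];
    }
    if (i < n)              /* leftover element when n is odd */
        s0 += x[i] * y[i];

    return s0 + s1;
}
```

Calling cycles_per_flop(133.0, 79.0) gives the cycles-per-result figure for the numbers in the post; timing dot2 on short vectors that fit in L1 would be the way to see how close a particular build actually gets.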