Xref: news2.mv.net comp.os.msdos.djgpp:5927
From: korpela AT albert DOT ssl DOT berkeley DOT edu (Eric J. Korpela)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Setpixel in AT&T inline asm....
Date: 12 Jul 1996 19:49:17 GMT
Organization: Cal Berkeley-- Space Sciences Lab
Lines: 33
Message-ID: <4s6a7t$l0f@agate.berkeley.edu>
References: <4rh0g5$m9r AT twain DOT mo DOT net> <4rj0kb$sf3 AT nef DOT ens DOT fr> <836948271snz AT tsys DOT demon DOT co DOT uk> <4s4sue$qbc AT status DOT gen DOT nz>
NNTP-Posting-Host: albert.ssl.berkeley.edu
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp

In article <4s4sue$qbc AT status DOT gen DOT nz>, Bruce Foley wrote:
>Tom Wheeley wrote:
>
>>Although I've never used setpixel routines, I was always under the impression
>>that (x + y << 8 + y << 6) is faster than (x + 320 * y).
>
>I think this is true of older processors, but on a 486, a
>well designed mul instruction is just as fast (or faster?),
>depending on the value of the operands.
>Don't know about the Pentium though, since simple
>instructions can be useful for keeping both Pipes going.

mul and imul are multicycle, unpairable instructions on the Pentium.  They
execute in the FP unit.  The only hard cycle count I have is that
"imul eax,217" takes 10 cycles.  The number of cycles is variable (I think).

According to the Intel documentation, shift and add breaks even with a
multiply when the constant has 8 or fewer bits set on the Pentium; on the
486 it's 6 or fewer bits set.  320 has only two bits set (256 + 64), so on
both processors a multiply by 320 is always best implemented as shift and add.

Of course, if you are using C, (x + 320*y) should compile to the same code as
(x + (y<<8) + (y<<6)) (on a properly designed compiler).

Eric

--
Eric Korpela                        |  An object at rest can never be
korpela AT ssl DOT berkeley DOT edu |  stopped.
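
The equivalence discussed in the post can be checked with a short C sketch. This is only an illustration, not code from the thread: the function names are hypothetical, and the 320x200 screen size used in the check is an assumption (the post only mentions the factor 320). It also shows why the shifts need their own parentheses: in C, + binds tighter than <<, so the expression written without them computes something else.

```c
#include <assert.h>

/* Hypothetical helper: offset with a plain multiply. The compiler is free
   to choose imul or shifts and adds for the constant 320. */
unsigned offset_mul(unsigned x, unsigned y)
{
    return x + 320u * y;
}

/* Hypothetical helper: offset with explicit shifts.
   320 = 256 + 64 = (1 << 8) + (1 << 6). */
unsigned offset_shift(unsigned x, unsigned y)
{
    return x + (y << 8) + (y << 6);
}

/* How "x + y << 8 + y << 6" is actually parsed: + binds tighter than <<.
   Only meaningful for small y, since the shift count grows with y. */
unsigned offset_as_parsed(unsigned x, unsigned y)
{
    return ((x + y) << (8 + y)) << 6;
}

/* Compare both forms over an assumed 320x200 screen. */
void check_offsets(void)
{
    unsigned x, y;

    for (y = 0; y < 200u; y++) {
        for (x = 0; x < 320u; x++) {
            assert(offset_mul(x, y) == offset_shift(x, y));
        }
    }
}
```

Calling check_offsets() confirms that the two parenthesized forms agree for every pixel in that range. Whether a given compiler really emits the same instructions for both can be checked by comparing its assembly output, for example with gcc -S at the optimization level you build with.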