Mail Archives: djgpp/1997/01/06/21:31:32
In article <01bbf68d$e5e607e0$cc2549c2 AT default>,
Thomas Harte <T DOT Harte AT btinternet DOT com> wrote:
>just wanted to add that you can use a shortcut based on the fact that
>256+64 = 320, that 2 to the power of 8 is 256 and 2 to the power of 6 is
>64, and that binary is based around two options. Therefore, using the
>correct bit-shifts, you can change the line to :-
>
> _farpokeb(_dos_ds, 0xA0000 + (y << 8) + (y << 6) + x, colour);
>
> . . . .which is faster.
Sorry bud, GCC is way ahead of you. When optimization is on, it turns
a multiply by a constant 320 into:
    movl _p,%eax              ; load multiplicand (1 cycle)
    leal (%eax,%eax,4),%eax   ; multiply by 5     (1 cycle)
    sall $6,%eax              ; multiply by 64    (2 cycles)
    Total: 4 cycles (i486)
Tested as "return p*320;" w/ -O3 -m486 -fomit-frame-pointer -S
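
For anyone who wants to see why that pair of instructions is a multiply by 320, here is a rough C sketch of the same decomposition (320 = 5 * 64). The function names and the 200-row loop bound (assuming a 320x200 mode) are made up for illustration; they are not from the compiler output above.

```c
#include <assert.h>
#include <stdint.h>

/* Plain multiply, as in the tested "return p*320;". */
uint32_t mul320_plain(uint32_t p)
{
    return p * 320u;
}

/* Hypothetical helper that mirrors the leal/sall pair step by step. */
uint32_t mul320_lea_style(uint32_t p)
{
    uint32_t t = p + (p << 2);  /* like leal (%eax,%eax,4): p + 4*p = 5*p */
    return t << 6;              /* like sall $6: 5*p * 64 = 320*p */
}

/* Self-check over 200 scanlines (assumed 320x200 screen). */
void check_lea_style(void)
{
    uint32_t y;
    for (y = 0; y < 200; ++y)
        assert(mul320_lea_style(y) == mul320_plain(y));
}
```

Calling check_lea_style() from a small main() should pass silently if the decomposition holds.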
Your code turns into:
    movl _p,%eax              ; load shiftee        (1 cycle)
    movl %eax,%edx            ; load second shiftee (1 cycle)
    sall $8,%edx              ; multiply by 256     (2 cycles)
    sall $6,%eax              ; multiply by 64      (2 cycles)
    addl %edx,%eax            ; combine results     (1 cycle)
    Total: 7 cycles (i486)
Tested as "return (p<<8)+(p<<6);" w/ -O3 -m486 -fomit-frame-pointer -S
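
If you want to repeat the comparison yourself, something like the following should do as a test file. It is only a sketch: p is made a global int as a guess based on the "movl _p,%eax" loads in the listings, and the function names and the 200-row loop are invented for the example.

```c
#include <assert.h>

int p;  /* global; guessed from the "movl _p,%eax" loads in the listings */

/* Form 1: let the compiler deal with the constant multiply. */
int times320_multiply(void)
{
    return p * 320;
}

/* Form 2: the hand-written shift decomposition, 256*p + 64*p. */
int times320_shifts(void)
{
    return (p << 8) + (p << 6);
}

/* Both forms should agree for every row of an assumed 320x200 screen. */
void check_forms_agree(void)
{
    for (p = 0; p < 200; ++p)
        assert(times320_multiply() == times320_shifts());
}
```

Compiling this with the -S flags quoted above and reading the resulting .s file shows what your own compiler makes of each form. Newer GCC releases may want -march=i486 in place of -m486, and the output may well differ from the listings here.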
Neither sequence gets any superscalar pairing on the Pentium.
Decomposing into shifts is still a good trick for other (Borland) compilers,
but GCC is no slouch in this department.
Brennan
p.s. Did my announcement of BCD not make it in here? News has been flaky
lately.
--
brennan AT rt66 DOT com | "Developing for Windows is not fun." -- John Carmack
Riomhchlaraitheoir|
Rasterfarian | <http://brennan.home.ml.org> -O