Mail Archives: djgpp/1997/01/20/18:50:19
> well i dug the big book of cycles out today. this is what it says..
>           fdiv    fmul    idiv    imul    div     mul
> 486(7)    8-89    11-27   43/44   42      40      42
> pentium   39-42   1-7     22-46   10/11   17-41   11
Your book disagrees with the information provided by Intel in their
programmer's reference manual. Go to http://www.x86.org and get Acrobat
Reader, and then the PDF file from Intel (it has a link): 241430_4.pdf.
This has everything you need. I recall 3 cycles for an fmul and 10
cycles for an fimul...
> can anyone confirm those values? just on the offchance there's a mistake in my
> book. now it strikes me that rather than do the expensive operation
I would say it's wrong. It gives the impression you can do a simple fmul
in one clock, which isn't true. If you have:
flds _x0;
fmuls _x1;
fstps _result;
Then this will take (1 + 3 + 3) 7 cycles. However, if you have something
like:
flds _x0; // 1
fmuls _x1; // 2 - 4
flds _y0; // 3
fmuls _y1; // 4 - 6
flds _z0; // 5
fmuls _z1; // 6 - 8
fxch %st(2); // free
faddp %st(1); // 7 - 9
faddp %st(1); // 10 - 12
fstps _result; // 12 - 15
Because the fmuls overlap in this dot product routine, each fmul
effectively costs one cycle... (the fstp normally takes 2 cycles, but has
a 1 cycle latency when using the result of the previous operation). I
haven't seen an fmul take 7 cycles, but that may be what it takes on
80-bit ops.
Note: I use fld, fmul etc. with the s (single-precision) suffix rather
than the double-precision one because I store things as floats - the
conversion between the 32-bit float and 80-bit full precision costs no
extra time, so it makes no difference to speed. You just have less accuracy.
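For reference, here is roughly what the dot product routine above computes, written in C. This is only a sketch: the name dot3 and the array arguments are made up for illustration, and whether a given compiler actually schedules the multiplies to overlap like the hand-written version is not guaranteed.

```c
/* Sketch only: dot3 is a name invented here, not from the routine above. */
float dot3(const float a[3], const float b[3])
{
    /* Three independent products: none needs another's result,
       so the multiplies are free to overlap in the FPU pipeline. */
    float px = a[0] * b[0];
    float py = a[1] * b[1];
    float pz = a[2] * b[2];

    /* The additions depend on each other, like the two faddp steps. */
    return (px + py) + pz;
}
```

The point of writing the products as separate temporaries is just to make the independence visible; the sum is still a chain, which is why the faddp instructions cannot overlap with each other.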
> float a,b,c,d,x,y;
> c=x/b;
> d=y/b;
> a=1/b;
> c=x*a;
> d=y*a;
Definitely - the savings are huge...
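Spelled out as a function, the reciprocal trick looks something like this. The name divide_pair and the pointer outputs are my own choices for the sketch, not anything from the quoted code. One caveat: x * (1/b) can differ from x / b in the last bit, since the reciprocal is rounded before the multiply.

```c
/* Sketch only: divide_pair is a hypothetical helper name.
   Computes *c = x / b and *d = y / b using one division and two multiplies. */
void divide_pair(float x, float y, float b, float *c, float *d)
{
    float a = 1.0f / b;  /* the only division */

    *c = x * a;          /* multiply instead of a second fdiv */
    *d = y * a;
}
```

For example, calling divide_pair(10.0f, 6.0f, 4.0f, &c, &d) leaves c at 2.5 and d at 1.5, which are exact here because 1/4 is representable as a float.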
> which would save a whole load of cycles, particularly on a pentium.
> in fact, if i were doing the operations with signed longs instead...
> signed long a,b,c,d,x,y;
> i would be better writing - (and changing a to a float)
> a=1.0/b; (because fdiv is still faster than idiv in most cases)
> c=(float)x*a;
> d=(float)y*a;
You are probably better off using full floating point math until the
very end, where you can store the result as a float. Loading/storing
ints is very expensive and should be avoided where possible. I think
an int->float or float->int conversion takes about 14 cycles each
time...
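A small sketch of what I mean by staying in floating point, assuming you do need an integer at the end. The function name scaled_sum and the round-to-nearest step are assumptions for the example. All the work is done on floats and there is exactly one float->int conversion, on the final result.

```c
#include <stddef.h>

/* Sketch only: scaled_sum is a hypothetical helper name.
   Sums xs[i] / b in floating point and converts to an integer once. */
long scaled_sum(const float xs[], size_t n, float b)
{
    float inv = 1.0f / b;   /* one division, reused for every element */
    float acc = 0.0f;
    size_t i;

    for (i = 0; i < n; i++)
        acc += xs[i] * inv; /* float math only, no int loads or stores */

    /* The single float->int conversion, rounding to nearest. */
    return (long)(acc + (acc >= 0.0f ? 0.5f : -0.5f));
}
```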
> ie. to change the integers into floating wherever possible to make use of the
> fmul timings, which outstrip every other timing even in worst case!
Yep... :)
> so there must be a catch somewhere of course ;)
No, FPU code is fast on Intel chips now (Pentium onwards). It took a
while, but they finally caught up to Motorola.
> perhaps the changing from float->int and vice versa takes a lot of time?
> anyone?
Yep... :) Use floating point. If you're programming for Pentium onwards, anyway.
Remember, if someone says otherwise, just say one word: Quake.
Leathal.