Newsgroups: comp.os.msdos.djgpp
From: Elliott Oti
Subject: Re: Optimized code, comparing with Borland C++ 4.5 w/ Power Pack
Sender: usenet AT fys DOT ruu DOT nl (News system Tijgertje)
Message-ID:
In-Reply-To: <350334da.0@superego.idcomm.com>
Date: Mon, 9 Mar 1998 09:45:51 GMT
Content-Type: TEXT/PLAIN; charset=US-ASCII
References: <350334da DOT 0 AT superego DOT idcomm DOT com>
Mime-Version: 1.0
Organization: Physics and Astronomy, University of Utrecht, The Netherlands
Lines: 90
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Precedence: bulk

On Sun, 8 Mar 1998, Randy Sorensen wrote:

> Here's my problem.  When I do shut down out of DOS and run the exec's
> included on the CD, they run up to 60 fps, where as the code that I ported
> to DJGPP runs at 40 fps with the following optimizations:
> "-O6 -ffast-math -funroll-loops -finline -m486".  Is there any other
> optimizations that will speed it up?  Also, I've heard that using high
> "-O"'s will cause problems.. should I bring it down to 4 or 3?

On plain gcc use -O2 (it doesn't go higher than -O3, but -O2 is
cache-friendlier). -fomit-frame-pointer is another very handy switch
that frees up an extra register at the expense of debugging ease.
On pgcc use -O6 -mpentium -ffast-math -fomit-frame-pointer.

> I should note that when I ported the code, the original author didn't do a
> very "standard" job with it.  Some of the matrix vector and point
> transformation code was inlined (they were C++ methods) and I couldn't
> figure out how to make DJGPP inline them.

This is the main reason the performance sucks. During transformations,
matrix and vector functions are being called thousands of times each
frame. The overhead of calling these functions can be higher than the
cost of executing the function body itself.
To inline them, place the method implementation in a header file
accessible to all, and use the keyword "inline":

----------- example vector.h -------------------------------
#ifndef vector__
#define vector__

class vector
{
    double v[3];
public:
    vector(const double X, const double Y, const double Z)
    { v[0]=X; v[1]=Y; v[2]=Z; }                      // constructor
    ~vector(){}                                      // destructor
    inline vector operator+(const vector &a) const;  // vector addition (inline)
};

// Defined in the header so every file that includes it can inline the call.
// Reads a.v[] directly, since the class has no operator[].
inline vector vector::operator+(const vector &a) const
{
    return vector(v[0] + a.v[0], v[1] + a.v[1], v[2] + a.v[2]);
}

#endif
-----------------------------------------------------------------

> Also, since you can't write to
> video memory in DJGPP by default, I went about doing so using
> __djgpp_nearptr_enable() and adding __djgpp_conventional_base to the video
> memory address.  Is there a faster way of going about video memory writing?
> And lastly, I had to put some extra type-casts in there, since gcc.exe kept
> giving me warnings about assigning doubles to unsigned char's and stuff.

This is fast enough, but I wonder: is the source code in the book you
mentioned rendering *directly* to video memory, or does it render first
to an offscreen buffer which is then copied to video memory?
If it renders directly, I would suggest changing the code slightly so
that it writes to an offscreen buffer instead of 0xa0000. Then you can
use the safer (and practically as fast) function dosmemput() (or
dosmemputl()) instead of memcpy + nearpointers to write to video memory.

    info libc alphabetical dosmemput

will give you the syntax for using dosmemput, and an example.

The typecasts to suppress the warnings are OK, but I would suggest
checking these places to see if each typecast is warranted, and to make
sure that any precision loss is not adversely affecting the code.
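
The offscreen-buffer suggestion above can be sketched roughly as follows.
This is only an illustration under stated assumptions, not the code from
the book or from the original poster: the 320x200 one-byte-per-pixel
layout, the names backbuffer, to_pixel, put_pixel and blit, and the
fake_vram stand-in used when not compiling under DJGPP are all
hypothetical. Only dosmemput() and the 0xa0000 address come from the
discussion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#ifdef __DJGPP__
#include <sys/movedata.h>   // dosmemput()
#endif

// Assumed screen layout: 320x200, one byte per pixel (not stated in the post).
const int SCREEN_W = 320;
const int SCREEN_H = 200;
const std::size_t SCREEN_BYTES = SCREEN_W * SCREEN_H;

// Offscreen buffer in ordinary program memory; all rendering goes here.
static unsigned char backbuffer[SCREEN_BYTES];

#ifndef __DJGPP__
// Hypothetical stand-in for video memory so the sketch also builds off DOS.
static unsigned char fake_vram[SCREEN_BYTES];
#endif

// Hypothetical helper: an explicit, clamped double -> unsigned char
// conversion, so the cast is deliberate rather than a silenced warning.
inline unsigned char to_pixel(double value)
{
    if (!(value > 0.0)) return 0;               // negatives (and NaN) clamp to 0
    if (value >= 255.0) return 255;             // too large clamps to 255
    return static_cast<unsigned char>(value);   // truncates the fraction
}

// Write one pixel into the offscreen buffer (bounds checked in debug builds).
inline void put_pixel(int x, int y, double value)
{
    assert(x >= 0 && x < SCREEN_W && y >= 0 && y < SCREEN_H);
    backbuffer[y * SCREEN_W + x] = to_pixel(value);
}

// Copy the finished frame to the screen in a single call.
inline void blit()
{
#ifdef __DJGPP__
    dosmemput(backbuffer, SCREEN_BYTES, 0xA0000);
#else
    std::memcpy(fake_vram, backbuffer, SCREEN_BYTES);
#endif
}
```

A frame would then be drawn with any number of put_pixel() calls followed
by one blit(). Keeping the double-to-byte conversion in one small inline
function also gives a single place to check whether the precision loss
matters.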

I used to program for both Borland (3,4) and djgpp; good djgpp code beats
good Borland code hands down (part of the reason I switched), so a
performance boost *is* possible.

Cheers,

Elliott Oti
kamer 104, tel (030-253) 2516 (RvG)
http://www.fys.ruu.nl/~oti