To: eliz AT is DOT elta DOT co DOT il
Cc: djgpp AT sun DOT soe DOT clarkson DOT edu
Subject: Re: Speed tuning programs
Date: Wed, 17 Aug 1994 17:25:55 +0100
From: Olly Betts

In message <9408170942 DOT AA02203 AT is DOT elta DOT co DOT il>, eliz AT is DOT elta DOT co DOT il writes:
>> I haven't tried profiling the code recently, so it might be worth doing
>> again.  However, this would probably just reduce the times for both
>> versions.
>
>Not necessarily true.  The libraries and the code generation of the two
>compilers (BC and GCC) are quite different, so what's a hot spot in one
>version, doesn't have to be such in another.  For example, imagine that
>some specific library function is much more efficient for one of the
>compilers, and this very function is used in the innermost loop of
>your program.

Good point.  I've had a go at profiling the code, but I think I'm failing
to do something.  Here's what I did:

  Deleted *.o and the coff and executable files
  Modified the makefile to add the flags -pg to all compiles and links
  Rebuilt the program
  Ran the program on a sample data set (20.43 secs internal timing)
  Ran: gprof survex.out     [survex.out is the coff file]

Here's the output (edited highlights anyway):

===========================================================================
Flat profile:

Each sample counts as 0.055556 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 99.99      0.06     0.06        1    55.55    55.55  main
  0.00      0.06     0.00    92183     0.00     0.00  fputc
  0.00      0.06     0.00    43295     0.00     0.00  strncmp
  0.00      0.06     0.00    23389     0.00     0.00  skipblanks
  0.00      0.06     0.00    16879     0.00     0.00  tochar
[...]
  0.00      0.06     0.00        1     0.00     0.00  write_image

 %         the percentage of the total running time of the
time       program used by this function.

cumulative a running sum of the number of seconds accounted
 seconds   for by this function and those listed above it.

 self      the number of seconds accounted for by this
seconds    function alone.  This is the major sort for this
           listing.
[...]
===========================================================================

Now from my reading of this, the profiler thinks that the program took
0.06 seconds, all spent in main(), which is just plain wrong.  I'm sure I
must just be failing to do something.  It does show that it makes a lot of
use of fputc() and strncmp() though, which may be pertinent if the library
implementations of these are weak.

>On the other hand, library functions which move buffers, such as
>strcpy(), memcpy(), memset(), memmove() are inlined by BC under
>-O2, which GCC does not.  Also, in BC these work by moving 16-bit
>words, whereas memcpy() which comes with DJGPP moves bytes.  If you
>have such calls, you're better off using movedata() which moves
>32-bit double-words.

Looks like strncpy() might be a worthy candidate then.  n is 12 in almost
all the calls to it, so the function overhead is probably fairly
significant.  If only I could get some timings from gprof ...

Olly
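
One way to get a rough number without gprof is to time the suspect call by hand with the standard clock() function. What follows is only a minimal sketch, not code from survex: NAME_LEN, copy_strncpy(), copy_memcpy(), time_copy() and report_copy_timings() are all made-up names, and the memcpy() variant assumes the source is always a full, zero-padded 12-byte buffer, which may not hold for the real data.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define NAME_LEN 12  /* the n reported in the message; the name is made up */

typedef void (*copy_fn)(char *dst, const char *src);

/* The call as described: strncpy() with n = 12. */
static void copy_strncpy(char *dst, const char *src)
{
    strncpy(dst, src, NAME_LEN);
}

/* Fixed-size alternative.  Only equivalent when src is a full,
 * zero-padded NAME_LEN-byte buffer, which is an assumption here. */
static void copy_memcpy(char *dst, const char *src)
{
    memcpy(dst, src, NAME_LEN);
}

/* Time `reps` calls of `fn` and return the elapsed CPU time in seconds. */
static double time_copy(copy_fn fn, long reps)
{
    static const char src[NAME_LEN] = "station1"; /* rest is zero-filled */
    char dst[NAME_LEN];
    volatile char sink = 0;  /* stops the loop being optimised away */
    clock_t start, stop;
    long i;

    assert(reps > 0);
    start = clock();
    for (i = 0; i < reps; i++) {
        fn(dst, src);
        sink ^= dst[i % NAME_LEN];
    }
    stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Print one line per variant; call this from a small test main(). */
static void report_copy_timings(long reps)
{
    printf("strncpy: %.2f s\n", time_copy(copy_strncpy, reps));
    printf("memcpy:  %.2f s\n", time_copy(copy_memcpy, reps));
}
```

Calling report_copy_timings() with a repeat count in the millions should give times well above the clock resolution. Both variants go through the same function pointer, so that overhead is the same on each side and only the difference between the two figures is meaningful.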