Date: Mon, 6 Jul 1998 18:35:32 +0300 (EET DST)
From: Tuukka Toivonen
To: Andrea Arcangeli
cc: Linux Programming, linuxprog AT geeky1 DOT ebtech DOT net, beastium-list
Subject: passing args in regs speed (was:something else)
Sender: Marc Lehmann

On Sun, 5 Jul 1998, Andrea Arcangeli wrote:

>I suggest you to learn and use the gcc inline asm. The way gcc implements
>inline gcc is so far the best. It allow gcc to optimize out everything as
>best.

Yes, except that I happen to hate AT&T syntax ;)

>true since for example the eax register has not to be preserved at all. It
>would be nice to pass the last parameter of the function call in the eax
>register and the other parameters across the stack as usual. I think it
>would help a lot in performance. I'll try to discover the improvement.

>Fast latency: 1007, normal latency 1307
 [ not using EAX ] [ using EAX for arg pass ]

Interesting. I made some experiments too.

Test program: bzip2 0.1pl2

I added function prototypes for all functions in the program (and removed
those already existing). I told the compiler to use different numbers of
register parameters, then compiled the program and measured how long it
took to compress the uncompressed LyX 0.12.0 source tar file (7997440
bytes) to /dev/null.

My test system: Pentium 120 MHz, 24 MB main memory, 32 MB swap, Linux
2.0.34, gcc version 2.7.2. No other active programs were eating CPU time
in the background, but the hard disk was accessed a few times, showing
that not everything fit in the disk cache.

The tests show no significant speedup until I use all 3 registers, in
which case it's about 6% faster.

Question: why doesn't gcc allow more than 3 registers to be used?? x86
would have 7 or at least 6 free registers.

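For reference, attaching a register-parameter count to a prototype, as described above, might look roughly like the sketch below. It assumes gcc on 32-bit x86, where regparm(n) passes up to the first n integer arguments in EAX, EDX and ECX instead of on the stack. The names REGPARM and mix3 are made up for illustration and are not taken from bzip2 or from the patch.

```c
/* REGPARM(n) expands to gcc's regparm attribute only on 32-bit x86,
   where the attribute has an effect. On other targets it expands to
   nothing, so the same source still compiles with the default
   calling convention. (Hypothetical helper macro.) */
#if defined(__GNUC__) && defined(__i386__)
#define REGPARM(n) __attribute__((regparm(n)))
#else
#define REGPARM(n)
#endif

/* Hypothetical example function. The attribute is placed on the
   prototype so that every caller that sees this declaration uses the
   same calling convention as the definition. */
int REGPARM(3) mix3(int a, int b, int c);

/* The definition repeats the attribute to match the prototype. */
int REGPARM(3) mix3(int a, int b, int c)
{
    return a * 100 + b * 10 + c;
}
```

Changing the 3 to 0, 1 or 2 in every prototype and rebuilding gives the other variants; the result of each call stays the same, only the way the arguments reach the function changes.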
Each case first shows the compiler flags used; each test was then run 4
times. The times are in real-time seconds (measured with my own program,
which uses the RDTSC instruction). The last number is the size of the
stripped ELF executable (so case 4 gives the smallest executable).

A patch for bzip2 and some more information are in the file
http://www.ee.oulu.fi/~tuukkat/regpass-test.tar.gz

Considerations:
- All libc calls used the conventional stack parameter-passing
  convention. This could be changed by breaking compatibility.
- Why doesn't the kernel use register parameters?? It would be ideal
  since it wouldn't break compatibility!

Can we think this test closes the case? I don't think so. In particular,
the fact that case 5 gives so much better performance than any other case
makes me suspect that a lot more testing (of different real-life
programs) is needed.

Surprise, surprise: case 2 is faster than case 1!

CASE 1: no register parameter passing. Compiler-selected inline functions.
-O3 -fomit-frame-pointer -funroll-loops -g
clock count: 100.54
clock count: 100.46
clock count: 100.77
clock count: 100.64
total clock count: 402.41 / 4
65544

CASE 2: no register parameter passing. No inline functions.
-O2 -fomit-frame-pointer -funroll-loops -g
clock count: 99.609
clock count: 99.731
clock count: 99.508
clock count: 99.617
total clock count: 398.46 / 4
54200

CASE 3: 1 register argument. No inline functions.
__attribute__ (( regparm(1) )) -O2 -fomit-frame-pointer -funroll-loops -g
clock count: 100.14
clock count: 99.742
clock count: 100.12
clock count: 99.701
total clock count: 399.7 / 4
54040

CASE 4: 2 register arguments. No inline functions.
__attribute__ (( regparm(2) )) -O2 -fomit-frame-pointer -funroll-loops -g
clock count: 99.725
clock count: 99.698
clock count: 99.44
clock count: 99.209
total clock count: 398.07 / 4
53896

CASE 5: 3 register arguments. No inline functions.
__attribute__ (( regparm(3) )) -O2 -fomit-frame-pointer -funroll-loops -g
clock count: 94.509
clock count: 94.295
clock count: 94.171
clock count: 94.328
total clock count: 377.3 / 4
53912

(I'm CCing this to pgcc list since I think those people could be
interested; maybe they could implement automatic register passing for
static functions?)

--
| Tuukka Toivonen [PGP public key
| Homepage: http://www.ee.oulu.fi/~tuukkat/ available]
| Try also finger -l tuukkat AT ee DOT oulu DOT fi
| Studying information engineering at the University of Oulu
+-----------------------------------------------------------