Sender: law AT sgi DOT com
Message-ID: <38AC20C0.55E33FA4@sgi.com>
Date: Thu, 17 Feb 2000 08:24:32 -0800
From: Linda Walsh
X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: pgcc AT delorie DOT com
Subject: Big slowdown on -mpentium/-mpentiumpro, short_int add
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Reply-To: pgcc AT delorie DOT com

I was running the AIM9 benchmark and found one of the tests that
showed about a 30-40% slowdown when using -mpentium or -mpentiumpro.

Initial optargs:
CCOPT=-O2 -fomit-frame-pointer -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -DCPU=686

Just adding -mpentium or -mpentiumpro to that will slow this test down.

Note the test suite is copyrighted, but I'm excerpting this section for
educational purposes (to demonstrate a problem).

---code snippet
static int
add_short(Result *res)
{
    int n,                  /* internal loop variable */
        loop_cnt,           /* internal loop count */
        tloop_cnt;          /* temporary internal loop count */

    short s1,               /* copy of arg 1 */
          s2,               /* copy of arg 2 */
          ts1, ts2,         /* temp copy of args */
          s;                /* result goes here */

    /* these values are actually passed in -- law */
    ts1=3; ts2=-3; tloop_cnt=2000000;

    s1 = ts1;               /* use register variables */
    s2 = ts2;
    loop_cnt = tloop_cnt;

    s=0;
                                    /* Variable Values */
                                    /*  s   s1   s2  */
    for (n=loop_cnt; n>0; n--) {    /*  0    x   -x  - initial value */
        s  += s1;                   /*  x    x   -x  */
        s1 += s2;                   /*  x    0   -x  */
        s1 += s2;                   /*  x   -x   -x  */
        s2 += s ;                   /*  x   -x    0  */
        s2 += s ;                   /*  x   -x    x  */
        s  += s1;                   /*  0   -x    x  */
        s  += s1;                   /* -x   -x    x  */
        s1 += s2;                   /* -x    0    x  */
        s1 += s2;                   /* -x    x    x  */
        s2 += s ;                   /* -x    x    0  */
        s2 += s ;                   /* -x    x   -x  */
        s  += s1;                   /*  0    x   -x  */
        /* Note that at loop end, s1 = -s2 */
        /* which is as we started.  Thus, */
        /* the values in the loop are stable */
    }
    res->s = s;
    return(0);
}
=====================================

The subroutine is run in a loop with a 5 second timer set; once the
timer has finished, the subroutine is no longer called.
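
For anyone wanting to try this outside of AIM9, here is a minimal
standalone sketch of the same idea. It is not the AIM9 harness: the
names add_short_kernel() and calls_in_interval() are made up for
illustration, the operands are read through volatile variables so the
compiler cannot fold the loop away, and the timing uses clock() from
standard C, which is only an assumption about how one might count calls
in a fixed interval.

```c
#include <assert.h>
#include <time.h>

/* Same add pattern as add_short() above, but with the operands passed in
   and the result returned, so AIM9's Result type is not needed. */
static short add_short_kernel(short ts1, short ts2, int tloop_cnt)
{
    short s1 = ts1, s2 = ts2, s = 0;
    int n;

    for (n = tloop_cnt; n > 0; n--) {
        s += s1; s1 += s2; s1 += s2; s2 += s; s2 += s; s += s1;
        s += s1; s1 += s2; s1 += s2; s2 += s; s2 += s; s += s1;
    }
    return s;
}

/* Hypothetical driver (not AIM9's): keep calling the kernel until
   `seconds` of CPU time have elapsed, then return how many calls
   completed.  A call that is in progress is always allowed to finish. */
static long calls_in_interval(double seconds, int tloop_cnt)
{
    volatile short a = 3, b = -3;   /* hide the constants from the optimizer */
    volatile short sink = 0;        /* keep the result observably used */
    clock_t start = clock();
    long calls = 0;

    assert(seconds > 0.0);

    do {
        sink = add_short_kernel(a, b, tloop_cnt);
        calls++;
    } while ((double)(clock() - start) / CLOCKS_PER_SEC < seconds);

    (void)sink;
    return calls;
}
```

Calling calls_in_interval(5.0, 2000000) from a small main() and printing
the result, once for a build with the CCOPT line above and once with
-mpentium or -mpentiumpro added, should give two call counts to compare.
The counts will not match AIM9's own figures; only the ratio between the
two builds is of interest.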

The entire benchmark has about 60 tests. Even with this test dragging
down the average, the overall performance increase was about 1-2%,
possibly low due to short-int adds elsewhere in the code and kernel
(I was running the benchmark on either an optimized kernel or not).

The pgcc I'm using is from the Mandrake 7.0 package, v2.95.2.

I'm not on this mailing list, so please 'cc' me with any
questions/news/fixes...

Thanks,
Linda Walsh