Sender: law AT sgi DOT com
Message-ID: <38AC20C0.55E33FA4@sgi.com>
Date: Thu, 17 Feb 2000 08:24:32 -0800
From: Linda Walsh <law AT sgi DOT com>
X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: pgcc AT delorie DOT com
Subject: Big slowdown on -mpentium/-mpentiumpro, short_int add
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Reply-To: pgcc AT delorie DOT com

I was running the AIM9 benchmark and found that one of the tests
showed about a 30-40% slowdown when using -mpentium or -mpentiumpro.
Initial optargs:
CCOPT=-O2 -fomit-frame-pointer -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -DCPU=686    

Just adding -mpentium or -mpentiumpro to that will slow this test down.

Note the test suite is copyrighted, but I'm excerpting this section for
educational purposes (to demonstrate a problem).

---code snippet

static int add_short(Result *res)
{
  int
    n,                 /* internal loop variable */
    loop_cnt,            /* internal loop count */
    tloop_cnt;             /* temporary internal loop count */
 
  short
    s1,                /* copy of arg 1 */
    s2,                /* copy of arg 2 */
    ts1, ts2,            /* temp copy of args */
    s;                 /* result goes here */
 
  /*  these values are actually passed in -- law */
  ts1=3;
  ts2=-3;
  tloop_cnt=2000000;

  s1 = ts1;              /* use register variables */
  s2 = ts2;
  loop_cnt = tloop_cnt;
 
  s=0;   
                                /*  Variable Values    */
                                /*    s    s1    s2    */
  for (n=loop_cnt; n>0; n--) {  /*    0    x     -x  - initial value */
    s  += s1;                   /*    x    x     -x    */
    s1 += s2;                   /*    x    0     -x    */
    s1 += s2;                   /*    x    -x    -x    */
    s2 += s ;                   /*    x    -x    0     */
    s2 += s ;                   /*    x    -x    x     */
    s  += s1;                   /*    0    -x    x     */
    s  += s1;                   /*    -x   -x    x     */
    s1 += s2;                   /*    -x   0     x     */
    s1 += s2;                   /*    -x   x     x     */
    s2 += s ;                   /*    -x   x     0     */
    s2 += s ;                   /*    -x   x     -x    */
    s  += s1;                   /*    0    x     -x    */
                                /* Note that at loop end, s1 = -s2 */
                                /* which is as we started.  Thus, */
                                /* the values in the loop are stable */
  }
  res->s = s;
  return(0);
}
=====================================
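
As a sanity check on the value comments in the loop, here is a minimal
sketch that runs one pass of the twelve adds and confirms that
(s, s1, s2) = (0, x, -x) comes back unchanged. The names short_state,
one_pass, pass_is_stable and check_stability are hypothetical helpers
written for this illustration; they are not part of AIM9.

```c
#include <assert.h>

/* Hypothetical container for the three shorts used in the loop. */
struct short_state { short s, s1, s2; };

/* One pass through the twelve adds, in the same order as the loop body. */
static struct short_state one_pass(struct short_state v)
{
    v.s  += v.s1;  v.s1 += v.s2;  v.s1 += v.s2;
    v.s2 += v.s;   v.s2 += v.s;   v.s  += v.s1;
    v.s  += v.s1;  v.s1 += v.s2;  v.s1 += v.s2;
    v.s2 += v.s;   v.s2 += v.s;   v.s  += v.s1;
    return v;
}

/* Returns 1 if one pass leaves (s, s1, s2) = (0, x, -x) unchanged. */
static int pass_is_stable(short x)
{
    struct short_state in, out;
    in.s  = 0;
    in.s1 = x;
    in.s2 = (short)-x;
    out = one_pass(in);
    return out.s == in.s && out.s1 == in.s1 && out.s2 == in.s2;
}

/* Checks the starting value used in the excerpt (3) and its negation. */
static void check_stability(void)
{
    assert(pass_is_stable(3));
    assert(pass_is_stable(-3));
}
```

Since the values never exceed x in magnitude, no short overflow occurs,
so the loop does the same work on every iteration.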

The subroutine is called repeatedly in a loop; a 5-second timer stops
further calls once it expires.
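
For anyone wanting to reproduce the measurement outside the suite, a
timed calling loop of that kind could be sketched roughly as below. This
is not AIM9's harness: count_calls, test_fn and sample_test are
hypothetical names, and the sketch assumes polling the standard clock()
function (CPU time) instead of whatever timer mechanism the suite
really uses.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical signature for a test routine that takes no arguments. */
typedef int (*test_fn)(void);

/* Written by sample_test; volatile so the work is not optimized away. */
static volatile long work_done;

/* Hypothetical stand-in workload: 1000 increments per call. */
static int sample_test(void)
{
    int i;
    for (i = 0; i < 1000; i++)
        work_done++;
    return 0;
}

/* Calls fn repeatedly until `seconds` of CPU time have elapsed and
   returns the number of completed calls.  A call that is in progress
   when the time runs out is allowed to finish. */
static long count_calls(test_fn fn, double seconds)
{
    clock_t limit, start;
    long calls = 0;

    assert(fn != 0 && seconds > 0.0);
    limit = (clock_t)(seconds * CLOCKS_PER_SEC);
    start = clock();
    do {
        fn();
        calls++;
    } while (clock() - start < limit);
    return calls;
}
```

Building the same source twice, once with the base flags and once with
-mpentium or -mpentiumpro added, and comparing count_calls(fn, 5.0)
for the test in question would give a calls-per-interval comparison.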

The entire benchmark has about 60 tests. Even with this test dragging
down the average, the overall performance increase was about 1-2%, possibly
low due to short-int adds elsewhere in the code and kernel (I was running
the benchmark on both optimized and unoptimized kernels).

The pgcc I'm using is v2.95.2, from the Mandrake 7.0 package.

I'm not on this mailing list, so please 'cc' me with any questions/
news/fixes...

Thanks,
Linda Walsh