Mail Archives: djgpp/2000/01/17/12:17:14

From: GAMMELJL AT SLU DOT EDU
Date: Mon, 17 Jan 2000 09:34:04 -0600 (CST)
Subject: AMD assembly language
To: djgpp AT delorie DOT com
Message-id: <01JKT1KTY2XE8WY342@SLU.EDU>
Organization: SAINT LOUIS UNIVERSITY St. Louis, MO
X-VMS-To: IN%"djgpp AT delorie DOT com"
MIME-version: 1.0
Reply-To: djgpp AT delorie DOT com

     The source codes in question were developed on a machine with
an Intel Pentium Pro 200 MHz chip.  The machine has 128 MB of RAM.
The djgpp on this machine was downloaded on December 15, 1997.
The version number is 2.01.  A new malloc.c was downloaded on
January 2, 1998.

     The source codes were copied to a machine with an AMD K6 333
MHz processor.  That machine has 92 MB of RAM.  The djgpp on that
machine was downloaded on December 19, 1999 (unzipped from
djdev202.zip).  Presumably, this is the latest version of djgpp.

     The source codes (which are a few thousand lines in length)
require only 2.4 MB of RAM (in addition to the size of the
executables, which may be as large as 0.7 MB).  Therefore, it is
unlikely that memory management has anything to do with the problems
we encountered after copying the source codes from one machine to the
other.

     The first problem we dealt with was what appears to us to be
a change in the compiler (we always use
               gxx codename.cc -o codename.exe [-O2] [-w]
where [] indicates "optional").  In the source codes developed on
the Intel machine, it was assumed that the default return type for
subroutines is "int".  That is, in an include file containing
subroutines, one writes

         extern subroutinename(whatever arguments);

         subroutinename(whatever arguments)
           {.
            .
            return 0;
           }

and the source compiles and the executable runs correctly.
On the AMD machine, the compiler insisted that the type (int)
be stated explicitly:

        extern int subroutinename(whatever arguments);

        int subroutinename(whatever arguments)
            {.
             .
             return 0;
            }

The changes were tedious, but when the changes were made, the
source codes compiled and the executables ran correctly on both
machines.
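
Standard C++ has no "implicit int" rule, so a declaration without a return type is not valid C++ even where an older compiler accepted it. A minimal sketch of the explicit form follows; the name subroutinename comes from the post, while the two int parameters are placeholders standing in for "whatever arguments".

```cpp
// Declaration, as it might appear in the include file.
// The parameter list is a placeholder, not taken from the original sources.
extern int subroutinename(int first, int second);

// Definition with the return type written out explicitly.
int subroutinename(int first, int second)
{
    (void)first;   // unused in this sketch
    (void)second;  // unused in this sketch
    return 0;
}
```

Written this way, the same source should be accepted by both the older and the newer compiler.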

     The second problem appeared to us to be a change in the
allowable assembly language (whether due to a difference in the
architecture of the AMD and Intel chips or due to a change in the
assembler we do not know).  On the Intel machine, a section of
assembly language that takes an argument was coded as follows:

      .                      //in the C++ part of the code
      .
      argument=something;    //argument is a global variable
      mode(argument);
      .
      .

where, in an include file containing the assembly language codes,

      #define mode(argument)       \
      __asm__ (             \
          .                 \
          .                 \
          :                 \
          : "a" (argument)  \
          : "eax", etc );

which means that the argument is to be passed in the register eax.
This way of coding did not work on the AMD machine.  We had to
change to
     argument=something;  //in the C++ part of the code
     mode();

     #define mode()      \
     __asm__ (                        \
         "movl _argument,%%eax\n\t"   \
         .                            \
         :                            \
         :                            \
         : "eax", etc );

That is, the assembly language functions could not have an argument.
Again, making these changes was tedious, but the resulting source
code compiled correctly and ran correctly on both machines (as long
as the optimizing -O2 switch was not used--see fourth problem
below).
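
GCC's extended asm documentation says that a register used for an input or output operand may not also be named in the clobber list. The original macro names eax both ways ("a" (argument) as an input and "eax" as a clobber), and newer GCC releases reject that combination. Whether that is what happened here is an assumption, since the compiler's message is not quoted. A sketch of one way to keep the argument follows: eax is declared as a read-write operand and left out of the clobber list. The function name twice and the addl instruction are illustrative only, and the #else branch is a plain C++ fallback for non-x86 compilers.

```cpp
// Hypothetical helper: doubles its argument, passing it in eax.
static inline unsigned int twice(unsigned int argument)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned int value = argument;
    __asm__ ("addl %%eax, %%eax"   // eax = eax + eax
             : "+a" (value)        // eax is both read and written
             :                     // no separate inputs
             : "cc");              // flags change; eax is NOT listed here
    return value;
#else
    return argument + argument;    // portable fallback
#endif
}
```

Because the compiler is told that eax is modified through the "+a" operand, it does not need to be repeated among the clobbers.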

     There was a third (very minor and probably not worth mentioning)
problem related to assembly language programming on the AMD machine.
Sometimes in assembly language one wants the origin of an array named
arrayname, for instance in this assembly language statement,
             movl _arraynameorigin,%%ebx.
arraynameorigin has to be set somewhere.  On the Intel machine,
          unsigned int arraynameorigin=arrayname;
(this statement occurs in the C++ part of the code, not assembly
language) results in a warning (arraynameorigin has no cast), but
on the AMD machine it results in an error.  Apparently, the compiler
has been changed.  We do not understand why this should be an error.
However, since arrayname is the origin of the array arrayname, and
               movl _arrayname,%%ebx
works on both machines, this problem is just a curiosity having no
importance.
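
C++ does not convert a pointer to an integer implicitly, which is why a stricter compiler turns the warning into an error. A sketch of the explicit conversion follows. The array size of 8 is made up, and std::uintptr_t (from <cstdint>, which postdates the compilers discussed here) is an assumption chosen so the example also builds on 64-bit systems. With 32-bit djgpp, a cast to unsigned int would play the same role.

```cpp
#include <cstdint>

unsigned int arrayname[8];   // size chosen only for illustration

// An explicit cast states the intent: store the array's address as an integer.
std::uintptr_t arraynameorigin = reinterpret_cast<std::uintptr_t>(arrayname);
```

As a side note on the assembly itself: in AT&T syntax, movl $_arrayname,%ebx loads the address of the array, whereas movl _arrayname,%ebx loads the contents of its first element, so the two forms are not interchangeable in general.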

     The fourth problem, which we have not overcome, is the fact
that, after all of the above revisions, on the AMD machine the
optimization switch -O2 does not work. With the -O2 switch on, the
source codes compile (on both machines) with no reported errors.
The executables run faster on the Intel machine by 30%-50%.
But the executables crash on the AMD machine.
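
One possible cause, offered only as an assumption because the assembly bodies are not shown: when an asm statement reads a global such as _argument by name, the compiler is not told about the dependency, and with -O2 it may not have stored the latest value to memory before the asm runs. Passing the variable as an "m" (memory) input operand makes the dependency visible. The sketch below uses a hypothetical helper, load_argument, with a plain C++ fallback for non-x86 compilers.

```cpp
unsigned int argument;   // global variable, as in the post

// Hypothetical helper: reads the global from inside an asm statement.
static inline unsigned int load_argument()
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned int result;
    __asm__ __volatile__ ("movl %1, %0"
                          : "=r" (result)      // output in any register
                          : "m" (argument));   // the compiler now knows the
                                               // asm reads this variable
    return result;
#else
    return argument;                           // portable fallback
#endif
}
```

If the asm also writes to memory that is not listed as an operand, adding "memory" to the clobber list is the usual way to say so.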

     The use of assembly language is crucial, resulting in
executables which are FOUR TIMES FASTER than those created without
the use of assembly language, for both the Intel and AMD processors.
The -O2 switch, in the Intel case, results in another gain in speed
of 30%-50%, a gain which we would like to get for the AMD processor
if we can locate the source of this fourth problem and fix it.

     Here is some timing data (in ticks, at 91 ticks/sec):

     source   assembly language   -O2     Intel     AMD
       #1           no            no      9735      4145
                    no            yes     4775      2690
                    yes           no      2155      1225
                    yes           yes     1550      crash

       #2           no            no      9280      3910
                    no            yes     4530      2545
                    yes           no      1780      985
                    yes           yes     1365      crash
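
For reading the table: at 91 ticks per second, 9735 ticks is about 107 seconds, and going from 9735 to 2155 ticks is a speedup of about 4.5. A small sketch of that arithmetic follows; the function names are made up for illustration.

```cpp
// Tick rate stated in the table heading.
const double TICKS_PER_SECOND = 91.0;

// Convert a tick count to seconds.
double ticks_to_seconds(long ticks)
{
    return ticks / TICKS_PER_SECOND;
}

// Ratio of run times: how many times faster the second run is.
double speedup(long ticks_before, long ticks_after)
{
    return static_cast<double>(ticks_before) / ticks_after;
}
```

For example, speedup(2155, 1550) gives the gain from -O2 for source #1 on the Intel machine when assembly language is already in use.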

The spectacular gains in speed when using assembly language are due
to the use of the instructions adcl, mull, and divl.  When the 64-bit
Merced chips appear, these instructions will be dropped (that is what
I call RISC architecture, but others mean something else).  The
architecture of the Merced will be similar to the Alpha architecture
(in my naive view).  I understand that these instructions will not be
dropped when AMD produces its 64-bit processor.  It will be a true
extension of the Intel 486 architecture to 64 bits.  And so we are
very interested in the AMD chip.
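
For readers unfamiliar with these instructions, here is a portable sketch of what each one computes, written with 64-bit integer types from <cstdint> (an assumption; the original code uses the instructions directly). The function names are illustrative. Note that divl raises a divide error, which djgpp reports as SIGFPE, when the divisor is zero or when the quotient does not fit in 32 bits; the sketch returns false in those cases instead.

```cpp
#include <cstdint>

// mull: 32 x 32 -> 64-bit product, split into high and low halves (edx:eax).
inline void mul32(std::uint32_t a, std::uint32_t b,
                  std::uint32_t &hi, std::uint32_t &lo)
{
    std::uint64_t p = static_cast<std::uint64_t>(a) * b;
    hi = static_cast<std::uint32_t>(p >> 32);
    lo = static_cast<std::uint32_t>(p);
}

// adcl: sum = a + b + carry_in; the return value is the carry out (0 or 1).
inline std::uint32_t adc32(std::uint32_t a, std::uint32_t b,
                           std::uint32_t carry_in, std::uint32_t &sum)
{
    std::uint64_t s = static_cast<std::uint64_t>(a) + b + carry_in;
    sum = static_cast<std::uint32_t>(s);
    return static_cast<std::uint32_t>(s >> 32);
}

// divl: divide the 64-bit value hi:lo by d, giving quotient and remainder.
// Returns false where the hardware instruction would fault.
inline bool div64by32(std::uint32_t hi, std::uint32_t lo, std::uint32_t d,
                      std::uint32_t &q, std::uint32_t &r)
{
    if (d == 0 || hi >= d) {
        return false;   // zero divisor, or quotient would exceed 32 bits
    }
    std::uint64_t n = (static_cast<std::uint64_t>(hi) << 32) | lo;
    q = static_cast<std::uint32_t>(n / d);
    r = static_cast<std::uint32_t>(n % d);
    return true;
}
```

Chaining adc32 across successive words is how multi-word addition is usually built from adcl.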
     
     Grendel has asked that we post the crash reports. 

Source code #1:

Exiting due to signal SIGFPE 
Division by Zero at eip=000040ef, x87 status=0120
eax=00000000 ebx=00000000 ecx=00000036 edx=00000005 esi=00000000 edi=00000038
ebp=000b981c esp=000b97e4 program=C:\directory\codename.EXE
cs: sel=00ef  base=83809000  limit=00acafff
ds: sel=00f7  base=83809000  limit=00acafff
es: sel=00f7  base=83809000  limit=00acafff
fs: sel=00cf  base=00013d10  limit=0000ffff
gs: sel=0107  base=00000000  limit=0010ffff
ss: sel=00f7  base=83809000  limit=00acafff
App stack: [000b9978..00039978]  Exceptn stack: [00039854..00037914]

Call frame traceback EIPs:
  0x000040ef
  0x00007d67
  0x00009373
  0x0001ce96

Source code #2:

Exiting due to signal SIGFPE 
Division by Zero at eip=000040af, x87 status=0120
eax=00000000 ebx=00000000 ecx=00000036 edx=00000005 esi=00000000 edi=00000038
ebp=000ce118 esp=000ce0e0 program=C:\directory\codename.EXE
cs: sel=00ef  base=83809000  limit=00f1afff
ds: sel=00f7  base=83809000  limit=00f1afff
es: sel=00f7  base=83809000  limit=00f1afff
fs: sel=00cf  base=00013d10  limit=0000ffff
gs: sel=0107  base=00000000  limit=0010ffff
ss: sel=00f7  base=83809000  limit=00f1afff
App stack: [000ce2a8..0004e2a8]  Exceptn stack: [0004e184..0004c244]

Call frame traceback EIPs:
  0x000040af
  0x0000de02
  0x00018929
  0x0002daea

     I will be off-line January 20-February 3
                    and February 23-March 5.
 
