From: GAMMELJL AT SLU DOT EDU
Date: Mon, 17 Jan 2000 09:34:04 -0600 (CST)
Subject: AMD assembly language
To: djgpp AT delorie DOT com
Message-id: <01JKT1KTY2XE8WY342@SLU.EDU>
Organization: SAINT LOUIS UNIVERSITY St. Louis, MO
X-VMS-To: IN%"djgpp AT delorie DOT com"
MIME-version: 1.0
Reply-To: djgpp AT delorie DOT com

The source codes in question were developed on a machine with an Intel Pentium Pro 200 MHz chip. The machine has 128 MB of RAM. The djgpp on this machine was downloaded on December 15, 1997. The version number is 2.01. A new malloc.c was downloaded January 2, 1998.

The source codes were copied to a machine with an AMD K6 333 MHz processor. That machine has 92 MB of RAM. The djgpp on that machine was downloaded on December 19, 1999 (unzipped from djdev202.zip). Presumably, this is the latest version of djgpp.

The source codes (which are a few thousand lines in length) require only 2.4 MB of RAM (in addition to the size of the executables, which may be as large as 0.700 MB). Therefore, it is unlikely that memory management has anything to do with the problems we encountered after copying the source codes from one machine to the other.

The first problem we dealt with was what appears to us to be a change in the compiler (we always use

    gxx codename.cc -o codename.exe [-O2] [-w]

where [] indicates "optional"). In the source codes developed on the Intel machine, it was assumed that the default type for subroutines is "int". That is, in an include file containing subroutines, one writes

    extern subroutinename(whatever arguments);

    subroutinename(whatever arguments)
    {
    .
    .
    return 0;
    }

and the source compiles and the executable runs correctly. On the AMD machine, the compiler insisted that the type (int) be stated explicitly:

    extern int subroutinename(whatever arguments);

    int subroutinename(whatever arguments)
    {
    .
    .
    return 0;
    }

The changes were tedious, but when the changes were made, the source codes compiled and the executables ran correctly on both machines.
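
As an illustration of the first problem, here is a minimal sketch of a declaration and definition with the return type written out. The name scale_by_two and its arguments are hypothetical stand-ins for "subroutinename(whatever arguments)"; they do not come from the source codes discussed here.

```cpp
// Hypothetical example: scale_by_two stands in for "subroutinename".

// Declaration as it might appear in an include file. The return type
// is stated explicitly instead of relying on a default of "int".
extern int scale_by_two(int value, int *result);

// Definition with the same explicit return type.
int scale_by_two(int value, int *result)
{
    *result = 2 * value;   // the actual work of the subroutine
    return 0;              // 0 signals success, as in the post
}
```

A caller would write something like "int doubled; scale_by_two(21, &doubled);" and then read the result from doubled. Code written this way should be accepted by both the older and the newer compiler, which matches the observation above that the revised sources ran on both machines.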

The second problem appeared to us to be a change in the allowable assembly language (whether due to a difference in the architecture of the AMD and Intel chips or due to a change in the assembler we do not know). On the Intel machine, a section of assembly language in the source code to which an argument was passed was done as follows:

    .                       //in the C++ part of the code
    .
    argument=something;     //argument is a global variable
    mode(argument);
    .
    .

where, in an include file containing the assembly language codes,

    #define mode(argument)      \
    __asm__ (                   \
    .                           \
    .                           \
    :                           \
    : "a" (argument)            \
    : "eax", etc );

which means that the argument is to be passed in the register eax. This way of coding did not work on the AMD machine. We had to change to

    argument=something;     //in the C++ part of the code
    mode();

    #define mode()                  \
    __asm__ (                       \
    "movl _argument,%%eax\n\t"      \
    .                               \
    :                               \
    :                               \
    : "eax", etc );

That is, the assembly language functions could not have an argument. Again, making these changes was tedious, but the resulting source code compiled correctly and ran correctly on both machines (as long as the optimizing -O2 switch was not used; see the fourth problem below).

There was a third (very minor and probably not worth mentioning) problem related to assembly language programming on the AMD machine. Sometimes in assembly language one wants the origin of an array named arrayname, for instance in this assembly language statement:

    movl _arraynameorigin,%%ebx

arraynameorigin has to be set somewhere. On the Intel machine,

    unsigned int arraynameorigin=arrayname;

(this statement occurs in the C++ part of the code, not in the assembly language) results in a warning (arraynameorigin has no cast), but on the AMD machine it results in an error. Apparently, the compiler has been changed. We do not understand why this should be an error. However, since arrayname is the origin of the array arrayname, and

    movl _arrayname,%%ebx

works on both machines, this problem is just a curiosity having no importance.
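
On the second problem, the GCC manual says that a register named in the clobber list of an asm statement may not overlap with an input or output operand. The original macro both passes the argument in eax (the "a" constraint) and lists "eax" as clobbered. Whether that is what the newer compiler objected to is an assumption; we have not confirmed it. The sketch below shows one way to pass an argument in eax without also listing eax as a clobber. The function name add_one and the single addl instruction are hypothetical placeholders for the real assembly language, and the fallback branch exists only so that the sketch builds on non-x86 compilers.

```cpp
#include <cstdint>

// Hypothetical helper: returns value + 1, computed in EAX on x86.
static inline std::uint32_t add_one(std::uint32_t value)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    std::uint32_t result;
    __asm__ (
        "addl $1, %%eax\n\t"   // placeholder for the real instructions
        : "=a" (result)        // output: the compiler reads EAX afterwards
        : "a" (value)          // input: the compiler loads EAX beforehand
        : "cc"                 // flags change; EAX is NOT listed as a clobber
    );
    return result;
#else
    return value + 1;          // portable fallback (assumption, not in the post)
#endif
}
```

Because the value travels through declared operands, the compiler knows what the asm statement reads and writes. When an asm statement instead reads a global variable by its symbol name, as in "movl _argument,%%eax", the compiler is not told about that read unless the statement also carries a "memory" clobber.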

The fourth problem, which we have not overcome, is the fact that, after all of the above revisions, on the AMD machine the optimization switch -O2 does not work. With the -O2 switch on, the source codes compile (on both machines) with no reported errors. The executables run faster on the Intel machine by 30%-50%. But the executables crash on the AMD machine.

The use of assembly language is crucial, resulting in executables which are FOUR TIMES FASTER than those created without the use of assembly language, for both the Intel and AMD processors. The -O2 switch, in the Intel case, results in another gain in speed of 30%-50%, a gain which we would like to get for the AMD processor if we can locate the source of this fourth problem and fix it.

Here is some timing data:

                                           ticks (91 ticks/sec)
    source   assembly language   -O2       Intel      AMD
    #1       no                  no        9735       4145
             no                  yes       4775       2690
             yes                 no        2155       1225
             yes                 yes       1550       crash
    #2       no                  no        9280       3910
             no                  yes       4530       2545
             yes                 no        1780        985
             yes                 yes       1365       crash

The spectacular gains in speed when using assembly language are due to the use of the commands adcl, mull, and divl. When the 64-bit Merced chips appear, these commands will be dropped (that is what I call RISC architecture, but others mean something else). The architecture of the Merced will be similar to the Alpha architecture (in my naive view). I understand that these commands will not be dropped when AMD produces its 64-bit processor. It will be a true extension of Intel 486 architecture to 64 bits. And so we are very interested in the AMD chip.

Grendel has asked that we post the crash reports.

Source code #1:

Exiting due to signal SIGFPE
Division by Zero at eip=000040ef, x87 status=0120
eax=00000000 ebx=00000000 ecx=00000036 edx=00000005 esi=00000000 edi=00000038
ebp=000b981c esp=000b97e4 program=C:\directory\codename.EXE
cs: sel=00ef  base=83809000  limit=00acafff
ds: sel=00f7  base=83809000  limit=00acafff
es: sel=00f7  base=83809000  limit=00acafff
fs: sel=00cf  base=00013d10  limit=0000ffff
gs: sel=0107  base=00000000  limit=0010ffff
ss: sel=00f7  base=83809000  limit=00acafff
App stack: [000b9978..00039978]  Exceptn stack: [00039854..00037914]

Call frame traceback EIPs:
  0x000040ef
  0x00007d67
  0x00009373
  0x0001ce96

Source code #2:

Exiting due to signal SIGFPE
Division by Zero at eip=000040af, x87 status=0120
eax=00000000 ebx=00000000 ecx=00000036 edx=00000005 esi=00000000 edi=00000038
ebp=000ce118 esp=000ce0e0 program=C:\directory\codename.EXE
cs: sel=00ef  base=83809000  limit=00f1afff
ds: sel=00f7  base=83809000  limit=00f1afff
es: sel=00f7  base=83809000  limit=00f1afff
fs: sel=00cf  base=00013d10  limit=0000ffff
gs: sel=0107  base=00000000  limit=0010ffff
ss: sel=00f7  base=83809000  limit=00f1afff
App stack: [000ce2a8..0004e2a8]  Exceptn stack: [0004e184..0004c244]

Call frame traceback EIPs:
  0x000040af
  0x0000de02
  0x00018929
  0x0002daea

I will be off-line January 20-February 3 and February 23-March 5.
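
For readers who do not know the three instructions named above, here is a portable sketch of what mull, adcl, and divl compute, written with 64-bit integers instead of assembly language. The function names are hypothetical and are not taken from the source codes. The last function also shows the two conditions under which the processor raises a divide error for divl: a zero divisor, or a quotient too large for 32 bits. Both crash reports above show a divide error (SIGFPE, "Division by Zero"); whether a divl instruction is the faulting instruction at the reported eip is an assumption that has not been checked.

```cpp
#include <cstdint>

// mull: 32 x 32 -> 64-bit unsigned product (EDX:EAX on the processor).
static inline std::uint64_t mul_32x32(std::uint32_t a, std::uint32_t b)
{
    return static_cast<std::uint64_t>(a) * b;
}

// adcl: a + b + carry in; the carry out is written back to "carry".
static inline std::uint32_t add_with_carry(std::uint32_t a, std::uint32_t b,
                                           unsigned &carry)
{
    std::uint64_t sum = static_cast<std::uint64_t>(a) + b + carry;
    carry = static_cast<unsigned>(sum >> 32);     // 0 or 1
    return static_cast<std::uint32_t>(sum);       // low 32 bits
}

// divl: (high:low) / divisor -> 32-bit quotient and remainder.
// Returns false in the cases where the processor would raise a divide
// error: divisor == 0, or high >= divisor (quotient exceeds 32 bits).
static inline bool div_64by32(std::uint32_t high, std::uint32_t low,
                              std::uint32_t divisor,
                              std::uint32_t &quotient,
                              std::uint32_t &remainder)
{
    if (divisor == 0 || high >= divisor)
        return false;
    std::uint64_t dividend = (static_cast<std::uint64_t>(high) << 32) | low;
    quotient  = static_cast<std::uint32_t>(dividend / divisor);
    remainder = static_cast<std::uint32_t>(dividend % divisor);
    return true;
}
```

A check like the one in div_64by32, placed before the divl in a debugging build, is one way to tell whether the operands reaching the division differ between the optimized and unoptimized executables.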