Date: Wed, 16 Aug 95 12:43 MDT
From: mat AT ardi DOT com (Mat Hostetter)
To: "A.Appleyard"
Subject: Re: inline asm? [+ source for a Pentium cycle timer]
References:
Cc: djgpp AT sun DOT soe DOT clarkson DOT edu

>>>>> "A" == A Appleyard writes:

>(a) asm("xchgl %eax,__ax");
>(b) asm("xorl %eax,__ax"); asm("xorl __ax,%eax"); asm("xorl %eax,__ax");
>(c) asm("movl %eax,XXX"); asm("movl __ax,%eax"); asm("movl XXX,__ax");
>    (where XXX is some spare 32-bit register)

OK, let's look at what you are trying to do. You want to save all the registers, load them all up from globals, call an interrupt, spill all the registers back to those same globals, and restore your original registers.

There's no reason to save your real registers to the same array you use for your input/output registers. You could "pushal" them onto the stack and "popal" them when you are done.

But if you are just writing interrupt glue, I'd write the function completely in assembly and only save/restore the registers that the C calling convention requires (%ebx, %esi, %edi and %ebp under gcc's i386 convention). That takes a few pushl's/popl's, which are very fast on the Pentium and elsewhere.

Note that although I tend to ramble on about efficient assembly code, I *DO NOT* program in assembly much at all; C is almost always a better choice. x86 interrupt glue is a reasonable exception, though; you _can't_ write it in C, so you might as well write efficient assembly.

Of course, you should always consider how important it is to optimize this code. Is saving a few cycles really that important? Nearly all asm programmers I've seen miss the "big picture" and concentrate on irrelevant things...it's an easy mistake to make.

A> Where can I get a full list of available assembly instructions
A> on all PC versions including the Pentium?

I dunno...Intel must offer such literature.
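For comparison, here is a minimal sketch of the same load-registers / run / spill-back pattern written with gcc's extended asm operands instead of hand-saved registers. This is an illustration under assumptions, not the approach described above. It assumes a reasonably recent GCC or Clang on an x86 target (32- or 64-bit), and `struct regblock`, `call_with_regs()` and `cpu_vendor()` are hypothetical names. `cpuid` stands in for the `int` instruction, because a user-mode program on a modern OS cannot issue a DOS interrupt. Each register is named as an in/out operand, so the compiler loads it from the struct and stores it back afterwards. The compiler also preserves any callee-saved register the asm overwrites (here %ebx/%rbx) without a blanket pushal/popal.

```c
#include <stdint.h>
#include <string.h>

#if !defined(__i386__) && !defined(__x86_64__)
#error "This sketch needs an x86 target (it executes cpuid)."
#endif

/* Hypothetical register block: inputs on entry, outputs on return. */
struct regblock {
    uint32_t eax, ebx, ecx, edx;
};

/* Hypothetical glue: load %eax..%edx from *r, execute one instruction,
 * and spill the results back into *r.  cpuid stands in for "int $0xNN".
 * The "+a", "+b", "+c", "+d" constraints mark each register as both
 * read and written, so the compiler emits the loads and stores and
 * saves/restores callee-saved %ebx/%rbx only because it is used. */
static void call_with_regs(struct regblock *r)
{
    __asm__ __volatile__ ("cpuid"
                          : "+a" (r->eax), "+b" (r->ebx),
                            "+c" (r->ecx), "+d" (r->edx));
}

/* Hypothetical usage: cpuid leaf 0 returns the highest standard leaf
 * in eax and the 12-byte vendor string in ebx, edx, ecx (that order). */
static void cpu_vendor(char out[13])
{
    struct regblock r = { 0, 0, 0, 0 };

    call_with_regs(&r);
    memcpy(out, &r.ebx, 4);
    memcpy(out + 4, &r.edx, 4);
    memcpy(out + 8, &r.ecx, 4);
    out[12] = '\0';
}
```

The trade-off is the one discussed above: the compiler, not you, decides which registers actually need saving, at the cost of tying the code to gcc's constraint syntax.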
A> What much of this may prove to boil down to is this: How can
A> I tell djgpp's optimizer to keep its paws off a particular
A> short part of my program?, so that the instructions in that
A> part are compiled as I wrote them without being shuffled, or
A> any of them omitted, or instructions from elsewhere
A> interpolated.

If you want to force sequences of asm to be adjacent, you can do it with just one asm and some macros, e.g.:

    #define SAVE_REGS    "pushal\n\t"
    #define RESTORE_REGS "popal\n\t"

    asm (SAVE_REGS "do my stuff\n\t" RESTORE_REGS);
    asm (SAVE_REGS "do different stuff\n\t" RESTORE_REGS);

Of course, that doesn't work when those other asm sequences have operands and stuff. To keep the optimizer from deleting an asm whose outputs look unused, or from hoisting it out of a loop, declare it "asm volatile"; that still doesn't stop gcc from moving other code around it, though.

>> but I was timing this stuff with the Pentium's rdtsc
>> instruction so

A> What does rdtsc do?

It stands for "read time stamp counter". It puts a 64-bit cycle count into %edx:%eax (high 32 bits in %edx, low 32 bits in %eax). gas doesn't know about it, so I had to put in explicit .byte ops.

Also, it's a real cycle count, so any time spent handling an interrupt that comes in during the measurement gets counted too. You probably want a little code in your timer to smooth out the erratic numbers that produces (it's more important under Linux than under DOS). This also means you can't use it for really long sequences of code and get consistent results, since the longer the sequence, the more likely an interrupt lands in the middle of it.

It's also tricky to time the Pentium because the performance of real Pentium code can't be evaluated one instruction at a time (nearby instructions matter a lot...also true for AGI stalls).

I've appended the hacky rdtsc driver I was using to benchmark stuff...it will of course only work on the Pentium. I wrote it just for personal use, but I might as well share it since someone else might find it useful.

-Mat

    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>

    static unsigned long
    time_event (void)
    {
      unsigned long time_high, time_low;

      /* "volatile": the asm has no inputs, so without it gcc may treat
       * the result as loop-invariant and hoist it out of main's loop. */
      asm volatile (".byte 0x0F, 0x31\n\t"  /* rdtsc, gas doesn't know about it */
                    "pushl %%edx\n\t"
                    "pushl %%eax\n\t"

                    /* Put code to time here. */

                    ".byte 0x0F, 0x31\n\t"  /* rdtsc, gas doesn't know about it */
                    "popl %%edi\n\t"        /* low half of the first reading */
                    "popl %%esi\n\t"        /* high half of the first reading */
                    "subl %%edi,%%eax\n\t"  /* 64-bit subtract: end - start */
                    "sbbl %%esi,%%edx"
                    : "=d" (time_high), "=a" (time_low)
                    :
                    : "%ebx", "%ecx",       /* Listed so timed code can clobber them. */
                      "%esi", "%edi", "cc", "memory");

      assert (!time_high);
      return time_low;
    }

    #define MAX_CYCLES     1000
    #define ENOUGH_SAMPLES 5000

    int
    main (void)
    {
      unsigned long freq[MAX_CYCLES] = { 0 };  /* counts must start at zero */
      unsigned long t;

      /* Smooth out interrupt- and multitasking-induced fluctuations
       * by running many samples until some particular cycle count
       * shows up a large number of times.
       */
      do
        {
          t = time_event ();
        }
      while (t >= MAX_CYCLES || ++freq[t] < ENOUGH_SAMPLES);

      printf ("%lu cycles, including some overhead.\n", t);
      return EXIT_SUCCESS;
    }
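Current gas does know rdtsc, and GCC and Clang on x86 expose it as the `__rdtsc()` intrinsic in <x86intrin.h>. That intrinsic returns the whole 64-bit %edx:%eax value at once. Below is a minimal sketch of the same "sample until one count repeats often enough" smoothing used in main() above, assuming that intrinsic and an x86 target. `tsc_join()`, `tsc_time()`, `most_frequent_delta()`, `sample_fn` and `struct replay` are hypothetical names. `tsc_join()` only spells out how the two 32-bit halves form the 64-bit count. Unlike the .byte version, the code being timed is reached through a function call, so call overhead is included. rdtsc is also not a serializing instruction, so the CPU may overlap it with neighboring instructions.

```c
#include <stddef.h>
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc(): GCC/Clang, x86 targets only */

/* All helper names below are hypothetical, for illustration only. */

/* rdtsc leaves the high 32 bits in %edx and the low 32 bits in %eax;
 * this is how the two halves form the 64-bit count. */
static uint64_t tsc_join(uint32_t edx, uint32_t eax)
{
    return ((uint64_t) edx << 32) | eax;
}

/* Cycles spent in one call of fn(arg), call overhead included. */
static uint64_t tsc_time(void (*fn)(void *), void *arg)
{
    uint64_t start = __rdtsc();

    fn(arg);
    return __rdtsc() - start;
}

/* Hypothetical sample source: returns one measured cycle count. */
typedef uint64_t (*sample_fn)(void *ctx);

/* Same smoothing rule as main() above: keep sampling until some count
 * below max_cycles has been seen `enough` times, then return it.
 * counts[] must have max_cycles entries, all zero on entry. */
static uint64_t most_frequent_delta(sample_fn sample, void *ctx,
                                    unsigned long *counts,
                                    size_t max_cycles, unsigned long enough)
{
    uint64_t t;

    do
        t = sample(ctx);
    while (t >= max_cycles || ++counts[t] < enough);
    return t;
}

/* Hypothetical replay source for exercising the rule without a TSC:
 * hands out the values of v[] round-robin. */
struct replay {
    const uint64_t *v;
    size_t n, i;
};

static uint64_t replay_sample(void *ctx)
{
    struct replay *r = ctx;
    uint64_t t = r->v[r->i];

    r->i = (r->i + 1) % r->n;
    return t;
}
```

In real use, a `sample_fn` would wrap `tsc_time()` around the code under test. Outliers above `max_cycles` (such as samples hit by an interrupt) are simply skipped, as in the original driver.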