Mail Archives: djgpp/1995/08/16/15:35:57

Date: Wed, 16 Aug 95 12:43 MDT
From: mat AT ardi DOT com (Mat Hostetter)
To: "A.Appleyard" <A DOT APPLEYARD AT fs2 DOT mt DOT umist DOT ac DOT uk>
Subject: Re: inline asm? [+ source for a Pentium cycle timer]
References: <A990FFA7265 AT fs2 DOT mt DOT umist DOT ac DOT uk>
Cc: djgpp AT sun DOT soe DOT clarkson DOT edu

>>>>> "A" == A Appleyard <A DOT APPLEYARD AT fs2 DOT mt DOT umist DOT ac DOT uk> writes:

    >(a) asm("xchgl %eax,__ax");
    >(b) asm("xorl %eax,__ax"); asm("xorl __ax,%eax"); asm("xorl %eax,__ax");
    >(c) asm("movl %eax,XXX"); asm("movl __ax,%eax"); asm("movl XXX,__ax");
         (where XXX is some spare 32-bit register)

OK, let's look at what you are trying to do.  You want to save all the
registers, load them all up from globals, call an interrupt, spill all
the registers back to those same globals, and restore your original
registers.

There's no reason to save your real registers to the same array you
use for your input/output registers.  You could "pushal" them onto the
stack and "popal" them when you are done.  But if you are just writing
interrupt glue, I'd just write the function completely in assembly and
only save/restore those registers which the C calling convention
requires (with a few pushl's/popl's, which are very fast on the
Pentium and elsewhere).

Note that although I tend to ramble on about efficient assembly code I
*DO NOT* program in assembly much at all; C is almost always a better
choice.  x86 interrupt glue is a reasonable exception, though; you
_can't_ write it in C, so you might as well write efficient assembly.

Of course, you should always consider how important it is to optimize
this code.  Is saving a few cycles really that important?  Nearly all
asm programmers I've seen miss the "big picture" and concentrate on
irrelevant things...it's an easy mistake to make.

    A> Where can I get a full list of available assembly instructions
    A> on all PC versions including the Pentium?

I dunno...Intel must offer such literature.

    A>   What much of this may prove to boil down to is this: How can
    A> I tell djgpp's optimizer to keep its paws off a particular
    A> short part of my program?, so that the instructions in that
    A> part are compiled as I wrote them without being shuffled, or
    A> any of them omitted, or instructions from elsewhere
    A> interpolated.

If you want to force sequences of asm to be adjacent you could do it
with just one asm and some macros, e.g.:

#define SAVE_REGS	"pushal\n\t"
#define RESTORE_REGS	"popal\n\t"

asm (SAVE_REGS
     "do my stuff\n\t"
     RESTORE_REGS);

asm (SAVE_REGS
     "do different stuff\n\t"
     RESTORE_REGS);

Of course, that doesn't work when those other asm sequences have
operands and stuff.

    >> but I was timing this stuff with the Pentium's rdtsc
    >> instruction so
    A>   What does rdtsc do? 

It stands for "read time stamp counter".  It puts a 64-bit cycle count
into %edx:%eax (high half in %edx, low half in %eax).  gas doesn't
know about it, so I had to put in explicit .byte ops.  Also, it's a
real cycle count, so you probably want a little code in your timer to
smooth out the erratic numbers you get when an interrupt comes in
(it's more important under Linux than under DOS).  This means you
can't use it for really long sequences of code and get consistent
results.  It's also tricky to time the Pentium because the performance
of real Pentium code can't be evaluated one instruction at a time
(nearby instructions matter a lot...also true for AGI stalls).

I've appended the hacky rdtsc driver I was using to benchmark
stuff...it will of course only work on the Pentium.  I wrote it just
for personal use, but I might as well share it since someone else
might find it useful.

-Mat


#include <stdio.h>
#include <stdlib.h>
#include <assert.h>


static unsigned long
time_event (void)
{
  unsigned long time_high, time_low;

  /* volatile: every call must really execute.  Without it gcc may
     hoist or reuse the result, since the asm has outputs but no inputs.  */
  asm volatile (".byte 0x0F, 0x31\n\t"   /* rdtsc, gas doesn't know about it */
                "pushl %%edx\n\t"
                "pushl %%eax\n\t"

/* Put code to time here. */

                ".byte 0x0F, 0x31\n\t"   /* rdtsc, gas doesn't know about it */
                "popl %%edi\n\t"
                "popl %%esi\n\t"
                "subl %%edi,%%eax\n\t"
                "sbbl %%esi,%%edx"
                : "=d" (time_high), "=a" (time_low)
                :
                : "%ebx", "%ecx",  /* Listed so timed code can clobber them. */
                  "%esi", "%edi", "cc");

  assert (!time_high);

  return time_low;
}


#define MAX_CYCLES 1000
#define ENOUGH_SAMPLES 5000


int
main (void)
{
  unsigned long freq[MAX_CYCLES] = { 0 };  /* counts must start at zero */
  unsigned long t;
  int i;

  /* Smooth out interrupt- and multitasking-induced fluctuations
   * by running many samples until some particular cycle count
   * shows up a large number of times.
   */
  do
    {
      t = time_event ();
    }
  while (t >= MAX_CYCLES || ++freq[t] < ENOUGH_SAMPLES);

  printf ("%lu cycles, including some overhead.\n", t);

  return EXIT_SUCCESS;
}
