Mail Archives: djgpp/1992/08/20/15:05:29

Date: Thu, 20 Aug 92 14:43:41 EDT
From: Lars Jonas Olsson <ljo AT r2d2 DOT EEAP DOT CWRU DOT Edu>
To: djgpp AT sun DOT soe DOT clarkson DOT edu
Subject: device memory access speed

Now that physical memory access works for memory locations above 1MB (thanks DJ),
I have been testing the speed of memory access to a frame grabber
(dt2851) that maps 512kB of 16 bit memory starting at location 0xA00000
on the AT bus.

I tried reading and writing ten 512x512 images to the frame grabber on
a 486DX25 with 4MB RAM and the first 4MB cached.
It turns out that the speed of reading and writing is almost the same,
so I'll only discuss writing here.

The basic code is

unsigned char buf[512];
int i, row;
clock_t starttime = clock();
for (i = 0; i < 10; i++)
  for (row = 0; row < 512; row++)
    wline(row, buf, 512);
/* the 1e-6 scale assumes clock() counts microseconds here */
printf("Time for 10 writes is %f\n", (clock() - starttime) * 1e-6);

My first wline implementation was:

void wline(int row, const unsigned char *buf, int len)
{
  unsigned char *fb_ptr = (unsigned char *)(fb_mem + row * 512);
  memcpy(fb_ptr, buf, len);
}

Here fb_mem = (0xE0000000L + 0xA00000) is an external variable.

With this implementation the time varied between 3.8 and 4 s, or about 650kB/s.
For comparison, if I modify it to copy to another external C array
(extern char fb_ptr[512];) the time is 0.35 s, or 7.5MB/s.

I next tried:

void wline(int row, const unsigned char *buf, int len)
{
  unsigned short *fb_ptr = (unsigned short *)(fb_mem + row * 512);
  const unsigned short *buf_ptr = (const unsigned short *)buf;
  int col;
  for (col = 0; col < len; col += 2)
    *fb_ptr++ = *buf_ptr++;
}

Now the time is between 1.9 and 2 s, which is about 1.3MB/s. It seems pretty
clear that the factor of two speedup comes from memcpy doing 8 bit accesses
to memory while the second routine uses 16 bit accesses. I also tried
a third variant using int pointers, but the speed was the same as for the
second algorithm.

The second algorithm can also have problems with alignment, but I'm ignoring that
right now.

Now some questions:

o I have heard that the peak speed of the AT bus is supposed to be 2MB/s. Is
  that correct?

o How many wait states does the AT bus have? Does it depend on the card?

o What is the preferred way of implementing memcpy for 16 bit AT bus memory?
  (The memcpy supplied in djdev108.zip is in assembler and clearly very fast
  for non AT bus operation.)

o Is there a fast AT bus memcpy in one of the graphics libraries? In funcs.doc
  the function libgxx_bcopy is mentioned, but I can't find it in any library.

	Jonas Olsson
