Date: Thu, 20 Aug 92 14:43:41 EDT
From: Lars Jonas Olsson
To: djgpp AT sun DOT soe DOT clarkson DOT edu
Subject: device memory access speed

Now that physical memory access works for memory locations above 1MB
(thanks DJ) I have been testing the speed of memory access to a frame
grabber (dt2851) mapped with 512kB of 16 bit memory starting at location
0xA00000 on the at-bus. I tried reading and writing 10 512x512 images to
the frame grabber on a 486DX25 with 4MB RAM and the first 4MB cached.
It turns out that the speed of reading and writing is almost the same,
so I'll only discuss writing here. The basic code is

    unsigned char buf[512];
    int i, row;
    int starttime = clock();

    for (i = 0; i < 10; i++)
        for (row = 0; row < 512; row++)
            wline(row, buf, 512);
    printf("Time for 10 writes is %f\n", (clock() - starttime)*1e-6);

My first wline implementation was:

    void wline(int row, const unsigned char *buf, int len)
    {
        unsigned char *fb_ptr = (unsigned char *)(fb_mem + row * 512);

        memcpy(fb_ptr, buf, len);
    }

where fb_mem = (0xE0000000L + 0xA00000) is an external. With this
algorithm the time varied between 3.8 and 4 s, or about 650kB/s. If I
modify this to copy to another external C array
(extern char fb_ptr[512];) the time is 0.35 s, or 7.5MB/s (to provide a
comparison). I next tried:

    void wline(int row, const unsigned char *buf, int len)
    {
        unsigned short *fb_ptr = (unsigned short *)(fb_mem + row * 512);
        const unsigned short *buf_ptr = (const unsigned short *)buf;
        int col;

        /* one 16 bit store per two bytes of the line */
        for (col = 0; col < len; col += 2)
            *fb_ptr++ = *buf_ptr++;
    }

Now the time is between 1.9 and 2 s, which is about 1.3MB/s. It is
pretty clear that the speedup of a factor of two is due to the fact
that memcpy is doing 8 bit access to memory while the second routine
uses 16 bit access. I also tried a third variant using int pointers,
but the speed was the same as for the second algorithm. The second
algorithm can also have problems with alignment (an odd address or an
odd len), but I'm ignoring that right now.
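The alignment caveat on the second routine could be handled roughly as
follows. This is only a sketch in present-day C, not the djgpp memcpy
and not something taken from any library: the name copy16 is
hypothetical, and it assumes uint16_t is available and that the
destination accepts both 8 bit and 16 bit stores. It writes a single
leading byte if the destination address is odd, then does 16 bit
stores, then writes a single trailing byte if one is left over.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* copy16 is a hypothetical helper (assumption, not a djgpp routine).
   It copies len bytes to dst, using 16 bit stores wherever dst is
   2-byte aligned and single-byte stores only at the odd edges. */
void copy16(volatile unsigned char *dst, const unsigned char *src, size_t len)
{
    volatile uint16_t *d16;

    /* Leading odd byte, so that dst becomes 2-byte aligned. */
    if (len > 0 && ((uintptr_t)dst & 1u)) {
        *dst++ = *src++;
        len--;
    }

    /* Main loop: one 16 bit store per two bytes. */
    d16 = (volatile uint16_t *)dst;
    while (len >= 2) {
        uint16_t w;
        memcpy(&w, src, 2);   /* src may be unaligned; read it bytewise */
        *d16++ = w;
        src += 2;
        len -= 2;
    }

    /* Trailing odd byte, if any. */
    if (len > 0)
        *(volatile unsigned char *)d16 = *src;
}
```

With this, wline would reduce to a call such as
copy16((volatile unsigned char *)(fb_mem + row * 512), buf, len).
The volatile qualifier is there so the compiler keeps each store to
the device as written; whether this is any faster than the plain loop
on a given card is something I have not measured.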

Now some questions:

o I have heard that the peak speed of the at-bus is supposed to be
  2MB/s. Is that correct?
o How many wait-states does the at-bus have? Is it dependent on the
  card?
o What is the preferred way of implementing memcpy for 16 bit at-bus
  memory use? (The supplied memcpy in djdev108.zip is in assembler and
  clearly very fast for non at-bus operation.)
o Is there a fast at-bus memcpy in one of the graphics libraries? In
  funcs.doc the function libgxx_bcopy is mentioned, but I can't find it
  in any library.

Jonas Olsson