Date: Wed, 22 Mar 1995 15:07:24 +0900
From: raraki AT human DOT waseda DOT ac DOT jp (Ryuichiro Araki)
To: A DOT APPLEYARD AT fs2 DOT mt DOT umist DOT ac DOT uk,
    turnbull AT shako DOT sk DOT tsukuba DOT ac DOT jp
Subject: Re: A quick way to copy n bytes
Cc: DJGPP AT sun DOT soe DOT clarkson DOT edu

>>>>> Stephen Turnbull writes:

> /*-----*//* fast move s[0:n-1]=t[0:n-1] */
> void str_cpy(void*s,void*t,int n){
> asm("pushl %esi"); asm("pushl %edi"); asm("cld");
> asm("movl 8(%ebp),%edi"); asm("movl 12(%ebp),%esi");
> asm("movl 16(%ebp),%ecx"); asm("rep"); asm("movsb"); asm("popl %edi");
> asm("popl %esi");}
> /*-----*/
> /* This has given me good service and should run a bit quicker than a C */
> /* version, as it uses the `rep' repeat instruction */
>
>This looks remarkably like memcpy.s in the standard DJGPP library, but
>it doesn't take advantage of a couple of optimizations included in the
>DJGPP distribution version.  Why are we reinventing the wheel?

When optimizations are enabled, gcc outputs the following inline code
for memcpy(void *dest, const void *src, size_t cnt) if cnt is a
constant:

-----------------------------------------------------------------------
#include <string.h>

#define COUNT 47

void foo(void){
    char dest[COUNT], src[COUNT];

    memcpy(dest, src, COUNT);
}

        .file   "constant.c"
gcc2_compiled.:
___gnu_compiled_c:
.text
        .align 4
.globl _foo
_foo:
        subl $96,%esp
        pushl %edi
        pushl %esi
        leal 56(%esp),%edi
        leal 8(%esp),%esi
        cld
        movl $11,%ecx
        rep
        movsl
        movsw
        movsb
        popl %esi
        popl %edi
        addl $96,%esp
        ret
-----------------------------------------------------------------------

I guess this output code is smart enough.

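The constants in that listing follow from splitting 47 bytes into
4-byte, 2-byte and 1-byte pieces: 47 = 11*4 + 2 + 1, hence
`movl $11,%ecx' with `rep movsl', then one `movsw' and one `movsb'.
Here is a minimal C sketch of that arithmetic.  It only mirrors the
output shown above and assumes nothing about how gcc computes it
internally; the names move_plan and plan_copy are made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of gcc or the DJGPP library: how a
   constant byte count breaks down into string-move instructions. */
struct move_plan {
    size_t dwords;  /* iterations of "rep movsl" (4 bytes each) */
    int    word;    /* 1 if a single "movsw" (2 bytes) follows  */
    int    byte;    /* 1 if a single "movsb" (1 byte) follows   */
};

struct move_plan plan_copy(size_t count)
{
    struct move_plan p;

    p.dwords = count / 4;          /* whole 4-byte units       */
    p.word   = (count & 2) != 0;   /* bit 1 set: 2 bytes left  */
    p.byte   = (count & 1) != 0;   /* bit 0 set: 1 byte left   */

    /* The three pieces must add up to the original count. */
    assert(p.dwords * 4 + p.word * 2u + p.byte == count);
    return p;
}
```

For count = 47 this gives dwords = 11, word = 1 and byte = 1, which
matches the instruction sequence in the listing.
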
But if cnt is a variable rather than a constant:

-----------------------------------------------------------------------
#include <string.h>

#define COUNT 47

void foo(void){
    char dest[COUNT], src[COUNT];
    int cnt = COUNT;

    memcpy(dest, src, cnt);
}

        .file   "variable.c"
gcc2_compiled.:
___gnu_compiled_c:
.text
        .align 4
.globl _foo
_foo:
        subl $96,%esp
        leal 48(%esp),%edx
        movl %esp,%eax
        pushl $47
        pushl %eax
        pushl %edx
        call _memcpy
        addl $12,%esp
        addl $96,%esp
        ret
-----------------------------------------------------------------------

memcpy() is no longer compiled as inline code.  This means that
optimizing the memory/string functions in the standard library is
still effective in improving the performance of executables built with
djgpp, even though the current version of gcc can generate smart
inline code for such functions when the number of bytes to be
processed is a constant.

The following code is my own memcpy() written in gas:

---------------------------------------------------------------
        .data
        .text
        .globl  _memcpy
        .align  4,144
_memcpy:
        pushl   %esi
        movl    %edi,%edx
        movl    8(%esp),%edi    /* dst */
        movl    12(%esp),%esi   /* src */
        movl    16(%esp),%ecx   /* cnt */
        movl    %ecx,%eax       /* DWORD move */
        shrl    $2,%ecx         /* ecx / 4 */
        andl    $3,%eax         /* eax % 4 */
        cld
        rep
        movsl
        movl    %eax,%ecx       /* copy remainder */
        rep
        movsb
        popl    %esi
        movl    %edx,%edi
        movl    4(%esp),%eax    /* return value */
        ret
---------------------------------------------------------------

This code uses movsl, and is thus somewhat faster than the original
memcpy.s.  The drawback is that it might not work correctly if the
objects overlap.  But in ANSI C, memcpy() doesn't guarantee correct
behavior with overlapping objects.  In such a case, one should use
memmove() instead (I suppose DJ's original memcpy.s is in fact
memmove() code).

I sent such memory/string function sources written in gas to DJ a
looong time ago (possibly in Fall 1991), but he didn't seem to prefer
them.

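For readers who don't read gas, the dword-then-remainder strategy of
the routine above can be sketched in C roughly as follows.  This is
only an illustration under my own assumptions, not the library
source: the name copy_dwords is made up, and the 4-byte step goes
through a fixed-size memcpy() call because that is the portable way
to move an unaligned dword in C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical C rendering of the dword-then-byte strategy.  Like
   memcpy(), it copies forward and must not be used when the objects
   overlap. */
void *copy_dwords(void *dst, const void *src, size_t cnt)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t dwords = cnt >> 2;   /* shrl $2,%ecx */
    size_t rest   = cnt & 3;    /* andl $3,%eax */

    assert(cnt == 0 || (dst != NULL && src != NULL));

    while (dwords--) {          /* rep movsl */
        memcpy(d, s, 4);        /* fixed 4-byte move, alignment-safe */
        d += 4;
        s += 4;
    }
    while (rest--) {            /* rep movsb */
        *d++ = *s++;
    }
    return dst;                 /* movl 4(%esp),%eax */
}
```

Calling copy_dwords(dest, src, 47) runs the first loop 11 times and
the second loop 3 times.  If dest and src may overlap, memmove() is
the function to call instead.
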
If somebody wants the gas sources of my memory/string functions
(14 functions), please let me know.

----
raraki(Ryuichiro Araki)
raraki AT human DOT waseda DOT ac DOT jp