Date: Wed, 22 Mar 95 00:12 MST
From: mat AT ardi DOT com (Mat Hostetter)
To: raraki AT human DOT waseda DOT ac DOT jp (Ryuichiro Araki)
Cc: A DOT APPLEYARD AT fs2 DOT mt DOT umist DOT ac DOT uk,
    turnbull AT shako DOT sk DOT tsukuba DOT ac DOT jp,
    DJGPP AT sun DOT soe DOT clarkson DOT edu
Subject: Re: A quick way to copy n bytes
References: <199503220607 DOT PAA27191 AT wutc DOT human DOT waseda DOT ac DOT jp>

raraki writes:

> The following code is my own memcpy() written in gas:
>
> ---------------------------------------------------------------
> 	.data
> 	.text
> 	.globl	_memcpy
> 	.align	4,144
> _memcpy:
> 	pushl	%esi
> 	movl	%edi,%edx
> 	movl	8(%esp),%edi	/* dst */
> 	movl	12(%esp),%esi	/* src */
> 	movl	16(%esp),%ecx	/* cnt */
> 	movl	%ecx,%eax	/* DWORD move */
> 	shrl	$2,%ecx		/* ecx / 4 */
> 	andl	$3,%eax		/* eax % 4 */
> 	cld
> 	rep
> 	movsl
> 	movl	%eax,%ecx	/* copy remainder */
> 	rep
> 	movsb
> 	popl	%esi
> 	movl	%edx,%edi
> 	movl	4(%esp),%eax	/* return value */
> 	ret

This is much better, but what if %esi and %edi are not aligned mod 4?
Every single transfer might then involve both an unaligned load and an
unaligned store, which is slow.  I fixed this in the memcpy and
movedata for the current V2 alpha.  They do movsb's until either %esi
or %edi is long-aligned before doing movsl's (and hopefully both are
aligned by then).  The code checks for small moves right away and just
uses movsb for them, skipping the alignment overhead.

For what it's worth, I also modified memset to do aligned stosl's when
possible.

-Mat
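
A portable C sketch of the strategy Mat describes (not the actual V2
alpha code, which is written with movsb/movsl): tiny moves skip the
alignment preamble entirely; larger moves copy single bytes until the
destination is long-aligned, then copy 32-bit words, then finish the
remainder byte by byte.  The function name and the small-move cutoff
of 16 bytes are illustrative assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of an alignment-aware memcpy.  The threshold (16) is an
   arbitrary illustrative cutoff, not the one used in the V2 alpha. */
void *aligned_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (n >= 16) {
        /* Byte moves until dst is 4-byte aligned (like movsb's). */
        while (((uintptr_t)d & 3) != 0 && n > 0) {
            *d++ = *s++;
            n--;
        }
        /* 4-byte moves (like rep movsl).  memcpy of a fixed 4 bytes
           is the portable way to do a possibly-unaligned src load;
           compilers turn it into a single word move. */
        while (n >= 4) {
            memcpy(d, s, 4);
            d += 4;
            s += 4;
            n -= 4;
        }
    }
    /* Remaining bytes (like the trailing rep movsb). */
    while (n-- > 0)
        *d++ = *s++;
    return dst;
}
```

Note that if src and dst start at different offsets mod 4, the word
loop still does unaligned loads on src; as the message says, aligning
the destination at least makes the stores aligned, and "hopefully"
both pointers share the same misalignment so everything lines up.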