Date: Wed, 22 Mar 1995 17:42:26 +0900
From: raraki AT human DOT waseda DOT ac DOT jp (Ryuichiro Araki)
To: mat AT ardi DOT com, raraki AT human DOT waseda DOT ac DOT jp
Subject: Re: A quick way to copy n bytes
Cc: A DOT APPLEYARD AT fs2 DOT mt DOT umist DOT ac DOT uk, DJGPP AT sun DOT soe DOT clarkson DOT edu, turnbull AT shako DOT sk DOT tsukuba DOT ac DOT jp

>>>>> Mat Hostetter writes:

> This is much better, but what if %esi and %edi are not aligned %4?
> Every single transfer might have an unaligned load and an unaligned
> store, which is slow.

Right, right!!  By adding a little code before the movsl, I also tried
to make either %esi or %edi 4-byte aligned (possibly the same way you
did, though I'm not sure my code was efficient enough) when I wrote my
memcpy() and other gas code.  But the difference in performance was
not remarkable unless memcpy() transferred quite a lot of data at
once.  On the contrary, I saw an adverse effect from such code when
programs mainly transfer small data fragments (say, shorter than
8 - 10 bytes; this is likely in programs which mainly process short
tokens, e.g., compilers, assemblers, etc.), probably due to the
overhead of the prepended code.

But the situation might be different on 486 and/or Pentium machines
(I checked the performance only on an *old* 386 machine long ago, and
have had no chance to repeat the test on newer PCs, since I've deleted
the old code :-< ).

Somebody suggested that I try 16-byte alignment on the 486/Pentium,
since the cache line size of the internal cache in these processors is
16 bytes and thus 16-byte alignment might reduce cache misses.  How
about this idea, Mat?

> I fixed this in the memcpy and movedata for the current V2 alpha.
> They do movsb's until either %esi or %edi is long-aligned before doing
> movsl's (and hopefully both are aligned then).  The code checks for
> small moves right away and just use movsb for them, skipping the
> alignment overhead.

Cool.  I've not looked into the current V2 alpha yet.
I'll examine how well your code works.  Thank you for the valuable
information.

> For what it's worth, I also modified memset to do aligned stosl's when
> possible.

I did, too :-)

---
raraki (Ryuichiro Araki)
raraki AT human DOT waseda DOT ac DOT jp
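
For readers following the thread, the strategy discussed above (a fast
path for small moves, byte copies until a pointer is 4-byte aligned,
then 4-byte transfers) can be sketched roughly in C.  This is only an
illustration, not the actual DJGPP V2 code: the name aligned_copy, the
SMALL_MOVE threshold of 8 bytes, and the choice to align the
destination rather than the source are all assumptions made for the
example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed cutoff: below this, the alignment preamble is skipped. */
#define SMALL_MOVE 8

/* Hypothetical helper, not a DJGPP function. Regions must not overlap. */
void *aligned_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (n >= SMALL_MOVE) {
        /* Byte copies until the destination is 4-byte aligned
           (the role played by the leading movsb's). At most 3 bytes
           are consumed here, so n cannot underflow. */
        while (((uintptr_t)d & 3) != 0) {
            *d++ = *s++;
            n--;
        }

        /* Bulk transfer, 4 bytes at a time (the role of rep movsl).
           The fixed-size memcpy calls keep the sketch well defined
           when the source is still unaligned; compilers typically
           turn them into single loads and stores. */
        while (n >= 4) {
            uint32_t w;
            memcpy(&w, s, sizeof w);
            memcpy(d, &w, sizeof w);
            d += 4;
            s += 4;
            n -= 4;
        }
    }

    /* Small moves and the 0-3 byte tail: plain byte copies. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

Whether such a preamble pays off depends on the typical transfer size,
which is why the threshold is left as a tunable constant here.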