Mail Archives: djgpp/1995/03/22/04:57:20

Date: Wed, 22 Mar 1995 17:42:26 +0900
From: raraki AT human DOT waseda DOT ac DOT jp (Ryuichiro Araki)
To: mat AT ardi DOT com, raraki AT human DOT waseda DOT ac DOT jp
Subject: Re: A quick way to copy n bytes
Cc: A DOT APPLEYARD AT fs2 DOT mt DOT umist DOT ac DOT uk, DJGPP AT sun DOT soe DOT clarkson DOT edu,
turnbull AT shako DOT sk DOT tsukuba DOT ac DOT jp

>>>>> Mat Hostetter <mat AT ardi DOT com> writes:

> This is much better, but what if %esi and %edi are not aligned %4?
> Every single transfer might have an unaligned load and an unaligned
> store, which is slow.

Right, right!!  When I wrote my memcpy() and other gas code, I also added a
small piece of code before movsl to make either %esi or %edi 4-byte aligned
(possibly the same way as yours, though I'm not sure my code was efficient
enough).  But the difference in performance was not remarkable unless
memcpy() transferred quite a lot of data at once.  On the contrary, I saw
an adverse effect from such code when programs mainly transfer small data
fragments (say, shorter than 8 - 10 bytes; this is likely with programs
that mainly process short tokens, e.g., compilers, assemblers, etc.),
probably due to the overhead of the prepended code.  But the situation
might be different on 486 and/or Pentium machines (I checked the performance
only on an *old* 386 machine long ago, and have had no chance to do so on
newer PCs, since I've deleted the old code :-< ).

Somebody suggested that I try 16-byte alignment on 486/Pentium, since the
cache line size of the internal cache in these processors is 16 bytes and
thus 16-byte alignment might reduce cache misses.  What do you think of
this idea, Mat?
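
For illustration only, here is a minimal C sketch of how the number of
leading bytes needed to reach such a boundary could be computed.  The
function name and the ASSUMED_LINE_SIZE constant are hypothetical; the 16
is simply the figure quoted above and is not verified for any particular
CPU.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: line size taken from the discussion above, not measured. */
#define ASSUMED_LINE_SIZE 16u

/* Hypothetical helper: how many bytes must be copied one at a time
 * before p reaches the next ASSUMED_LINE_SIZE boundary (0 if p is
 * already on a boundary). */
size_t bytes_to_line_boundary(const void *p)
{
    uintptr_t offset = (uintptr_t)p & (ASSUMED_LINE_SIZE - 1u);

    return (size_t)((ASSUMED_LINE_SIZE - offset) & (ASSUMED_LINE_SIZE - 1u));
}
```

A copy routine could move that many bytes with movsb-style byte copies
and then switch to wider transfers, at the cost of a longer prologue.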

> I fixed this in the memcpy and movedata for the current V2 alpha.
> They do movsb's until either %esi or %edi is long-aligned before doing
> movsl's (and hopefully both are aligned then).  The code checks for
> small moves right away and just use movsb for them, skipping the
> alignment overhead.

Cool.  I've not looked into the current V2 alpha yet.  I'll examine how
your code works.  Thank you for the valuable information.
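
To make the scheme quoted above concrete, here is a rough C sketch.  It
is not the V2 alpha code (which is written in assembly); the name
sketch_memcpy and the 8-byte SMALL_COPY_LIMIT are assumptions, and this
version aligns only the destination, which is a simplification of the
"either %esi or %edi" test described above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: below this size the alignment prologue is not worth it. */
#define SMALL_COPY_LIMIT 8

/* Hypothetical name; copies n bytes from src to dst (no overlap). */
void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (n >= SMALL_COPY_LIMIT) {
        /* Byte copies (the movsb part) until the destination is
         * 4-byte aligned; at most 3 iterations, so n stays >= 5. */
        while (((uintptr_t)d & 3) != 0) {
            *d++ = *s++;
            n--;
        }
        /* 4-byte transfers (the movsl part).  The fixed-size memcpy
         * calls are a portable way to spell a 32-bit load and store;
         * the source may still be unaligned here. */
        while (n >= 4) {
            uint32_t w;
            memcpy(&w, s, 4);
            memcpy(d, &w, 4);
            d += 4;
            s += 4;
            n -= 4;
        }
    }

    /* Small moves, and the tail of large ones, go byte by byte. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

Checking the size first means short token-sized copies never pay for
the alignment test, which is the overhead discussed earlier in this
message.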

> For what it's worth, I also modified memset to do aligned stosl's when
> possible.

I did, too:-)
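
A comparable C sketch of a fill routine that uses aligned 4-byte stores
where possible is shown below.  The name sketch_memset is hypothetical,
and this is an illustration of the idea rather than either author's
actual code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name; fills n bytes at dst with the low byte of c. */
void *sketch_memset(void *dst, int c, size_t n)
{
    unsigned char *d = dst;
    unsigned char byte = (unsigned char)c;
    /* Replicate the byte into all four bytes of a 32-bit word. */
    uint32_t word = byte * 0x01010101u;

    /* Byte stores (stosb-style) until d is 4-byte aligned. */
    while (n > 0 && ((uintptr_t)d & 3) != 0) {
        *d++ = byte;
        n--;
    }
    /* Aligned 4-byte stores (stosl-style); the fixed-size memcpy is a
     * portable way to spell a 32-bit store. */
    while (n >= 4) {
        memcpy(d, &word, 4);
        d += 4;
        n -= 4;
    }
    /* Remaining 0 to 3 bytes. */
    while (n > 0) {
        *d++ = byte;
        n--;
    }
    return dst;
}
```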

    ---
    raraki(Ryuichiro Araki)
    raraki AT human DOT waseda DOT ac DOT jp
