Mail Archives: djgpp/1996/03/07/14:36:29

Xref: news2.mv.net comp.os.msdos.djgpp:1717
From: korpela AT islay DOT ssl DOT berkeley DOT edu (Eric J. Korpela)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: ASM code & Random
Date: 7 Mar 1996 01:47:22 GMT
Organization: Cal Berkeley-- Space Sciences Lab
Lines: 57
Message-ID: <4hlf7a$ofg@agate.berkeley.edu>
References: <TCPSMTP DOT 16 DOT 3 DOT 6 DOT -15 DOT 44 DOT 16 DOT 2983759767 DOT 73238 AT pegasuz DOT com>
NNTP-Posting-Host: islay.ssl.berkeley.edu
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp

In article <TCPSMTP DOT 16 DOT 3 DOT 6 DOT -15 DOT 44 DOT 16 DOT 2983759767 DOT 73238 AT pegasuz DOT com>,
 <battle DOT axe AT PEGASUZ DOT COM> wrote:
>A9>This is the code:
>
>asm("
>  pusha
>  movl $0xa0000,%edi
>  movl _virt,%esi       # virt declared somewhere
>  movl $32000,%ecx
>  movw _dos_seg,%es
>  cld
>  rep movsw
>  popa
>")
>
>       How's this for more speed?  The REP command will repeat the
>string command immediately following it (MOVSW).  I also shortened all
>those push/pop operations into one pusha/popa command.. like 6 less
>operations plus the loop should be a bit faster..

Some points to consider.....

You shouldn't be using "rep movsw"; you should be using "rep movsl" (with
the count in %ecx halved to 16000, since each move is now 4 bytes).
Moving 32-bit chunks will probably be faster than 16-bit chunks, even
over an ISA bus.
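
A rough sketch of the "rep movsl" version, written with gcc's extended asm
so the compiler knows what the instruction touches. The helper name
copy_longs_rep is made up for illustration, and it assumes flat (near)
pointers, so the %es setup from the quoted code is left out. There is a
plain memcpy fallback for non-x86 targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy `nlongs` 32-bit longwords from src to dst.
   For a 64000-byte screen, nlongs would be 16000. */
static void copy_longs_rep(void *dst, const void *src, size_t nlongs)
{
    assert(dst != NULL && src != NULL);
#if defined(__i386__) || defined(__x86_64__)
    /* "+D", "+S" and "+c" pin dst, src and the count to the
       (e/r)di, (e/r)si and (e/r)cx registers and mark them as modified. */
    __asm__ volatile ("cld\n\t"
                      "rep movsl"
                      : "+D" (dst), "+S" (src), "+c" (nlongs)
                      :
                      : "memory", "cc");
#else
    memcpy(dst, src, nlongs * sizeof(uint32_t));
#endif
}
```

Calling copy_longs_rep(screen, virt, 16000) would move one 320x200 frame,
assuming both names are ordinary pointers into flat memory.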

On a 486 "rep movsl" should be the fastest method, but not so on a Pentium.
On a Pentium, use a load/store loop with two 32-bit loads and two 32-bit
stores per iteration, like the one below. (Don't quote me on syntax here;
I'm no genius when I don't have a reference handy.) It's a bit harder with
far pointers, but it can be done: put a segment override prefix in front of
the last two movl's (the stores). I think segment overrides cost cycles,
though.

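       movl _virt,%esi              # source: off-screen buffer (pointer variable)
       movl _vid_nearptr,%edi       # destination: near pointer to video memory
       movl $7999,%ecx              # 8000 iterations x 8 bytes = 64000 bytes
loop1: movl (%esi,%ecx,8),%edx      # load two longwords from the buffer
       movl 4(%esi,%ecx,8),%eax
       movl %edx,(%edi,%ecx,8)      # store them to video memory
       movl %eax,4(%edi,%ecx,8)
       decl %ecx
       jge  loop1                   # falls through once %ecx reaches -1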
       
On a final note, the pusha and popa don't save any time.  pusha still takes
a cycle per longword pushed, and it can't be paired.  Instead, let gcc know
which registers are used (the clobber list of an extended asm statement),
and let it save them if necessary.  There's not much point in pushing %esi
if it only held garbage that gcc was going to discard anyway.  Look up the
syntax of asm to see how it's done.
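
Here is roughly what that might look like for the load/store loop, as a
sketch only. The function name copy_pairs_clobber is hypothetical, and it
assumes near pointers and a recent gcc (named operands like %[s] are newer
than the gcc of this thread). The point is the last line of the asm
statement: %eax and %edx are listed as clobbered, so gcc saves them only if
it has something live in them, and no pusha/popa is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: copy `npairs` pairs of 32-bit longwords
   (8 bytes per iteration), walking from the end down to index 0. */
static void copy_pairs_clobber(uint32_t *dst, const uint32_t *src, long npairs)
{
    assert(npairs > 0);
    long i = npairs - 1;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ volatile (
        "1:\n\t"
        "movl  (%[s],%[i],8), %%edx\n\t"   /* two loads ...           */
        "movl 4(%[s],%[i],8), %%eax\n\t"
        "movl %%edx,  (%[d],%[i],8)\n\t"   /* ... then two stores     */
        "movl %%eax, 4(%[d],%[i],8)\n\t"
        "dec  %[i]\n\t"
        "jns  1b"                          /* loop until i goes negative */
        : [i] "+&r" (i)                    /* counter: read and written  */
        : [s] "r" (src), [d] "r" (dst)     /* gcc picks the registers    */
        : "eax", "edx", "memory", "cc");   /* scratch registers we use   */
#else
    /* Portable fallback with the same access pattern. */
    for (; i >= 0; i--) {
        uint32_t lo = src[2 * i], hi = src[2 * i + 1];
        dst[2 * i] = lo;
        dst[2 * i + 1] = hi;
    }
#endif
}
```

For a 64000-byte frame the call would be copy_pairs_clobber(dst, src, 8000).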

Eric

-- 
Eric Korpela                        |  An object at rest can never be
korpela AT ssl DOT berkeley DOT edu            |  stopped.
<a href="http://www.cs.indiana.edu/finger/mofo.ssl.berkeley.edu/korpela/w">
Click here for more info.</a>
