Mail Archives: djgpp/1996/03/08/01:08:36

Xref: news2.mv.net comp.os.msdos.djgpp:1723
From: brennan AT mack DOT rt66 DOT com (Brennan "Mr. Wacko" Underwood)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: ASM code & Random
Date: 7 Mar 1996 14:15:13 -0700
Organization: None, eh?
Lines: 120
Message-ID: <4hnjl1$1l0@mack.rt66.com>
References: <1996Mar5 DOT 164831 AT zipi DOT fi DOT upm DOT es> <4hn17p$bfs AT lyra DOT csx DOT cam DOT ac DOT uk>
NNTP-Posting-Host: mack.rt66.com
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp

In article <4hn17p$bfs AT lyra DOT csx DOT cam DOT ac DOT uk>,
M.D. Mackey <mdm1004 AT cus DOT cam DOT ac DOT uk> wrote:
>a920101 AT zipi DOT fi DOT upm DOT es writes:
>
>>Hello everybody!
>>	Can you please take a look at this code and tell me if it can be
>>  done faster??? I use it to copy a 'virtual screen' to the VGA in my game,
>>  but it is too slow... If this can't be done faster, i will have to use
>>  some kind of tweaked mode with 2 pages and do page-flipping using VGA
>>  hardware... but i have no time now to get into it!
>
>>This is the code:
>
>>asm("
>>  pushw %es
>>  pushl %edi
>>  pushl %esi
>>  pushl %ecx
>>  movl $0xa0000,%edi
>>  movl _virt,%esi  # virt declared somewhere
>>  movl $32000,%ecx
>>  movw _dos_seg,%es
>>COPY:
>>  movsw
>>  dec %ecx
>>  jnz COPY
>>  popl %ecx
>>  popl %esi
>>  popl %edi
>>  popw %es
>>")
>
>Eeeewwgh.
>
>Several things:
>
>1) Jumps are expensive: use the 'rep' prefix. It'll speed things up by 
>   a factor of 2.
>2) Avoid word-sized (2-byte) instructions wherever possible: they cost 
>   a _lot_ on the Pentium and will slow the 486 down a bit too.
>
>Try:
>
>asm("
>  pushw %es
>  pushl %edi
>  pushl %esi
>  pushl %ecx
>  movl $0xa0000,%edi
>  movl _virt,%esi  # virt declared somewhere
>  movl $16000,%ecx
>  movw _dos_seg,%es
>  rep
>  movsl
>  popl %ecx
>  popl %esi
>  popl %edi
>  popw %es
>")

Problems:
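* Linear address 0xa0000 isn't necessarily AT offset 0xa0000 in your program's
  address space. And it's protected by default.
* In extended asm (an asm statement with operand lists) you need %%eax for
  %eax, because a single % introduces an operand. Plain asm("...") keeps %eax.
* es always = ds, I believe. For DJGPP, at least.
* Pushing and popping the registers by hand doesn't help GCC's optimizer at all.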

Do this:

Call __djgpp_nearptr_enable() (declared in <sys/nearptr.h>) somewhere before;
it returns 0 on failure. (WARNING: disables memory protection)

unsigned char *screen = (unsigned char *)(0xa0000 + __djgpp_conventional_base);

memcpy(screen, virt, 64000);

That's it. GCC can inline a memcpy as rep movsl when the length is a
compile-time constant and optimization is turned on.
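
As a rough sketch of the memcpy route: blit_screen() and vga_base() are
made-up helper names, not part of DJGPP. The non-DJGPP branch swaps in an
ordinary array for video memory (an assumption, only so the code builds
elsewhere), and the DJGPP branch assumes __djgpp_nearptr_enable() has
already succeeded.

```c
#include <string.h>
#ifdef __DJGPP__
#include <sys/nearptr.h>
#endif

#define SCREEN_BYTES 64000   /* 320x200, one byte per pixel */

#ifndef __DJGPP__
/* Stand-in for video memory so the sketch builds outside DJGPP (assumption). */
static unsigned char fake_vga[SCREEN_BYTES];
#endif

/* Hypothetical helper: where the VGA frame buffer lives for this program. */
static unsigned char *vga_base(void)
{
#ifdef __DJGPP__
    /* Valid only after __djgpp_nearptr_enable() has returned nonzero. */
    return (unsigned char *)(0xa0000 + __djgpp_conventional_base);
#else
    return fake_vga;
#endif
}

/* Hypothetical helper: copy the whole virtual screen in one call. */
static void blit_screen(const unsigned char *virt)
{
    memcpy(vga_base(), virt, SCREEN_BYTES);
}
```

Calling blit_screen(virt) once per frame replaces the whole asm block, and
the fixed SCREEN_BYTES length gives the compiler the chance to expand the
copy inline.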

You need to add __djgpp_conventional_base (a value, not a function call) to
any DOS-area memory address.

If you insist on your own asm, you only need this:

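Call __djgpp_nearptr_enable() somewhere before, as above (WARNING: disables
memory protection)

unsigned char *screen = (unsigned char *)(0xa0000 + __djgpp_conventional_base);
void *dst = screen, *src = virt;  /* scratch copies: rep movsl advances both */
int count = 64000/4;              /* number of dwords; counted down to 0 */
asm volatile (
"cld\n\t"
"rep\n\t"
"movsl"
/* "+" = read and written; newer GCCs reject clobbers that overlap inputs */
: "+D" (dst), "+S" (src), "+c" (count) : : "memory", "cc");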

the screen address will be autoloaded into edi
the virt address will be autoloaded into esi
16000 will go into ecx

Use "a" for eax, "b" for ebx, "d" for edx.

The reason to use the above method is that once GCC knows screen is to go
into edi, it can often arrange for it to already be there. And if a register
is not clobbered, GCC will reuse a value it already has loaded there if it
can.
Of course, this means that if you forget to tell GCC about a register your
asm changes, you will experience weird bugs.

asm ( "statement" : outputRegs : inputRegs : clobberedRegs );
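
To make the operand layout concrete, here is a minimal sketch of a dword
copy written as a function. copy_dwords() is a made-up name. The x86 branch
marks the three registers as read-write ("+") operands, which is how current
GCC versions expect modified input registers to be declared; other targets
fall back to memcpy (an assumption, just so the sketch compiles anywhere).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n 32-bit words from src to dst. */
static void copy_dwords(void *dst, const void *src, size_t n)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* rep movsl advances edi/esi and counts ecx down to zero, so all three
       are declared read-write ("+") rather than listed as clobbers. */
    __asm__ volatile (
        "cld\n\t"
        "rep\n\t"
        "movsl"
        : "+D" (dst), "+S" (src), "+c" (n)   /* outputs (also inputs) */
        :                                    /* no input-only operands */
        : "memory", "cc");                   /* clobbers */
#else
    memcpy(dst, src, n * 4);   /* fallback for non-x86 targets (assumption) */
#endif
}
```

With this, the screen copy becomes copy_dwords(screen, virt, 64000/4), and
the caller's own pointers are left untouched because the asm only modifies
the function's local copies.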

For the above purpose I recommend just using "memcpy()", though.

I've been thinking of trying to get GCC to convert memset() to rep stosl
as well, but I don't know how to go about doing it.
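
This doesn't answer how to make GCC do the conversion itself, but a
hand-written fill might look roughly like the sketch below. fill_dwords() is
a made-up name, and the non-x86 branch is a plain loop added only so the
code builds elsewhere (an assumption).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store `value` into n consecutive 32-bit words. */
static void fill_dwords(void *dst, uint32_t value, size_t n)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* rep stosl writes eax to [edi], advancing edi and counting ecx down. */
    __asm__ volatile (
        "cld\n\t"
        "rep\n\t"
        "stosl"
        : "+D" (dst), "+c" (n)   /* modified by the instruction */
        : "a" (value)            /* input only: the fill pattern */
        : "memory", "cc");
#else
    uint32_t *p = dst;
    while (n--)
        *p++ = value;            /* portable fallback (assumption) */
#endif
}
```

To clear a 64000-byte screen to one color you would replicate the color byte
into all four bytes of value and pass 64000/4 as the count.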

I keep putting off writing that asm tut for my DJGPP page, but in the
last few days, I seem to have written it 3 times... ;)


--brennan
p.s. You will get a huge slowdown if you are copying from or to memory
locations not on dword boundaries. 0xa0000 is on one, and if you allocate
your virtual buffer with malloc, it will align it for you.
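
If you want to check a buffer rather than trust it, a test along these lines
should do. is_dword_aligned() is a made-up helper name, not a library
function.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: nonzero if p sits on a 4-byte (dword) boundary. */
static int is_dword_aligned(const void *p)
{
    return ((uintptr_t)p & 3u) == 0;
}
```

For example, after virt = malloc(64000) you could assert
is_dword_aligned(virt) once at startup; a pointer such as virt + 1 would
fail the check.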
-- 
brennan AT rt66 DOT com  |  Do what you'll wish you had done.
