Xref: news-dnh.mv.net comp.os.msdos.djgpp:546
Path: news-dnh.mv.net!mv!news10.sprintlink.net!news.sprintlink.net!howland.reston.ans.net!agate!library.ucla.edu!news.bc.net!unixg.ubc.ca!freenet.vancouver.bc.ca!rdc
From: rdc AT freenet DOT vancouver DOT bc DOT ca (Robert Clark)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: SSSsPPPpEEEeDDDd !!!
Date: 23 Jun 1995 11:57:50 GMT
Organization: Vancouver Regional FreeNet
Lines: 79
References: <1995Jun19 DOT 134819 DOT 15176 AT ludens>
Nntp-Posting-Host: localhost
To: djgpp AT sun DOT soe DOT clarkson DOT edu
Dj-Gateway: from newsgroup comp.os.msdos.djgpp

I'm giving you the benefit of the doubt that this is a djGCC/GAS
question and not a post to the wrong group ...

xxx AT ludens DOT elte DOT hu wrote:
: I'm an asm-programmer, who has some problem.
:     mov edx,xxx
:     mov ebx,yyy     ;abs(xxx-yyy) is big (i.e.>5000)
:     mov ecx,100000
:     mov al,1
:     align 16
: c1:
:     mov [edx],al
:     mov [ebx],al    ;xxxx
:     inc edx
:     inc ebx
:     dec ecx
:     jnz c1
: This code runs very slow.
: Remove that line which marked ;xxxx! (sorry, I do NOT speak...)
: Run this! It's fast.
: It's OK, but what's the matter with the original code?
: Why does it run so slowly????

gcc optimization can only get so tricky, then it's your turn ...
You are over-accessing the bus ...

Try: [use '386 (not '86) code and unroll (and split up) the loop]

    mov esi,xxx                 ;abs(xxx-yyy) is big (i.e.>5000)
    mov ecx,100000 / 4          ; 1/4 the amount
    mov eax,01010101h           ; combine
;   mov eax,00000001h           ; OR _is_ this what you really wanted?
    align 32
c1:
    mov [esi],eax               ; move _4_ bytes at once
    add esi,4                   ; step past the 4 bytes just written
    dec ecx
    jnz c1
    sub esi,4                   ; back up to the last dword written
    mov edi,yyy + 100000 - 4    ; last dword of the destination
    mov ecx,100000 / 4          ; 1/4 the amount
    std
    rep movsd                   ; move esi[] -> edi[] ecx times (backward)
    cld                         ; gcc-compiled code expects the direction flag clear

This optimization is 'off the top of my head' and IS _untested_ !
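
For anyone reading along in C rather than assembly, here is a minimal sketch of the same split-up idea: finish one buffer first, then copy it to the other, instead of alternating byte stores between two distant addresses. The function names fill_interleaved and fill_split are made up for illustration and do not come from the original question. memset and memcpy are standard C, but whether a given C library moves doublewords inside them is an assumption, and I am not claiming any particular speedup.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name: the pattern from the question, one byte to each
   buffer per iteration, so every store alternates between two regions. */
void fill_interleaved(unsigned char *a, unsigned char *b,
                      size_t n, unsigned char v)
{
    size_t i;
    for (i = 0; i < n; i++) {
        a[i] = v;
        b[i] = v;
    }
}

/* Hypothetical name: the split-up pattern, fill the first buffer
   completely, then copy it into the second in one pass. */
void fill_split(unsigned char *a, unsigned char *b,
                size_t n, unsigned char v)
{
    memset(a, v, n);    /* fill the first buffer */
    memcpy(b, a, n);    /* then duplicate it (buffers must not overlap) */
}
```

Both functions leave the two buffers with identical contents; only the order of the memory accesses differs, so timing the two on your own machine is the only way to see whether the split helps. None of this changes the assembly above, which is what you asked about.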
That assembly rewrite takes only slightly more code than your 12 lines
and uses DOUBLEWORDS (since you DID say your program was 'p-mode').

_YOU_ (and I don't know why GCC did not do this for you) might FIRST
wish to try the OBVIOUS rewrite that avoids accessing the bus too
often; I've ONLY included the relevant lines, NOT the 'whole' section,
this time.

c1:
    mov [edx],al
    inc edx
    mov [ebx],al
    inc ebx
    dec ecx

The second example is simpler and should only take 1 second to do
using QEDIT (CTRL-Y, DNARROW, CTRL-U); the first should be faster
(remember, the first IS NOT tested; the second IS obvious
{"can't fail"}) ...

You could also write code using the loopnz instruction but I'll spare
you.

Since I'm posting anyways (a FAMOUS quote of Bill !) let me mention
that I'm pleased to only get 10-15 e-mails a day now since I left the
djgpp-l and moved over to the news ...

--
Robert Clark
RDC AT freenet DOT vancouver DOT bc DOT ca
UNIX(r) System V Release 4.0 [142.103.106.2]
http://www.freenet.vancouver.bc.ca
incoming:// 49 15 00 N 123 07 00 W .